Guide: Controlling a Web Browser with Python

In this article, we will see how to control a web browser with Python using Selenium. Selenium is an open-source tool that automates web browsers. It provides a single interface for writing test scripts in programming languages such as Ruby, Java, NodeJS, PHP, Perl, Python, and C#.

To install the module, run this command in your terminal:

pip install selenium
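
To confirm that the install worked, a quick check can report which version pip installed. This is a minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("selenium version:", installed_version("selenium"))
```

Knowing the version matters because the Selenium 4 API differs from Selenium 3 in several places, including how elements are located.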

For automation, download the latest version of Google Chrome along with the matching chromedriver.

Here we will automate signing in at https://auth.geeksforgeeks.org and extract the name, email, and institute name from the logged-in profile.

Initialization and Authorization

First, initiate the web driver using Selenium and send a GET request to the URL. Then inspect the HTML document and find the input tags that accept the username/email and the password, along with the button tag for signing in.
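
The initialization step can be sketched as follows. It assumes Selenium 4.6 or later, where Selenium Manager locates a matching driver automatically; older versions need an explicit driver path. The function name `open_login_page` is illustrative, and Selenium is imported inside the function so the definition loads even where the package is not installed.

```python
LOGIN_URL = "https://auth.geeksforgeeks.org"  # URL taken from the article


def open_login_page(url=LOGIN_URL):
    """Start Chrome and load the login page; returns the live driver."""
    # Imported here so this sketch can be defined without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--start-maximized")
    driver = webdriver.Chrome(options=options)
    driver.get(url)  # sends the GET request for the page
    return driver


# Usage (opens a real browser window, so it is left commented out):
# driver = open_login_page()
# print(driver.title)
# driver.quit()
```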

To send the user-supplied email and password to their respective input tags:

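# Requires: from selenium.webdriver.common.by import By
# (the find_element_by_* helpers were removed in Selenium 4)
driver.find_element(By.NAME, 'user').send_keys(email)
driver.find_element(By.NAME, 'pass').send_keys(password)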
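
Hard-coding a password in the script makes it easy to leak. One alternative, sketched here, reads the values from environment variables. The names `GFG_EMAIL` and `GFG_PASSWORD` and the helper `load_credentials` are hypothetical and not part of the original article.

```python
import os


def load_credentials(env=None):
    """Return (email, password) from environment variables.

    GFG_EMAIL and GFG_PASSWORD are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    email = env.get("GFG_EMAIL")
    password = env.get("GFG_PASSWORD")
    if not email or not password:
        raise RuntimeError("Set GFG_EMAIL and GFG_PASSWORD before running")
    return email, password


# Demonstration with an explicit mapping instead of the real environment
print(load_credentials({"GFG_EMAIL": "user@example.com", "GFG_PASSWORD": "s3cret"}))
```

In the script, `email, password = load_credentials()` would then supply the values passed to `send_keys`.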

Identify the button tag with a CSS selector and click it through the Selenium webdriver:

driver.find_element(By.CSS_SELECTOR, 'button.btn.btn-green.signin-button').click()

Scraping Data

Scraping Basic Information from GFG Profile

After clicking Sign in, a new page should load containing the name, institute name, and email ID. Identify the tags containing this data and select them.

container = driver.find_elements(By.CSS_SELECTOR, 'div.mdl-cell.mdl-cell--9-col.mdl-cell--12-col-phone.textBold')

Get the text from each element in the returned list:

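name = container[0].text
try:
    # The institution name may be wrapped in a link
    institution = container[1].find_element(By.CSS_SELECTOR, 'a').text
except Exception:
    # Fall back to the element's own text when there is no link
    institution = container[1].text
email_id = container[2].text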

Finally, print the output:

print({"Name": name, "Institution": institution, "Email ID": email_id})
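
Indexing `container[0]`, `container[1]`, and `container[2]` raises `IndexError` if the page returns fewer elements than expected, for example after a failed login. Below is a small sketch of a more defensive version that works on text that has already been extracted. `profile_from_texts` is an illustrative name and the sample values are placeholders.

```python
def profile_from_texts(texts):
    """Map up to three scraped strings onto the profile fields.

    Missing positions become None instead of raising IndexError.
    """
    fields = ["Name", "Institution", "Email ID"]
    kept = list(texts[:3])
    padded = kept + [None] * (len(fields) - len(kept))
    return dict(zip(fields, padded))


print(profile_from_texts(["Jane Doe", "Example University", "jane@example.com"]))
print(profile_from_texts(["Jane Doe"]))
```

With a live driver, the input could be built as `[element.text for element in container]`.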

Scraping Information from Practice tab

Click on the Practice tab and wait a few seconds for the page to load.

driver.find_elements(By.CSS_SELECTOR, 'a.mdl-navigation__link')[1].click()
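
A fixed pause either wastes time or turns out too short on a slow connection. Selenium's `WebDriverWait` polls until a condition holds instead. The sketch below waits for the Practice container used in the next step. The 10-second timeout and the function name are assumptions, and Selenium is imported inside the function so the snippet can be defined without it.

```python
PRACTICE_CONTAINER = (
    "div.mdl-cell.mdl-cell--7-col.mdl-cell--12-col-phone."
    "whiteBgColor.mdl-shadow--2dp.userMainDiv"
)


def wait_for_practice_container(driver, timeout=10):
    """Block until the Practice container is in the DOM, then return it."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.CSS_SELECTOR, PRACTICE_CONTAINER)
    # Raises selenium's TimeoutException if the element never appears.
    return WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located(locator)
    )
```

Calling `container = wait_for_practice_container(driver)` right after the click would replace the fixed sleep.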

Find the container that holds all the information, then select the grids inside it using a CSS selector.

container = driver.find_element(By.CSS_SELECTOR, 'div.mdl-cell.mdl-cell--7-col.mdl-cell--12-col-phone.whiteBgColor.mdl-shadow--2dp.userMainDiv')
grids = container.find_elements(By.CSS_SELECTOR, 'div.mdl-grid')

Iterate over the selected grids, extract the text from each one, and add it to a set for output.

res = set()
for grid in grids:
    res.add(grid.text.replace('\n',':'))
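
A set discards ordering and leaves each entry as a single `label:value` string. If structured output is more useful, the same text can be split into a dictionary. This sketch assumes each grid's text is a label on the first line followed by its value, which the `replace('\n', ':')` call suggests but the article does not state. `grids_to_dict` is an illustrative name and the sample data is made up.

```python
def grids_to_dict(grid_texts):
    """Turn 'label\\nvalue' strings into a {label: value} dictionary."""
    result = {}
    for text in grid_texts:
        label, _, value = text.partition("\n")
        # Any further line breaks inside the value are joined with ':'
        result[label.strip()] = value.strip().replace("\n", ":")
    return result


sample = ["Overall Score\n120", "Problems Solved\n45"]  # made-up sample data
print(grids_to_dict(sample))
```

With a live driver, the input could be `grids_to_dict(grid.text for grid in grids)`.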

Below is the full implementation:

Python3

import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

if __name__ == '__main__':
    # Credentials for the account to sign in with
    email = ''
    password = 'password'

    options = webdriver.ChromeOptions()
    options.add_argument("--start-maximized")
    options.add_argument('--log-level=3')

    # Selenium 4 takes the driver path through a Service object
    driver = webdriver.Chrome(
        service=Service("C:/chromedriver/chromedriver.exe"),
        options=options)
    driver.set_window_size(1920, 1080)

    # Send a GET request to the login page
    driver.get('https://auth.geeksforgeeks.org')
    time.sleep(5)

    # Fill in the credentials and sign in
    driver.find_element(By.NAME, 'user').send_keys(email)
    driver.find_element(By.NAME, 'pass').send_keys(password)
    driver.find_element(
        By.CSS_SELECTOR, 'button.btn.btn-green.signin-button').click()
    time.sleep(5)

    # Scrape the basic profile information
    container = driver.find_elements(
        By.CSS_SELECTOR,
        'div.mdl-cell.mdl-cell--9-col.mdl-cell--12-col-phone.textBold')
    name = container[0].text
    try:
        institution = container[1].find_element(By.CSS_SELECTOR, 'a').text
    except Exception:
        institution = container[1].text
    email_id = container[2].text

    print("Basic Info")
    print({"Name": name,
           "Institution": institution,
           "Email ID": email_id})

    # Open the Practice tab
    driver.find_elements(
        By.CSS_SELECTOR, 'a.mdl-navigation__link')[1].click()
    time.sleep(5)

    # Adjacent string literals keep the selector free of stray whitespace
    container = driver.find_element(
        By.CSS_SELECTOR,
        'div.mdl-cell.mdl-cell--7-col.mdl-cell--12-col-phone.'
        'whiteBgColor.mdl-shadow--2dp.userMainDiv')
    grids = container.find_elements(By.CSS_SELECTOR, 'div.mdl-grid')

    res = set()
    for grid in grids:
        res.add(grid.text.replace('\n', ':'))

    print("Practice Info")
    print(res)

    # Close the browser and end the session
    driver.quit()
