Guide: Controlling a Browser with Python

In this article, we will see how to control a web browser with Python using Selenium. Selenium is an open-source tool that automates web browsers. It provides a single interface that lets you write test scripts in programming languages such as Ruby, Java, NodeJS, PHP, Perl, Python, and C#.

To install this module, run this command in your terminal:

pip install selenium

For automation, download the latest version of Google Chrome along with the matching chromedriver.

Here we will automate logging in at “//auth.geeksforgeeks.org” and extract the name, email, and institute name from the logged-in profile.

Initialization and Authorization

First, we need to start the web driver using Selenium and send a GET request to the URL. Then we inspect the HTML document and find the input tags that accept the username/email and password, along with the button tag for signing in.

To send the user's email and password to the respective input tags:

from selenium.webdriver.common.by import By

driver.find_element(By.NAME, 'user').send_keys(email)
driver.find_element(By.NAME, 'pass').send_keys(password)

Identify the button tag and click on it using the CSS selector via selenium webdriver:

driver.find_element(By.CSS_SELECTOR, 'button.btn.btn-green.signin-button').click()
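The login steps above can be collected into one function. The following is a minimal sketch, assuming Selenium 4's `find_element(by, value)` call. The function name `login` and the constants `BY_NAME` and `BY_CSS` are illustrative and do not come from the original script. The two strings are the values that Selenium's `By.NAME` and `By.CSS_SELECTOR` hold, written out so the helper can also be tried with a stand-in driver object.

```python
# Locator strategy strings. In Selenium these are the values of
# By.NAME and By.CSS_SELECTOR; they are spelled out here (an assumption
# made for this sketch) so the helper has no import-time dependency.
BY_NAME = "name"
BY_CSS = "css selector"


def login(driver, email, password):
    """Fill in the login form and click the sign-in button.

    `driver` is expected to behave like a Selenium WebDriver that has
    already opened the login page.
    """
    driver.find_element(BY_NAME, "user").send_keys(email)
    driver.find_element(BY_NAME, "pass").send_keys(password)
    driver.find_element(BY_CSS, "button.btn.btn-green.signin-button").click()
```

With a real driver, the call would look like `login(driver, email, password)` right after the login page has loaded.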

Scraping Data

Scraping Basic Information from GFG Profile

After clicking on Sign in, a new page should be loaded containing the Name, Institute Name, and Email id. Identify the tags containing the above data and select them.

container = driver.find_elements(By.CSS_SELECTOR, 'div.mdl-cell.mdl-cell--9-col.mdl-cell--12-col-phone.textBold')

Get the text from each element in the returned list:

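name = container[0].text
try:
    # The institution may be wrapped in a link
    institution = container[1].find_element(By.CSS_SELECTOR, 'a').text
except NoSuchElementException:  # from selenium.common.exceptions
    institution = container[1].text
email_id = container[2].text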

Finally, print the output:

print({"Name": name, "Institution": institution, "Email ID": email_id})
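The extraction above can also be written without exception handling, because Selenium's plural `find_elements` returns an empty list when nothing matches. The sketch below is one way to do that. The function name `extract_profile` and its `by_css` parameter are hypothetical, and the default `"css selector"` is the string value of Selenium's `By.CSS_SELECTOR`.

```python
def extract_profile(cells, by_css="css selector"):
    """Return name, institution and email from the three profile cells.

    `cells` is the list of elements selected for the profile page.
    """
    name = cells[0].text
    # find_elements returns an empty list when no link is present,
    # so the fallback to the cell's own text needs no try/except.
    links = cells[1].find_elements(by_css, "a")
    institution = links[0].text if links else cells[1].text
    return {
        "Name": name,
        "Institution": institution,
        "Email ID": cells[2].text,
    }
```

Calling `extract_profile(container)` would then produce the same dictionary that the print statement above builds by hand.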

Scraping Information from Practice tab

Click on the Practice tab and wait a few seconds for the page to load.

driver.find_elements(By.CSS_SELECTOR, 'a.mdl-navigation__link')[1].click()
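A fixed pause works, but it either wastes time or is too short on a slow connection. Selenium ships `WebDriverWait` for this purpose. The sketch below shows the underlying idea in plain Python: poll a condition until it succeeds or a deadline passes. The helper name `wait_until` and its default timings are assumptions made for illustration.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)
```

For example, `wait_until(lambda: driver.find_elements("css selector", "div.mdl-grid"))` would return the grids as soon as at least one is present, since `find_elements` returns an empty (falsy) list until then.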

Find the container that holds all the information, then select the grids inside it using a CSS selector.

container = driver.find_element(By.CSS_SELECTOR, 'div.mdl-cell.mdl-cell--7-col.mdl-cell--12-col-phone.whiteBgColor.mdl-shadow--2dp.userMainDiv')
grids = container.find_elements(By.CSS_SELECTOR, 'div.mdl-grid')

Iterate over the selected grids, extract the text from each one, and add it to a set for the output.

res = set()
for grid in grids:
    res.add(grid.text.replace('\n', ':'))
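A set of `label:value` strings is fine for printing, but a dictionary is easier to query afterwards. The sketch below assumes that each grid's text has the label on its first line and the value on the lines that follow; that layout is a guess based on the `replace('\n', ':')` call above. The function name `grids_to_dict` is hypothetical.

```python
def grids_to_dict(texts):
    """Turn 'label<newline>value' strings into a {label: value} dict."""
    info = {}
    for text in texts:
        # Split on the first newline only; the rest is the value.
        label, _, value = text.partition("\n")
        info[label.strip()] = value.replace("\n", " ").strip()
    return info
```

With the grids selected above, the call would be `grids_to_dict(grid.text for grid in grids)`.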

Below is the full implementation:

Python3

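from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

if __name__ == '__main__':
    # Credentials of the account to log in with
    email = ''
    password = 'password'

    # Start Chrome maximized and with reduced logging
    options = webdriver.ChromeOptions()
    options.add_argument("--start-maximized")
    options.add_argument('--log-level=3')

    # Selenium 4 takes the chromedriver path through a Service object
    service = Service(executable_path="C:/chromedriver/chromedriver.exe")
    driver = webdriver.Chrome(service=service, options=options)
    driver.set_window_size(1920, 1080)

    # Open the login page and give it time to load
    driver.get('https://auth.geeksforgeeks.org')
    time.sleep(5)

    # Fill in the credentials and click the sign-in button
    driver.find_element(By.NAME, 'user').send_keys(email)
    driver.find_element(By.NAME, 'pass').send_keys(password)
    driver.find_element(
        By.CSS_SELECTOR, 'button.btn.btn-green.signin-button').click()
    time.sleep(5)

    # Select the profile cells: name, institution, email
    container = driver.find_elements(
        By.CSS_SELECTOR,
        'div.mdl-cell.mdl-cell--9-col.mdl-cell--12-col-phone.textBold')
    name = container[0].text
    try:
        # The institution may be wrapped in a link
        institution = container[1].find_element(By.CSS_SELECTOR, 'a').text
    except NoSuchElementException:
        institution = container[1].text
    email_id = container[2].text

    print("Basic Info")
    print({"Name": name,
           "Institution": institution,
           "Email ID": email_id})

    # Open the Practice tab and wait for it to load
    driver.find_elements(By.CSS_SELECTOR, 'a.mdl-navigation__link')[1].click()
    time.sleep(5)

    # Select the grids inside the main container
    container = driver.find_element(
        By.CSS_SELECTOR,
        'div.mdl-cell.mdl-cell--7-col.mdl-cell--12-col-phone.'
        'whiteBgColor.mdl-shadow--2dp.userMainDiv')
    grids = container.find_elements(By.CSS_SELECTOR, 'div.mdl-grid')

    # Collect each grid's text as "label:value"
    res = set()
    for grid in grids:
        res.add(grid.text.replace('\n', ':'))

    print("Practice Info")
    print(res)

    # Close the window and end the browser session
    driver.close()
    driver.quit()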
