Guide: controlling a web browser with Python
In this article, we will see how to control a web browser with Python using Selenium. Selenium is an open-source tool that automates web browsers. It provides a single interface that lets you write test scripts in programming languages such as Ruby, Java, NodeJS, PHP, Perl, Python, and C#.

To install the module, run this command in your terminal:

    pip install selenium

For automation, download the latest Google Chrome along with a matching chromedriver. Here we will automate the sign-in at "https://auth.geeksforgeeks.org" and extract the name, email, and institute name from the logged-in profile.

Initialization and Authorization

First, initiate the web driver using Selenium and send a GET request to the URL. Then inspect the HTML document and find the input tags that accept the username/email and the password, as well as the button tag for signing in.

To send the user-supplied email and password to the respective input tags (Selenium 4.3 removed the find_element_by_* helpers, so these calls use find_element with By, imported via "from selenium.webdriver.common.by import By"):

    driver.find_element(By.NAME, 'user').send_keys(email)
    driver.find_element(By.NAME, 'pass').send_keys(password)

Finally, identify the button tag with a CSS selector and click it through the Selenium webdriver.
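The sign-in steps above can be sketched as one small helper. This is a minimal sketch rather than the article's original code: the function name `sign_in` and its `submit_selector` parameter are hypothetical, and the selector for the sign-in button is left to the caller because the source does not state it. The locator strings `"name"` and `"css selector"` are the values of Selenium's `By.NAME` and `By.CSS_SELECTOR` constants, so the helper itself needs no Selenium import.

```python
# Locator strategy strings; these equal selenium's By.NAME and By.CSS_SELECTOR.
BY_NAME = "name"
BY_CSS = "css selector"


def sign_in(driver, url, email, password, submit_selector):
    """Open `url`, fill the 'user' and 'pass' inputs, and click the sign-in button.

    `submit_selector` is a caller-supplied CSS selector (hypothetical: the
    article does not say which selector matches the button).
    """
    driver.get(url)
    # The input names 'user' and 'pass' come from the article's own snippet.
    driver.find_element(BY_NAME, "user").send_keys(email)
    driver.find_element(BY_NAME, "pass").send_keys(password)
    driver.find_element(BY_CSS, submit_selector).click()
```

With a real browser, this would presumably be called with a driver created by `webdriver.Chrome()` (after `from selenium import webdriver`), the URL "https://auth.geeksforgeeks.org", and whatever button selector you find when inspecting the page.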
Scraping Data

Scraping Basic Information from GFG Profile

After clicking on Sign in, a new page should load containing the name, institute name, and email ID. Identify the tags containing this data and select them.
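Because the profile page loads only after the click, the tags may not exist yet when the script first looks for them. One way to handle this is to poll until they appear. The helper below is a hand-rolled sketch with a hypothetical name (`wait_for_elements`); Selenium ships its own `WebDriverWait` for this purpose, which is likely the better choice in real code. The profile selector is not given in the source, so it is a parameter here.

```python
import time


def wait_for_elements(driver, css_selector, min_count=1, timeout=10.0, poll=0.5):
    """Poll until at least `min_count` elements match, then return them.

    Raises TimeoutError if the page has not produced them within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the value of selenium's By.CSS_SELECTOR.
        found = driver.find_elements("css selector", css_selector)
        if len(found) >= min_count:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{css_selector!r} matched {len(found)} element(s)")
        time.sleep(poll)
```

Since the profile shows three fields (name, institute name, email), a call with `min_count=3` would return the list only once all three tags are present.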
Get the text from each of the selected tags in the returned list (NoSuchElementException comes from selenium.common.exceptions, and By from selenium.webdriver.common.by):

    name = container[0].text
    try:
        # The institution may be wrapped in a link
        institution = container[1].find_element(By.CSS_SELECTOR, 'a').text
    except NoSuchElementException:
        institution = container[1].text
    email_id = container[2].text

Finally, print the output:

    print({"Name": name, "Institution": institution, "Email ID": email_id})

Scraping Information from Practice tab

Click on the Practice tab and wait a few seconds for the page to load.

    driver.find_elements(By.CSS_SELECTOR, 'a.mdl-navigation__link')[1].click()

Find the container that holds all the information, then select the grids inside it using a CSS selector.
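The Practice tab step (click, wait, select the grids) can be sketched as a single function. The name `open_tab_and_select` and all of its parameters are hypothetical. The link selector `a.mdl-navigation__link` and index 1 come from the article, but the grid selector does not appear in the source and has to be supplied by the caller. The fixed pause mirrors the article's "wait a few seconds" and is a crude choice; an explicit wait would be more robust.

```python
import time


def open_tab_and_select(driver, link_selector, link_index, grid_selector,
                        settle_seconds=3.0):
    """Click the nav link at `link_index`, pause, then return the matching grids."""
    # "css selector" is the value of selenium's By.CSS_SELECTOR.
    links = driver.find_elements("css selector", link_selector)
    if link_index >= len(links):
        raise IndexError(f"only {len(links)} link(s) match {link_selector!r}")
    links[link_index].click()
    time.sleep(settle_seconds)  # fixed pause while the tab content loads
    return driver.find_elements("css selector", grid_selector)
```

Checking the index before clicking gives a clearer error if the navigation markup changes and fewer links match than expected.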
Iterate over the selected grids, extract the text from each one, and add it to a set or list for output.

    res = set()
    for grid in grids:
        # Each grid's text spans several lines; join them with ':'
        res.add(grid.text.replace('\n', ':'))
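The loop above can be wrapped in a small function that also skips empty grids and returns the results in a stable order. This is a sketch under assumptions: `collect_grid_text` is a hypothetical name, and it relies only on each grid exposing a `.text` attribute, as Selenium elements do.

```python
def collect_grid_text(grids):
    """Return the unique grid texts, with line breaks turned into ':' separators."""
    res = set()
    for grid in grids:
        text = grid.text.strip()
        if text:  # skip grids that rendered no text
            res.add(text.replace("\n", ":"))
    # A set has no defined order; sorting makes the printed output repeatable.
    return sorted(res)
```

A set removes duplicate grids, which is presumably why the article uses one; returning a sorted list keeps that property while making consecutive runs easier to compare.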