Practical 1: Data Science: Web Scraping

Maharshi Relia
Jul 30, 2021

18IT110 Maharshi Chetan Relia | Practical 1 Web Scraping

What is Web Scraping?

Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts underlying HTML code and, with it, data stored in a database. The scraper can then replicate entire website content elsewhere.

Fig 1. Functionality

Why is Web Scraping Used?

Some websites present information in a way that is hard to copy and paste by hand. Web scraping can help you extract that data.

Scraping also takes care of storage. A web scraping tool can save the extracted data in a format such as CSV, so you can then load, analyze, and use it however you choose.

As a result, web scraping automates and streamlines data extraction and gives simple access to the scraped data by storing it in CSV format.
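To make the CSV idea concrete, here is a minimal sketch that writes a couple of rows to CSV using only Python's standard `csv` module. The rows and the column names `name` and `price` are made up for illustration; they are not real scraped data.

```python
import csv
import io

# Made-up rows standing in for scraped data.
rows = [
    {"name": "Example Laptop One", "price": "40000"},
    {"name": "Example Laptop Two", "price": "55000"},
]

# An in-memory buffer keeps the sketch side-effect free.
# Swap it for open("laptops.csv", "w", newline="") to write a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "price"])
writer.writeheader()      # first line: the column names
writer.writerows(rows)    # one CSV line per dictionary

csv_text = buffer.getvalue()
print(csv_text)
```

Once the data is in this shape, any spreadsheet or analysis tool that reads CSV can pick it up.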

Fig 2. Applications

Top 8 Web Scraping Tools

  • ParseHub.
  • Scrapy.
  • OctoParse.
  • Scraper API.
  • Mozenda.
  • Webhose.io.
  • Content Grabber.
  • Common Crawl.

Only 4 steps to get started with web scraping

We will use three Python libraries: Selenium to automate browser activity, BeautifulSoup to parse HTML and XML documents, and Pandas to store the data in the desired format.

Step 1: Find the URL you want to scrape.

For this project, we will scrape laptop listings and their details.

Fig. 3

Step 2: Inspect the page.

The data is usually nested in tags, so we inspect the page to see which tag holds the data we want to scrape. To inspect the page, right-click the element and click “Inspect”.

<div class="_4rR01T">ASUS VivoBook Ultra 14 Core i3 11th Gen - (8 GB/512 GB SSD/Windows 10 Home) X413EA-EB322TS Thin and Li…</div>

Fig. 4
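Once the tag and class are known, pulling the text out is short. The sketch below assumes BeautifulSoup is installed (`pip install beautifulsoup4`) and runs on a small made-up HTML string rather than the live page; only the class name `_4rR01T` comes from the inspected element above.

```python
from bs4 import BeautifulSoup

# Made-up markup that reuses the class name seen in the inspector.
html = """
<div class="_4rR01T">Example Laptop One</div>
<div class="_4rR01T">Example Laptop Two</div>
<div class="other">Not a product name</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns every <div> whose class attribute includes "_4rR01T".
names = [div.get_text(strip=True) for div in soup.find_all("div", class_="_4rR01T")]
print(names)
```

Class names like this one are typically generated by the site and can change at any time, so expect to re-inspect the page if the list comes back empty.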

Step 3: Find the data and extract

We will extract the name and price of each laptop listing.

Step 4: Writing the code

Check out here: https://github.com/maharshirelia/data-science-web-scraping
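The repository above holds the actual code. What follows is only a rough sketch of how the three libraries might fit together, not the author's implementation. It assumes Selenium, BeautifulSoup, and Pandas are installed and that Chrome with a matching driver is available. The function names, the sample HTML, the sample prices, and the `price` class are all hypothetical; only `_4rR01T` comes from the inspected page.

```python
from bs4 import BeautifulSoup
import pandas as pd


def fetch_page_source(url):
    """Load a page in a real browser and return the rendered HTML."""
    # Imported here so the parsing helper below can be tried without a browser.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumes Chrome and a matching driver are available
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if loading fails


def parse_listings(html, name_class, price_class):
    """Return a DataFrame with one row per (name, price) pair found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    names = [t.get_text(strip=True) for t in soup.find_all("div", class_=name_class)]
    prices = [t.get_text(strip=True) for t in soup.find_all("div", class_=price_class)]
    # zip() pairs the two lists by position, so this assumes every listing has both fields.
    return pd.DataFrame(list(zip(names, prices)), columns=["Name", "Price"])


# Made-up stand-in for fetch_page_source(url); "price" is a hypothetical class name.
SAMPLE_HTML = """
<div class="_4rR01T">Laptop A</div><div class="price">Rs. 40,000</div>
<div class="_4rR01T">Laptop B</div><div class="price">Rs. 55,000</div>
"""

laptops = parse_listings(SAMPLE_HTML, name_class="_4rR01T", price_class="price")

# With no path, to_csv returns the CSV text; pass a filename to write to disk instead.
csv_text = laptops.to_csv(index=False)
print(csv_text)
```

For a live run, replace `SAMPLE_HTML` with `fetch_page_source(url)` and pass the price class you find when inspecting the page. Check the site's terms of use and robots.txt before scraping it.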

Result:

Fig. 5 Results

In this project, we learned how to fetch data from websites using Python libraries.
