Sanskar

Keen Learner and Exp... • 1d

Day 27 of learning Python as a beginner. Topic: web scraping using Beautiful Soup.

A few days ago I was introduced to the requests library in Python, which can fetch the HTML of a website. At that time I was confused about what its real-life applications might be. That's when many amazing people pointed out that it is mostly used for web scraping (something I wasn't aware of then). Web scraping is essentially downloading data from websites (in HTML format) and then parsing it to extract useful information. There are mainly two libraries used for web scraping: 1. Beautiful Soup and 2. Selenium. Some say Scrapy is also good for this purpose.

I focused on Beautiful Soup and succeeded in scraping data from a real estate website. First I used requests and file I/O to save the HTML to disk. Many people say there's no need for this step, but I think saving the data first protects against unexpected errors from the website and avoids scraping the same page again when you want to extract more information from it later.

At first the website returned a forbidden response to my requests. I added a time delay of 2 seconds, because sending too many requests to a server in a short time is a common sign of scraping. Then I used fake user agent to generate a realistic User-Agent string and adjusted the browser headers so that the request would look more legitimate.

Once all the HTML was saved in a file, I used Beautiful Soup to parse it (Beautiful Soup converts raw HTML into a structured parse tree). My goal was to extract the email and phone number from the website (which I have hidden in the output, obviously). For this I used regular expressions (regex, which I finally understand a little), because they let me write patterns that match exactly the text I need. I wrote the email pattern myself but took AI's help to design the phone number pattern, which was a bit challenging for me.
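For anyone following along, the fetch-and-save step described above could be sketched roughly like this. This is not the code from the post. The header values and the names `build_headers` and `fetch_and_save` are made up for illustration, and the User-Agent string is passed in by the caller rather than generated.

```python
import time
from pathlib import Path

import requests

DELAY_SECONDS = 2  # the post mentions a 2-second delay before requesting


def build_headers(user_agent: str) -> dict[str, str]:
    """Return browser-like headers around the given User-Agent string.

    The Accept values below are illustrative assumptions, not taken from the post.
    """
    return {
        "User-Agent": user_agent,
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.9",
    }


def fetch_and_save(url: str, path: Path, user_agent: str) -> None:
    """Wait, request the page once, and write the raw HTML to disk."""
    time.sleep(DELAY_SECONDS)
    response = requests.get(url, headers=build_headers(user_agent), timeout=10)
    response.raise_for_status()  # raises on 403 Forbidden and other error statuses
    path.write_text(response.text, encoding="utf-8")
```

A call would look like `fetch_and_save("https://example.com/listings", Path("page.html"), ua)`, where the URL and file name are placeholders and `ua` is any realistic User-Agent string, for example one produced by `UserAgent().random` from the fake-useragent package. Saving to disk first means the parsing step can be rerun as often as needed without contacting the site again.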
I have done all this on a single website. In future I plan to do it in bulk (I may need proxies for that, to avoid an IP ban) and then store all that data in a database using PostgreSQL. I also have to learn Selenium, because I believe it has its own applications (correct me if I am wrong). And here's my code and its result.
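The code from the post is not reproduced in this text. As a stand-in, here is a minimal sketch of the parsing step: Beautiful Soup turns the saved HTML into text, and two regular expressions pick out emails and phone numbers. Both patterns and the name `extract_contacts` are simplified assumptions of mine, not the patterns used in the post, and real phone formats will likely need a stricter pattern.

```python
import re

from bs4 import BeautifulSoup

# Simplified, assumed patterns: good enough for common cases, not exhaustive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_contacts(html: str) -> tuple[list[str], list[str]]:
    """Return (emails, phones) found in the visible text of an HTML document."""
    soup = BeautifulSoup(html, "html.parser")  # build the parse tree
    text = soup.get_text(" ")  # join text nodes with spaces so words don't fuse
    emails = sorted(set(EMAIL_RE.findall(text)))  # de-duplicate matches
    phones = sorted(set(PHONE_RE.findall(text)))
    return emails, phones
```

To run it on a saved page, read the file first, for example `extract_contacts(Path("page.html").read_text(encoding="utf-8"))` with `Path` imported from `pathlib` and the file name being a placeholder.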
