Sanskar

Keen Learner and Exp... • 4m

Day 27 of learning Python as a beginner. Topic: web scraping using Beautiful Soup.

A few days ago I was introduced to the requests library in Python, which can fetch the HTML of web pages. At the time I was confused about its real-life applications, and that's when many amazing people told me that most of its applications are in web scraping (something I wasn't aware of then). Web scraping is essentially downloading data from websites (in HTML format) and then parsing it to extract useful information. There are mainly two libraries used for web scraping: 1. Beautiful Soup and 2. Selenium. Some say Scrapy is also good for this purpose. I focused on Beautiful Soup and successfully scraped data from a real estate website.

First I used requests and file I/O to save the HTML. Many people say there's no need for this, but I think you should save the data first, both to guard against unexpected errors from the website and to avoid scraping again when you want to extract more information from the same data.

At first the website refused my requests (access forbidden), so I added a 2-second time delay, because sending too many requests to a server in a short time is a common sign of scraping. Then I used fake-useragent to generate a realistic User-Agent and modified the browser headers so that the requests would look more legitimate.

Once all the HTML was saved in a file, I used Beautiful Soup to parse it (Beautiful Soup converts raw HTML into a structured parse tree). My goal was to extract the email address and phone number (which I've hidden, obviously), and for this I used regular expressions (regex; I finally have some understanding of them), because they let me write patterns that match exactly the text I need. I created the email pattern myself, but took AI's help to design the phone number pattern (it was a bit challenging for me).
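A minimal sketch of the fetch, save and parse steps described above, assuming the `requests` and `beautifulsoup4` packages are installed. The function names `fetch_html` and `page_text`, the header values and the cache file are hypothetical choices for illustration; this is not the author's code, which isn't included in the text.

```python
import time
from pathlib import Path

import requests
from bs4 import BeautifulSoup

# Browser-like request headers. This User-Agent string is an illustrative
# placeholder; the fake-useragent package (UserAgent().random) can generate one.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}


def fetch_html(url: str, cache_file: Path, delay: float = 2.0) -> str:
    """Return the page's HTML, downloading it only when no saved copy exists."""
    if cache_file.exists():
        # Re-use the saved copy, so re-running the parser sends no new request.
        return cache_file.read_text(encoding="utf-8")

    time.sleep(delay)  # wait before each live request to keep the request rate low
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx, e.g. 403
    cache_file.write_text(response.text, encoding="utf-8")
    return response.text


def page_text(html: str) -> str:
    """Parse the HTML into a tree and return its visible text."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()  # drop non-visible content before extracting text
    return soup.get_text(" ", strip=True)
```

For example, `html = fetch_html("https://example.com/listings", Path("listings.html"))` followed by `text = page_text(html)`; the URL and file name are placeholders. Because the cache check comes first, later runs that only change the parsing logic never contact the server.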
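The regex step could look roughly like the following sketch. Both patterns are simplified assumptions, not the ones used in the post: the email pattern covers common addresses but not everything the email standards allow, and the phone filter keeps digit runs (with optional `+`, spaces, hyphens, dots or parentheses) that contain 10 to 13 digits. The input text would typically be the output of Beautiful Soup's `get_text()`.

```python
import re

# Simplified email pattern (an assumption, not a full RFC 5322 validator):
# local part, "@", one or more domain labels, then a 2+ letter top-level domain.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}")

# Phone candidates: an optional "+" or "(", then a digit, then digits, spaces,
# parentheses, dots or hyphens, ending in a digit. Length is checked afterwards.
PHONE_CANDIDATE_RE = re.compile(r"[+(]?\d[\d ().-]{7,}\d")


def extract_emails(text: str) -> list[str]:
    """Return unique email addresses in order of first appearance."""
    return list(dict.fromkeys(EMAIL_RE.findall(text)))


def extract_phones(text: str, min_digits: int = 10, max_digits: int = 13) -> list[str]:
    """Return unique phone-like strings whose digit count is in range.

    The 10-13 digit range is an assumed rule of thumb (a 10-digit national
    number plus an optional country code); adjust it for the target site.
    """
    phones = []
    for match in PHONE_CANDIDATE_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())  # keep digits only
        if min_digits <= len(digits) <= max_digits:
            phones.append(match.group())
    return list(dict.fromkeys(phones))
```

Splitting the phone logic into a loose pattern plus a digit-count check keeps the regex readable: `extract_phones("Call +91 98765 43210")` returns `["+91 98765 43210"]`, while a date such as `2024-01-15` matches the loose pattern but is rejected because it has only 8 digits.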
I have done all this on a single website, and in the future I plan to do it in bulk (I may need proxies for that to avoid an IP ban) and then store all that data in a PostgreSQL database. I also need to learn Selenium, because I believe it has its own use cases (correct me if I am wrong). And here's my code and its result.
