Keen Learner and Exp... • 1d
Day 27 of learning Python as a beginner. Topic: web scraping using Beautiful Soup.
A few days ago I got introduced to the requests library in Python, which can fetch the HTML of websites. At that time I was confused about what its real-life applications might be, and that's when many amazing people guided me that most of its applications are in web scraping (something I wasn't aware of then). Web scraping is essentially fetching data from websites (in HTML format) and then parsing it to extract useful information. There are mainly two libraries used for web scraping, 1. Beautiful Soup and 2. Selenium, and some say Scrapy is also good for this purpose.
I focused on Beautiful Soup and was successful in scraping data from a real estate website. First I used requests and file I/O to save the HTML data (many people say there's no need for this, but I think one should save the data first, to guard against unexpected errors from the website and to avoid repeat scraping when you want to extract more information from the same data). At first the website was forbidding me from scraping its HTML, so I added a time delay of 2 seconds, because sending too many requests to the server in a short time is a common signal of scraping. Then I used fake-useragent to create a realistic user agent and adjusted the browser headers so that the request seems more legitimate.
Once I had all the HTML saved in a file, I used Beautiful Soup to parse it (Beautiful Soup converts raw HTML into a structured parse tree). My goal was to extract the email and phone number (which I hid, obviously) from the website, and for this I used regular expressions (regex, which I finally got some understanding of), because they let me create patterns that identify the text I need. I created the email pattern myself but took AI's help to design the phone number pattern (it was a bit challenging for me).
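
To make the save-first and regex steps concrete, here is a minimal sketch. It is not the code from this post. The names `load_or_fetch` and `extract_contacts` and both patterns are hypothetical, made up for illustration, and the phone pattern in particular is only a loose guess at what a phone number looks like. The `fetch` argument stands in for whatever actually downloads the page (for example a function that calls requests with custom headers), and the text passed to `extract_contacts` could be the raw HTML or the output of Beautiful Soup's `get_text()`.

```python
import re
import time
from pathlib import Path
from typing import Callable, List, Tuple

# Hypothetical patterns: the post does not show the ones actually used.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Loose assumption: an optional "+", then at least 10 characters made of
# digits, spaces, parentheses or dashes, starting and ending with a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s()-]{8,}\d")


def load_or_fetch(cache_path: Path, fetch: Callable[[], str],
                  delay_seconds: float = 2.0) -> str:
    """Return the saved HTML if it exists; otherwise wait, fetch, and save it."""
    if cache_path.exists():
        # Reuse the saved copy so the website is not hit again.
        return cache_path.read_text(encoding="utf-8")
    time.sleep(delay_seconds)  # polite pause before sending a request
    html = fetch()
    cache_path.write_text(html, encoding="utf-8")
    return html


def extract_contacts(text: str) -> Tuple[List[str], List[str]]:
    """Return (emails, phones) found in the text, de-duplicated and sorted."""
    emails = sorted(set(EMAIL_RE.findall(text)))
    phones = sorted(set(PHONE_RE.findall(text)))
    return emails, phones
```

Calling `load_or_fetch(Path("page.html"), my_fetch)` twice would only run `my_fetch` the first time (`my_fetch` being your own download function), which is the point of saving the HTML before parsing it. A loose phone pattern like this one can also match other long digit runs, so the results are worth checking by eye.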
I have done all this on a single website, and in future I plan to do it in bulk (I may need proxies for that to avoid an IP ban) and then enter all that data in a database using PostgreSQL. I also have to learn Selenium because I believe it may have its own uses (correct me if I am wrong). And here's my code and its result.
Keen Learner and Exp... • 15h
Day 28 of learning Python as a beginner. Topic: web scraping with a PostgreSQL database. When I posted my first web scraping project I just had the result on the console; however, I wanted it to be stored somewhere it can be reviewed later, that's whe
Building in 🥷🏻• Pr... • 1y
As there are irregularities in the NEET 2024 exam results, in order to investigate this, Harkirat Bhaiya shared his one-hour session on scraping data from the NEET result website and building a website where users can enter their roll number, application n
Hey I am on Medial • 10m
Finding it hard to design a basic website even after learning HTML, CSS, and JavaScript. Even if the layout is pre-planned, it's hard to put it into the website as per my need. Also, we can spend our whole time learning HTML, CSS, and JavaScript in depth. Any su