Web Scraping Project

Project Description

This is a personal web scraping and data extraction project built to deepen my understanding of data collection, web automation, and analysis workflows. Using Selenium, BeautifulSoup, and Pandas, I scraped the IMDb Top 250 movies list directly from IMDb’s official website.

The script uses Selenium to render the page's dynamic content, parses the rendered HTML with BeautifulSoup to extract each movie's rank, title, release year, duration, and IMDb rating, and exports the results to a CSV file with Pandas. This project helped solidify my understanding of HTML parsing, dynamic website structure, and automating data workflows using Python.
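
The render, parse, and export steps described above could look roughly like the sketch below. It is not the actual contents of main.py. The CSS selectors, the function names, and the assumption that each title is shown as "1. Some Title" are all illustrative guesses; IMDb's markup changes over time, so check the selectors against the live page before relying on them.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

URL = "https://www.imdb.com/chart/top"

# Assumed selectors (hypothetical): verify them in the browser's dev tools.
ROW_SELECTOR = "li.ipc-metadata-list-summary-item"
TITLE_SELECTOR = "h3.ipc-title__text"
META_SELECTOR = "span.cli-title-metadata-item"
RATING_SELECTOR = "span.ipc-rating-star--rating"


def fetch_rendered_html(url: str = URL) -> str:
    """Load the page in headless Chrome and return the rendered HTML."""
    # Imported lazily so the parsing code can be used without a browser driver.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until at least one movie row has been rendered.
        WebDriverWait(driver, 15).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, ROW_SELECTOR))
        )
        return driver.page_source
    finally:
        driver.quit()


def parse_movies(html: str) -> list[dict]:
    """Extract one dict per movie row from the rendered HTML."""
    soup = BeautifulSoup(html, "html.parser")
    movies = []
    for row in soup.select(ROW_SELECTOR):
        title_tag = row.select_one(TITLE_SELECTOR)
        if title_tag is None:
            continue  # skip rows that do not match the assumed layout
        text = title_tag.get_text(strip=True)
        # Assumes the heading looks like "1. Some Title".
        match = re.match(r"(\d+)\.\s*(.+)", text)
        rank = int(match.group(1)) if match else None
        title = match.group(2) if match else text
        # Assumes the first two metadata items are year and duration.
        meta = [m.get_text(strip=True) for m in row.select(META_SELECTOR)]
        rating_tag = row.select_one(RATING_SELECTOR)
        movies.append({
            "Movie Rank": rank,
            "Title": title,
            "Year": meta[0] if len(meta) > 0 else None,
            "Duration": meta[1] if len(meta) > 1 else None,
            "IMDb Rating": rating_tag.get_text(strip=True) if rating_tag else None,
        })
    return movies


def save_csv(movies: list[dict], path: str = "imdb_top_250_movies.csv") -> None:
    """Write the extracted rows to a CSV file without the DataFrame index."""
    pd.DataFrame(movies).to_csv(path, index=False)
```

Under these assumptions, `save_csv(parse_movies(fetch_rendered_html()))` would run the whole pipeline. Keeping the parsing step separate from the browser step means it can be checked against a saved HTML file without launching Chrome.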

Why I Built This Project

I built this project to understand and practice web scraping using real-world, dynamic content. IMDb’s Top 250 movies list provided an ideal case study for working with complex HTML structures, dynamic page content, and structured data extraction.

This project was a hands-on opportunity to strengthen my skills in Selenium for web automation, BeautifulSoup for HTML parsing, and Pandas for data cleaning and export. By collecting data directly from a live website and organising it into a CSV, I gained practical experience in one of the most essential tasks for any data analyst: building datasets from raw online information.

Project Details

Detail	Description
Date	May 2025
Type	Personal Project
Tech Stack	Python (Selenium, BeautifulSoup, Pandas), Excel/CSV
Data Source	IMDb Top 250 Movies (https://www.imdb.com/chart/top)
Backend	Python script (main.py)
Frontend	N/A (script-based project)
Output	CSV file containing structured movie data
Fields Extracted	Movie Rank, Title, Year, Duration, IMDb Rating
Data Storage	Exported to imdb_top_250_movies.csv (in project folder)
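
The fields listed in the table usually arrive as strings after scraping, so some cleaning in Pandas before export is common. The sketch below is one possible way to do it, not the project's actual cleaning code. It assumes durations are formatted like "2h 22m", and the helper names and the extra `Duration (min)` column are hypothetical.

```python
import re

import pandas as pd


def duration_to_minutes(text):
    """Convert a string such as '2h 22m' to minutes (assumed 'Xh Ym' format)."""
    match = re.fullmatch(r"\s*(?:(\d+)h)?\s*(?:(\d+)m)?\s*", str(text))
    if not match or not any(match.groups()):
        return None  # unrecognised or empty value
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes


def clean_movies(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric Year and IMDb Rating, plus duration in minutes."""
    out = df.copy()
    # Unparseable values become missing values instead of raising an error.
    out["Year"] = pd.to_numeric(out["Year"], errors="coerce").astype("Int64")
    out["IMDb Rating"] = pd.to_numeric(out["IMDb Rating"], errors="coerce")
    # Hypothetical extra column; the original Duration text is left untouched.
    minutes = out["Duration"].map(duration_to_minutes)
    out["Duration (min)"] = pd.to_numeric(minutes, errors="coerce").astype("Int64")
    return out
```

A cleaned frame could then be written with `clean_movies(df).to_csv("imdb_top_250_movies.csv", index=False)`. Drop the extra column first if the CSV should contain only the five fields listed above.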
