Student Club Explorer
Student Club Explorer is a Python-based tool that explores and extracts information about student clubs from the University of Toronto's club directory. It extracts club details, contact info, interests, and more, with support for keyword-based filtering.
Motivation
While working in an administrative role at the University of Toronto, my partner encountered a challenge: she needed to extract student club information, such as primary contact emails, origin campuses, and official website links, from the Student Organization Portal (SOP). She asked if there were any existing tools that could help with this task.
After understanding the requirements and evaluating the available options, it became clear that building a lightweight, customized script would be faster and more flexible than relying on a general-purpose tool. That's how Student Club Explorer came to life: a tool built to automate and streamline this data collection process.
Intuition
The SOP provides a directory of student organizations, where each club is listed with a clickable name linking to its individual profile page. While the homepage displays basic information like the club name and campus, key details, such as the primary email address or external website, are only available on each club's individual page.
Upon inspection, these profile pages follow a recognizable structure: for instance, the contact email consistently appears under a "Contact" heading and before the next section, such as "Address". These structural patterns make the site well suited to HTML parsing and automation, so a custom scraper is both practical and efficient for collecting structured data at scale.
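The "email sits between two headings" pattern can be illustrated with a minimal sketch. It uses only the standard library's `html.parser` so that it runs without installing anything, whereas the project itself lists `beautifulsoup4` for parsing. The function name `extract_contact_email`, the sample markup, and the email regex are all assumptions made for illustration; the real SOP pages may be structured differently.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the non-empty text chunks of an HTML document, in order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


# Deliberately simple email pattern; an assumption, not a full validator.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_contact_email(html, start="Contact", end="Address"):
    """Return the first email found after `start` and before `end`, else None."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    chunks = parser.chunks

    try:
        begin = chunks.index(start) + 1
    except ValueError:
        return None  # no "Contact" heading on this page

    try:
        stop = chunks.index(end, begin)
    except ValueError:
        stop = len(chunks)  # no closing heading: scan to the end

    for chunk in chunks[begin:stop]:
        match = EMAIL_RE.search(chunk)
        if match:
            return match.group(0)
    return None


# Made-up markup that mimics the described layout.
SAMPLE = """
<html><body>
<h2>Contact</h2>
<p>Primary contact: <a href="mailto:chess.club@example.org">chess.club@example.org</a></p>
<h2>Address</h2>
<p>other@example.org</p>
</body></html>
"""

print(extract_contact_email(SAMPLE))
```

Bounding the search by the two headings keeps addresses that appear elsewhere on the page, such as the one under "Address" in the sample, from being mistaken for the primary contact.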
Features
- Scrapes club names, campuses, websites, emails, and interest tags
- Follows each club's individual profile page for deeper details
- Filters clubs by interest keywords
- Exports clean, UTF-8 encoded CSV files
- Displays a real-time progress bar using `tqdm`
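As a rough illustration of the keyword filtering and CSV export features, the sketch below filters a list of club records by interest tag and writes the matches as CSV. The record layout, the column names in `FIELDNAMES`, and the helper names `matches_keywords` and `write_clubs_csv` are hypothetical, and the sample clubs are made up; the actual tool may organize its data differently.

```python
import csv
import io

# Hypothetical column layout; the real tool's CSV columns may differ.
FIELDNAMES = ["name", "campus", "website", "email", "interests"]


def matches_keywords(club, keywords):
    """Return True if any keyword occurs, case-insensitively, in any interest tag."""
    tags = [tag.lower() for tag in club.get("interests", [])]
    return any(keyword.lower() in tag for keyword in keywords for tag in tags)


def write_clubs_csv(clubs, stream):
    """Write club records to an open text stream as CSV with a header row."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    for club in clubs:
        row = {field: club.get(field, "") for field in FIELDNAMES}
        # Flatten the tag list into a single cell.
        row["interests"] = "; ".join(club.get("interests", []))
        writer.writerow(row)


# Made-up sample records.
clubs = [
    {"name": "Chess Club", "campus": "St. George", "website": "https://example.org/chess",
     "email": "chess@example.org", "interests": ["Games", "Strategy"]},
    {"name": "Café Français", "campus": "Mississauga", "website": "",
     "email": "cafe@example.org", "interests": ["Languages", "Culture"]},
]

selected = [club for club in clubs if matches_keywords(club, ["language"])]

# An in-memory buffer keeps the example free of side effects.
buffer = io.StringIO(newline="")
write_clubs_csv(selected, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, the same function can be given a file opened with `open(path, "w", newline="", encoding="utf-8")`, which keeps accented club names intact and avoids blank lines between rows on Windows.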
Technologies
- Python 3.11
- `requests` for HTTP requests
- `beautifulsoup4` for HTML parsing
- `tqdm` for the progress bar
- `csv` for output formatting
To see exactly how these libraries are used, visit the source code on GitHub.