ClusterBuster
ML web app that discovers, labels, and compares topic clusters inside any CSV/XLSX dataset — using BERTopic embeddings and GPT-4o-mini coherence scoring across 6 dimensions.
The Problem
Analysts working with thousands of text responses (survey data, feedback, research corpora) have no scalable way to discover what topics exist, how coherent the clusters are, or how filtering thresholds affect data quality — without an ML background.
The Constructor Tech Hackathon Fall 2025 challenged teams to build data analysis tools that make ML accessible to non-technical users.
The Solution
Upload a file, click Analyze. The pipeline runs BERTopic in both Loose and Strict modes in parallel, evaluates every cluster on 6 LLM-scored coherence dimensions, and renders a side-by-side comparison dashboard.
Key features:
- File Upload — Support for CSV/XLSX files up to 200MB with progress tracking
- 3-Stage ML Pipeline — Preprocessing → BERTopic Clustering → LLM Analysis runs automatically
- 6-Dimension Coherence Scoring — Overall, Semantic, Topical Focus, Lexical Cohesion, Informativeness, Outlier Presence
- Interactive Dashboard — Bar charts, box plots, histograms, scatter plots with dark mode toggle
Technical Implementation
The backend is built with Python and Flask, using BERTopic with sentence-transformer embeddings for topic discovery. GPT-4o-mini generates topic labels and coherence scores for each cluster.
The pipeline runs in two parallel modes: Loose filtering (more topics, lower quality threshold) and Strict filtering (fewer topics, higher quality). This lets analysts see exactly what they gain or lose by tightening quality filters.
The frontend features an interactive dashboard with multiple chart types built using Matplotlib and Seaborn, with a dark mode toggle that persists via localStorage.
Results & Impact
1st
Place at Constructor Tech
6
Coherence Dimensions
200MB
Max File Size
First place at Constructor Tech Hackathon Fall 2025. Built as part of a team under competitive time constraints. The judges praised the practical value for researchers and analysts without ML backgrounds.
Project Details
Timeline
48 hours (Hackathon)
Role
Full-Stack Developer
Team
Team Project
Event
Constructor Tech Hackathon Fall 2025
Tech Stack