Back to Work
Hackathon 1st Place — Constructor Tech 2025

ClusterBuster

ML web app that discovers, labels, and compares topic clusters inside any CSV/XLSX dataset — using BERTopic embeddings and GPT-4o-mini coherence scoring across 6 dimensions.

ClusterBuster

The Problem

Analysts working with thousands of text responses (survey data, feedback, research corpora) have no scalable way to discover what topics exist, how coherent the clusters are, or how filtering thresholds affect data quality — without an ML background.

The Constructor Tech Hackathon Fall 2025 challenged teams to build data analysis tools that make ML accessible to non-technical users.

The Solution

Upload a file, click Analyze. The pipeline runs BERTopic in both Loose and Strict modes in parallel, evaluates every cluster on 6 LLM-scored coherence dimensions, and renders a side-by-side comparison dashboard.

Key features:

  • File Upload — Support for CSV/XLSX files up to 200MB with progress tracking
  • 3-Stage ML Pipeline — Preprocessing → BERTopic Clustering → LLM Analysis runs automatically
  • 6-Dimension Coherence Scoring — Overall, Semantic, Topical Focus, Lexical Cohesion, Informativeness, Outlier Presence
  • Interactive Dashboard — Bar charts, box plots, histograms, scatter plots with dark mode toggle

Technical Implementation

The backend is built with Python and Flask, using BERTopic with sentence-transformer embeddings for topic discovery. GPT-4o-mini generates topic labels and coherence scores for each cluster.

The pipeline runs in two parallel modes: Loose filtering (more topics, lower quality threshold) and Strict filtering (fewer topics, higher quality). This lets analysts see exactly what they gain or lose by tightening quality filters.

The frontend features an interactive dashboard with multiple chart types built using Matplotlib and Seaborn, with a dark mode toggle that persists via localStorage.

Results & Impact

1st

Place at Constructor Tech

6

Coherence Dimensions

200MB

Max File Size

First place at Constructor Tech Hackathon Fall 2025. Built as part of a team under competitive time constraints. The judges praised the practical value for researchers and analysts without ML backgrounds.

Project Details

Timeline

48 hours (Hackathon)

Role

Full-Stack Developer

Team

Team Project

Event

Constructor Tech Hackathon Fall 2025


Tech Stack

Python Flask BERTopic GPT-4o-mini Matplotlib Seaborn Tailwind CSS