Datasaur
Automated Statistical Analysis Engine
A comprehensive survey data platform that automates complex statistical workflows, from raw data aggregation to advanced hypothesis testing and visualization.
My Role
Lead Architect & Creator
Stack
Python, Flask, MongoDB, Pandas, SciPy
Impact
Automated Stat-Testing • Multi-Format ETL • Self-Hosted Architecture

System Architecture Log
PROJECT LOG // ALGORITHMIC // STATISTICAL PROCESSING
The Engineering Story
Datasaur grew out of the need to bridge the gap between raw survey data and academic-grade statistical insight. The challenge was not just displaying data, but architecting a system that could run complex statistical computations on the fly.
Statistical Automation Pipeline
The core of the application is a processing engine built on Pandas and SciPy. I implemented automated workflows for non-parametric tests such as Kruskal-Wallis and Mann-Whitney U, so that the platform can suggest and run an appropriate statistical test based on the distribution of the data.
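A minimal sketch of how this kind of test selection could work, not Datasaur's actual code. It assumes a long-format DataFrame with one numeric column and one grouping column, and it uses a Shapiro-Wilk check as the normality criterion. The function name `choose_and_run_test`, the `alpha` threshold, and the parametric fallbacks (Welch's t-test, one-way ANOVA) are illustrative assumptions.

```python
import pandas as pd
from scipy import stats


def choose_and_run_test(df, value_col, group_col, alpha=0.05):
    """Hypothetical helper: pick a test from group count and normality."""
    # One array of observations per group, with missing values dropped.
    groups = [
        g[value_col].dropna().to_numpy()
        for _, g in df.groupby(group_col)
    ]
    if len(groups) < 2:
        raise ValueError("need at least two groups to compare")

    # Assumption: a group counts as normal only if Shapiro-Wilk fails to
    # reject normality. Shapiro-Wilk needs at least 3 observations.
    normal = all(
        len(g) >= 3 and stats.shapiro(g).pvalue > alpha for g in groups
    )

    if len(groups) == 2:
        if normal:
            name = "Welch t-test"
            res = stats.ttest_ind(groups[0], groups[1], equal_var=False)
        else:
            name = "Mann-Whitney U"
            res = stats.mannwhitneyu(
                groups[0], groups[1], alternative="two-sided"
            )
    else:
        if normal:
            name = "One-way ANOVA"
            res = stats.f_oneway(*groups)
        else:
            name = "Kruskal-Wallis"
            res = stats.kruskal(*groups)

    return {
        "test": name,
        "statistic": float(res.statistic),
        "p_value": float(res.pvalue),
    }
```

Calling `choose_and_run_test(df, "score", "cohort")` on a survey frame returns the name of the selected test with its statistic and p-value. Falling back to the non-parametric test whenever any group looks non-normal is a conservative design choice for this sketch.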
Data Visualization & Export
To translate these numbers into insights, I built a visualization layer that supports everything from standard histograms to box-and-whisker plots. Using XlsxWriter, I developed a custom export engine that lets users pull processed data directly into spreadsheets with pre-formatted statistical summaries.
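A rough sketch of a pre-formatted export along these lines, assuming Pandas drives XlsxWriter through `pd.ExcelWriter`. The function name `export_summary`, the sheet names, and the choice of `describe()` as the summary are hypothetical and may differ from what Datasaur actually produces.

```python
import io

import pandas as pd


def export_summary(df, value_col, group_col):
    """Hypothetical helper: build an .xlsx file in memory as bytes."""
    # Per-group descriptive statistics (count, mean, std, quartiles...).
    summary = df.groupby(group_col)[value_col].describe()

    buffer = io.BytesIO()
    with pd.ExcelWriter(buffer, engine="xlsxwriter") as writer:
        df.to_excel(writer, sheet_name="Data", index=False)
        summary.to_excel(writer, sheet_name="Summary")

        # Reach the underlying XlsxWriter objects to apply formatting.
        workbook = writer.book
        sheet = writer.sheets["Summary"]
        two_decimals = workbook.add_format({"num_format": "0.00"})

        # Widen the group-label column, then format the statistic columns.
        sheet.set_column(0, 0, 18)
        sheet.set_column(1, len(summary.columns), 12, two_decimals)
        # Keep the header row visible while scrolling.
        sheet.freeze_panes(1, 0)

    return buffer.getvalue()
```

Writing to an in-memory buffer means a Flask view could return the bytes as a file download without touching disk. That is one possible design, not necessarily the one used here.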
Infrastructure & Monolithic Integrity
The project follows a classic monolithic architecture, which proved efficient for keeping memory-intensive DataFrames close to the processing logic. Today, the application is self-hosted behind a Caddy reverse proxy, with data stored in MongoDB Atlas, which shows how durable and stable a well-architected Flask application can be.