
Datasaur

Automated Statistical Analysis Engine

A comprehensive survey data platform that automates complex statistical workflows, from raw data aggregation to advanced hypothesis testing and visualization.

My Role

Lead Architect & Creator

Stack

Python, Flask, MongoDB, Pandas, SciPy

Impact

Automated Stat-Testing • Multi-Format ETL • Self-Hosted Architecture


System Architecture Log

Legend: Traffic Flow · Service Node

```mermaid
graph LR
    subgraph Client_Layer [User Interface]
        A[Vanilla JS / Browser]:::traffic
    end
    subgraph Server_Layer [Application Logic]
        B[Caddy Reverse Proxy]:::node
        C[Flask / Python Monolith]:::node
    end
    subgraph Processing_Engine [Data Science Core]
        D[Pandas ETL]:::node
        E[SciPy / Pingouin Stats]:::node
        F[XlsxWriter Export]:::node
    end
    subgraph Storage [Data Persistence]
        G[MongoDB Atlas]:::node
    end

    A <-->|HTTPS| B
    B <-->|WSGI| C
    C <-->|Query/Write| G
    C ==>|Dataframes| D
    D --> E
    D --> F

    %% Styles %%
    classDef traffic fill:#2563eb,stroke:#3b82f6,color:#fff
    classDef node fill:#16a34a,stroke:#22c55e,color:#fff
```

PROJECT LOG // ALGORITHMIC // STATISTICAL PROCESSING

The Engineering Story

Datasaur was born out of the need to bridge the gap between raw survey data and academic-grade statistical insight. The challenge wasn't just displaying data, but architecting a system capable of performing complex mathematical computations on the fly.

Statistical Automation Pipeline

The core of the application is a robust processing engine built on Pandas and SciPy. I implemented automated workflows for non-parametric tests like Kruskal-Wallis and Mann-Whitney U, ensuring that the platform could intelligently suggest and execute the correct statistical test based on the data distribution.
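As a rough sketch of that selection logic (the function name and decision rule here are illustrative, not Datasaur's actual code), the engine can branch on the number of groups: two independent samples route to Mann-Whitney U, three or more to Kruskal-Wallis:

```python
import numpy as np
from scipy import stats

def suggest_and_run_test(groups, alpha=0.05):
    """Pick and run a non-parametric test for independent samples.

    `groups` is a list of 1-D numeric arrays, one per survey segment.
    Hypothetical helper; the production selection logic also weighs
    distribution shape before committing to a test.
    """
    if len(groups) == 2:
        # Two independent samples -> Mann-Whitney U
        stat, p = stats.mannwhitneyu(*groups, alternative="two-sided")
        name = "Mann-Whitney U"
    else:
        # Three or more samples -> Kruskal-Wallis H
        stat, p = stats.kruskal(*groups)
        name = "Kruskal-Wallis"
    return {"test": name, "statistic": stat,
            "p_value": p, "significant": p < alpha}

rng = np.random.default_rng(0)
a, b, c = (rng.normal(loc, 1.0, 50) for loc in (0.0, 0.1, 2.0))
print(suggest_and_run_test([a, b, c])["test"])  # prints "Kruskal-Wallis"
```

Returning the test name alongside the p-value keeps the API honest: the caller always knows which procedure produced the result.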

Data Visualization & Export

To translate these numbers into insights, I built a visualization layer supporting everything from standard histograms to box-and-whisker plots. Using XlsxWriter, I developed a custom export engine that lets users pull processed data directly into professional-grade spreadsheets with pre-formatted statistical summaries.
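A minimal version of that export path can be sketched with pandas' `ExcelWriter` on the `xlsxwriter` engine (the helper name, sheet names, and formatting choices below are illustrative assumptions, not the production code):

```python
import pandas as pd

def export_with_summary(df: pd.DataFrame, path: str) -> None:
    """Write raw data plus a pre-formatted statistical summary sheet.

    Hypothetical helper: one sheet for the raw rows, one for
    df.describe() with bold headers and fixed-precision numbers.
    """
    summary = df.describe().transpose()
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        df.to_excel(writer, sheet_name="Data", index=False)
        summary.to_excel(writer, sheet_name="Summary")
        book = writer.book
        header_fmt = book.add_format({"bold": True, "bg_color": "#DDEBF7"})
        num_fmt = book.add_format({"num_format": "0.000"})
        ws = writer.sheets["Summary"]
        ws.set_row(0, None, header_fmt)  # bold the header row
        # Widen the numeric columns and apply 3-decimal formatting
        ws.set_column(1, len(summary.columns), 12, num_fmt)

df = pd.DataFrame({"score": [3.2, 4.1, 2.8, 5.0], "group": list("AABB")})
export_with_summary(df, "report.xlsx")
```

Doing the formatting server-side means the downloaded workbook is presentation-ready, with no manual cleanup in Excel.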

Infrastructure & Monolithic Integrity

The project follows a classic monolithic architecture, which proved highly efficient for keeping memory-intensive dataframes close to the processing logic. Today, the platform is self-hosted using a Caddy reverse proxy and MongoDB Atlas, demonstrating the longevity and stability of a well-architected Flask ecosystem.