Glass Box Spreadsheet
The world is full of data. Generative AI is good at using it, but it is not magic. The real leverage comes from structuring data in the middle, between raw sources and AI analysis. This system is designed to make every step of that structuring visible, reversible, and collaborative.
The Big Picture
A Full Data Pipeline, End to End
Find & Ingest
Process & Aggregate
Analyze & Maintain
Each stage builds on the last — and every decision along the way is explainable, adjustable, and reversible. No black boxes. No surprises.
Core Principle
No magic. Visually explain every decision and step in every process.
Stage 1
Find Data
Users can upload unstructured or semi-structured data — PDFs, CSVs, emails, photos. An on-machine agent watches user activity and surfaces data opportunities like bank accounts or chat history.
Find Data: Discovery Controls
Discovery and ingestion are intentionally separated. Users control exactly what flows in — and what never does.
Flexible Ingestion
Upload is valuable on its own; the local agent is optional but powerful.
Local Approval Gate
Optional pre-upload approval step gives users full control before anything leaves the device.
Per-Source Rules
Purchases: auto-ingest. Work chats: manual approval. Sensitive content: never ingest. Users define the rules.
Discovery Feed
Surface wild-card data ideas for users to accept or reject — like a suggestion inbox for their data universe.
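As a minimal sketch of how the per-source rules above might be represented, the snippet below maps a source category to one of the three actions named on this slide. The category keys, the `Action` enum, and the `decide` helper are hypothetical names for illustration, and the fallback to manual approval for unknown sources is an assumption, not something the design states.

```python
from enum import Enum


class Action(Enum):
    AUTO_INGEST = "auto-ingest"
    MANUAL_APPROVAL = "manual approval"
    NEVER_INGEST = "never ingest"


# Hypothetical rule table mirroring the three examples on the slide.
RULES = {
    "purchases": Action.AUTO_INGEST,
    "work_chats": Action.MANUAL_APPROVAL,
    "sensitive": Action.NEVER_INGEST,
}


def decide(category: str, rules: dict = RULES) -> Action:
    """Return the action for a source category.

    Assumption: categories without a rule fall back to manual approval,
    so nothing is ingested without the user seeing it first.
    """
    return rules.get(category, Action.MANUAL_APPROVAL)


print(decide("purchases").value)
print(decide("photos").value)
```

Because the rules are plain data rather than logic buried in the agent, they can be listed, edited, and turned off in a UI.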
Stage 2
Ingest Data
Raw data gets shaped into structured tables. The system automatically groups related data, identifies what doesn't fit, designs a schema, and imports everything cleanly.
Group into tables
Related records are clustered together intelligently.
Identify outliers
What doesn't fit the pattern gets flagged immediately.
Design schema
Column types, names, and relationships are proposed automatically.
Import into schema
Data is loaded with full traceability back to source.
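One way to picture the four ingestion steps is the rough sketch below: it proposes a type for each column by majority vote, flags rows that do not fit, and attaches the source file and line number to every row. The function names, the `_provenance` key, and the majority-vote heuristic are all assumptions made for illustration; a real system would likely use richer type inference.

```python
from collections import Counter


def guess_type(value: str) -> str:
    """Classify one raw cell as 'int', 'float', or 'text'."""
    for name, cast in (("int", int), ("float", float)):
        try:
            cast(value)
            return name
        except ValueError:
            pass
    return "text"


def propose_schema(rows: list) -> dict:
    """Propose a type per column: the most common guessed type wins."""
    return {
        col: Counter(guess_type(r[col]) for r in rows).most_common(1)[0][0]
        for col in rows[0]
    }


def fits(value: str, expected: str) -> bool:
    """A cell fits if it matches the column type (ints also fit float columns)."""
    actual = guess_type(value)
    return expected == "text" or actual == expected or (expected == "float" and actual == "int")


def import_rows(rows: list, schema: dict, source: str):
    """Split rows into imported records and flagged outliers, keeping provenance."""
    imported, outliers = [], []
    for line_no, row in enumerate(rows, start=1):
        record = {**row, "_provenance": {"source": source, "line": line_no}}
        ok = all(fits(row[col], typ) for col, typ in schema.items())
        (imported if ok else outliers).append(record)
    return imported, outliers


raw = [
    {"date": "2024-01-03", "amount": "12.50"},
    {"date": "2024-01-04", "amount": "7.25"},
    {"date": "2024-01-05", "amount": "n/a"},
]
schema = propose_schema(raw)
imported, outliers = import_rows(raw, schema, source="bank.csv")
print(schema)
print(outliers)
```

The third row lands in `outliers` with its provenance intact, so it can be shown to the user for review instead of being silently coerced or dropped.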
Stage 3
Review Data
The system shows its work. Every ingestion choice is explained with sample data, so users can see exactly what happened — and change it.
Explain Choices
Show why each field was mapped the way it was — no silent assumptions.
Sample Data Flow
Concrete examples trace each row through the ingestion pipeline.
Confidence Scores
Low-confidence signals (e.g., OCR output or loosely structured data) are surfaced explicitly.
User Override
Change the schema, remap columns, or reject the proposed structure entirely.
Stage 4
Derive Data
Add computed columns using spreadsheet formulas or AI extraction. Every derivation step is fully introspectable — sample data shown at each stage, outliers highlighted, enrichment suggested.
  • Intermediate step visibility — see values at every point in a formula chain
  • AI enrichment suggestions — previewed before committing
  • Correction zoom — fix one row, then decide if the fix should apply broadly
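Intermediate step visibility can be illustrated with a small sketch: a derivation is written as a chain of named steps, and the value after each step is recorded so it can be displayed. The `run_chain` helper and the step names are hypothetical; this is not a description of how the product evaluates formulas.

```python
def run_chain(value, steps):
    """Apply named steps in order and record the value after each one."""
    trace = []
    for name, fn in steps:
        value = fn(value)
        trace.append((name, value))
    return value, trace


# Illustrative chain: turn a displayed price into an integer number of cents.
steps = [
    ("strip currency symbol", lambda s: s.replace("$", "")),
    ("remove thousands separators", lambda s: s.replace(",", "")),
    ("parse as number", float),
    ("convert to cents", lambda x: round(x * 100)),
]

result, trace = run_chain("$1,234.50", steps)
for name, value in trace:
    print(f"{name}: {value!r}")
```

Running the same chain on a handful of sample rows, outliers included, gives the per-step preview described above.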
Stage 5
Associate Data
The system finds natural links between tables and shows matched records side by side — so you can see exactly how the join is made. AI describes what the relationship means and how it might be useful.
Stage 5 Detail
Fuzzy Matching — Expressed as Rules
Matching isn't always exact. When the system detects possible duplicates or near-matches, it surfaces a human-readable rule for approval.

"I think Apple and Apple Inc. are the same entity. Should I always ignore ' Inc.' suffixes when matching? You can apply this globally, refine it, or dismiss it."
Rules are visible, editable, and can be disabled, so matching logic is never hidden inside the engine.
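The "Apple" versus "Apple Inc." example could be expressed as an explicit rule object along these lines. This is a sketch under assumptions: `SuffixRule`, `normalize`, and `same_entity` are hypothetical names, and real fuzzy matching would need more than suffix stripping and case folding.

```python
from dataclasses import dataclass


@dataclass
class SuffixRule:
    """A matching rule the user can read, edit, or disable."""
    suffix: str
    enabled: bool = True

    def describe(self) -> str:
        return f"Ignore the suffix {self.suffix!r} when matching names"

    def apply(self, name: str) -> str:
        if self.enabled and name.endswith(self.suffix):
            return name[: -len(self.suffix)]
        return name


def normalize(name: str, rules: list) -> str:
    """Run every rule over the name, then compare case-insensitively."""
    for rule in rules:
        name = rule.apply(name)
    return name.strip().casefold()


def same_entity(a: str, b: str, rules: list) -> bool:
    return normalize(a, rules) == normalize(b, rules)


rule = SuffixRule(" Inc.")
print(rule.describe())
print(same_entity("Apple", "Apple Inc.", [rule]))

rule.enabled = False  # the user dismisses the rule
print(same_entity("Apple", "Apple Inc.", [rule]))
```

Since the rule carries its own plain-language description and an on/off switch, the matching logic stays inspectable instead of living inside the engine.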
Stage 6
Aggregate Data
Stack filters, sorts, and aggregations — then see intermediate results at every layer. Write-through support means bulk edits propagate to source rows where possible.
1. Add Aggregations
SUM, COUNT, and GROUP BY, each described in plain language by AI.
2. Stack Operations
Layer filters and sorts on top of each other, with intermediate snapshots.
3. Show Intermediate Results
See what the data looks like after each operation, not just the final output.
4. Write-Through Edits
Change all rows where X=Y to X=Z directly from the aggregation view.
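The stacked operations and the write-through edit can be sketched roughly as follows. Every name here (`run_stack`, `write_through`, the sample rows) is hypothetical, and the sketch assumes each source row has a stable `id` that the aggregation view can trace back to.

```python
from collections import defaultdict

rows = [
    {"id": 1, "category": "food", "amount": 12.0},
    {"id": 2, "category": "food", "amount": 8.0},
    {"id": 3, "category": "travel", "amount": 100.0},
    {"id": 4, "category": "travel", "amount": -5.0},
]


def keep_positive(data):
    # Copy each row so snapshots are not affected by later source edits.
    return [dict(r) for r in data if r["amount"] > 0]


def sum_by_category(data):
    totals = defaultdict(float)
    for r in data:
        totals[r["category"]] += r["amount"]
    return [{"category": c, "total": t} for c, t in totals.items()]


def sort_by_total(data):
    return sorted(data, key=lambda r: r["total"], reverse=True)


def run_stack(data, operations):
    """Apply operations in order, keeping a snapshot after each layer."""
    snapshots = []
    for op in operations:
        data = op(data)
        snapshots.append((op.__name__, data))
    return snapshots


def write_through(source, column, old, new):
    """Set column to `new` on every source row where it equals `old`."""
    changed = []
    for r in source:
        if r[column] == old:
            r[column] = new
            changed.append(r["id"])
    return changed


snapshots = run_stack(rows, [keep_positive, sum_by_category, sort_by_total])
for name, data in snapshots:
    print(name, data)

changed = write_through(rows, "category", "food", "groceries")
print("source rows changed:", changed)
```

After a write-through edit, re-running the stack on the updated source rows would refresh every layer, while the earlier snapshots remain available for a before-and-after comparison.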
Stage 7
Analyze Data
Ask anything in plain text. The system doesn't just return answers — it shows exactly how it got there: the SQL, the logic, the intermediate data, and the example rows at every step.
  • Interesting patterns, correlations, and trends surfaced proactively
  • SQL and code API interfaces for power users
  • AI works through SQL, code, and direct data access, and every path is explainable
  • Every answer is fully traceable to source data
End-to-End Data Lineage
Every top-level insight is traceable all the way back to the raw source — through every transformation, rule, and aggregation step. No answer exists without a full audit trail.
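To make "an answer with its audit trail" concrete, the sketch below uses Python's built-in `sqlite3` module to return not just a total but also the SQL that produced it and the source rows behind it. The `purchases` table, its columns, and the `answer_with_lineage` function are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (id INTEGER PRIMARY KEY, merchant TEXT, amount REAL, source TEXT)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        (1, "Cafe", 4.5, "receipt_01.pdf"),
        (2, "Cafe", 5.5, "receipt_02.pdf"),
        (3, "Bookshop", 20.0, "bank.csv"),
    ],
)


def answer_with_lineage(conn, merchant):
    """Return the answer together with the query and the rows it came from."""
    sql = "SELECT SUM(amount) FROM purchases WHERE merchant = ?"
    total = conn.execute(sql, (merchant,)).fetchone()[0]
    evidence = conn.execute(
        "SELECT id, amount, source FROM purchases WHERE merchant = ? ORDER BY id",
        (merchant,),
    ).fetchall()
    return {"sql": sql, "params": (merchant,), "answer": total, "source_rows": evidence}


result = answer_with_lineage(conn, "Cafe")
print(result["answer"])
print(result["sql"])
for row in result["source_rows"]:
    print(row)
```

Each evidence row names the file it was ingested from, which is the hook a full lineage view would follow back through earlier transformations.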
Stage 8
Maintain Data
Data goes stale. The system watches for it and warns you — and freshness propagates across joins and aggregations so you always know how current your analysis really is.
  • Freshness cutoff indicators on every table and aggregation
  • Heartbeat warnings when a source hasn't updated recently
  • Join freshness alignment: aggregates report the latest date that every joined source covers, which is the earliest of the individual source cutoffs
  • Suggested update paths — e.g., local agent to fetch new transaction data automatically
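Freshness propagation across a join can be sketched in a few lines, reading the shared cutoff as the latest date that all joined sources cover, which is the earliest of their individual cutoffs. That reading, the function names, and the 30-day heartbeat threshold are assumptions for illustration.

```python
from datetime import date, timedelta


def joined_cutoff(cutoffs: dict) -> date:
    """A join is only as fresh as its stalest input."""
    return min(cutoffs.values())


def stale_sources(cutoffs: dict, today: date, max_age: timedelta) -> list:
    """Names of sources whose last update is older than max_age."""
    return sorted(name for name, cutoff in cutoffs.items() if today - cutoff > max_age)


# Hypothetical per-table freshness cutoffs.
cutoffs = {
    "bank": date(2024, 3, 31),
    "receipts": date(2024, 3, 10),
    "calendar": date(2024, 2, 1),
}

print(joined_cutoff(cutoffs))
print(stale_sources(cutoffs, today=date(2024, 4, 1), max_age=timedelta(days=30)))
```

An aggregate built on all three tables would be labeled as current only through the `calendar` cutoff, and the heartbeat check would name `calendar` as the source to refresh.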
Core Principle
Always Safe to Change
New tables are cheap. History is always kept. No change is ever permanent. Rewind to any point, fork any operation, or point-revert a single step — without fear.
Reversibility & Safety — By Design
Rewind to Any Point
Change a decision mid-pipeline — e.g., "don't group these two things together" — and replay from there.
Destructive Changes Fork
Large or risky changes create a new table rather than overwriting. Iteration is cheap; regret is not.
No Irreversible Changes
Every change is fully visible, with point reversion and multi-level rewind built in from day one.
Before & After Always
Every suggestion or applied change shows a diff with sample data and outlier highlights. No surprises.
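A minimal sketch of the rewind and fork behavior, assuming an append-only version log over immutable table states: the `History` class and its method names are hypothetical. Rewinding appends the earlier state as a new version rather than deleting anything, so the rewind itself can be undone.

```python
class History:
    """Append-only log of (label, state) versions; states are immutable tuples."""

    def __init__(self, initial):
        self._versions = [("initial", initial)]

    @property
    def current(self):
        return self._versions[-1][1]

    def apply(self, label, fn):
        """Record the result of fn as a new version; older versions are kept."""
        new_state = fn(self.current)
        self._versions.append((label, new_state))
        return new_state

    def rewind(self, index):
        """Make an earlier version current again without erasing later ones."""
        label, state = self._versions[index]
        self._versions.append((f"rewind to {label}", state))
        return state

    def fork(self, index):
        """Start an independent history from an earlier version."""
        return History(self._versions[index][1])

    def labels(self):
        return [label for label, _ in self._versions]


h = History(("a", "b", "c"))
h.apply("drop b", lambda t: tuple(x for x in t if x != "b"))
h.apply("uppercase", lambda t: tuple(x.upper() for x in t))
h.rewind(1)
branch = h.fork(0)

print(h.current)
print(h.labels())
print(branch.current)
```

Showing a before-and-after diff then comes down to comparing two entries in the log.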
Transparency — The Full Principle Set
No Magic
Every decision is explained visually with sample data at every step.
Provide Options
When two reasonable approaches exist, run both and let the user choose.
Dual Interfaces
Offer both a structured UI and free-text input wherever possible, so power users and beginners are both served.
Suggestions Come With "Why"
Every AI suggestion explains the value the user gets by accepting it.
AI & User — A True Partnership
The system is designed for collaborative intelligence — not AI replacing user judgment, but AI accelerating it.
AI Writes, User Iterates
AI produces SQL, Python, regex, and spreadsheet formulas. Users refine them. AI helps with the next iteration.
Pattern Memory
The system remembers user choices and suggests similar actions on new data and schemas.
Learn and Warn
If source data deviates from past patterns — new columns, different formats, outlier values — the system flags it before it causes downstream errors.
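The "learn and warn" check can be sketched for the simplest case, a change in columns between what was seen before and a new batch. The `detect_drift` function is a hypothetical name, and the sketch covers only added or missing columns; format changes and outlier values would need their own checks.

```python
def detect_drift(expected_columns, new_rows):
    """Compare the columns in a new batch with those seen in past imports."""
    seen = set()
    for row in new_rows:
        seen.update(row)
    expected = set(expected_columns)
    warnings = [f"new column: {col}" for col in sorted(seen - expected)]
    warnings += [f"missing column: {col}" for col in sorted(expected - seen)]
    return warnings


# Hypothetical batch where the source has started sending a currency field
# and stopped sending the merchant field.
batch = [
    {"date": "2024-04-01", "amount": "9.99", "currency": "EUR"},
    {"date": "2024-04-02", "amount": "3.50", "currency": "EUR"},
]

for warning in detect_drift(["date", "amount", "merchant"], batch):
    print(warning)
```

Raising these warnings before import gives the user a chance to adjust the schema or rules before downstream tables are affected.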
Learnings Become Rules
Every correction a user makes is an opportunity for the system to generalize. Learnings are surfaced explicitly — not buried in model weights.

Each rule is visible to the user, shown with sample data, and can be disabled or refined at any time. Rules are never silently applied.
The Glass Box Promise
Every step explained. Every decision reversible. Every path traceable. Structured data in the middle unlocks everything AI can do — and this system makes that structure safe, visible, and collaborative.
Transparent
No black boxes — ever.
Reversible
Nothing is permanent.
Collaborative
AI + user, together.
Traceable
End-to-end lineage always.