DataZoom

Live project memory


Last updated 5/3/2026

DataZoom — Live Project Memory

Document ID: TECH-MEM-001
Version: Current
Last Synthesized From: Repository snapshot + commit log through 960f88c
Project Codename: Dataroom / DataZoom (midwestco/datazoom)


1. Project Identity & Current Phase

DataZoom is an enterprise AI-powered document analysis platform built for legal and business due diligence workflows. The product enables natural language querying over uploaded documents through a RAG pipeline. That pipeline uses Modal-hosted inference in production and also supports a fully local (Ollama) path for self-hosted customers. Multi-tenant organization isolation is handled by Clerk Auth. The codebase lives at midwestco/datazoom and ships as a Next.js 15 monorepo (product/) backed by Supabase (PostgreSQL + pgvector) and a Python-based ingest/embedding worker fleet.

Current lifecycle phase: Production-grade, post-alpha. The system has cleared alpha (#109 Refactor/alpha) and staging (#110 Staging) gates and is operating in a live environment. Active development is concentrated on infrastructure reliability (CI/CD pipeline fixes), two major feature tracks (cap-table and e-signature), and a newly launched cloud-worker GPU fleet on RunPod/Fly.


2. What Has Been Built

2.1 Core Platform (Stable)

| Area | Status | Key Files |
|---|---|---|
| Document ingest + chunking | Stable | docker/Dockerfile.worker, docker/docker-compose.module1-ports.yml |
| Vector embeddings (384-dim, all-MiniLM-L6-v2) | Stable | pgvector IVFFlat index, document_chunks.embedding VECTOR(384) |
| RAG retrieval + citation layer | Stable | product/lib/__tests__/rag-retrieval.test.ts, product/lib/__tests__/citation-system.test.ts |
| AI chat interface (/context) | Stable | product/app/(app)/context/ (8 components) |
| Document management (/documents) | Stable | product/app/(app)/documents/ |
| Timeline extraction + visualization | Stable | timeline_events table, idx_timeline_date index |
| Multi-tenant auth (Clerk) | Stable | product/app/api/clerk/proxy/route.ts |
| Analytics (Mixpanel) | Stable | product/app/api/ai/track-interaction/route.ts; archived tracking plan in docs/archive/MIXPANEL_TRACKING_PLAN.md |
| Due diligence checklists | Stable | product/lib/due-diligence/__tests__/, /api/business-types/[typeKey]/checklist |
| Advisor routes (risk memo, strategic options) | Stable | product/app/api/advisor/ (5 routes) |
| Activity feed + unified calendar | Complete | product/app/(app)/activity/, product/app/api/activity/unified/ (4 sub-routes) |
| Admin pipeline dashboard | Stable | product/app/(app)/admin/pipeline/page.tsx, product/app/api/admin/pipeline/ |

2.2 Features Shipped in Recent Sprints

E-Signature System (BR-ESG-001)

  • Full action plan in docs/action_plans/e_sign/ (9 steps including database, Resend email, template designer, signature pad, signing portal, stamping engine)
  • Status file: docs/action_plans/e_sign/STATUS.md
  • Build fixes shipped in #107 (Signature build fixes). The initial signature PR #106 (Abdullah/signatures) was closed in favor of a refactored approach.
  • Conversion attribution tests in product/lib/signature/__tests__/conversion-attribution.test.ts
  • PR #108 (open) extends template management with PDF preview and schema normalization

Cap Table System (BR-CAP-001)

  • 14-step action plan fully documented: docs/cap-table/action_plans/01 through 14
  • Schema, RLS, calculation engine, read APIs, UI shell, extraction pipeline, review queue (approve/reject), Modal endpoint, observability, backfill, E2E smoke, feature flag/rollback, manual transaction controls, and snapshot read-model optimization
  • Test coverage: product/lib/cap-table/__tests__/ (calc, feature-gate, fixture-corpus, observability)
  • API surface: 10 routes under /api/cap-table/ including extract, auto-populate, review/[id]/approve, review/[id]/reject, transactions/[id]/void
  • Model router tested at product/lib/__tests__/model-router.test.ts

Cloud GPU Worker Fleet (TECH-GPU-001)

  • RunPod + Fly.io orchestrator: fly/orchestrator/Dockerfile, docker/cloud-worker/Dockerfile, docker/cloud-worker/fly.toml
  • Pod warmup status tracking (provisioning → warming → ready): commit 580b80b
  • BullMQ job queue with Redis (Upstash TLS): b718932
  • Admin UI for cloud pipeline: /api/admin/pipeline/cloud, /api/admin/routing-stats
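The following is a minimal sketch of how the provisioning → warming → ready progression from 580b80b could be applied. The goal is that a late or duplicated status report never moves a pod backwards in the admin UI. Python is used for illustration only. The PodStatus enum and the advance() helper are hypothetical and are not the orchestrator's actual code. Failure and termination states are deliberately left out.

```python
from enum import Enum


class PodStatus(Enum):
    """Warmup stages named in commit 580b80b (illustrative enum)."""
    PROVISIONING = "provisioning"
    WARMING = "warming"
    READY = "ready"


# A single ordered list defines the forward direction of warmup.
_ORDER = [PodStatus.PROVISIONING, PodStatus.WARMING, PodStatus.READY]


def advance(current: PodStatus, reported: PodStatus) -> PodStatus:
    """Apply a reported status: same-stage or forward reports win, backward ones are ignored."""
    if _ORDER.index(reported) >= _ORDER.index(current):
        return reported
    return current  # stale or out-of-order report: keep the newer state


# Example: a delayed "warming" report that arrives after "ready" is ignored.
state = PodStatus.PROVISIONING
for report in (PodStatus.WARMING, PodStatus.READY, PodStatus.WARMING):
    state = advance(state, report)
print(state.value)  # ready
```

Simple ordering works for the three warmup stages. Terminal states such as a failed pod would need their own rule.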

Activity Tracking System (BR-ACT-001)

  • Materialized views, activity logger library, per-event logging for upload/delete/processing/risk/strategic analysis
  • Unified calendar API: product/app/api/activity/unified/calendar, day, feed, refresh
  • Full implementation in PR #104

Collaboration WebSocket Service

  • product/collaboration-ws/Dockerfile — standalone WS service
  • product/app/api/collaboration/token/route.ts — token issuance
  • Shipped in #111 feat: realtime status, chat improvements, Cloudflare deployment

3. What Is In Progress

3.1 PR #101 — Upload Fixes (OPEN, BLOCKED)

  • Status: Open and unmerged for an extended period. The upload pipeline has known but undiagnosed issues (see OQ-001).
  • Impact: Affects document ingestion reliability. The PR has stayed unresolved while other features shipped around it.
  • Risk: High. If upload fails for some file types or sizes, everything downstream of ingest (chunking, embedding, retrieval) is blocked for those documents.

3.2 PR #108 — Template Management Enhancement (OPEN)

  • Scope: PDF preview in template designer + schema normalization for e-sign templates
  • Relates to: docs/action_plans/e_sign/03_template_designer_ui.md
  • Blocking: Production rollout of full e-sign flow

3.3 CI/CD Pipeline Stabilization (TECH-CI-001)

  • The six most recent commits are all CI fixes, including:
    • 960f88c: crane copy tag mismatch + retry logic for Fly registry
    • 07c91b9: cloud worker build moved to GitHub Actions, push to ghcr.io
    • a693949: retry on cloud worker image build
    • 2c3145d: docker login for crane auth (Fly + ghcr.io)
  • The build pipeline for ghcr.io/midwestco/datazoom-base, datazoom-worker, datazoom-gpu was non-functional and is being repaired incrementally
  • Workflow files: .github/workflows/build-images.yml, .github/workflows/knip.yml

3.4 Business Type Profiles (BR-BTP-001)

  • 8-step action plan in docs/action_plans/business_type_fix/
  • API routes exist: /api/business-type-profiles/[profileKey], /api/business-types/[typeKey]/checklist
  • Final integration and validation steps appear in progress

4. Active Decisions and Rationale

TECH-DEC-001 — Modal (Cloud) as Primary LLM, Ollama as Local Fallback

Decision: Qwen2.5:32B on Modal is the default production LLM. Ollama (ollama/ollama:latest, port 11434) is an optional local fallback, configured via OLLAMA_HOST.
Rationale: Modal provides serverless GPU scaling without managing hardware. The local Ollama path preserves the "100% local" privacy guarantee for self-hosted customers.
Status: Active. Model routing logic is tested in product/lib/__tests__/model-router.test.ts. The OLLAMA_NUM_PARALLEL: "4" and OLLAMA_MAX_LOADED_MODELS: "3" settings are tuned for concurrent chat sessions.
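The routing rule above can be sketched as follows, assuming each backend is wrapped as a plain prompt-to-text callable. route_completion() and its parameters are hypothetical and do not reflect the tested logic in model-router.test.ts. The local_only flag models the self-hosted "100% local" case, in which Modal must never be called.

```python
import os
from typing import Callable, Optional, Tuple

# Hypothetical backend shape: prompt in, completion text out.
Backend = Callable[[str], str]


def route_completion(
    prompt: str,
    modal_backend: Backend,
    ollama_backend: Optional[Backend] = None,
    ollama_host: Optional[str] = None,
    local_only: bool = False,
) -> Tuple[str, str]:
    """Return (backend_name, text): Modal first, Ollama as fallback or as the only path."""
    host = ollama_host if ollama_host is not None else os.environ.get("OLLAMA_HOST")
    local_available = bool(host) and ollama_backend is not None

    if local_only:
        # Self-hosted privacy mode: never send the prompt to Modal.
        if not local_available:
            raise RuntimeError("local_only requires OLLAMA_HOST and an Ollama backend")
        return "ollama", ollama_backend(prompt)

    try:
        return "modal", modal_backend(prompt)
    except Exception:
        # Fall back only when a local Ollama endpoint is actually configured.
        if local_available:
            return "ollama", ollama_backend(prompt)
        raise
```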

TECH-DEC-002 — RunPod + Fly.io for GPU Worker Orchestration

Decision: GPU-intensive embedding/reranking workers run on a RunPod fleet orchestrated by a Fly.io service.
Rationale: RunPod provides cost-effective GPU capacity, and Fly handles ingress and the orchestration lifecycle. BullMQ on Upstash Redis queues work items.
Status: Active but stabilizing. A TLS workaround (ssl_cert_reqs=None) was needed for Upstash (b718932). Pod status is now derived from cloudStatus as the single source of truth (commit 088f7b4). Known issue: the orchestrator requires 2048MB on the performance CPU tier (4e7cd14).

TECH-DEC-003 — Clerk for Multi-Tenant Auth, Not Supabase Auth

Decision: Clerk handles authentication and organization (tenant) isolation. A proxy route (/api/clerk/proxy) bridges Clerk to Supabase row-level security.
Rationale: Clerk provides superior multi-org UX and enterprise SSO out of the box. Supabase Auth was evaluated and replaced by Clerk.
Status: Active. Admin endpoints (/api/admin/pipeline/cloud) use Clerk auth directly (d037000). withOrgAuth was intentionally not used there, because admin pipeline endpoints are cross-org (see LEARN-004).

TECH-DEC-004 — pgvector IVFFlat with 384-Dimensional Embeddings

Decision: all-MiniLM-L6-v2 (sentence-transformers) produces 384-dim vectors, and an IVFFlat index with lists=100 serves similarity queries.
Rationale: 384 dimensions balance quality against storage and query speed for legal document chunking. IVFFlat is appropriate for the current data volume.
Watch: As the corpus scales, HNSW indexing may outperform IVFFlat. No migration plan exists yet (see OQ-002, DEBT-002).
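For sizing, the pgvector README suggests starting points for IVFFlat. For lists, use rows / 1000 for up to 1M rows, and sqrt(rows) above that. For probes, use sqrt(lists). The helper below encodes that rule of thumb. It gives a starting point to benchmark against, not a tuning result for this corpus, and the function name is hypothetical.

```python
import math
from typing import Tuple


def ivfflat_starting_params(rows: int) -> Tuple[int, int]:
    """Starting (lists, probes) from the pgvector README rule of thumb.

    lists  = rows / 1000 up to 1M rows, sqrt(rows) above 1M
    probes = sqrt(lists)
    """
    if rows <= 0:
        raise ValueError("rows must be positive")
    if rows <= 1_000_000:
        lists = max(1, rows // 1000)
    else:
        lists = round(math.sqrt(rows))
    probes = max(1, round(math.sqrt(lists)))
    return lists, probes


print(ivfflat_starting_params(100_000))    # (100, 10)
print(ivfflat_starting_params(4_000_000))  # (2000, 45)
```

By this rule, lists=100 matches roughly 100k rows in document_chunks. The chunk count relative to 100k is therefore a cheap signal for when to re-benchmark (OQ-002). At query time, probes is set per session, for example with SET ivfflat.probes = 10;.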

TECH-DEC-005 — BullMQ Over Direct HTTP Dispatch for Worker Jobs

Decision: Document processing jobs are queued through BullMQ (Redis-backed) rather than dispatched by direct HTTP calls to workers.
Rationale: Queuing decouples the ingest rate from worker capacity and enables retry semantics and dead-letter handling of failed jobs. The previous direct-dispatch pattern (docs/archive/completed_action_plans/modal_migration_optimization/05_direct_dispatch.md) was superseded.
Status: Active. See product/app/api/advisor/process-queue/route.ts.
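The sketch below makes the retry semantics concrete. It simulates one job with a fixed attempt budget and exponential backoff, where the base delay doubles on each retry, and the job ends as either completed or failed. This is a self-contained Python model, not BullMQ. In BullMQ itself these are per-job options (attempts and an exponential backoff setting). The function name and defaults here are assumptions.

```python
from typing import Callable, List, Tuple


def run_with_retries(
    handler: Callable[[int], None],
    attempts: int = 3,
    base_delay_ms: int = 1000,
) -> Tuple[str, List[int]]:
    """Simulate one queued job: retry with exponential backoff until attempts run out.

    handler gets the 1-based attempt number and raises on failure.
    Returns (final_state, backoff delays scheduled between attempts).
    """
    delays: List[int] = []
    for attempt in range(1, attempts + 1):
        try:
            handler(attempt)
            return "completed", delays
        except Exception:
            if attempt == attempts:
                break  # budget exhausted: report the job as failed
            # Delay doubles per retry: base, 2*base, 4*base, ...
            delays.append(base_delay_ms * 2 ** (attempt - 1))
    return "failed", delays
```

The exponential backoff spaces out retries against a struggling worker instead of hammering it at ingest rate.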

TECH-DEC-006 — Cap Table Feature Gating

Decision: Cap table functionality ships behind a feature flag with documented rollback procedures.
Evidence: product/lib/cap-table/__tests__/feature-gate.test.ts; action plan step 12 (docs/cap-table/action_plans/12_rollout_feature_flag_and_rollback.md).
Status: Active.
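One common shape for a flag with a rollback path combines two controls. A per-org allowlist handles staged rollout, and a global kill switch overrides it. The class below is a hypothetical illustration of that shape. The actual gate is covered by feature-gate.test.ts and may be structured differently.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class CapTableGate:
    """Hypothetical cap-table gate: staged per-org rollout plus a global kill switch."""
    kill_switch: bool = False       # rollback: disables the feature for every org
    enabled_for_all: bool = False   # general availability
    org_allowlist: Set[str] = field(default_factory=set)  # staged rollout

    def is_enabled(self, org_id: str) -> bool:
        if self.kill_switch:
            return False  # kill switch always wins
        return self.enabled_for_all or org_id in self.org_allowlist
```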


5. Recent Changes and Their Impact

| Commit / PR | Change | Impact |
|---|---|---|
| 960f88c | Crane tag mismatch fix + Fly registry retry | CI image builds should now succeed reliably |
| b718932 | BullMQ Redis TLS ssl_cert_reqs=None | Fixes silent connection failures to Upstash in production |
| 6c89d22 | Critical TLS + race condition fixes (full pipeline audit) | Broad stability improvement; multiple production bugs closed |
| 580b80b | Pod warmup status tracking | Operators can now see the provisioning → warming → ready progression; eliminates blind spots in the admin UI |
| 088f7b4 | RunPod status from cloudStatus (single source of truth) | Eliminates the race condition where two status sources disagreed |
| d037000 | Cloud proxy uses Clerk auth directly | Admin endpoint security model corrected; withOrgAuth was inappropriate for this context |
| 4e7cd14 | Orchestrator memory floor at 2048MB | Prevents OOM crashes on the performance CPU tier |
| #112 (merged) | Add @tiptap/y-tiptap dep + fix TS build error | Unblocks collaboration editor build |
| #111 (merged) | Realtime status + Cloudflare deployment | Live pipeline status in UI; Cloudflare CDN layer added |
| #104 (merged) | Daily activity tracking + API integration | Activity feed and unified calendar fully operational |

6. Open Questions and Unresolved Trade-offs

OQ-001 — Upload Pipeline Reliability (PR #101)

Question: What is the root cause of upload failures? File size limits, MIME type handling, Supabase Storage policy, or the ingest worker handoff? Impact: High. Core feature. Status: PR open, no resolution date.

OQ-002 — IVFFlat vs. HNSW as Corpus Grows

Question: At what corpus size (document or chunk count) does IVFFlat become a query-latency bottleneck that makes HNSW migration necessary? Impact: Medium-term performance. No baseline has been established, and no migration path is documented.

OQ-003 — E-Sign Production Readiness

Question: Is the stamping engine (docs/action_plans/e_sign/06_stamping_engine_and_finalization.md) complete and tested at production quality? Status: PR #108 still open. Conversion attribution tests exist but stamping finalization is unclear.

OQ-004 — Billing/Wallet Integration Completeness

Question: Is the billing wallet (docs/billing_wallet/) integrated with cap-table and advisor AI usage costs? Evidence: WALLET_IMPLEMENTATION_STATUS.md exists, but the current integration state is unknown. Modal wallet deduction was planned (docs/archive/completed_action_plans/modal_migration_optimization/06_wallet_deduction.md).

OQ-005 — knip.yml Dead Code Audit Findings

Question: What did the knip dead-code analysis (.github/workflows/knip.yml) find? Are there orphaned routes or components left over from refactors? Impact: Codebase hygiene. A codebase of roughly 500 files and 100 components warrants periodic pruning.

OQ-006 — Collaboration WebSocket Scaling

Question: The product/collaboration-ws/Dockerfile is a standalone service. What is its scaling model? Does it have affinity requirements with the main Next.js process? Not documented.

OQ-007 — Multi-Org Scaling Plan Execution State

Question: docs/MULTI_ORG_SCALING_PLAN.md exists. Which phases have been executed? Not cross-referenced in commit history.


7. Technical Debt Inventory

| ID | Description | Location | Priority | Notes |
|---|---|---|---|---|
| DEBT-001 | PR #101 upload fixes languishing open | product/app/api/ upload path | P0 | Core feature; blocks new users |
| DEBT-002 | No HNSW migration plan for pgvector | documents DB schema | P1 | Will become urgent at scale |
| DEBT-003 | ssl_cert_reqs=None TLS workaround for Upstash | BullMQ Redis client | P1 | Security posture; should be replaced by proper certificate validation |
| DEBT-004 | Hardcoded 2048MB orchestrator memory floor | fly/orchestrator/Dockerfile | P2 | Should be a configurable env var |
| DEBT-005 | Superseded docs still in docs/archive/superseded/ (10+ files) | docs/archive/superseded/ | P3 | Cognitive overhead; should be deleted |
| DEBT-006 | .DS_Store committed to root | .DS_Store | P3 | .gitignore entry missing or bypassed |
| DEBT-007 | __pycache__/modal_llm.cpython-313.pyc committed | root __pycache__/ | P3 | Python cache in VCS |
| DEBT-008 | Test coverage: only 35 test files for a 500-file codebase (7%) | product/lib/__tests__/ | P2 | Activity tracking, advisor, and analysis routes appear untested |
| DEBT-009 | LARGE_PRODUCT_FILES.md in root implies files that cannot be tracked normally | LARGE_PRODUCT_FILES.md | P2 | Investigate LFS or a splitting strategy |
| DEBT-010 | Business type profiles action plan partially executed | docs/action_plans/business_type_fix/ | P2 | 8 steps documented; completion state unclear |

8. Key Learnings From Recent Development Sessions

LEARN-001 — Cloud Infrastructure Requires Explicit Auth Wiring

The Fly registry and ghcr.io have different auth handshakes. crane copy failed silently when using the wrong credential source. Fix: explicit docker login before crane operations. Applied in 2c3145d.

LEARN-002 — Single Source of Truth for Async Status Is Non-Negotiable

RunPod status derived from two places produced a race condition that made the admin dashboard show stale data. Consolidating onto cloudStatus (commit 088f7b4) resolved it. Pattern to enforce: every async resource has exactly one authoritative status field.
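The pattern can be enforced structurally. Store one authoritative status field and compute every display value from it at read time, so there is no second copy to drift. In this hypothetical sketch, cloud_status stands in for cloudStatus, and the label mapping is illustrative.

```python
from dataclasses import dataclass, replace

_LABELS = {"provisioning": "Starting", "warming": "Warming up", "ready": "Ready"}


@dataclass(frozen=True)
class PodRecord:
    pod_id: str
    cloud_status: str  # the one authoritative status field

    @property
    def display_status(self) -> str:
        """Derived on every read and never stored, so it cannot disagree with cloud_status."""
        return _LABELS.get(self.cloud_status, "Unknown")


pod = PodRecord("pod-1", "warming")
pod = replace(pod, cloud_status="ready")  # updates touch only cloud_status
print(pod.display_status)  # Ready
```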

LEARN-003 — BullMQ + Upstash TLS Requires ssl_cert_reqs=None in Python redis.asyncio

Upstash's managed Redis uses a certificate chain that redis.asyncio rejects by default. The workaround (b718932) disables cert verification. This is a known Upstash community issue; track upstream for a proper fix.
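When revisiting DEBT-003, it may help to build the redis-py TLS options in one place. That way, the workaround and the verified configuration differ by a single switch. This sketch assumes redis-py's ssl_cert_reqs and ssl_ca_certs keyword arguments; redis-py treats ssl_cert_reqs=None as "no verification". The function name and the option of passing an explicit CA bundle are assumptions, not the current worker code.

```python
from typing import Any, Dict, Optional


def redis_tls_kwargs(verify: bool = True, ca_certs: Optional[str] = None) -> Dict[str, Any]:
    """TLS keyword arguments for a redis-py / redis.asyncio client.

    verify=False reproduces the current workaround (b718932); verify=True is the
    DEBT-003 target, optionally pinning a CA bundle path.
    """
    if not verify:
        # redis-py treats ssl_cert_reqs=None as "do not verify the server certificate".
        return {"ssl_cert_reqs": None}
    kwargs: Dict[str, Any] = {"ssl_cert_reqs": "required"}
    if ca_certs:
        kwargs["ssl_ca_certs"] = ca_certs  # use when the default trust store lacks the chain
    return kwargs
```

Assuming a rediss:// URL, the result would be passed as redis.asyncio.Redis.from_url(url, **redis_tls_kwargs()). This sketch does not cover how the Python BullMQ client receives its connection settings.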

LEARN-004 — Admin Endpoints Should Not Use withOrgAuth

withOrgAuth enforces organization membership checks. Admin pipeline endpoints are cross-org by definition. Using Clerk auth directly without org scoping is correct for these routes (d037000).
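The distinction can be sketched as two separate guards. Tenant routes check org membership, while admin pipeline routes check an admin role and ignore org scope. This is a hypothetical Python illustration. It does not reproduce withOrgAuth or the Clerk SDK, and the Caller fields (including is_platform_admin) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Set


class Forbidden(Exception):
    """Raised when a caller may not access a route."""


@dataclass
class Caller:
    user_id: str
    org_ids: Set[str] = field(default_factory=set)
    is_platform_admin: bool = False  # hypothetical role claim


def require_org_member(caller: Caller, org_id: str) -> None:
    """Tenant routes: the caller must belong to the org whose data is requested."""
    if org_id not in caller.org_ids:
        raise Forbidden(f"{caller.user_id} is not a member of {org_id}")


def require_platform_admin(caller: Caller) -> None:
    """Admin pipeline routes: cross-org by design, so check the role, not org scope."""
    if not caller.is_platform_admin:
        raise Forbidden(f"{caller.user_id} is not a platform admin")
```

The admin guard succeeds even for a caller with no org memberships at all. That is exactly why wrapping these routes in an org-scoped check would reject legitimate admin requests.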

LEARN-005 — Cap Table Requires Calibration Before Backfill

docs/cap-table/calibration_report.md and calibration_threshold_policy.md exist because running the extraction pipeline on real documents without calibrated extraction-confidence thresholds produced noisy results. Calibration therefore gates the backfill step (action plan 10), which should not run until thresholds are set.
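The following is a minimal sketch of confidence-threshold routing under that policy. Extracted facts at or above a calibrated threshold are accepted, and everything else goes to the review queue. Backfill is refused until calibration has produced a threshold. The threshold value, names, and two-way outcome are hypothetical; calibration_threshold_policy.md is the authority on the real rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CalibratedThreshold:
    auto_accept: float  # hypothetical: minimum confidence to skip human review

    def __post_init__(self) -> None:
        if not 0.0 <= self.auto_accept <= 1.0:
            raise ValueError("auto_accept must be in [0, 1]")


def route_extraction(confidence: float, threshold: Optional[CalibratedThreshold]) -> str:
    """Return "accept" or "review" for one extracted cap-table fact."""
    if threshold is None:
        return "review"  # uncalibrated: nothing is trusted automatically
    return "accept" if confidence >= threshold.auto_accept else "review"


def can_start_backfill(threshold: Optional[CalibratedThreshold]) -> bool:
    """Backfill stays gated until calibration has produced a threshold."""
    return threshold is not None
```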

LEARN-006 — Modal LLM Concurrency Increase Has Wallet Implications

Increasing Modal concurrency limits (docs/archive/completed_action_plans/modal_migration_optimization/07_concurrency_limit_increase.md) directly increases cost per organization. Wallet deduction must be wired before concurrency is opened up.
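The ordering constraint can be expressed as a guard on concurrency changes. Decreases always apply. An increase applies only if wallet deduction is wired for the org and its balance covers a minimum runway at the requested limit. OrgBilling, the per-slot hourly cost estimate, and the runway rule are all hypothetical inputs, not values from the wallet implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrgBilling:
    wallet_deduction_enabled: bool
    balance_usd: float


def allowed_concurrency(
    requested: int,
    current: int,
    billing: OrgBilling,
    est_cost_per_slot_hour_usd: float,
    min_runway_hours: float = 1.0,
) -> int:
    """Return the concurrency limit to apply for one org."""
    if requested <= current:
        return requested  # lowering concurrency never increases spend
    if not billing.wallet_deduction_enabled:
        return current  # no deduction wired: hold the current limit
    # Worst-case spend if every slot stays busy for the runway window.
    worst_case = requested * est_cost_per_slot_hour_usd * min_runway_hours
    return requested if billing.balance_usd >= worst_case else current
```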


9. Architectural Evolution

Phase 1 — Single-Tenant RAG MVP

Initial architecture: single Supabase instance, all documents in one namespace, Ollama local LLM, Python ingest script. Documented in docs/archive/superseded/RAG_SYSTEM_OVERVIEW.md and docs/archive/superseded/README.md.

Phase 2 — Multi-Tenant + Modal Migration

Clerk auth added for org isolation. LLM migrated from local Ollama to Modal (Qwen2.5:32B) for production quality. Document type classification added (equity, ip_assignment, financial, healthcare, agreement). Mixpanel analytics wired. Documented in docs/archive/completed_action_plans/modal_migration_optimization/.

Phase 3 — Feature Expansion (Current)

Three parallel feature tracks landed simultaneously:

  1. Cap Table — 14-step implementation with extraction pipeline, review queue, and manual controls
  2. E-Signature — Full signing workflow with template designer, signature capture, and PDF stamping
  3. Activity System — Materialized-view-backed audit log with unified calendar

Cloud GPU worker fleet introduced (RunPod + Fly.io + BullMQ) to handle burst embedding workloads without blocking the Next.js API tier.

Phase 4 — Stabilization (In Progress)

CI/CD pipeline being hardened (last 6 commits). Upload reliability (PR #101) is the last major functional gap. Business type profiles completing. Collaboration WebSocket service added for real-time features.


10. Team Conventions and Patterns

Naming Conventions

  • Action plans: docs/action_plans/<feature>/00_overview...NN_step.md — numbered, sequential, with README.md and START_HERE.md entry points
  • Completed action plans archived to docs/archive/completed_action_plans/
  • API routes follow Next.js App Router file convention: product/app/api/<resource>/route.ts and product/app/api/<resource>/[id]/action/route.ts
  • Test files co-located: product/lib/__tests__/ for integration tests, product/lib/<module>/__tests__/ for unit tests

Git Workflow

  • Feature branches merge via PR; staging gate (#110) precedes production merges
  • Commit prefixes in use: feat:, fix:, ci:, docs:, refactor:
  • Husky hooks enforced at pre-commit and pre-push (.husky/pre-commit, .husky/pre-push)
  • Knip dead-code analysis runs on CI (.github/workflows/knip.yml)

Docker / Infra Conventions

  • COMPOSE_PROFILES pattern: full, gpu, worker, infra — explicit profile activation required
  • Shared logging config via YAML anchor x-logging: &default-logging (json-file, 50MB/5 files)
  • Shared env via x-env: &default-env pointing to .env.services
  • Images published to ghcr.io/midwestco/datazoom-{base,worker,gpu}

Code Patterns

  • Dynamic item routing via product/app/(app)/[type]/[id]/ with type-specific handlers (company-handler.tsx, document-handler.tsx, finding-handler.tsx, option-handler.tsx) and views — clean strategy pattern for polymorphic entity display
  • Provider pattern for page-level state: analysis-provider.tsx, documents-provider.tsx, activity-provider.tsx
  • Supabase client via @supabase/supabase-js ^2.95.3
  • UI components via shadcn/ui + @radix-ui/react-context-menu

11. Integration Points and Current Health

| Integration | Purpose | Health | Notes |
|---|---|---|---|
| Supabase (PostgreSQL + pgvector) | Primary datastore + vector search | ✅ Stable | pgvector IVFFlat index active |
| Supabase Storage | Document file storage | ⚠️ Suspect | PR #101 upload fixes open |
| Clerk | Auth + multi-tenancy | ✅ Stable | Proxy route at /api/clerk/proxy |
| Modal (Qwen2.5:32B) | Production LLM inference | ✅ Stable | Wallet deduction integration unclear |
| Ollama (local) | Fallback LLM | ✅ Available | OLLAMA_HOST env var; not required in production |
| BullMQ + Upstash Redis | Job queue for ingest/embedding | ⚠️ TLS workaround | ssl_cert_reqs=None in use |
| RunPod + Fly.io | GPU worker orchestration | ⚠️ Stabilizing | CI builds for cloud-worker image recently fixed |
| Mixpanel | Product analytics | ✅ Stable | Track-interaction route active |
| Resend (Email) | E-sign notifications | 🔶 In progress | Action plan step 2 documented |
| Cloudflare | CDN / edge | ✅ Recently added | #111 merged |
| Collaboration WebSocket | Real-time editing | 🔶 New | product/collaboration-ws/; scaling model undocumented |
| Sentry | Error tracking | 🔶 Optional | SENTRY_DSN listed as optional in .env.services.example |

12. Performance Baselines and Optimization Opportunities

Known Baselines

  • Embedding dimensions: 384 (all-MiniLM-L6-v2) — fixed by model choice
  • IVFFlat lists: 100 — suitable for moderate corpus size
  • Ollama parallelism: 4 concurrent requests, max 3 loaded models
  • Orchestrator memory: 2048MB minimum on performance CPU tier

Optimization Opportunities

| ID | Opportunity | Effort | Expected Gain / Next Step |
|---|---|---|---|
| PERF-001 | Migrate pgvector index from IVFFlat to HNSW at scale | Medium | Sub-10ms p99 similarity queries at >100k chunks |
| PERF-002 | Materialized view refresh scheduling for activity metrics (/api/activity/unified/refresh) | Low | Eliminate cold query latency on activity feed |
| PERF-003 | Cap table snapshot read model (docs/cap-table/action_plans/14_snapshot_read_model_optimization.md) | Medium | Avoid full recalculation on every cap table read |
| PERF-004 | BullMQ batch advisor processing (/api/advisor/batch) | Already built | Confirm batch size tuning against Modal concurrency limits |
| PERF-005 | Parallel queue processing (docs/archive/completed_action_plans/modal_migration_optimization/04_parallel_queue_processing.md) | Implemented | Validate against current Modal wallet deduction rate |
| PERF-006 | Evaluate reducing OLLAMA_MAX_LOADED_MODELS from "3" to 2 if memory pressure is observed | Low | Reduce RAM usage per worker node |
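PERF-003's idea can be sketched as a snapshot that remembers how many ledger rows it already covers and folds in only newer rows on read. A voided transaction triggers a full rebuild instead, because voiding changes history and incremental application would then be wrong. Txn, CapTableSnapshot, and read_cap_table() are hypothetical; the real design is in 14_snapshot_read_model_optimization.md.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Txn:
    holder: str
    shares: int           # positive for issuance, negative for transfers out
    voided: bool = False


@dataclass
class CapTableSnapshot:
    holdings: Dict[str, int] = field(default_factory=dict)
    applied: int = 0      # number of ledger rows already folded in


def read_cap_table(ledger: List[Txn], snap: CapTableSnapshot,
                   invalidated: bool = False) -> CapTableSnapshot:
    """Bring the snapshot up to date, applying only rows added since the last read.

    Updates snap in place unless invalidated=True, which starts from an empty
    snapshot (needed after a void, since voiding rewrites history).
    """
    if invalidated:
        snap = CapTableSnapshot()
    for txn in ledger[snap.applied:]:
        if not txn.voided:
            snap.holdings[txn.holder] = snap.holdings.get(txn.holder, 0) + txn.shares
    snap.applied = len(ledger)
    return snap
```

Reads that follow only appends cost time proportional to the new rows. Voids (transactions/[id]/void) pay for one full rebuild.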

13. Immediate Priorities (Next Session Checklist)

  1. Resolve PR #101: diagnose and merge the upload fixes. This is the highest-impact unresolved user-facing bug.
  2. Merge PR #108 — E-sign template PDF preview + schema normalization needed to close the e-sign feature track.
  3. Audit TLS workaround DEBT-003 — Determine if Upstash has released a proper cert bundle; remove ssl_cert_reqs=None.
  4. Verify CI green — Confirm build-images.yml produces valid image tags to ghcr.io after the crane/Fly fixes.
  5. Document collaboration-ws scaling model — Before the next load spike, define how many WebSocket instances run and whether they require sticky sessions.
  6. Close DEBT-006/007 — Remove .DS_Store and __pycache__/ from VCS; update .gitignore.
  7. Cross-reference billing wallet with cap-table and Modal — Ensure cost deduction fires before concurrency limit increases go live.

This document reflects the state of midwestco/datazoom as of commit 960f88c. It should be regenerated after each significant sprint or infrastructure change.