DataZoom

Context engine

Last updated 5/3/2026

DataZoom Context Engine

Document ID: TECH-CTX-001
Version: 1.0
Product: DataZoom (midwestco/datazoom)
Classification: Internal Engineering - AI Systems Architecture


Table of Contents

  1. Context Architecture Overview
  2. Domain Profile
  3. Knowledge Graph Structure
  4. Prompt Engineering Patterns
  5. Memory Management
  6. Context Density Strategies
  7. Agent Context Protocols
  8. RAG Pipeline as Context Infrastructure
  9. Cross-Document ID Reference Map

1. Context Architecture Overview

DataZoom manages AI context across three distinct layers: vector-indexed document memory (pgvector), structured relational state (PostgreSQL + Supabase), and session-scoped conversation threads stored in the conversations and messages tables. These layers are not isolated — they compose at query time into a unified context payload that is passed to the LLM.

1.1 Layer Map

┌────────────────────────────────────────────────────────────────┐
│  LAYER 1 — Session Context (Ephemeral)                        │
│  Tables: conversations, messages                               │
│  Scope: Per-user, per-organization, per-thread                │
│  Retrieval: Direct DB lookup by conversation_id               │
└────────────────────────┬───────────────────────────────────────┘
                         │ combined at query time
┌────────────────────────▼───────────────────────────────────────┐
│  LAYER 2 — Semantic Document Memory (Persistent)              │
│  Tables: document_chunks (embedding VECTOR(384))              │
│  Index: ivfflat, cosine ops, 100 lists                        │
│  Retrieval: pgvector similarity search via /api/.../query     │
└────────────────────────┬───────────────────────────────────────┘
                         │ filtered by org + document scope
┌────────────────────────▼───────────────────────────────────────┐
│  LAYER 3 — Structured Knowledge (Persistent)                  │
│  Tables: documents, timeline_events, cap_table transactions   │
│  Retrieval: SQL join, metadata filter, GIN index on parties[] │
└────────────────────────────────────────────────────────────────┘

1.2 Context Assembly Flow

When a user submits a query through the chat interface (product/app/(app)/context/page.tsx), the following sequence assembles context before any LLM call is made:

  1. Thread resolution — The active conversation thread is identified via the thread selector (product/app/(app)/context/components/thread-selector.tsx). Prior messages in the thread constitute the short-term conversational context.

  2. Semantic retrieval — The query is embedded using sentence-transformers (all-MiniLM-L6-v2, 384-dimensional vectors). The embedding is used to perform a cosine similarity search against document_chunks.embedding within the organization's document scope.

  3. Structured fact injection — For queries involving dates, parties, or equity events, the timeline_events table and the GIN index on documents.parties augment the semantic results with structured facts. Supplying these records directly reduces the risk that the model invents facts the system already holds.

  4. Context packet construction — Retrieved chunks, structured facts, and the conversation thread are concatenated into a single context payload. The model router (covered by product/lib/__tests__/model-router.test.ts) then selects the LLM backend (Modal/Qwen2.5:32B in the cloud, Ollama locally) based on query complexity and GPU availability.

  5. LLM dispatch — The assembled packet is sent to either the Modal cloud endpoint or the local Ollama service (port 11434). The LLM proxy service runs on port 8001 inside the datazoom-base container and mediates all model calls.
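
The five steps above can be sketched as one orchestration function. This is a minimal illustration under stated assumptions, not DataZoom's implementation: `ContextPacket`, `assemble_and_dispatch`, and the injected callables are hypothetical names standing in for the thread lookup, the pgvector search, the structured fact query, the model router, and the LLM proxy.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ContextPacket:
    """Hypothetical container for the payload built before an LLM call."""
    history: list[str]
    chunks: list[str]
    facts: list[str]
    question: str

    def render(self) -> str:
        # Section labels are illustrative; the real payload layout may differ.
        sections = [
            ("CONTEXT", self.chunks),
            ("STRUCTURED FACTS", self.facts),
            ("CONVERSATION HISTORY", self.history),
            ("USER QUERY", [self.question]),
        ]
        return "\n\n".join(f"{name}:\n" + "\n".join(body) for name, body in sections)


def assemble_and_dispatch(
    question: str,
    load_thread: Callable[[], list[str]],
    retrieve_chunks: Callable[[str], list[str]],
    lookup_facts: Callable[[str], list[str]],
    choose_backend: Callable[[str], str],
    call_llm: Callable[[str, str], str],
) -> str:
    history = load_thread()                # step 1: thread resolution
    chunks = retrieve_chunks(question)     # step 2: semantic retrieval
    facts = lookup_facts(question)         # step 3: structured fact injection
    packet = ContextPacket(history, chunks, facts, question)  # step 4: packet
    backend = choose_backend(question)     # step 4: router picks a backend
    return call_llm(backend, packet.render())  # step 5: LLM dispatch
```

Passing each stage in as a callable keeps the sketch independent of any particular database client or model endpoint.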

1.3 Tenant Isolation in Context

Every context retrieval operation is scoped to the authenticated organization via Clerk auth. The /api/clerk/proxy/route.ts endpoint validates org membership before any document or chunk access. As a result, two tenants that upload documents with identical filenames do not receive each other's context: the pgvector query filters on organization_id before computing similarity.
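
A tenant-scoped similarity query might look like the sketch below. It assumes that document_chunks carries an organization_id column (the real schema may instead join through documents) and uses psycopg-style named placeholders; `build_scoped_chunk_query` is a hypothetical helper. The `<=>` operator is pgvector's cosine distance operator.

```python
def build_scoped_chunk_query(top_k: int) -> str:
    """Build a parameterized pgvector query that filters by tenant.

    Assumption: document_chunks has an organization_id column. Values are
    bound by the driver through the %(name)s placeholders, never inlined.
    """
    if top_k < 1:
        raise ValueError("top_k must be positive")
    return (
        "SELECT id, document_id, chunk_index, content,\n"
        "       embedding <=> %(query_vec)s AS cosine_distance\n"
        "FROM document_chunks\n"
        "WHERE organization_id = %(org_id)s\n"
        "ORDER BY embedding <=> %(query_vec)s\n"
        f"LIMIT {top_k}"
    )
```

The organization filter sits in the WHERE clause of the same statement as the ordering, so no code path can run the similarity search without it.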


2. Domain Profile

2.1 Product Domain

DataZoom operates in the legal and business due diligence domain. The primary use case is natural language analysis of legal documents — equity agreements, IP assignments, healthcare contracts, financial instruments — to surface facts, risks, and strategic options without requiring legal expertise from the user.

2.2 Key Entities

| Entity ID | Entity Name | Storage Location | Description |
|---|---|---|---|
| ENT-001 | Document | documents table | Root entity. Carries document_type, parties[], key_terms[], summary, full_text |
| ENT-002 | Document Chunk | document_chunks table | Segmented unit of a document with a 384-dim embedding |
| ENT-003 | Timeline Event | timeline_events table | Extracted date-anchored event with impact classification |
| ENT-004 | Conversation Thread | conversations table | A named context thread scoped to an org and topic |
| ENT-005 | Message | messages table | Individual turn within a conversation, user or assistant |
| ENT-006 | Organization | Clerk + Supabase RLS | Tenant boundary; all entities are org-scoped |
| ENT-007 | Cap Table Transaction | cap_table/transactions | Equity ownership record with approval workflow |
| ENT-008 | Due Diligence Checklist | /api/business-types/[typeKey]/checklist | Business-type-specific document requirement list |
| ENT-009 | Finding | finding-handler.tsx, finding-view.tsx | AI-surfaced risk or notable clause from document analysis |
| ENT-010 | Strategic Option | /api/advisor/strategic-options | LLM-generated strategic recommendation with risk memo |

2.3 Document Type Taxonomy

The documents.document_type field constrains classification to a controlled vocabulary defined in docs/MASTER_DOCUMENT_TYPES_CATALOG.md. Valid values observed in the schema:

  • equity — Equity agreements, stock purchase agreements, SAFEs
  • ip_assignment — Intellectual property transfer documents
  • financial — Financial statements, term sheets
  • healthcare — Healthcare-specific compliance and service agreements
  • agreement — General commercial contracts

Document type drives checklist selection (/api/business-types/[typeKey]/checklist/route.ts) and influences which prompt template is applied during analysis.
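
A controlled vocabulary like this is commonly enforced with an enum at the ingestion boundary. The sketch below is illustrative: `DocumentType` and `parse_document_type` are hypothetical names, and the five members are only the values listed above, whereas docs/MASTER_DOCUMENT_TYPES_CATALOG.md remains the authoritative list.

```python
from enum import Enum


class DocumentType(str, Enum):
    # Values copied from the list above; the catalog may define more.
    EQUITY = "equity"
    IP_ASSIGNMENT = "ip_assignment"
    FINANCIAL = "financial"
    HEALTHCARE = "healthcare"
    AGREEMENT = "agreement"


def parse_document_type(raw: str) -> DocumentType:
    """Normalize a raw label and reject anything outside the vocabulary."""
    # Enum lookup by value raises ValueError for unknown labels.
    return DocumentType(raw.strip().lower())
```

Rejecting unknown labels early keeps checklist selection and prompt template selection from silently falling through to a default.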

2.4 Domain Terminology Glossary

| Term | Definition in DataZoom Context |
|---|---|
| RAG | Retrieval-Augmented Generation, the core pattern for grounding LLM answers in uploaded documents |
| Chunk | A semantically bounded segment of a document, stored in document_chunks with an embedding |
| Embedding | A 384-dimensional float vector produced by all-MiniLM-L6-v2 representing semantic content |
| Citation | A traceable reference linking an LLM answer back to a specific chunk; enforced by citation-system.test.ts |
| Due Diligence (DD) | Structured review process using business-type-specific checklists |
| Finding | An AI-identified risk, gap, or notable clause surfaced during document analysis |
| Cap Table | Capitalization table tracking equity ownership; managed through a dedicated extraction and review pipeline |
| Thread | A named conversation context scoping related questions to a topic or document set |
| Strategic Option | An LLM-generated recommendation produced by the advisor pipeline (/api/advisor/strategic-options) |
| Risk Memo | Formal risk summary generated by /api/advisor/risk-memo |
| ivfflat | PostgreSQL index type (Inverted File Flat) used for approximate nearest-neighbor vector search |

3. Knowledge Graph Structure

DataZoom's knowledge graph is implicit — encoded in relational foreign keys, GIN-indexed arrays, and vector similarity — rather than a dedicated graph database. The following describes how documents, IDs, and cross-references form a queryable knowledge structure.

3.1 Entity Relationship Graph

Organization (Clerk org_id)
    │
    ├── Document [ENT-001] (UUID)
    │       ├── document_type (controlled vocab)
    │       ├── parties[] (GIN-indexed TEXT[])
    │       ├── key_terms[] (GIN-indexed TEXT[])
    │       ├── summary (LLM-generated)
    │       │
    │       ├── Document Chunk [ENT-002] (UUID, FK → document_id)
    │       │       ├── content (TEXT)
    │       │       ├── embedding (VECTOR(384))
    │       │       └── metadata (JSONB)
    │       │
    │       └── Timeline Event [ENT-003] (UUID, FK → document_id)
    │               ├── event_date (DATE)
    │               ├── event_type (controlled vocab)
    │               ├── parties_involved (TEXT[])
    │               └── impact (critical|high|medium|low)
    │
    ├── Conversation Thread [ENT-004]
    │       └── Message [ENT-005] (ordered turns)
    │
    ├── Cap Table Transaction [ENT-007]
    │       └── Review Record (approve/reject workflow)
    │
    └── Due Diligence Checklist [ENT-008]
            └── Finding [ENT-009] (linked to document chunks)

3.2 Cross-Reference Mechanisms

Party-based cross-document linking. The documents.parties field is a GIN-indexed TEXT[]. A query like WHERE parties && ARRAY['Acme Corp'] retrieves all documents mentioning a given party, enabling cross-document analysis of a single entity's obligations and history without an explicit graph join.

Date-anchored event correlation. The timeline_events table, indexed by event_date, allows temporal correlation across documents. The idx_timeline_type index supports filtering by event_type, enabling queries such as "all equity changes in Q1 2024 across all uploaded documents."

Semantic proximity graph (implicit). Documents whose chunks have high cosine similarity to a query vector form an implicit neighborhood. The ivfflat index with 100 lists provides approximate retrieval across this neighborhood at query time. This is the primary mechanism for multi-document synthesis.

Checklist-to-document linkage. The /api/business-types/[typeKey]/checklist endpoint maps a typeKey (e.g., equity, healthcare) to a structured requirement list. Findings (ENT-009) produced during analysis are linked back to specific document chunks via the citation system (tested in product/lib/__tests__/citation-system.test.ts), completing the chain from checklist requirement → document evidence → cited finding.
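
The party-based and date-anchored mechanisms above reduce to two simple filters. The sketch below mirrors them in memory so the logic is easy to see; in production these would be SQL predicates backed by the GIN and timeline indexes. The function names and the dictionary record shapes are assumptions for illustration.

```python
from datetime import date


def documents_mentioning(documents: list[dict], party: str) -> list[str]:
    # In-memory analogue of: WHERE parties && ARRAY[party]
    return [d["id"] for d in documents if party in d["parties"]]


def events_in_range(
    events: list[dict], event_type: str, start: date, end: date
) -> list[dict]:
    # Analogue of filtering timeline_events by event_type and event_date.
    return [
        e for e in events
        if e["event_type"] == event_type and start <= e["event_date"] <= end
    ]
```

For example, "all equity changes in Q1 2024" corresponds to `events_in_range(events, "equity_change", date(2024, 1, 1), date(2024, 3, 31))`.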

3.3 Documentation Knowledge Graph

The docs/ directory itself constitutes a human-readable knowledge graph with structured cross-references. Key nodes:

| Node | Path | Function |
|---|---|---|
| Documentation Index | docs/DOCUMENTATION_INDEX.md | Root index; entry point for all documentation |
| RAG System | docs/RAG_SYSTEM.md | Canonical reference for retrieval architecture |
| AI Services | docs/AI_SERVICES.md | Model configuration, Modal vs. Ollama routing |
| Master Document Types | docs/MASTER_DOCUMENT_TYPES_CATALOG.md | Controlled vocabulary for document classification |
| Action Plans | docs/action_plans/ | Structured implementation plans with status tracking |
| Archive | docs/archive/ | Superseded documents preserved for historical context |

Action plans follow a numbered sequence (00_, 01_, ...) within each feature area (e.g., docs/cap-table/action_plans/, docs/activity_page/action_plans/) and cross-reference each other via START_HERE.md files that serve as context entrypoints for any agent beginning work in that domain.


4. Prompt Engineering Patterns

4.1 RAG Grounding Pattern

The foundational prompt pattern enforces that every LLM response is grounded in retrieved document content. Based on the citation enforcement work documented in docs/archive/completed_action_plans/document_refinement_v1/07_citation_enforcement_COMPLETE.md, prompts follow this structure:

SYSTEM:
You are a legal document analysis assistant. Answer questions using ONLY the 
provided document context. For every factual claim, cite the source chunk using 
[Document: {filename}, Chunk: {chunk_index}] notation. If the answer cannot be 
found in the provided context, state that explicitly — do not speculate.

CONTEXT:
[Retrieved chunks injected here — ordered by cosine similarity score]

CONVERSATION HISTORY:
[Prior messages in the active thread]

USER QUERY:
{user_question}

Citation enforcement is validated by product/lib/__tests__/citation-system.test.ts, ensuring responses that lack source references are flagged rather than surfaced to users.
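
One way to fill this template is sketched below. It assumes each retrieved chunk is labeled in the context with the same [Document: ..., Chunk: ...] notation the model is asked to cite, which the source does not state explicitly; `RetrievedChunk` and `build_grounded_prompt` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    filename: str
    chunk_index: int
    content: str
    similarity: float  # higher means closer to the query


SYSTEM_PROMPT = (
    "You are a legal document analysis assistant. Answer questions using ONLY "
    "the provided document context. For every factual claim, cite the source "
    "chunk using [Document: {filename}, Chunk: {chunk_index}] notation. If the "
    "answer cannot be found in the provided context, state that explicitly."
)


def build_grounded_prompt(
    chunks: list[RetrievedChunk], history: list[str], question: str
) -> str:
    # Order chunks by similarity so the strongest evidence comes first.
    ordered = sorted(chunks, key=lambda c: c.similarity, reverse=True)
    context = "\n\n".join(
        f"[Document: {c.filename}, Chunk: {c.chunk_index}]\n{c.content}"
        for c in ordered
    )
    return "\n\n".join([
        "SYSTEM:\n" + SYSTEM_PROMPT,
        "CONTEXT:\n" + context,
        "CONVERSATION HISTORY:\n" + "\n".join(history),
        "USER QUERY:\n" + question,
    ])
```

Labeling chunks with the citation notation means the model can copy a marker verbatim rather than reconstruct it.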

4.2 Document Summary Pattern

During ingestion, each document receives an LLM-generated summary stored in documents.summary. This is a one-shot extraction prompt (not RAG), applied to the full document text:

SYSTEM:
Extract a structured summary from the following legal/business document. 
Identify: document type, all named parties, effective date, key terms, 
and 3-5 material obligations or provisions.

DOCUMENT TEXT:
{full_text}

OUTPUT FORMAT: JSON with fields: type, parties[], effective_date, key_terms[], summary_prose

The structured output populates documents.document_type, documents.parties, documents.key_terms, and documents.summary atomically.
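
A minimal sketch of mapping the model's JSON output onto the four document columns follows. The helper name `summary_to_columns` is hypothetical, and the strict failure on missing fields is an assumption about how malformed output should be handled.

```python
import json

REQUIRED_FIELDS = ("type", "parties", "effective_date", "key_terms", "summary_prose")


def summary_to_columns(raw_json: str) -> dict:
    """Validate the extraction output and map it to documents columns."""
    data = json.loads(raw_json)
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        # Fail the whole update so no column is written from partial output.
        raise ValueError(f"summary output missing fields: {missing}")
    return {
        "document_type": data["type"],
        "parties": list(data["parties"]),
        "key_terms": list(data["key_terms"]),
        "summary": data["summary_prose"],
    }
```

Returning all four values together lets the caller write them in a single UPDATE, which matches the atomic behavior described above.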

4.3 Timeline Extraction Pattern

The timeline pipeline (idx_timeline_type, idx_timeline_date indexes) is fed by a structured extraction prompt:

SYSTEM:
You are extracting a chronological event list from a legal document.
For each event you identify, output: date (ISO 8601), event_type 
(equity_change|agreement_signed|ip_assignment|other), description, 
parties_involved[], and impact (critical|high|medium|low).

DOCUMENT:
{full_text}

Return a JSON array of events. If a date is approximate, use the first day 
of the applicable month or year.
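
Because the prompt fixes the event schema, the returned array can be validated before insertion into timeline_events. The sketch below assumes the vocabularies are exactly those named in the prompt; `parse_timeline_events` is a hypothetical helper.

```python
import json
from datetime import date

EVENT_TYPES = {"equity_change", "agreement_signed", "ip_assignment", "other"}
IMPACTS = {"critical", "high", "medium", "low"}


def parse_timeline_events(raw_json: str) -> list[dict]:
    """Validate extracted events and rename fields to the table's columns."""
    events = []
    for item in json.loads(raw_json):
        if item["event_type"] not in EVENT_TYPES:
            raise ValueError(f"unknown event_type: {item['event_type']}")
        if item["impact"] not in IMPACTS:
            raise ValueError(f"unknown impact: {item['impact']}")
        events.append({
            # fromisoformat raises ValueError on anything but YYYY-MM-DD.
            "event_date": date.fromisoformat(item["date"]),
            "event_type": item["event_type"],
            "description": item["description"],
            "parties_involved": list(item["parties_involved"]),
            "impact": item["impact"],
        })
    return events
```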

4.4 Strategic Advisor Pattern

The advisor pipeline (/api/advisor/route.ts, /api/advisor/strategic-options/route.ts, /api/advisor/risk-memo/route.ts) uses a multi-stage pattern documented in docs/archive/completed_action_plans/modal_migration_optimization/03_single_stage_strategic_options.md:

STAGE 1 — Risk Identification:
Given the following document findings and timeline, identify the top risks 
facing this organization. Classify each risk by: category, severity, 
likelihood, and affected parties.

STAGE 2 — Strategic Options Generation:
For each identified risk, generate 2-3 strategic options. For each option, 
specify: action, owner, timeline, cost estimate, and expected outcome.

STAGE 3 — Risk Memo Synthesis:
Synthesize the risks and strategic options into an executive risk memo 
suitable for a board or investor audience.

The batch endpoint (/api/advisor/batch/route.ts) and queue processor (/api/advisor/process-queue/route.ts) manage parallel execution of this pipeline across multiple documents.
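
The staged structure can be sketched as three chained calls, where each stage's output feeds the next prompt. This is an illustration only: `run_advisor_pipeline` is a hypothetical name, the prompts are abbreviated from the stage descriptions above, and the `llm` callable stands in for whatever client the advisor routes actually use.

```python
from typing import Callable


def run_advisor_pipeline(
    llm: Callable[[str], str], findings: str, timeline: str
) -> dict:
    """Chain the three advisor stages, feeding each output forward."""
    risks = llm(
        "STAGE 1 - Risk Identification:\n"
        "Identify the top risks. Classify each by category, severity, "
        "likelihood, and affected parties.\n\n"
        f"FINDINGS:\n{findings}\n\nTIMELINE:\n{timeline}"
    )
    options = llm(
        "STAGE 2 - Strategic Options Generation:\n"
        "For each risk, generate 2-3 options with action, owner, timeline, "
        "cost estimate, and expected outcome.\n\n"
        f"RISKS:\n{risks}"
    )
    memo = llm(
        "STAGE 3 - Risk Memo Synthesis:\n"
        "Synthesize an executive risk memo for a board or investor audience.\n\n"
        f"RISKS:\n{risks}\n\nOPTIONS:\n{options}"
    )
    return {"risks": risks, "options": options, "memo": memo}
```

Keeping the stages as separate calls makes each intermediate result inspectable and cacheable, at the cost of three round trips per document.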

4.5 Due Diligence Matching Pattern

The DD match panel (product/app/(app)/[type]/[id]/views/dd-match-panel.tsx) uses a checklist-grounded prompt:

SYSTEM:
You are performing due diligence against the following checklist requirements.
For each requirement, assess: present (yes/no/partial), evidence (chunk citation), 
and gap_severity (critical|major|minor|none).

CHECKLIST:
{business_type_checklist}

DOCUMENT CONTEXT:
{retrieved_chunks}

4.6 Clause Comparison Pattern

The clause comparison endpoints (/api/clauses/compare/route.ts, /api/clauses/compare/[id]/route.ts) apply a diff-style prompt:

Compare the following two clause texts. Identify: 
(1) substantive differences in obligations or rights,
(2) missing protections present in Clause A but absent in Clause B,
(3) risk delta — which version is more favorable to {party} and why.

CLAUSE A: {clause_a_text}
CLAUSE B: {clause_b_text}

4.7 Question Templates

Pre-built question templates (product/app/(app)/context/components/question-templates.tsx) lower the prompt engineering burden for end users by providing domain-appropriate starting queries:

  • "What are the key obligations of [Party] under this agreement?"
  • "Identify all equity transfer events and their effective dates."
  • "Are there any non-compete clauses? What are their terms?"
  • "Summarize the IP assignment provisions across all uploaded documents."

These templates are not static strings — they are parameterized by the current document scope and active parties, injecting entity context before the user even types.


5. Memory Management

5.1 Session Context Window

DataZoom manages LLM context windows explicitly because the Qwen2.5:32B model (deployed via Modal) and local Ollama models have bounded context sizes. The conversation thread in product/app/(app)/context/components/conversation-panel.tsx does not naively append all prior messages. Instead, the following strategy applies:

Recent-first truncation. The context assembly includes the most recent N messages from the thread, where N is bounded by a token budget. Older messages are excluded once the budget is exceeded.

Decision log persistence. The product/app/(app)/context/components/decision-log.tsx and save-decision-form.tsx components allow users to explicitly persist key decisions from a conversation. These saved decisions are injected into future context packets as high-priority facts, surviving beyond the rolling message window. This is the primary cross-session memory mechanism.

Suggested follow-ups. After each LLM response, the product/app/(app)/context/components/suggested-followups.tsx component surfaces continuations. These are generated with awareness of the current thread context and help maintain coherent inquiry chains without requiring the user to re-establish context manually.
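
Recent-first truncation combined with decision-log injection can be sketched as follows. The whitespace-based token estimate is a crude stand-in for a real tokenizer, and charging saved decisions against the same budget is an assumption; `select_history` and `estimate_tokens` are hypothetical names.

```python
def estimate_tokens(text: str) -> int:
    # Rough proxy only; a real tokenizer would give different counts.
    return len(text.split())


def select_history(
    messages: list[str], budget: int, saved_decisions: tuple[str, ...] = ()
) -> list[str]:
    """Keep saved decisions, then as many of the newest messages as fit."""
    remaining = budget - sum(estimate_tokens(d) for d in saved_decisions)
    kept: list[str] = []
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if cost > remaining:
            break  # everything older is dropped as well
        kept.append(message)
        remaining -= cost
    kept.reverse()  # restore chronological order
    return list(saved_decisions) + kept
```

Stopping at the first message that does not fit, rather than skipping it and continuing, keeps the retained window contiguous so the model never sees a conversation with a hole in the middle.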

5.2 Cross-Session Persistence

| Mechanism | Storage | Persistence | Notes |
|---|---|---|---|
| Conversation threads | conversations table | Indefinite | Named, retrievable by thread selector |
| Saved decisions | decision_log (inferred) | Indefinite | Explicitly preserved by user action |
| Document summaries | documents.summary | Indefinite | Generated once at ingestion; reused in all subsequent context |
| Timeline events | timeline_events table | Indefinite | Extracted once; available as structured facts in all queries |
| Chunk embeddings | document_chunks.embedding | Indefinite | Persisted vectors; no re-embedding needed per query |

5.3 Summarization Strategy

DataZoom avoids re-summarizing documents on every query. The documents.summary field is populated once during the ingestion pipeline and reused. This summary, combined with key_terms[] and parties[], constitutes a compressed document representation that can be included in context without loading the document's full_text field.

For long conversation threads that exceed the context window, the system relies on the Decision Log as a manual summarization mechanism rather than automatic thread compression. This trades automation for accuracy — users confirm what is worth preserving rather than trusting an automated summarizer to select salient points.

5.4 Embedding Lifecycle

Embeddings are generated by sentence-transformers (all-MiniLM-L6-v2) during document ingestion by the worker service (docker/Dockerfile.worker). The 384-dimensional vectors are stored in document_chunks.embedding and indexed via ivfflat. Re-embedding is triggered only when document content changes, controlled by the analysis regeneration endpoint (/api/analysis/regenerate/route.ts). Embedding status monitoring was historically tracked in docs/archive/check_embedding_status.sql.


6. Context Density Strategies

Context density — the ratio of useful signal to total tokens in an LLM context packet — is a first-class concern in DataZoom's architecture, given the cost and latency of operating Qwen2.5:32B via Modal.

6.1 Chunk-Level Density

Document chunks in document_chunks are sized to balance retrieval precision against context efficiency, in the range of 128–512 tokens. Chunks that are too short lack semantic coherence; chunks that are too long dilute the embedding's specificity. Note that all-MiniLM-L6-v2 truncates input longer than 256 word pieces by default, so text beyond that point in a longer chunk does not influence its embedding. The chunk_index integer tracks position within the document, enabling the retrieval layer to include adjacent chunks (context window expansion) when a single chunk is insufficient.
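
Adjacent-chunk expansion by chunk_index can be sketched as below. The function name and the `radius` parameter are assumptions for illustration; the source only states that neighbors can be included.

```python
def expand_with_neighbors(
    hit_indexes: list[int], total_chunks: int, radius: int = 1
) -> list[int]:
    """Return the hit chunk indexes plus neighbors within `radius`."""
    selected: set[int] = set()
    for index in hit_indexes:
        low = max(0, index - radius)                  # clamp at document start
        high = min(total_chunks - 1, index + radius)  # clamp at document end
        selected.update(range(low, high + 1))
    return sorted(selected)  # document order, duplicates merged
```

Returning indexes in document order lets the context packet present expanded passages as continuous text.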

The product/lib/__tests__/rag-retrieval-enhanced.test.ts and product/lib/__tests__/rag-retrieval.test.ts test files validate that retrieval returns the highest-signal chunks for a given query, not merely the most similar ones by raw cosine distance.

6.2 Metadata Pre-filtering

Before cosine similarity search, the vector query is pre-filtered using structured metadata:

  • Document type filter — If the query context implies a specific document type (e.g., cap table questions filter to document_type = 'equity'), only chunks from matching documents are searched.
  • Party filter — Queries mentioning a specific party name can pre-filter on documents.parties using the GIN index, reducing the search space before vector comparison.
  • Date range filter — Timeline-anchored queries can restrict to documents with effective_date within a range.

This pre-filtering is a critical density strategy: it ensures the top-K retrieved chunks are drawn from the most relevant document subset rather than the entire corpus, preventing the context window from being filled with topically adjacent but contextually irrelevant content.

6.3 Structured Fact Injection vs. Raw Text

Where a structured fact is available (e.g., a cap table transaction record, a timeline event with explicit event_date and parties_involved), DataZoom injects the structured record rather than the raw chunk text. A structured fact like:

{
  "event_date": "2024-03-15",
  "event_type": "equity_change",
  "description": "Series A closing: 2,000,000 shares issued to Acme Ventures",
  "parties_involved": ["Acme Ventures", "DataZoom Inc."],
  "impact": "critical"
}

...consumes far fewer tokens than the equivalent clause in the original agreement, while conveying the same factual content. The LLM's task becomes synthesis and explanation rather than extraction, improving both efficiency and accuracy.

6.4 Summary-First Context

When a query is broad (e.g., "Give me an overview of all documents"), the context packet leads with documents.summary fields rather than raw chunks. This allows the LLM to synthesize across many documents without exhausting the context window on full-text retrieval. The strategic overview component (product/app/(app)/context/components/strategic-overview.tsx) specifically uses this pattern.

6.5 Model Routing for Density Efficiency

The model router (covered by product/lib/__tests__/model-router.test.ts) selects between the cloud LLM (Modal/Qwen2.5:32B) and local Ollama based on query complexity. Simple factual queries (date lookups, party identification) are routed to lighter local models with smaller context requirements. Complex multi-document synthesis or strategic analysis is routed to the 32B cloud model. Routing this way matches model capacity to the size and complexity of the context each query needs.
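
A routing rule of this kind could be as simple as the sketch below. The keyword list, the document-count threshold, and the fallback to the cloud when no local GPU is available are all assumptions made for illustration; the actual heuristics in the model router are not described in this document.

```python
# Hypothetical markers of synthesis-style questions.
COMPLEX_HINTS = ("compare", "across", "strategy", "strategic", "risk", "summarize")


def route_query(question: str, document_count: int, local_gpu_available: bool) -> str:
    """Return "ollama" for simple lookups and "modal" for heavier synthesis."""
    text = question.lower()
    is_complex = document_count > 1 or any(hint in text for hint in COMPLEX_HINTS)
    if is_complex or not local_gpu_available:
        return "modal"
    return "ollama"
```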


7. Agent Context Protocols

DataZoom operates multiple specialized agents, each receiving a different context protocol.

7.1 Agent Inventory

| Agent ID | Agent Name | Entry Point | Context Protocol |
|---|---|---|---|
| AGT-001 | Document Ingestion Agent | docker/Dockerfile.worker | Full document text + no prior context |
| AGT-002 | Embedding Agent | Worker service, GPU profile | Raw chunk text → vector output |
| AGT-003 | Chat Agent | /api/context/ (inferred) | Thread history + retrieved chunks + structured facts |
| AGT-004 | Advisor Agent | /api/advisor/route.ts | Document summaries + findings + checklist gaps |
| AGT-005 | Cap Table Extraction Agent | /api/cap-table/extract/route.ts | Equity document chunks + extraction schema |
| AGT-006 | Timeline Extraction Agent | timeline_events pipeline | Full document text + temporal extraction prompt |
| AGT-007 | Clause Comparison Agent | /api/clauses/compare/route.ts | Two clause texts + party context |
| AGT-008 | Due Diligence Agent | /api/business-types/[typeKey]/checklist | Checklist + document chunks |
| AGT-009 | Analysis/Party Agent | /api/analysis/party/route.ts | Party-filtered document set + analysis prompt |

7.2 Context Handoff Between Agents

Ingestion → Embedding (AGT-001 → AGT-002). The ingestion worker processes a raw uploaded file into cleaned text, splits it into chunks, and writes chunk records to document_chunks with embedding = NULL. The embedding agent reads unembedded chunks and populates the vector column. This is a one-way handoff via database state.
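
The handoff through database state can be illustrated with a small runnable sketch. It uses the standard-library sqlite3 module as a stand-in for PostgreSQL and stores vectors as JSON text rather than a pgvector column; `embed_pending_chunks` and the `embed` callable are hypothetical.

```python
import json
import sqlite3
from typing import Callable


def embed_pending_chunks(
    conn: sqlite3.Connection, embed: Callable[[str], list[float]]
) -> int:
    """Fill in embeddings for chunks the ingestion step left as NULL."""
    pending = conn.execute(
        "SELECT id, content FROM document_chunks WHERE embedding IS NULL"
    ).fetchall()
    for chunk_id, content in pending:
        conn.execute(
            "UPDATE document_chunks SET embedding = ? WHERE id = ?",
            (json.dumps(embed(content)), chunk_id),
        )
    conn.commit()
    return len(pending)  # number of chunks embedded in this pass
```

Because the NULL column is the only signal, the embedding step is idempotent: a second pass over the same table finds nothing to do.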

Ingestion → Chat (AGT-001 → AGT-003). Once embeddings are populated, the chat agent can retrieve chunks via similarity search. No direct handoff occurs — the database acts as the shared memory. The documents.summary generated during ingestion is immediately available to the chat agent.

Chat → Advisor (AGT-003 → AGT-004). When a user escalates from chat to advisory analysis, the conversation thread context (key decisions, identified risks from prior chat) can be included in the advisor's context packet. The strategic input component (product/app/(app)/context/components/strategic-input.tsx) provides the UI surface for this escalation, allowing users to annotate what from the chat session is relevant to carry forward.

Cap Table Extraction → Review Workflow (AGT-005 → human). The extraction agent (/api/cap-table/extract/route.ts) outputs candidate transactions that enter a human review queue (/api/cap-table/review/route.ts). The agent's context is preserved in the candidate record — the reviewer sees the source document chunk, the extracted fields, and the agent's confidence signal. Approved records (/api/cap-table/review/[id]/approve/route.ts) become canonical cap_table/transactions entries.

7.3 Shared Context Infrastructure

All agents share the following context infrastructure:

  • Supabase PostgreSQL — Authoritative state store; the single source of truth for all persistent context
  • pgvector index — Shared semantic retrieval layer available to any agent that needs document-grounded context
  • Clerk organization scope — Every agent call is bound to an org_id, ensuring no cross-tenant context contamination
  • BullMQ + Redis (Upstash) — The async job queue (docker/cloud-worker/) coordinates agent execution order and passes job payloads (which include document IDs and context pointers) between pipeline stages

7.4 Cloud vs. Local Agent Context

The docker-compose.yml defines two profiles:

  • full profile — Runs ollama (local LLM), llm proxy (port 8001), embed service, and reranker locally. Agents in this mode operate with lower latency but constrained model capacity.
  • gpu profile — Runs embed + reranker + ollama. Cloud LLM calls go to Modal's Qwen2.5:32B endpoint. The admin pipeline routing stats endpoint (/api/admin/routing-stats/route.ts) tracks which agents are dispatching to which backend.

The cloud worker (docker/cloud-worker/Dockerfile, deployed via Fly.io) handles asynchronous heavy analysis tasks (advisor pipeline, batch extraction) independently from the web process, preventing context-heavy operations from blocking the interactive chat experience.


8. RAG Pipeline as Context Infrastructure

The RAG system is the central context infrastructure for DataZoom. docs/RAG_SYSTEM.md is the canonical reference. The following summarizes the pipeline's role as context management:

8.1 Ingestion Phase (Context Construction)

Upload → Storage (Supabase Storage)
      → Text Extraction (Python worker)
      → Chunking (fixed-size with overlap)
      → Embedding (all-MiniLM-L6-v2 → 384-dim)
      → document_chunks INSERT (content + embedding + metadata JSONB)
      → documents.summary UPDATE (LLM extraction)
      → timeline_events INSERT (temporal extraction)

Each step builds a persistent context artifact. The metadata JSONB column on document_chunks carries per-chunk signals (page number, section header, chunk type) that the retrieval layer uses for re-ranking.
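
Fixed-size chunking with overlap, as named in the pipeline above, can be sketched as follows. The function name and the `size` and `overlap` parameters are illustrative; the actual chunk size and overlap used by the worker are not stated here.

```python
def chunk_tokens(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split tokens into windows of `size` that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks
```

The position of each window in the returned list corresponds to the chunk_index stored alongside the chunk.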

8.2 Retrieval Phase (Context Assembly)

User Query
    → Embed query (all-MiniLM-L6-v2)
    → Pre-filter (document_type, parties[], effective_date)
    → pgvector cosine similarity search (ivfflat, top-K)
    → Re-ranking (if reranker service active)
    → Structured fact augmentation (timeline_events JOIN)
    → Thread history prepend
    → Context packet → LLM

The product/lib/__tests__/rag-retrieval-enhanced.test.ts test covers the full retrieval-plus-reranking path. The standard rag-retrieval.test.ts covers the base similarity search path.

8.3 Reranker Service

The GPU profile in docker-compose.yml includes a reranker service. Reranking applies a cross-encoder model to the top-K cosine-similar chunks, reordering them by relevance to the specific query (not just semantic proximity). This is a high-value context density mechanism: it promotes the most answer-relevant chunks to the top of the context packet, where LLMs tend to pay more attention.
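
The reranking step has a simple shape: score each candidate against the query, then reorder. In the sketch below a toy lexical-overlap scorer stands in for the cross-encoder, which is an assumption made to keep the example self-contained; `rerank` and `lexical_overlap` are hypothetical names.

```python
from typing import Callable


def lexical_overlap(query: str, chunk: str) -> float:
    # Toy scorer: fraction of query words that also appear in the chunk.
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words) / len(query_words) if query_words else 0.0


def rerank(
    query: str,
    chunks: list[str],
    score: Callable[[str, str], float] = lexical_overlap,
    keep: int = 3,
) -> list[str]:
    """Reorder candidate chunks by query-specific score and keep the best."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:keep]
```

In a real deployment the `score` argument would wrap the reranker service, and the rest of the function would stay the same.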

8.4 Citation System

Every chunk included in a context packet carries a traceable identity: document_id (UUID) → documents.filename and chunk_index. The LLM is instructed to cite using this identity. The citation system (product/lib/__tests__/citation-system.test.ts) validates that:

  1. LLM responses include citation markers for factual claims
  2. Citation markers resolve to real chunks in the database
  3. Cited chunks belong to the organization's document scope

This closes the context loop: users can trace every AI-generated claim back to its source document and chunk, making the context chain auditable.
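
The three checks can be approximated with a regular expression and a set lookup. This sketch assumes the [Document: ..., Chunk: ...] notation from the grounding prompt and collapses checks 2 and 3 into one membership test against the organization's chunks; `validate_citations` is a hypothetical helper, not the logic in citation-system.test.ts.

```python
import re

CITATION_RE = re.compile(
    r"\[Document: (?P<filename>[^,\]]+), Chunk: (?P<index>\d+)\]"
)


def validate_citations(answer: str, org_chunks: set[tuple[str, int]]) -> list[str]:
    """Return a list of problems; an empty list means the answer passes.

    org_chunks holds the (filename, chunk_index) pairs visible to the
    requesting organization.
    """
    cited = [
        (m["filename"].strip(), int(m["index"]))
        for m in CITATION_RE.finditer(answer)
    ]
    if not cited:
        return ["response contains no citation markers"]
    return [
        f"unresolved citation: {filename} chunk {index}"
        for filename, index in cited
        if (filename, index) not in org_chunks
    ]
```

Checking against an organization-scoped set means a citation to another tenant's chunk fails in the same way as a citation to a chunk that does not exist.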


9. Cross-Document ID Reference Map

The following table maps every major system component to its ID in the DataZoom taxonomy, enabling cross-reference from any documentation node.

9.1 Business Requirements

| ID | Requirement | Owner Component |
|---|---|---|
| BR-001 | Multi-tenant context isolation | Clerk auth + Supabase RLS |
| BR-002 | Every AI response must be source-cited | Citation system (citation-system.test.ts) |
| BR-003 | Context must persist across sessions | conversations, decision_log tables |
| BR-004 | Document analysis must support 5 document types | documents.document_type controlled vocab |
| BR-005 | Cap table data requires human review before persistence | Review workflow (/api/cap-table/review/) |
| BR-006 | LLM responses must be grounded in uploaded documents only | RAG retrieval + system prompt constraint |

9.2 Technical Components

| ID | Component | Path |
|---|---|---|
| TECH-001 | Vector similarity index | CREATE INDEX ON document_chunks USING ivfflat (embedding vector_cosine_ops) |
| TECH-002 | Embedding model | sentence-transformers/all-MiniLM-L6-v2 (384 dimensions) |
| TECH-003 | Primary LLM (cloud) | Modal endpoint, Qwen2.5:32B |
| TECH-004 | Primary LLM (local) | Ollama, port 11434, docker-compose.yml |
| TECH-005 | LLM proxy service | Port 8001, ghcr.io/midwestco/datazoom-base:latest |
| TECH-006 | Job queue | BullMQ + Redis (Upstash), TLS via `ssl_cert_reqs=