Colony

Document agents

Last updated 5/24/2026

Agent Thesis

Colony's agent runtime generates, consumes, and continuously revises a structured corpus of living documents. These documents are not static artifacts — they serve as operating memory that feeds back into subsequent agent turns. The GTM Orchestrator and its eight specialist agents (prospect, qualification, message-gen, recording, post-call, content, onboarding, analytics) each own a defined document surface: some produce primary artifacts (outreach sequences, Deployment Kits, daily briefs, case-study drafts), others produce intermediate records (qualification scorecards, ICP signals, post-call summaries) that downstream agents ingest as structured input. The Knowledge Core — a 10-domain pgvector store on Cloud SQL Postgres 16 (colony-39989:…:colony) — is the canonical shared memory layer. Every agent that writes a document commits a vector embedding into the Knowledge Core; every agent that reads context pulls exemplars from it via pgvector retrieval before generating output.

The documents break into three tiers:

  1. Operational documents — produced during live agent execution and consumed by the next step in the same pipeline (qualification scorecards → message-gen prompts, meeting signals → CRM field patches, prospect candidate lists → approval queue).
  2. Durable artifacts — long-lived deliverables written to Google Cloud Storage (gs://colony-assets, CMEK + signed URLs) or synced to external systems (Pipedrive deal fields, Notion Knowledge Core via docs/phase2/action_plans/15_notion_kc_sync.md). Examples: the 8-asset Deployment Kit generated on Closed-Won, content pillar posts, case-study drafts.
  3. System intelligence documents — the daily brief, action plans under docs/phase1/action_plans/ and docs/phase2/action_plans/, runbooks under docs/phase1/runbooks/ and docs/phase2/runbooks/, and E2E test specs under docs/phase1/testing/ and docs/phase2/testing/. These govern how the agent runtime itself is built and verified.
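The three tiers can be pictured as a small lookup from document kind to tier. This is a minimal sketch: the kind identifiers and the `tierOf` helper are hypothetical names chosen for illustration, and only the tier names and example documents come from the list above.

```typescript
// Sketch only: identifiers are illustrative, not Colony's actual schema.
type DocumentTier = "operational" | "durable" | "system-intelligence";

// Example document kinds taken from the tier descriptions above.
const TIER_BY_KIND = new Map<string, DocumentTier>([
  ["qualification-scorecard", "operational"],
  ["meeting-signal", "operational"],
  ["prospect-candidate-list", "operational"],
  ["deployment-kit", "durable"],
  ["content-pillar-post", "durable"],
  ["case-study-draft", "durable"],
  ["daily-brief", "system-intelligence"],
  ["action-plan", "system-intelligence"],
  ["runbook", "system-intelligence"],
  ["e2e-test-spec", "system-intelligence"],
]);

// Returns the tier for a known document kind, or undefined for an unknown one.
function tierOf(kind: string): DocumentTier | undefined {
  return TIER_BY_KIND.get(kind);
}
```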

Agent Roles

| Agent | Inputs | Outputs | Review Gate |
| --- | --- | --- | --- |
| GTM Orchestrator (docs/phase1/action_plans/19_orchestrators.md) | Founder plain-English prompt via /overview SSE chat; Knowledge Core exemplars (pgvector); org-scoped circuit-breaker state from api_keys vault | Multi-turn tool-call transcript streamed to UI; routing decisions dispatched as Inngest events to specialist agents | Human approval required before message-gen sequences are enqueued (approval queue, docs/phase1/action_plans/21_approval_queue.md); auto-approve rules gated by docs/phase2/action_plans/17_outreach_pace_auto_approve.md |
| Campaign Orchestrator (docs/phase1/action_plans/19_orchestrators.md) | Approved prospect list; ICP score from qualification agent; sequence template (T1–T5, 5 angles) | Outbound sequence document: per-contact message variants written to sequences table; batch run logs to runners/.artifacts/pipedrive-batch/ | Angle selection and send schedule reviewed in HOT reply queue before final dispatch via Unipile |
| Prospect Agent (docs/phase1/action_plans/11_prospect_agent.md) | Pipedrive LeadBooster scraper output (docs/phase2/action_plans/02_pipedrive_leadbooster_scraper.md); Google Places (docs/phase2/action_plans/03_google_places_integration.md); SerpAPI (docs/phase2/action_plans/04_serpapi_integration.md); Unipile search (docs/phase2/action_plans/06_unipile_search_extension.md); CSV uploads (docs/phase2/action_plans/26_csv_upload_backboard_source.md); Playwright runner artifacts (runners/.artifacts/pipedrive-dry-run/) | Candidate queue document: ranked JSON list of prospect candidates committed to candidate_queue table; dry-run captures (candidates.json, captures.json, final.png) | Candidate approval queue (docs/phase2/action_plans/09_candidate_approval_queue.md): owner role must promote or reject before prospects enter active sequences |
| Qualification Agent (docs/phase1/action_plans/12_qualification_agent.md) | Candidate queue entries; public data adapters (docs/phase2/action_plans/05_public_data_adapters.md); healthcare ICP signal extraction (docs/phase2/action_plans/22_healthcare_icp_signal_extraction.md); anti-ICP hard prefilter (docs/phase2/action_plans/16_anti_icp_hard_prefilter.md) | ICP scorecard per contact: structured JSON with signal fields written to contacts table and Pipedrive custom fields (docs/phase1/runbooks/pipedrive-fields.md); matching algorithm output (docs/phase2/action_plans/08_matching_algorithm.md) | Scores below ICP threshold trigger circuit breaker (docs/phase1/action_plans/22_circuit_breakers.md); owner reviews borderline cases in approval queue |
| Message-Gen Agent (docs/phase1/action_plans/13_message_generator_agent.md) | Qualified prospect record; ICP scorecard; Knowledge Core exemplars (pgvector, 1536-dim embeddings); outbound angle selection (signal-, pain-, referral-, pattern-break-, insight-led) | Draft outreach messages for T1–T5 sequence steps; multi-channel variants (docs/phase2/action_plans/23_multi_channel_sequence_engine.md) stored in sequence_steps table; sender mailbox rotation metadata (docs/phase2/action_plans/25_sender_mailbox_rotation.md) | All generated messages enter approval queue before send; auto-approve applies only when pace rules (docs/phase2/action_plans/17_outreach_pace_auto_approve.md) and circuit-breaker state allow |
| Recording Intelligence Agent (docs/phase1/action_plans/14_recording_intelligence_agent.md) | Gemini Meet Notes ingested from Google Drive (docs/phase1/action_plans/09_google_drive_integration.md); recording pipeline runbook (docs/phase2/runbooks/03_recording_pipeline.md) | Extracted signal document: structured JSON of key moments, objections, next steps written to meeting_signals table; CRM field patches pushed to Pipedrive; Knowledge Core vector upsert | Signals routed to post-call agent automatically; human reviewer (member role) can edit extracted fields before CRM sync completes |
| Post-Call Agent (docs/phase1/action_plans/15_post_call_agent.md) | Meeting signal document from recording agent; contact and deal context from Pipedrive (bi-sync); humanized post-call writer config (docs/phase2/action_plans/18_humanized_post_call_writer.md) | Post-call summary document: human-readable narrative + action items written to call_summaries table and GCS (gs://colony-assets); Google Chat push notification (docs/phase2/action_plans/20_google_chat_push.md) | Member can approve or revise before summary is committed to Knowledge Core and sent to stakeholders via Resend |
| Content Agent (docs/phase1/action_plans/16_content_agent.md) | 6 content pillars defined in Knowledge Core; pipeline influence attribution data; Knowledge Core exemplars | Content draft documents per pillar: long-form posts, case-study shortlists, one-pagers stored in GCS (gs://colony-assets) and indexed in content_pieces table | Owner/admin reviews drafts; attribution → pipeline influence scoring visible in analytics before publish |
| Onboarding Agent (docs/phase1/action_plans/17_onboarding_agent.md) | Closed-Won deal trigger from Pipedrive (9-stage pipeline); stakeholder map and ICP scorecard; Knowledge Core | 8-asset Deployment Kit (pitch deck, one-pager, case-study shortlist, pricing reference, rollout calendar, stakeholder map, risk register, KPI dashboard), all written to GCS (gs://colony-assets) under signed URLs; asset manifest recorded in deployment_kits table | Admin role must approve full kit before delivery; individual assets can be revised by member role via the command interface |
| Analytics Agent (docs/phase1/action_plans/18_analytics_agent.md) | Pipeline stage data from Pipedrive (bi-sync); sequence send/reply metrics; content attribution records; Knowledge Core | Daily brief document (pipeline-today snapshot, alerts, outbound queue status, yesterday's output) delivered via Resend email and surfaced at /overview; cost dashboard data (docs/phase2/runbooks/05_cost_dashboard.md) | Daily brief is auto-published on schedule; KPI threshold breaches surface as alerts requiring owner acknowledgment |

Document Loop

Generated documents do not terminate at delivery — they re-enter the agent pipeline as source material for the next execution cycle. The loop operates across three feedback paths:

Path 1 — Knowledge Core ingestion. Every durable artifact produced by the content agent, post-call agent, onboarding agent, or message-gen agent is embedded (1536-dimension vectors via pgvector on colony-39989:…:colony) and stored in the Knowledge Core. On the next agent turn, the GTM Orchestrator and all specialist agents query the Knowledge Core using pgvector similarity search before generating any output, ensuring that previously approved messages, case studies, and call summaries directly shape new drafts. The Notion KC sync (docs/phase2/action_plans/15_notion_kc_sync.md) propagates this corpus to a human-readable Notion workspace, allowing founders to annotate entries that flow back as updated embeddings.
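The retrieval step in Path 1 might look roughly like the parameterized query below. This is a sketch under stated assumptions: the table `knowledge_core_entries` and its columns `id`, `body`, `domain`, and `embedding` are hypothetical, and cosine distance (pgvector's `<=>` operator) is assumed because the source does not say which distance metric Colony uses. Only the 1536-dimension figure comes from the text.

```typescript
// Sketch only: table and column names are hypothetical.
const EMBEDDING_DIM = 1536; // dimension stated in the document

interface ExemplarQuery {
  text: string;
  values: [string, string, number];
}

// pgvector accepts a vector as a bracketed, comma-separated text literal.
function toVectorLiteral(embedding: number[]): string {
  if (embedding.length !== EMBEDDING_DIM) {
    throw new Error(`expected ${EMBEDDING_DIM} dimensions, got ${embedding.length}`);
  }
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error("embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}

// Builds a nearest-neighbour query for one Knowledge Core domain.
// `<=>` is pgvector's cosine-distance operator (an assumed choice of metric).
function buildExemplarQuery(
  domain: string,
  embedding: number[],
  limit = 5,
): ExemplarQuery {
  return {
    text: [
      "SELECT id, body, embedding <=> $2::vector AS distance",
      "FROM knowledge_core_entries",
      "WHERE domain = $1",
      "ORDER BY embedding <=> $2::vector",
      "LIMIT $3",
    ].join("\n"),
    values: [domain, toVectorLiteral(embedding), limit],
  };
}
```

With a Postgres client such as node-postgres, the result could be passed along as `client.query(q.text, q.values)`; keeping the embedding as a bound parameter avoids string-interpolating it into the SQL.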

Path 2 — Pipedrive bi-sync. ICP scorecards, meeting signals, and post-call summaries written by qualification, recording, and post-call agents are pushed to Pipedrive custom fields (per docs/phase1/runbooks/pipedrive-fields.md). On subsequent prospect and qualification runs, the discovery orchestrator (docs/phase2/action_plans/10_discovery_orchestrator.md) reads these enriched Pipedrive records to avoid duplicating outreach and to calibrate the matching algorithm (docs/phase2/action_plans/08_matching_algorithm.md) against real deal outcomes.

Path 3 — Daily brief as orchestration trigger. The analytics agent's daily brief (generated each morning, delivered via Resend, visible at /overview) is not merely a report — it contains a prioritized action queue. When the GTM Orchestrator processes the brief at session start, it uses alert items (e.g., HOT reply queue depth, pipeline stage stalls, circuit-breaker trips) as explicit inputs to determine which specialist agents to invoke and in what order during that day's chat session. This closes the loop: yesterday's agent output determines today's agent instructions.
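One way to read Path 3 is as a function from the brief's alert items to an ordered list of specialist agents. The sketch below assumes each alert carries a numeric priority (lower is more urgent) and that each alert kind maps to a fixed set of agents; both the `priority` field and the specific alert-to-agent mapping are hypothetical, since the source names the alert kinds but not how they route.

```typescript
// Sketch only: the routing table and priority field are assumptions.
type AlertKind =
  | "hot-reply-queue-depth"
  | "pipeline-stage-stall"
  | "circuit-breaker-trip";

type Agent =
  | "prospect" | "qualification" | "message-gen" | "recording"
  | "post-call" | "content" | "onboarding" | "analytics";

interface BriefAlert {
  kind: AlertKind;
  priority: number; // lower number = more urgent (assumed convention)
}

// Hypothetical routing: which agents an alert kind would bring into play.
const AGENTS_FOR_ALERT: Record<AlertKind, Agent[]> = {
  "circuit-breaker-trip": ["analytics"],
  "pipeline-stage-stall": ["qualification", "message-gen"],
  "hot-reply-queue-depth": ["message-gen"],
};

// Orders alerts by urgency, then emits each agent once, at its first mention.
function planInvocations(alerts: BriefAlert[]): Agent[] {
  const ordered = [...alerts].sort((a, b) => a.priority - b.priority);
  const plan: Agent[] = [];
  for (const alert of ordered) {
    for (const agent of AGENTS_FOR_ALERT[alert.kind]) {
      if (!plan.includes(agent)) plan.push(agent);
    }
  }
  return plan;
}
```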

Runner artifacts under runners/.artifacts/pipedrive-batch/ and runners/.artifacts/pipedrive-dry-run/ (batch logs, summary.json, candidates.json) are written by the Playwright generic runner (docs/phase2/action_plans/07_playwright_generic_runner.md) and consumed by the discovery orchestrator on the next batch cycle, enabling incremental prospecting without re-scraping known contacts.
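Incremental prospecting of this kind comes down to filtering a fresh scrape against contacts already recorded in earlier artifacts. The following is a minimal sketch; the `Candidate` shape and its `key` field are hypothetical, because the source does not describe the schema of candidates.json.

```typescript
// Sketch only: the Candidate shape is an assumption, not the real artifact schema.
interface Candidate {
  key: string;  // hypothetical stable identifier, e.g. a normalized profile URL
  name: string;
}

// Returns scraped candidates that were not seen in previous runs,
// also dropping duplicates within the new scrape itself.
function newCandidates(previous: Candidate[], scraped: Candidate[]): Candidate[] {
  const seen = new Set(previous.map((c) => c.key));
  const fresh: Candidate[] = [];
  for (const candidate of scraped) {
    if (seen.has(candidate.key)) continue;
    seen.add(candidate.key);
    fresh.push(candidate);
  }
  return fresh;
}
```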


Governance

Human review tiers. Colony enforces three Clerk RBAC roles (admin, owner, member) with distinct document authorities:

  • Admin: approves the full Deployment Kit before delivery, manages Knowledge Core domain definitions, and can retire or archive any document held in the GCS bucket or recorded in the content_pieces and deployment_kits tables.
  • Owner: reviews the candidate approval queue, approves outbound sequences beyond auto-approve thresholds, acknowledges KPI breach alerts from the daily brief, and authorizes Pipedrive field schema changes as documented in docs/phase1/runbooks/pipedrive-fields.md.
  • Member: edits extracted meeting signal fields before CRM sync, revises post-call summaries, and marks individual Deployment Kit assets as ready for delivery.
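The role authorities above can be expressed as a plain permission table. This sketch uses hypothetical action names and deliberately does not assume any inheritance between roles, since the text describes them as distinct; it is not a representation of how Clerk itself stores or checks roles.

```typescript
// Sketch only: action identifiers are illustrative names for the duties listed above.
type Role = "admin" | "owner" | "member";

type Action =
  | "approve-deployment-kit"
  | "manage-kc-domains"
  | "archive-document"
  | "review-candidate-queue"
  | "approve-outbound-sequence"
  | "acknowledge-kpi-alert"
  | "authorize-pipedrive-schema-change"
  | "edit-meeting-signals"
  | "revise-post-call-summary"
  | "mark-kit-asset-ready";

// No inheritance is assumed: each role holds only the authorities listed for it.
const AUTHORITIES: Record<Role, ReadonlySet<Action>> = {
  admin: new Set<Action>([
    "approve-deployment-kit",
    "manage-kc-domains",
    "archive-document",
  ]),
  owner: new Set<Action>([
    "review-candidate-queue",
    "approve-outbound-sequence",
    "acknowledge-kpi-alert",
    "authorize-pipedrive-schema-change",
  ]),
  member: new Set<Action>([
    "edit-meeting-signals",
    "revise-post-call-summary",
    "mark-kit-asset-ready",
  ]),
};

function can(role: Role, action: Action): boolean {
  return AUTHORITIES[role].has(action);
}
```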

Approval queue. All agent-generated messages and candidate promotions pass through the approval queue (docs/phase1/action_plans/21_approval_queue.md) before any external action (send via Unipile, push to Pipedrive, deliver to customer). The auto-approve gate (docs/phase2/action_plans/17_outreach_pace_auto_approve.md) applies only when per-org circuit breakers (docs/phase1/action_plans/22_circuit_breakers.md) report green and the daily send pace is within configured limits.
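The auto-approve condition described here reduces to a two-part predicate. In this sketch the field names (`breaker`, `sentToday`, `dailySendLimit`) are hypothetical, and "within configured limits" is interpreted as the day's sends still being below the limit before the next message goes out.

```typescript
// Sketch only: field names and the exact pace rule are assumptions.
type BreakerState = "green" | "tripped";
type GateDecision = "auto-approve" | "needs-human-approval";

interface GateInput {
  breaker: BreakerState;   // per-org circuit-breaker state
  sentToday: number;       // messages already sent today for this org
  dailySendLimit: number;  // configured pace limit
}

// Auto-approve only when the breaker is green AND there is pace headroom;
// everything else falls back to the human approval queue.
function autoApproveGate(input: GateInput): GateDecision {
  const paceOk = input.sentToday < input.dailySendLimit;
  return input.breaker === "green" && paceOk
    ? "auto-approve"
    : "needs-human-approval";
}
```

Defaulting to human approval on any failed condition keeps the gate fail-closed, which matches the document's framing of auto-approve as the exception rather than the rule.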

Circuit breakers. Per-org circuit-breaker state (stored in the api_keys vault, KMS-encrypted) halts document promotion to external systems when error rates or bounce thresholds are exceeded. Phase 2 extensions are specified in docs/phase2/action_plans/11_circuit_breakers_phase2.md and verified by docs/phase2/testing/11_breakers_phase2.spec.ts and docs/phase2/testing/11_e2e_breakers.spec.ts.
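A per-org breaker of this kind can be sketched as a comparison of observed rates against thresholds over some window. The window shape, the threshold names, and the choice to treat an empty window as green are all assumptions; the source states only that error rates and bounce thresholds can trip the breaker.

```typescript
// Sketch only: window shape and threshold semantics are assumptions.
interface BreakerThresholds {
  maxErrorRate: number;  // e.g. 0.05 means 5% of attempts
  maxBounceRate: number;
}

interface OrgWindow {
  attempts: number; // external actions attempted in the window
  errors: number;   // attempts that failed
  bounces: number;  // sends that bounced
}

// Trips when either observed rate strictly exceeds its threshold.
function evaluateBreaker(
  window: OrgWindow,
  thresholds: BreakerThresholds,
): "green" | "tripped" {
  if (window.attempts === 0) return "green"; // nothing observed yet (assumed default)
  const errorRate = window.errors / window.attempts;
  const bounceRate = window.bounces / window.attempts;
  const exceeded =
    errorRate > thresholds.maxErrorRate || bounceRate > thresholds.maxBounceRate;
  return exceeded ? "tripped" : "green";
}
```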

Observability and audit. All agent LLM calls are traced through Langfuse (platform key in GCP Secret Manager, read at deploy time via gcloud run deploy --update-secrets=…). Sentry captures runtime errors from Inngest durable functions. The .codacy/ toolchain (ESLint via .codacy/tools-configs/eslint.config.mjs, Semgrep via .codacy/tools-configs/semgrep.yaml, Trivy via .codacy/tools-configs/trivy.yaml) runs static analysis on all code that generates or processes documents. The .github/workflows/rbac-check.yml CI workflow enforces that no document-writing path bypasses role checks.

Document retirement. Action plans in docs/phase1/action_plans/ and docs/phase2/action_plans/ are considered executed and immutable once the corresponding E2E test spec in docs/phase1/testing/ or docs/phase2/testing/ passes in CI (.github/workflows/playwright-runner.yml). Retired documents remain in version control for audit lineage but are marked superseded in the phase README (docs/phase1/README.md, docs/phase2/README.md). Durable artifacts in GCS (gs://colony-assets) are subject to the DR recovery runbook (docs/phase2/runbooks/04_dr_recovery.md) and the cost dashboard (docs/phase2/runbooks/05_cost_dashboard.md) for lifecycle and cost governance.