Context Lake

The unified knowledge layer that aggregates project context from repositories, integrations, intakes, and human input so agents and humans share a single source of truth.

The Context Lake is the unified knowledge layer at the center of IAS. It aggregates and organizes project context from multiple sources -- repository documentation, tickets, conversations, integration events, and code metadata -- so that agents and humans share a single, consistent view of a project's state.

Why the Context Lake exists

AI agents are most effective when they have rich, relevant context. Without a centralized knowledge layer, context is scattered across Git repos, Google Drive, Notion, Jira, Slack threads, and individual developers' heads. Agents either get incomplete context (and make mistakes) or get firehosed with everything (and lose focus).

The Context Lake solves this by providing a structured, policy-aware aggregation point:

  • Agents read from the Context Lake to understand project constraints, prior decisions, and current goals before taking action. Context is assembled into context packs -- curated snapshots tailored to the task at hand.
  • Humans use the Context Lake (via the Console UI) to review what agents know, fill gaps, correct inaccuracies, and verify that context is accurate and current.
  • Integrations write to the Context Lake as events occur -- documents tracked, tickets synced, messages sent, replies received.

Data sources

The Context Lake aggregates context from six primary sources:

Repository scaffold

When you add a repository and run Build Context, IAS creates a scaffolding directory (docs/ias/) in the repo. This becomes the canonical source of truth for project-level context:

  • project-context.md -- project description, constraints, architecture patterns, team conventions
  • decisions/ -- records of decisions made during goal execution, with rationale
  • gaps.md -- known knowledge gaps and open questions
  • runs/ -- execution history and artifacts from completed goals

The repository scaffold is always the authoritative source. The Context Lake aggregates it alongside other sources but does not replace it.
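Laid out on disk, the scaffold described above looks like this (the annotations summarize the roles listed above):

```
docs/ias/
├── project-context.md   # project description, constraints, patterns, conventions
├── gaps.md              # known knowledge gaps and open questions
├── decisions/           # decision records with rationale
└── runs/                # execution history and artifacts from completed goals
```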

Build Context output

When Build Context runs, it extracts code metadata, architecture patterns, dependency information, and structural insights from the codebase. These extracted facts become context items that agents reference when planning work.

Intakes

Goal descriptions, requirements, and attached documents submitted through the Console. Each intake creates context items that capture what the team wants to achieve and any supporting material provided.

Integration events

Data from connected integrations: tracked Google Drive documents, Notion pages, Confluence pages, synced Jira tickets, Gmail threads, and Slack messages. Each integration event flows through the data policy pipeline before storage.

Decision requests

Questions asked by agents and their resolutions. When an agent surfaces a decision request and a human answers it, both the question and the answer become part of the Context Lake. Over time, this builds an institutional memory of how the team makes decisions.

Human annotations

Manual context additions made through the Console UI. Team members can add notes, upload documents, or create external links directly in the Context Lake to fill gaps that automated sources do not cover.

Context items and knowledge artifacts

The Context Lake organizes information into two main categories:

Context items are discrete pieces of knowledge with a source, timestamp, and relevance scope. Each item has a type that indicates its origin:

Type             Source                     Example
note             Manual entry               Architecture note added by a team member
external_link    Manual entry               Link to an external design document
upload           File upload                Uploaded PDF or image
drive_doc        Google Drive integration   Tracked Google Doc
notion_page      Notion integration         Tracked Notion page
confluence_page  Confluence integration     Tracked Confluence page

Context items can be pinned (to prioritize them in context packs), filtered by type or source, and refreshed from their external origin when applicable.
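As a rough mental model, a context item can be pictured as a record like the following. This is an illustrative sketch only -- the field names and the `ContextItem` class are hypothetical, not the actual IAS schema:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical sketch of a context item record; field names are illustrative.
@dataclass
class ContextItem:
    id: str
    type: str                    # e.g. "note", "drive_doc", "notion_page"
    source: str                  # where the item came from
    created_at: datetime
    title: Optional[str]         # may be None under metadata_only policy
    summary: Optional[str]       # may also be redacted to None
    pinned: bool = False         # pinned items are prioritized in context packs
```

The optional `title` and `summary` fields reflect the data policy behavior described below: in a restrictive mode they can legitimately be empty.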

Knowledge artifacts are higher-level, curated documents that synthesize multiple context items into a coherent narrative. They have a stable key and a revision history, making them versionable over time. Examples include the project context document and build context summaries. Knowledge artifacts are typically created by internal operators with space write access.

Data policy modes

Every workspace has a configurable data policy that controls what content the Context Lake stores in the cloud. This is critical for teams working with sensitive or client data.

internal_ok

Full content storage is permitted. The Context Lake stores complete text from intakes, repository labels, context artifacts, and integration events. This mode is appropriate for internal teams working on proprietary projects where cloud storage of content is acceptable.

metadata_only

Only structural metadata is stored -- labels, references, timestamps, and identifiers. The actual content of documents, messages, and code is not persisted in the cloud. This mode is appropriate for sensitive or client work where content must not leave the local environment.

In metadata_only mode:

  • Integration-ingested items store title = null and summary = null unless those fields are explicitly allowlisted.
  • Context pack previews respect the policy and return redacted values.
  • External document bodies are not cached in cloud storage.

Allowlist mechanism

Even in metadata_only mode, certain fields may need to be stored for the system to remain usable. The allowlist lets you selectively permit specific fields while keeping everything else redacted.

Default allowlist entries:

Field              What it permits
repo.label         Repository display name (e.g., acme/web-app) -- needed for navigation and identification
intake.title       Goal title text -- needed to identify intakes in lists and dashboards
intake.rawContent  Goal description content -- needed for agents to understand what work is requested

You can configure additional allowlist entries in the workspace settings. If you see unexpected redaction in context pack previews, the correct fix is to adjust the workspace policy (either switching to internal_ok or adding the relevant field to the allowlist), not to work around it per-item.
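The allowlist check can be sketched as a simple field filter. This is a hypothetical illustration of the behavior, not the actual IAS implementation; the `redact_for_storage` function and its signature are invented for this example, though the default allowlist entries match the table above:

```python
# Hypothetical sketch of allowlist filtering under metadata_only.
# Only the default allowlist entries come from the docs; the rest is illustrative.
DEFAULT_ALLOWLIST = {"repo.label", "intake.title", "intake.rawContent"}

def redact_for_storage(entity: str, fields: dict, allowlist: set = DEFAULT_ALLOWLIST) -> dict:
    """Keep only allowlisted fields; everything else is stored as None."""
    return {
        name: value if f"{entity}.{name}" in allowlist else None
        for name, value in fields.items()
    }

redacted = redact_for_storage(
    "intake",
    {"title": "Add SSO", "rawContent": "Support SAML login", "attachments": "design.pdf"},
)
# "title" and "rawContent" pass through; "attachments" is stored as None
```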

Policy enforcement

Data policy is enforced at two points, providing defense in depth:

Write-time enforcement

When data is written to the Context Lake -- by an agent, integration, or human -- the system checks the workspace's data policy and filters content before storage. In metadata_only mode, only allowlisted fields pass through; everything else is stripped before it reaches the database.

This means that even if a bug or misconfiguration were to bypass read-time checks, disallowed content would not exist in storage.

Read-time enforcement

Queries that assemble context for display or agent consumption also respect the data policy. Even if a workspace previously stored fields that are now disallowed (e.g., due to a policy change), those fields are redacted on read. This handles legacy data and provides an additional safety layer.
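The legacy-data case can be sketched as applying the same redaction at query time. This is an illustrative sketch, not the real query layer; `redact_on_read` and its parameters are hypothetical:

```python
# Hypothetical sketch of read-time enforcement: disallowed fields are redacted
# when rows are read, so data stored under an older, looser policy is still filtered.
def redact_on_read(row: dict, policy: str, allowlisted: set) -> dict:
    if policy == "internal_ok":
        return row
    return {k: (v if k in allowlisted else None) for k, v in row.items()}

# A row written before the workspace switched to metadata_only:
legacy_row = {"title": "Old doc", "body": "full text stored under the previous policy"}
safe = redact_on_read(legacy_row, "metadata_only", {"title"})
# "body" is redacted on read even though it still exists in storage
```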

How agents use the Context Lake

When an agent starts working on a goal, IAS assembles a context pack -- a curated snapshot of relevant Context Lake items tailored to the specific task. The context pack includes:

  • Project-level context from the repository scaffold
  • Relevant context items (decisions, integration data, human notes)
  • Knowledge artifacts that apply to the current workstream
  • Any pinned items that the team has marked as high-priority

The context pack is policy-aware: it only includes content that the workspace data policy permits. Agents receive this pack as part of their job payload, giving them a focused, relevant view of the project without needing to search through raw data themselves.
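The selection behavior can be sketched roughly as follows. The real assembly and ranking logic is internal to IAS; this hypothetical function only illustrates the two properties described above (policy awareness and pinned-item priority), with items modeled as plain dicts:

```python
# Hypothetical sketch of context pack assembly. Field names and the
# policy check are illustrative, not the actual IAS implementation.
def assemble_context_pack(items: list, policy: str) -> list:
    # Policy-aware: in restrictive modes, only items cleared by the policy appear.
    visible = [i for i in items if policy == "internal_ok" or i.get("allowlisted")]
    # Pinned items first, then the rest, most recently updated first.
    return sorted(visible, key=lambda i: (not i["pinned"], -i["updated_at"]))

items = [
    {"id": "note-1", "pinned": False, "updated_at": 2, "allowlisted": True},
    {"id": "decision-7", "pinned": True, "updated_at": 1, "allowlisted": True},
]
pack = assemble_context_pack(items, "metadata_only")
# the pinned item "decision-7" is ranked first
```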

Retention

Context Lake data persists until explicitly removed:

  • Workspace deletion removes all associated data.
  • Manual cleanup by a workspace administrator removes individual items or artifacts through the Console UI.
  • Integration disconnection removes the integration, but tracked context items remain (they become orphaned from their source but retain their cached content).

There is no automatic expiration or TTL on context items. The Context Lake is designed to accumulate knowledge over time, making agents progressively more effective as project history builds up.

Cached content and freshness

Integration-backed context items can cache external document bodies (stored as blobs) and text extracts (stored for search and preview). The current model stores these durably, with on-demand refresh available for tracked documents.

The recommended approach to managing cached content:

  • Pointer-first -- always keep the pointer to the external source plus a content hash, even if the cached body is purged.
  • Refresh on demand -- use the "Refresh from Drive/Notion/Confluence" button on individual context items to pull the latest content.
  • Purge rather than delete -- if you want to remove cached text but keep the pointer, prefer purging the cached text over deleting the entire context item.
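The pointer-first model can be sketched as follows. This is an illustrative sketch only -- the `CachedDoc` class and its methods are hypothetical, but it shows why keeping the pointer and a content hash is useful: after a purge, staleness can still be detected without the cached body:

```python
import hashlib

# Hypothetical sketch of the pointer-first model: purging cached text keeps
# the external pointer and a content hash so staleness can still be detected.
class CachedDoc:
    def __init__(self, external_url: str, body: str):
        self.external_url = external_url  # pointer to the external source of truth
        self.content_hash = hashlib.sha256(body.encode()).hexdigest()
        self.cached_body = body

    def purge(self) -> None:
        """Drop the cached text but keep the pointer and hash."""
        self.cached_body = None

    def is_stale(self, latest_body: str) -> bool:
        """Compare the source's current content against the stored hash."""
        return hashlib.sha256(latest_body.encode()).hexdigest() != self.content_hash
```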

Relationship to repository context

The Context Lake complements but does not replace the repository-level context in docs/ias/. The repository remains the canonical source of truth for:

  • Project constraints and metadata (project-context.md)
  • Agent decisions and their rationale (decisions/)
  • Knowledge gaps and open questions (gaps.md)
  • Run artifacts and execution history (runs/)

The Context Lake aggregates this repository context with information from other sources -- integrations, intakes, human input -- to provide a richer, cross-cutting view. Think of the repository as the ground truth and the Context Lake as the wider lens that brings in everything else.