datasources Cosmos DB container and serves as the single source of truth for what datasources exist and who can use them.
Three-Tier Architecture
Datasources operate across three distinct tiers, from global definitions down to per-chat runtime configuration.
Datasource Origin (DataSourceOrigin)
Every catalog entry has an origin field that determines who created it and what rules apply:
| Origin | Description | CRUD | Deletable | Locked Fields |
|---|---|---|---|---|
| builtIn | System-bootstrapped templates and capabilities. Seeded at startup via seedDatasources. | Name, description, tags, ACL editable only | No | type, origin, indexName, searchEndpoint, connection fields |
| admin | Admin-authored external connections (SharePoint, Flow Retriever, etc.). Created via the Datasources admin UI. | Full CRUD | Yes | None |
The origin field replaces the earlier builtIn: boolean flag with a more expressive enum that supports future origin types.
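The origin rules in the table above can be sketched as an update guard. This is a minimal illustration, not the actual router code: the `CatalogEntry` shape, the `findLockedFieldViolations` and `canDelete` helpers, and the contents of `LOCKED_FIELDS_SKETCH` are assumptions (the source names `BUILT_IN_LOCKED_FIELDS` but does not list its keys), and the "connection fields" are represented by a single hypothetical `connectionId` key. The origin values are modeled as a const object here; the real `DataSourceOrigin` declaration may differ.

```typescript
// Origin values from the table above, modeled as a const object for illustration.
const DataSourceOrigin = {
  BuiltIn: "builtIn",
  Admin: "admin",
} as const;
type DataSourceOrigin = (typeof DataSourceOrigin)[keyof typeof DataSourceOrigin];

// Simplified, assumed shape of a catalog entry (not the real IDataSource).
interface CatalogEntry {
  id: string;
  origin: DataSourceOrigin;
  name: string;
  description?: string;
  tags?: string[];
  type: string;
  indexName?: string;
  searchEndpoint?: string;
  connectionId?: string; // hypothetical stand-in for "connection fields"
}

// Assumed contents; the real BUILT_IN_LOCKED_FIELDS may list different keys.
const LOCKED_FIELDS_SKETCH: ReadonlyArray<keyof CatalogEntry> = [
  "type",
  "origin",
  "indexName",
  "searchEndpoint",
  "connectionId",
];

// Returns the locked fields that a patch tries to change on a built-in entry.
function findLockedFieldViolations(
  existing: CatalogEntry,
  patch: Partial<CatalogEntry>,
): string[] {
  // Admin entries support full CRUD, so nothing is locked.
  if (existing.origin !== DataSourceOrigin.BuiltIn) return [];
  return LOCKED_FIELDS_SKETCH.filter(
    (field) => field in patch && patch[field] !== existing[field],
  );
}

// Built-in entries are never deletable; admin entries are.
function canDelete(entry: CatalogEntry): boolean {
  return entry.origin === DataSourceOrigin.Admin;
}
```

A caller would typically reject the request when `findLockedFieldViolations` returns a non-empty list and report the offending field names back to the admin UI.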
Three Kinds of Datasources
The catalog contains three conceptually different kinds of datasources, all represented as IDataSource entities:
Datasource Lifecycle by Kind
Each kind follows a different lifecycle from catalog definition through to runtime:
Kind 1: Virtual (no infrastructure)
Retrieval Pipeline
The origin-based classification above describes who creates a datasource. The retrieval pipeline describes how data flows from a source into the RAG context at query time. Every datasource follows one of three paths:
- Virtual: llmknowledge uses the model’s training data; websearch injects live web results (Tavily, Brave, etc.).
- Physical (Index): documents live in storage (Azure Blob, OneDrive, or SharePoint) and are searchable via an Azure AI Search index. The index is auto-created per chat/user/library.
- Physical (Connection): documents are retrieved from an external system through a registered Data Connection (IDataConnection):
| Path | Type | Connection Target | How It Works |
|---|---|---|---|
| Vector → Connection | vectorretriever | Pinecone, Qdrant, Weaviate, Chroma, Redis, pgvector, etc. | vectorConnectionId resolves to a Data Connection → direct vector similarity search → documents |
| Connection → Flow Retriever | flowretriever | SQL databases, REST APIs, custom logic | flowId resolves to a headless retriever flow → flow executes with connection params → documents |
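The three retrieval paths can be summarized as a mapping from datasource type to path. The sketch below is illustrative only: the `RetrievalPath` labels and the `retrievalPathFor` function are hypothetical names, and the grouping is inferred from the descriptions above (llmknowledge and websearch need no index or connection, the file-backed and SharePoint types use an Azure AI Search index, and the two retriever types go through a Data Connection).

```typescript
// Hypothetical labels for the three retrieval paths described above.
type RetrievalPath = "virtual" | "physical-index" | "physical-connection";

// Maps a datasource type string to its retrieval path.
function retrievalPathFor(type: string): RetrievalPath {
  switch (type) {
    // No stored documents: model knowledge or live web results.
    case "llmknowledge":
    case "websearch":
      return "virtual";
    // Documents in storage, searched through an Azure AI Search index.
    case "shared":
    case "personal":
    case "workspace":
    case "sharepoint":
      return "physical-index";
    // Documents fetched from an external system via a Data Connection.
    case "vectorretriever":
    case "flowretriever":
      return "physical-connection";
    default:
      throw new Error(`Unknown datasource type: ${type}`);
  }
}
```

Throwing on unknown types is a deliberate choice in this sketch so that a newly added type cannot silently fall through to the wrong path.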
Security Model
The chat security guard (chatSecurityGuard.ts) validates datasource access differently per kind:
| Kind | Validation | Details |
|---|---|---|
| Kind 1 (Virtual) | Trusted | No index to protect. Enabled/disabled per chat config. |
| Kind 2 (File Storage) | Prefix / ownership | Personal/Workspace: index name prefix must match user’s UPN. Shared: user must be entitled to the chat that owns the index. |
| Kind 3 (Admin External) | Catalog ACL | datasourceId is looked up in the catalog → entitlement service checks the user’s access against the catalog entity’s ACL. |
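The per-kind validation in the table above could look roughly like the following. This is a sketch under stated assumptions, not the contents of chatSecurityGuard.ts: the `AccessRequest` and `Entitlements` shapes, the `isAccessAllowed` function, and the `sanitizeUpn` rule (lowercase, non-alphanumerics replaced by hyphens) are all hypothetical, since the source does not specify how the UPN prefix is derived or how the entitlement service is called.

```typescript
type DatasourceKind = "virtual" | "fileStorage" | "adminExternal";

// Assumed request shape; the real guard works on chat and datasource config objects.
interface AccessRequest {
  kind: DatasourceKind;
  userUpn: string;
  scope?: "personal" | "workspace" | "shared"; // Kind 2 only
  indexName?: string; // Kind 2, personal/workspace
  owningChatId?: string; // Kind 2, shared
  datasourceId?: string; // Kind 3
}

// Hypothetical stand-in for the entitlement service.
interface Entitlements {
  canAccessChat(upn: string, chatId: string): boolean;
  canAccessCatalogEntry(upn: string, datasourceId: string): boolean;
}

// Assumed sanitization rule; the real prefix format is not given in the source.
function sanitizeUpn(upn: string): string {
  return upn.toLowerCase().replace(/[^a-z0-9]/g, "-");
}

function isAccessAllowed(req: AccessRequest, ent: Entitlements): boolean {
  // Kind 1: trusted, there is no index to protect.
  if (req.kind === "virtual") return true;

  if (req.kind === "fileStorage") {
    if (req.scope === "shared") {
      // Shared: the user must be entitled to the chat that owns the index.
      return (
        req.owningChatId !== undefined &&
        ent.canAccessChat(req.userUpn, req.owningChatId)
      );
    }
    // Personal/Workspace: the index name prefix must match the user's UPN.
    return (
      req.indexName !== undefined &&
      req.indexName.startsWith(sanitizeUpn(req.userUpn))
    );
  }

  // Kind 3: look up the catalog entry and check its ACL.
  return (
    req.datasourceId !== undefined &&
    ent.canAccessCatalogEntry(req.userUpn, req.datasourceId)
  );
}
```

The sketch denies access whenever a required field is missing, so an incomplete request never passes by default.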
Built-In Datasource IDs
The following UUIDs are reserved for built-in datasource templates (prefix 0a):
| ID | Name | Type |
|---|---|---|
| 0a..001 | LLM Knowledge | llmknowledge |
| 0a..002 | Workspace | workspace |
| 0a..003 | Shared Files | shared |
| 0a..004 | Personal Files | personal |
| 0a..005 | Web Search | websearch |
| 0a..006 | Flow Retriever | flowretriever |
| 0a..007 | Vector Store | vectorretriever |
| 0a..008 | SharePoint | sharepoint |
Admin UI
The Datasources admin page (Admin → Datasource Catalog [#/admin/datasources]) presents all catalog entries in a data grid:
- Built-in entries show a lock icon and cannot be deleted. Only name, description, tags, and permissions are editable.
- Admin entries support full CRUD including connection details, type, and permissions.
- Workspace Datasource Settings configure which catalog entries are activated by default for workspace and shared chats (IAppSettings.workspaceDataSources[] and defaultChatDataSources[]).
Chat Types & Data Sources
Chats in Findable use a multi-source RAG architecture. Each chat’s dataSources array (IDataSourceConfig[]) is the sole source of truth for where documents come from. A chat with an empty array operates in LLM-only mode (no retrieval).
Data Source Types (DATASOURCE_TYPE)
Each entry in the dataSources array has a type that determines storage, indexing, and retrieval behaviour:
| Type | Enum Value | Storage | Search Index | Use Case |
|---|---|---|---|---|
| Shared | shared | Azure Blob → Shared/{folderName}/ | Auto-created per folder | Team knowledge bases with uploaded documents |
| Personal | personal | Azure Blob → Personal/{sanitized-upn}/ | Auto-created per user | Private user files (persisted chats) |
| Workspace | workspace | Azure Blob Private/ or OneDrive | Resolved at runtime from user UPN | Ephemeral session workspace files |
| SharePoint | sharepoint | SharePoint Online | Auto-created per library | Index and search SharePoint document libraries |
| Flow Retriever | flowretriever | N/A (virtual) | N/A — flow returns documents directly | Headless retriever flows that query databases, APIs, or custom logic |
| Vector Retriever | vectorretriever | External vector store | N/A — queries vector DB directly | Query Pinecone, Qdrant, Weaviate, Chroma, Redis, pgvector, etc. |
| Web Search | websearch | N/A (live results) | N/A | Ground responses in real-time web search results |
| LLM Knowledge | llmknowledge | N/A | N/A | LLM-only mode — no retrieval, uses model’s training data |
IDataSourceConfig Fields
Every data source entry supports the following fields:
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier for this source |
| datasourceId | string? | FK to IDataSource.id in the datasource catalog (provenance link) |
| indexName | string | Azure Search index name. PERSONAL: persisted from UPN + chatTitle. WORKSPACE: empty, resolved at runtime. |
| searchEndpoint | string? | Search endpoint ID (uses default if not set) |
| weight | number? | Result weighting 0.0–1.0 (default: 1.0) |
| maxResults | number? | Max documents from this source |
| filter | string? | Static OData filter expression |
| label | string? | Display name (e.g. “Company Docs”, “My Files”) |
| type | DATASOURCE_TYPE? | Source type; determines storage and retrieval |
| enabled | boolean? | Whether this source is active (default: true) |
| enableAclFiltering | boolean? | Enable native ACL filtering on the search index |
| allowUserToggle | boolean? | End users can disable this source at runtime from the sidebar |
| allowUserWeightEdit | boolean? | End users can adjust the source weight at runtime |
| allowUserMaxResultsEdit | boolean? | End users can adjust max results at runtime |
| selectedFolder | string? | Blob folder name for file-backed sources |
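The field table above translates into an interface along the following lines. This is a sketch transcribed from the table, not the actual `IDataSourceConfig` declaration: the name `DataSourceConfigSketch`, the string-literal union used in place of the `DATASOURCE_TYPE` enum, and the `activeSources` and `isLlmOnly` helpers are assumptions added for illustration. The helpers apply the documented defaults (enabled: true, weight: 1.0) and the rule that an empty array means LLM-only mode.

```typescript
// String-literal stand-in for the DATASOURCE_TYPE enum values listed earlier.
type DataSourceType =
  | "shared" | "personal" | "workspace" | "sharepoint"
  | "flowretriever" | "vectorretriever" | "websearch" | "llmknowledge";

// Transcribed from the field table; optional fields carry a "?".
interface DataSourceConfigSketch {
  id: string;
  datasourceId?: string;
  indexName: string;
  searchEndpoint?: string;
  weight?: number;
  maxResults?: number;
  filter?: string;
  label?: string;
  type?: DataSourceType;
  enabled?: boolean;
  enableAclFiltering?: boolean;
  allowUserToggle?: boolean;
  allowUserWeightEdit?: boolean;
  allowUserMaxResultsEdit?: boolean;
  selectedFolder?: string;
}

// Applies the documented defaults, then drops sources that are disabled.
function activeSources(sources: DataSourceConfigSketch[]) {
  return sources
    .map((s) => ({ ...s, enabled: s.enabled ?? true, weight: s.weight ?? 1.0 }))
    .filter((s) => s.enabled);
}

// A chat with an empty dataSources array operates in LLM-only mode.
function isLlmOnly(sources: DataSourceConfigSketch[]): boolean {
  return sources.length === 0;
}

// Example configuration with made-up identifiers and index names.
const exampleSources: DataSourceConfigSketch[] = [
  { id: "src-1", indexName: "idx-company-docs", type: "shared", label: "Company Docs" },
  { id: "src-2", indexName: "", type: "workspace", enabled: false },
  { id: "src-3", indexName: "idx-my-files", type: "personal", weight: 0.4 },
];
```

With `exampleSources`, `activeSources` keeps the first and third entries: the first receives the default weight of 1.0 and the third keeps its explicit 0.4.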
Multi-Source Result Merging
When a chat has multiple data sources, results are merged using the resultMergeStrategy field (RESULT_MERGE_STRATEGY enum) on IChatTabDBItem:
| Strategy | Enum Value | Description |
|---|---|---|
| Interleave | interleave | Round-robin results from each source (default) |
| Weighted | weighted | Score-based ranking with per-source weight values |
| Sequential | sequential | Results from source 1, then source 2, etc. |
The deduplicateResults boolean flag removes duplicate documents that appear across multiple sources.
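To make the three strategies concrete, here is a minimal sketch of a merge function. It is not the project's implementation: the `RetrievedDoc` and `SourceResults` shapes and the `mergeResults` function are hypothetical, "weighted" is assumed to mean multiplying each document score by its source weight before sorting, and deduplication is assumed to key on a document `id` and keep the first occurrence.

```typescript
interface RetrievedDoc {
  id: string;
  score: number;
}

interface SourceResults {
  weight: number; // per-source weight, 0.0 to 1.0
  docs: RetrievedDoc[]; // already ordered by relevance within the source
}

type MergeStrategy = "interleave" | "weighted" | "sequential";

function mergeResults(
  sources: SourceResults[],
  strategy: MergeStrategy,
  deduplicate: boolean,
): RetrievedDoc[] {
  const merged: RetrievedDoc[] = [];

  if (strategy === "sequential") {
    // All results from source 1, then source 2, and so on.
    for (const s of sources) merged.push(...s.docs);
  } else if (strategy === "weighted") {
    // Assumed scoring: scale each score by its source weight, then rank.
    for (const s of sources) {
      for (const d of s.docs) merged.push({ ...d, score: d.score * s.weight });
    }
    merged.sort((a, b) => b.score - a.score);
  } else {
    // Interleave: take one result from each source in turn (round-robin).
    const longest = Math.max(0, ...sources.map((s) => s.docs.length));
    for (let i = 0; i < longest; i++) {
      for (const s of sources) {
        const doc = s.docs[i];
        if (doc !== undefined) merged.push(doc);
      }
    }
  }

  if (!deduplicate) return merged;

  // Keep the first occurrence of each document id.
  const seen = new Set<string>();
  return merged.filter((d) => {
    if (seen.has(d.id)) return false;
    seen.add(d.id);
    return true;
  });
}
```

Deduplicating after the merge means the strategy decides which copy of a duplicate survives: under "weighted" it is the highest-scoring copy, under the other two it is the copy that appears earliest in the merged order.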
Chat Classification (CHAT_KIND)
CHAT_KIND is a virtual, non-persisted runtime label derived at query time by getChatKind(chat). It is never stored in Cosmos DB and must not be used for ACL or storage decisions — those are governed by EntityScope.
| Kind | Derivation | Description |
|---|---|---|
| CHAT_KIND.SHARED | scope === EntityScope.Shared | ACL-governed shared chat |
| CHAT_KIND.WORKSPACE | isWorkspace === true (server-set) | Ephemeral workspace session |
| CHAT_KIND.PERSONAL | All other cases | Personal persisted chat |
Use isSharedChat(), isWorkspaceChat(), and isPersonalChat() from @eaai/shared rather than reading the enum value directly.
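The derivation rules in the table can be sketched as follows. This is an illustration, not the code exported by @eaai/shared: the `ChatLike` shape, the string values chosen for `EntityScope` and `CHAT_KIND`, and the `Sketch`-suffixed function names are hypothetical, and the precedence (shared checked before workspace) is assumed from the row order of the table.

```typescript
// Assumed values; only EntityScope.Shared is named in the text above.
const EntityScope = { Shared: "shared", Personal: "personal" } as const;

// Assumed string values for the three documented kinds.
const CHAT_KIND = {
  SHARED: "shared",
  WORKSPACE: "workspace",
  PERSONAL: "personal",
} as const;
type ChatKind = (typeof CHAT_KIND)[keyof typeof CHAT_KIND];

// Minimal, hypothetical view of the chat fields the derivation needs.
interface ChatLike {
  scope?: string;
  isWorkspace?: boolean; // server-set
}

// Derives the runtime label; the result is never persisted.
function getChatKindSketch(chat: ChatLike): ChatKind {
  if (chat.scope === EntityScope.Shared) return CHAT_KIND.SHARED;
  if (chat.isWorkspace === true) return CHAT_KIND.WORKSPACE;
  return CHAT_KIND.PERSONAL; // all other cases
}

// Predicate helpers in the style of the documented ones.
const isSharedChatSketch = (chat: ChatLike): boolean =>
  getChatKindSketch(chat) === CHAT_KIND.SHARED;
const isWorkspaceChatSketch = (chat: ChatLike): boolean =>
  getChatKindSketch(chat) === CHAT_KIND.WORKSPACE;
const isPersonalChatSketch = (chat: ChatLike): boolean =>
  getChatKindSketch(chat) === CHAT_KIND.PERSONAL;
```

Going through predicates keeps call sites independent of the underlying enum values, which is presumably why the documented helpers are preferred over direct comparison.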
Key Files
| File | Purpose |
|---|---|
| shared/src/domain/datasources/datasource.ts | IDataSource interface, DataSourceOrigin enum, BUILT_IN_LOCKED_FIELDS |
| src/bootstrap/bootstrapdatasources.ts | Built-in datasource definitions and default workspace/chat settings |
| src/datasources/seedDatasources.ts | Bootstrap seed logic with migration support |
| src/datasources/datasourcesrouter.ts | CRUD API with origin-based guards |
| src/azure/ai/chatSecurityGuard.ts | Runtime datasource validation per kind |
| client/src/components/Datasources/ | Admin UI: catalog grid, editor dialog, workspace settings |