Vector Store
Vector store implementation for document embeddings
Vector Store Implementation Guide#
✅ Implementation Complete#
This implementation adds pgvector-backed, multi-tenant vector storage and similarity search to a TypeORM + Supabase Postgres application.
This document describes how pgvector and LangChain are integrated for embedding storage and similarity search.
Setup#
1. Environment Variables#
Add the following to your .env file:
# Required
OPENAI_API_KEY=your-openai-api-key
EMBEDDING_DIM=1536 # Default dimensions for text-embedding-3-small
# Database (should already be configured)
DB_HOST=your-supabase-host
DB_PORT=5432
DB_USERNAME=your-username
DB_PASSWORD=your-password
DB_NAME=your-database
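The variables above can be validated once at startup so that a missing key or a malformed dimension fails early. The following is a minimal sketch, not the project's own code: `EmbeddingConfig` and `loadEmbeddingConfig` are hypothetical names, and the 1536 fallback simply mirrors the default noted in the `.env` comment.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface EmbeddingConfig {
  openaiApiKey: string;
  embeddingDim: number;
}

// Read and validate the embedding-related variables from an env-like object.
function loadEmbeddingConfig(env: Record<string, string | undefined>): EmbeddingConfig {
  const openaiApiKey = env.OPENAI_API_KEY;
  if (!openaiApiKey) {
    throw new Error("OPENAI_API_KEY is required");
  }
  // Fall back to 1536, the default dimension for text-embedding-3-small.
  const embeddingDim = Number.parseInt(env.EMBEDDING_DIM ?? "1536", 10);
  if (!Number.isInteger(embeddingDim) || embeddingDim <= 0) {
    throw new Error(`EMBEDDING_DIM must be a positive integer, got "${env.EMBEDDING_DIM}"`);
  }
  return { openaiApiKey, embeddingDim };
}
```

In a Node process this would typically be called as `loadEmbeddingConfig(process.env)` before the vector store is initialized.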
2. Run Migrations#
# Run the embeddings table migration
npm run migration:run
# Optional: Run RLS migration for Supabase deployments
# Edit the RLS migration file first to match your JWT structure
Usage#
Basic Usage in Code#
import { vectorStoreService } from "./services/vector-store.service";
// Initialize (after DataSource is ready)
await vectorStoreService.initialize();
// Add embeddings
const chunks = [
{ content: "First chunk of text", metadata: { source: "doc1" } },
{ content: "Second chunk of text", metadata: { source: "doc1" } },
];
const ids = await vectorStoreService.addChunks(
organizationId,
documentId, // optional
chunks,
);
// Search for similar content
const results = await vectorStoreService.search(
organizationId,
"search query",
10, // top K results
);
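The `chunks` array in the usage example is written by hand. For longer documents, the text has to be split first. The sketch below shows one simple way to produce chunks in the same `{ content, metadata }` shape, using fixed-size character windows with overlap. `splitIntoChunks` and its default sizes are assumptions for illustration and are not part of the service.

```typescript
// Same shape as the chunks passed to addChunks in the usage example.
interface Chunk {
  content: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: split text into windows of at most `size` characters,
// with `overlap` characters shared between consecutive windows.
function splitIntoChunks(text: string, source: string, size = 500, overlap = 50): Chunk[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("size must be positive and overlap must be in [0, size)");
  }
  const chunks: Chunk[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    const content = text.slice(start, start + size);
    chunks.push({ content, metadata: { source, index: chunks.length, start } });
    if (start + size >= text.length) break; // this window reached the end of the text
  }
  return chunks;
}
```

The result can be passed directly as the `chunks` argument of `addChunks`. Character windows are the simplest option; splitting on sentence or paragraph boundaries usually gives more coherent chunks.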
Running the Example Script#
# Set environment variables
export ORGANIZATION_ID="your-org-uuid"
export DOCUMENT_ID="your-doc-uuid" # optional
# Run the example
cd server
npx ts-node scripts/ingest-example.ts
# With cleanup
CLEANUP=true npx ts-node scripts/ingest-example.ts
Architecture#
Database Schema#
The embeddings table structure:
- id: UUID primary key
- organizationId: UUID for multi-tenancy
- documentId: Optional UUID linking to documents
- content: The text content
- metadata: JSONB for additional data
- embedding: vector(1536) for similarity search
Indexes#
- B-tree index on organizationId for filtering
- HNSW index on embedding using cosine distance for similarity search
Performance Optimization#
For large-scale deployments:
- Partial HNSW indexes per large organization:
CREATE INDEX embeddings_embedding_hnsw_org_xyz
ON embeddings USING hnsw (embedding vector_cosine_ops)
WHERE "organizationId" = 'specific-org-uuid';
- LIST partitioning for massive scale (see migration comments)
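If partial indexes are created for several organizations, the statement above can be generated rather than hand-edited. The sketch below is one hedged way to do that. `partialHnswIndexSql` is a hypothetical helper, and it uses a shorter index name than the example above so that the name plus the UUID stays under PostgreSQL's 63-byte identifier limit.

```typescript
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: build the partial-index statement for one organization.
// CREATE INDEX cannot take bind parameters, so the UUID is validated before
// it is inlined into the SQL text.
function partialHnswIndexSql(organizationId: string): string {
  if (!UUID_PATTERN.test(organizationId)) {
    throw new Error(`not a UUID: ${organizationId}`);
  }
  // Strip hyphens so the suffix is a valid unquoted identifier fragment.
  const suffix = organizationId.toLowerCase().replace(/-/g, "");
  return [
    `CREATE INDEX IF NOT EXISTS embeddings_hnsw_org_${suffix}`,
    `ON embeddings USING hnsw (embedding vector_cosine_ops)`,
    `WHERE "organizationId" = '${organizationId}';`,
  ].join("\n");
}
```

The returned string could then be executed from a migration. PostgreSQL only considers a partial index when the query's WHERE clause implies the index predicate, so queries must filter on the same organizationId value.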
API Reference#
VectorStoreService Methods#
initialize()#
Initialize the vector store. Must be called after DataSource is initialized.
addChunks(organizationId, docId, chunks)#
Add text chunks to the vector store.
- Returns: Array of embedding IDs
search(organizationId, query, k)#
Search for similar content within an organization.
- Returns: Array of results with similarity scores
deleteByDocumentId(organizationId, docId)#
Delete all embeddings for a document.
- Returns: Number of deleted rows
getStatistics(organizationId)#
Get embedding statistics for an organization.
- Returns: Statistics object
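To make the tenant scoping of these methods concrete, here is a small in-memory stand-in that mirrors three of the method names. It is a sketch for unit tests or experimentation only. The types, the synchronous methods, and the caller-supplied `embed` function are all assumptions; the real service is asynchronous and backed by Postgres.

```typescript
// All types here are assumed shapes, inferred from the method descriptions.
interface Chunk { content: string; metadata?: Record<string, unknown>; }
interface SearchResult { id: string; content: string; similarity: number; }
interface Row { id: string; organizationId: string; docId: string | null; content: string; embedding: number[]; }
type Embed = (text: string) => number[];

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Hypothetical in-memory stand-in; synchronous for brevity.
class InMemoryVectorStore {
  private rows: Row[] = [];
  private nextId = 1;
  private readonly embed: Embed;

  constructor(embed: Embed) {
    this.embed = embed;
  }

  addChunks(organizationId: string, docId: string | null, chunks: Chunk[]): string[] {
    return chunks.map((chunk) => {
      const id = String(this.nextId++);
      this.rows.push({ id, organizationId, docId, content: chunk.content, embedding: this.embed(chunk.content) });
      return id;
    });
  }

  search(organizationId: string, query: string, k: number): SearchResult[] {
    const queryEmbedding = this.embed(query);
    return this.rows
      .filter((row) => row.organizationId === organizationId) // tenant scope first
      .map((row) => ({ id: row.id, content: row.content, similarity: cosineSimilarity(queryEmbedding, row.embedding) }))
      .sort((a, b) => b.similarity - a.similarity)
      .slice(0, k);
  }

  deleteByDocumentId(organizationId: string, docId: string): number {
    const before = this.rows.length;
    this.rows = this.rows.filter((row) => !(row.organizationId === organizationId && row.docId === docId));
    return before - this.rows.length;
  }
}
```

Every method takes `organizationId` and filters on it before doing anything else, which is the property the real service is described as enforcing.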
Switching Embedding Models#
To use a different embedding model:
- Update .env:
OPENAI_EMBEDDING_MODEL=text-embedding-3-large
EMBEDDING_DIM=3072 # For large model
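- Create a new migration to update the vector dimension. Existing rows hold vectors of the old dimension and cannot be converted in place, so delete or re-embed them before altering the column. Note that pgvector's HNSW index on a vector column supports at most 2,000 dimensions, so the existing HNSW index cannot be kept as-is on a vector(3072) column: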
ALTER TABLE embeddings
ALTER COLUMN embedding TYPE vector(3072);
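pgvector rejects any insert whose vector length differs from the column's declared dimension, so a mismatch between EMBEDDING_DIM and the column shows up as a database error. A client-side check gives a clearer failure earlier. The sketch below assumes embeddings are sent as pgvector text literals; `toVectorLiteral` is a hypothetical helper, not part of the service.

```typescript
// Hypothetical helper: serialize an embedding as a pgvector text literal
// such as "[0.1,0.2,0.3]", refusing vectors whose length does not match
// the expected column dimension.
function toVectorLiteral(embedding: number[], expectedDim: number): string {
  if (embedding.length !== expectedDim) {
    throw new Error(`expected ${expectedDim} dimensions, got ${embedding.length}`);
  }
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error("embedding contains NaN or Infinity");
  }
  return `[${embedding.join(",")}]`;
}
```

Calling it with the configured dimension, for example `toVectorLiteral(vector, 3072)` after switching to the large model, catches a stale EMBEDDING_DIM before any SQL is issued.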
Switching Distance Metrics#
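The store currently uses cosine distance, the default. To switch to another metric, rebuild the index with the matching operator class and update the operator used in queries: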
For L2 (Euclidean) distance:#
-- Drop old index
DROP INDEX embeddings_embedding_hnsw;
-- Create new index with L2
CREATE INDEX embeddings_embedding_hnsw
ON embeddings USING hnsw (embedding vector_l2_ops);
-- Update queries to use <-> operator
For Inner Product:#
-- Drop old index
DROP INDEX embeddings_embedding_hnsw;
-- Create index with inner product
CREATE INDEX embeddings_embedding_hnsw
ON embeddings USING hnsw (embedding vector_ip_ops);
-- Update queries to use <#> operator
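To see what the three operators compute, the sketch below reimplements them in plain TypeScript. It is illustrative only (the function names are made up here), but the definitions follow pgvector's: `<->` is Euclidean distance, `<#>` is the negative inner product, and `<=>` is cosine distance. For unit-length vectors all three are monotone functions of the dot product, so they produce the same ranking.

```typescript
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// <-> : Euclidean (L2) distance
function l2Distance(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

// <#> : negative inner product, negated so that smaller still means closer
function negativeInnerProduct(a: number[], b: number[]): number {
  return -dot(a, b);
}

// <=> : cosine distance = 1 - cosine similarity
function cosineDistance(a: number[], b: number[]): number {
  return 1 - dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));
}

// Scale a vector to unit length.
function normalize(v: number[]): number[] {
  const norm = Math.sqrt(dot(v, v));
  return v.map((x) => x / norm);
}

// Return document indexes ordered from closest to farthest.
function rank(
  query: number[],
  docs: number[][],
  distance: (a: number[], b: number[]) => number,
): number[] {
  return docs
    .map((doc, index) => ({ index, d: distance(query, doc) }))
    .sort((x, y) => x.d - y.d)
    .map((entry) => entry.index);
}
```

If the stored embeddings are unit-length, switching metrics changes the score values but not the order of results; the choice then mostly affects index build and query cost.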
Troubleshooting#
Extension not found#
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pgcrypto;
Performance issues#
- Check index usage: EXPLAIN ANALYZE your_query
- Consider partial indexes for large orgs
- Monitor the HNSW parameters m and ef_construction (lists is an IVFFlat parameter and does not apply to HNSW indexes)
Multi-tenancy concerns#
- Always filter by organizationId first
- Use RLS policies in Supabase deployments
- Consider partitioning for 1000+ organizations
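The first point can be made concrete with a tenant-scoped similarity query. This guide does not show the SQL the service actually issues, so the sketch below is an assumption about what such a query might look like, using the table and column names from the schema section. `buildSearchQuery` and `SqlQuery` are hypothetical names.

```typescript
interface SqlQuery {
  text: string;
  values: unknown[];
}

// Hypothetical helper: build a parameterized, tenant-scoped cosine search.
// The organization ID is always a bind parameter, never inlined into the text.
function buildSearchQuery(organizationId: string, queryEmbedding: number[], k: number): SqlQuery {
  if (!Number.isInteger(k) || k <= 0) {
    throw new Error("k must be a positive integer");
  }
  const text = [
    `SELECT id, content, metadata,`,
    `       1 - (embedding <=> $1::vector) AS similarity`,
    `FROM embeddings`,
    `WHERE "organizationId" = $2`,
    `ORDER BY embedding <=> $1::vector`,
    `LIMIT $3`,
  ].join("\n");
  // pgvector accepts a text literal such as "[0.1,0.2]" cast to vector.
  return { text, values: [`[${queryEmbedding.join(",")}]`, organizationId, k] };
}
```

The `text` and `values` pair matches the shape most Postgres clients accept for parameterized queries. Ordering by the raw `<=>` expression, rather than by the derived similarity column, is what allows the HNSW index to be used.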
Testing#
Run the integration tests:
npm test -- tests/integration/vector-store.test.ts
Security Considerations#
- Multi-tenancy: All queries are scoped by organizationId
- RLS: Optional Row-Level Security for Supabase
- API Keys: Store OpenAI keys securely
- Data deletion: Cascade delete with documents