Postgres + pgvector vs specialised vector stores: a 2025 benchmark
Before choosing a vector store for the RAG system of the law firm PravoVector, we benchmarked four candidates on 1.2M chunks. Honest numbers: where pgvector wins and where it folds.
Creastra Digest
- Up to about 2M vectors, pgvector with HNSW is close enough to specialised stores on latency (p95 38 ms vs 24 to 42 ms in our test)
- pgvector's edge: vector search and JOINs with relational data in a single SQL query, which Pinecone and Weaviate cannot do
- Above 5M vectors, or with more than 5 metadata filters per query, move to Qdrant
The law firm PravoVector is building an in-house AI assistant for lawyers: ask a question, get an answer with citations from the firm's archive, currently 1.2M chunks with embeddings. The question "where do we store the vectors?" stopped being academic and became "how do we avoid burning 4M ₽ a year on infrastructure?". We benchmarked four contenders: pgvector, Qdrant, Weaviate and Pinecone. The marketing-free numbers are below.
Benchmark setup
- Embeddings: text-embedding-3-large, 3072 dimensions
- Volume: 1.2M chunks, average 380 tokens per chunk
- Queries: 5,000 real lawyer queries from the legacy system's logs
- Hardware: 16 vCPU, 64 GB RAM, NVMe SSD; the cloud options ran on each vendor's default plan
- Metrics: p95 latency, recall@10, monthly cost at 10M queries
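The post does not say which percentile convention or recall definition was used. A minimal sketch, assuming nearest-rank p95 over per-query search latencies and recall@10 measured against an exact (brute-force) top 10 for each query, the usual convention in ANN benchmarks:

```python
from typing import Iterable, Sequence, Tuple


def p95(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of per-query latencies."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n, so no floating-point rounding creeps in.
    rank = -(-95 * len(ordered) // 100)
    return ordered[rank - 1]


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int], k: int = 10) -> float:
    """Fraction of the exact top-k neighbours that the ANN index also returned in its top k."""
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact result list is empty")
    return len(truth.intersection(approx_ids[:k])) / len(truth)


def mean_recall_at_k(runs: Iterable[Tuple[Sequence[int], Sequence[int]]], k: int = 10) -> float:
    """Average recall@k over (approx_ids, exact_ids) pairs, one pair per query."""
    scores = [recall_at_k(approx, exact, k) for approx, exact in runs]
    if not scores:
        raise ValueError("no queries given")
    return sum(scores) / len(scores)
```

Both functions would be run per store over the same 5,000 queries; the only store-specific part is how the exact top 10 is obtained.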
Latency results
pgvector with HNSW (m=16, ef_construction=200): p95 38 ms. Qdrant: 24 ms. Weaviate: 31 ms. Pinecone (s1.x1): 42 ms. At 1.2M vectors all four are acceptable; the 18 ms spread between fastest and slowest is irrelevant for an assistant whose LLM call already takes 1.5–3 seconds.
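A back-of-the-envelope check of that claim, using the p95 figures above and the 1.5 to 3 second LLM range. It is a simplification: it ignores every other pipeline step, and percentiles of two stages do not strictly add up.

```python
# p95 search latencies from the benchmark above, in milliseconds.
P95_MS = {"pgvector": 38, "Qdrant": 24, "Weaviate": 31, "Pinecone": 42}

# LLM call duration quoted in the text: 1.5 to 3 seconds.
LLM_MS_MIN, LLM_MS_MAX = 1500, 3000


def gap_vs_fastest(store: str) -> int:
    """Milliseconds a store adds on top of the fastest store measured."""
    return P95_MS[store] - min(P95_MS.values())


def retrieval_share(search_ms: float, llm_ms: float) -> float:
    """Fraction of (search + LLM) time spent in vector search (rough estimate)."""
    return search_ms / (search_ms + llm_ms)


if __name__ == "__main__":
    for store, ms in P95_MS.items():
        low = retrieval_share(ms, LLM_MS_MAX)   # longest LLM call: smallest share
        high = retrieval_share(ms, LLM_MS_MIN)  # shortest LLM call: largest share
        print(f"{store}: +{gap_vs_fastest(store)} ms vs fastest, "
              f"{low:.1%} to {high:.1%} of search+LLM time")
```

Even in the worst case (Pinecone against a 1.5 s LLM call) vector search stays under 3% of the combined time.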
Recall and quality
With proper index tuning, all four reached recall@10 between 0.94 and 0.97. The subtlety: pgvector's recall depends heavily on ef_search, the size of the candidate list HNSW explores at query time; we settled on 80, which gave 0.96. Qdrant reaches 0.97 out of the box with no tuning. Pinecone and Weaviate land inside the same 0.94–0.97 band.
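Recall only means something relative to an exact answer. A minimal sketch, assuming recall is computed against brute-force cosine search (the distance behind pgvector's `<=>` operator) and that ef_search is chosen by sweeping values and keeping the smallest one that meets a recall target. `pick_ef_search` and its input are hypothetical; the post does not describe its tuning procedure. In pgvector itself the parameter is set per session with `SET hnsw.ef_search = 80;` (the default is 40).

```python
import heapq
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> returns (non-zero vectors assumed)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def exact_top_k(query: Vector, corpus: Dict[int, Vector], k: int = 10) -> List[int]:
    """Brute-force nearest neighbours by cosine distance: the ground truth for recall."""
    best = heapq.nsmallest(k, corpus.items(), key=lambda item: cosine_distance(query, item[1]))
    return [chunk_id for chunk_id, _ in best]


def pick_ef_search(recall_by_ef: Dict[int, float], target: float) -> int:
    """Smallest ef_search whose measured mean recall@10 reaches the target."""
    for ef in sorted(recall_by_ef):
        if recall_by_ef[ef] >= target:
            return ef
    raise ValueError("no ef_search value in the sweep reaches the target")
```

Larger ef_search values trade latency for recall, so the sweep stops at the first value that is good enough rather than maximising recall.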
JOINs with metadata — the main pgvector argument
Every PravoVector chunk links to a case, a client, a jurisdiction, a year and a dozen more fields, and lawyers filter aggressively: "find precedents from the Saint Petersburg Commercial Court between 2022 and 2024 where judge N ruled on Article 53.1 of the Civil Code". In pgvector that is one SQL query; in Pinecone it is two network hops and a join in application code.
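For contrast, a minimal sketch of the app-side join path, assuming the vector store holds only chunk embeddings plus a document id while the case metadata stays in Postgres (the store's own metadata filtering is not used here). `Document`, `lawyer_filter` and `app_side_join` are hypothetical names; the two network round trips are represented by the `hits` and `documents` arguments.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Document:
    """Hypothetical slice of the `documents` table, fetched from Postgres in hop 2."""
    id: int
    court_code: str
    decision_date: date
    judge_name: str
    articles: Set[str] = field(default_factory=set)


# (chunk_id, document_id, score) as returned by the vector store in hop 1, best first.
Hit = Tuple[int, int, float]


def lawyer_filter(doc: Document) -> bool:
    """The WHERE clause of the SQL query below, re-implemented in application code."""
    return (
        doc.court_code == "SPB_ARB"
        and date(2022, 1, 1) <= doc.decision_date <= date(2024, 12, 31)
        and doc.judge_name == "Ivanov I.I."
        and "CC-53.1" in doc.articles
    )


def app_side_join(
    hits: List[Hit],
    documents: Dict[int, Document],
    keep: Callable[[Document], bool],
    limit: int = 10,
) -> List[Tuple[int, float]]:
    """Join vector hits with relational rows, filter, and keep the best `limit`.

    With a selective filter, `hits` has to be over-fetched (for example several
    hundred candidates), otherwise fewer than `limit` results survive.
    """
    results: List[Tuple[int, float]] = []
    for chunk_id, doc_id, score in hits:
        doc = documents.get(doc_id)
        if doc is not None and keep(doc):
            results.append((chunk_id, score))
            if len(results) == limit:
                break
    return results
```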
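```sql
-- pgvector: semantic search + relational filters in one query
-- $1 is the query embedding; <=> is pgvector's cosine distance operator,
-- so 1 - distance is the cosine similarity returned as score
SELECT c.id, c.text, d.case_no, d.judge_name,
       1 - (c.embedding <=> $1) AS score
FROM chunks c
JOIN documents d ON d.id = c.document_id
WHERE d.court_code = 'SPB_ARB'
  AND d.decision_date BETWEEN '2022-01-01' AND '2024-12-31'
  AND d.judge_name = 'Ivanov I.I.'
  AND d.articles @> ARRAY['CC-53.1']  -- articles assumed to be a text[] column
-- can use an HNSW index built with vector_cosine_ops; with an approximate
-- index the filters are applied after the index scan, so very selective
-- filters may return fewer than 10 rows
ORDER BY c.embedding <=> $1
LIMIT 10;
```
Monthly cost at 10M queries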
- pgvector on managed Postgres (Amazon RDS, db.r6g.4xlarge): roughly 980 USD
- Qdrant Cloud: about 1,240 USD
- Weaviate Cloud (standard): about 1,380 USD
- Pinecone (s1.x1, 1 pod): about 1,700 USD
Against Pinecone, pgvector saves the client about 8,600 USD a year ((1,700 − 980) × 12 = 8,640 USD), money that funded another part of the agent.
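The savings figure follows directly from the monthly bills. A small sketch with the numbers copied from the list above (store labels shortened) that also normalises them per million queries, assuming the bills are mostly fixed infrastructure cost, so the per-query figure only holds at this traffic level:

```python
# Monthly cost at 10M queries, as listed above (USD, approximate).
MONTHLY_USD = {
    "pgvector": 980,
    "Qdrant Cloud": 1240,
    "Weaviate Cloud": 1380,
    "Pinecone": 1700,
}
QUERIES_PER_MONTH = 10_000_000


def usd_per_million_queries(store: str) -> float:
    """Monthly bill divided by the millions of queries served that month."""
    return MONTHLY_USD[store] / (QUERIES_PER_MONTH / 1_000_000)


def annual_savings(cheaper: str, pricier: str) -> int:
    """Yearly difference between two monthly bills."""
    return (MONTHLY_USD[pricier] - MONTHLY_USD[cheaper]) * 12


if __name__ == "__main__":
    for store in MONTHLY_USD:
        print(f"{store}: {usd_per_million_queries(store):.0f} USD per 1M queries, "
              f"pgvector saves {annual_savings('pgvector', store):,} USD/year")
```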
When pgvector does not fit
Above 5M vectors, HNSW indexes balloon in RAM and build time grows faster than linearly. For giant catalogues (millions of products with descriptions), queries spanning several jurisdictions with 8+ filter fields at once, or a strict p99 under 30 ms, go with Qdrant. Pinecone fits when you are already locked into AWS and have no DevOps capacity.
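A rough way to see where the 5M line comes from, assuming the 3,072-dimensional embeddings from the setup and pgvector's documented storage of 4 bytes per dimension for `vector` and 2 for `halfvec`, plus an 8-byte header. It also surfaces a constraint the post does not mention: pgvector's HNSW index accepts at most 2,000 dimensions for `vector` (4,000 for `halfvec`), so 3,072-dimensional embeddings have to be indexed as `halfvec` (for example via an expression index) or shortened with the embedding API's `dimensions` parameter. Which route the team took is not stated.

```python
# pgvector storage per value: 4 bytes/dim for `vector`, 2 bytes/dim for
# `halfvec`, plus an 8-byte header.
BYTES_PER_DIM = {"vector": 4, "halfvec": 2}
HEADER_BYTES = 8

# Largest dimension count pgvector's HNSW index accepts for each type.
HNSW_MAX_DIMS = {"vector": 2000, "halfvec": 4000}

GIB = 1024 ** 3


def hnsw_indexable(dims: int, vtype: str) -> bool:
    """Whether an HNSW index can be built on a column of this type and width."""
    return dims <= HNSW_MAX_DIMS[vtype]


def vector_payload_gib(n_vectors: int, dims: int, vtype: str = "vector") -> float:
    """Raw vector bytes only: a lower bound for the HNSW index size.

    Graph neighbour lists (which grow with m) and page overhead come on top,
    and the table keeps its own copy of the vectors as well.
    """
    per_vector = BYTES_PER_DIM[vtype] * dims + HEADER_BYTES
    return n_vectors * per_vector / GIB


if __name__ == "__main__":
    for n in (1_200_000, 4_000_000, 5_000_000):
        for vtype in ("vector", "halfvec"):
            print(f"{n:>9,} x 3072-dim {vtype:<7}: "
                  f"{vector_payload_gib(n, 3072, vtype):6.1f} GiB, "
                  f"HNSW-indexable: {hnsw_indexable(3072, vtype)}")
```

At 5M full-precision vectors the payload alone is roughly 57 GiB, most of the benchmark machine's 64 GB of RAM, before the graph itself is counted.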
What we picked
For PravoVector we chose pgvector: 1.2M chunks today, a projected 4M in 18 months, heavy metadata filtering and a team already fluent in Postgres. The architecture leaves a migration path to Qdrant if the volume crosses 5M or a use case needing p95 under 25 ms emerges.
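The post does not describe how that migration path is built. One common way to keep it open, sketched here as an assumption, is to hide the store behind a narrow interface so that a Qdrant-backed class can later replace the pgvector one without touching retrieval code. `VectorSearch`, `InMemorySearch`, `retrieve` and `should_revisit_store` are hypothetical names; the two thresholds come from the paragraph above.

```python
import math
from typing import Dict, List, Protocol, Sequence, Tuple

Result = Tuple[int, float]  # (chunk_id, cosine distance), smaller is closer
Row = Tuple[int, Sequence[float], Dict[str, str]]  # (chunk_id, embedding, metadata)


class VectorSearch(Protocol):
    """Hypothetical seam: retrieval code depends on this, not on a vendor SDK."""

    def search(self, query: Sequence[float], k: int, filters: Dict[str, str]) -> List[Result]:
        ...


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity (non-zero vectors assumed)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


class InMemorySearch:
    """Test double; a pgvector- or Qdrant-backed class would expose the same method."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows

    def search(self, query: Sequence[float], k: int, filters: Dict[str, str]) -> List[Result]:
        hits = [
            (chunk_id, cosine_distance(query, vec))
            for chunk_id, vec, meta in self._rows
            if all(meta.get(key) == value for key, value in filters.items())
        ]
        hits.sort(key=lambda hit: hit[1])
        return hits[:k]


def retrieve(backend: VectorSearch, query: Sequence[float], filters: Dict[str, str]) -> List[int]:
    """Application code: swapping pgvector for Qdrant changes only `backend`."""
    return [chunk_id for chunk_id, _ in backend.search(query, 10, filters)]


def should_revisit_store(n_vectors: int, p95_target_ms: float) -> bool:
    """Migration triggers named in the text: over 5M vectors or a sub-25 ms p95 need."""
    return n_vectors > 5_000_000 or p95_target_ms < 25
```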