How much does a CREASTRA website cost?

Landing — from $3,000, corporate site — from $7,500, SaaS platform — from $18,000. Final estimate after a short brief.

How long does a project take?

Landing — 3-4 weeks, corporate site — 6-10 weeks, mobile MVP — 8-14 weeks. Timeline is fixed in the contract.

Do you do franchise packaging?

Yes. Full franchise pack: brand book, ops manuals, training, marketing kit, control system. We've launched franchises in HoReCa, EdTech, retail.

Is there a delivery guarantee?

Yes. Contract-backed, KPI-fixed, with checkpoint approvals. Development comes with a 12-month bug-fix warranty.

What locations do you serve?

Studio in Moscow, projects worldwide. Remote-first, regular calls, transparent task tracker.

← Back to blog

AI agentsNovember 4, 2025· 10 min read

Postgres + pgvector vs a specialised vector store: a 2025 benchmark

Before picking a vector store for the law firm PravoVector's RAG system we benchmarked four candidates on 1.2M chunks. Honest numbers: where pgvector wins and where it folds.

Creastra Digest

Up to 2M vectors, pgvector with HNSW matches specialised stores on latency
pgvector's edge: JOINs with relational data in one query — Pinecone and Weaviate cannot
Above 5M vectors or with >5 metadata filters — move to Qdrant

Law firm PravoVector is building an in-house AI assistant for lawyers: ask a question, get an answer cited from your archive — 1.2M documents, 8.4M chunks with embeddings. «Where do we store it?» went from academic to «don't burn 4M ₽/year of infrastructure». We benchmarked four contenders: pgvector, Qdrant, Weaviate, Pinecone. Marketing-free numbers below.

Benchmark setup

Embeddings: text-embedding-3-large, 3072 dimensions
Volume: 1.2M chunks, average 380 tokens per chunk
Queries: 5,000 real lawyer queries from the legacy system's logs
Hardware: 16 vCPU, 64 GB RAM, NVMe SSD; cloud options used the vendor's default plan
Metrics: p95 latency, recall@10, monthly cost at 10M queries

Latency results

pgvector with HNSW (m=16, ef_construction=200) — p95 38 ms. Qdrant — 24 ms. Weaviate — 31 ms. Pinecone (s1.x1) — 42 ms. At 1.2M scale all four are acceptable; the gap is irrelevant for an assistant where the LLM call already takes 1.5–3 seconds.

Recall and quality

All four hit recall@10 between 0.94 and 0.97 with proper index tuning. Subtlety: pgvector recall depends heavily on ef_search — we landed at 80, yielding 0.96. Qdrant ships at 0.97 with no tuning. Pinecone and Weaviate sit in between.

JOINs with metadata — the main pgvector argument

Every PravoVector chunk links to a case, client, jurisdiction, year, and a dozen more fields. Lawyers filter aggressively: «find precedents in the Saint-Petersburg Commercial Court between 2022 and 2024 where judge N ruled on Article 53.1 of the Civil Code». In pgvector that is one SQL query; in Pinecone it is two network hops and an app-side join.

-- pgvector: semantic search + relational filters in one query
SELECT c.id, c.text, d.case_no, d.judge_name,
       1 - (c.embedding <=> $1) AS score
FROM chunks c
JOIN documents d ON d.id = c.document_id
WHERE d.court_code = 'SPB_ARB'
  AND d.decision_date BETWEEN '2022-01-01' AND '2024-12-31'
  AND d.judge_name = 'Ivanov I.I.'
  AND d.articles @> ARRAY['CC-53.1']
ORDER BY c.embedding <=> $1
LIMIT 10;

Monthly cost at 10M queries

pgvector on self-hosted Postgres (RDS db.r6g.4xlarge) — roughly 980 USD
Qdrant Cloud — about 1,240 USD
Weaviate Cloud (standard) — about 1,380 USD
Pinecone (s1.x1, 1 pod) — about 1,700 USD
Saved the client 8,600 USD/year vs Pinecone — that money funded another part of the agent

When pgvector does not fit

Above 5M vectors HNSW indices balloon in RAM and build time grows non-linearly. For giant catalogues (millions of products with descriptions), several jurisdictions with 8+ filter fields at once, or strict p99 < 30 ms — go Qdrant. Pinecone fits when you are already AWS-locked and have no DevOps capacity.

What we picked

For PravoVector — pgvector. 1.2M chunks today, projected to 4M in 18 months, heavy metadata filters, a team already fluent in Postgres. The architecture leaves a migration path to Qdrant if volume crosses 5M or a sub-25 ms p95 use case emerges.

Benchmark setup

Embeddings: text-embedding-3-large, 3072 dimensions

Volume: 1.2M chunks, average 380 tokens per chunk

Queries: 5,000 real lawyer queries from the legacy system's logs

Hardware: 16 vCPU, 64 GB RAM, NVMe SSD; cloud options used the vendor's default plan

Metrics: p95 latency, recall@10, monthly cost at 10M queries

JOINs with metadata — the main pgvector argument

-- pgvector: semantic search + relational filters in one query SELECT c.id, c.text, d.case_no, d.judge_name, 1 - (c.embedding <=> $1) AS score FROM chunks c JOIN documents d ON d.id = c.document_id WHERE d.court_code = 'SPB_ARB' AND d.decision_date BETWEEN '2022-01-01' AND '2024-12-31' AND d.judge_name = 'Ivanov I.I.' AND d.articles @> ARRAY['CC-53.1'] ORDER BY c.embedding <=> $1 LIMIT 10;

When pgvector does not fit