Picking a vector database in 2026 is harder than it was two years ago, not easier. The field has matured, prices have re-shuffled three times in the last twelve months, and pgvector has quietly closed most of the performance gap that used to make dedicated engines an obvious win. So which of the big four (Pinecone, Qdrant, Weaviate, and pgvector) actually fits your retrieval-augmented-generation stack today?
I have skin in this question. Across three production AI products at Warung Digital Teknologi, namely ContentForge AI Studio (multi-tenant content generation with brand-voice retrieval), BizChat Revenue Assistant (sales-knowledge chatbot), and SmartExam AI Generator (question bank similarity), I have shipped on three of these four databases and migrated between them twice. What follows is the comparison I wish I had eighteen months ago, with real prices, real numbers, and real tradeoffs.
The 30-second decision matrix
If you only read one section, read this one. Match your situation to the row that fits.
| Your situation | Pick | Why |
|---|---|---|
| Already running Postgres, < 10M vectors, latency tolerance > 50ms | pgvector | One database, one backup, one connection pool. Stop overthinking it. |
| 10M–100M vectors, want managed, hybrid search mandatory | Weaviate | Best-in-class BM25 + vector fusion, native multi-tenancy. |
| 10M–500M vectors, price-sensitive, comfortable with a VPS | Qdrant (self-hosted) | Best price-performance ratio of any engine I have benchmarked. |
| You want to write zero infra code, money is not the constraint | Pinecone Serverless | Truly zero-ops. Pay the premium and never look at a CPU graph. |
| Regulated workload, on-prem mandatory, BYO Kubernetes | Qdrant or Weaviate Hybrid | Both ship clean Helm charts and on-prem licenses. |
Now let me unpack the reasoning behind each row.
Pinecone Serverless: the iPhone of vector databases
Pinecone is the easiest database to ship on. There is no cluster to provision, no shard count to pick, no replication factor to argue about. You create an index, upsert vectors, query. The serverless tier auto-scales storage and compute on demand, and if you are building a side project that might never get traffic, it costs you almost nothing while it sits idle.
The catch is the unit economics at scale. Pinecone Serverless bills three dimensions: write units (~$0.0000004 each), read units (~$0.00000025 each), and storage (~$3.60/GB-month on the pay-as-you-go track, or $0.33/GB-month on the Standard plan with a $50/month minimum). A read unit is consumed per gigabyte of namespace touched per query, with a minimum of 0.25 RUs per query.
The math gets ugly fast. At 10M 1024-dim vectors with light traffic, you are around $70/month, which is fine. At 100M vectors with steady production traffic, the same workload commonly lands between $500 and $900/month before you negotiate. I confirmed this on a BizChat staging environment running a corpus of about 4M vectors with bursty queries: my August-2025 invoice was $84, and it scaled almost linearly when I doubled the corpus during a stress test.
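To make the three billing dimensions concrete, here is a minimal estimator sketch. It plugs in the unit prices quoted above as defaults (treat them as placeholders and check the current price sheet). It assumes a single namespace whose size equals the stored data, and that the Standard plan minimum applies. The class and function names are illustrative and not part of any Pinecone SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerlessRates:
    """Unit prices as quoted in this article; placeholders, not verified rates."""
    per_write_unit: float = 0.0000004
    per_read_unit: float = 0.00000025
    storage_per_gb_month: float = 0.33  # Standard plan figure from the text
    monthly_minimum: float = 50.0       # Standard plan minimum from the text


def read_units_per_query(namespace_gb: float) -> float:
    """One read unit per GB of namespace touched, floored at 0.25 RU per query."""
    return max(0.25, namespace_gb)


def estimate_monthly_cost(
    namespace_gb: float,
    queries_per_month: int,
    write_units_per_month: int,
    rates: ServerlessRates = ServerlessRates(),
) -> float:
    """Sum reads, writes, and storage, then apply the plan minimum."""
    reads = read_units_per_query(namespace_gb) * queries_per_month * rates.per_read_unit
    writes = write_units_per_month * rates.per_write_unit
    storage = namespace_gb * rates.storage_per_gb_month
    return max(rates.monthly_minimum, reads + writes + storage)


if __name__ == "__main__":
    # Hypothetical workload: 40 GB namespace, 2M queries, 1M write units per month.
    print(f"${estimate_monthly_cost(40, 2_000_000, 1_000_000):.2f}/month")
```

The useful part is `read_units_per_query`: the per-query cost grows with the size of the namespace each query touches, so namespace layout belongs in the cost model from day one.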
What Pinecone gets right
- Zero operational surface. No version upgrades, no disk-resize tickets, no replication topology to debug at 2am. For a small team, this is genuinely worth a premium.
- Strong consistency by default. Upserts are visible immediately on the index they hit. You do not need to write retry-after-eventual-consistency logic.
- Hybrid search shipped. Pinecone added sparse-dense hybrid in 2024, and the API is reasonable. It is not as flexible as Weaviate's, but it works.
- Excellent SDKs. The Python and TypeScript clients are the cleanest in the category. Tooling like LangChain integrates with one line.
Where Pinecone hurts
- Cost ceiling. Beyond ~50M vectors with steady QPS, almost every other option is cheaper.
- Read-unit math is a footgun. A poorly-designed namespace strategy can balloon read units 5–10×. I have seen junior engineers ship a 1-namespace-per-tenant pattern and torch the budget in a week.
- No on-prem. If your compliance team says "data leaves AWS over my dead body," Pinecone is out unless you negotiate the BYOC Dedicated plan, which starts in Enterprise pricing territory.

Qdrant: the price-performance king
Qdrant is the database I quietly recommend most often when a team can spare two engineering hours a month for ops. It is written in Rust, ships as a single binary or a Helm chart, and the open-source edition is feature-complete, with no stripped-down community-vs-enterprise nonsense.
The Qdrant Cloud free tier gives you 0.5 vCPU, 1GB RAM, and 4GB disk permanently. That is genuinely enough for prototypes up to about 200K small-dimension vectors. Paid Standard clusters are billed hourly on resources you provision (RAM, CPU, disk), not per query or per vector. A baseline production cluster runs $30–$200/month.
Self-hosted is where Qdrant becomes ridiculous. A $30/month Hetzner VPS with 16GB RAM handles 10M 768-dim vectors in memory comfortably. Add binary quantization (32× compression) and the same node serves 300M+ logical vectors with a small recall hit. I run ContentForge's similarity index on a single 8-vCPU 32GB VPS that costs $48/month and serves about 8M vectors with p95 latency under 25ms.
What Qdrant gets right
- Speed. Qdrant benchmarks at roughly 1,840 QPS on a 1M-vector workload, the highest of the four engines I have measured.
- Quantization story. Scalar, product, and binary quantization are first-class. Binary quantization in particular changed my cost model: a corpus that needed 64GB of RAM uncompressed fits in 4GB compressed with 95%+ recall.
- Filtering is fast. Hybrid filters (vector similarity + payload predicates) compile to a single planner pass, which matters a lot for multi-tenant SaaS.
- Honest pricing. You pay for the box. No surprise unit math.
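The 32× figure for binary quantization is plain arithmetic: a float32 dimension takes 32 bits and a binary-quantized dimension takes 1. The sketch below counts raw vector bytes only. It deliberately ignores the HNSW graph, payloads, and any original vectors kept for rescoring, so real memory use will be higher. The corpus size is a hypothetical one chosen to land near 64GB.

```python
GB = 10**9


def vector_bytes(num_vectors: int, dim: int, bits_per_dim: int) -> int:
    """Raw vector storage in bytes; index and payload overhead are not modeled."""
    return num_vectors * dim * bits_per_dim // 8


def compression_ratio(original_bits: int, quantized_bits: int) -> float:
    """How many times smaller each dimension becomes after quantization."""
    return original_bits / quantized_bits


# Hypothetical corpus: about 20.8M vectors at 768 dims, roughly 64 GB as float32.
num_vectors, dim = 20_833_333, 768
full = vector_bytes(num_vectors, dim, 32)    # float32
scalar = vector_bytes(num_vectors, dim, 8)   # int8 scalar quantization
binary = vector_bytes(num_vectors, dim, 1)   # binary quantization

print(f"float32: {full / GB:.1f} GB, int8: {scalar / GB:.1f} GB, binary: {binary / GB:.1f} GB")
# prints: float32: 64.0 GB, int8: 16.0 GB, binary: 2.0 GB
```

The raw binary vectors come to about 2GB. The gap to the 4GB quoted above is presumably index and payload overhead, which this sketch leaves out.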
Where Qdrant hurts
- You own the operations. Self-hosted means you upgrade Qdrant yourself, you watch disk usage, you handle snapshots. The cloud product is good but newer than Pinecone's, and the free tier capacity is genuinely tiny.
- Smaller ecosystem. Plenty of LangChain/LlamaIndex support, but third-party integrations (auth providers, observability vendors, BI tools) are thinner than Pinecone's.
- Hybrid search exists but is less polished. Qdrant added sparse vectors and BM25-style scoring, but Weaviate still has the cleanest hybrid API.
Weaviate: the hybrid-search specialist
Weaviate is the database to pick when keyword search and vector search both matter equally. Its hybrid query syntax is the cleanest in the category: you set an alpha parameter between 0 (pure BM25) and 1 (pure vector) and the engine fuses the rankings for you. No reciprocal-rank-fusion code to write, no reranker glue, no two-pass query plans.
Weaviate restructured its cloud pricing in October 2025. The new tiers as of 2026 are: a 14-day Sandbox (free, auto-expires, you cannot extend it), Flex (~$45/month minimum, shared GCP, 99.5% SLA), Plus (~$280/month annual, 99.9% SLA, SOC 2), and Premium (custom, BYOC, HIPAA-eligible). At 10M vectors, Flex lands around $135/month for typical RAG workloads, which is roughly 2× Pinecone at the same scale, but you get hybrid search bundled in.
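To show what an alpha knob does conceptually, here is a small pure-Python sketch of alpha-weighted score fusion. It is an illustration of the idea under simple assumptions (min-max normalization, missing documents score 0), not Weaviate's internal implementation, and the function names are made up for this example.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; a set with identical scores maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def alpha_fusion(
    bm25: dict[str, float], vector: dict[str, float], alpha: float
) -> list[tuple[str, float]]:
    """alpha=0 ranks by keyword score only, alpha=1 by vector score only."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    kw, vec = min_max(bm25), min_max(vector)
    fused = {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(kw) | set(vec)
    }
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    bm25_scores = {"sku-123": 12.0, "guide": 3.0}        # hypothetical keyword hits
    vector_scores = {"guide": 0.91, "overview": 0.80}    # hypothetical vector hits
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, alpha_fusion(bm25_scores, vector_scores, alpha))
```

Sliding alpha toward 0 lets the exact-match document win; sliding it toward 1 favors the semantically closest one.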
What Weaviate gets right
- Hybrid search is genuinely native. BM25 is built in with no extra storage cost for keyword indices. The alpha-fusion API is one line of code.
- Multi-tenancy is first-class. Tenants are isolated at the storage layer, not just logically. This matters for B2B SaaS; I migrated a multi-tenant feature off Pinecone to Weaviate specifically for this.
- GraphQL queries. Loved or hated, the GraphQL interface is genuinely useful when retrieval needs to traverse cross-references between objects.
- Strong RBAC and SOC 2. Plus tier ships RBAC, SSO, and audit logs. Pinecone matches this only at Enterprise.
Where Weaviate hurts
- Pricing is not the cheapest. If you do not need hybrid search, you are paying for a feature you will not use.
- The October 2025 pricing change moved the entry point from $25 to $45/month minimum. Existing customers are mostly grandfathered, but new teams feel the bump.
- Higher memory footprint. Weaviate's per-vector overhead is noticeably larger than Qdrant's, and the documentation is honest about this.
pgvector: the boring answer that keeps winning
pgvector is a Postgres extension. You install it, run CREATE EXTENSION vector, and you have vector search inside your existing database. No new service, no new SDK, no new auth model, no new backup story. For 80% of the RAG projects I see in the wild, this is the right answer and most teams overcomplicate it.
The 0.7 release (and the 0.8 follow-up shipped on Aurora in late 2025) closed most of the historical performance gap. Parallel HNSW index builds are up to 30× faster than 0.5.1, and combined with new compression are up to 67× faster end-to-end. On a single Postgres instance, pgvector now delivers roughly 5,000–15,000 QPS with HNSW for typical 1024-dim vectors at small-to-medium corpus sizes. That is the same neighborhood as the dedicated engines.
For SmartExam AI's question-similarity feature, I shipped pgvector on the existing Hostinger MySQL-Postgres setup. Corpus size: about 480K embeddings of 768 dimensions. p95 latency: 18ms. Marginal cost of adding vector search to the existing database: $0. Time spent on operations in the last six months: zero hours.
What pgvector gets right
- One database to operate. One backup. One connection pool. One auth model. This is enormous for small teams.
- Transactional consistency. Insert a row and its embedding atomically. Try doing that across Postgres + Pinecone without a saga pattern.
- SQL all the way down. Joins between vectors and your existing relational data are trivial. Filter by tenant_id, by status, by date range, by anything you have an index on.
- Cost. On a managed Postgres like RDS or Neon, you are typically $20–$60/month for a corpus that would cost $70–$200 on a dedicated vector DB.
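A rough sketch of what "SQL all the way down" looks like in practice. The table and column names (`documents`, `tenant_id`, `status`) are hypothetical, and the 768 dimension is just an example. The `%(name)s` placeholders assume a driver with psycopg-style named parameters, which is not imported here. The `m` and `ef_construction` values shown are pgvector's defaults, included only to show where the tuning knobs live.

```python
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS documents (
    id        bigserial PRIMARY KEY,
    tenant_id bigint NOT NULL,
    status    text NOT NULL,
    body      text NOT NULL,
    embedding vector(768)
);

-- HNSW index over cosine distance.
CREATE INDEX IF NOT EXISTS documents_embedding_hnsw
    ON documents USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
"""

# <=> is pgvector's cosine-distance operator; smaller means more similar.
SEARCH_SQL = """
SELECT id, body, embedding <=> %(query)s::vector AS distance
FROM documents
WHERE tenant_id = %(tenant_id)s AND status = 'active'
ORDER BY embedding <=> %(query)s::vector
LIMIT %(limit)s;
"""


def to_vector_literal(values) -> str:
    """Format numbers in pgvector's text form, e.g. [0.1,0.2,0.3]."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def search_params(query_embedding, tenant_id: int, limit: int = 10) -> dict:
    """Build the named parameters expected by SEARCH_SQL."""
    return {
        "query": to_vector_literal(query_embedding),
        "tenant_id": tenant_id,
        "limit": limit,
    }
```

With a driver in hand, the call would look something like `cursor.execute(SEARCH_SQL, search_params(embedding, tenant_id=42))`. The tenant and status filters are ordinary Postgres predicates, so existing indexes on those columns still apply.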
Where pgvector hurts
- The 10M-vector ceiling is real. Beyond roughly 10M vectors of 1024 dimensions, you start fighting tuning. HNSW parameters need attention, vacuum becomes interesting, and the QPS gap to dedicated engines widens.
- No native hybrid search. You compose it manually with tsvector + vector + a fusion query. It works, but it is not one line.
- HNSW index builds are still slower than Qdrant's. A 5M-vector index that takes Qdrant 4 minutes might take pgvector 12. Bearable, but worth knowing if you reindex often.
Pricing scenarios I actually quoted
Three real scenarios I have priced in the last six months, with rough monthly totals. Treat these as ballparks; your actual numbers shift with traffic patterns and dimension count.
| Scenario | Pinecone | Qdrant Cloud | Weaviate | pgvector (RDS) |
|---|---|---|---|---|
| 2M vectors, prototype traffic | $50 (min) | $30 | $45 (min) | $25 |
| 10M vectors, steady production | $70 | $65 | $135 | $45 |
| 100M vectors, heavy traffic | $700+ | $280 | $420 | ~$180 (with care) |
| Self-hosted on $48/mo VPS | N/A | $48 (10M+) | $48 (5M) | $48 (5M) |
Three caveats. First, Pinecone Standard's $50/month minimum dominates the small-scale row even though actual usage might be cheaper. Second, the 100M-vector pgvector number assumes you have already tuned HNSW parameters and partitioned the table; without that work, you are nearer $400. Third, self-hosting on a single VPS is fine for a side project but introduces a single point of failure that production-grade workloads do not tolerate.
Performance, with a grain of salt
Public benchmarks tell you something but not everything. Here is the rough QPS you should expect on a 1M-vector workload with HNSW and 768-dim vectors, on commodity hardware:
- Qdrant: ~1,840 QPS
- Pinecone Serverless: ~1,620 QPS
- Weaviate: ~1,500 QPS
- pgvector (HNSW): ~640 QPS at 1M, much higher at smaller corpora
What the benchmarks do not capture is filtered search performance, which is where Qdrant pulls further ahead and where pgvector benefits enormously from existing Postgres indexes on the metadata columns. In production, "vector search WHERE tenant_id = X AND status = 'active'" is the actual query pattern, and pgvector's planner often beats naive Pinecone namespace partitioning by a wide margin.
Hybrid search: the feature that most teams underestimate
Pure vector search is great until your users type a product SKU, an exact phone number, or a legal-document case ID. Embeddings are bad at exact matches. Hybrid search, which combines keyword (BM25) and vector retrieval, fixes this.
- Weaviate: Native, one parameter, no extra storage cost. Best in class.
- Qdrant: Native via sparse vectors. Slightly more setup, equally fast at query time.
- Pinecone: Native via sparse-dense indexes. Works, slightly less ergonomic.
- pgvector: You compose it yourself with to_tsvector and a fusion query. Functional but homemade.
If hybrid search is a hard requirement for your product (it should be for any enterprise B2B retrieval), Weaviate is the cleanest path. Qdrant is a close second and cheaper. pgvector is workable but you are writing fusion logic yourself.
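For a sense of how much "writing fusion logic yourself" involves, here is a minimal reciprocal rank fusion (RRF) sketch. It assumes you have already run two queries, one keyword-ranked and one vector-ranked, and hold two ordered lists of document ids. The ids are hypothetical, and `k = 60` is the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked id lists: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    keyword_hits = ["sku-123", "faq", "guide"]       # e.g. ordered by a text-rank score
    vector_hits = ["guide", "sku-123", "overview"]   # e.g. ordered by vector distance
    for doc_id, score in reciprocal_rank_fusion([keyword_hits, vector_hits]):
        print(f"{doc_id}: {score:.5f}")
```

Documents that appear near the top of both lists rise to the top of the fused list. Because RRF only uses ranks, the keyword and vector scores never need to be on the same scale.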
My picks for four common team profiles
Solo founder shipping an MVP. Use pgvector. You probably already have a Postgres. Pay $0 extra. Move on.
Five-person startup, post-PMF, 10M+ vectors. Self-hosted Qdrant on a 32GB VPS, plus a snapshot job. Spend the saved money on better embeddings instead.
Series-A SaaS with multi-tenancy and hybrid-search needs. Weaviate Plus. Pay the $280/month, get RBAC and SOC 2 out of the box, ship multi-tenant retrieval in days instead of weeks.
Enterprise team that wants to write zero infra code. Pinecone Serverless on an Enterprise contract. Negotiate the storage rate, set up read-unit alerts, and let your engineers ship features instead of tuning indexes.
FAQ
Is pgvector really production-ready in 2026? Yes. Supabase, Neon, Instacart, and many others run pgvector at meaningful scale. The 0.7+ releases shipped parallel index builds, better quantization, and tighter memory management. The "pgvector is a toy" narrative is two years out of date.
What about Chroma, Milvus, LanceDB, Vespa? Chroma is great for prototyping but I would not run it as a primary store in production. Milvus is excellent at very large scale (500M+ vectors) but operationally heavy. LanceDB is interesting for analytics workloads. Vespa is the right answer if you are Yahoo-scale; otherwise it is overkill.
Should I worry about vendor lock-in? A little, but less than you think. Embedding storage is portable: you keep the embeddings in S3 or your data warehouse, and the vector DB is essentially a search index you can rebuild. I migrated 4M vectors from Pinecone to Qdrant in a weekend, including reindexing. Plan for migration as a fact of life rather than something to prevent.
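If the embeddings live outside the vector database, a rebuild is mostly a batching loop. The sketch below is deliberately generic. `upsert_batch` is a hypothetical callable standing in for whatever client call the target database provides, and the record shape (id, vector, metadata) is an assumption.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

Record = tuple[str, list[float], dict]  # (id, embedding, metadata)


def batched(records: Iterable[Record], size: int) -> Iterator[list[Record]]:
    """Yield lists of at most `size` records without loading everything at once."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(records)
    while batch := list(islice(it, size)):
        yield batch


def rebuild_index(
    records: Iterable[Record],
    upsert_batch: Callable[[list[Record]], None],
    batch_size: int = 500,
) -> int:
    """Stream stored embeddings into a new index; returns the number of records sent."""
    total = 0
    for batch in batched(records, batch_size):
        upsert_batch(batch)  # hypothetical: wraps the target database's upsert call
        total += len(batch)
    return total
```

Retry logic, rate limiting, and verification of the rebuilt index are left out here and would matter in a real migration.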
How much does the embedding model matter compared to the database? A lot more than the database, in my experience. Switching from text-embedding-ada-002 to text-embedding-3-large typically improves retrieval quality more than switching databases ever will. Pick the database for ergonomics and cost; pick the embedding model for quality.
What about latency from my application to the database? Often dominated by network. A self-hosted Qdrant in the same region as your app server beats a "faster" Pinecone three regions away. Co-locate.
The verdict
If I had to pick one database to ship every new RAG project on for the next twelve months without thinking, I would pick Qdrant self-hosted on a $48/month VPS. It is the highest performance per dollar, the operational burden is small, and the open-source edition has every feature a typical team needs.
If I had to pick a default for a team that wants zero ops and is fine paying for it, I would pick pgvector on a managed Postgres like Neon or Supabase. The "no new service" benefit is enormous and the performance is good enough for the corpus sizes 80% of teams actually have.
Pinecone is excellent but expensive. Weaviate is excellent if hybrid search is non-negotiable. Both have legitimate places in the market, and I run all four in production today depending on the workload.
What you should not do is pick the database first and figure out the rest later. Start from your actual constraints: how many vectors, what latency budget, how many tenants, how much hybrid-search exposure, how much money. Match those numbers to the table at the top of this article. The right answer is usually obvious once you have written down the numbers.
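As one way to write the numbers down, the decision matrix from the top of the article can be encoded as a small function. This is one possible reading of that table, not a definitive rule set. The field names and the order in which rows are checked are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional

M = 1_000_000


@dataclass
class Workload:
    num_vectors: int
    has_postgres: bool = False
    latency_tolerance_ms: float = 100.0
    wants_managed: bool = False
    needs_hybrid: bool = False
    price_sensitive: bool = False
    can_self_host: bool = False
    zero_ops_money_no_object: bool = False
    on_prem_required: bool = False


def pick_database(w: Workload) -> Optional[str]:
    """Return the matrix row that fits, or None when no row matches cleanly."""
    if w.on_prem_required:
        return "Qdrant or Weaviate Hybrid"
    if w.has_postgres and w.num_vectors < 10 * M and w.latency_tolerance_ms > 50:
        return "pgvector"
    if 10 * M <= w.num_vectors <= 100 * M and w.wants_managed and w.needs_hybrid:
        return "Weaviate"
    if 10 * M <= w.num_vectors <= 500 * M and w.price_sensitive and w.can_self_host:
        return "Qdrant (self-hosted)"
    if w.zero_ops_money_no_object:
        return "Pinecone Serverless"
    return None  # no row fits; fall back to the per-database sections


if __name__ == "__main__":
    print(pick_database(Workload(num_vectors=2 * M, has_postgres=True)))
```

A `None` result is a useful signal in itself: it means the workload sits between rows, and the tradeoff sections deserve a closer read.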