# Smara vs Building Your Own Memory with pgvector
## The DIY Temptation
"Why not just use pgvector?"
It's a fair question. pgvector is excellent—open source, battle-tested, and it lives right inside Postgres where your data already is. In fact, Smara uses pgvector internally. We're fans.
But here's the thing: storing vectors is the easy part. You can get pgvector running in an afternoon. What takes months is everything that happens after the vectors are stored—the retrieval pipeline that turns raw similarity search into accurate, useful memory.
This post is an honest comparison. We'll walk through what a production-grade memory system requires, show where the accuracy gap lives, and help you decide whether to build or buy.
## What You'd Actually Build
Let's say you want to give your AI agent persistent memory. You start with pgvector. Here's the full list of what you'll end up building:
- Embedding generation — Pick a provider (OpenAI, Cohere, local model), handle rate limits, manage API keys, batch for throughput, cache to avoid re-embedding
- Vector storage — pgvector table schema, HNSW vs IVFFlat indexes, dimension sizing, distance metrics, vacuum tuning
- Keyword search — BM25 for lexical matching. Now you need `pg_trgm`, `tsvector` columns, or a separate search engine like Elasticsearch
- Hybrid ranking — Reciprocal Rank Fusion to merge vector and BM25 results. Two result sets, different score distributions, one unified ranking (see the SQL sketch after this list)
- Query decomposition — Multi-query rewriting so you find facts the original query misses. "What does Alice like?" should also search for preferences, favorites, hobbies
- Contradiction detection — Cosine similarity bands for deduplication and conflict resolution. When Alice switches from vegetarian to vegan, the old fact should retire
- Memory decay — Ebbinghaus scoring so stale facts don't pollute your context window. A preference from six months ago shouldn't outrank one from yesterday
- Multi-user and multi-agent scoping — Namespace isolation, access control, team-level vs user-level memories
- Auth, rate limiting, API design — API keys, usage quotas, REST endpoints, error handling, pagination
- MCP server for IDE integration — So Claude Code, Cursor, and other tools can use memory natively
Items 1–2 take a day. Items 3–7 take weeks. Items 8–10 take more weeks. And then you maintain all of it forever.
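To make items 3 and 4 concrete, here's a minimal hybrid-retrieval sketch in plain Postgres. The schema, column names, and parameters (`$1` = query embedding, `$2` = query text) are illustrative, not Smara's internals; `ts_rank` stands in for true BM25, which Postgres doesn't ship natively; and k = 60 is the conventional Reciprocal Rank Fusion constant.

```sql
-- Illustrative schema: the minimal table plus what hybrid search needs.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE memories (
  id          uuid PRIMARY KEY,
  content     text NOT NULL,
  embedding   vector(1536),
  content_tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED
);

CREATE INDEX ON memories USING hnsw (embedding vector_cosine_ops); -- ANN search
CREATE INDEX ON memories USING gin (content_tsv);                  -- lexical search

-- Reciprocal Rank Fusion over the two ranked lists.
WITH vec AS (
  SELECT id, row_number() OVER (ORDER BY embedding <=> $1) AS r
  FROM memories
  ORDER BY embedding <=> $1
  LIMIT 20
),
kw AS (
  SELECT id, row_number() OVER (
           ORDER BY ts_rank(content_tsv, plainto_tsquery('english', $2)) DESC
         ) AS r
  FROM memories
  WHERE content_tsv @@ plainto_tsquery('english', $2)
  ORDER BY ts_rank(content_tsv, plainto_tsquery('english', $2)) DESC
  LIMIT 20
)
SELECT m.id, m.content,
       coalesce(1.0 / (60 + vec.r), 0) + coalesce(1.0 / (60 + kw.r), 0) AS rrf_score
FROM memories m
LEFT JOIN vec ON vec.id = m.id
LEFT JOIN kw  ON kw.id = m.id
WHERE vec.id IS NOT NULL OR kw.id IS NOT NULL
ORDER BY rrf_score DESC
LIMIT 10;
```

Because RRF works on ranks rather than raw scores, the two result sets' incompatible score distributions stop mattering; that's the whole trick. Items 5–10 don't fit in a single query at all.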
### What most people think they're building
```sql
CREATE EXTENSION vector;
CREATE TABLE memories (
  id        uuid PRIMARY KEY,
  embedding vector(1536),
  content   text
);
```
"That's it, right?"
### What production memory actually needs
- Embeddings + BM25 + RRF hybrid
- 5-pass query decomposition
- Contradiction detection bands
- Ebbinghaus decay scoring
- Temporal + graph enrichment
- Multi-user auth + MCP server

~3,000 lines of retrieval logic.
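To give a feel for one of those layers, here's a minimal contradiction-detection sketch. It assumes a `user_id` column on the `memories` table, and the 0.80–0.95 similarity band is illustrative, not Smara's tuned configuration.

```sql
-- When a new fact arrives, find stored memories in the similarity band
-- where facts tend to overlap or conflict.
-- $1 = embedding of the new fact, $2 = user id.
SELECT id, content, 1 - (embedding <=> $1) AS similarity
FROM memories
WHERE user_id = $2
  AND 1 - (embedding <=> $1) BETWEEN 0.80 AND 0.95
ORDER BY similarity DESC
LIMIT 5;
-- similarity > 0.95:       near-duplicate, merge instead of inserting
-- similarity 0.80 to 0.95: same topic, check for contradiction and retire the loser
-- similarity < 0.80:       unrelated, store as a new memory
```

The hard part isn't the query; it's deciding, for each candidate in the band, whether the two facts actually conflict, which usually takes an LLM call.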
## The Benchmark Gap
Theory is one thing. Numbers are another. We benchmarked these approaches against LoCoMo, an academic benchmark for long-conversation memory retrieval. LoCoMo tests whether your system can answer questions about month-old conversations—the kind of thing real users expect from AI memory.
| Approach | Cat1 (Single-hop) | Cat3 (Open-domain) | Overall |
|---|---|---|---|
| Raw pgvector search | 54.6% | 30.2% | ~45% |
| pgvector + BM25 hybrid | 72.1% | 58.4% | ~68% |
| Smara full pipeline | 95.8% | 86.3% | 92.2% |
The gap between "I have vectors" and "I can answer questions about month-old conversations" is 40+ percentage points.
### Where does the gap come from?
- Vector-only misses lexical matches. pgvector finds semantically similar text, but misses exact name matches, dates, and specific terms that BM25 catches.
- Single-query misses related facts. "What are Alice's hobbies?" won't find "Alice started rock climbing last month" without query decomposition.
- No decay means stale context. Without Ebbinghaus scoring, six-month-old facts compete equally with yesterday's facts. The LLM gets confused.
- No contradiction handling means wrong answers. If "Alice is vegetarian" and "Alice is vegan" both rank highly, the LLM picks one at random.
Each layer of the Smara pipeline adds 10–15 percentage points. Combined, they close the gap from 45% to 92%.
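The decay layer is the easiest to sketch. Here's a minimal Ebbinghaus-style re-ranking, assuming a `created_at` timestamp column; the formula is the classic retention curve R = e^(-t/S), with an illustrative strength constant of S = 30 days rather than Smara's tuned value.

```sql
-- Multiply raw similarity by an exponential forgetting curve:
--   retention = exp(-age_in_days / S), with S = 30 days (illustrative).
-- $1 = query embedding, $2 = user id.
SELECT id, content,
       (1 - (embedding <=> $1))
         * exp(-extract(epoch FROM now() - created_at) / 86400.0 / 30.0)
         AS decayed_score
FROM memories
WHERE user_id = $2
ORDER BY decayed_score DESC
LIMIT 10;
```

With S = 30, a six-month-old memory keeps well under 1% of its weight while yesterday's keeps about 97%; a production system would also reinforce memories on access rather than decay them from creation time alone.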
## Cost Comparison
| Option | Hosting Cost | Build Time | Maintenance | Accuracy |
|---|---|---|---|---|
| DIY pgvector | $0 (if you have Postgres) | 4–8 engineer-weeks | Ongoing | ~45–68% |
| Smara self-hosted | $0 (MIT, Docker, your Postgres) | 30 minutes | Pull new images | 92.2% |
| Smara hosted | $0 free tier (10k memories), $19/mo Developer | 5 minutes | None | 92.2% |
The real cost of DIY isn't the Postgres bill. It's the engineer-weeks of building, testing, and iterating on retrieval logic—and the ongoing maintenance as you discover edge cases in production. At $150/hour for a senior engineer, 6 weeks of build time is 240 hours, or $36,000, before you've stored a single memory.
Self-hosting Smara gives you the full pipeline for the cost of a `docker compose up`. It runs against your existing Postgres, uses your own embedding provider, and the code is MIT-licensed.
## When DIY Makes Sense
We'd be dishonest if we said DIY is never the right call. Here's when it makes sense:
- Highly custom retrieval logic. If your domain has unique data patterns—time-series sensor data, legal document chains, medical records—you may need retrieval strategies that don't exist in any off-the-shelf system.
- Complete control of the embedding pipeline. If you're fine-tuning your own embedding model or need to run everything on-premise with no external API calls, you'll want to own every layer.
- Learning exercise. Building a retrieval system from scratch is one of the best ways to understand how RAG actually works. If that's the goal, build it yourself.
- You only need vector search. If your use case is simple similarity lookup—no temporal reasoning, no contradiction handling, no multi-query—then pgvector alone may be enough.
## When Smara Makes Sense
Smara is the better choice when you want to ship memory, not build a retrieval engine:
- You want 92% benchmark accuracy out of the box. The full 5-pass decomposition, hybrid search, decay scoring, and contradiction detection pipeline—tested and tuned.
- You don't want to build and maintain a retrieval pipeline. Retrieval is an ongoing R&D problem. Let someone else do the research.
- You need MCP integration for IDE tools. Claude Code, Cursor, and other MCP-compatible tools get memory with a single server config.
- You want memory decay, contradiction detection, and graph memory included. These aren't add-ons. They're core to how Smara works.
- You want to self-host but not build from scratch. MIT license, Docker image, your own Postgres. Full control without the build cost.
### DIY pgvector

- Full control over every layer
- No external dependencies
- Custom retrieval strategies
- Weeks to build, ongoing maintenance
- 45–68% accuracy without pipeline work
- You own every bug and edge case

### Smara (hosted or self-hosted)

- 92.2% accuracy, production-ready
- 30 min setup (self-host) or 5 min (hosted)
- MCP, decay, graph, contradictions built in
- MIT licensed, self-host with Docker
- Free tier: 10,000 memories
- Pipeline updates without your R&D time
## Conclusion
pgvector is a great tool. We use it ourselves. But vector storage is the foundation, not the building. The retrieval pipeline on top—hybrid search, query decomposition, decay scoring, contradiction detection—is where memory quality actually lives.
If you need custom retrieval for a specialized domain, build it. If you want to ship production-grade memory this week, start with Smara and iterate from there. You can always eject later—the data is yours, the code is MIT, and the Postgres is the same one you'd use for DIY anyway.
Ship memory in minutes, not months. Start free or self-host with Docker.