# Smara vs Building Your Own Memory with pgvector
## The DIY Temptation
"Why not just use pgvector?"
It's a fair question. pgvector is excellent—open source, battle-tested, and it lives right inside Postgres where your data already is. In fact, Smara uses pgvector internally. We're fans.
But here's the thing: storing vectors is the easy part. You can get pgvector running in an afternoon. What takes months is everything that happens after the vectors are stored—the retrieval pipeline that turns raw similarity search into accurate, useful memory.
This post is an honest comparison. We'll walk through what a production-grade memory system requires, show where the accuracy gap lives, and help you decide whether to build or buy.
## What You'd Actually Build
Let's say you want to give your AI agent persistent memory. You start with pgvector. Here's the full list of what you'll end up building:
- Embedding generation — Pick a provider (OpenAI, Cohere, local model), handle rate limits, manage API keys, batch for throughput, cache to avoid re-embedding
- Vector storage — pgvector table schema, HNSW vs IVFFlat indexes, dimension sizing, distance metrics, vacuum tuning
- Keyword search — BM25 for lexical matching. Now you need `pg_trgm`, `tsvector` columns, or a separate search engine like Elasticsearch
- Hybrid ranking — Reciprocal Rank Fusion to merge vector and BM25 results. Two result sets, different score distributions, one unified ranking (see the SQL sketch after this list)
- Query decomposition — Multi-query rewriting so you find facts the original query misses. "What does Alice like?" should also search for preferences, favorites, hobbies
- Contradiction detection — Cosine similarity bands for deduplication and conflict resolution. When Alice switches from vegetarian to vegan, the old fact should retire
- Memory decay — Ebbinghaus scoring so stale facts don't pollute your context window. A preference from six months ago shouldn't outrank one from yesterday
- Multi-user and multi-agent scoping — Namespace isolation, access control, team-level vs user-level memories
- Auth, rate limiting, API design — API keys, usage quotas, REST endpoints, error handling, pagination
- MCP server for IDE integration — So Claude Code, Cursor, and other tools can use memory natively
Items 1–2 take a day. Items 3–7 take weeks. Items 8–10 take more weeks. And then you maintain all of it forever.
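To make items 3 and 4 concrete, here's a minimal hybrid-retrieval sketch in plain Postgres. The schema, column names, and parameters (`$1` = query embedding, `$2` = query text) are illustrative, not Smara's internals; `ts_rank` stands in for true BM25, which Postgres doesn't ship natively; and k = 60 is the conventional Reciprocal Rank Fusion constant.

```sql
-- Illustrative schema: the minimal table plus what hybrid search needs.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE memories (
  id          uuid PRIMARY KEY,
  content     text NOT NULL,
  embedding   vector(1536),
  content_tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED
);

CREATE INDEX ON memories USING hnsw (embedding vector_cosine_ops); -- ANN search
CREATE INDEX ON memories USING gin (content_tsv);                  -- lexical search

-- Reciprocal Rank Fusion over the two ranked lists.
WITH vec AS (
  SELECT id, row_number() OVER (ORDER BY embedding <=> $1) AS r
  FROM memories
  ORDER BY embedding <=> $1
  LIMIT 20
),
kw AS (
  SELECT id, row_number() OVER (
           ORDER BY ts_rank(content_tsv, plainto_tsquery('english', $2)) DESC
         ) AS r
  FROM memories
  WHERE content_tsv @@ plainto_tsquery('english', $2)
  ORDER BY ts_rank(content_tsv, plainto_tsquery('english', $2)) DESC
  LIMIT 20
)
SELECT m.id, m.content,
       coalesce(1.0 / (60 + vec.r), 0) + coalesce(1.0 / (60 + kw.r), 0) AS rrf_score
FROM memories m
LEFT JOIN vec ON vec.id = m.id
LEFT JOIN kw  ON kw.id = m.id
WHERE vec.id IS NOT NULL OR kw.id IS NOT NULL
ORDER BY rrf_score DESC
LIMIT 10;
```

Because RRF works on ranks rather than raw scores, the two result sets' incompatible score distributions stop mattering; that's the whole trick. Items 5–10 don't fit in a single query at all.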
### What most people think they're building
```sql
CREATE EXTENSION vector;
CREATE TABLE memories (
  id        uuid PRIMARY KEY,
  embedding vector(1536),
  content   text
);
```
"That's it, right?"
### What production memory actually needs
- Embeddings + BM25 + RRF hybrid
- 5-pass query decomposition
- Contradiction detection bands
- Ebbinghaus decay scoring
- Temporal + graph enrichment
- Multi-user auth + MCP server

~3,000 lines of retrieval logic.
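To give a feel for one of those layers, here's a minimal contradiction-detection sketch. It assumes a `user_id` column on the `memories` table, and the 0.80–0.95 similarity band is illustrative, not Smara's tuned configuration.

```sql
-- When a new fact arrives, find stored memories in the similarity band
-- where facts tend to overlap or conflict.
-- $1 = embedding of the new fact, $2 = user id.
SELECT id, content, 1 - (embedding <=> $1) AS similarity
FROM memories
WHERE user_id = $2
  AND 1 - (embedding <=> $1) BETWEEN 0.80 AND 0.95
ORDER BY similarity DESC
LIMIT 5;
-- similarity > 0.95:       near-duplicate, merge instead of inserting
-- similarity 0.80 to 0.95: same topic, check for contradiction and retire the loser
-- similarity < 0.80:       unrelated, store as a new memory
```

The hard part isn't the query; it's deciding, for each candidate in the band, whether the two facts actually conflict, which usually takes an LLM call.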
## The Benchmark Gap
Theory is one thing. Numbers are another. We benchmarked these approaches against LoCoMo, an academic benchmark for long-conversation memory retrieval. LoCoMo tests whether your system can answer questions about month-old conversations—the kind of thing real users expect from AI memory.
| Approach | Cat1 (Single-hop) | Cat3 (Open-domain) | Overall |
|---|---|---|---|
| Raw pgvector search | 54.6% | 30.2% | ~45% |
| pgvector + BM25 hybrid | 72.1% | 58.4% | ~68% |
| Smara full pipeline | 95.8% | 86.3% | 92.2% |
The gap between "I have vectors" and "I can answer questions about month-old conversations" is 40+ percentage points.
### Where does the gap come from?
- Vector-only misses lexical matches. pgvector finds semantically similar text, but misses exact name matches, dates, and specific terms that BM25 catches.
- Single-query misses related facts. "What are Alice's hobbies?" won't find "Alice started rock climbing last month" without query decomposition.
- No decay means stale context. Without Ebbinghaus scoring, six-month-old facts compete equally with yesterday's facts. The LLM gets confused.
- No contradiction handling means wrong answers. If "Alice is vegetarian" and "Alice is vegan" both rank highly, the LLM picks one at random.
Each layer of the Smara pipeline adds 10–15 percentage points. Combined, they close the gap from 45% to 92%.
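The decay layer is the easiest to sketch. Here's a minimal Ebbinghaus-style re-ranking, assuming a `created_at` timestamp column; the formula is the classic retention curve R = e^(-t/S), with an illustrative strength constant of S = 30 days rather than Smara's tuned value.

```sql
-- Multiply raw similarity by an exponential forgetting curve:
--   retention = exp(-age_in_days / S), with S = 30 days (illustrative).
-- $1 = query embedding, $2 = user id.
SELECT id, content,
       (1 - (embedding <=> $1))
         * exp(-extract(epoch FROM now() - created_at) / 86400.0 / 30.0)
         AS decayed_score
FROM memories
WHERE user_id = $2
ORDER BY decayed_score DESC
LIMIT 10;
```

With S = 30, a six-month-old memory keeps well under 1% of its weight while yesterday's keeps about 97%; a production system would also reinforce memories on access rather than decay them from creation time alone.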
## Cost Comparison
| Option | Hosting Cost | Build Time | Maintenance | Accuracy |
|---|---|---|---|---|
| DIY pgvector | $0 (if you have Postgres) | 4–8 engineer-weeks | Ongoing | ~45–68% |
| Smara self-hosted | $0 (MIT, Docker, your Postgres) | 30 minutes | Pull new images | 92.2% |
| Smara hosted | $0 free tier (10k memories), $19/mo Developer | 5 minutes | None | 92.2% |
The real cost of DIY isn't the Postgres bill. It's the engineer-weeks of building, testing, and iterating on retrieval logic—and the ongoing maintenance as you discover edge cases in production. At $150/hour for a senior engineer, 6 weeks of build time is 240 hours, or $36,000, before you've stored a single memory.
Self-hosting Smara gives you the full pipeline for the cost of a `docker compose up`. It runs against your existing Postgres, uses your own embedding provider, and the code is MIT-licensed.
## When DIY Makes Sense
We'd be dishonest if we said DIY is never the right call. Here's when it makes sense:
- Highly custom retrieval logic. If your domain has unique data patterns—time-series sensor data, legal document chains, medical records—you may need retrieval strategies that don't exist in any off-the-shelf system.
- Complete control of the embedding pipeline. If you're fine-tuning your own embedding model or need to run everything on-premise with no external API calls, you'll want to own every layer.
- Learning exercise. Building a retrieval system from scratch is one of the best ways to understand how RAG actually works. If that's the goal, build it yourself.
- You only need vector search. If your use case is simple similarity lookup—no temporal reasoning, no contradiction handling, no multi-query—then pgvector alone may be enough.
## When Smara Makes Sense
Smara is the better choice when you want to ship memory, not build a retrieval engine:
- You want 92% benchmark accuracy out of the box. The full 5-pass decomposition, hybrid search, decay scoring, and contradiction detection pipeline—tested and tuned.
- You don't want to build and maintain a retrieval pipeline. Retrieval is an ongoing R&D problem. Let someone else do the research.
- You need MCP integration for IDE tools. Claude Code, Cursor, and other MCP-compatible tools get memory with a single server config.
- You want memory decay, contradiction detection, and graph memory included. These aren't add-ons. They're core to how Smara works.
- You want to self-host but not build from scratch. MIT license, Docker image, your own Postgres. Full control without the build cost.
### DIY pgvector

- Full control over every layer
- No external dependencies
- Custom retrieval strategies
- Weeks to build, ongoing maintenance
- 45–68% accuracy without pipeline work
- You own every bug and edge case

### Smara (hosted or self-hosted)

- 92.2% accuracy, production-ready
- 30 min setup (self-host) or 5 min (hosted)
- MCP, decay, graph, contradictions built in
- MIT licensed, self-host with Docker
- Free tier: 10,000 memories
- Pipeline updates without your R&D time
## Conclusion
pgvector is a great tool. We use it ourselves. But vector storage is the foundation, not the building. The retrieval pipeline on top—hybrid search, query decomposition, decay scoring, contradiction detection—is where memory quality actually lives.
If you need custom retrieval for a specialized domain, build it. If you want to ship production-grade memory this week, start with Smara and iterate from there. You can always eject later—the data is yours, the code is MIT, and the Postgres is the same one you'd use for DIY anyway.
Ship memory in minutes, not months. Start free or self-host with Docker.