How to Give Gemini Persistent Memory with Smara
You tell Gemini your project uses PostgreSQL with pgvector. Next conversation, it suggests MongoDB. You explain your API uses Bearer token auth. Tomorrow, it generates code that authenticates with API keys instead. Every session starts from zero.
Claude Code and Cursor solve this with MCP (Model Context Protocol)—a direct integration layer. But Gemini doesn't support MCP. You need a different approach: a REST API that sits between your application and Gemini, giving it persistent memory across sessions.
The Problem: Gemini Forgets Everything
Gemini's context window is generous—up to 1 million tokens in Gemini 1.5 Pro. But that context only lives for a single session. There's no built-in way to carry knowledge from one conversation to the next.
This means every integration you build with Gemini has the same limitation: no continuity. Your chatbot forgets users. Your coding assistant forgets your codebase conventions. Your agent forgets what it learned yesterday.
| | Without Memory | With Smara Memory |
|---|---|---|
| Session 1 | "We use PostgreSQL + pgvector" | "We use PostgreSQL + pgvector" |
| Session 2 | "I'd recommend MongoDB for that..." | "Since you're on pgvector..." |
| Session 3 | "Have you considered Firebase?" | "Your existing pg setup supports..." |

Without memory, every session starts from scratch. With Smara, context carries across every session.
How It Works
Smara exposes a REST API that any tool or language can call. No MCP required. For Gemini, there are three integration paths depending on your setup:
| Method | Best For | Complexity |
|---|---|---|
| Direct REST API | Any Python/JS app using Gemini | Simplest |
| Gemini Function Calling | Letting Gemini decide when to store/recall | Moderate |
| Google ADK | Multi-agent systems built on ADK | Moderate |
All three methods use the same Smara endpoints. The difference is where the memory logic lives—in your application code, in Gemini's function calling, or in an ADK tool definition.
Method 1: Direct REST API (Simplest)
The most straightforward approach. Your application stores memories after each conversation turn and retrieves context before each new prompt. No special Gemini features required.
```python
import requests
import google.generativeai as genai

genai.configure(api_key="your-gemini-key")

SMARA_KEY = "smara_..."
SMARA_URL = "https://api.smara.io/v1"
HEADERS = {"Authorization": f"Bearer {SMARA_KEY}"}

# Step 1: Store a memory after a conversation
def store_memory(user_id, fact, importance=0.7):
    requests.post(f"{SMARA_URL}/memories", headers=HEADERS, json={
        "user_id": user_id,
        "fact": fact,
        "importance": importance,
        "source": "gemini"
    })

# Step 2: Get context before a new prompt
def get_context(user_id, query, top_n=5):
    resp = requests.get(
        f"{SMARA_URL}/users/{user_id}/context",
        headers=HEADERS,
        params={"q": query, "top_n": top_n}
    )
    return resp.json()["context"]

# Step 3: Use it with Gemini
model = genai.GenerativeModel("gemini-1.5-pro")
user_id = "user_123"
user_msg = "What database should I use for the search feature?"

# Inject remembered context into the system prompt
memory_context = get_context(user_id, user_msg)
response = model.generate_content([
    f"You are a helpful assistant. Here is what you know about this user:\n{memory_context}",
    user_msg
])

# Store any new facts from this conversation
store_memory(user_id, "User is building a search feature", importance=0.7)
store_memory(user_id, "Project uses PostgreSQL with pgvector", importance=0.9)
```
That's the entire integration. Before each Gemini call, pull context from Smara. After each conversation, store anything worth remembering. The `source: "gemini"` tag lets you filter memories by origin later.
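If you later want to audit what Gemini has been storing, that tag is queryable. Here's a minimal sketch, assuming the `/memories` list endpoint accepts `user_id` and `source` as query filters (a parameter not shown in the examples above):

```python
# Hypothetical: list only memories created from Gemini sessions,
# assuming /memories accepts user_id and source as query filters
resp = requests.get(
    f"{SMARA_URL}/memories",
    headers=HEADERS,
    params={"user_id": "user_123", "source": "gemini"},
)
gemini_memories = resp.json()
```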
Method 2: Gemini Function Calling
Function calling lets Gemini decide for itself when to store or retrieve memories. You declare Smara's endpoints as callable tools, and Gemini invokes them as part of its reasoning.
```python
import google.generativeai as genai
import requests

genai.configure(api_key="your-gemini-key")

SMARA_KEY = "smara_..."
SMARA_URL = "https://api.smara.io/v1"
HEADERS = {"Authorization": f"Bearer {SMARA_KEY}"}

# Define Smara as function calling tools
tools = [
    genai.protos.Tool(function_declarations=[
        genai.protos.FunctionDeclaration(
            name="store_memory",
            description="Store a fact about the user for future sessions. Call this when the user shares preferences, decisions, or important context.",
            parameters=genai.protos.Schema(
                type=genai.protos.Type.OBJECT,
                properties={
                    "fact": genai.protos.Schema(
                        type=genai.protos.Type.STRING,
                        description="The fact to remember"),
                    "importance": genai.protos.Schema(
                        type=genai.protos.Type.NUMBER,
                        description="0.0-1.0, how important this is to remember"),
                },
                required=["fact"]
            )
        ),
        genai.protos.FunctionDeclaration(
            name="search_memory",
            description="Search stored memories about the user. Call this at the start of a conversation or when you need context.",
            parameters=genai.protos.Schema(
                type=genai.protos.Type.OBJECT,
                properties={
                    "query": genai.protos.Schema(
                        type=genai.protos.Type.STRING,
                        description="What to search for"),
                },
                required=["query"]
            )
        ),
    ])
]
```
Now wire up the function handlers to Smara's API:
```python
def handle_function_call(fn_call, user_id):
    if fn_call.name == "store_memory":
        args = dict(fn_call.args)
        resp = requests.post(f"{SMARA_URL}/memories", headers=HEADERS, json={
            "user_id": user_id,
            "fact": args["fact"],
            "importance": args.get("importance", 0.7),
            "source": "gemini"
        })
        return {"status": "stored", "id": resp.json()["id"]}
    elif fn_call.name == "search_memory":
        args = dict(fn_call.args)
        resp = requests.get(
            f"{SMARA_URL}/memories/search", headers=HEADERS,
            params={"user_id": user_id, "q": args["query"], "top_n": 5}
        )
        return resp.json()

# Run a conversation with memory-aware Gemini
model = genai.GenerativeModel("gemini-1.5-pro", tools=tools)
chat = model.start_chat()
user_id = "user_123"

response = chat.send_message("What database am I using for this project?")

# Handle any function calls Gemini makes
while response.candidates[0].content.parts[0].function_call:
    fn_call = response.candidates[0].content.parts[0].function_call
    result = handle_function_call(fn_call, user_id)
    response = chat.send_message(
        genai.protos.Content(parts=[
            genai.protos.Part(function_response=genai.protos.FunctionResponse(
                name=fn_call.name, response={"result": result}
            ))
        ])
    )

print(response.text)
```
With this setup, Gemini will automatically call `search_memory` when it needs context and `store_memory` when the user shares something worth remembering. You don't need to decide when to store; Gemini does.
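In practice you'll want that resolution loop behind a helper so every message goes through it. Here's a small sketch reusing the code above; `send_with_memory` is a name invented for this example, not part of either SDK:

```python
def send_with_memory(chat, message, user_id):
    """Send a message, resolving any Smara function calls Gemini makes."""
    response = chat.send_message(message)
    while response.candidates[0].content.parts[0].function_call:
        fn_call = response.candidates[0].content.parts[0].function_call
        result = handle_function_call(fn_call, user_id)
        response = chat.send_message(
            genai.protos.Content(parts=[
                genai.protos.Part(function_response=genai.protos.FunctionResponse(
                    name=fn_call.name, response={"result": result}
                ))
            ])
        )
    return response.text

# A later session: Gemini searches its own memory before answering
chat = model.start_chat()
print(send_with_memory(chat, "Pick a vector search approach for me.", "user_123"))
```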
Method 3: Google ADK Integration
If you're building multi-agent systems with Google's Agent Development Kit, you can add Smara as a tool that any agent in your pipeline can use.
```python
from google.adk.agents import Agent
from google.adk.tools import FunctionTool
import requests

SMARA_KEY = "smara_..."
SMARA_URL = "https://api.smara.io/v1"
HEADERS = {"Authorization": f"Bearer {SMARA_KEY}"}

def remember(user_id: str, fact: str, importance: float = 0.7) -> dict:
    """Store a fact about the user for future recall."""
    resp = requests.post(f"{SMARA_URL}/memories", headers=HEADERS, json={
        "user_id": user_id, "fact": fact,
        "importance": importance, "source": "gemini-adk"
    })
    return resp.json()

def recall(user_id: str, query: str) -> dict:
    """Search stored memories about the user."""
    resp = requests.get(f"{SMARA_URL}/memories/search", headers=HEADERS,
                        params={"user_id": user_id, "q": query, "top_n": 5})
    return resp.json()

# Create an ADK agent with Smara memory tools
agent = Agent(
    name="memory_agent",
    model="gemini-1.5-pro",
    instruction="You are a helpful assistant with persistent memory. Use recall() at the start of conversations and remember() when users share important facts.",
    tools=[FunctionTool(remember), FunctionTool(recall)]
)
```
The ADK approach is cleanest when you already have an agent pipeline. Each agent in your system can share the same Smara memory layer, so your research agent and your coding agent both remember the same user context.
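For example, two specialized agents can hand the same memory tools to ADK. This is a sketch assuming the same `Agent` and `FunctionTool` setup as above; the agent names and instructions are illustrative:

```python
# Both agents read and write the same Smara memory layer
memory_tools = [FunctionTool(remember), FunctionTool(recall)]

research_agent = Agent(
    name="research_agent",
    model="gemini-1.5-pro",
    instruction="Answer research questions. Call recall() for user context first.",
    tools=memory_tools,
)

coding_agent = Agent(
    name="coding_agent",
    model="gemini-1.5-pro",
    instruction="Write code. Call recall() to respect stored conventions.",
    tools=memory_tools,
)
```

A fact the research agent stores is immediately visible to the coding agent, because both resolve to the same user_id in Smara.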
Memory Decay in Practice
A common concern with persistent memory: what happens when old facts become irrelevant? If you told Gemini you were using React six months ago but switched to Svelte last week, you don't want the old React memories polluting your context.
Smara handles this automatically with Ebbinghaus decay scoring. Every memory carries a decay score that falls over time, with higher-importance memories decaying more slowly. When you search memories, results are ranked by a blend of semantic similarity (70%) and temporal freshness (30%).
Recent, important memories surface first. Old, low-importance memories fade naturally. If you store a contradicting fact (e.g., "User switched to Svelte"), Smara detects the contradiction and soft-deletes the old memory automatically.
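To make the ranking concrete, here's an illustrative sketch of the 70/30 blend with exponential decay. The half-life constant and the way importance stretches it are assumptions for illustration, not Smara's actual internals:

```python
import math
import time

def freshness(created_at: float, importance: float, half_life_days: float = 30.0) -> float:
    """Ebbinghaus-style exponential decay; important memories fade slower (assumed constants)."""
    age_days = (time.time() - created_at) / 86400
    effective_half_life = half_life_days * (0.5 + importance)  # importance stretches retention
    return math.exp(-math.log(2) * age_days / effective_half_life)

def rank_score(similarity: float, created_at: float, importance: float) -> float:
    """Blend semantic similarity (70%) with temporal freshness (30%)."""
    return 0.7 * similarity + 0.3 * freshness(created_at, importance)
```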
For a deep dive into the math and implementation, see How Ebbinghaus Decay Curves Make AI Memory Actually Useful.
What Gets Stored
The power of persistent memory depends on what you store. Here's what works well for Gemini integrations:
| Category | Examples | Importance |
|---|---|---|
| Architecture decisions | "Project uses PostgreSQL + pgvector", "Deployed on Railway" | 0.9 |
| User preferences | "Prefers functional style", "Uses TypeScript strict mode" | 0.8 |
| API patterns | "Auth uses Bearer tokens", "REST not GraphQL" | 0.8 |
| Current projects | "Building a search feature", "Migrating to v2" | 0.7 |
| Conventions | "snake_case for DB columns", "camelCase for JS" | 0.7 |
| Temporary context | "Debugging a CORS issue", "Testing on staging" | 0.3 |
A good rule of thumb: if you'd be annoyed re-explaining something to Gemini, it should be stored with importance ≥ 0.7. If it's a passing detail, use 0.3 or lower and let it decay.
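Applied with the `store_memory` helper from Method 1, that rule of thumb looks like this:

```python
# Durable architecture decision: store it high so it persists
store_memory("user_123", "Deployed on Railway", importance=0.9)

# Passing detail: store it low and let decay retire it
store_memory("user_123", "Debugging a CORS issue", importance=0.3)
```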
Conclusion
Gemini is powerful, but without persistent memory it treats every conversation as the first one. Smara's REST API bridges that gap—no MCP required, no vendor lock-in, just HTTP calls that give Gemini a memory that spans sessions.
The direct REST approach works in any language. Function calling lets Gemini manage its own memory. Google ADK gives you memory as a shared tool across agents. Pick the method that fits your architecture, and your Gemini integration goes from stateless to stateful in under 50 lines of code.