I replaced cloud embeddings and vector databases with a local SQLite-based memory layer for AI bots. The result was faster, cheaper, and much simpler than the usual stack.
The problem: memory was eating my budget
I run AI bots. Support bots, content bots, automation bots. They all need memory.
For a while I used the standard setup: a cloud vector database plus an embedding API. Cognee, Pinecone, OpenAI embeddings, the usual stack people reach for when they hear “RAG” or “AI memory.”
It worked, but it was annoying in exactly the ways that matter in production. The bill kept growing, and every lookup had network latency attached to it. A memory read that should feel instant became an extra hop to someone else’s infrastructure.
That makes sense if you’re indexing millions of documents for a search company. It makes less sense if you just want your bots to remember what happened five minutes ago, find the right note, and move on.
The realization: I was overbuilding the whole thing
Most bot memory doesn’t need a fancy distributed vector setup.
It needs a few simple things:
- fast lookup of recent conversation history
- good text search across a modest dataset
- metadata filters by user, date, or project
- predictable local performance
That’s not a hard problem. We just keep pretending it is because the AI tooling market loves expensive solutions.
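The first and third items on that list are plain SQL. Here’s a minimal sketch using a hypothetical `memories` table. The column names and helper functions are illustrative, not gbrain-local’s actual schema. A composite index on user, project and timestamp turns “recent history for this user in this project” into an index range scan instead of a table scan.

```python
import sqlite3
import time
from typing import List, Optional

# Hypothetical schema for illustration; not gbrain-local's actual layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS memories (
    id         INTEGER PRIMARY KEY,
    user_id    TEXT NOT NULL,
    project    TEXT NOT NULL,
    created_at REAL NOT NULL,  -- Unix timestamp, seconds
    content    TEXT NOT NULL
);
-- Composite index: "newest rows for this user and project" becomes an index range scan.
CREATE INDEX IF NOT EXISTS idx_memories_user_project_time
    ON memories (user_id, project, created_at);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def remember(conn: sqlite3.Connection, user_id: str, project: str,
             content: str, created_at: Optional[float] = None) -> None:
    """Store one memory; the timestamp defaults to now."""
    ts = time.time() if created_at is None else created_at
    conn.execute(
        "INSERT INTO memories (user_id, project, created_at, content) VALUES (?, ?, ?, ?)",
        (user_id, project, ts, content),
    )
    conn.commit()


def recent(conn: sqlite3.Connection, user_id: str, project: str,
           since: float, limit: int = 20) -> List[str]:
    """Memories for one user/project created at or after `since`, newest first."""
    rows = conn.execute(
        """SELECT content FROM memories
           WHERE user_id = ? AND project = ? AND created_at >= ?
           ORDER BY created_at DESC
           LIMIT ?""",
        (user_id, project, since, limit),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = open_db()
    now = time.time()
    remember(db, "alice", "support", "Asked about the refund policy", now - 600)
    remember(db, "alice", "support", "Refund approved", now - 120)
    remember(db, "bob", "support", "Password reset", now - 60)
    # Only Alice's support history from the last five minutes.
    print(recent(db, "alice", "support", since=now - 300))  # ['Refund approved']
```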
Enter gbrain-local
I built gbrain-local as a local memory plugin for AI bots. It uses SQLite for storage, FTS5 for full-text search, and optional local embeddings when semantic search actually helps.
That means no cloud dependency for the memory layer, no vector DB invoice, and no waiting on remote infrastructure just to retrieve a note your bot already owns.
It runs on the same machine as the bots. That alone cuts out a lot of nonsense, starting with the network hop on every memory read.
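For a sense of how little code the FTS5 part takes, here’s a minimal sketch using Python’s built-in `sqlite3` module. It assumes your SQLite build includes FTS5 (most do). Metadata lives in `UNINDEXED` columns, so one query can combine full-text matching with user or project filters. Wrapping the query in double quotes gives exact phrase matching. The table layout and helper names (`open_notes`, `add_note`, `exact_phrase`, `search`) are hypothetical, not gbrain-local’s actual API.

```python
import sqlite3
from typing import List, Optional


def open_notes(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Standalone FTS5 table (raises sqlite3.OperationalError if FTS5 isn't compiled in).
    # UNINDEXED columns are stored with each row and usable in WHERE,
    # but are not tokenised into the full-text index.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS notes "
        "USING fts5(content, user_id UNINDEXED, project UNINDEXED)"
    )
    return conn


def add_note(conn: sqlite3.Connection, content: str, user_id: str, project: str) -> None:
    conn.execute(
        "INSERT INTO notes (content, user_id, project) VALUES (?, ?, ?)",
        (content, user_id, project),
    )
    conn.commit()


def exact_phrase(text: str) -> str:
    """Quote text as an FTS5 phrase query; embedded double quotes are escaped by doubling."""
    return '"' + text.replace('"', '""') + '"'


def search(conn: sqlite3.Connection, query: str, user_id: Optional[str] = None,
           project: Optional[str] = None, limit: int = 5) -> List[str]:
    """FTS5 search, optionally restricted to one user and/or project, best match first."""
    sql = "SELECT content FROM notes WHERE notes MATCH ?"
    params: list = [query]
    if user_id is not None:
        sql += " AND user_id = ?"
        params.append(user_id)
    if project is not None:
        sql += " AND project = ?"
        params.append(project)
    # bm25() scores better matches as more negative, so ascending order puts the best first.
    sql += " ORDER BY bm25(notes) LIMIT ?"
    params.append(limit)
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    db = open_notes()
    add_note(db, "customer asked about the refund policy", "alice", "support")
    add_note(db, "policy on refund windows changed", "alice", "support")
    add_note(db, "refund policy question", "bob", "support")
    # Exact phrase, Alice only: ['customer asked about the refund policy']
    print(search(db, exact_phrase("refund policy"), user_id="alice"))
    # Unquoted terms are ANDed, so both of Alice's notes match.
    print(search(db, "refund policy", user_id="alice"))
```

Filtering on `UNINDEXED` columns happens after the full-text match, which is likely fine for a modest dataset. With many users, keeping metadata in a regular indexed table and joining on `rowid` is worth considering.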
The numbers
| Metric | Cloud stack | gbrain-local |
|---|---|---|
| Setup time | ~2 hours | ~10 minutes |
| Monthly cost | Recurring API + DB cost | €0 extra infra |
| Average query latency | Hundreds of ms | ~50 ms |
| Offline capable | No | Yes |
The best part is not even the speed. It’s the predictability. There’s no mystery latency because some provider is having a bad afternoon.
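Latency numbers depend heavily on hardware and dataset size, so it’s worth measuring on your own machine. A small harness along these lines times one query repeatedly and reports the median and the 99th percentile. The tail figure is the one that tells you whether latency is actually predictable. `measure_latency` is a hypothetical helper, and the demo data is synthetic; this won’t reproduce the table above, it just lets you check your own setup.

```python
import sqlite3
import statistics
import time
from typing import Sequence, Tuple


def measure_latency(conn: sqlite3.Connection, sql: str, params: Sequence,
                    runs: int = 200) -> Tuple[float, float]:
    """Execute one query `runs` times; return (median_ms, p99_ms)."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()  # fetchall so the whole result is actually read
        samples.append((time.perf_counter() - start) * 1000.0)
    # 99 cut points over the observed samples; index 98 is the 99th percentile.
    p99 = statistics.quantiles(samples, n=100, method="inclusive")[98]
    return statistics.median(samples), p99


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE notes USING fts5(content)")
    # Synthetic data: 10,000 short notes spread over 50 topics.
    conn.executemany(
        "INSERT INTO notes (content) VALUES (?)",
        [(f"note {i} about topic{i % 50}",) for i in range(10_000)],
    )
    median_ms, p99_ms = measure_latency(
        conn, "SELECT content FROM notes WHERE notes MATCH ? LIMIT 10", ("topic7",)
    )
    print(f"median={median_ms:.2f} ms  p99={p99_ms:.2f} ms")
```

Point it at your real database file and a representative query; an in-memory demo can look faster than a disk-backed database with a cold cache.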
How it works
The architecture is boring on purpose.

The bot talks to a small API layer. That API reads from and writes to SQLite. FTS5 handles text search. If embeddings are enabled, they’re used as a second pass, not as the whole system.
That detail matters. Exact matches and filtered search solve more real problems than people admit. Semantic search is useful, but it doesn’t need to sit in the center of everything.
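The second-pass idea might look something like this. FTS5 picks a small candidate set, and only when an embedding function is supplied are those candidates reranked by cosine similarity. This is a sketch under assumptions, not gbrain-local’s code. `toy_embed` is a hashed bag-of-words stand-in so the example runs without a model; you’d swap in whatever local embedding model you actually use. `retrieve` and its parameters are hypothetical names.

```python
import math
import sqlite3
import zlib
from collections import Counter
from typing import Callable, List, Optional

Embedder = Callable[[str], List[float]]


def toy_embed(text: str, dims: int = 64) -> List[float]:
    """Stand-in for a real local embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * dims
    for word, count in Counter(text.lower().split()).items():
        vec[zlib.crc32(word.encode("utf-8")) % dims] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity for vectors that are already L2-normalised (so just a dot product)."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(conn: sqlite3.Connection, query: str, k: int = 3,
             candidates: int = 20, embed: Optional[Embedder] = None) -> List[str]:
    # Pass 1: FTS5 narrows everything down to a small candidate set, best bm25 rank first.
    # `query` goes straight to MATCH, so it must be valid FTS5 query syntax.
    rows = conn.execute(
        "SELECT content FROM notes WHERE notes MATCH ? ORDER BY rank LIMIT ?",
        (query, candidates),
    ).fetchall()
    texts = [row[0] for row in rows]
    if embed is None:
        return texts[:k]  # embeddings disabled: the FTS5 ranking is the answer
    # Pass 2: rerank only those candidates by similarity to the query embedding.
    q = embed(query)
    texts.sort(key=lambda t: cosine(q, embed(t)), reverse=True)
    return texts[:k]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE notes USING fts5(content)")
    conn.executemany(
        "INSERT INTO notes (content) VALUES (?)",
        [("refund policy for annual plans",), ("refund issued for order",), ("shipping policy",)],
    )
    print(retrieve(conn, "refund"))                   # FTS5 only
    print(retrieve(conn, "refund", embed=toy_embed))  # FTS5 candidates, then rerank
```

To keep the sketch short, it re-embeds each candidate on every query; a real setup would more likely store each note’s vector when the note is written.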
Why SQLite was enough
SQLite has a branding problem. People hear “SQLite” and think toy database, local script, prototype. That’s nonsense.
SQLite is production software. It’s in browsers, phones, embedded systems, and a ridiculous number of apps people trust every day. For a local AI memory layer, it’s often the right answer.
You get one file, simple backups, fast local reads, no extra service to babysit, and fewer moving parts to break. That’s a good trade.
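On the backup point: copying the file works when nothing is writing to it. For a database your bots are actively using, Python’s `sqlite3.Connection.backup`, a wrapper around SQLite’s online backup API, produces a consistent copy. A minimal sketch; `backup_to` is a hypothetical helper name.

```python
import sqlite3


def backup_to(src_path: str, dest_path: str) -> None:
    """Write a consistent copy of the SQLite database at src_path to dest_path.

    Uses sqlite3.Connection.backup (SQLite's online backup API), so the copy is
    consistent even if other connections are using the source database.
    """
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        with dest:
            src.backup(dest)
    finally:
        dest.close()
        src.close()
```

Usage would be something like `backup_to("memory.db", "memory-backup.db")`. SQLite 3.27 and later also support `VACUUM INTO 'file'`, which writes a compacted copy.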
When this is the wrong tool
gbrain-local is not trying to replace every vector database on earth.
If you need distributed search across huge datasets, or you’re building a system where semantic retrieval across millions of documents is the product, use the heavier tools.
But if you’re building AI bots that need memory and retrieval in the normal world, a simpler stack will usually beat the sexy one.
What I got out of it
I got lower latency, no extra memory bill, easier debugging, and a setup I can run anywhere. Local machine, VPS, client server, offline demo, whatever.
More importantly, I got rid of an unnecessary dependency. That’s always worth something.
There’s a pattern here I keep seeing in AI infrastructure: people start with the architecture they think sounds advanced, then spend weeks trying to make it tolerable. Usually the better move is to start with the simplest thing that can survive contact with production.
For bot memory, that turned out to be SQLite.
If you’re building AI bots and your memory layer already feels heavier than the bots themselves, you probably overdid it.