gbrain-local: Why I Ditched Cloud Embeddings for a 50ms SQLite Memory

I replaced cloud embeddings and vector databases with a local SQLite-based memory layer for AI bots. The result was faster, cheaper, and much simpler than the usual stack.

The problem: memory was eating my budget

I run AI bots. Support bots, content bots, automation bots. They all need memory.

For a while I used the standard setup: a cloud vector database plus an embedding API. Cognee, Pinecone, OpenAI embeddings, the usual stack people reach for when they hear “RAG” or “AI memory.”

It worked, but it was annoying in exactly the ways that matter in production. The bill kept growing, and every lookup had network latency attached to it. A memory read that should feel instant became an extra hop to someone else’s infrastructure.

That makes sense if you’re indexing millions of documents for a search company. It makes less sense if you just want your bots to remember what happened five minutes ago, find the right note, and move on.

The realization: I was overbuilding the whole thing

Most bot memory doesn’t need a fancy distributed vector setup.

It needs a few simple things:

  • fast lookup of recent conversation history
  • good text search across a modest dataset
  • metadata filters by user, date, or project
  • predictable local performance
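
Here is a minimal sketch of how far plain SQLite gets you on recent history and metadata filters. The table, column names, and helper functions are hypothetical, chosen for illustration; they are not gbrain-local’s actual schema.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not gbrain-local's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE memories (
    id         INTEGER PRIMARY KEY,
    user_id    TEXT NOT NULL,
    project    TEXT NOT NULL,
    created_at TEXT NOT NULL,   -- ISO 8601 UTC, so string order matches time order
    content    TEXT NOT NULL
);
CREATE INDEX idx_memories_user_time ON memories (user_id, created_at);
""")

conn.executemany(
    "INSERT INTO memories (user_id, project, created_at, content) VALUES (?, ?, ?, ?)",
    [
        ("alice", "support", "2024-05-01T10:00:00Z", "Customer asked about refund policy"),
        ("alice", "support", "2024-05-01T10:05:00Z", "Refund approved for order 1182"),
        ("bob", "content", "2024-05-02T09:00:00Z", "Draft outline for SQLite article"),
    ],
)


def recent_history(conn, user_id, limit=10):
    """Newest-first conversation history for one user."""
    return conn.execute(
        "SELECT created_at, content FROM memories "
        "WHERE user_id = ? ORDER BY created_at DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()


def filter_memories(conn, user_id=None, project=None, since=None):
    """Metadata filter by user, project, or date; every argument is optional."""
    clauses, params = [], []
    for column, value, op in (
        ("user_id", user_id, "="),
        ("project", project, "="),
        ("created_at", since, ">="),
    ):
        if value is not None:
            clauses.append(f"{column} {op} ?")  # column names are fixed above, values are bound
            params.append(value)
    where = " AND ".join(clauses) or "1"
    return conn.execute(
        "SELECT user_id, project, created_at, content FROM memories "
        f"WHERE {where} ORDER BY created_at",
        params,
    ).fetchall()


latest = recent_history(conn, "alice", limit=1)
content_notes = filter_memories(conn, project="content")
since_notes = filter_memories(conn, since="2024-05-01T10:05:00Z")
```

The composite index on (user_id, created_at) is what keeps the “what happened five minutes ago” lookup cheap as the table grows.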

That’s not a hard problem. We just keep pretending it is because the AI tooling market loves expensive solutions.

Enter gbrain-local

I built gbrain-local as a local memory plugin for AI bots. It uses SQLite for storage, FTS5 for full-text search, and optional local embeddings when semantic search actually helps.

That means no cloud dependency for the memory layer, no vector DB invoice, and no waiting on remote infrastructure just to retrieve a note your bot already owns.

It runs on the same machine as the bots. That alone cuts out a lot of nonsense.

The numbers

Metric                | Cloud stack             | gbrain-local
Setup time            | ~2 hours                | ~10 minutes
Monthly cost          | Recurring API + DB cost | €0 extra infra
Average query latency | Hundreds of ms          | ~50 ms
Offline capable       | No                      | Yes

The best part is not even the speed. It’s the predictability. There’s no mystery latency because some provider is having a bad afternoon.
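
If you want to check latency and its spread on your own data, a small timing harness is enough. This is only a sketch: the notes table, the synthetic rows, and the time_query helper are made up for illustration, and an in-memory database will be faster than a file on disk. It also assumes your Python build of SQLite has FTS5 compiled in, which is common but not guaranteed.

```python
import sqlite3
import statistics
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE notes USING fts5(content)")
# Synthetic data, purely for illustration.
conn.executemany(
    "INSERT INTO notes (content) VALUES (?)",
    [(f"note {i} about refund order {i % 97}",) for i in range(5000)],
)


def time_query(conn, sql, params, runs=200):
    """Run one query repeatedly and report latency percentiles in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return {
        "p50": statistics.median(samples),
        "p95": samples[int(0.95 * (len(samples) - 1))],
        "max": samples[-1],
    }


stats = time_query(
    conn,
    "SELECT rowid FROM notes WHERE notes MATCH ? ORDER BY rank LIMIT 10",
    ("refund",),
)
print(stats)
```

The gap between p50 and p95 is the number to watch. With a local file it tends to stay small, which is the predictability argument in practice.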

How it works

The architecture is boring on purpose.

[Figure: gbrain-local architecture. AI Bot → gbrain-api → SQLite + FTS5]

The bot talks to a small API layer. That API reads and writes to SQLite. FTS5 handles text search. If embeddings are enabled, they’re used as a second pass, not as the whole system.

That detail matters. Exact matches and filtered search solve more real problems than people admit. Semantic search is useful, but it doesn’t need to sit in the center of everything.
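
To make the two-pass idea concrete, here is a rough sketch: FTS5 narrows the candidates by keyword and user, then an optional embedding step reorders them. The notes table, the function names, and especially toy_embed are assumptions for illustration. toy_embed is a letter-frequency stand-in for a real local embedding model, not what gbrain-local uses.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table; user_id is stored for filtering but not text-indexed.
conn.execute("CREATE VIRTUAL TABLE notes USING fts5(content, user_id UNINDEXED)")
conn.executemany(
    "INSERT INTO notes (content, user_id) VALUES (?, ?)",
    [
        ("Refund approved for order 1182", "alice"),
        ("Customer asked about the refund policy", "alice"),
        ("Draft outline for the SQLite article", "bob"),
    ],
)


def text_search(conn, query, user_id, limit=20):
    """First pass: FTS5 keyword match for one user, best BM25 rank first."""
    return conn.execute(
        "SELECT rowid, content FROM notes "
        "WHERE notes MATCH ? AND user_id = ? "
        "ORDER BY rank LIMIT ?",
        (query, user_id, limit),
    ).fetchall()


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(query, candidates, embed):
    """Optional second pass: reorder the FTS5 hits by embedding similarity."""
    q = embed(query)
    return sorted(candidates, key=lambda row: cosine(q, embed(row[1])), reverse=True)


def toy_embed(text):
    """Stand-in for a real local embedding model: a letter-frequency vector."""
    text = text.lower()
    return [text.count(ch) for ch in "abcdefghijklmnopqrstuvwxyz"]


hits = text_search(conn, "refund", "alice")
ranked = rerank("refund policy question", hits, toy_embed)
```

Because the embedding step only ever sees the handful of rows FTS5 returned, switching it off costs you a reorder, not the whole retrieval path.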

Why SQLite was enough

SQLite has a branding problem. People hear “SQLite” and think toy database, local script, prototype. That’s nonsense.

SQLite is production software. It’s in browsers, phones, embedded systems, and a ridiculous number of apps people trust every day. For a local AI memory layer, it’s often the right answer.

You get one file, simple backups, fast local reads, no extra service to babysit, and fewer moving parts to break. That’s a good trade.
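
“Simple backups” can be taken fairly literally. A sketch using Python’s built-in sqlite3 online backup API follows; the file names and the backup_database helper are hypothetical.

```python
import os
import sqlite3
import tempfile


def backup_database(src_path, dest_path):
    """Copy a SQLite database to dest_path using the online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # copies a consistent snapshot of the source database
    finally:
        dest.close()
        src.close()


workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "memory.db")              # hypothetical file name
backup_path = os.path.join(workdir, "memory.backup.db")   # hypothetical file name

# Create a tiny database to back up.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT)")
conn.execute("INSERT INTO memories (content) VALUES ('remember this')")
conn.commit()
conn.close()

backup_database(db_path, backup_path)

# Verify the copy opens and contains the row.
restored = sqlite3.connect(backup_path)
count = restored.execute("SELECT COUNT(*) FROM memories").fetchone()[0]
restored.close()
```

Using the backup API rather than copying the file by hand means the snapshot stays consistent even while the bot is still writing.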

When this is the wrong tool

gbrain-local is not trying to replace every vector database on earth.

If you need distributed search across huge datasets, or you’re building a system where semantic retrieval across millions of documents is the product, use the heavier tools.

But if you’re building AI bots that need memory and retrieval over a modest dataset, a simpler stack will usually beat the sexy one.

What I got out of it

I got lower latency, no separate bill for the memory layer, easier debugging, and a setup I can run anywhere: local machine, VPS, client server, offline demo, whatever.

More importantly, I got rid of an unnecessary dependency. That’s always worth something.

There’s a pattern here I keep seeing in AI infrastructure: people start with the architecture they think sounds advanced, then spend weeks trying to make it tolerable. Usually the better move is to start with the simplest thing that can survive contact with production.

For bot memory, that turned out to be SQLite.


If you’re building AI bots and your memory layer already feels heavier than the bots themselves, you probably overdid it.