Postgres on NFS: The Shortcut That Made Odoo Unusable

Some infrastructure mistakes look reasonable until they ruin your afternoon.

For us, the mistake was putting PostgreSQL data on NFS.

Not the Odoo filestore. Not backups. The actual PostgreSQL data directory.

It sounded practical. We were running Odoo in Docker Swarm. We wanted services to survive node movement. We already had NFS working. The mental model was tempting: if containers can move, the data should be reachable from anywhere.

That shortcut made Odoo unusable.

The symptom: slow, not dead

The first clue was latency.

zczoft.com started behaving like a system under water. Requests that should take milliseconds took 90, 150, even 250 seconds. XML-RPC calls hung. /web/login was painfully slow. The service was alive, but using it felt impossible.

That kind of failure is hard to read. A crash gives you a direction. A 500 gives you a stack trace. Latency makes everything suspicious: Traefik, Odoo workers, Postgres, Swarm networking, recent module changes, the VPS itself.

But this was not a tuning problem.

It was storage.

The architecture we thought we had

The goal was resilience.

We had Docker Swarm, multiple nodes, and NFS. Shared volumes seemed like the obvious way to let services move around without losing state.

That idea is fine for some data. Odoo's filestore can live on shared storage if you accept the tradeoffs. Backups can live there. Generated artifacts too.

Postgres is different.

Postgres is not just writing files. It is coordinating WAL, fsync, checkpoints, locks, and durability guarantees. It expects the filesystem below it to be fast and predictable. On NFS, every fsync has to wait for the NFS server to confirm the write over the network, so each commit and each checkpoint inherits network latency. The database starts waiting on storage while the application above it suffers.
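One rough way to see that wait is to time small write-plus-fsync cycles in the directory you plan to use for PGDATA and compare the result against a local disk. The sketch below is an illustration, not the tool we used at the time: the function names, the 50-write default, and the 8 kB block size are assumptions chosen for the example. Postgres ships pg_test_fsync for a more thorough measurement.

```python
import os
import statistics
import tempfile
import time


def fsync_latencies(directory, writes=50, block_size=8192):
    """Time `writes` small write+fsync cycles in `directory` (seconds each)."""
    block = b"\0" * block_size  # 8 kB, matching the default Postgres page size
    latencies = []
    # Create the probe file inside the directory under test so it lands on
    # the same filesystem as the data we care about.
    fd, path = tempfile.mkstemp(dir=directory, prefix="fsync_probe_")
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # block until the filesystem reports the data durable
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)  # leave no probe file behind
    return latencies


def summarize(latencies):
    """Return median and worst-case latency in milliseconds."""
    return {
        "median_ms": statistics.median(latencies) * 1000,
        "max_ms": max(latencies) * 1000,
    }
```

Calling summarize(fsync_latencies("/path/to/candidate/dir")) once on the NFS mount and once on local disk gives two numbers to compare. The absolute values matter less than the gap between them.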

Our setup had two problems:

1. website_db had PGDATA on an NFS-backed volume.

2. Odoo and Postgres were allowed to run on different Swarm nodes instead of being treated as one tightly coupled unit.

On paper, it looked more available.

In practice, it was slower and more fragile.
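The first problem is easy to miss because a volume name does not always reveal what backs it. A minimal sketch of a check, assuming a Linux host or container where /proc/mounts is readable: it finds the mount that contains a given path and reports whether its filesystem type starts with "nfs". The helper names are hypothetical, and the parser ignores the octal escapes /proc/mounts uses for spaces in paths.

```python
import os


def parse_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Fields: device, mount point, filesystem type, options, ...
            entries.append((fields[1], fields[2]))
    return entries


def filesystem_type(path, mounts_text):
    """Return the fs type of the longest mount point that contains `path`."""
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for mount_point, fs_type in parse_mounts(mounts_text):
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same point shadow an earlier one.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def is_nfs(path, mounts_text):
    """True when `path` lives on an nfs or nfs4 mount."""
    fs_type = filesystem_type(path, mounts_text)
    return fs_type is not None and fs_type.startswith("nfs")
```

Run inside the Postgres container, is_nfs(os.environ["PGDATA"], open("/proc/mounts").read()) answers the question directly, assuming PGDATA is set in that environment as it is in the official postgres image.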

The evidence

PostgreSQL told the truth.

On a related Odoo stack, checkpoint timings were absurd: sync=92.673s, for a total of about 165s.

That is storage screaming.

After moving the database off NFS and onto local storage on the DB node, checkpoints dropped to roughly 0.03s.

Same class of app. Same type of infrastructure. Different storage choice.

That ended the debate.
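Timings like these appear in the "checkpoint complete" lines Postgres writes to its log when log_checkpoints is enabled. A small sketch for pulling them out and flagging slow syncs follows. The regex assumes the usual "write=… s, sync=… s, total=… s" layout, the 1-second threshold is an arbitrary choice for illustration, and in the sample line only the sync and total values come from our logs; every other field is made up.

```python
import re

# Matches the timing section of a "checkpoint complete" log line.
CHECKPOINT_RE = re.compile(
    r"checkpoint complete:.*?"
    r"write=(?P<write>\d+\.\d+) s, "
    r"sync=(?P<sync>\d+\.\d+) s, "
    r"total=(?P<total>\d+\.\d+) s"
)

# Illustrative line: only sync and total are real, the rest is invented.
SAMPLE_LINE = (
    "LOG:  checkpoint complete: wrote 4211 buffers (25.7%); "
    "0 WAL file(s) added, 0 removed, 1 recycled; "
    "write=71.912 s, sync=92.673 s, total=165.016 s; "
    "sync files=312, longest=4.105 s, average=0.297 s; "
    "distance=52310 kB, estimate=52310 kB"
)


def parse_checkpoint(line):
    """Return write/sync/total seconds as floats, or None if no match."""
    match = CHECKPOINT_RE.search(line)
    if match is None:
        return None
    return {name: float(value) for name, value in match.groupdict().items()}


def slow_checkpoints(lines, sync_threshold_s=1.0):
    """Return timings for checkpoints whose sync phase exceeded the threshold."""
    slow = []
    for line in lines:
        timings = parse_checkpoint(line)
        if timings is not None and timings["sync"] > sync_threshold_s:
            slow.append(timings)
    return slow
```

Feeding the Postgres container's log lines through slow_checkpoints before and after a storage change turns "it feels faster" into a list that is either empty or not.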

The fix

We stopped pretending this was HA.

The fix was boring:

  • move PGDATA off the NFS-backed volume, website_pgdata_nfs;
  • put Postgres data on local storage;
  • pin Odoo and Postgres to the same node;
  • run Odoo as one replica;
  • treat the database as stateful infrastructure, not as a container that can casually wander around the Swarm.

For zczoft.com, /web/login dropped to around 0.21s.

For the OK CarHire stack, login requests went down to about 0.25–0.36s after warm-up. Checkpoints went from minutes to milliseconds.

When the diagnosis is right, the machine stops arguing.

The lesson

The real mistake was not "using NFS." The mistake was believing we had built high availability when we had only built distributed fragility.

There is a big difference between a real HA database design and a single Postgres container writing to NFS so Swarm can reschedule it somewhere else.

More nodes do not automatically mean more reliability. Sometimes they just create more places for latency, routing, and filesystem assumptions to hurt you.

An honest single-node setup would have been better:

  • Odoo on one node;
  • Postgres on the same node;
  • local disk for active database data;
  • clear backups;
  • a tested restore path;
  • no fantasy that Swarm rescheduling equals database HA.

That is less impressive in a diagram.

It is much better at 2 a.m.

The rule we kept

Do not put active Postgres data for Odoo on NFS.

Not because NFS is evil. Because Postgres is the wrong workload for that shortcut in this kind of infrastructure.

If the system is small enough that NFS looks like the easy database mobility layer, it is probably small enough to run honestly on one node with local disk and good backups.

If the system truly needs HA, it deserves a real HA design.

The middle ground is where pain lives.