From 5-second page loads to sub-second responses — a real-world debugging story.
The problem
An Odoo instance running on Docker Swarm was painfully slow. Page loads took 5+ seconds. The login page took almost 25 seconds on a cold start. Something was clearly wrong, but the server had 29 GB of RAM, a modern CPU, and only a handful of users. It should have been flying.
We diagnosed and fixed it with no rearchitecting, no code changes — just proper infrastructure tuning.
Step 1: identify the bottleneck
First, we checked the basics:
# Load averages
cat /proc/loadavg
# 3.13 2.78 2.62 ← High for a lightly-used server
# Memory
free -h
# 6.3 GB used / 29 GB total — Where's the RAM going?
# Top CPU consumers
ps aux --sort=-%cpu | head -15
The output showed 23 Playwright/Chromium processes consuming 57.7% CPU and over 4 GB of RAM. These were zombie renderer tabs from an OpenClaw instance that had been running since February — nearly 3 months of accumulated browser processes that were never cleaned up.
One single process (PID 2509720, running since March 1st) had accumulated 63,183 minutes of CPU time. That's 43 days of CPU consumed by a single zombie tab.
Step 2: kill the zombies
pkill -9 -f 'ms-playwright/chromium'
Instantly freed 4 GB of RAM and the load dropped from 3.13 to under 1.0. But Odoo was still slow — login took 5 seconds. The zombies were part of the problem, but not the root cause.
Step 3: PostgreSQL — the real culprit
With Chrome gone, we looked at the database:
-- 47 connections, 41 of them idle
SELECT count(*) as total FROM pg_stat_activity;
-- total: 47
-- Connections idle for over an hour
SELECT pid, now() - query_start AS duration, query, state
FROM pg_stat_activity
WHERE (now() - query_start) > interval '5 minutes'
AND state = 'idle';
-- 41 rows returned
Then we checked the PostgreSQL configuration:
SHOW shared_buffers; -- 128MB ← On a 29GB server!
SHOW work_mem; -- 4MB ← Way too low for Odoo
SHOW effective_cache_size;-- 4GB ← Could be higher
SHOW random_page_cost; -- 4.0 ← Default for mechanical disks
shared_buffers = 128MB was the smoking gun. PostgreSQL was caching almost nothing in memory, hitting the mechanical disk for every query. On a server with 29 GB of RAM, this is like riding a bicycle on a highway.
We also found dead tuples accumulating because autovacuum wasn't aggressive enough:
-- Tables with high dead tuple percentages
SELECT relname, n_live_tup, n_dead_tup,
CASE WHEN n_live_tup > 0
THEN round(100.0 * n_dead_tup / n_live_tup, 1)
END as dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000;
-- stock_move_line: 17.1% dead tuples
-- stock_move: 16.3% dead tuples
-- account_move_line: 16.1% dead tuples
Step 4: the fix
We applied the changes via ALTER SYSTEM (no config file editing needed):
-- Memory: use the RAM you have
ALTER SYSTEM SET shared_buffers = '2GB';
ALTER SYSTEM SET effective_cache_size = '6GB';
ALTER SYSTEM SET work_mem = '64MB';
ALTER SYSTEM SET maintenance_work_mem = '512MB';
ALTER SYSTEM SET wal_buffers = '16MB';
-- WAL: reduce checkpoint pressure
ALTER SYSTEM SET max_wal_size = '2GB';
ALTER SYSTEM SET min_wal_size = '512MB';
ALTER SYSTEM SET checkpoint_completion_target = '0.9';
-- Planner: this server has a mechanical disk but lots of RAM
ALTER SYSTEM SET random_page_cost = '1.5';
ALTER SYSTEM SET effective_io_concurrency = '2';
-- Autovacuum: be aggressive for Odoo's write-heavy tables
ALTER SYSTEM SET autovacuum_max_workers = '3';
ALTER SYSTEM SET autovacuum_naptime = '1min';
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = '0.05';
ALTER SYSTEM SET autovacuum_analyze_scale_factor = '0.02';
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = '2000';
-- Logging: help future debugging
ALTER SYSTEM SET log_min_duration_statement = '2000';
ALTER SYSTEM SET log_checkpoints = 'on';
-- Kill idle connections after 5 minutes
ALTER SYSTEM SET idle_in_transaction_session_timeout = '300000';
Then we cleaned up the 29 idle connections and restarted PostgreSQL:
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
AND now() - query_start > interval '5 minutes'
AND pid <> pg_backend_pid();
-- 29 connections terminated
docker service update --force odoo_db
docker service update --force odoo_kw-odoo
The results
| Metric | Before | After |
|---|---|---|
| Login page TTFB | 5.0s | 0.5s |
| Home page TTFB | 0.8s | 0.5s |
| Server load | 3.13 | 0.78 |
| RAM used | 6.3 GB | 2.7 GB |
| DB connections | 47 (41 idle) | 29 |
| Chrome processes | 23 | 0 |
10x improvement on page load time with zero code changes.
Key takeaways
- Default PostgreSQL config is for tiny servers. If you're running Odoo on a VPS with more than 4 GB of RAM, you almost certainly need to tune
shared_buffersandwork_mem. The defaults are designed to run on anything — they're not designed to run well. - Kill zombie processes. Browser automation tools (Playwright, Selenium) leak processes if not properly managed. Set up a cron job or watchdog to clean them up regularly.
- Autovacuum needs tuning for Odoo. Odoo's write-heavy tables (stock_move, account_move_line, mail_message) generate dead tuples fast. The default
autovacuum_vacuum_scale_factor = 0.2means vacuuming only triggers after 20% of rows are dead. For a production Odoo, 5% is much better. idle_in_transaction_session_timeoutis essential. Odoo doesn't always clean up its connections. Without a timeout, idle connections accumulate and waste resources.- Diagnose before you scale. This server had plenty of resources — they were just misconfigured. Throwing more hardware at the problem would have been expensive and ineffective.
Appendix: the configuration parameters explained
| Parameter | Default | Our Value | Why |
|---|---|---|---|
shared_buffers |
128MB | 2GB | PostgreSQL's main cache. Should be 25% of RAM on dedicated DB servers |
work_mem |
4MB | 64MB | Memory per sort/hash operation. Higher = fewer disk spills |
effective_cache_size |
4GB | 6GB | Planner hint for OS cache. Higher = smarter query plans |
maintenance_work_mem |
64MB | 512MB | For VACUUM, CREATE INDEX. Faster maintenance |
random_page_cost |
4.0 | 1.5 | Cost of random disk access. Lower = prefers index scans |
wal_buffers |
4MB | 16MB | Write-ahead log buffer. Reduces disk writes |
max_wal_size |
1GB | 2GB | More WAL space = fewer forced checkpoints |
idle_in_transaction_session_timeout |
0 (disabled) | 5min | Kills stuck connections automatically |
Diagnosed and fixed on a Contabo VPS running Odoo 17 on Docker Swarm with PostgreSQL 13. Total time from diagnosis to fix: ~30 minutes.