Your AI Assistant Can't Click Buttons — Unless You Give It a Body

My AI assistant lives on a $6/month Linux VPS in Germany. It reads my emails, manages my calendar, deploys code, runs shell commands. But until last week, it couldn't do one simple thing: open a browser and check a website where I'm already logged in.

Cloud brain, no hands

Most AI setups stop at the API layer. Your assistant can call endpoints, parse JSON, run scripts. But the moment you need it to log into a dashboard, fill out a form, or check something that only renders in JavaScript — you're back to doing it yourself.

I tried the obvious route. Playwright, headless Chrome, disposable browser sessions. They work for scraping. They don't work when you need your saved passwords, your cookies, your two-factor sessions. Every time my assistant needed to check Outlook, or verify a deploy on an admin panel, I had to context-switch, open the URL myself, read what was on the screen, and tell the assistant what I saw.

That's not automation. That's me being the assistant's assistant.

I plugged in a Mac mini

I had a Mac mini sitting on my desk, running 24/7. Chrome with all my sessions saved. LinkedIn logged in. Outlook logged in. Every admin panel I use, already authenticated.

I connected it as a remote node to my AI gateway. One WebSocket connection over TLS, and now my assistant on the VPS 800 km away can:

  • Open tabs on the Mac mini's browser
  • Navigate to sites where I'm already logged in
  • Take snapshots of what's on screen
  • Run macOS commands (Homebrew, native apps, whatever)
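
As a rough illustration of what might travel over that connection, the sketch below models the capabilities above as JSON command envelopes, with an allow-list on the node side. The action names, the `NodeCommand` type, and the handlers are all hypothetical; the gateway's real protocol isn't described here, so treat this as one plausible shape rather than the actual wire format.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, Callable, Dict

# Hypothetical action names mirroring the four capabilities listed above.
ALLOWED_ACTIONS = {"open_tab", "navigate", "snapshot", "run_command"}


@dataclass
class NodeCommand:
    """One request from the assistant (on the VPS) to the node (the Mac mini)."""
    action: str
    params: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(raw: str) -> "NodeCommand":
        data = json.loads(raw)
        return NodeCommand(action=data["action"], params=data.get("params", {}))


def dispatch(raw: str, handlers: Dict[str, Callable[..., Any]]) -> Dict[str, Any]:
    """Node side: decode a command, reject unknown actions, run the handler."""
    cmd = NodeCommand.from_json(raw)
    if cmd.action not in ALLOWED_ACTIONS or cmd.action not in handlers:
        return {"ok": False, "error": f"unsupported action: {cmd.action}"}
    return {"ok": True, "result": handlers[cmd.action](**cmd.params)}


# Stand-in handler; a real node would drive Chrome here.
handlers = {"navigate": lambda url: f"navigated to {url}"}

wire = NodeCommand("navigate", {"url": "https://example.com/admin"}).to_json()
reply = dispatch(wire, handlers)
blocked = dispatch(NodeCommand("delete_everything").to_json(), handlers)
```

The allow-list is the design choice worth noting: a machine that holds every logged-in session should only accept a small, named set of actions, and anything else should fail closed.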

What changed in practice

Last week one of my Telegram bots went down. My assistant diagnosed the problem from the VPS — wrong LLM provider config, out-of-memory crashes. But to verify the fix actually worked, it needed to check the bot's admin panel. That's a web UI with an authenticated session.

Before: I open the URL, check it, tell the assistant "yes, it's back up." Three minutes of context-switching for a five-second check.

Now: the assistant opens the browser on the Mac mini, navigates to the admin panel, reads the status, and tells me it's fixed. I didn't leave what I was doing.
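
In code terms, the "now" flow is roughly the routine below. `BrowserNode`, its methods, the URL, and the "status: running" marker are assumptions made for illustration; a fake node stands in for the Mac mini so the sketch runs on its own.

```python
from typing import Dict, Protocol


class BrowserNode(Protocol):
    """Hypothetical interface to the remote browser; not a real library API."""
    def navigate(self, url: str) -> None: ...
    def snapshot(self) -> str: ...


def verify_fix(node: BrowserNode, url: str, healthy_marker: str) -> bool:
    """Open the admin panel and report whether the page shows the healthy marker."""
    node.navigate(url)
    page_text = node.snapshot()
    return healthy_marker.lower() in page_text.lower()


class FakeNode:
    """Test double that returns canned page text per URL."""
    def __init__(self, pages: Dict[str, str]) -> None:
        self.pages = pages
        self.current = ""

    def navigate(self, url: str) -> None:
        self.current = url

    def snapshot(self) -> str:
        return self.pages.get(self.current, "")


node = FakeNode({"https://bots.example.com/admin": "Bot status: RUNNING since 09:14"})
is_up = verify_fix(node, "https://bots.example.com/admin", "status: running")
```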

The same thing happened today. I needed to create a Google Business Profile for a client. The assistant opened Chrome on the Mac mini, went to business.google.com, and started filling in the form. It hit a passkey verification wall (Google won't let anyone else past that), but it handled everything before and after that step on its own.

This isn't about tech, it's about bottlenecks

I used to think the limitation of AI assistants was intelligence. It's not. GPT-4, Claude, whatever — they're smart enough for most tasks. The limitation is reach.

Your assistant can write the perfect email but can't send it from your actual email client. It can diagnose a server problem but can't check the dashboard to verify the fix. It can draft a social media post but can't publish it where you're already logged in.

Every time your assistant hits one of those walls, you become the middleman. You copy, paste, click, screenshot, and feed information back. The smarter the assistant gets, the more annoying that bottleneck becomes — because it can do everything except the last mile.

Giving it access to a physical machine with a real browser solves that. Not for every task. But for enough of them that it changes how you work.

The honest setup experience

I'm not going to pretend this was one click. Here's what actually happened:

  1. Installed the node agent on the Mac mini
  2. Gateway config was wrong — had to set the right mode
  3. First pairing attempt used the wrong role
  4. WebSocket connection failed — needed TLS flag
  5. Had to approve the pairing from the gateway side
  6. Once approved, it connected immediately

Six steps, maybe 15 minutes. Not terrible, not magical. After that initial setup, the Mac mini reconnects automatically on reboot. I haven't touched it since.
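
Steps 3 through 6 amount to a pairing handshake: the node asks to join with a role, someone approves it on the gateway side, and only then does the connection go through. Here is a minimal sketch of that gatekeeping, assuming a single accepted role called `node` and an in-memory registry. The `Gateway` class and its method names are hypothetical, not the real gateway's API.

```python
from enum import Enum
from typing import Dict


class PairingState(Enum):
    PENDING = "pending"
    APPROVED = "approved"


class Gateway:
    """Hypothetical gateway-side registry of pairing requests."""
    EXPECTED_ROLE = "node"  # assumption: the role a remote machine must request

    def __init__(self) -> None:
        self._pairings: Dict[str, PairingState] = {}

    def request_pairing(self, node_id: str, role: str) -> bool:
        """Node side: ask to pair. A wrong role is rejected outright (step 3)."""
        if role != self.EXPECTED_ROLE:
            return False
        self._pairings.setdefault(node_id, PairingState.PENDING)
        return True

    def approve(self, node_id: str) -> None:
        """Operator side: approve a pending request (step 5)."""
        if node_id not in self._pairings:
            raise KeyError(f"no pairing request from {node_id}")
        self._pairings[node_id] = PairingState.APPROVED

    def can_connect(self, node_id: str, use_tls: bool) -> bool:
        """Connections need TLS (step 4) and an approved pairing (step 6)."""
        return use_tls and self._pairings.get(node_id) is PairingState.APPROVED


gw = Gateway()
wrong_role = gw.request_pairing("mac-mini", role="operator")
accepted = gw.request_pairing("mac-mini", role="node")
before_approval = gw.can_connect("mac-mini", use_tls=True)
gw.approve("mac-mini")
without_tls = gw.can_connect("mac-mini", use_tls=False)
after_approval = gw.can_connect("mac-mini", use_tls=True)
```

Seen this way, most of the 15 minutes went into satisfying checks that exist for good reason: a machine with all your sessions on it shouldn't accept a connection from anything that hasn't been explicitly approved.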

When this actually makes sense

Not everyone needs this. If your AI assistant only does API calls and text generation, a cloud-only setup is fine.

But if you find yourself being the go-between — opening URLs for your assistant, copy-pasting dashboard data, clicking buttons it can't reach — that's the signal. You're the bottleneck, and a $300 Mac mini sitting on your desk can fix it.

The interesting part isn't the technology. It's what happens to your workflow when your assistant can finally see and touch the same things you can. You stop being the middleware between the AI and the real world.

And that's worth more than any model upgrade.


Omar Díaz builds AI-powered virtual employees at The Employees. Previously: Odoo consultant, backend engineer, and someone who learned that shipping beats building every time.