My AI assistant finished my first assignment before I did

Today I got my first real task at the new job. And before I tell you what happened, some context: I've been writing here about how my AI assistant helps me with software development. There was an article about my first day where I described how it helped me get oriented, understand the codebase, and figure out the lay of the land. If you've been following along, you know what I'm talking about.

If not, well, I'm a software engineer who works with an AI assistant that I've customized extensively. It lives in my terminal, has access to my repos, and can SSH into servers, query APIs, and do in minutes the kind of things that would take me hours. It's not a chatbot. Think of it more like a very fast junior engineer who never sleeps and never complains.

Anyway. Back to today.

The task

My team had a sprint retrospective last week. One of the action items that came out of it landed on my desk: review the current release process and propose a plan to improve it.

Release process. That thing every software company has but nobody really wants to think about. The invisible machinery that takes code from "it works on my machine" to "it's running in production and real users are hitting it."

The retro had flagged a specific problem: 14 tickets were sitting in the "tested" column, some for weeks, waiting for someone to deploy them. No clear owner. No automation. Just a queue of work that was done but not shipped.

My job was to figure out why.

Phase one: tell the assistant everything

I started by doing what I always do — I told my assistant exactly what happened in the retro. Not a summary. Not bullet points. The full picture: who was there, what they said, what the pain points were, what the team wanted, what the constraints were.

Then I shared the Granola notes from the meeting. If you don't know Granola, it's a meeting notes tool that records and transcribes. I gave the assistant access to those notes through an API, and it pulled the full transcript with action items, decisions, and context.

This is the part most people skip. They give the AI a vague prompt like "analyze our release process" and wonder why the output is generic. Garbage in, garbage out. I've learned that the more specific context I give, the more specific the output I get.

Phase two: look at the actual code

I pointed the assistant to three repositories:

  • The PHP monolith (Laravel, GitLab CI)
  • The Next.js frontend (GitHub Actions, Vercel)
  • The infrastructure config (Kubernetes, FluxCD)

The assistant cloned them, read the CI/CD pipelines, the deploy scripts, the Docker configurations. It found things that would have taken me a full day to discover:

  • The PHP repo has tests (PHPUnit + Behat) but they're configured with allow_failure: true. They can fail and the deploy still proceeds.
  • The Next.js repo has Playwright E2E tests written and ready to go, but they're disabled. A comment in the code says "disabled until pipeline duration/costs are known."
  • Every single production deploy is manual. Someone has to click a button.
  • There are 8 environments (Stage, Beta, Test, Demo, UAT, Alpha, POC, Prod) and none of them have clear ownership.

That last one was the real insight. With 8 environments and no clear owner for any of them, tickets die in the handoff. It's not a technical problem. It's an organizational one.
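The first finding in that list, tests that are allowed to fail, is the kind of thing a few lines of script can check. Here is a minimal sketch, assuming the .gitlab-ci.yml has already been parsed into a Python dict. The job names and the deploy script in the example are hypothetical, not taken from our real pipeline, and the function name is my own.

```python
from typing import Any

# Top-level .gitlab-ci.yml keys that configure the pipeline rather than define a job.
NON_JOB_KEYS = {
    "stages", "variables", "default", "include", "workflow",
    "image", "services", "before_script", "after_script", "cache",
}


def jobs_allowed_to_fail(ci_config: dict[str, Any]) -> list[str]:
    """Return the names of jobs marked with allow_failure: true."""
    flagged = []
    for name, body in ci_config.items():
        if name in NON_JOB_KEYS or name.startswith("."):
            continue  # global keyword or hidden template job, not a real job
        if isinstance(body, dict) and body.get("allow_failure") is True:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical pipeline, already parsed from YAML into a dict.
example = {
    "stages": ["test", "deploy"],
    "phpunit": {"stage": "test", "script": ["vendor/bin/phpunit"], "allow_failure": True},
    "behat": {"stage": "test", "script": ["vendor/bin/behat"], "allow_failure": True},
    "deploy_stage": {"stage": "deploy", "script": ["./deploy.sh"], "when": "manual"},
}

print(jobs_allowed_to_fail(example))
```

To run this against a real file you would first load it with a YAML parser (PyYAML's yaml.safe_load is one option). The sketch only flags the plain allow_failure: true form and ignores the variant that lists specific exit codes.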

Phase three: SSH into the servers

This is where it got interesting.

I had just gotten Teleport access set up. Teleport is an identity-aware proxy: think of it as SSH with SSO and audit logging. I told my assistant that the session was authenticated, and it connected.

It checked the Kubernetes test cluster. Found it running but empty — just Flux, Cilium, and a Teleport agent. The observability stack we'd been trying to deploy wasn't there yet.

Then it SSH'd into stage-1, one of the DigitalOcean droplets that runs the PHP monolith. Found the actual deploy structure: a single Docker container running PHP 8.2, with multiple site directories mounted as volumes. Each environment (Stage, Beta, Test, Demo) has its own directory with its own .env files and its own release symlinks.

It checked which branches were deployed where. Stage was running o-1887 (a feature branch). Test was running gro-212 (another feature branch). Beta was on cor-476. None of them were on master.

That means every environment it checked was running a different version of the code. When QA tests something in Test, they're testing code that differs from what's in Stage, which in turn differs from what will go to Production.
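The comparison itself is simple once the branch names are collected. A minimal sketch, using the three branches mentioned above and assuming master is the branch every environment is supposed to track (the function name is hypothetical):

```python
def off_mainline(deployed: dict[str, str], mainline: str = "master") -> dict[str, str]:
    """Return the environments whose deployed branch is not the mainline branch."""
    return {env: branch for env, branch in deployed.items() if branch != mainline}


# Branches found on the servers, as described above.
deployed = {"Stage": "o-1887", "Test": "gro-212", "Beta": "cor-476"}

drift = off_mainline(deployed)
distinct = set(deployed.values())

for env, branch in sorted(drift.items()):
    print(f"{env}: {branch} (expected master)")
print(f"{len(distinct)} distinct branches across {len(deployed)} environments")
```

Collecting the branch names is the slow part, since it means looking at each environment's release directory. The check afterwards is a one-liner.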

Also found that stage-1's disk was at 80%. That server has 26 queue worker containers running on it. That's a ticking clock.
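A disk check like this is easy to script so that nobody has to stumble on it by accident. A minimal sketch using Python's standard library, assuming an 80% alert threshold. The threshold and the function names are my own choices for illustration, not something already in place on stage-1.

```python
import shutil


def percent_used(total: int, used: int) -> float:
    """Share of the disk in use, as a percentage of total capacity."""
    return 100.0 * used / total


def disk_alert(path: str = "/", threshold: float = 80.0) -> bool:
    """Return True when the filesystem holding `path` is at or above the threshold."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return percent_used(usage.total, usage.used) >= threshold


if __name__ == "__main__":
    usage = shutil.disk_usage("/")
    print(f"/ is {percent_used(usage.total, usage.used):.1f}% full")
    print("ALERT" if disk_alert("/") else "ok")
```

The number can differ slightly from what df reports, because df calculates its percentage against the space available to non-root users rather than against total capacity.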

Phase four: write the proposal

After all the investigation — repos, CI pipelines, live servers, deploy manifests — the assistant wrote a proposal. Not a vague "we should improve things" proposal. A specific, phased plan with:

  • Quick wins for this week (make tests block deploys, re-enable E2E, add health checks)
  • Stabilization for next sprint (automated rollback, Slack notifications, release notes)
  • Continuous deployment for month two (auto-deploy to staging, canary releases, approval via Slack)

It included a table of current vs. target metrics. Specific action items with checkboxes. Open questions for the team.

The whole thing was a few pages. Not a novel. Not a PowerPoint deck with 40 slides. A concise document that says: "Here's what's broken, here's why, and here's how to fix it in order of priority."

The heavy lifting

Here's the thing. The investigation part — reading three codebases, understanding two different CI/CD systems, SSHing into servers, checking deploy manifests, comparing environment states — that would have taken me at least a full day. Probably two.

The assistant did it in under an hour. And it didn't just gather information. It connected the dots. It saw that allow_failure: true in the PHP tests was directly related to the 14 tickets stuck in "tested." It saw that 8 environments with no ownership was the organizational root cause. It saw that feature branches being deployed everywhere meant QA was testing the wrong code.

I still need to review the proposal carefully. I need to make sure it fits the team's reality, the company's priorities, and the constraints I might not know about yet. That's my job. The AI can't do that part — it doesn't know the politics, the history, the personalities.

But the heavy lifting? The part where you spend hours digging through code and servers to understand how things actually work? That's done.

What I'm learning

Every time I do something like this, I learn the same lesson: the value isn't in the AI writing code. The value is in the AI doing the research that nobody has time for.

Every team I've worked with has the same problem. There's stuff that everyone knows is broken but nobody has investigated because investigation takes time and time is scarce. So the broken thing stays broken until it becomes a crisis.

Having an assistant that can do that investigation — read repos, check servers, run commands, synthesize findings — changes the equation. It doesn't replace the engineer. It makes the engineer more effective because now they can actually spend their time on the decisions, not the discovery.

The proposal is sitting in a draft file right now. I'll review it tomorrow, tweak it, and share it with the team. The hard part — understanding the problem — is already behind me.

That's what AI-assisted development actually looks like. Not magic. Just a very fast research assistant that happens to live in my terminal.
