Three Production Apps Stabilized in 5 Days After Entire Dev Team Left

The Situation

The client provides practice management software to small law firms. In the span of 30 days, all three of its engineers departed: two to a well-funded competitor, one for a career change. The CEO, Rachel Torres, found herself running a software company with no software team and three production applications that 200+ law firms relied on every business day.

She had two weeks before her last engineer's final day. She needed a maintenance partner who could onboard rapidly, understand a complex multi-application codebase, and give her the certainty that critical client-facing systems would stay online.

The Challenge

The technical situation was messy in the way that most SMB codebases are messy after years of fast growth:

Three applications: a client portal (React), a backend API (Ruby on Rails), and a document automation tool (Rails + custom templating engine)
Sparse documentation — most institutional knowledge lived in the departing engineers' heads
Heroku deployments with a Procfile structure that had grown organically over 5 years
Several critical Sidekiq background jobs running nightly with no monitoring or alerting
Stripe billing integration with custom logic that was neither tested nor documented

The most urgent risk: a nightly job that processed billing for all subscription accounts had no error handling. If it failed silently, the client could go days without invoicing its own customers, and nobody would know until someone noticed the missing revenue.

Our Approach

We started two weeks before the outgoing engineer's last day — deliberately. That overlap period was valuable: we ran daily knowledge transfer calls, recorded screen walkthroughs of critical workflows, and built a runbook in real time alongside the departing team.

Our onboarding process for maintenance engagements follows a consistent structure: on Day 1, we get read access to every repository. By Day 3, we've mapped all critical paths and dependencies. By Day 5, we've identified the top 10 failure modes and have monitoring in place. This engagement was no different.

The Solution

During the onboarding period, we completed:

Full codebase documentation: Wrote a technical runbook covering every service, every background job, every third-party integration, and the deployment process for each app
Monitoring deployment: Added error tracking (Sentry) and uptime monitoring across all three apps — within 48 hours of gaining access
Billing job hardening: Added comprehensive error handling, retry logic, and Slack alerting to the nightly billing job. Added idempotency keys to prevent double-billing on retry.
Dependency audit: Identified 14 outdated gems with known security vulnerabilities; patched 11 immediately and scheduled the remaining 3 for the following maintenance window
SLA establishment: Agreed on P1/P2/P3 response times, escalation paths, and a monthly health review cadence
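
To illustrate the billing job hardening item above, here is a minimal Ruby sketch of how an idempotency key keeps a retry from double-billing. It is not the client's code. FakeGateway, bill_account, and the key format are hypothetical names invented for this example, and the gateway is an in-memory stand-in for a payment API that honors idempotency keys (Stripe's API supports them). The sketch simulates the dangerous case: the charge goes through, but the response is lost, so the job retries.

```ruby
require "digest"

# Hypothetical in-memory stand-in for a payment API that honors
# idempotency keys: a repeated key returns the original charge.
class FakeGateway
  attr_reader :charges

  def initialize(lost_responses: 0)
    @lost_responses = lost_responses
    @charges = {}
  end

  def charge(account_id:, amount_cents:, idempotency_key:)
    return @charges[idempotency_key] if @charges.key?(idempotency_key)

    # Record the charge first, then simulate a timeout: the money moved,
    # but the caller never saw the response.
    @charges[idempotency_key] = { account_id: account_id, amount_cents: amount_cents }
    if @lost_responses > 0
      @lost_responses -= 1
      raise IOError, "simulated timeout after the charge went through"
    end
    @charges[idempotency_key]
  end
end

# Assumed key format: the same account and billing period always produce
# the same key, so a retry cannot create a second charge.
def idempotency_key(account_id, period)
  Digest::SHA256.hexdigest("billing:#{account_id}:#{period}")
end

# Charges one account, retrying on IOError up to max_attempts times.
# If every attempt fails, it calls the alert hook and returns nil.
def bill_account(gateway, account_id:, amount_cents:, period:,
                 max_attempts: 3, alert: ->(msg) { warn msg })
  key = idempotency_key(account_id, period)
  attempts = 0
  begin
    attempts += 1
    gateway.charge(account_id: account_id, amount_cents: amount_cents, idempotency_key: key)
  rescue IOError => e
    retry if attempts < max_attempts
    alert.call("billing failed for account #{account_id} (#{period}): #{e.message}")
    nil
  end
end

gateway = FakeGateway.new(lost_responses: 1)
bill_account(gateway, account_id: 42, amount_cents: 9_900, period: "2024-05")
puts gateway.charges.size # one charge recorded despite the retry
```

In a real job, the alert hook would post to the Slack channel and the retry would wait between attempts. Sidekiq also retries failed jobs on its own with backoff, so the in-method loop here is only one way to structure it.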

Results

Since Kovil AI took over maintenance, the client has had zero production downtime incidents. The billing job, which we later discovered had silently failed twice in the previous 6 months, has run without error every night. Monthly health reviews give Rachel and her team full visibility into the technical state of their systems without needing an internal engineer to translate.

"I used to wake up anxious about whether the apps were running," Rachel told us. "Now I just check the Slack channel and go back to work."