You're a software developer. You've been there. You (or your team) have just pushed a highly anticipated release out to production. Success!

But one/four/eight hours later, you start hearing customers complain that something is broken. The release contains a defect! Crap.

And so, the team scrambles into action. Developers diagnose the issue and trace the defect to its source. QA reproduces it in the test environment. A fix is coded and the project/support manager is brought up to speed on options. The change manager is notified of a coming emergency release and procedures are put into motion. A build is prepared and the fix is verified. Customers are notified of the coming fix and the downtime needed to get it into production. Apologetic calls are made to significant others, and yet another Friday night is sacrificed in the name of making things right and keeping the customer happy(ish).

Prevention

The next-day fallout is predictable. Calls are made for more code reviews, more unit tests, more regression tests, more sign-offs, and, of course, less frequent release cycles. In other words, more of everything except the actual deployments, which get fewer. Because, obviously, none of this would have happened if the defect hadn't actually been put into prod! The response is predictably cautious and conservative. And, to me, maddening.

Mistakes will happen. There's no getting around it. You can reduce the frequency and impact of mistakes, but you can't eliminate them entirely. The closer you get to zero defects, the closer you get to zero profits: past a point, each additional safeguard costs more than the defects it prevents. It's too expensive. Expensive in terms of IT budget and expensive in terms of market opportunity.

Correction

Given the inevitability of mistakes, you're much better off planning to correct them quickly rather than trying to prevent them absolutely. I know that the conventional corporate posture is cautious and measured. But I say fight it! Be aggressive. Be dynamic.

  • Instead of lengthening the release cycle, shorten it.
  • Instead of pushing back the deployment window to the middle of the night, put it smack in the middle of the day.
  • Instead of branching into a safe sandbox, merge into the main codebase continually.

From a traditional corporate IT standpoint, these actions would appear foolish. And, if you were to just lunge headlong into doing them without proper thought and planning, foolish is what you'd be. But there's nothing foolish about targeting these actions as whatever-term goals. Because achieving them will make your IT department into a more motivated ("No more midnight deployments! Yay!"), more responsive ("Yes, ma'am. We'll have that bug fixed for you right after lunch."), more efficient ("No more merge hell! Yay!") asset to the corporation.

The more often you do something, the better you get at it. Production deployments are no exception.
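
To make that concrete, here's a minimal sketch of what a deploy-on-every-merge step might look like. It's written in Python against hypothetical commands (make test, make build, ./deploy.sh, ./rollback.sh) and a hypothetical health-check URL; substitute whatever your shop actually uses. The shape is the point: test, build, deploy, verify, and if verification fails, roll back fast. Correction, baked right into the pipeline.

    #!/usr/bin/env python3
    """Sketch of a deploy-on-merge pipeline step. All shell commands and
    the health-check URL are hypothetical placeholders."""

    import subprocess
    import sys
    import urllib.request

    def run(cmd):
        """Run a command, aborting the pipeline (via an exception) if it fails."""
        print("->", " ".join(cmd))
        subprocess.run(cmd, check=True)

    def healthy(url="https://example.com/health"):
        """Return True if the freshly deployed service answers its health check."""
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.status == 200
        except OSError:
            return False

    def main():
        run(["make", "test"])    # the gate: the suite must pass
        run(["make", "build"])
        run(["./deploy.sh"])     # push the new build live, mid-day
        if healthy():
            return 0
        run(["./rollback.sh"])   # correction in action: restore the
        return 1                 # last known-good build, fast

    if __name__ == "__main__":
        sys.exit(main())

Wire that up to run on every merge to the main codebase and the release cycle shrinks to the length of the pipeline itself.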

Have you worked someplace that went this route? Did it provide the benefits I envision? Am I off-base here?