Do you have a rollback plan?

It’s true that DevOps practices and continuous integration (CI) can make your deploys much safer. That doesn’t eliminate the need for a rollback plan, though.

A rollback plan is exactly what it sounds like: a list of steps you’d take to undo a release and restore the system to its pre-release state.

You or your team might think you already have one because an engineer knows how to undo things, but that isn’t the same as a rollback plan. Relying on one engineer, or even a small fraction of the engineering team, who knows how to undo a given release is not a replacement for one.

A rollback plan is a set of written steps that others can follow to undo the release.

Why have a way to undo a release if you already have the other pillars of a safe release process (code review, automated tests, etc.)? For the same reason you wouldn’t want to be dropped in the middle of the woods without knowing how you got there or how to get home: you shouldn’t stage a release without knowing which steps will get you back.

Writing a rollback plan can also help clarify what impact the release is expected to have on other systems and what other steps should be taken.

Would you have to restart a service? If so, the teams that depend on that service need to be notified.

Would you have to make a change to a database? If it’s shared with other applications, those applications’ owners should be notified.

Would you have to restore data? Better make sure the backups are working and restorable.
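Working out who to notify can be fairly mechanical if you keep even a rough map of which services call which. The Python sketch below assumes a hypothetical `DEPENDS_ON` map (the service names are invented for illustration) and inverts it so that, starting from the service you’d restart, a breadth-first search finds every service that depends on it directly or transitively.

```python
from collections import deque

# Hypothetical dependency map: each service lists the services it calls.
# Replace with whatever inventory your organization actually keeps.
DEPENDS_ON: dict[str, list[str]] = {
    "web-frontend": ["super-important-app"],
    "reporting": ["super-important-app", "orders-db"],
    "super-important-app": ["orders-db"],
    "billing": ["reporting"],
}


def dependents_of(service: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every service that directly or transitively depends on `service`."""
    # Invert the map so we can walk from a service to the services that call it.
    callers: dict[str, set[str]] = {}
    for caller, callees in depends_on.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    # Breadth-first search over the inverted map.
    found: set[str] = set()
    queue = deque([service])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, set()):
            if caller != service and caller not in found:
                found.add(caller)
                queue.append(caller)
    return found


if __name__ == "__main__":
    # Owners of these services should hear about a restart of super-important-app.
    print(sorted(dependents_of("super-important-app", DEPENDS_ON)))
```

Including transitive dependents matters here: `billing` never calls `super-important-app` directly, but it still breaks if `reporting` does.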

What does a rollback plan look like?

Here is a sample of what one might look like for an organization that has a release pipeline and both automated and manual tests. Yes, it’s entirely possible that yours will be longer, but this is an example of where you can start. If you already have something like this, you can improve it by making sure each of the steps below links to a document that answers the question “how?”

Release v5 of SuperImportantApp Enterprise Edition:

Scheduled: 1/2/2018 0700 PST

Rollback steps: Should the deploy fail for any reason, perform the following steps in order:

1. Set CI to retrieve the release tag 4.5 artifacts
2. Deploy the resulting artifacts to QA
3. Confirm automated tests pass
4. Release the artifacts to Prod
5. Perform a rolling restart of the service on all nodes
   5a. See the “Rolling release” document in our internal document hub for how to do this
6. Confirm tests pass, both manual and automated
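If you want the plan to be executable as well as readable, one option is to encode the rollback steps above as an ordered list and run them in sequence, stopping at the first failure so that, for example, nothing is released to Prod if the QA tests fail. This is a minimal Python sketch, not a prescribed tool: `Step`, `run_rollback`, and `stub` are hypothetical names, and each `stub()` is a placeholder for a call to your own CI and deploy tooling, which this sketch makes no assumptions about.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    description: str
    action: Callable[[], bool]  # returns True if the step succeeded


def run_rollback(steps: list[Step]) -> list[str]:
    """Run steps in order; stop at the first failure and return a log."""
    log: list[str] = []
    for number, step in enumerate(steps, start=1):
        ok = step.action()
        log.append(f"{number}. {step.description}: {'ok' if ok else 'FAILED'}")
        if not ok:
            # Later steps (e.g. releasing to Prod) must not run
            # when an earlier one (e.g. QA tests) has failed.
            break
    return log


def stub(result: bool = True) -> Callable[[], bool]:
    """Hypothetical placeholder; swap in calls to your real CI/deploy tooling."""
    return lambda: result


# The sample plan's steps, in the order the plan requires.
ROLLBACK_V5 = [
    Step("Set CI to retrieve release tag 4.5 artifacts", stub()),
    Step("Deploy resultant artifacts to QA", stub()),
    Step("Confirm automated tests pass", stub()),
    Step("Release artifacts to Prod", stub()),
    Step("Rolling restart of service on all nodes", stub()),
    Step("Confirm tests pass, both manual and automated", stub()),
]

if __name__ == "__main__":
    for line in run_rollback(ROLLBACK_V5):
        print(line)
```

Manual steps fit the same shape: an `action` can simply ask a person to confirm and return their answer. The written plan, with its “how?” links, still needs to exist; a script like this only helps it get followed in order.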