Salesforce.com experienced some downtime last week. Because it was the last week of the month, this caused panic among sales folks. Downtime can be catastrophic for your customers. Besides lost revenue opportunities, it can also tarnish your company’s reputation. Here is a tip on architecting your application for downtime…
Why applications go down
There’s planned downtime. And then there’s unplanned downtime.
Unplanned downtime could be the result of a software bug, human error, equipment failure, or malfunction.
It could be because someone, somewhere tripped over a cable.
Sometimes, a natural calamity strikes your datacenter and you experience downtime.
A lot of times, downtime is outside your control.
What to do about it
Your first responsibility is to communicate the downtime to your customers. They may be relying on your service for a critical internal process unknown to you.
Next, your team must work to identify the root cause of the downtime, push a patch, and restore the application to a normal functioning state.
You can reach your customers through regular communication channels like email or, upon request, SMS. Throughout the downtime, you should continue to update your customers on the resolution process via a status page.
Another option, which is the focus of this article, is to fail over to a limited version of your application. It may be slow, it may serve stale data, and it may not accept any changes. But the key point is that your application continues to be available.
Let’s explore this idea further…
How to do read-only failover
Providing a read-only application experience comes down to a 5-point checklist:
- Always keep a recent backup of the database running in another datacenter. Let’s call this the shadow database.
- Assuming your application holds no state, also run a copy of your application stack on top of the shadow database.
- Configure this application stack to run in a read-only mode. How you accomplish this depends on your application. For content sites, ignoring session cookies will provide a signed-out experience.
- Use DNS to fail over to the new application stack. Amazon Route 53’s new domain-name-based health checks can be useful for this purpose.
- Work hard to find the root cause of the downtime and restore the application to its normal state.
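As a rough illustration of the read-only step in the checklist above, here is a minimal sketch of one way it might look for a Python WSGI application. Everything named here is an assumption made for the example and not part of any particular framework: the `ReadOnlyMiddleware` class, the `READ_ONLY` environment variable, and the `demo_app` stand-in application. The idea is simply to refuse requests that would change data and to drop session cookies so visitors get the signed-out experience.

```python
import os

# HTTP methods that should never change data on the server.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


class ReadOnlyMiddleware:
    """Hypothetical WSGI middleware: serve reads, refuse writes when read_only is set."""

    def __init__(self, app, read_only=False):
        self.app = app
        self.read_only = read_only

    def __call__(self, environ, start_response):
        if not self.read_only:
            # Normal operation: pass the request through untouched.
            return self.app(environ, start_response)

        method = environ.get("REQUEST_METHOD", "GET").upper()
        if method not in SAFE_METHODS:
            # Refuse anything that would write to the shadow database.
            body = b"The service is temporarily read-only. Please retry later.\n"
            start_response(
                "503 Service Unavailable",
                [
                    ("Content-Type", "text/plain; charset=utf-8"),
                    ("Content-Length", str(len(body))),
                    ("Retry-After", "300"),
                ],
            )
            return [body]

        # Ignore session cookies so the wrapped app renders the signed-out view.
        environ = dict(environ)
        environ.pop("HTTP_COOKIE", None)
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Stand-in application, used only to exercise the middleware."""
    signed_in = "HTTP_COOKIE" in environ
    body = b"signed-in page\n" if signed_in else b"signed-out page\n"
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body]


# Assumed convention: the shadow stack is started with READ_ONLY=1.
application = ReadOnlyMiddleware(
    demo_app, read_only=os.environ.get("READ_ONLY") == "1"
)
```

With a setup like this, the same code could run in both datacenters, and only the environment of the shadow stack would differ. A real application would likely also need to hide or disable forms and edit links in its pages, which this sketch does not attempt.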
Downtime is inevitable; what matters is how you deal with it. Providing access to your application and your customers’ data during downtime is preferable to a hard dead end. Your customers will thank you for it!
Photo Credit: Zeusandhera.