Lessons Companies Can Learn from Recent Airline Outages
Recent headlines have been filled with major airlines shut down by computer system failures. Most recently, Delta Air Lines suffered a power outage that grounded thousands of passengers and planes. The outage also came at one of the worst possible times: the peak summer travel season.
The outage affected the reservation systems and prevented Delta from notifying passengers of cancelled flights. Many passengers were forced to reschedule flights several times as Delta scrambled to reset its staff and planes over the course of several days.
This outage will likely cost Delta millions of dollars and do incalculable damage to the airline’s reputation. According to IDC, downtime from an infrastructure failure can cost $100,000 per hour. Other direct costs include waived change fees, full refunds, and $200 vouchers. Disgruntled customers flooded social media with complaints and images of crowds and long lines at airports.
A One-Two Punch for US Airlines
The outage at Delta came on the heels of a system shutdown at Southwest Airlines, which cancelled 2,300 flights over the course of four days.
A single computer router malfunctioned at Southwest’s data center in Dallas, bringing flights to a screeching halt. Shutting down the system and rebooting took 12 hours, leaving passengers stranded at airports.
At first, Southwest estimated the shutdown could cost the airline between $5 million and $10 million, but the final figure could be higher. Dallas Business Journal speculated that costs could run between $54 million and $82 million due to lost revenue and escalating costs. In the day after the incident alone, the airline’s stock fell 11% and showed no signs of recovery in the following days.
What Went Wrong
The problems at Delta and Southwest have a common theme. Both shutdowns were symptoms of aging infrastructure that left the airlines vulnerable.
Delta’s shutdown originated with an electrical problem at its Atlanta headquarters. When the electrical problem occurred, Delta’s critical systems failed to switch over to the backup. Additionally, after the merger between Delta and Northwest Airlines almost 10 years ago, some systems were updated, but others were overlooked.
Southwest had a backup system in place, but because the router failed only partially, the switch to the backup was never triggered. Airlines also tend to back up only a few times a day rather than in real time.
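To make the partial-failure problem concrete, here is a minimal sketch of a failover check that treats a degraded device the same way as a dead one. It does not describe Southwest’s actual systems: the HealthSample type, the should_fail_over function, and the 5% error threshold are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class HealthSample:
    """One health reading for a device (hypothetical type for illustration)."""
    reachable: bool     # did the device answer at all?
    error_rate: float   # fraction of requests that failed, from 0.0 to 1.0


def should_fail_over(sample: HealthSample, max_error_rate: float = 0.05) -> bool:
    """Return True if traffic should move to the backup.

    A check that only asks "is the device reachable?" misses partial
    failures, so this one also trips when the error rate exceeds an
    assumed threshold (5% by default).
    """
    if not sample.reachable:
        return True  # total failure
    return sample.error_rate > max_error_rate  # partial failure


# A router that still answers but fails 40% of requests triggers failover.
degraded = HealthSample(reachable=True, error_rate=0.40)
print(should_fail_over(degraded))
```

The design point is that the trigger depends on how well the device is working, not merely on whether it is still switched on.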
IT Security Lessons Learned
The Southwest CEO compared the router failure to a “once-in-a-thousand-year flood.” With major outages hitting two airlines in close succession, that can hardly be the case.
Both the Delta and Southwest outages illustrate the danger of a single point of failure. In both cases, one fault took down the entire system, and the disruption spread to airports all over the world. Eliminating single points of failure is crucial for companies, like airlines, that need to be available 24/7.
Single points of failure can be eliminated through redundancy. For instance, backup generators provide an uninterrupted power supply in the case of a power failure. Redundant servers can serve as data backup and failover solutions in the event of an outage.
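As a rough illustration of failover across redundant servers, the sketch below tries each backend in order and returns the first successful result. The call_with_failover helper and the two example backends are hypothetical stand-ins, not any airline’s real reservation system.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllBackendsFailed(Exception):
    """Raised when every redundant backend has failed."""


def call_with_failover(backends: Sequence[Callable[[], T]]) -> T:
    """Try each backend in order and return the first successful result."""
    errors: List[Exception] = []
    for backend in backends:
        try:
            return backend()
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(exc)
    raise AllBackendsFailed(errors)


def primary() -> str:
    # Simulates the primary server being down.
    raise ConnectionError("primary data center unavailable")


def backup() -> str:
    return "booking confirmed by backup"


print(call_with_failover([primary, backup]))
```

With only one backend in the list, the same outage would surface as an error to the customer, which is what a single point of failure looks like in practice.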
Data backups need to be completed regularly to keep mission-critical information up to date. Backup frequency should match how often your mission-critical data changes. Backup systems also need routine testing to confirm they will work in an emergency. Had Southwest known its backup wouldn’t be triggered by a partial failure, it might have been able to fix the problem in time.
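One way to check whether backup frequency keeps up with the data is to compare the age of the last backup against the amount of data loss the business can tolerate (often called the recovery point objective). The sketch below assumes a hypothetical backup_is_stale helper and made-up times and tolerances.

```python
from datetime import datetime, timedelta


def backup_is_stale(last_backup: datetime, now: datetime,
                    max_data_loss: timedelta) -> bool:
    """Return True if the last backup is older than the tolerated loss window.

    Everything written since the last backup is lost in a failure, so the
    backup's age is the worst-case amount of data at risk.
    """
    return now - last_backup > max_data_loss


# Assumed scenario: the last backup ran at 06:00, it is now 13:30,
# and the business can tolerate losing at most 15 minutes of bookings.
last_backup = datetime(2016, 8, 8, 6, 0)
now = datetime(2016, 8, 8, 13, 30)
print(backup_is_stale(last_backup, now, max_data_loss=timedelta(minutes=15)))
```

Running a check like this on a schedule, and alerting when it returns True, is one simple form of the routine testing described above.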
Not every outage can be avoided, but ensuring that you have resources to fall back on in the case of an incident can go a long way. Your clients and customers expect you to be available and equipped to serve their needs at all times. If your business drops the ball, your reputation can suffer a major hit.