The 2024 CrowdStrike Update Fail and What It Means for Security
When CrowdStrike CEO George Kurtz said that the July 2024 update failure that caused a global IT outage “was not a security issue”, he was downplaying the incident. After all, availability is one of the three pillars of the CIA triad (confidentiality, integrity, and availability), a fundamental concept in information security. Systems were rendered inoperable, data became inaccessible, and thousands of companies were plunged into crisis mode. Here’s a brief overview of the CrowdStrike update fail and some takeaways from a security standpoint.
The 2024 CrowdStrike Bug: What Happened?
The veritable tech meltdown caused by CrowdStrike’s flawed update affected 8.5 million Windows devices. Chaos ensued in industries like healthcare and travel; two German hospitals, for example, had to cancel elective operations. Cyber insurers estimate their payouts could reach $1.5 billion, while Fortune 500 companies, many of which use Falcon, could face financial losses of up to $5.4 billion.
The actual issue stemmed from a faulty update to Falcon, a cloud-based endpoint protection platform developed by CrowdStrike. Falcon protects against a wide range of cyber threats through a single lightweight agent installed on company endpoint devices (think PoS devices, workstations, laptops, etc.). The faulty update applied only to the Windows version of that agent, so it was Windows machines running Falcon that crashed with the blue screen of death.
The tricky thing with platforms like Falcon is that some updates need to be rolled out as fast as possible to keep endpoint devices protected against emerging threats. To slightly oversimplify, the problem was in an update to the product’s threat-detection content (a configuration file known as Channel File 291, loosely comparable to antivirus signatures) rather than to the underlying sensor software.
Still, the update should’ve been staggered so that not every CrowdStrike customer was affected at once. CrowdStrike should’ve tested it on Windows machines in its own testing environment before rolling it out. And companies that use Falcon should have the option to choose whether to apply updates to all of their systems at once or to try them out on a subset first.
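To make the idea of a staggered rollout concrete, here is a minimal sketch of ring-based deployment: the update goes to one group of machines at a time, and the rollout halts before the next group if too many machines in the current one fail a health check. The `deploy` and `is_healthy` hooks, the ring names, and the hosts are all hypothetical placeholders; this illustrates the principle and is not a description of CrowdStrike’s actual pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class RolloutResult:
    completed_rings: List[str]
    halted_at: Optional[str]  # ring where the rollout stopped, or None if it finished


def staged_rollout(
    rings: Sequence[Tuple[str, Sequence[str]]],
    deploy: Callable[[str], None],      # hypothetical hook that pushes the update to one host
    is_healthy: Callable[[str], bool],  # hypothetical post-update health probe
    max_failure_rate: float = 0.0,
) -> RolloutResult:
    """Deploy one ring at a time and stop before the next ring if the current one looks unhealthy."""
    completed: List[str] = []
    for name, hosts in rings:
        for host in hosts:
            deploy(host)
        # In practice you would wait a "bake time" here before checking health.
        failures = sum(1 for host in hosts if not is_healthy(host))
        if hosts and failures / len(hosts) > max_failure_rate:
            return RolloutResult(completed, name)  # later rings never receive the update
        completed.append(name)
    return RolloutResult(completed, None)


# Usage with hypothetical hosts; "canary-2" simulates a machine that crashes after the update.
deployed: List[str] = []
broken = {"canary-2"}
rings = [
    ("internal", ["lab-1", "lab-2"]),
    ("canary", ["canary-1", "canary-2"]),
    ("general", ["prod-1", "prod-2", "prod-3"]),
]
result = staged_rollout(rings, deployed.append, lambda host: host not in broken)
print(result.completed_rings, result.halted_at)  # ['internal'] canary
```

In this run the failure in the canary ring stops the rollout, so the "general" hosts are never touched. The same structure could, in principle, let a customer assign its own machines to rings and accept vendor updates one ring at a time.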
Security Lessons from the 2024 Crowdstrike Incident
Many more lessons will become evident as new, more in-depth information emerges about this update failure. Still, here’s a run-through of some key takeaways from the 2024 Crowdstrike incident that are worth thinking about.
Overreliance on Specific EDR Providers
Perhaps the first thing that stands out is a more systemic issue: dependence on CrowdStrike’s EDR platform. Similar concentration exists in other areas of IT, not just cybersecurity (think of how Amazon and Microsoft dominate cloud computing). Homing in on the Fortune 500, the incident hit 100 percent of enterprises in this coveted list, and 43 percent of retailer and wholesaler companies.
Such widespread reliance on a single provider is a recipe for low-probability incidents having hard-hitting impacts. While CrowdStrike’s reputation for its Falcon product is arguably deserved, there are plenty of other capable EDR products on the market. Greater diversity in vendor selection can limit the reach of such widespread IT outages, because even the biggest vendors are not immune to faulty updates.
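One rough way to put a number on vendor concentration within your own environment is to borrow the Herfindahl-Hirschman index (HHI) from economics: the sum of each vendor’s squared share. This is a hedged illustration rather than an established security metric, and the inventory and vendor names below are hypothetical. On a 0 to 1 scale, relying on a single vendor scores 1.0, and an even split across n vendors scores 1/n.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def vendor_concentration(assignments: Iterable[str]) -> Tuple[Dict[str, float], float]:
    """Return each vendor's share of assets and the Herfindahl-Hirschman index (0-1 scale)."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}, 0.0
    shares = {vendor: n / total for vendor, n in counts.items()}
    hhi = sum(share * share for share in shares.values())
    return shares, hhi


# Hypothetical inventory: which EDR product protects each of ten critical endpoints.
inventory = ["vendor_a"] * 6 + ["vendor_b"] * 3 + ["vendor_c"] * 1
shares, hhi = vendor_concentration(inventory)
print(shares)         # {'vendor_a': 0.6, 'vendor_b': 0.3, 'vendor_c': 0.1}
print(round(hhi, 2))  # 0.46
```

Diversity carries its own costs (more consoles, more policies to keep consistent), so a score like this is best read as one input to a risk discussion, not a target to minimize.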
Incident Response and Disaster Recovery
The companies that fared best in the fallout from this disruption tended to have solid incident response plans with deeply integrated disaster recovery strategies. Many system admins struggled to restore systems whose drives were protected by BitLocker encryption: getting a machine past the blue screen of death meant booting into Safe Mode or the Windows Recovery Environment to remove the faulty file, and unlocking an encrypted drive there required its BitLocker recovery key. Often, those very keys were stored on servers that were also inaccessible.
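A practical check that follows from this is to regularly confirm that every encrypted device has a recovery key in a copy that does not depend on the same infrastructure. The sketch below assumes you can export two lists: device IDs from an asset inventory and key records from an offline escrow copy. The `EscrowedKey` record and `audit_offline_keys` function are hypothetical and do not reflect any real BitLocker, Active Directory, or Entra ID export format.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Iterable, List


@dataclass
class EscrowedKey:
    device_id: str
    exported_on: date  # when this device's key was last copied to the offline store


def audit_offline_keys(
    encrypted_devices: Iterable[str],
    offline_keys: Iterable[EscrowedKey],
    today: date,
    max_age_days: int = 30,
) -> Dict[str, List[str]]:
    """Report devices whose recovery key is missing from, or stale in, the offline copy."""
    latest: Dict[str, date] = {}
    for key in offline_keys:
        # Keep only the most recent export per device.
        if key.device_id not in latest or key.exported_on > latest[key.device_id]:
            latest[key.device_id] = key.exported_on
    missing: List[str] = []
    stale: List[str] = []
    for device in encrypted_devices:
        if device not in latest:
            missing.append(device)
        elif today - latest[device] > timedelta(days=max_age_days):
            # Keys can change (e.g. after re-encryption), so an old export may no longer work.
            stale.append(device)
    return {"missing": missing, "stale": stale}


# Usage with hypothetical devices and export dates.
devices = ["pos-01", "ws-17", "srv-03"]
keys = [
    EscrowedKey("pos-01", date(2024, 7, 1)),
    EscrowedKey("srv-03", date(2024, 3, 2)),
]
report = audit_offline_keys(devices, keys, today=date(2024, 7, 19))
print(report)  # {'missing': ['ws-17'], 'stale': ['srv-03']}
```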
Backing up BitLocker keys to some sort of offline storage medium would’ve saved affected businesses a lot of recovery time. More generally, disaster recovery strategies must be detailed enough to include things like:
- container technology to encapsulate applications and make them portable and easier to deploy across diverse environments.
- redundant hardware and networking configurations that automatically take over during a primary system failure.
- data replication methods, such as mirroring, to maintain real-time copies of your most important production data in separate physical or cloud environments.
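The automatic takeover described in the second item can be sketched as a priority-ordered health check: traffic goes to the first node that responds, and falls back to the standby when the primary does not. This is a minimal sketch assuming a hypothetical `health_check` probe; real failover (via DNS, load balancers, or clustering software) also has to deal with problems like split-brain, which this ignores.

```python
from typing import Callable, Optional, Sequence


def pick_active(
    nodes: Sequence[str],
    health_check: Callable[[str], bool],  # hypothetical probe, e.g. a TCP or HTTP check
    retries: int = 3,
) -> Optional[str]:
    """Return the first node, in priority order, that passes its health check.

    A node counts as healthy if any of `retries` checks succeeds,
    so one dropped probe does not trigger a failover.
    """
    for node in nodes:
        if any(health_check(node) for _ in range(retries)):
            return node
    return None  # nothing healthy: time for the manual recovery runbook


# Usage with a hypothetical status table in which the primary is down.
status = {"primary-db": False, "standby-db": True}
active = pick_active(["primary-db", "standby-db"], lambda node: status[node])
print(active)  # standby-db
```

Note that redundancy only helps if the standby does not share the failure: a standby running the same faulty endpoint agent would likely have crashed alongside the primary in July 2024.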
View disaster recovery from the perspective of planning for all possible causes of downtime so that you don’t overlook something vital.
The Need for Robust Testing
While a really solid disaster recovery plan could’ve helped mitigate some of the damage from the CrowdStrike update fail, the blame rests mostly with CrowdStrike. There’s not much any Falcon user could’ve done to avoid this incident. But zooming out a bit, the fact that a company as large as CrowdStrike missed this bug underscores the need for proper code testing and for validation in staged environments before rollout.
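CrowdStrike’s own root cause analysis attributed the crash to content that referenced an input field the sensor did not actually supply, which led to an out-of-bounds memory read. The sketch below shows the kind of pre-release check that targets this class of bug: every field a rule inspects must fall within the inputs the consuming code really provides. The `Rule` structure, rule names, and numbers are hypothetical and simplified; they are not CrowdStrike’s content format.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Rule:
    name: str
    field_indices: Sequence[int]  # which sensor-supplied inputs this rule inspects


def check_rules(rules: Sequence[Rule], supplied_inputs: int) -> List[str]:
    """Flag every rule that would read past the inputs the sensor actually provides."""
    problems: List[str] = []
    for rule in rules:
        for idx in rule.field_indices:
            if not 0 <= idx < supplied_inputs:
                problems.append(
                    f"{rule.name}: field {idx} out of range (0..{supplied_inputs - 1})"
                )
    return problems


# Usage: block the release if any rule references a field that will not exist at runtime.
rules = [Rule("pipe-a", [0, 3, 7]), Rule("pipe-b", [2, 20])]
problems = check_rules(rules, supplied_inputs=20)
print(problems)  # ['pipe-b: field 20 out of range (0..19)']
```

A static check like this complements, rather than replaces, actually running the update on real test machines in a staged environment before it ships.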
Beyond testing code, there’s a wider point here about the value of regularly testing your entire IT ecosystem for potential single points of failure. Few companies had considered the single point of failure introduced by relying on CrowdStrike’s Falcon product. A thorough penetration test carried out by an experienced team can flag the lesser-spotted vulnerabilities and weak points that automated scans might miss.
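As a simple illustration of hunting for single points of failure, you can model which components each critical service depends on and look for components that every critical service relies on, directly or indirectly. This sketch uses a plain set intersection over transitive dependencies (not a full graph-theoretic analysis), and the service and component names are hypothetical. It can only find what is recorded in your inventory, which is exactly why an outside review is valuable.

```python
from typing import Dict, Iterable, List, Set


def transitive_deps(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """All components `start` depends on, directly or indirectly (iterative DFS)."""
    seen: Set[str] = set()
    stack = list(graph.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def shared_dependencies(graph: Dict[str, List[str]], critical: Iterable[str]) -> Set[str]:
    """Components that every critical service relies on: candidate single points of failure."""
    dep_sets = [transitive_deps(graph, service) for service in critical]
    return set.intersection(*dep_sets) if dep_sets else set()


# Usage with a hypothetical dependency map (service -> what it directly depends on).
graph = {
    "checkout": ["web-01", "payments-api"],
    "payments-api": ["app-02", "db-cluster"],
    "web-01": ["windows-host", "edr-agent"],
    "app-02": ["windows-host", "edr-agent"],
    "db-cluster": ["linux-host"],
    "booking": ["app-02"],
}
print(sorted(shared_dependencies(graph, ["checkout", "booking"])))
# ['app-02', 'edr-agent', 'windows-host']
```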
Our team at DIESEC uses specialist skills and knowledge to audit your environment and unearth weak points and vulnerabilities you may never have known about or planned to mitigate. We’ll give you detailed final reports that outline specific actions you can take to improve your security and IT infrastructure.