Lessons from the CrowdStrike outage: A wake-up call for software testing


In what may prove to be the most significant IT outage in history, the CrowdStrike incident is likely to cost upwards of $5 billion in damages, while also wiping billions off the cybersecurity company's market value almost overnight. For organizations attempting to update and release software faster than ever, the lessons from the havoc wreaked by a single faulty endpoint security update are especially pertinent.

The CrowdStrike outage is a stark illustration of how complex software development and delivery have become. The growing intricacy of the modern digital environment, combined with the speed and scale of software change, makes quality and reliability ever harder to achieve. Even when quality is a primary focus, as it was for CrowdStrike, delivering consistently high standards is no easy feat. Effective testing strategies are essential.

Yet it's not uncommon for teams to push testing to the bottom of the priority pile to keep pace with delivery expectations. Too often, testing is seen as a burden rather than a significant driver of value.

Chief Product and Strategy Officer at Tricentis.

From bottleneck to business enabler

With the right approach, testing can harness automation to provide continuous, actionable feedback and to identify and target risks and defects early. Far from hindering speed, testing can actually shorten release cycles while improving the quality of products and services.

Automated testing has revolutionized software development, enabling teams to keep pace with the increasing speed and scale of projects. Compared with manual approaches, it improves test coverage and accuracy while delivering faster feedback, which is crucial for continuous integration and continuous deployment (CI/CD) processes.

Automated testing is particularly effective for regression testing and change validation, and it can also pinpoint which code changes need testing, making the workflow more precise and efficient. The advent of GenAI-powered testing tools elevates this process further by significantly reducing testing time and freeing up resources to investigate complex issues that AI alone might miss.
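
To make "pinpointing code changes" concrete, here is a minimal Python sketch of change-based test selection (often called test impact analysis). It assumes a mapping from source files to the tests that exercise them already exists; `TEST_MAP`, the file paths and the test names are all hypothetical, and real tools typically derive such a map from per-test coverage data.

```python
from typing import Dict, Iterable, Set

# Hypothetical mapping from source files to the tests that exercise them.
# In practice this might be built from per-test coverage data.
TEST_MAP: Dict[str, Set[str]] = {
    "billing/invoice.py": {"test_invoice_totals", "test_invoice_pdf"},
    "billing/tax.py": {"test_tax_rates", "test_invoice_totals"},
    "auth/login.py": {"test_login", "test_lockout"},
}


def select_tests(changed_files: Iterable[str], test_map: Dict[str, Set[str]]) -> Set[str]:
    """Return the tests that exercise any of the changed files.

    A changed file with no mapping has no impact data, so we conservatively
    fall back to running every known test instead of skipping it.
    """
    selected: Set[str] = set()
    for path in changed_files:
        tests = test_map.get(path)
        if tests is None:
            return set().union(*test_map.values())  # unknown file: run the full suite
        selected |= tests
    return selected


print(sorted(select_tests(["billing/tax.py"], TEST_MAP)))
# ['test_invoice_totals', 'test_tax_rates']
```

Falling back to the full suite for unmapped files is a deliberate, conservative choice: skipping tests for code with no impact data would trade safety for speed.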

But it's not just about building test automation coverage; it's also about ensuring the right environments are in place to test changes, from development and staging to pre-production and production. To create a testing strategy that delivers quality while limiting exposure to risk and downtime, it's vital to adapt testing to the problem being solved. That means considering who and where the user is, their risk profile, how they use the application, and whether it runs on-prem or in the cloud.
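
The idea of testing a change in each environment before promoting it further can be sketched as a simple gate. This is an illustrative Python sketch, not the behavior of any particular CI/CD product: the environment list mirrors the one above, and each per-environment check is a hypothetical callable standing in for "deploy there and run that environment's tests".

```python
from typing import Callable, Dict, List, Tuple

# Ordered environments a change must pass through (mirrors the text above).
ENVIRONMENTS = ["development", "staging", "pre-production", "production"]


def promote(build_id: str, checks: Dict[str, Callable[[str], bool]]) -> Tuple[List[str], str]:
    """Promote a build one environment at a time, halting at the first failure.

    `checks` maps each environment to a hypothetical test-suite callable.
    A missing check counts as a failure (fail closed).
    """
    passed: List[str] = []
    for env in ENVIRONMENTS:
        check = checks.get(env)
        if check is None or not check(build_id):
            return passed, f"{build_id} halted at {env}"
        passed.append(env)
    return passed, f"{build_id} fully promoted"


# Usage: the staging suite fails, so the build never reaches production.
checks = {
    "development": lambda build: True,
    "staging": lambda build: False,
    "pre-production": lambda build: True,
    "production": lambda build: True,
}
print(promote("build-123", checks))
# (['development'], 'build-123 halted at staging')
```

Treating a missing check as a failure means a build cannot reach production simply because nobody configured tests for one of the environments.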


User-centric testing across environments: adapting strategies to mitigate risk

Understanding users' needs and how they use a product across multiple environments is fundamental to deciding where to focus testing efforts. User profiles can help recreate real-life situations, making sure that potential use cases and environments are covered. By aligning testing with user behaviors, requirements and environments, teams can prioritize high-impact areas and identify issues early. This thorough approach helps ensure reliable performance across diverse operating systems, application technologies and device types, averting failures and delivering a consistent, high-quality user experience regardless of how customers access software applications.
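
One way to turn user profiles into testing priorities is a simple risk score. The Python sketch below assumes each profile carries an estimated share of users and a 1 to 5 impact rating; the profiles, the numbers and the score itself (usage share multiplied by impact) are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserProfile:
    name: str
    platform: str       # e.g. "windows", "android"
    deployment: str     # "on-prem" or "cloud"
    usage_share: float  # estimated fraction of users matching this profile (0 to 1)
    impact: int         # business impact of a failure: 1 (low) to 5 (critical)

    @property
    def risk(self) -> float:
        """Simple risk score: share of users affected times severity of impact."""
        return self.usage_share * self.impact


def prioritize(profiles: List[UserProfile]) -> List[UserProfile]:
    """Return profiles ordered from highest to lowest risk score."""
    return sorted(profiles, key=lambda p: p.risk, reverse=True)


profiles = [
    UserProfile("retail kiosk", "windows", "on-prem", 0.15, 5),
    UserProfile("office laptop", "windows", "cloud", 0.60, 3),
    UserProfile("field tablet", "android", "cloud", 0.25, 2),
]
for p in prioritize(profiles):
    print(f"{p.name}: {p.risk:.2f}")
# office laptop: 1.80
# retail kiosk: 0.75
# field tablet: 0.50
```

A real scheme might weight critical impact more heavily, so that a small but business-critical user group is not pushed too far down the list.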

Surges in traffic and demand are among the scenarios it's vital to plan for. Higher loads can cause critical systems to grind to a halt or crash, and the recent Oasis ticket sale fiasco showed that performance and load testing are essential. These tests help minimize the unexpected consequences of increased website and application load, protecting performance and reliability. Performance testing is a vital element of the overall testing strategy and should be incorporated throughout the software development life cycle (SDLC) to identify potential bottlenecks and ensure systems are prepared for real-world demand.
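
At its core, a load test fires many concurrent requests and watches error rates and tail latency. The minimal Python sketch below uses only the standard library; `fake_checkout` is a hypothetical stand-in for a real endpoint with a fixed number of capacity slots, and in practice teams would point a dedicated load-testing tool at a realistic staging environment instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles
from threading import BoundedSemaphore
from typing import Callable, Dict, Tuple


def fake_checkout(capacity: BoundedSemaphore, work_s: float = 0.01) -> bool:
    """Hypothetical checkout endpoint with a fixed number of capacity slots.

    Returns False (a rejected request) when every slot is already busy.
    """
    if not capacity.acquire(blocking=False):
        return False
    try:
        time.sleep(work_s)  # simulate processing time
        return True
    finally:
        capacity.release()


def run_load(call: Callable[[], bool], requests: int, concurrency: int) -> Dict[str, float]:
    """Fire `requests` calls using `concurrency` worker threads.

    Reports the fraction of failed calls and the 95th-percentile latency.
    """
    if requests < 1:
        raise ValueError("requests must be at least 1")

    def timed(_: int) -> Tuple[bool, float]:
        start = time.perf_counter()
        ok = call()
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed, range(requests)))

    latencies = [latency for _, latency in results]
    errors = sum(1 for ok, _ in results if not ok)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = quantiles(latencies, n=20)[-1] if len(latencies) > 1 else latencies[0]
    return {"error_rate": errors / requests, "p95_s": p95}


if __name__ == "__main__":
    slots = BoundedSemaphore(5)
    for workers in (5, 50):
        stats = run_load(lambda: fake_checkout(slots), requests=200, concurrency=workers)
        print(f"{workers} workers: {stats}")
```

With 5 workers the error rate is zero, because concurrency never exceeds the 5 capacity slots; with 50 workers, some requests are likely to be rejected, which is exactly the kind of breaking point a load test is meant to surface before real users do.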

Identifying infrastructure & supply chain risk

However, it's not enough to consider only an organization's own systems; integrations with third-party systems must also be robust and reliable. Although the Oasis ticket sale sites had taken steps to manage load, including putting a new queue counter system in place, external payment systems still struggled.

It is therefore vital to identify and align with all third-party providers involved in critical business applications, and to review their policies and procedures for managing faulty code that might originate with them. Service virtualization can be used to simulate services outside an organization's control, allowing teams to test how their applications will perform under various conditions. Chaos testing of these integrated systems is also an important element of a complete test strategy, uncovering vulnerabilities and building resilience against unexpected failures. Closely examining infrastructure and application integration points in this way reduces risk from external sources and supports an efficient, secure ecosystem.
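
Service virtualization and chaos-style fault injection can be combined in a small test double. In this hypothetical Python sketch, `VirtualPaymentGateway` simulates a third-party payment API and fails on a configurable fraction of calls, while `checkout` represents the organization's own code, which retries and then degrades gracefully. None of these names refer to a real payment provider's API.

```python
import random


class PaymentUnavailable(Exception):
    """Raised by the virtual gateway to mimic a third-party outage."""


class VirtualPaymentGateway:
    """Hypothetical stand-in for an external payment API outside our control.

    `failure_rate` injects faults chaos-style: roughly that fraction of calls raise.
    """

    def __init__(self, failure_rate: float = 0.0, seed: int = 42) -> None:
        self.failure_rate = failure_rate
        self.calls = 0
        self._rng = random.Random(seed)  # seeded so fault patterns are reproducible

    def charge(self, order_id: str, amount_cents: int) -> str:
        self.calls += 1
        if self._rng.random() < self.failure_rate:
            raise PaymentUnavailable(f"payment gateway error for order {order_id}")
        return f"txn-{order_id}-{amount_cents}"


def checkout(gateway: VirtualPaymentGateway, order_id: str, amount_cents: int, retries: int = 2) -> str:
    """Application code under test: retry, then degrade gracefully instead of crashing."""
    for _ in range(retries + 1):
        try:
            return gateway.charge(order_id, amount_cents)
        except PaymentUnavailable:
            continue  # a real client would also back off between attempts
    return "queued-for-retry"


# Usage: run the same checkout flow against a healthy and a flaky dependency.
for rate in (0.0, 0.3):
    gateway = VirtualPaymentGateway(failure_rate=rate)
    outcomes = [checkout(gateway, f"order{i}", 1999) for i in range(100)]
    queued = outcomes.count("queued-for-retry")
    print(f"failure_rate={rate}: {queued} of 100 orders queued, {gateway.calls} gateway calls")
```

Seeding the random generator keeps fault injection reproducible, so a failing test run can be replayed exactly rather than chased as a flaky test.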

It can be hard to know where to test, especially when resources are limited and prioritization is required. Tools that look inside the code, alert teams to untested areas and provide visibility into the impact of changes can pay dividends. As the pace of change accelerates, this will become increasingly important.
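
Alerting teams to untested areas can start with something as simple as intersecting a diff with coverage data. This Python sketch assumes both are already available as per-file sets of line numbers (for example, exported from a diff tool and a coverage tool); the file names and line numbers are made up for illustration.

```python
from typing import Dict, Set


def untested_changes(changed: Dict[str, Set[int]], covered: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """Return, per file, the changed line numbers that no test executed."""
    gaps: Dict[str, Set[int]] = {}
    for path, lines in changed.items():
        # A file with no coverage data at all counts as entirely untested.
        missing = lines - covered.get(path, set())
        if missing:
            gaps[path] = missing
    return gaps


# Hypothetical diff and coverage data for two files.
changed = {"billing/tax.py": {10, 11, 12}, "auth/login.py": {40}}
covered = {"billing/tax.py": {1, 2, 10, 11}}
print(untested_changes(changed, covered))
# {'billing/tax.py': {12}, 'auth/login.py': {40}}
```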

Indeed, the majority of SaaS applications continuously release updates. Where application updates are dependent on external providers, automated and fast regression testing is imperative for businesses to minimize risk.

When an accelerated development tempo creates bottlenecks and heightens testing demands, no-code or low-code platforms, used alongside AI-generated code, can also help development teams deliver updates and new features more rapidly. Companies should incorporate AI-driven automation into their testing toolset so that testing does not slow innovation.

Redefining software testing for a safer digital future

Staying error-free while keeping up with software change in today's complex, fast-paced digital world is a hard task. Still, we must strive to minimize the potential impact of failures. The CrowdStrike and Oasis events should be treated as a wake-up call for businesses everywhere to reevaluate how they implement software testing, ensuring software quality and reliability throughout the development life cycle. Only by doing this can we hope to prevent similar problems in the future.


This article was produced as part of TechRadarPro's Expert Insights channel where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro
