As security professionals, and although we are not experts in all areas, we must be aware of the risks present in our daily environment and take preventive measures.
by Gigi Agassini, CPP*
In today's hyper-connected world, our reliance on digital infrastructure is immense. From businesses to governments, and from personal communications to critical national services, the digital realm is the backbone of modern society. However, this dependence also exposes us to significant risks.
The recent CrowdStrike failure, which had a global impact, paralyzing entire operations and causing international chaos, serves as a stark reminder of how vulnerable and fragile we are. It highlighted, without a doubt, the deficiencies in our response plans, processes, policies, controls, communication, resources, and training.
The digital landscape is fraught with risks, ranging from cyberattacks to technical failures and human error. It is common to focus heavily on a single part of our environment and fail to consider every point of failure. This overconfidence comes from feeling "protected or safe" because of the investments we have made in recognized manufacturers, in certain tools, in process changes, or even in the development of new response plans.
It is clear that we are more fragile than we think, and we must pay greater attention to our internal organizational environment without neglecting the external environment and the entire supply chain. We shouldn't "assume" that seemingly small and harmless processes, such as "change control," are under control just because we expect applications to work as they should.
Ignoring that this small piece can make a big difference is a mistake. It is crucial to recognize the importance of each element in our processes to avoid surprises that can significantly affect the operation.
Throughout the supply chain, all software development (regardless of the brand) involves humans, and that simple fact opens up a margin of error. Our job is to make sure we put in place the measures, policies, and controls necessary to prevent incidents like this one, especially in critical products.
For this reason, this type of incident will occur again; it would be naive to think otherwise. Industry consolidation around a few vendors means that more and more organizations and institutions will be affected the next time a major software bug occurs. Today it happened to one particular brand, but all vendors are exposed to the same risk, as is the in-house software that some organizations develop for internal use.
This creates certain "gaps," because security teams that are already stretched to their limits will only see their ability to respond worsen. People working in security are suffering from fatigue and work-related stress, especially given the changes in the markets and the speed at which they now move. There is also a labor shortage in the industry, which undoubtedly increases existing risks and creates new challenges in operations.
As security professionals, and although we are not experts in all areas, we must be aware of the risks present in our daily environment and take preventive measures. Some of these risks, which are increasing significantly, need to be integrated into our incident response plans, processes, controls, and policies:
Cyberattacks: Malicious actors, including hackers and nation-states, are constantly looking to exploit vulnerabilities in digital systems. These attacks can lead to data breaches, financial losses, and service interruption.
Technical failures: Hardware and software failures can bring down entire networks. Whether due to outdated technology, insufficient maintenance, or unforeseen errors, these failures can bring operations to a halt.
Human error: Mistakes made by employees or administrators can lead to significant security breaches or system failures. This includes everything from poor password management to misconfigured systems.
Natural disasters: Events such as earthquakes, floods, and hurricanes can damage physical infrastructure, leading to extended outages and data loss.
Our fragility in the digital world proves that no matter the size of the organization or its standing within the industry, we are all vulnerable to human error or cyberattacks. As noted at the outset, CrowdStrike, a leading cybersecurity company, suffered a significant failure that disrupted the services of multiple organizations globally, affecting hospitals, governments, and other critical operations. This incident revealed several crucial issues that we need to consider in order to strengthen our defenses and mitigate future risks.
Review of disaster recovery plans: Although the CrowdStrike failure was not a cybersecurity incident, it should be treated as one for the simple reason that it was powerful enough to stop entire operations and cause incident response and business continuity plans to break down. This gives CISOs an opportunity to review their response and preparedness plans for similar crises, using this case to simulate how they would react to a comparable problem in the future, because this was no minor event.
Expectation of repetition: As I noted earlier, mistakes like this are inevitable due to human intervention in software development, so to think that it is an "isolated and unique" event is unrealistic. Organizations need to assume that they will happen again (we don't know when) and be prepared with robust recovery plans to minimize the impact.
Software vendors: Even the most trusted and well-known vendors can fail, as the unfortunate CrowdStrike experience showed. It's crucial not to rely solely on reputation. Instead, your tool assessments should verify vendor commitments through clear service level agreements (SLAs), and you should manage supplier risk with a mature approach. This has to be part of your incident response plan.
All senior executives should ask themselves whether they have a process for holding suppliers accountable, with service level agreements that cover times of crisis. After this incident, everyone, including software vendors, will probably learn from the experience.
Crisis communication: Reviewing and strengthening internal communication plans is essential. Companies must ensure that they can communicate effectively with their employees through alternative channels during a crisis to keep operations running.
After communicating risks, it is essential to focus on clear mitigation strategies. This means outlining specific actions and contingency plans to manage each risk and ensuring they align with organizational objectives and risk tolerance.
Stakeholders need to see the proactive steps being taken to address potential threats. As a security professional, it is crucial to understand the needs of each stakeholder and build strong advisory relationships.
This approach helps to alleviate some concerns, build trust, and ensure support for risk management efforts. Well-defined mitigation strategies are vital to integrating risk management into organizational strategy and securing stakeholder trust.
Resource management: As part of the remediation of the CrowdStrike failure, several sites required processes that had to be carried out manually. This generated another crisis, because the ease of performing many processes remotely or centrally today has led to the reduction or even elimination of local resources. The incident shows that it is necessary to reconsider having local personnel.
Having staff on site is essential for rapid remediation. During the CrowdStrike outage, some organizations struggled because they had no staff available to handle issues manually. Maintaining a backup team, even a minimal one, is crucial to recovery.
Insufficient training: Many organizations did not have the proper training to deal with significant disruptions. The lack of preparation left employees without a clear response protocol, leading to confusion and reduced operational efficiency.
This gap in training reveals the urgent need to establish continuing education programs that equip employees with the skills and knowledge needed to effectively manage crises and minimize the impact on the organization.
Although some organizations have not yet recovered 100% of their operations, and the global impact is unfortunate, especially in critical sectors such as hospitals and airports, it is crucial to draw lessons from what happened. Learning from this incident will allow us to strengthen our strategies and better prepare for future crises, minimizing risks and improving organizational resilience. So it is important to consider:
Comprehensive risk management: Organizations need to adopt comprehensive risk management strategies that consider all potential threats, including those from trusted partners.
Improved redundancy: Implementing redundancy in critical systems can help mitigate the impact of failures. This includes having backup cybersecurity measures and alternative providers.
Improving response plans: Developing and regularly updating incident response plans is essential. These plans should include clear protocols for different types of incidents and regular drills to ensure preparedness.
Ongoing training: Ongoing training for employees is vital. This includes not only technical training but also cybersecurity awareness education and best practices.
Regular audits: Conducting regular audits of digital infrastructure and security measures can help identify and fix vulnerabilities before they are exploited.
It is important to strengthen our digital defenses through the following strategies:
Investment in technology: Investing in up-to-date technology can prevent many technical failures. This includes not only cybersecurity tools but also infrastructure upgrades.
Collaboration and sharing: Sharing threat information and best practices across organizations can help build a more robust collective defense.
Government and industry standards: Governments and industry bodies must work together to establish and enforce cybersecurity and digital infrastructure resilience standards.
Public awareness: Increasing public awareness of digital risks can help individuals and smaller organizations take the necessary precautions.
The digital world offers immense opportunities, but it also exposes us to significant risks. The CrowdStrike failure is a reminder of how vulnerable we are and highlights the need for robust response plans, policies, controls, and training. By learning from such incidents and taking proactive measures, we can build a more resilient digital infrastructure capable of withstanding the multiple threats we face.
In a world where our daily lives and critical services are increasingly digital, addressing these vulnerabilities is not only a technical necessity but a societal imperative. The time to act is now, before the next failure or attack reveals our weaknesses again.
Until next time!
*Gigi Agassini, CPP
International Security Consultant
GA Advisory