Is an (Un)Security Update Dangerous? Who Should Take Responsibility, Microsoft or CrowdStrike?
Two days ago, we witnessed a rare incident in which many computer systems crashed, disrupting public services. The affected computers run the Windows operating system, which crashed suddenly because of a bug in software installed on them.
Why did it happen suddenly? The bug was delivered to the systems by the automatic download mechanism in the CrowdStrike software. For many years I have been telling our clients that, from a security point of view, automatic updates are merely another backdoor. The automatic update mechanism has existed for many years and is becoming more common, and this incident has demonstrated its downside. We should not enable automatic updates on a production system, so that other entities cannot cause unwanted behavior on a system we are supposed to control. Typically, we should harden a system before it goes into production and disable any mechanism that works against our measures to contain and maintain it.
This risk is real, and it has already materialized; many operating systems deployed in business still have automatic updates enabled, and that also applies to Linux distributions such as Red Hat.
In the case of this weekend's massive wave of crashes, in which an update from CrowdStrike software on Windows triggered a kernel panic, known on Windows as the Blue Screen of Death (BSoD), who is responsible? Is it Microsoft or CrowdStrike?
For anyone unfamiliar with the BSoD, it is an OS event triggered when machine code does something wrong with hardware-related access or addresses a resource incorrectly. One common cause is accessing a memory address that is not valid or not mapped; in other words, what is known as a segmentation fault or access violation. A BSoD happens when such a fault occurs in kernel-space code. In user space, the kernel handles the violation, typically by terminating the offending process, so no BSoD occurs. Indeed, it is difficult for an ordinary application developer to cause this error; it takes a system developer writing kernel modules or drivers.
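The user-space half of that explanation can be tried safely. The following is a minimal sketch, assuming a standard CPython installation; the names FAULTY_CHILD and run_faulty_child are made up for this illustration. A child process deliberately reads from address 0, the kernel contains the fault inside that one process, and the parent and the operating system keep running. The exact exit status depends on the platform, so the sketch only reports it.

```python
import subprocess
import sys

# Child program: deliberately read from address 0, which is never a valid
# address for an ordinary user-space process.
FAULTY_CHILD = "import ctypes; ctypes.string_at(0)"


def run_faulty_child() -> int:
    """Run the faulting code in a separate process and return its exit status.

    The kernel stops the fault at the process boundary: the child dies
    (or, on some platforms, Python reports the access violation as an
    exception), and nothing else on the machine is affected.
    """
    result = subprocess.run(
        [sys.executable, "-c", FAULTY_CHILD],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


if __name__ == "__main__":
    status = run_faulty_child()
    # A nonzero status means the child failed; the parent is unaffected.
    print(f"child exit status: {status}")
    print("parent process and operating system are still running")
```

The same invalid access made by a driver has no such boundary around it: there is no higher authority left to terminate the offender, so the whole system stops.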
As you may see in the picture below, there is a clear separation between the application programmer's (user) space and the system programmer's (kernel) space. Once we, as programmers, write code that runs in kernel space (for example, drivers or kernel modules), we can easily bring down the whole system. The CrowdStrike update is a *.sys file that is processed in kernel space; therefore, the bug it triggered is fully responsible for the BSoD that struck so widely last weekend.
As we know, only trusted entities should be granted access to kernel space; otherwise, untrusted code will make the system unstable. The kernel is a very delicate and complex system that requires careful attention; therefore, a strong shell and authentication model sits between kernel space and user space to protect the kernel from abusive access. Based on this understanding, we raise some questions about this issue:
- Why does Microsoft trust a vendor (in this case, CrowdStrike) to access its kernel without thorough testing or an acceptance test before the code is released to the public?
- How are testing and integration conducted? Code that will be released to the public should be well tested and pass a User Acceptance Test before it is published.
- Why is there no quick-response mitigation to prevent such an incident? In other words, why is there no automatic rollback mechanism when an update goes wild like this?
- Historically, an antivirus program also caused BSoDs in 2014. History apparently repeats itself: like that earlier product, CrowdStrike is also trying to mitigate malware, including computer viruses. Why didn't Microsoft learn from history?
- Which is worse: being struck by a cyberattack or being struck by a reckless update?
Most of the questions above are rhetorical and not meant to be answered. Answering them might take a very long time; they are intended to sharpen our analytical thinking and help us understand the issue better.
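Even so, the question about automatic rollback can at least be illustrated. The following is a minimal sketch of a staged rollout that rolls back on the first unhealthy stage. Everything in it is hypothetical: the Host class, the staged_rollout function, the stage sizes, and the health check are invented for this illustration and do not describe how CrowdStrike or Microsoft actually deploy updates.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Host:
    """A simulated machine that remembers its previously installed versions."""
    name: str
    version: str = "v1"
    history: List[str] = field(default_factory=list)

    def install(self, version: str) -> None:
        self.history.append(self.version)  # keep the old version for rollback
        self.version = version

    def rollback(self) -> None:
        if self.history:
            self.version = self.history.pop()


def staged_rollout(
    hosts: Iterable[Host],
    new_version: str,
    is_healthy: Callable[[Host], bool],
    stage_sizes: Iterable[int] = (1, 3),
) -> bool:
    """Install new_version stage by stage.

    If any host in a stage is unhealthy, every host updated so far is
    rolled back and the rollout stops. Returns True only if all hosts
    ended up on new_version.
    """
    remaining = list(hosts)
    updated: List[Host] = []
    # After the small early stages, the final stage takes all remaining hosts.
    sizes = list(stage_sizes) + [len(remaining)]
    for size in sizes:
        stage, remaining = remaining[:size], remaining[size:]
        for host in stage:
            host.install(new_version)
            updated.append(host)
        if not all(is_healthy(host) for host in stage):
            for host in updated:
                host.rollback()
            return False
        if not remaining:
            break
    return True


if __name__ == "__main__":
    fleet = [Host(f"host-{i}") for i in range(5)]
    # Assumed health check: a version tagged "-bad" stands in for a crash.
    ok = staged_rollout(fleet, "v2-bad", lambda h: not h.version.endswith("-bad"))
    print(ok, [h.version for h in fleet])
```

With a faulty version, only the single host in the first stage is ever touched before the rollback. In a real crash scenario the health signal would have to come from outside the host, for example a missing heartbeat, because a machine stuck on a BSoD cannot report its own state.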
This issue is NOT CrowdStrike's problem alone; it is also Microsoft's, as the owner of the Windows operating system. Untidy software update practices open the door to vulnerabilities, including bugs and backdoors (in the past, at least). This gives attackers many opportunities to attack the system, gain access, and cause disruption, which results in loss of data and, moreover, loss of business opportunities.
Application bugs are everywhere.
Nothing is perfect, including software, which normally has bugs. Technical problems are everywhere in the technology we implement; this is part of our lives. They are not something to complain about; instead, we have strategies and tactics to mitigate and reduce the risks. In software development, there is the SDLC (software development life cycle), which includes complete testing of any software before it is placed in a production system. There are also regression tests, and it is common to test against many versions of the components involved.
We turn to governance when technical measures reach their limits. Governance and the application of procedures then step in to protect the system and keep its operation from going wild. This is largely Microsoft's role as the principal and real owner of the piece of software called the Windows operating system, and it comes down to how they run their governance to protect their customers, clients, and stakeholders.
In Linux and other nonproprietary software, governance works differently, including ownership and liability, so the level of responsibility may differ too. Red Hat, as an example of a Linux distributor, is NOT the owner of the Linux kernel, since the kernel is licensed to the public under the GPL. Therefore, responsibility for CrowdStrike software installed on a machine lies with whoever installed it, NOT with the owner of the Linux kernel (that is, the public).
Yes, indeed: Microsoft shares responsibility for this problem along with CrowdStrike.
Jakarta, 21 July 2024 ~Julyanto S