Learn About Containment, Eradication and Recovery in Cybersecurity Incident Response

Welcome to the fourth episode of our Energy Talks miniseries titled, Why Should You Talk About Incident Response? Join OMICRON cybersecurity consultant Simon Rommer as he explores the different process steps involved in cybersecurity incident response alongside other experts from the power industry.

In this episode, Simon speaks with Stephan Mikiss, who is Head of Managed Security Services at SEC Consult and a SANS-certified forensics analyst based in Vienna, Austria. Simon and Stephan discuss the steps of containment, eradication and recovery in the incident response process and highlight the need for collaboration between IT and OT teams to effectively manage cybersecurity incidents. 

Simon and Stephan also explore the iterative nature of incident response, the unique challenges posed by OT environments, and the necessity of understanding both the business model and the attacker's motives to make informed decisions during a crisis.


“Decision-making processes must be well thought out and documented, but since every incident is different, they must also allow for adaptability.”

Stephan Mikiss

Head of Managed Security Services, SEC Consult

Here Are the Key Topics from This Episode

1. Continuous Incident Response Cycle: Stephan Mikiss explains that incident response is a continuous cycle of identification, containment, eradication, and recovery. These steps overlap and require constant reassessment to manage risks and maintain operations, ultimately protecting the business from financial losses.

2. Importance of Decision-Making: Stephan emphasizes the need for a well-thought-out decision-making process in incident response. Preparation is crucial, but flexibility within the framework is essential to adapt to the unique aspects of each incident. 

3. OT-IT Convergence Challenges: Stephan and Simon discuss the rapid OT-IT convergence, emphasizing the need for specialized tools and expertise. Effective incident response requires understanding both business and attacker objectives to ensure safe operations.

4. Role of Crisis Boards in Incident Response: Stephan emphasizes that the security team is just one part of the overall crisis board during an incident. Effective incident response requires alignment between technical security teams and high-level management to ensure decisions are well-informed and authorized.

Scott: Hello everyone! My name is Scott Williams from the podcast team at OMICRON. This is the fourth episode in our Energy Talks podcast miniseries about cybersecurity titled "Why Should You Talk about Incident Response?" Your host for this miniseries is Simon Rommer, who is an OT Security Consultant in the OMICRON Power Utility Communications Team. Simon continues to explain the steps involved in the incident response process with his guests. So, without further delay, I hand over the microphone to Simon. Hi Simon!

Simon: Thank you, Scott. I welcome our listeners to this fourth episode in our Energy Talks cybersecurity miniseries, where we explore the critical roles of IT and OT in power system cybersecurity and discuss the steps involved in the incident response process according to SANS. In our previous episode, I spoke with Johann Stockinger, Head of Digital Forensics and Incident Response at the Deutsche Telekom Security Operations Center in Vienna.


We discussed the second step in the incident response process, called identification. In this episode, my guest is Stephan Mikiss, who is Head of Managed Security Services at SEC Consult. SEC Consult is known for security services ranging from penetration tests to incident response. Stephan is also a SANS-certified forensics analyst. We will discuss the third, fourth and fifth parts of the incident response process: containment, eradication and recovery. Together we will find out why we are grouping these in one episode and why it makes sense to do so.

Simon: Stephan, welcome to this episode of Energy Talks about incident response.

Stephan: Thank you, Simon. It's a pleasure being here.

Simon: We teased it a little bit in the introduction, but from your point of view, what are the main steps after identifying a threat?

Stephan: That's an interesting question because, to start with, I don't believe that you really complete the identification phase and only afterwards go into containment, recovery, et cetera. It's more of a living cycle between identification, containment, and eradication, partly also with recovery afterwards. So I would say the main steps are to make up your mind about what you need to do to gain control over the network again. Or, if you're not that far in the investigation yet, to go a few steps back and think about what's needed to reduce the risk to an acceptable level, so that you can at least continue your business with, let's say, emergency operations, and also reduce the risk of an even bigger impact on your business.

Simon: So basically, it's a cycle of learning more about what the attacker did while keeping your systems running and your operations going, so that you don't lose too much money by having to shut down.

Stephan: Yes, exactly. Because in the end, incident response is there to prevent the business from suffering huge financial losses. So, it's a matter of protecting the business for business purposes.

Simon: This is also the first answer to the question of why we're covering these steps together: they basically happen at the same time. During the investigation you learn more about the attack, and then you can react differently, reactivate certain parts of the network and keep operations going, which is very critical for us in the energy sector. So, what is, in your opinion, the most important part of these steps or of this process?

Stephan: From my point of view, the most important part is to have a well-thought-out decision-making process, honestly. Of course, it's important to be capable of implementing containment measures and so on, but that's, I would say, the baseline that you need to provide. You can come up with it on the fly, but who wants to do that at 2 a.m.? So usually, you want to have that prepared. And then it comes down to having a solid understanding of how you come to a decision, and who is responsible for and allowed to make the decision. It's always a balance between what we need to do to protect the business and what we need to do to protect the IT infrastructure. From a defending point of view, we must not block ourselves too much, for example by making irreversible changes to the overall system, which could lead to even more financial impact. Of course, we also have to look at the obligations that we have, regulations, et cetera, that we need to abide by. But we also need to bring in the mindset that you're acting against a real human being. So, whatever you do leads to a reaction from your opponent. You need to take all those different points into account and then come up with a decision. That's why I also like the OODA loop: observe, orient, decide and act. It really helps in this kind of dynamic situation because it has a natural flow that supports you in making decisions, executing them, and following exactly that process. From my point of view, that really helps crisis managers and incident managers reach decisions that, even if they are not always right, are at least well thought out and can be followed.
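A minimal Python sketch of the observe, orient, decide and act cycle (the OODA loop) applied to containment. Everything here is hypothetical: the `Situation` fields, the host names and especially the decision rule in `decide` are illustrative assumptions, not a recommended policy or a real tool.

```python
from dataclasses import dataclass, field


@dataclass
class Situation:
    """Hypothetical incident state, refined on every pass through the loop."""
    compromised_hosts: set = field(default_factory=set)
    contained_hosts: set = field(default_factory=set)
    attacker_aware: bool = False
    actions_log: list = field(default_factory=list)


def observe(situation, findings, attacker_aware):
    # Observe: merge new investigation findings (e.g. from EDR telemetry)
    situation.compromised_hosts |= set(findings)
    # Once the attacker knows we are onto them, that stays true
    situation.attacker_aware = situation.attacker_aware or attacker_aware


def orient(situation):
    # Orient: which known-compromised hosts are not yet contained?
    return situation.compromised_hosts - situation.contained_hosts


def decide(open_hosts, attacker_aware):
    # Decide: hypothetical rule, keep observing quietly while the attacker is
    # unaware; once they know, contain every known compromised host.
    return sorted(open_hosts) if attacker_aware else []


def act(situation, hosts_to_contain):
    # Act: execute the decision and record it so it can be followed later
    situation.contained_hosts.update(hosts_to_contain)
    situation.actions_log.append(hosts_to_contain)


def run_ooda(rounds):
    """Each round is (new findings, whether the attacker seems aware of us)."""
    situation = Situation()
    for findings, attacker_aware in rounds:
        observe(situation, findings, attacker_aware)
        act(situation, decide(orient(situation), situation.attacker_aware))
    return situation
```

For example, `run_ooda([(["ws-01"], False), (["ws-02", "srv-db"], True), ([], False)])` contains nothing in the first round and all three hosts in the second. Each pass builds on what the previous one learned, which mirrors the iterative identification and containment cycle Stephan describes.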

Simon: So, in the first episode we talked a lot about preparation. My favorite idiom is “Preparation is key”. To me it sounds like you're saying basically the same thing: have a well-thought-out plan before the incident happens. Which leads me to the playbooks, where you should also define who is responsible for what, contacts and so on. The decision makers should also be in there. So it sounds like, from your practical experience, preparation is key, and good preparation is half the battle won, so to say. And we haven't even talked about technicalities yet; this is just the organizational side of dealing with an attack.

Stephan: Yeah, totally true. I believe preparation is key, as you mentioned. However, even during preparation, you must be aware that, and I really like the Mike Tyson saying here, no plan...

Simon: Everybody has a plan until somebody punches them in the face.

Stephan: Exactly, yeah. And this is so true. Still, you will need the plan to remember key steps or key information that you want to rely on during the incident, and at least to have some sort of framework that you can move in and rely on during the incident.

But you will need flexibility within the framework, within the incident response plans or the playbooks, to adapt to the situation. There is no step-by-step guide to solving a crisis, because that's just the definition of a crisis. But you will want some sort of guidance or framework that helps you survive and maybe even handle the crisis in a superior way.

Simon: That's exactly true, because as you said, everybody has a plan until they don't. And the framework is supposed to be a guiding protocol…

Stephan: A helping hand. Let's say a helping hand.

Simon: Thank you. And you have to adapt, because no two incidents are the same. Or have you ever had two incidents that were the same? In my experience, not even two ransomware cases are the same, even though they are quite similar.

Stephan: I mean, of course you can see similarities if it's the same campaign by the same threat actor. But what makes the difference is that our clients' networks are different, so every incident is a bit different. Maybe not always that different from the attacker's point of view, but certainly in how the defenders deal with it and how the organization responds to it.

Simon: This difference is exactly why we also need people like our listeners. In the energy sector we have different types of substations and different types of environments, and as external consultants we always depend on the expertise and the information that the engineers in the field provide, because they know their systems best. Even in an incident response case, where we are supposed to be the specialists, we still depend on the people who know the system. In your case, for example, you said no two customer systems are the same. You always have to go to the Active Directory admin, the network admin, the firewall admin, and sometimes the endpoint management guy or girl. In my case, it's more the protection engineers and so on. So, this is a universal truth for every industry.

Stephan: Definitely, and it gets even more interesting the more different providers are involved, especially IT providers rather than security providers. I guess it's somewhat similar in OT, right?

Simon: In OT you have certain types of vendors that take care of certain parts of the environment or networks. You have the big vendors that come in and say, okay, I need to protect my devices. Or you don't have all the capabilities because it's proprietary, or you need people with a special skill set, and so on. So IT is, I would say, different. I don't want to say it's easier, but in IT you have endpoint protection, and you can get the memory dumps, the images, all the services and so on, which I really enjoyed back in the day. But in OT, one of the first differences is that you have clear-text protocols, so you can do Wireshark sniffs and packet captures. That's why we always have network intrusion detection systems and never endpoint intrusion detection systems. The whole process is supposed to be the same, but the technicalities are different, so you also need a different kind of skill set. Stephan, have you ever heard of IEC 62?

Stephan: Actually, I have, but I would not consider myself an OT expert. I'm very happy that we have at least some experts who are capable of that. There is a community of experts building up all over the country, forming teams or a community to defend against threats that target exactly these kinds of protocols. But yes, on the technical level it's very, very different from typical IT incident response. Although I would say even IT incident response has changed quite drastically in the last few years. You mentioned analyzing single systems, looking at memory dumps, et cetera. That has changed, on the one hand, with EDR, which provides telemetry much more quickly and across a broader spectrum. But forensic tools and incident response tools have also changed to support the increasing complexity and sheer size of today's company networks.

So, it's quite common for us, for example, to perform an investigation in parallel across a couple of thousand endpoints, or assets in general.

Simon: Which was not possible a few years ago.

Stephan: Yeah, correct, correct. So, my question to you, Simon, would be: how does this compare to OT? Are we there yet when it comes to scalability, or is it totally different and not that big of a problem because you focus on the network instead?

Simon: This was exactly my next point, because the ominous OT-IT convergence has been talked about for a few years now, and it has been happening quite rapidly. So, we also have OT-specific Active Directory trees and OT-specific Windows PCs that are branded as engineering workstations, but they are basically just Windows PCs with specialized software on them.

Stephan: Sorry, Simon. By specialized software, you mean it just receives no updates, right?

Simon: No, sorry, I mean DigiSys and other tools. But there are also Windows PCs that are on long-term support. I would call it life support, because Windows 10 is going out of service soon or already has. I don't know the exact date, but it's sometime in 2025. But the IT-OT convergence also brings these forensic tools into OT. And an OT incident response is never only an OT incident response. Of course, if PLCs and IEDs are attacked, we need specialist skills for looking at the network traffic, going into IEC 104 packets and MMS packets, analyzing historian data, and so on. But at the end of the day, an attacker always comes from somewhere, and most of the time that somewhere is IT, moving through the Purdue levels. That's why we also need the IT tool set. But if you are critical infrastructure, and if you're part of the energy sector you most likely are, the attacker's goal will always be somewhere around Purdue levels zero, one or two, going for the assets. This is where we need the special tool set and the special capabilities of our engineers in the field.

So, I would say I have quite a lot of experience in the OT space, but I'm not as experienced as a protection engineer, for example. This is where we depend on the people in the field.

Something else that differs: we both mentioned EDR, and EDR is something that you do online.

Stephan: Not necessarily. There are fewer and fewer by the day, to be honest, but some vendors still provide EDR services on a purely on-premises basis. Not that many, but there are some.

Simon: Are they good?

Stephan: It depends. (Laughs)

Simon: That's what I was expecting.

Stephan: Some are good, so don't get me wrong. Some are really good.

Simon: I mean, for me, taking a packet capture and going through the packets looking for commands, for example shutdown commands, image upload commands and so on, is basically an offline task. There is no way to do online network traffic capture analysis in this space, because you can't connect directly to a pump or a photovoltaic device. You can connect to PLCs, and you can connect to other devices via MMS, but at the end of the day you don't want to impact productivity or the core systems. But let's head back to the three steps we started with. We have the identification process from the beginning, and then we need to contain, eradicate and recover. Recovery is something we are going to talk about in more detail in the next episode. But you said that it's an iterative process. So how do you decide what to keep online and what to shut down, and which parts of the company can stay intact and which need to be rebuilt?
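To illustrate the kind of offline analysis Simon describes, here is a minimal Python sketch that scans IEC 60870-5-104 traffic for command messages. It assumes the TCP payloads have already been extracted from a packet capture (for example with Wireshark). The helper names `iter_apdus` and `flag_commands` are hypothetical, and the type-ID table is an illustrative subset of the control-direction ASDU types, not a complete list. Only messages with cause of transmission "activation" are flagged, so the outstation's confirmations are not counted twice.

```python
# Illustrative subset of IEC 60870-5-104 control-direction ASDU type IDs
COMMAND_TYPES = {
    45: "C_SC_NA_1 (single command)",
    46: "C_DC_NA_1 (double command)",
    47: "C_RC_NA_1 (regulating step command)",
    48: "C_SE_NA_1 (set point, normalized)",
    49: "C_SE_NB_1 (set point, scaled)",
    50: "C_SE_NC_1 (set point, short float)",
    58: "C_SC_TA_1 (single command with time tag)",
    59: "C_DC_TA_1 (double command with time tag)",
    105: "C_RP_NA_1 (reset process)",
}
COT_ACTIVATION = 6  # cause of transmission: activation


def iter_apdus(payload: bytes):
    """Split one TCP payload into APDUs: start byte 0x68, length byte, body."""
    offset = 0
    while offset + 2 <= len(payload):
        if payload[offset] != 0x68:
            raise ValueError(f"no IEC 104 start byte at offset {offset}")
        length = payload[offset + 1]  # counts the 4 control octets plus the ASDU
        end = offset + 2 + length
        if length < 4 or end > len(payload):
            raise ValueError(f"truncated or invalid APDU at offset {offset}")
        yield payload[offset:end]
        offset = end


def flag_commands(payloads):
    """Return (payload_index, type_id, description) for activated commands."""
    findings = []
    for index, payload in enumerate(payloads):
        for apdu in iter_apdus(payload):
            # I-format frames have bit 0 of the first control octet cleared;
            # S- and U-format frames carry no ASDU and are skipped.
            if apdu[2] & 0x01 or len(apdu) < 9:
                continue
            type_id = apdu[6]       # first ASDU octet: type identification
            cause = apdu[8] & 0x3F  # low 6 bits: cause of transmission
            if type_id in COMMAND_TYPES and cause == COT_ACTIVATION:
                findings.append((index, type_id, COMMAND_TYPES[type_id]))
    return findings
```

For instance, a payload containing a STARTDT frame (`68 04 07 00 00 00`) followed by a single command activation is reported once, while the matching activation confirmation is ignored. In a real investigation, flagged commands would still need to be correlated with expected operator activity before drawing conclusions.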

Stephan: Yeah, as I said before, in my opinion it comes down to a risk calculation, and not a simple but a rather complex one, because you have to take several options and key factors into account. For example, does the attacker already know that you're onto him? Is there already an active game of, let's say, IT chess between attacker and defenders? At the same time, it's very important to know the business model of the company.

Simon: And the business model of the attacker. Do they just want to encrypt data and demand a ransom, or are they a nation-state actor that wants to impair the country's energy supply?

Stephan: Yeah, totally true. That's a key point, because the better you know your enemy and the better you know yourself, the better your decision making will be. So, it's quite important to be aware of what you're facing in order to make the right decisions. What I believe is that, when it comes to containment and eradication decisions, IT incident response can sometimes be quite similar to OT, but most of the time businesses rely on integrity and authenticity. Depending on the business model, for example in e-commerce, it's more dependent on availability, which makes it a bit more like the OT environment, because in OT it's usually about availability and safety. Of course, depending on who you ask, either safety is number one or availability is number one. So it always depends a bit, and that's why the containment phase is also sometimes different.

So, going back to IT: if I think about containment measures in IT, it's always also a question of how important availability is. We've had incidents where we were able to completely shut down the internet connection over the weekend, or even for four days, and sort things out without any pressure, because it was not that urgent for the company to be online again. But of course, sometimes you don't have the option to go offline at all. That means your containment measures must be very precise, and you must be aware that the attacker will react to them.
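The trade-off Stephan describes, broad containment when the business can tolerate going offline versus precise measures when it cannot, could be written down as a simple decision helper, for example in a playbook tool. This is a hypothetical Python sketch: `BusinessConstraints`, the hour-based tolerance and the two strategy labels are assumptions made for illustration, and a real decision would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass
class BusinessConstraints:
    """Hypothetical inputs agreed with the business before or during an incident."""
    can_go_offline: bool
    max_offline_hours: int = 0  # how long the business can tolerate being offline


def choose_containment(constraints: BusinessConstraints, estimated_cleanup_hours: int) -> dict:
    plan = {"strategy": "", "notes": []}
    if constraints.can_go_offline and estimated_cleanup_hours <= constraints.max_offline_hours:
        # Broad measure: e.g. cut the internet connection while the team cleans up
        plan["strategy"] = "broad"
    else:
        # Operations must continue, so only precise, targeted measures are possible
        plan["strategy"] = "targeted"
        plan["notes"].append("expect a counter-reaction from the attacker")
    return plan
```

With a four-day (96-hour) tolerance and an estimated 72 hours of cleanup, `choose_containment(BusinessConstraints(True, 96), 72)` suggests broad containment; without the option to go offline, it falls back to targeted measures and reminds the team to expect the attacker to react.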

Simon: I think you mentioned some very important points, especially whether you can go offline and whether there is another way to do business. If you're not able to go offline, what type of behavior do you need to look out for from the attacker? And you always have to expect a reaction from the attacker, because it comes back to what you said in the beginning: there is a human being on the other side. We are not fighting against computer programs, at least not yet; maybe in the future it will be computer program versus computer program. But currently we're still fighting against real human beings with an agenda, and they want to win this game of IT chess. I see similarities, especially when I think about a battery station that does day trading on energy. Its main purpose is trading energy and making money through trading. On the other hand, we have substations that supply energy to a certain area and that you cannot shut down. You can shut down the internet connectivity and switch to manual mode if that's possible, but most of the time you cannot shut down a substation. I know our listeners will think, ah, there are other substations that can jump in. But in our scenario, let's say there's just one.

Stephan: Or at least let's say it's a bit more complicated to decide whether it's possible to shut it down or not.

Simon: So, the decision also needs to be made by a decision maker who has the authority to decide. For us it's clear, but when you're out in the field and everything is hectic, it's not always clear who is allowed to decide what. This comes down to preparation again: having the right people in the room and having the right decision makers make the decision.

Stephan: Definitely. Because in the end, even if you're in a crisis that involves cyber, IT or OT security, you always have to remember that the security team is just the technical part of the overall crisis board. It's not like the security team are superheroes who decide whatever they want or think is best for themselves or for the company, organization, power plant, etc. In reality, it's a decision by the crisis board, and of course there needs to be alignment between the technical security teams and the overall crisis board. These are the points that need to go into the incident response plan and playbook, so that it's clear what is allowed, what may be pre-authorized, and what needs to be decided by the crisis board.
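The split between pre-authorized actions and crisis board decisions can be captured in a playbook as a simple authorization matrix. The Python sketch below is hypothetical: the action names, the two authority levels and the rule that unlisted actions go to the crisis board are illustrative assumptions, not taken from any specific playbook.

```python
from enum import Enum


class Authority(Enum):
    SECURITY_TEAM = "security team"  # pre-authorized in the playbook
    CRISIS_BOARD = "crisis board"    # needs an explicit board decision


# Hypothetical playbook entries: who may trigger which containment action
PLAYBOOK = {
    "isolate_single_workstation": Authority.SECURITY_TEAM,
    "disable_compromised_account": Authority.SECURITY_TEAM,
    "disconnect_internet_uplink": Authority.CRISIS_BOARD,
    "switch_substation_to_manual": Authority.CRISIS_BOARD,
}


def required_authority(action: str) -> Authority:
    # Anything not explicitly listed defaults to a crisis board decision
    return PLAYBOOK.get(action, Authority.CRISIS_BOARD)


def may_execute(action: str, approved_by_board: bool) -> bool:
    # Pre-authorized actions can run immediately; all others need board approval
    if required_authority(action) is Authority.SECURITY_TEAM:
        return True
    return approved_by_board
```

Defaulting unknown actions to the crisis board keeps the security team from improvising high-impact changes on its own, which matches the alignment Stephan asks for.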

Simon: And this is also why, especially during an incident, OT personnel are needed on the crisis board or incident response team, to provide the crucial information and skills on which we can base our decisions. That way we can guarantee safe operations, and this can only be done by the experts in the field.

Stephan: Yeah. And honestly, I think it's a very difficult task to translate the technical fundamentals and the security risks within the OT environment for a high-level management crisis board. It's sometimes difficult in IT already, but in OT I would say it's on a whole different level.

Simon: And with that, thank you Stephan for taking the time to talk with us. Any last words?

Stephan: It was a pleasure. As I said in the beginning, it was a pleasure throughout. So, thank you for the interesting questions and the interesting discussion.

Simon: Thanks again Stephan and with that I'm heading back to Scott.

Scott: Thank you, Simon, for hosting this and upcoming episodes of your Energy Talks podcast miniseries titled, Why Should You Talk about Incident Response? We look forward to listening to further discussions about this important area of cybersecurity.

And to our audience, a big thank you for listening to this and other episodes of Energy Talks! We always welcome your questions and feedback. Please send us an email to podcast@omicronenergy.com.

OMICRON has several years of experience in power system testing, data management, and cybersecurity and offers the matching solution for your application. For more information, please visit our website at omicronenergy.com.

Please join us for the next episode of Energy Talks and stay tuned for future episodes of our miniseries titled, Why Should You Talk about Incident Response, with Simon Rommer.

Goodbye for now, everyone!
