What is alarm fatigue in cyber security?

Cybersecurity teams often start out purchasing a SIEM solution with high hopes they have the staff, training, capability and organisational resources to get the most out of their new platform.

Unfortunately, in many cases the reality is that an already overstretched security team does not have the time needed to manage a SIEM, investigate alarms, manage rulesets, juggle compliance requirements, and ensure continuous proactive monitoring. In the short term, the gap can be plugged through vendor professional services. However, this needs to be factored into costs on a rolling basis as the organisation’s attack surface evolves, and it is not a viable long-term solution.

Alarm fatigue is one of the main reasons these internal deployments fail, carry more hidden costs, and take longer than expected to become useful. In the following post, we’ll investigate what alarm fatigue is and the possible causes.

What is Alarm Fatigue?

Alarm fatigue is caused by an operator being exposed to a large number of alerts in a short period of time, causing an overload of information and resulting in a reduction in the ability to prioritise more critical alerts.

The impact of alarm fatigue in cybersecurity is that the quality of investigations will drop and Mean Time to Respond (MTTR) will increase. The sheer number of alarms makes it difficult to sort true positives from the noise, so the potential to miss malicious activity also increases. From a people perspective, your team will burn out and have lower job satisfaction, leading to high staff turnover.

A common example sees an analyst spending 90% of their role investigating menial, repetitive account audit alarms with little analytical value (alarms that could be tuned and automated) that turn out to be false positives. Of course, this is a worst-case scenario, but it is one that resonates with many SOC professionals.

The Psychology of Alarm Fatigue and Analyst Burnout

Alarm fatigue in cybersecurity causes a form of cognitive burnout in analysts leading to analytical desensitisation and reduced capability of a SOC team to perform to standard; it affects not just the department but the individual.

In psychology, this is the phenomenon known as semantic satiation, possibly a similar cognitive form of reactive inhibition. Semantic satiation was first characterised in 1962 by psychologist Leon Jakobovits James and is described as:

“Repetition causes a word or phrase to temporarily lose meaning for the listener”.

It can apply to looking at words, lengthy investigations and correlations too - the more you’re exposed to a particular activity, the more you adapt, normalise and begin to disregard it based on past experience.

Ultimately, it means alarm fatigue results in semantic satiation, then sub-par detection, interpretation and response to critical attacks and alerts – it is not where we want to be as a Managed Service Provider and certainly not for a SOC service, even an internal one.

4 Key Causes of Alarm Fatigue in the Cybersecurity Process

Now that we have a good understanding of what alarm fatigue is and the psychology involved, it’s important to understand the key causes of alarm fatigue for a SOC, the counter-productive impacts it has and why it is synonymous with failure.

1. Poorly Tuned Rules and a High Number of False Positives

A high number of false positives, caused by poorly tuned rules that aren’t applicable to the monitored environment, is one of the prime suspects. If an analyst is spending most of their time investigating false positives, that is less time spent investigating true positives, where their time is best placed.

In a security operations environment, especially for an MSSP, this can have contractual impacts that result in lost trust, brand reputation and revenue.

2. Inadequate Tuning & Rule Management Policy

Rule management policies feed into change control, keep your alarms up to date and detecting the most prolific threats, give analysts the right information at the right time, and are the bread & butter of detecting attacks.

If we do not have a defined cybersecurity process for frequently reviewing existing rule sets, aging out obsolete alarms, tuning noise contributors and introducing high-fidelity alarms, then we not only increase the likelihood of alarm fatigue, but also risk reducing our ability to detect the most prominent threats.

Having a well-defined, cross-functional rule management policy is absolutely critical to getting the most out of a SIEM. Think of how quickly a preventative AV product receives updates; we should be looking to frequently update our reactive controls too.

3. Too Many Manual Alarms

Not every alarm needs to be manually investigated, and automation can often be utilised for the more menial alerts. If we have compliance alarms that are categorised as informational only, having systems in place to automatically generate an informational security incident frees up time for your analysts and avoids the burnout caused by manually investigating each occurrence.

For example, a requirement of PCI-DSS monitoring includes reporting on every administrative action performed by a highly privileged account; this should be an automated report, not something left to an analyst to raise for each and every activity of that administrator.

The impact of having no SOAR or in-house automation is that every alarm has to be investigated manually, each incident has to be raised manually, and correspondence with the relevant teams needs to be handled manually.

If alarms are poorly configured and manual, it is going to result in alarm fatigue.
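As a rough illustration of the point above, the sketch below splits a batch of alarms into an automated report and an analyst queue. It is a minimal example under stated assumptions, not a description of any particular SIEM or SOAR product: the `Alarm` fields, the `triage` function and the severity label `"informational"` are all hypothetical names chosen for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    """Hypothetical alarm record; real SIEM exports will have different fields."""
    rule_name: str
    severity: str  # assumed labels, e.g. "informational", "low", "high"
    account: str


def triage(alarms):
    """Send informational alarms to an automated report, the rest to analysts.

    Returns a (report, analyst_queue) tuple. The report counts occurrences
    per (account, rule_name) so that each occurrence does not need to be
    raised by hand.
    """
    report = Counter()
    analyst_queue = []
    for alarm in alarms:
        if alarm.severity == "informational":
            report[(alarm.account, alarm.rule_name)] += 1
        else:
            analyst_queue.append(alarm)
    return report, analyst_queue


if __name__ == "__main__":
    batch = [
        Alarm("admin_action", "informational", "svc-admin"),
        Alarm("admin_action", "informational", "svc-admin"),
        Alarm("impossible_travel", "high", "j.doe"),
    ]
    report, queue = triage(batch)
    for (account, rule), count in report.items():
        print(f"{account}: {rule} x{count}")
    print(f"{len(queue)} alarm(s) left for manual investigation")
```

In practice the report would feed whatever ticketing or reporting system is already in place; the point of the sketch is only that the informational path never reaches an analyst's queue.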

4. Lack of Job Task Rotation in the cybersecurity process

Ideally, 40% of an analyst’s time should be dedicated to alarms, with the rest of their time aligned to threat hunting for emerging threats, reviewing threat intelligence to create advisories for stakeholders on the latest threats, and on projects to enhance capability and service.

If an analyst’s job is dedicated only to looking at alarms, not only do we miss out on these benefits, but the likelihood of alarm fatigue also increases: the role lacks professional interest and reward, and progression is likely to be limited. An analyst dedicated day in and day out to alarms does not have the capacity to keep learning, develop into a better security professional, increase organisational knowledge and improve the service we offer to keep ahead of competitors. This is a recipe for disaster, high turnover and extra recruitment costs, especially during remote working and the isolation inherent with the current pandemic.

5 Tips to Reduce Alarm Fatigue in a SOC

As we discussed in our alert ranking post, one of the key aspects of improving SOC alert services and reducing alarm fatigue is alert prioritisation, backed by a robust process for reviewing alert fidelity, receiving feedback and implementing whitelisting.

However, there are some important tips to consider if we want to go further and reduce alarm fatigue for our analysts:

1. Job and Task Rotation – variation in daily assignments

As alert fatigue is both a technical and process problem that directly negatively impacts analysts, most of the solution should revolve around improving the situation for them.

Separate each day so that no analyst has consecutive days on alarms. This can be varied from analyst to analyst to maintain monitoring coverage. For example, one analyst could do the following whilst the other alternates with them:

- Monday: Alarms
- Tuesday: Reporting
- Wednesday: Alarms
- Thursday: Training
- Friday: Alarms
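The alternating pattern in the list above can be sketched in a few lines of code. This is only an illustration, assuming a two-analyst, Monday-to-Friday rota; the function names `weekly_rota` and `has_consecutive_alarm_days` are made up for this example, and the off-alarm tasks are taken from the days listed above.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
OFF_ALARM_TASKS = ["Reporting", "Training"]


def weekly_rota(start_on_alarms, off_alarm_tasks=OFF_ALARM_TASKS):
    """Build a Monday-Friday rota that alternates alarm and non-alarm days.

    start_on_alarms=True gives Alarms on Monday, Wednesday and Friday;
    False gives the mirror image, so two analysts cover every day between them.
    """
    rota = {}
    off_days_used = 0
    for index, day in enumerate(DAYS):
        on_alarms = (index % 2 == 0) == start_on_alarms
        if on_alarms:
            rota[day] = "Alarms"
        else:
            # Cycle through the non-alarm tasks in order.
            rota[day] = off_alarm_tasks[off_days_used % len(off_alarm_tasks)]
            off_days_used += 1
    return rota


def has_consecutive_alarm_days(rota):
    """Return True if any two adjacent days are both spent on alarms."""
    tasks = [rota[day] for day in DAYS]
    return any(a == b == "Alarms" for a, b in zip(tasks, tasks[1:]))


if __name__ == "__main__":
    analyst_a = weekly_rota(start_on_alarms=True)
    analyst_b = weekly_rota(start_on_alarms=False)
    for day in DAYS:
        print(f"{day}: A={analyst_a[day]}, B={analyst_b[day]}")
```

With two analysts on mirrored rotas, exactly one of them is on alarms each day, which keeps monitoring covered while nobody works alarms two days running.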

2. Ensure monthly training and project days

Ensure monthly training and project days to break up the workload. Projects are especially rewarding for personal development and for improving department capability. Training can include external certification or cover newly introduced tooling. For example, LRQA have been utilising Elastic for statistical analysis and as a data lake, so we sent a number of analysts on Elastic Stack training to support the introduction of these tools.

3. Reporting Days

Have reporting days for any management reports separated out from alarms days where possible.

4. Threat Hunting and Threat Intelligence

Threat Hunting and Threat Intelligence days are critically important, not only from a detection standpoint, but from a learning and development standpoint too.

- Assigned days to hunt for new threat actor tactics, techniques and procedures (TTPs) that aren’t covered by alarms, then implement alarms to detect those TTPs and keep detections current.
- Threat Intelligence review days to review the latest industry threats attacking your organisation’s market and sectors, update any blocklists with new indicators, create new advisories for any critical vulnerabilities and enable early warning of upcoming attacks so that preventions can be implemented in advance.

5. Cross-Training days

Cross-Training days are also really useful to expand analyst capability and improve redundancy and general skill levels. If you have a NOC, cross-functional days allow an analyst to get involved in supporting network device updates and configuration, deliver a training workshop on a particular subject, or brief a project they have been working on for an upcoming change.

Blocking out daily time and rotating hours

We can take daily variation of tasks to a new level by blocking out daily tasks across hours. For instance, if we have a 24x7 team that works continental 12-hour shifts, these can be blocked out into 4-hour sections.

Each section can be dedicated to a particular task and rotated with other analysts, ensuring no analyst spends more than 8 hours per day on alarms. For example:

Block 1: Alarms
Block 2: Project
Block 3: Alarms
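The block layout above lends itself to a simple check. The following is a minimal sketch, assuming 12-hour shifts split into 4-hour blocks and the 8-hour alarm cap described in the text; the function names and the second analyst's rotated pattern are illustrative assumptions, not a prescribed rota.

```python
SHIFT_HOURS = 12
BLOCK_HOURS = 4
MAX_ALARM_HOURS = 8


def alarm_hours(blocks, block_hours=BLOCK_HOURS):
    """Total hours spent on alarms across a shift's blocks."""
    return sum(block_hours for task in blocks if task == "Alarms")


def is_valid_shift(blocks):
    """Check that the blocks fill the shift and respect the alarm cap."""
    if len(blocks) * BLOCK_HOURS != SHIFT_HOURS:
        raise ValueError("blocks do not add up to a full shift")
    return alarm_hours(blocks) <= MAX_ALARM_HOURS


def rotate(blocks, offset):
    """Shift the block order so a second analyst is offset from the first."""
    offset %= len(blocks)
    return blocks[offset:] + blocks[:offset]


if __name__ == "__main__":
    analyst_a = ["Alarms", "Project", "Alarms"]
    analyst_b = rotate(analyst_a, 1)  # ["Project", "Alarms", "Alarms"]
    for name, blocks in (("A", analyst_a), ("B", analyst_b)):
        print(name, blocks, "valid" if is_valid_shift(blocks) else "too many alarm hours")
```

Offsetting the second analyst by one block means at least one person is on alarms in every block, while each stays within the 8-hour limit.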

This results in greatly improved job satisfaction, a vast decrease in the likelihood of fatigue and decreased burnout.

Tuning and Alarm Management Policy

We need to have a well-defined tuning and alarm management policy that is clearly communicated, well understood and backed by training & processes.

This allows analysts to know the process to follow when they’re met with a repetitive alarm with little to no investigative value. They’re closest to the action, so they will know best, and it allows them to take ownership of getting these alarms whitelisted or tuned out.

A good way to start is to have a weekly cross-functional review meeting to cover off current detections, bringing in security engineers, developers and analysts. Look at statistics for the top firing alerts and identify tuning opportunities over the events that have fired over X amount of time, answering questions such as:

- What were our top 10 alarms in the past week?
- Of those, what % were false positives?
- How many resulted in incidents?
- What tuning do we need to put in place to remove these?
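The first three questions above can be answered with a short script over exported alarm data. The sketch below assumes each alarm event is a dictionary with `rule`, `false_positive` and `incident` keys; those field names and the `weekly_alarm_stats` function are hypothetical and would need mapping to whatever your SIEM actually exports.

```python
from collections import Counter


def weekly_alarm_stats(events, top_n=10):
    """Summarise the top firing rules with false-positive rate and incident count.

    events: iterable of dicts with keys 'rule' (str), 'false_positive' (bool)
    and 'incident' (bool). These field names are assumptions for illustration.
    """
    fired = Counter()
    false_positives = Counter()
    incidents = Counter()
    for event in events:
        rule = event["rule"]
        fired[rule] += 1
        false_positives[rule] += int(event["false_positive"])
        incidents[rule] += int(event["incident"])

    rows = []
    for rule, count in fired.most_common(top_n):
        rows.append({
            "rule": rule,
            "fired": count,
            "false_positive_pct": round(100 * false_positives[rule] / count, 1),
            "incidents": incidents[rule],
        })
    return rows


if __name__ == "__main__":
    sample = [
        {"rule": "account_audit", "false_positive": True, "incident": False},
        {"rule": "account_audit", "false_positive": True, "incident": False},
        {"rule": "account_audit", "false_positive": False, "incident": True},
        {"rule": "malware_beacon", "false_positive": False, "incident": True},
    ]
    for row in weekly_alarm_stats(sample):
        print(row)
```

Rules that fire often, have a high false-positive percentage and rarely result in incidents are the natural tuning candidates to bring to the review meeting.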

Next, we’ll want a change process as part of the tuning and alarm management policy that is simple for analysts to understand, utilise and submit changes into. This could be coupled with a visual process workflow on a knowledge base to better illustrate the process, plus a front-end portal for submitting changes. Security engineers can then implement these changes during the weekly meeting.

Automation: Enrichment, Correlation and Visualisation in cybersecurity

According to Google Trends, automation and machine learning have seen a gradual increase in interest over the past 5 years, and this is even more true in the cybersecurity space.

New compliance regulations like GDPR and the rise of online shopping brought Security Operations Centres to the forefront of many companies’ strategies, and many of those companies found alarm fatigue to be a key problem affecting the success of their new services.

One of the key solutions in combating alarm fatigue is using automation and machine intelligence to contextualise, aggregate and visualise alerts. This improves the speed of investigations and mean time to respond, and enables at-a-glance decisions on the contents of those alarms.

LRQA SOC utilises automation to enable automated enrichment of IP addresses, threat intelligence lookups, websites, alerts and blacklist monitoring. We use machine intelligence to baseline hosts and users, then alert us on activity that is outside those baselines. We also have a number of automated actions for the more granular investigative steps analysts have to perform, to speed up investigation. All of these help improve the analyst experience and lessen alarm fatigue.
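To give a feel for what baselining can mean at its simplest, the sketch below flags an observation that sits far from a host's or user's historical average. This is not a description of how LRQA's tooling works, which the post does not detail; it is a plain z-score check chosen for illustration, and the function name, the threshold of 3 standard deviations and the sample figures are all assumptions.

```python
from statistics import mean, pstdev


def is_outside_baseline(history, observed, threshold=3.0):
    """Return True if `observed` deviates from the historical baseline.

    history: past counts for one host or user (e.g. logins per day).
    threshold: number of standard deviations treated as unusual (assumed value).
    """
    if len(history) < 2:
        raise ValueError("need at least two historical data points")
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is a deviation.
        return observed != baseline
    return abs(observed - baseline) / spread > threshold


if __name__ == "__main__":
    daily_logins = [10, 12, 11, 9, 10, 11, 12]  # made-up sample data
    print(is_outside_baseline(daily_logins, 11))
    print(is_outside_baseline(daily_logins, 40))
```

Real behavioural analytics products account for seasonality, peer groups and much more, so treat this only as a way to reason about the idea of alerting on deviation rather than on every event.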

A good start is to look to fully automate account audit alerting, meaning these alerts are automatically investigated and raised as incidents if they match a specific use case. If your SIEM offers functionality to visualise event trends and identify spikes and changes in user or host behaviour, then even better. If not, it may be worth investigating SOAR solutions that can plug into your SIEM and provide this capability.

The aim is to save analyst time, and keep them focused on investigations that pertain to truly suspicious activity that can’t be completely automated, not on menial high-level alerting.

Summary

It’s essential that any effort dedicated to reducing alarm fatigue is carried out on an ongoing basis because, like many things in cybersecurity, the rate of change is very fast. It takes dedication from a technical, process and administrative standpoint to respond to these changes and minimise the likelihood that fatigue affects your team. An absolutely key element of the process is involving your analysts before any change in working practices. They are the ones handling daily alarms, so they need to be part of the process and able to feed back any suggestions or ideas when we communicate changes to them. Even better, invest in a new coffee machine for the office and have a bi-weekly Starbucks run; it feels good to be appreciated.

Want to know more? Don’t hesitate to get in touch with our LRQA team. Alternatively, explore our SOC 24x7 Monitored Services.