Lessons Learned in Detection Engineering

What I’ve learned from “good” intrusion detection programs.

Ryan McGeehan
Starting Up Security

--

Every security program eventually gains the ability to “detect” bad things happening on its systems. This creates a burden of manual analysis and escalation, and it’s hard to do well.

More recently in my career, I’ve abandoned the desire to run a security team in favor of exploring how security teams run.

This has given me wonderful exposure to intrusion detection approaches of varying quality, both good and bad, and these are the notes I’ve put together on the qualities that describe high-functioning detection teams. Some of what follows describes the trajectory of several roadmaps and where security teams hope to be.

We’ll be talking through detection infrastructure that depends on logs, with rules that trigger automation, prepare leads for hunts, or raise alerts.

Rough language for a detection engineering team

The traditional SOC is more often an on-call rotation.

Great teams are not solving detection problems with analysts.

When a human being is needed to manually receive an alert, contextualize it, investigate it, and mitigate it, that is a declaration of failure. Newer infrastructure has plenty of opportunity to significantly reduce this churn and the resulting fatigue.

Weaker teams view this mechanical turk as an asset instead of a symptom. Stronger teams do not focus on developing a fenced-in SOC that gives them an alerting route to be manually analyzed; they prefer to keep humans out of the solution entirely.

On-call alert rotations make it possible to keep hiring a few skilled engineers instead of many analysts. The challenge, then, is making sure this on-call is not overloaded with garbage.

The “law of the lever” is respected.

Great teams are aware of where, and how, analysis work is being created.

One person, with the right lever, can lift the weight of multiple people. This law applies to the application of force.

Similarly, one person can generate work for many others. Let’s call this the work lever.

Security can be viewed in terms of individual work that is divided into risk discovery and mitigation. Building and breaking. Alerting and analysis.

Discovery of risk usually creates a lot of mitigation work.

It is usually easier to discover a vulnerability than to mitigate it. For instance, one hour of vulnerability hunting on your own may impose weeks of mitigation work on someone else.

Risk assessment work is similar. The work product of a risk assessment (done by a few) should create a long and thorough roadmap of mitigation projects (for many).

This applies in detection and logging as well.

The time spent creating a poor-quality detection rule will likely create a significant amount of work for whoever responds to the follow-up alert. A single person with a poorly tuned IDS can overwhelm others with lousy rules and noise.

Similarly, a single person can flood centralized logging with useless data, assuming it’s someone else’s job to make sense of it all, or that you’ll buy a magic AI product that will produce value from it for you.

Better detection infrastructure is developed by engineers who are hyper-aware of this “work” imbalance created by finding and breaking, and who do their best to avoid creating wasteful downstream work. They will limit the inbound discovery of risks so they have a better chance of mitigating what they’ve already found.

Removing any separation between engineer and analyst helps avoid conflicting incentives around rule creation and case closing. Having an on-call rotation keeps your team on dogfood, since they will be responding to the alerting pipelines they created themselves. All participants own their own destiny and won’t feel like victims of outside influence. Not being in control of your own work is a major contributor to SOC burnout.

Rules are codified and subject to peer review.

Great teams are pulling from their engineering culture and standards.

Most vendor security tools lend themselves to point-and-click-the-textfield rule creation amongst a few individuals. This discourages peer review.

The work lever cuts the wrong way when rules go over the fence to a completely different group or on-call, with no feedback loop or quality control.

Peer review is valuable when the detection stack allows for it. In this case, peer review isn’t about avoiding buggy code; instead, it helps enforce a standard for what makes a good rule and prevents churn on the other end.

This is fairly easy to accomplish with a detection stack involving ElastAlert, since all of its rules are .yml files. Splunk can read alerting configs from savedsearches.conf. Phantom playbooks can live entirely within a repository as well.

Phantom, being fairly new, is worth pointing out as an orchestration layer on top of rule building, as described in the next section; it gives you more opportunity to kick off automation from a rule.

Higher-quality detection infrastructure should be repository-managed and follow well-known engineering standards. Most infrastructure is moving toward configuration as code anyway, so security should not be an exception. Accountability for rule development improves, and linting and unit testing become centralized as well.
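As a rough illustration, here is what a centralized lint step could look like in CI. This is only a sketch: it assumes ElastAlert-style .yml rules living under a rules/ directory in the repo, and the required fields and “severity” routing values are hypothetical examples of a team standard, not something the tools enforce themselves.

```python
# A sketch of a CI lint step for repository-managed detection rules. Assumes
# ElastAlert-style .yml rule files under rules/; the required fields and
# "severity" values are hypothetical examples of a team standard.
import glob
import sys

import yaml  # pip install pyyaml

REQUIRED_FIELDS = {"name", "type", "index", "filter", "alert"}
ALLOWED_SEVERITIES = {"hunt", "ticket", "page"}  # hypothetical routing levels


def lint_rule(path: str) -> list:
    """Return a list of human-readable problems found in one rule file."""
    with open(path) as f:
        rule = yaml.safe_load(f) or {}

    problems = []
    missing = REQUIRED_FIELDS - set(rule)
    if missing:
        problems.append(f"{path}: missing required fields {sorted(missing)}")

    # A team standard: every rule must declare how it escalates.
    if rule.get("severity") not in ALLOWED_SEVERITIES:
        problems.append(f"{path}: severity must be one of {sorted(ALLOWED_SEVERITIES)}")

    return problems


if __name__ == "__main__":
    findings = []
    for rule_file in glob.glob("rules/**/*.yml", recursive=True):
        findings.extend(lint_rule(rule_file))

    for finding in findings:
        print(finding)
    sys.exit(1 if findings else 0)  # a bad rule fails the build
```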

Rules trigger automation before alerting.

Great teams prepare the on-call analyst with as much information as possible.

Alerts delivered to an analyst without any sort of pre-processing are highly vulnerable to a false positive resolution and will wear down the analyst downstream.

There are a few ways to consider standards for preprocessing and alerting.

You should decorate alerts. This describes a standard of detail where an alert brings additional information to the analyst without requiring extra work. It helps avoid “tab hell,” where an analyst needs to be logged into several tools to follow up on an incident just to know what is going on.

For instance, you can include a screenshot of the phish, check browser blacklist information, automatically answer “how many other hosts have visited this site” from DNS or netflow, etc.

An alert is not useful unless it carries all of the context an analyst would need to decide whether or not to declare an incident. A rule should trigger automation that pulls in corresponding information: log snippets, translations of IDs into employee names, hostnames, opinions from threat intelligence, and so on.
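To make that concrete, here is a minimal sketch of what decoration could look like in code. The lookup helpers are hypothetical stubs standing in for your own integrations (DNS/netflow store, threat intelligence, identity directory, log search).

```python
# A sketch of alert decoration: enrich the alert with context before anyone
# sees it. The lookup helpers are hypothetical stubs for real integrations.

def query_dns_logs(domain: str) -> list:
    return []  # stub: other hosts that resolved this domain

def lookup_threat_intel(domain: str) -> dict:
    return {"verdict": "unknown"}  # stub: reputation opinion

def directory_lookup(user_id: str) -> dict:
    return {"name": "unknown", "team": "unknown"}  # stub: ID -> employee

def fetch_related_logs(event_id: str) -> list:
    return []  # stub: raw log snippets surrounding the event


def decorate_alert(alert: dict) -> dict:
    """Attach the context an analyst would otherwise gather by hand."""
    domain = alert.get("domain", "")
    alert["decorations"] = {
        # "How many other hosts have visited this site?" from DNS/netflow.
        "other_hosts_seen": query_dns_logs(domain),
        # Reputation opinions, so the analyst isn't opening more tabs.
        "threat_intel": lookup_threat_intel(domain),
        # Translate opaque IDs into an employee the analyst can reason about.
        "employee": directory_lookup(alert.get("user_id", "")),
        # Raw evidence, so the alert stands on its own.
        "log_snippets": fetch_related_logs(alert.get("event_id", "")),
    }
    return alert
```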

You should mitigate automatically when possible. For instance, rules can immediately trigger mitigation steps, like a “report a phish” address that triggers a “did you enter your credentials?” workflow resulting in a lockout or password reset. This reduces the follow-up actions for an on-call, the incident timeline, and the median time to respond.

Additionally, previous decoration may allow you to automatically reconsider the severity of an alert, and close it if it doesn’t meet your standards.
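Here is a rough sketch of that decision point, sitting between the rule match and the page. The field names, thresholds, and helper calls are hypothetical.

```python
# A sketch of the automation between rule match and page: attempt mitigation
# first, then re-score severity using the decorations already attached. The
# helpers are hypothetical stubs.

def force_password_reset(user_id: str) -> None: ...  # stub: IdP / SSO call
def close_alert(alert: dict, reason: str) -> None: ...  # stub: case tracker
def page_on_call(alert: dict) -> None: ...  # stub: paging service


def handle_phish_report(alert: dict) -> str:
    """Decide what happens to a decorated 'reported phish' alert."""
    decorations = alert.get("decorations", {})

    # Automatic mitigation: if the user admits entering credentials, kick off
    # a reset before a human ever looks at the alert.
    if alert.get("user_entered_credentials"):
        force_password_reset(alert["user_id"])
        return "mitigated"

    # Reconsider severity using decoration, and close if it misses the bar.
    intel = decorations.get("threat_intel", {})
    if intel.get("verdict") == "benign" and not decorations.get("other_hosts_seen"):
        close_alert(alert, reason="benign per intel, no other hosts affected")
        return "auto-closed"

    page_on_call(alert)
    return "escalated"
```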

You should escalate creatively. For instance, you can route an alert to someone on a rotation doing an unguided hunt for malice. Or, perhaps, to the impacted employee themselves. Let’s explore these routes, starting with hunting.

Hunting makes a great sandbox for new rules and new alerts.

Great teams will prefer to guide hunts, instead of triggering on-calls.

As I’ve repeated throughout this article, it’s easy to create a rule that slams multiple analysts with work. To add to this, it’s also a misconception that every alert must become a “task” for an analyst to review and close out.

Better detection teams have some type of “hunting” function, whether it is a rotating on-call or a pizza-and-beer hackathon. Early in Facebook’s detection engineering around incident response, our CERT dedicated Wednesdays to hunting.

In the last year, mostly from incident work, I’ve become an advocate for free-form hunting on any security team that values detection capability. I believe in this because I’ve seen freestyle hunting beat sophisticated detection infrastructure in discovering an adversary. It’s also a great way to develop and improve rules and to understand an environment.

Haroon Meer discusses at length the importance of knowing your own network.

If your security team is sophisticated enough to build a detection program, I think regularly occurring, semi-guided hunting is critical enough to be formalized within it.

Rather than sending all alerts to an analyst queue for basic FIFO investigation, consider routing some rules to an ongoing, regularly occurring hunt rotation instead.

Allow hunters to determine whether an alert is useful in looking for threats. Leave it to their discretion to follow up on these alerts, promote them to material that is loudly alerted to a real on-call, or demote them for further development to avoid wasting time.

If this is too liberal, give these alerts a reasonable due date that fits.

If a hunter won’t find an alert useful, they will be well prepared to give feedback on further decoration, automatic mitigation, other escalation paths, refactoring, or elimination.
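One way to sketch this routing is to treat rule “maturity” as data: new rules start in the hunt queue, and hunter feedback promotes or demotes them. The labels and queues here are hypothetical.

```python
# A sketch of routing by rule maturity instead of treating every match as a
# page. Maturity labels and queues are hypothetical.

HUNT_QUEUE = []    # reviewed on the hunt rotation's own schedule
ONCALL_QUEUE = []  # pages a human right now


def route_alert(alert: dict, rule: dict) -> None:
    if rule.get("maturity") == "promoted":
        ONCALL_QUEUE.append(alert)  # proven signal: loud escalation
    else:
        HUNT_QUEUE.append(alert)    # still in development: hunter discretion


def record_hunt_feedback(rule: dict, verdict: str) -> None:
    """Hunters promote useful rules and demote noisy ones."""
    if verdict == "useful":
        rule["maturity"] = "promoted"
    elif verdict == "noisy":
        rule["maturity"] = "needs-work"  # back to development, not the on-call queue
```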

The Summit Route blog has a great article on quality alerting, which is a good follow up on strict alert creation.

Employees are a part of the alert escalation path.

Great teams have great co-workers who can help respond to alerts.

It’s useful to involve an employee in some investigations if they can provide exponentially faster context than a manual, tool-driven investigation would. An employee has a stronger work lever than an analyst when an alert involves them.

For instance, imagine a Slack bot that can, either automatically or when prompted by an on-call, ask an employee a question like the following:

When they’ve VPN’d from a strange location/country:

Slackbot: “Hey, are you traveling?”

When they’ve done something against policy:

Slackbot: “Hey, did you just need to sudo in production?”

The work lever here is nice, mainly because an employee typically has more information than an analyst. There may be corner cases where this is less appropriate, like an insider threat. In general, this escalation path for adding context to a rule or alert is great.
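Here is a minimal sketch of the VPN question above, built with slack_sdk and Block Kit. It assumes a bot token and that the alert already carries the employee’s Slack user ID; the action IDs are placeholders for whatever interactivity handler records the answer back onto the alert.

```python
# A sketch of asking the affected employee directly via a Slack DM. Assumes a
# bot token in SLACK_BOT_TOKEN; the action_ids are placeholders for your own
# interactivity handler.
import os

from slack_sdk import WebClient  # pip install slack_sdk

client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])


def ask_if_traveling(slack_user_id: str, country: str) -> None:
    question = f"We saw a VPN login from {country}. Are you traveling?"
    client.chat_postMessage(
        channel=slack_user_id,  # a direct message to the employee
        text=question,          # fallback text for notifications
        blocks=[
            {"type": "section",
             "text": {"type": "mrkdwn", "text": question}},
            {"type": "actions",
             "elements": [
                 {"type": "button", "action_id": "vpn_travel_yes", "value": "yes",
                  "text": {"type": "plain_text", "text": "Yes, that's me"}},
                 {"type": "button", "action_id": "vpn_travel_no", "value": "no",
                  "style": "danger",
                  "text": {"type": "plain_text", "text": "No, that's not me"}},
             ]},
        ],
    )
```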

There is a lot of public work to point you to on this subject. Diogo Monica, while at Square, gave a great talk on this concept, covering many tools and approaches that take advantage of it.

How engineers at Square review their own alerts.

Ryan Huber at Slack talks about this at length as well, with an awesome screenshot in his post.

And Dropbox has just recently released a tool inspired by Ryan’s post.

Information is captured on closed alerts.

Great teams don’t pass bad alerts from one on-call to the next.

The reason an alert was closed should be captured for measurement. This is an important part of a feedback loop that brings us closer to quality alerting. You want to develop language to describe why an alert was closed. For instance:

  • The alert did not identify the event it was intended to.
  • The event it identified looked malicious, but wasn’t.
  • The event identified was malicious, but not worth an alert.

There are many reasons an alert may be closed, and this feedback loop will help teach standards that can be enforced in the development process through linting or peer review.
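Here is a small sketch of capturing this as data, using the categories above as an enum so close reasons become something you can measure. The persistence call is a hypothetical stand-in for your case tracker.

```python
# A sketch of a close-reason taxonomy so "why was this closed?" becomes data.
from dataclasses import dataclass
from enum import Enum


class CloseReason(Enum):
    DID_NOT_MATCH_INTENT = "alert did not identify the event it was intended to"
    LOOKED_MALICIOUS_BUT_WASNT = "looked malicious, but wasn't"
    MALICIOUS_BUT_NOT_ALERT_WORTHY = "malicious, but not worth an alert"
    TRUE_POSITIVE = "true positive, incident declared"


@dataclass
class ClosedAlert:
    alert_id: str
    rule_name: str
    reason: CloseReason
    notes: str = ""


def close_alert(alert_id: str, rule_name: str, reason: CloseReason, notes: str = "") -> ClosedAlert:
    record = ClosedAlert(alert_id=alert_id, rule_name=rule_name, reason=reason, notes=notes)
    # persist(record)  # hypothetical: write wherever your metrics are built from
    return record
```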

Passing this data to the immediately following on-call will also help avoid churn while an alert is being fixed, as the new on-call will be guided by the previous on-call’s data on alert accuracy. A bad alert shouldn’t impact more than one rotation.

My observation is that this is usually the weakest aspect of the strongest programs. It is usually heavily patchworked, or enforced tribally.

Frequently used investigative tools are integrated.

Great teams investigate incidents collaboratively and transparently.

Hunting should be efficient, as should any on-call time spent diving into an alert. Instead of burying yourself in browser tabs for different investigative tools, many stronger detection teams integrate their tools into a single place.

More and more often this takes the form of a Slack bot, with the added benefit of making investigations more public amongst a team. Scott Roberts has released a lot on this subject.
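As a sketch, a single “/investigate” entry point built with slack_bolt could look like the following. The slash command name and the lookup helpers are hypothetical stubs for your own tools.

```python
# A sketch of one "/investigate" entry point so lookups happen in one place
# and stay visible to the team. Lookup helpers are hypothetical stubs.
import os

from slack_bolt import App  # pip install slack_bolt

app = App(token=os.environ["SLACK_BOT_TOKEN"],
          signing_secret=os.environ["SLACK_SIGNING_SECRET"])


def lookup_threat_intel(indicator: str) -> str:
    return "no verdict"  # stub: reputation services

def lookup_dns_history(indicator: str) -> str:
    return "no resolutions seen"  # stub: passive DNS / netflow store

def lookup_asset_owner(indicator: str) -> str:
    return "unknown owner"  # stub: asset inventory / directory


@app.command("/investigate")
def investigate(ack, respond, command):
    ack()
    indicator = command["text"].strip()
    respond(
        response_type="in_channel",  # keep the investigation public to the team
        text=(f"Investigation: {indicator}\n"
              f"• Threat intel: {lookup_threat_intel(indicator)}\n"
              f"• DNS history: {lookup_dns_history(indicator)}\n"
              f"• Asset owner: {lookup_asset_owner(indicator)}"),
    )


if __name__ == "__main__":
    app.start(port=3000)
```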

Detection flows are integration tested constantly.

Great teams don’t wait for an incident or red team to see how they’re doing.

The problem set of detection is a complicated integration challenge. So why don’t more detection teams focus on integration testing? This is an incredibly interesting space in intrusion detection: treating detection the same way you’d treat a build pipeline supported by CI/CD platforms like Jenkins.

There are a few ways you could test this today.

Firefighting. Hopefully you don’t have so many real incidents that your detection infrastructure is continuously battle tested. At least you know about them, I guess.

Tabletops. It’s very simple to test core detection pipelines with simple scenarios. “I’ll watch logs while you install a helloworld python script to LaunchAgents.”

Red teaming. This is expensive and scales poorly; you can do it maybe once or twice a year. It is more useful for exercising response procedure than as an integration test. It will also help inspire creative rule creation.

Detection tooling. You write software to directly attack the detections you’ve created, like a true unit test. Many of these would be simple to write, like IAM permission failures in AWS. AttackIQ is a company I’ve been helping; they develop more sophisticated test scenarios for the aggressive tooling an adversary would use that you wouldn’t write yourself.
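A minimal sketch of the IAM example as an end-to-end test: deliberately generate the signal, then assert that it actually became an alert. It assumes the test runs under a role that is intentionally not allowed to call iam:ListUsers, and that you replace the alert_fired() stub with a real query against your SIEM or alert store.

```python
# A sketch of an end-to-end detection test: generate the signal, then check
# that the alert arrives. alert_fired() is a hypothetical stub.
import time

import boto3  # pip install boto3
from botocore.exceptions import ClientError


def generate_iam_denial() -> None:
    """Produce a CloudTrail AccessDenied event that a detection should catch."""
    try:
        boto3.client("iam").list_users()
    except ClientError as error:
        assert error.response["Error"]["Code"] == "AccessDenied"


def alert_fired(rule_name: str) -> bool:
    """Hypothetical: replace with a query against your alert store."""
    return False


def wait_for_alert(rule_name: str, timeout_s: int = 600, poll_s: int = 30) -> bool:
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if alert_fired(rule_name):
            return True
        time.sleep(poll_s)
    return False


def test_iam_denial_detection():
    generate_iam_denial()
    assert wait_for_alert("aws-iam-access-denied"), (
        "signal was generated but never became an alert; check the log pipeline"
    )
```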

Chris Gates and Chris Nickerson have a great talk covering these concepts at length.

This area is probably the most important to me, because it’s something I didn’t stay at Facebook long enough to see through myself. The biggest takeaway from our red team exercises was how fragile large detection infrastructure can become. I cannot begin to describe the frustration you’ll feel when an active adversary generated activity that should have been observable, but it was missed because of a broken log pipeline.

This should be approached the way a modern engineering organization would approach it: with a culture of unit and integration testing, and continuous visibility into what is failing.

Realtime detection isn’t forced.

Great teams don’t pretend that every alert is worth being paged.

A conveniently timed tweetstorm from Chris Sanders sums up some good observations nicely.

My sentiment is similar, mostly founded on the observation that the bulk of most alerting is rarely actionable or malicious. Instead of FIFO’ing through any and all alerts, I would rather have the discretion to hunt with rules that are still in development. Not every alert should be treated as a first-class citizen; some should be swept aside for improvement.

If someone hunting is betrayed by a poorly crafted rule, it should not be an unrealistic expectation for them to ignore that alert. That should be the beginning of a feedback loop to improve detection.

It is not uncommon to come across security teams that have buried themselves with tasks created by their own alerts. This is terrible self-management, but I think it is often fueled by the guilt of ignoring an alert and the guilt of accepting anything other than “realtime” as a standard.

Conclusion

Detection, even when modernized, is still incredibly tough. We still discover incidents outside of detection pipelines, we still find incidents through freeform hunting, and we still have plenty of false positives.

While these concepts may improve your detection program in some way, you should still invest in quality incident response as well.

@magoo

I write security stuff on medium.
