Wednesday, September 25, 2024

Agentic AI in SOCs: A Solution to SOAR's Unfulfilled Promises

Security Orchestration, Automation, and Response (SOAR) was introduced with the promise of revolutionizing Security Operations Centers (SOCs) through automation, reducing manual workloads and enhancing efficiency. However, despite three generations of technology and 10 years of advancements, SOAR hasn't fully delivered on its potential, leaving SOCs still grappling with many of the same challenges. Enter Agentic AI—a new approach that could finally fulfill the SOC's long-awaited vision, providing a more dynamic and adaptive solution to automate SOC operations effectively.

Three Generations of SOAR – Still Falling Short

SOAR emerged in the mid-2010s with companies like PhantomCyber, Demisto, and Swimlane, promising to automate SOC tasks, improve productivity, and shorten response times. Despite these ambitions, SOAR found its greatest success in automating generalized tasks like threat intel propagation, rather than core threat detection, investigation, and response (TDIR) workloads.

The evolution of SOAR can be broken down into three generations:

  • Gen 1 (Mid-2010s): Early SOAR platforms featured static playbooks, complex implementations (often involving coding), and high maintenance demands. Few organizations adopted them beyond simple use cases, like phishing triage.
  • Gen 2 (2018–2020): This phase introduced no-code, drag-and-drop editors and extensive playbook libraries, reducing the need for engineering resources and improving adoption.
  • Gen 3 (2022–present): The latest generation leverages generative AI (LLMs) to automate playbook creation, further reducing the technical burden.

Despite these advancements, SOAR's core promise of SOC automation remains unfulfilled, for reasons discussed in the next section. Instead, each generation has primarily improved ease of operation and reduced the engineering burden of SOAR without addressing the fundamental challenges of SOC automation.

Why Didn't SOAR Succeed?

When asking why SOAR hasn't tackled SOC automation, it helps to remember that SOC work is made up of a multitude of activities and tasks that differ from one SOC to the next. Generally, though, the tasks involved in alert handling fall into two categories:

  • Thinking tasks – e.g. figuring out if something is real, determining what happened, understanding scope and impact, creating a plan for response, etc.
  • Doing tasks – e.g. taking response actions, notifying stakeholders, updating systems of records, etc.

SOAR effectively performs "doing" tasks but struggles with the "thinking" tasks. Here's why:

  • Complexity: The thinking tasks require deeper understanding, data synthesis, pattern learning, tool familiarity, security expertise, and decision-making. Static playbooks that replicate these traits are difficult, if not impossible, to create.
  • Unpredictable Inputs: SOAR relies on predictable inputs for consistent outputs. In security, where exceptions are the norm, playbooks become increasingly complex to handle edge cases. This leads to high implementation and maintenance overhead.
  • Customization: Out-of-the-box playbooks rarely work as intended. Because inputs are so unpredictable, they always need customization, which keeps maintenance burdens high.
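
To make the brittleness concrete, here is a minimal sketch of a static phishing playbook. The field names, the allowlisted domain, and the reputation threshold are all hypothetical and do not reflect any vendor's schema. The point is only that a fixed rule set handles the cases its author anticipated and hands everything else back to a human.

```python
# Hypothetical static playbook: all field names, domains, and thresholds
# are illustrative assumptions, not a real product's schema.

def static_phishing_playbook(alert: dict) -> str:
    """Return a disposition using fixed, hand-written rules."""
    sender_domain = alert["sender"].split("@")[-1].lower()
    if sender_domain in {"example-bank.test"}:      # hard-coded allowlist
        return "benign"
    if alert.get("attachment_type") == "exe":       # one known-bad pattern
        return "malicious"
    if alert.get("url_reputation", 100) < 20:       # assumed 0-100 score
        return "malicious"
    # Anything the playbook author did not anticipate falls through.
    return "escalate_to_analyst"


alerts = [
    {"sender": "billing@example-bank.test", "attachment_type": "pdf"},
    {"sender": "hr@unknown.test", "attachment_type": "exe"},
    # Edge case: macro-enabled document from a lookalike domain.
    {"sender": "billing@examp1e-bank.test", "attachment_type": "docm"},
]
dispositions = [static_phishing_playbook(a) for a in alerts]
```

The third alert matches none of the rules, so it is escalated. Covering it means adding another branch, and each new branch is more logic to maintain.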

It is by automating "thinking tasks" that more of the overall SOC workflow can be automated.

Investigation: The SOC's Weakest Link

The triage and investigation phases of security operations are filled with thinking tasks that occur before response efforts can even begin. These thinking tasks resist automation, forcing reliance on manual, slow, and non-scalable processes. This manual bottleneck is reliant on human analysts and prevents SOC automation from:

  • Significantly reducing response times—slow decision-making delays everything.
  • Delivering meaningful productivity gains.

To achieve the original SOC automation promise of SOAR—improving SOC speed, scale, and productivity—we must focus on automating the thinking tasks in the triage and investigation phases. Successfully automating investigation would also simplify security engineering, as playbooks could concentrate on corrective actions rather than handling triage. It also provides the possibility for a fully autonomous alert-handling pipeline, which would drastically reduce mean time to respond (MTTR).

The key question is: how do we effectively automate triage and investigation?

Agentic AI: The Missing Link in SOC Automation

In recent years, large language models (LLMs) and generative AI have transformed various fields, including cybersecurity. AI excels at performing "thinking tasks" in the SOC, such as interpreting alerts, conducting research, synthesizing data from multiple sources, and drawing conclusions. It can also be trained on security knowledge bases like MITRE ATT&CK, investigation techniques, and company behavior patterns, replicating the expertise of human analysts.

What is Agentic AI?

Recently, there has been tremendous confusion around AI in the SOC, largely due to early marketing claims from the 2010s, well before modern AI techniques like LLMs existed. This was further compounded by the 2023 industry-wide mad dash to bolt an LLM-based chatbot onto existing security products.

To clarify, there are at least three types of solutions being marketed as "AI for the SOC". Here's a comparison of the different AI implementations:

  • Analytics/ML Models: These machine learning models have been around since the early 2010s and are used in areas like UEBA and anomaly detection. While marketers have long referred to these as AI, they don't align with today's more advanced AI definitions. This is a detection technology. Analytics solutions can improve threat detection rates, but they often generate numerous alerts, many of which are false positives. Analysts must sift through these alerts, which increases workloads and hurts productivity. The net effect is more alerts to triage, but not necessarily more efficiency in the SOC.
  • Co-pilots (Chatbots): Co-pilot tools like ChatGPT and bolt-on chatbots can assist humans by providing relevant information, but they leave decision-making and execution to the user. This technology is typically used in the SOC for post-detection work. While co-pilots improve productivity by making it easier to interact with data, they still rely on humans to drive the entire process: the SOC analyst must initiate queries, interpret results, synthesize them into actionable plans, and then execute the necessary response actions. The human remains at the center of a hub-and-spoke model, managing the flow of information and decision-making.
  • Agentic AI: This goes beyond assistance by acting as an autonomous AI SOC analyst, completing entire workflows. Agentic AI emulates human processes, from alert interpretation to decision-making, delivering fully executed work units. This technology is typically used in the SOC for post-detection work. By delivering fully completed alert triages or incident investigations, Agentic AI allows SOC teams to focus on higher-level decision-making, leading to exponential productivity gains and vastly more efficient operations.

Now that we have clear definitions of several common implementations of AI in the SOC, it is worth noting that a given solution may include several, or even all, of these categories of technology. For example, Agentic AI solutions often include a chatbot for threat hunting and data exploration, as well as analytic models for use in analysis and decision-making.

How Agentic AI Works in SOC Automation

Agentic AI revolutionizes SOC automation by handling the triage and investigation processes before alerts even reach human analysts. When a security alert is generated by a detection product, it is first sent to the AI rather than directly to the SOC. The AI then emulates the investigative techniques, workflows, and decision-making processes of a human SOC analyst to fully automate triage and investigation. Once completed, the AI delivers the results to human analysts for review, allowing them to focus on strategic decisions rather than operational tasks.

The process begins with the AI interpreting the meaning of the alert using a Large Language Model (LLM). It converts the alert into a series of security hypotheses, outlining what could potentially be happening. To enrich its analysis, the AI pulls in data from external sources, such as threat intelligence feeds and behavioral context from analytic models, adding valuable context to the alert. Based on this information, the AI dynamically selects specific tests to validate or invalidate each hypothesis. Once these tests are completed, the AI evaluates the results to either reach a verdict on the alert's maliciousness or repeat the process with newly gathered data until a clear conclusion is reached.
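
The loop described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how any particular product works: generate_hypotheses stands in for the LLM step, the enrichers stand in for threat intelligence feeds and analytic models, and every name and value here is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A test inspects the gathered context and returns True (supports the
# hypothesis), False (contradicts it), or None (not enough data yet).
TestFn = Callable[[dict], Optional[bool]]


@dataclass
class Hypothesis:
    name: str            # a malicious scenario the alert might represent
    tests: list[str]     # names of the tests chosen to validate it


def generate_hypotheses(alert: dict) -> list[Hypothesis]:
    """Stand-in for the LLM step that turns an alert into hypotheses."""
    return [Hypothesis("account_takeover",
                       ["country_unusual_for_user", "ip_on_blocklist"])]


def evaluate(h: Hypothesis, context: dict,
             tests: dict[str, TestFn]) -> Optional[bool]:
    results = [tests[name](context) for name in h.tests]
    if any(r is False for r in results):
        return False     # one contradicting test rules the hypothesis out
    if all(r is True for r in results):
        return True      # every test supports it
    return None          # still waiting on data


def investigate(alert: dict, enrichers: list, tests: dict[str, TestFn],
                max_rounds: int = 3) -> str:
    context = dict(alert)
    hypotheses = generate_hypotheses(alert)
    for round_no in range(max_rounds):
        # Each round pulls in one more (hypothetical) data source.
        if round_no < len(enrichers):
            context.update(enrichers[round_no](context))
        outcomes = [evaluate(h, context, tests) for h in hypotheses]
        if any(o is True for o in outcomes):
            return "malicious"
        if all(o is False for o in outcomes):
            return "benign"
    return "inconclusive"


TESTS: dict[str, TestFn] = {
    "ip_on_blocklist": lambda ctx: ctx.get("ip_on_blocklist"),
    "country_unusual_for_user": lambda ctx: (
        None if "usual_countries" not in ctx
        else ctx["login_country"] not in ctx["usual_countries"]
    ),
}


def threat_intel(ctx: dict) -> dict:
    return {"ip_on_blocklist": True}            # illustrative result


def behavior_baseline(ctx: dict) -> dict:
    return {"usual_countries": {"US", "CA"}}    # illustrative baseline


alert = {"user": "alice", "login_country": "BR"}
verdict = investigate(alert, [threat_intel, behavior_baseline], TESTS)
```

In this sketch the first round cannot reach a verdict because the behavioral baseline has not been gathered yet, so the loop repeats with more data, which mirrors the "repeat until a clear conclusion is reached" step. A real system would choose tests dynamically rather than from a fixed table.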

After completing the investigation, the AI synthesizes the findings into a detailed, human-readable report. This report includes a verdict on the alert's maliciousness, a summary of the incident, its scope, a root cause analysis, and an action plan with prescriptive guidance for containment and remediation. This comprehensive report provides human analysts with everything they need to quickly understand and review the incident, significantly reducing the time and effort required for manual investigation.
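
As a rough illustration, the report described above could be modeled as a simple record with one field per section. The class name, field names, and sample values below are assumptions made for this sketch, not a documented report format.

```python
from dataclasses import dataclass


@dataclass
class InvestigationReport:
    """Hypothetical container for the report sections listed above."""
    verdict: str
    summary: str
    scope: list[str]
    root_cause: str
    action_plan: list[str]

    def render(self) -> str:
        """Produce a plain-text report for analyst review."""
        lines = [
            f"Verdict: {self.verdict}",
            f"Summary: {self.summary}",
            "Scope: " + ", ".join(self.scope),
            f"Root cause: {self.root_cause}",
            "Action plan:",
        ]
        lines += [f"  {i}. {step}"
                  for i, step in enumerate(self.action_plan, start=1)]
        return "\n".join(lines)


# Illustrative values only; not taken from a real incident.
report = InvestigationReport(
    verdict="malicious",
    summary="Login from an unusual country using a blocklisted IP.",
    scope=["user: alice", "host: laptop-042"],
    root_cause="Credentials entered on a phishing page.",
    action_plan=["Reset the user's password", "Revoke active sessions"],
)
text = report.render()
```

Keeping the report structured rather than free-form means the same record can feed both the human-readable view and any downstream response automation.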

Agentic AI also offers advanced automation capabilities through API integrations with security tools, enabling it to perform response actions automatically. After a human analyst reviews the incident report, automation can resume in either a semi-automated mode—where the analyst clicks a button to initiate response workflows—or a fully automated mode, where no human intervention is needed. This flexibility allows organizations to balance human oversight with automation, maximizing both efficiency and security.
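
The two modes can be thought of as a gate in front of the response actions. The sketch below is a minimal illustration with hypothetical names; it also assumes that responses only run for alerts judged malicious, which the text above does not specify.

```python
from enum import Enum


class ResponseMode(Enum):
    SEMI_AUTOMATED = "semi_automated"     # analyst must approve first
    FULLY_AUTOMATED = "fully_automated"   # no human intervention


def should_execute(mode: ResponseMode, verdict: str,
                   analyst_approved: bool = False) -> bool:
    """Decide whether response actions may run."""
    if verdict != "malicious":            # assumption: act only on malicious
        return False
    if mode is ResponseMode.FULLY_AUTOMATED:
        return True
    return analyst_approved               # semi-automated: wait for the click


def run_response(actions: list[str], mode: ResponseMode, verdict: str,
                 analyst_approved: bool = False) -> list[str]:
    """Return a log of executed actions (no real API calls are made here)."""
    if not should_execute(mode, verdict, analyst_approved):
        return []
    return [f"executed: {action}" for action in actions]


actions = ["disable_account", "isolate_host"]   # hypothetical action names
pending = run_response(actions, ResponseMode.SEMI_AUTOMATED, "malicious")
approved = run_response(actions, ResponseMode.SEMI_AUTOMATED, "malicious",
                        analyst_approved=True)
automatic = run_response(actions, ResponseMode.FULLY_AUTOMATED, "malicious")
```

In practice an organization might choose the mode per action type, for example automating low-risk actions while keeping approval on disruptive ones.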

Can We Really Trust AI for SOC Automation?

A common question in the security industry is, "Is AI ready?" or "How can we trust its accuracy?" Here are key reasons why the agentic AI approach can be trusted:

  1. Thoroughness of Work: While human analysts can conduct deep investigations, time constraints and large workloads often prevent these efforts from being exhaustive or performed often. Agentic AI, on the other hand, can apply a broad range of investigative techniques to every alert it processes, ensuring a more thorough investigation. This increases the likelihood of identifying the evidence needed to confirm or dismiss an alert's maliciousness.
  2. Accuracy: Modern AI is powered by a collection of specialized, mini-agent LLMs, each focusing on a narrow domain, whether it's security, IT infrastructure, or technical writing. This focused approach allows the agents to pass work between one another, similar to microservice architectures, helping to prevent issues like hallucination. With accuracy rates in the high 90% range, these AI agents often outperform humans in repetitive tasks.
  3. Behavioral Investigation: AI excels in using behavioral modeling during triage and investigation. Unlike human analysts, who may lack the time or expertise to conduct complex behavioral analysis, AI constantly learns normal patterns and compares suspicious activity against baselines for users, entities, peer groups, or entire organizations. This enhances the accuracy of its findings and leads to more reliable conclusions.
  4. Transparency: AI SOC analysts keep a detailed record of every action—each question asked, test performed, and result obtained. This information is easily accessible through user interfaces, often supported by chatbots, making it simple for human analysts to review the findings. Every conclusion and recommended action is backed by data, frequently cross-referenced with industry security frameworks like MITRE ATT&CK. This level of transparency and auditability is rarely achievable with human analysts due to the time it would take to document their work at such a scale.
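
To illustrate the baseline comparison mentioned under Behavioral Investigation, here is a minimal sketch that scores an observation against a user's history with a z-score. The z-score is one simple technique chosen for illustration; the article does not say which models are used, and the threshold and sample figures are assumptions.

```python
from statistics import mean, stdev


def deviation_score(history: list[float], observed: float) -> float:
    """How many standard deviations `observed` sits from the baseline mean."""
    mu = mean(history)
    sigma = stdev(history)                # sample standard deviation
    if sigma == 0:
        # A perfectly flat history: any change is maximally unusual.
        return 0.0 if observed == mu else float("inf")
    return (observed - mu) / sigma


def is_anomalous(history: list[float], observed: float,
                 threshold: float = 3.0) -> bool:
    """Flag observations beyond `threshold` deviations (assumed cutoff)."""
    return abs(deviation_score(history, observed)) > threshold


# Illustrative baseline: daily download volume in MB for one user.
history = [10, 12, 11, 9, 13, 10, 12]
normal_day = is_anomalous(history, 12)
spike_day = is_anomalous(history, 60)
```

The same comparison could be run against a peer group or the whole organization by swapping in a different history.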

In short, agentic AI offers a more thorough, accurate, and transparent approach to SOC automation, providing security teams with a high level of confidence in its capabilities.

4 Key Benefits of an Agentic AI Approach to SOC Automation

By adopting an agentic AI approach, SOCs can realize significant benefits that enhance both operational efficiency and team morale. Here are four key advantages of this technology:

  1. Finding More Attacks with Existing Detection Signals: Agentic AI reviews every alert, correlates data across sources, and conducts thorough investigations. This enables SOCs to identify the detection signals that represent real attacks, uncovering threats that might have otherwise been missed.
  2. Reducing MTTR: By eliminating the manual bottleneck of triage and investigation, Agentic AI allows remediation to happen faster. What previously took days or weeks can now be resolved in minutes or hours, drastically cutting mean time to respond (MTTR).
  3. Boosting Productivity: Agentic AI makes it possible to review every security alert, something that would be impossible for human analysts at scale. This frees analysts from repetitive tasks, allowing them to focus on more complex security projects and strategic work.
  4. Improving Analyst Morale and Retention: By handling the repetitive triage and investigation work, Agentic AI transforms the role of SOC analysts. Instead of doing tedious, monotonous tasks, analysts can focus on reviewing reports and working on high-value initiatives. This shift boosts job satisfaction, helping retain skilled analysts and improve overall morale.

These benefits not only streamline SOC operations but also help teams work more effectively, improving both the detection of threats and the overall job satisfaction of security analysts.

About Radiant Security

Radiant Security is the first and leading provider of AI SOC analysts, leveraging generative AI to emulate the expertise and decision-making processes of top-tier security professionals. With Radiant, alerts are analyzed by AI before reaching the SOC. Each alert undergoes multiple dynamic tests to determine maliciousness, delivering decision-ready results in just three minutes. These results include a detailed incident summary, root cause analysis, and a response plan. Analysts can respond manually, with step-by-step AI-generated instructions, use single-click responses via API integrations, or choose fully automated responses.

Want to learn more?

Book a demo with Radiant to learn more about how an AI SOC analyst can turbocharge your SOC.

This article is a contributed piece from one of our valued partners.



from The Hacker News https://ift.tt/0WayolT