Friday, June 12, 2026

AI Storage Bottlenecks: Why AI Workloads Slow Down and How to Fix Them

GPU clusters cost serious money whether they’re training or waiting. In most production pipelines, storage constraints idle expensive hardware more often than compute limits do. Slow dataset reads and checkpoint write contention compound across every training run, and fragmented data tiers make the problem worse.

Below, we’ll cover where these bottlenecks show up, how you’ll spot them, and which architectural decisions actually help – though I’ll admit up front that not every fix needs new hardware, and some of the worst bottlenecks are self-inflicted (something I wish more teams understood before they started calling vendors).

What counts as a bottleneck

An AI storage bottleneck is any weak point that prevents workloads from getting data fast enough to keep compute busy. Constraints can sit in throughput, latency, IOPS, metadata performance, or network bandwidth. GPU clusters consume power and cooling whether they’re working or not, so the cost of getting this wrong shows up from the first idle cycle. If you’ve ever watched a utilization graph flatline while the SAN LEDs blink happily, you know exactly what I mean.

AI workloads place different demands on storage than traditional enterprise apps. Training repeatedly reads the same large datasets from many parallel workers. Checkpointing writes hundreds of gigabytes in short bursts. Inference needs fast, predictable access to model weights and cached context. A system that’s tuned for one pattern can fail miserably on the others.

Effects compound fast. You add GPUs and see little improvement because the limit sits earlier in the data path. Sad truth – many teams we’ve worked with plan compute carefully and treat storage as an afterthought.

That ordering produces the exact bottlenecks described here. Every time.

Where bottlenecks appear in the pipeline

Where does storage actually choke? AI workloads span several stages, and each creates a different storage demand.

 

Figure 1: AI pipeline and common bottlenecks

 

The larger your pipeline becomes, the more these demands overlap. In many cases, ingestion, training reads, checkpointing, and backup all compete for the same throughput and network bandwidth at once.

Meta’s work on ML training data illustrates this at scale. The company trains thousands of models on petabyte-scale datasets using Tectonic, its distributed file system, while a separate preprocessing tier handles decoding and conversion to tensor formats. At that scale, storage and network paths are as much a part of AI performance as the training code itself. Maybe more.

Training reads versus inference latency

Training and inference put different demands on storage, so the diagnostic approach depends on which phase is slow.

Training bottlenecks usually appear in dataset reads or checkpoint writes. When an LLM trains across dozens of GPUs, each worker repeatedly reads from the same shared dataset. If your storage can’t serve those requests concurrently, workers stall, and your expensive silicon sits idle while the clock ticks. Checkpoint I/O creates a separate challenge. Model and optimizer state can reach hundreds of gigabytes per save, and research into LLM checkpointing overhead shows this can eat a large share of total training time when storage isn’t built for burst writes.
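
To get a feel for the scale, here is a back-of-envelope sketch in Python. The bytes-per-parameter figure, the 30B-parameter model, and both write rates are illustrative assumptions, not measurements from the research mentioned above.

```python
def checkpoint_size_gb(num_params: float, bytes_per_param: float = 14.0) -> float:
    """Approximate checkpoint size in GB (1 GB = 1e9 bytes).

    bytes_per_param=14 is an assumption for mixed-precision Adam:
    2 (16-bit weights) + 4 (fp32 master copy) + 4 + 4 (two fp32 moments).
    Real frameworks differ, so measure your own checkpoints.
    """
    return num_params * bytes_per_param / 1e9


def write_seconds(size_gb: float, write_gb_per_s: float) -> float:
    """Time to write size_gb at a sustained rate, ignoring all overheads."""
    if write_gb_per_s <= 0:
        raise ValueError("write rate must be positive")
    return size_gb / write_gb_per_s


size = checkpoint_size_gb(30e9)      # hypothetical 30B-parameter model
slow = write_seconds(size, 1.0)      # assumed 1 GB/s shared tier
fast = write_seconds(size, 20.0)     # assumed 20 GB/s burst-write tier
print(f"{size:.0f} GB: {slow:.0f} s at 1 GB/s, {fast:.0f} s at 20 GB/s")
```

Multiply the per-save time by the number of saves in a run to see how much of the training window checkpointing could consume on each tier.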

Inference behaves differently. KV-cache access speed and model loading latency matter most. Research on dual-path KV-cache architectures (DualPath) has shown that agentic LLM inference workloads can become dominated by cache storage I/O rather than compute. A model that runs efficiently once loaded can still deliver poor user-facing latency if storage retrieves cached context slowly.

Symptoms differ. When debugging training jobs, low GPU utilization during dataset loading is the first thing to check. In inference, it’s high first-token latency even when the model itself performs well.

Additionally, enterprise data often arrives scattered across NAS platforms, databases, and cloud buckets before it ever reaches an ML pipeline. Time spent federating and normalizing that data can become the first bottleneck you hit, before any GPU is involved.

Agentic inference introduces another challenge by generating a persistent write stream of tool calls, trace logs, intermediate state, and telemetry. Teams that don’t account for this when planning inference storage frequently see write queues build up under sustained traffic.

Why storage chokes

Slow shared storage serving multiple GPU jobs simultaneously is one of the most common causes of AI performance issues. When several training jobs compete for the same NAS or SAN, throughput per job drops and latency rises. A 10 GbE storage network (still very common in older racks, unfortunately) feeding a cluster that can consume far more bandwidth becomes the ceiling regardless of the hardware behind it. No amount of flash on the array side fixes a network pipe that’s simply too narrow, a lesson teams seem to relearn every budget cycle.
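
The network ceiling is easy to estimate. The sketch below divides a link’s usable bandwidth evenly across GPUs. The 90% efficiency factor and the 16-GPU cluster are assumptions for illustration, and real traffic is rarely shared this evenly.

```python
def per_gpu_read_mb_s(link_gbit_s: float, num_gpus: int,
                      efficiency: float = 0.9) -> float:
    """Upper bound on storage read bandwidth per GPU, in MB/s.

    efficiency is an assumed allowance for protocol overhead.
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    link_mb_s = link_gbit_s * 1000 / 8   # Gbit/s -> MB/s (decimal units)
    return link_mb_s * efficiency / num_gpus


# Hypothetical cluster: 16 GPUs behind a single 10 GbE storage link.
print(f"{per_gpu_read_mb_s(10, 16):.1f} MB/s per GPU at best")
```

Compare the result with the read rate each GPU needs to stay busy. If the link can’t supply it, faster drives behind that link won’t help.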

I’ve seen too many teams start with a good old shared NAS because it’s simple to deploy. That works for early experiments. Once several GPU servers read the same dataset in parallel, the NAS controller or network caps the entire cluster, even when the underlying drives still have capacity to spare. It happens.

Data scattered across tiers creates another bottleneck. Excessive movement between NAS, object storage, local disks, and cloud buckets adds latency at every step. A backup job running during an active training window can consume enough bandwidth to measurably reduce GPU utilization. Teams that haven’t isolated these traffic types encounter this pattern regularly. It’s not subtle.

Small-file datasets introduce their own failure mode. A computer vision training job may involve millions of individual JPEG files. On paper, the storage system’s got more than enough throughput. In practice, the file system spends a disproportionate amount of time locating and opening files. Metadata performance becomes the bottleneck. Bandwidth is rarely the issue, and no amount of sequential read optimization fixes it because the problem isn’t the read – it’s the lookup.
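
One common mitigation is to pack many small files into a few large archives, so the file system handles a handful of opens instead of millions. Here is a minimal sketch using Python’s standard tarfile module. The function name, the shard naming scheme, and the default shard size are our own choices for illustration, not a standard.

```python
import tarfile
from pathlib import Path


def pack_shards(src_dir: str, out_dir: str,
                files_per_shard: int = 1000) -> list[Path]:
    """Pack every file under src_dir into numbered, uncompressed tar shards.

    out_dir should live outside src_dir, otherwise shards left over from
    an earlier run would be packed again.
    """
    if files_per_shard <= 0:
        raise ValueError("files_per_shard must be positive")
    files = sorted(p for p in Path(src_dir).rglob("*") if p.is_file())
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    shards = []
    for start in range(0, len(files), files_per_shard):
        shard_path = out / f"shard-{start // files_per_shard:06d}.tar"
        with tarfile.open(shard_path, "w") as tar:
            for f in files[start:start + files_per_shard]:
                # Store paths relative to src_dir so shards are relocatable.
                tar.add(f, arcname=str(f.relative_to(src_dir)))
        shards.append(shard_path)
    return shards
```

The trade-off is that random access to a single sample gets harder, so sharded datasets are usually read sequentially with shuffling applied per shard or within a buffer.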

How to tell storage is the problem

Check the basics first. The most obvious signal is low GPU utilization during data loading. If your GPUs spend significant time waiting for data instead of processing it, storage or preprocessing is often the real constraint. Long data loader wait times relative to actual compute time are a common indicator. At the infrastructure level, high storage latency, saturated network links, and elevated queue depth usually point to the same problem.
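
Data loader wait time can be measured directly. The sketch below wraps any batch iterator and splits wall-clock time into time spent waiting for the next batch and time spent in the training step. The names are hypothetical, and step stands in for whatever your training loop does with a batch.

```python
import time
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")


def measure_wait_vs_compute(batches: Iterable[T],
                            step: Callable[[T], None]) -> Tuple[float, float]:
    """Return (seconds waiting for data, seconds spent in step)."""
    wait = compute = 0.0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)          # blocks while the loader fetches data
        except StopIteration:
            break
        t1 = time.perf_counter()
        step(batch)                   # the actual training work
        t2 = time.perf_counter()
        wait += t1 - t0
        compute += t2 - t1
    return wait, compute
```

One caveat: GPU work is usually launched asynchronously, so step has to block until the device finishes (in PyTorch, by calling torch.cuda.synchronize()) or the compute time will be underestimated. A wait share that rivals or exceeds compute time points at the data path, not the GPUs.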

One reliable test is to stage the active dataset on local NVMe storage and run the same training job again. If performance improves significantly, the bottleneck is likely in shared storage or the network path between compute and storage. Checkpoint write duration provides a useful secondary check. Writes that take minutes instead of seconds almost always indicate storage saturation.
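
Rerunning the real job is the better test, but a quick read pass over both copies gives a rough first comparison. The sketch below is a lightweight proxy for that test: it reads every file under a directory once, single-threaded, and reports the throughput. The paths in the usage note are hypothetical.

```python
import time
from pathlib import Path


def read_throughput_mb_s(root: str, chunk_bytes: int = 8 * 1024 * 1024) -> float:
    """Read every file under root once; return MB/s (1 MB = 1e6 bytes)."""
    total = 0
    start = time.perf_counter()
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            with open(path, "rb") as f:
                while chunk := f.read(chunk_bytes):
                    total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / 1e6 / elapsed if elapsed > 0 else 0.0
```

Call it once on the shared copy (for example, read_throughput_mb_s("/mnt/shared/dataset")) and once on the local NVMe copy, then compare the two numbers. Keep in mind that the operating system’s page cache inflates any repeated read, so use a dataset larger than RAM or a freshly mounted volume. A single sequential reader also understates what many parallel workers would demand.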

Track metrics together. GPU utilization, storage throughput, latency, IOPS, queue depth, network saturation, data loader time, and checkpoint write duration all tell part of the story. Teams identify bottlenecks faster when GPU idle time and storage performance are correlated on a shared timeline instead of being treated as separate concerns. They don’t have to sit on one screen, but they do need to line up in time, and honestly most monitoring tools don’t do this well out of the box.
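
If your monitoring stack can export time-aligned samples, even a plain Pearson correlation shows whether GPU idle time moves with storage latency. The sketch below uses made-up sample values, and it assumes both series were sampled at the same timestamps.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)


# Hypothetical samples taken at the same five timestamps.
gpu_idle_pct = [5, 8, 40, 55, 7]
read_latency_ms = [2, 3, 20, 30, 2]
print(f"r = {pearson(gpu_idle_pct, read_latency_ms):.2f}")
```

A coefficient close to 1 suggests idle time tracks storage latency. It doesn’t prove causation, but it tells you where to look first.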

Storage architectures for AI workloads

AI pipelines usually need more than one storage layer. A data lake, a hot training tier, a checkpoint target, and an inference path often have different and sometimes conflicting performance requirements. For a detailed breakdown of storage types designed specifically for AI, see AI storage in 2026: types, benefits, and vendors on the StarWind blog.

 

Figure 2: Typical storage layout to keep GPUs busy

 

Each architecture solves a different problem, so choosing the right one depends on whether your priority is throughput, latency, scalability, operational simplicity, or cost.

The most common architectural mistake is trying to use one storage platform for every stage of the pipeline. Match the platform to the stage instead. Platforms such as WekaFS, VAST Data, and DataCore Nexus are designed specifically for HPC-style access patterns, while NVMe storage via NVMe-oF serves the hot tier, whether that means rapid model loading during inference or handling burst checkpoint writes during training.

For edge AI deployments and read-heavy workloads in particular, NVMe-oF makes shared flash accessible across local nodes without requiring a central SAN. StarWind VSAN and DataCore SANsymphony both support this transport layer for compact edge clusters running local inference.

How to reduce AI storage bottlenecks

The right fix depends on where the bottleneck actually exists.

We’ve found that data placement is usually the fastest improvement you can make, and it’s also the cheapest because it often requires nothing more than moving data to a different mount point. Stage the active training dataset on local NVMe or a high-throughput shared tier before the job starts, not while it’s already running. That way, data movement happens once instead of competing with training traffic throughout the run.

For distributed training, avoid routing every GPU worker through a single NAS controller. One overloaded controller caps the entire cluster regardless of what the underlying hardware can do. Parallel file systems and scale-out storage spread the load across multiple nodes and remove that single point of contention.

Checkpoint storage is another area that often receives attention too late, and I’ll admit we’ve missed this ourselves more than once. When checkpoint traffic shares the same path as training reads, training performance usually suffers. Separating checkpoint storage onto its own tier, even a relatively small one, often resolves the problem without requiring a major architectural redesign.

Not every bottleneck requires new hardware. Some just need better configuration. Data loader optimization can be surprisingly effective. Serial file reads and poorly configured loaders create CPU-side delays that look like storage problems but aren’t. Prefetching and parallel loading can significantly reduce wait times. Small-file sprawl is another issue worth addressing early. Packaging datasets into larger shards reduces metadata overhead before it becomes the limiting factor.
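
Prefetching is simple to illustrate. The sketch below reads ahead on a background thread so the next batch is usually ready when the training loop asks for it. It’s a minimal illustration with names of our own choosing. Frameworks ship more capable versions (PyTorch’s DataLoader, for example, exposes num_workers and prefetch_factor).

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")
_DONE = object()   # sentinel marking the end of the source


class _Failure:
    """Carries an exception from the reader thread to the consumer."""
    def __init__(self, exc: Exception):
        self.exc = exc


def prefetch(source: Iterable[T], depth: int = 4) -> Iterator[T]:
    """Yield items from source while a background thread reads ahead.

    At most depth items are buffered. If the consumer stops early, the
    daemon reader thread stays blocked until the process exits.
    """
    buf = queue.Queue(maxsize=depth)

    def reader() -> None:
        try:
            for item in source:
                buf.put(item)
        except Exception as exc:
            buf.put(_Failure(exc))
        else:
            buf.put(_DONE)

    threading.Thread(target=reader, daemon=True).start()
    while True:
        item = buf.get()
        if item is _DONE:
            return
        if isinstance(item, _Failure):
            raise item.exc
        yield item
```

Wrapping a loader is a one-line change: iterate over prefetch(my_batches) instead of my_batches. A thread helps when loading is I/O-bound. CPU-heavy decoding usually needs worker processes instead.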

Backup traffic deserves special attention as well. Giving backups their own schedule or their own network path is usually more effective than simply adding capacity. More bandwidth doesn’t eliminate contention if competing workloads continue to share the same resources.

Cloud, on-premises, and hybrid AI storage

The right deployment model depends on your workload requirements, data sensitivity, and how much latency your applications can tolerate.

Cloud environments work well for burst training and short-lived experiments. You can provision compute close to managed storage, complete the training run, and release the resources afterward. The issue surfaces at scale. Egress costs for multi-terabyte training datasets can match the compute cost of the training job itself, and staging data near cloud compute before each run is usually more practical than treating storage and compute placement as independent decisions. Painful, but predictable.

On-premises infrastructure remains a strong fit for organizations with sensitive datasets and predictable workloads. For example, a healthcare team keeping regulated imaging data close to local GPU resources avoids both compliance concerns and the cost of repeatedly moving large datasets to the cloud. The same organization may still use cloud GPUs for less sensitive experimentation while keeping primary datasets on-premises.

Edge deployments address situations where round-trip latency to a central location is unacceptable. Storage and compute stay together. Full stop.

Hybrid architectures are where many organizations I’ve talked to land. Cold data resides in object storage, active datasets are staged close to GPU clusters, and cloud resources absorb temporary demand spikes. Managing data movement between tiers without introducing new bottlenecks is the hard part.

HCI and software-defined storage at the edge

HCI and software-defined storage are not the primary answer for hyperscale AI training, but they fit several adjacent use cases well. Edge inference and local data preparation are sweet spots, as are compact clusters where operational simplicity matters.

Consider a factory running local inference on production-line camera feeds. Sending every request to a centralized data center introduces unnecessary latency and creates a dependency on WAN connectivity. A compact hyperconverged cluster keeps compute and storage in the same environment, eliminating the need for a separate storage network. If you’re designing edge AI infrastructure, edge storage deserves consideration as its own architectural category.

StarWind HCI Appliance and StarWind VSAN are designed around this model. They support two-node and small-cluster deployments where compute and storage share the same hardware, high availability remains local, and there’s no dependency on a centralized storage network. StarWind VSAN also provides software-defined fault tolerance without requiring a dedicated witness node. We’ve found this particularly valuable at remote sites where every additional server adds cost and operational overhead, and where shipping a replacement part might take days.

Common AI storage mistakes

Most AI storage problems are predictable. In fact, we see the same mistakes repeatedly, regardless of team size, budget, or the sophistication of the models involved.

Buying GPUs before checking storage throughput is probably the most common. The GPUs arrive, the cluster is deployed, and only then does the team discover that the existing storage system can’t feed them at the required rate. The hardware budget is spent. Not on the actual bottleneck.

Testing with generic benchmarks creates false confidence. Sequential read tests pass, while checkpoint writes and small-file handling still fail. AI-specific workload testing should be part of storage validation before any major hardware investment is made.

Treating object storage as a universal hot tier is a recurring mistake in first-generation AI environments. Object storage scales extremely well for data lakes and archives. It also handles large repositories without issue. But active training workloads typically require lower and more predictable latency than S3-compatible storage can provide. Over long training runs and repeated dataset scans, that gap becomes increasingly visible.

No monitoring of GPU wait time means teams notice slow runs but can’t locate the cause. GPU idle cycles tied to data loading are the most actionable signal of a storage bottleneck, and the metric most commonly missing from AI infrastructure dashboards.

What to check before your next GPU purchase

Storage is rarely the first thing teams investigate when AI jobs run slowly, but it’s frequently where the actual limit sits. Many storage bottlenecks can be resolved without buying additional hardware. Start there.

Before you buy your next GPU, run a simple test. Stage your dataset on local NVMe, watch the utilization graph, and compare it to your shared storage baseline. If the gap is wide, you don’t have a compute problem. You have a plumbing problem. Fix the storage first. The GPUs can wait. They’re already good at that.

FAQ

What is an AI storage bottleneck?

An AI storage bottleneck is any limitation in throughput, latency, IOPS, metadata performance, or network bandwidth that prevents workloads from receiving data fast enough to keep compute resources fully utilized.

Why do GPUs sit idle during AI training?

GPUs typically sit idle when the data pipeline cannot deliver training samples quickly enough. Common causes include slow shared storage, saturated network links, inefficient data loaders, or datasets that have not been staged close to compute resources.

What storage is best for AI workloads?

Hot training data benefits from NVMe or a parallel file system. The data lake and cold datasets suit S3-compatible object storage. Checkpoints need a tier that absorbs burst writes. The right design is tiered, matched to each pipeline stage.

Is object storage good for AI?

Yes, but it depends on the workload. Object storage works well for AI data lakes, backups, archives, and long-term dataset repositories. It is generally less effective as the primary hot training tier unless additional caching or staging layers are used.

Is NVMe required for AI storage?

Not always, but it is the fastest option for hot datasets, checkpoint writes, and model loading. Many teams use NVMe as a local staging tier with colder data in NAS or object storage behind it.

What is the difference between AI storage for training and inference?

Training needs high sustained throughput for dataset reads and burst write capacity for checkpoints. Inference needs low latency for model loading, KV-cache access, and embedding retrieval.

How do I know if storage is slowing down my AI workloads?

Start by monitoring GPU utilization during data loading and correlating it with storage latency, throughput, and network utilization. A simple validation test is to move the dataset to local NVMe storage and rerun the workload. If performance improves significantly, storage or the network path is likely the bottleneck.

Can HCI help with AI storage bottlenecks?

For edge AI, local inference, and smaller training clusters, yes. For large-scale distributed training, dedicated high-throughput storage is usually more appropriate.

What storage metrics matter for AI workloads?

GPU utilization, storage throughput and latency, IOPS, queue depth, network saturation, data loader time, and checkpoint write duration.



from StarWind Blog https://ift.tt/f8qBa7d
via IFTTT

UniconOS Management Cloud: A simpler way to manage enterprise endpoints without infrastructure overhead

Enterprise IT leaders often recognize it, even if they don’t call it out: an endpoint program grows—a new region, a new business unit, another wave of devices—and what started as a focus on policy, lifecycle, and user experience subtly shifts. The conversation moves from endpoints to the platform behind them: how it runs, scales, stays available, and who owns the risk.

At this point, endpoint management can feel less like a strategic capability and more like an infrastructure project. The cost is real—not always on a budget line, but in time, operational drag, and missed opportunities. Rollouts slow, growth plans become sizing exercises, and reliability turns into a governance discussion. All at a time when the business expects IT to move faster, not become more operationally burdened.

The strategic question is no longer whether you can manage endpoints. It is whether managing the management platform itself is where you want to focus your best resources.

This is exactly the problem UniconOS Management Cloud, formerly Scout, is designed to solve, bringing a cloud‑hosted operating model to enterprise endpoint OS management without sacrificing control.

The operating model is the real decision

Some organizations deliberately run platforms internally; full control, tailored governance, and internal ownership are part of their DNA. Others want endpoint management to behave like a service: reliable, scalable, and predictable – without turning every growth step into another infrastructure initiative.

In today’s environment, where IT is measured on speed and resilience, operating model decisions are resource decisions. Where should your best people focus: maintaining platform infrastructure or improving endpoint outcomes, security posture, and user experience?

This is not a technical question. It is a strategic one.

Extending Citrix cloud to endpoint OS management

Citrix continues to simplify how customers consume secure digital workspaces. Extending that philosophy to endpoint OS management is a natural next step.

With Citrix UniconOS, customers get an endpoint OS platform built for Citrix environments.

Citrix UniconOS Management, available in Local and Cloud deployment models, provides the management layer for UniconOS, handling policies, configuration, and visibility.

  • The Local deployment model is customer-managed, giving organizations full control over infrastructure, scaling, and availability.
  • The Cloud deployment model is Citrix-managed and reduces operational overhead while scaling.

What’s new is the ability to choose between these two deployment models: organizations can run UniconOS Management in Local mode or adopt the Cloud deployment model, where Citrix operates the underlying platform services in a managed cloud environment. This is a deliberate choice based on how much operational responsibility teams want to carry.

The strategic cost of endpoint management infrastructure

Endpoint programs are growing faster and becoming more complex: more locations, device types, distributed teams, and higher security expectations. Each expansion introduces operational risk, and every delay slows business outcomes.

Running the underlying platform is increasingly specialized. When IT teams spend significant time maintaining infrastructure instead of managing endpoints, they trade strategic focus for operational overhead. It’s a tradeoff most executive teams are trying to reduce.

Infrastructure that does not directly create business value should not become a scaling tax. UniconOS Management Cloud reduces platform operations overhead, allowing IT to focus on what drives real impact: faster deployments, secure and reliable endpoints, and smoother user experiences – without sacrificing control where it matters most.

Four strategic outcomes of Citrix UniconOS Management Cloud

1. Flexibility
Choose the operating model that aligns with governance, regulatory, and organizational requirements. Customer-managed remains fully supported; Citrix-managed is available when simplicity and cloud operations are the priority.

2. Scale without friction
Endpoint growth should not trigger new platform projects. The management layer scales with device counts without repeated infrastructure redesign.

3. Enterprise by design
Availability, monitoring, backup, and resilience are built in. Reliability is foundational, not an afterthought.

4. Accelerated time-to-value
Move from evaluation to rollout without lengthy infrastructure preparation. Removing platform operations as a bottleneck lets teams focus on delivering endpoint outcomes faster.

Getting started with Citrix UniconOS Management Cloud

Exploring a managed operating model does not require a large-scale transformation. A focused proof of concept allows teams to validate outcomes that matter most—time-to-value, scalability, availability, and reduced operational overhead—before broader rollout decisions.

Local mode remains the right choice for many customers. The Cloud mode exists for organizations that want to reduce operational burden while maintaining clarity, control, and enterprise standards.

Endpoint strategy should never be constrained by platform operations. With UniconOS Management Cloud, Citrix customers gain the freedom to decide where control truly matters – and where simplicity drives value. The choice of operating model is ultimately a choice about focus.

Next Steps

Start with a focused proof of concept to validate time-to-value, scalability, and operational overhead. Contact your Citrix representative to explore your options.



from Citrix Blogs https://ift.tt/LTke24I

The Good, the Bad and the Ugly in Cybersecurity – Week 24

The Good | Authorities Dismantle Crypto Laundering Empire & Seize Espionage Domains

Europol has dismantled a major cryptocurrency laundering network called “AudiA6”, known for actively facilitating illicit transactions for ransomware syndicates and cybercriminals worldwide. Since 2022, the platform allegedly laundered more than $380 million by obscuring the origin of cybercrime proceeds through complex transaction routes for a 3-10% service commission. The joint operation, spanning 11 countries and supported by Eurojust, successfully seized multiple domains and froze a substantial amount of AudiA6’s digital assets.

Following forensic analysis stemming from a prior arrest in Poland, investigators were able to identify and apprehend the platform’s two senior administrators in Georgia. The industrial-scale infrastructure relied on thousands of fraudulent exchange accounts, all registered by recruited money mules using stolen identities. The suspects, who also managed the “Dark2Web” cybercrime forum, now face potential 20-year prison sentences for operating the illicit service.

The FBI has seized 13 fraudulent websites operated by suspected Chinese intelligence agents attempting to recruit U.S. citizens holding sensitive government security clearances. The campaign used AI-generated photographs and stolen identities to construct fake consulting firms that advertised generic analyst and consultant roles across major professional networking platforms including Upwork, Hubstaff Talent, and Wellfound.

When targets applied, operatives then pressured the candidates to disclose confidential or non-public information in exchange for lucrative compensation. To obscure their identities and the origin of funds, the recruiters used cryptocurrency and online payment systems.

Federal authorities have now successfully identified and dismantled the network after several targeted individuals reported the suspicious payment methods to investigators. Officials continually urge current and former government personnel to exercise extreme caution regarding unsolicited recruitment offers promising easy income for vague consulting work.

The Bad | JDY Botnet Expands Scope to Target U.S. Military Networks for Cyber Reconnaissance

A malware network previously associated with PRC-based threat groups like Volt Typhoon is expanding its cyber reconnaissance operations and target scope. Known as “JDY botnet”, the network has grown rapidly from approximately 650 active bots in early 2024 to over 1,500 compromised small office/home office (SOHO) and Internet of Things (IoT) devices today. While operators maintain a global footprint, they are now heavily concentrating efforts within the United States, specifically focusing on the military and its associated networks.

Unlike traditional distributed denial-of-service (DDoS) botnets, JDY functions primarily as a distributed scanning and fingerprinting network. Operators weaponize the network to quickly locate vulnerable infrastructure immediately following public vulnerability disclosures.

The malware then registers with a central dispatch service hosted on hidden Tor networks to receive scanning assignments. Once deployed on compromised edge devices, including hardware from Cisco, Ubiquiti, and Hikvision, the botnet executes comprehensive service discovery, service banner grabbing, TLS certificate collection, and protocol fingerprinting. When it has enough administrative privileges, JDY performs exceptionally fast and stealthy SYN scanning using custom-crafted TCP packets to batch-process thousands of potential targets.

A snippet of the JDY malware dropper that downloads and executes the malware (Source: Black Lotus Labs)

Federal agencies previously warned about the risks to unprotected routing infrastructure. To prevent hardware from being recruited into these vast reconnaissance networks, administrators must consistently ensure all edge devices run the latest security patches. Organizations can proactively reduce their external attack surfaces by disabling unnecessary internet-exposed management interfaces, fully replacing default administrative credentials, and thoroughly monitoring for any unusual outbound scanning activity originating from local networks.
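
Monitoring for outbound scanning can start with something as simple as counting how many distinct targets each internal host contacts. The sketch below is a generic fan-out heuristic, not a JDY-specific detection. The flow record shape (source IP, destination IP, destination port) and the threshold are assumptions that need tuning against your own traffic.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Flow = Tuple[str, str, int]   # (src_ip, dst_ip, dst_port), an assumed format


def find_scanners(flows: Iterable[Flow], max_targets: int = 100) -> Dict[str, int]:
    """Return sources that contacted more than max_targets distinct
    (dst_ip, dst_port) pairs, mapped to that count."""
    targets: Dict[str, Set[Tuple[str, int]]] = defaultdict(set)
    for src, dst, port in flows:
        targets[src].add((dst, port))
    return {src: len(seen) for src, seen in targets.items()
            if len(seen) > max_targets}
```

Run it over a fixed time window of flow logs. A camera or router that suddenly talks to hundreds of unrelated addresses is worth a closer look, whatever the cause.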

The Ugly | Miasma Supply Chain Worm Continues Propagation Across Microsoft & PyPI Repositories

The ongoing Miasma self-replicating supply chain worm recently compromised 73 Microsoft GitHub repositories, including projects related to Azure, prompting GitHub to rapidly disable access. In an evolution of the “Mini Shai-Hulud” malware, threat actors are now directly pushing malicious configuration files into legitimate source repositories.

The hidden payloads automatically trigger code execution whenever developers open the compromised projects using popular AI coding assistants or integrated development environments (IDEs). The latest intrusions most notably involve the re-compromise of the “durabletask” PyPI package, indicating attackers retained previously stolen developer credentials to seamlessly propagate the worm through automated contributor workflows.

Miasma continues to infect more packages on GitHub (Source: TheHackerNews)

Since the series of Microsoft repo breaches, the campaign has evolved into a fresh attack wave dubbed “Hades”, actively targeting the PyPI registry. Attackers poisoned 19 PyPI packages with malicious wheel artifacts containing hidden .pth setup files. This mechanism executes silently during Python interpreter startup, entirely eliminating the need for victims to explicitly import the compromised packages.
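
The .pth mechanism is easy to audit. Python’s site module executes any line in a site-packages .pth file that begins with "import" followed by a space or tab, and the defensive sketch below lists those lines for review. It applies no Hades-specific indicators, and the function names are our own.

```python
import sysconfig
from pathlib import Path
from typing import List, Tuple


def executable_pth_lines(directory: str) -> List[Tuple[str, int, str]]:
    """Return (file name, line number, line) for every .pth line in
    directory that the site module would execute at interpreter startup."""
    findings = []
    for pth in sorted(Path(directory).glob("*.pth")):
        try:
            text = pth.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue   # unreadable file: skip rather than abort the audit
        for number, line in enumerate(text.splitlines(), start=1):
            if line.startswith(("import ", "import\t")):
                findings.append((pth.name, number, line.strip()))
    return findings


def audit_current_environment() -> List[Tuple[str, int, str]]:
    """Audit the site-packages directories of the running interpreter."""
    paths = sysconfig.get_paths()
    results = []
    for directory in sorted({paths["purelib"], paths["platlib"]}):
        results.extend(executable_pth_lines(directory))
    return results
```

Expect some legitimate hits. Setuptools, for instance, installs a .pth file with an executable import line, so treat the output as a review list and not a verdict.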

The payload then downloads the standalone Bun JavaScript runtime to evade traditional network proxies, subsequently deploying a heavily obfuscated credential stealer. The malware aggressively harvests cloud access tokens, SSH keys, shell histories, and Docker configurations while introducing new, tailored memory scrapers specifically targeting macOS and Windows environments.

Advanced in its defense evasion, the Hades variant incorporates novel plain-text prompt injections deliberately designed to deceive LLM-based package analysis tools into incorrectly classifying the malicious packages as safe.

Ultimately, these cascading supply chain attacks successfully exploit fundamental trust models within open-source ecosystems, leveraging compromised, authenticated maintainer accounts to embed persistence mechanisms directly into standard developer environments.



from SentinelOne https://ift.tt/0awiN9H

Agentjacking Attack Tricks AI Coding Agents Into Running Malicious Code

Cybersecurity researchers have described what they say is a new class of attack that can trick artificial intelligence (AI) coding agents into running arbitrary code on developer machines.

Called Agentjacking by Tenet Security, the attack can be triggered by means of a fake error report crafted using Sentry, an open-source error-tracking and performance-monitoring platform.

"The attack exploits a critical architectural flaw at the intersection of Sentry's event ingestion (which accepts arbitrary payloads from anyone with the DSN) and the Sentry MCP server (which returns this data to AI agents as trusted system output)," security researchers Ron Bobrov, Barak Sternberg, and Nevo Poran said.

The idea is to inject crafted input into Sentry error events, which are then interpreted by coding agents like Claude Code and Cursor as legitimate diagnostic resolution steps and run attacker-controlled code.

A successful attack of this kind can expose sensitive data, including environment variables, Git credentials, private repository URLs, and developer identities, without having to rely on methods like phishing or prior server compromise.

The problem is rooted in the implicit trust associated with connecting to external services using Model Context Protocol (MCP). Because an AI agent is unable to distinguish between an error event generated by a real application crash and one injected by an attacker, it creates a pathway to arbitrary code execution when the agent processes the response.

The attack chain devised by Tenet is as follows -

  • An attacker finds a target's Sentry Data Source Name (DSN), a public, write-only credential that's embedded in websites.
  • The attacker sends a malicious error event to Sentry's ingest endpoint via a POST request using the DSN.
  • The injected event contains "carefully formatted markdown" in the message field and context key names. When the Sentry MCP server returns this event to an AI agent, it is rendered as structured content visually identical to Sentry's system template.
  • When a developer asks their AI coding agent to "fix unresolved Sentry issues" (or a similar prompt), the agent queries Sentry via MCP and receives the malicious event.
  • The agent executes malicious code, which runs with the developer's full privileges.

"The attacker never touches the victim's infrastructure," the researchers explained. "The malicious instruction arrives disguised as a legitimate 'Resolution' inside an ordinary error. When a developer asks their AI agent to fix the Sentry issue, the agent reads the attacker's command as trusted guidance and runs it - with the developer's own privileges, on the developer's own machine."

Agentjacking stands out because it targets the AI agent a developer trusts and uses a Sentry DSN as a starting point. In addition, the markdown injection is rendered such that the agent cannot distinguish it from legitimate Sentry guidance.
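To make the injection step concrete, here is a minimal Python sketch of the kind of content check a team might run on error events before handing them to an agent. It is an illustrative heuristic only, not a method described by Tenet or Sentry: the pattern list and the names `iter_strings` and `flag_untrusted_event` are hypothetical, the event is assumed to be a plain nested dictionary, and simple patterns like these are easy to evade.

```python
import re

# Hypothetical patterns: markdown structure or command-like text that rarely
# belongs in an ordinary error message or context key name.
SUSPICIOUS_PATTERNS = [
    re.compile(r"^\s{0,3}#{1,6}\s", re.MULTILINE),  # markdown heading
    re.compile(r"```"),                              # fenced code block
    re.compile(r"\b(curl|wget|bash|sh|powershell)\b", re.IGNORECASE),
]


def iter_strings(value):
    """Yield every string in a nested dict/list, including dict keys."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from iter_strings(key)
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)


def flag_untrusted_event(event):
    """Return the strings in an event that match any suspicious pattern."""
    return [
        text
        for text in iter_strings(event)
        if any(pattern.search(text) for pattern in SUSPICIOUS_PATTERNS)
    ]
```

A non-empty result would be a reason to route the event to a human instead of an agent. It is not proof of an attack, and an empty result is not proof of safety.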

The AI cybersecurity company said it found at least 2,388 organizations exposed with valid injectable DSNs, and that it tested the attack in a controlled manner against over 100 organizations, achieving an 85% exploitation success rate against injected errors across some of the most widely used AI coding assistants.

Sentry, for its part, has acknowledged the issue, but opted not to fix it, stating it's "technically not defensible." However, the company is said to have activated a global content filter that blocks a "specific payload string."

"As enterprises race to deploy AI coding agents, this research proves the agents themselves are now the attack surface - turned against the developers who trust them, using nothing but data those organizations publish about themselves," Tenet said. "The attack bypasses EDR, WAF, IAM, VPN, Cloudflare, and firewalls - because there is nothing malicious to detect. Every action in the chain is authorized."



from The Hacker News https://ift.tt/VE0b4Wf
via IFTTT

Rethinking MDR as Attackers and Defenders Embrace AI

For most of the past decade, managed detection and response was the answer to a real problem. Security teams couldn't staff around the clock, couldn't hire enough analysts, and needed someone else to handle the alert queue. MDR stepped in. It worked well enough. Until now.

The threat landscape has changed faster than the MDR model can adapt. Attackers are using AI to move faster, generate more convincing phishing at scale, automate reconnaissance, and create malware variants that evade signature-based detection. The attack surface has expanded from endpoint to cloud, identity, and network simultaneously. And yet MDR is still doing what it always did. Routing alerts to human analysts who triage what they can, in the order they can get to it.

That is no longer enough. The data we share below proves it and security leaders might consider exploring whether they have outgrown their MDR.

MDR's 24/7 promise doesn't cover 60% of your alerts

MDR promised 24/7 human coverage. What it delivered was a 24/7 human capacity to triage high-severity alerts. Those are not the same thing.

Across the industry, approximately 60% of alerts go unreviewed. That's not a performance failure. Human teams, whether in-house or outsourced to an MDR, cannot process the volume of alerts that modern environments generate. So they do what any rational person does. They prioritize. P1s and P2s get worked. P3s and P4s pile up.

But this is exactly where attackers hide.

Analysis of 25 million alerts across global enterprises in 2025 found that nearly 1% of real threats originate in low-severity and informational alerts. In an enterprise generating 450,000 alerts annually, that translates to roughly 54 real incidents per year, about one per week, sitting in the deprioritized queue where no one is looking.

The breaches hiding in that backlog are not theoretical. They are happening right now, in organizations that believe they have coverage.

Note: The math behind the above statement assumes 450K annual alerts, of which 60% are not investigated and of those, 2% are real incidents. Of those real incidents, 1% originate in low-severity alerts.
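The arithmetic in the note can be reproduced in a few lines of Python. This is only a restatement of the percentages given above. The function name `missed_incidents` and its parameter names are illustrative, and integer percentages are used to avoid floating-point rounding.

```python
def missed_incidents(annual_alerts, unreviewed_pct, real_pct, low_severity_pct):
    """Apply the three percentages from the note in sequence."""
    unreviewed = annual_alerts * unreviewed_pct // 100   # alerts nobody looks at
    real = unreviewed * real_pct // 100                  # of those, real incidents
    return real * low_severity_pct // 100                # of those, low severity


# Figures stated in the article: 450K alerts, 60% unreviewed, 2% real, 1% low severity.
per_year = missed_incidents(450_000, 60, 2, 1)
per_week = per_year / 52

print(per_year)            # 54
print(round(per_week, 2))  # 1.04
```

The intermediate values are 270,000 unreviewed alerts and 5,400 real incidents, which is how 54 per year works out to about one per week.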

Investigation quality varies by who is on shift

Even for alerts that do get reviewed, MDR investigation quality is not consistent. It is bounded by the experience of the analyst on duty, the queue depth at that moment, the time of day, and whether the team is fully staffed. A P1 at 3 am gets a different investigation than the same alert at 10 am.

This is not a criticism of MDR analysts. It is a description of what happens when any human-executed process runs at high volume, under pressure, around the clock. Variance is unavoidable.

The consequences are real. When an investigation is shallow, threats get classified as noise. When follow-through is inconsistent, early-stage lateral movement looks like routine behavior. The attacker who got in on a low-severity alert keeps moving undetected because no one had the time or context to connect the signals.

Detection engineering is not a closed loop

In most MDR deployments, detection engineering is a periodic exercise. Rules get tuned when customers complain about alert volume. New coverage gets added when a major CVE makes news. Otherwise, the detection posture drifts.

The core problem is architectural. MDR investigation and detection engineering operate in separate silos. When an analyst investigates an alert and closes it as a false positive, that insight rarely feeds back into the detection system. Broken rules stay broken. Noisy rules keep generating noise. New attacker techniques arrive without matching detections.

The result is a detection posture that degrades faster than it improves. Real coverage, measured against the MITRE ATT&CK framework, can be far lower than teams assume.

You can't audit what you can't see

Most MDR services are a black box. Customers receive escalations and summaries. They do not get to see the investigation logic, inspect the evidence trail, verify the verdict, or audit what the analyst actually reviewed before closing a case.

In an era where accountability and transparency are security requirements, this is a genuine liability. When an incident is missed, you cannot diagnose why. When a verdict is wrong, you cannot trace the reasoning. When regulators ask what was investigated and how, there is no answer.

The AI savings are going to the vendor, not to you

AI is reducing the operational cost of MDR. Providers are using it to automate portions of triage, reduce analyst hours, and increase margins. Those efficiency gains do not flow through to customers as lower prices or expanded coverage. The buyer still pays the same rate, or more. The provider keeps the savings.

But the coverage gap stays the same. The human scaling constraint stays the same. Only the provider's cost structure has improved.

You don't own what was built in your name

Detection rules, triage logic, case history, and investigation learnings accumulate inside the MDR vendor's platform over the life of the contract. When the contract ends, that knowledge does not move with you. The years of tuning, the accumulated context about your environment, and the detection improvements built from your data all stay with the vendor.

This creates two problems. First, organizations that switch providers start from scratch, rebuilding institutional knowledge that took years to develop. Second, organizations that want to bring security operations in-house, a trend that is accelerating as AI SOC tools mature, find themselves starting with no foundation.

MDR providers, for obvious reasons, are not incentivized to help customers build internal capability. Their model depends on retaining the work.

Your MDR contract may block you from using Claude for your SOC

The above-mentioned knowledge lock-in is no longer just a switching-cost problem. It's also an AI readiness problem. When you try to deploy an AI agent for SOC work, it needs a knowledge foundation to reason over. Detection rules, case history, behavioral baselines, and forensic verdicts. If those live in your MDR vendor's platform, your agent is starting from near zero.

Additional MDR gaps worth noting

Aside from the above, MDR has a set of smaller gaps that compound over time. Every customer gets the same generic playbook regardless of their specific risk profile, compliance obligations, or data sensitivity. Integration tools like SOAR, which were supposed to streamline MDR findings into internal workflows, largely failed to deliver on that promise because human-driven investigation doesn't produce the structured, consistent outputs that automation requires. And when a real incident surfaces and a customer needs to talk to someone who understands their environment, they often reach an AI chatbot or a ticketing queue instead of a person.

What the AI-powered attacker era actually requires

The attackers of 2026 are not waiting for alert queues to clear. AI-generated phishing campaigns hit inboxes at a volume and quality that bypass conventional gateways. Credential stealers like Agent Tesla and LummaC2 move fast. EDR tools are being actively evaded, with research showing that more than half of confirmed compromised endpoints had already been marked as "mitigated" by the EDR vendor. The attacker has already won a round that the defender didn't know was being played.

Meeting this moment requires a different operating model. One where investigation speed is measured in seconds, not hours. Where every alert gets examined, regardless of severity or time of day. Where the output is an evidence-backed verdict, not an analyst's judgment call under pressure.

This is what an AI SOC is designed to deliver.

An operating model shift where AI executes and humans supervise

The core idea behind an AI SOC is simple. Move investigative execution out of the human queue and into AI, so that humans can focus on decisions rather than discovery.

In practice, this means 100% of alerts, including endpoint, identity, cloud, network, phishing, and SIEM, are triaged and investigated automatically. Not sampled. Not filtered by severity. All of them. The AI applies the same forensic depth to a P4 alert at 3 am that a senior analyst would apply to a P1 in the afternoon.

Intezer's platform data across 25 million alerts shows this is achievable. Less than 2% of alerts required human escalation. The over 98% that resolved autonomously did so with sub-minute median triage time and 98% verdict accuracy. For a large enterprise with 450K annual alerts, that means roughly 441K alerts per year are fully investigated and resolved without human intervention, and 54 genuine threats that would have been missed under traditional MDR coverage are now caught with actionable remediation recommendations.
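As a rough check on those figures, the sketch below splits 450K annual alerts at the 98% mark. It treats 98% and 2% as exact values, although the article states them as "over" and "less than", so the escalation count is an upper bound. The variable names are illustrative.

```python
ANNUAL_ALERTS = 450_000
AUTONOMOUS_PCT = 98  # the article says "over 98%", so treat this as a floor

resolved = ANNUAL_ALERTS * AUTONOMOUS_PCT // 100  # handled without a human
escalated = ANNUAL_ALERTS - resolved              # upper bound on escalations
escalated_per_day = escalated / 365

print(resolved)                     # 441000
print(escalated)                    # 9000
print(round(escalated_per_day, 1))  # 24.7
```

Under these assumptions, human analysts would see at most about 25 escalations a day instead of the full queue.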

Forensic depth is what makes AI autonomy trustworthy

AI can summarize an alert. That's useful. AI can enrich with threat intelligence. Also useful. But neither of those activities is investigation. They are pre-processing.

Genuine AI-driven investigation requires forensic-level interrogation. When an alert fires, the question is not "does this look suspicious?" It is: what actually executed, where did it originate, what did it do, and is there evidence of compromise in memory that the alert itself didn't surface?

This matters because the most dangerous threats are specifically designed to evade surface-level detection. Fileless malware lives entirely in memory and writes nothing to disk. Code injection hides inside legitimate processes. Early-stage credential theft looks like normal authentication. Without memory forensics, binary analysis, and code reuse detection, an AI investigation is only as deep as the alert data it was handed.

Forensic depth is also what creates the trust threshold, the point at which AI verdicts are accurate and evidence-backed enough to act on without human validation. Below that threshold, AI assists analysts. Above it, AI can safely take on the full investigative workload and escalate only when evidence warrants it.

Closed-loop detection engineering changes everything

One of the most significant structural advantages of a true AI SOC is the closed loop between investigation and detection. Every alert investigation surfaces information about detection quality. Which rules are firing accurately, which are generating noise, and which attacker techniques have no coverage at all?

When this feedback flows continuously into detection engineering, the posture improves without waiting for an annual audit or a customer complaint. Noisy rules get tuned. Broken telemetry gets flagged. New coverage for emerging techniques gets deployed in days, not months. The detection system gets smarter alongside the investigation system.
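One way to picture that feedback loop is a small job that tallies investigation verdicts per detection rule and flags the noisy ones for tuning. The Python sketch below is an assumption-laden illustration, not any vendor's implementation: the `(rule_id, is_false_positive)` input shape, the function name `noisy_rules`, and both thresholds are hypothetical.

```python
from collections import Counter


def noisy_rules(verdicts, min_alerts=20, max_fp_rate=0.9):
    """Return rule IDs whose false-positive rate meets or exceeds the threshold.

    verdicts: iterable of (rule_id, is_false_positive) pairs, one per
    investigated alert. Rules with fewer than min_alerts are skipped
    because the sample is too small to judge.
    """
    total = Counter()
    false_positives = Counter()
    for rule_id, is_false_positive in verdicts:
        total[rule_id] += 1
        if is_false_positive:
            false_positives[rule_id] += 1
    return sorted(
        rule_id
        for rule_id, count in total.items()
        if count >= min_alerts and false_positives[rule_id] / count >= max_fp_rate
    )
```

Running this on every batch of closed investigations, rather than once a year, is what turns the verdicts into a continuous input for detection engineering.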

This is how MITRE ATT&CK coverage moves from a static baseline to a dynamic, improving map of what an organization can actually detect. It is the difference between coverage that reflects what was set up two years ago and coverage that reflects what attackers are doing today.

Pricing that aligns with full coverage

The economics of an AI SOC should match the coverage it provides. Per-alert pricing, still common among AI copilot tools that rely heavily on LLMs, forces customers to be selective about which alerts to send. The result is the same cherry-picking problem that MDR created: high-severity alerts get the attention, while low-severity alerts accumulate in a deprioritized queue.

Per-endpoint pricing changes this entirely. The cost is fixed to the number of monitored endpoints, not to alert volume. There is no economic penalty for investigating every alert. Full coverage becomes the default, not a premium option.

This also matters for budget predictability. Alert volumes spike unpredictably during active incidents or when new detections deploy. Endpoint counts are stable. For finance teams trying to plan security spend, the difference is significant.

What ownership looks like under an AI SOC

Detection rules, investigation history, and organizational context should belong to the organization, not to the vendor. This means every detection deployed to a customer's SIEM is the customer's rule. Investigation evidence is available for audit at any time. If the organization decides to expand internal capability, build its own AI agents, or switch tools, it takes everything with it.

This is not just a contract term. It is a prerequisite for security maturity and for broader adoption of AI tools like Claude for your security team. Organizations that want to eventually supervise AI systems rather than outsource to vendors need a knowledge foundation to build on. That foundation cannot exist if it lives inside a vendor's platform.

The transition from MDR to AI SOC

Moving from MDR to an AI SOC is not necessarily a rip-and-replace decision for most organizations. The practical path might be augmentation first. Bring in AI investigation alongside the existing MDR contract, observe what the AI surfaces that the MDR was missing, and let the comparison build the case for a clean transition at renewal.

By the time the MDR contract is up for renewal, the organization typically has months of evidence showing what full alert coverage looks like, what the escalation rate was under AI triage, and what it would cost to maintain the old model versus the new one. The decision is no longer theoretical.

The question security leaders need to answer

The MDR model was designed for a world where attackers operated at human speed, and the primary challenge was staffing coverage. That world is gone. Attackers are running AI-assisted campaigns, moving through environments faster than human triage queues can respond, and specifically targeting the low-severity signal space where MDR leaves blind spots.

The question for every CISO and security leader evaluating their current operations is straightforward. Of the 60% of alerts your team isn't reviewing, how confident are you that none of them contain a real threat?

The answer, informed by Intezer's analysis of 25 million real alerts, is that roughly 54 of them do. Every year. One per week. In the pile that no one is looking at.

The AI SOC doesn't promise to eliminate all threats. No platform does. But it closes the coverage gap that the MDR model structurally cannot. Every alert, every severity, every hour of the day, is investigated with forensic depth, in under a minute. That is what security operations in the AI era look like.

Found this article interesting? See the 2026 MDR renewal checklist by Intezer.

This article is a contributed piece from one of our valued partners.



from The Hacker News https://ift.tt/j04YsB3
via IFTTT

Thursday, June 11, 2026

A tale of two eras


Welcome to this week’s edition of the Threat Source newsletter. 

To the surprise of absolutely no one who has seen my face, I’m one of the younger employees at Talos. As my industry veteran colleagues were buying the first iPods, navigating the switch from dial-up to broadband, saying goodbye to floppy disks, and making Myspace accounts, I was playing with my Password Journal and Friend Chips. It’s a funny contrast, but I still experienced the beginning of the “always-on” era. 

Ah, those were the days. One of my most vivid tech memories is begging my dad to play games on his Handspring Visor — a classic personal digital assistant (PDA) launched in late 1999 by Handspring, a company formed by the original creators of the PalmPilot. Handspring stopped producing the Visor line in 2002 and it eventually became obsolete, mostly because its desktop sync feature couldn't keep up with modern OS updates. Despite the tech debt, I spent hours playing Asteroid, Centipede, and Hardball (aka Breakout) on that thing. My dad, meanwhile, mostly used the Memo function to store his passwords... which he still does today. (Yeah, I’m still working on getting him to see the wonders of 1Password.) 


You might be wondering what made me reminisce on childhood toys. A few weeks back, my fiancée and I drove a few hours to visit my family. Even if we get in at 9:00 p.m., it’s tradition for us to stay up late eating pizza and talking about random stuff. 

We got on the topic of phones because my parents still have a landline, and I mentioned that walkie talkies were my first introduction to having my own personal device. My dad dug some old ones out, set them on the table, and put them on scan while we chatted.  

At some point, the conversation petered out just when the walkie talkie captured a channel. Radio static, and then a kid’s voice broke our silence: “Your butt crack is out.” 

My dad got an impish grin and brought the talkie up to his mouth. My mom pleaded, “No. Honey, no. Don’t.” The rest of us were already wheezing and crying. 

He pressed the talk button and, in his best crotchety old man voice, bellowed, “Hey, you kids. Get off my lawn!” 

Imagine being those poor kids. It’s a funny story, but if you don’t want people like my dad intercepting your comms, maybe stick to encrypted channels. 

The one big thing 

Talos' Yuri Kramarz published a blog highlighting how AI-driven vulnerability discovery has completely outpaced human patching capabilities. With frontier AI models autonomously discovering and exploiting zero-days in minutes, the traditional vulnerability lifecycle has completely collapsed. To survive this hyper-accelerated threat environment, organizations must abandon patch-reliant strategies and embrace a three-stage fallback model built on foundational security principles. 

Why do I care? 

Speed is the new, terrifying multiplier in the traditional risk equation. When an AI can uncover a decades-old zero-day and write an exploit for it in minutes, relying solely on vulnerability management is a losing game. Defenders must accept that some exploitation will inevitably slip through the cracks. The true measure of security is no longer just prevention, but how well your environment can absorb, detect, and survive the initial blow. 

So now what? 

Stop treating security basics like optional compliance checkboxes. Enforce multi-factor authentication (MFA) everywhere, harden devices using CIS benchmarks, and implement strict network segmentation to limit an attacker's blast radius. Since hardened systems only slow attackers down, deploy behavioral-based EDR, NDR, and XDR to catch the post-exploitation activity that signatures miss. Finally, validate these controls through penetration testing and purple team exercises so your incident response playbooks become muscle memory, not just wishful thinking. Read the full blog for more. 

Top security headlines of the week 

CISA gives U.S. federal agencies three days to fix a VPN bug under attack by Qilin 
Check Point Software said the bug affects several of its remote access tools, firewalls, and VPNs, which act as digital gatekeepers to protect company networks from unauthorized access. (TechCrunch)

Anthropic launches Claude Fable 5: Mythos-class AI with cybersecurity guardrails  
The AI giant says this marks the first time a model of this capability class has been deemed safe enough for widespread public and developer access. (SecurityWeek)

Microsoft fixes two high-severity zero-days disclosed by researcher 
The vulnerability is a local privilege escalation, meaning it can be chained to a separate vulnerability to give users or processes with low-level privileges the ability to defeat OS protections and gain full SYSTEM rights needed to install malware. (Ars Technica)

WhatsApp catches spyware firm NSO defying no-hacking court order 
According to WhatsApp, the spyware maker has violated the permanent injunction. The messaging app reported on Monday that it had recently learned of a social engineering attack that attempted to trick users into clicking on malicious links. (SecurityWeek)

High-severity vulnerability in Linux caused by a single faulty character 
The presence of a single mis-issued exclamation point in code implementing nf_tables introduced a use-after-free, a class of vulnerability that corrupts memory by placing malicious code at memory addresses that haven’t been properly freed of their previous contents. (Ars Technica)

Can’t get enough Talos? 

Hypotheses, telemetry, and human judgment: Inside Cisco Talos Threat Hunting 
Learn how Cisco Talos Threat Hunting uses hypothesis-driven methods and multi-domain telemetry correlation to find stealthy threats operating below automated detection thresholds. 

Winning the cyber marathon with Tony Giandomenico 
In the high-speed world of cybersecurity, the difference between a breach and a breakthrough often comes down to endurance. Tony Giandomenico, Senior Director of Product Management with Cisco Talos, joins me to discuss Talos Threat Hunting, the challenges of leading major product launches, and the grueling discipline of Ironman triathlons. 

When synthetic logs don’t lie: Generating coherent attack stories for better detection 
Are your detection rules failing because your test data lacks the nuance of a real-world network?  In this episode of Talos Takes, Amy sits down with David Bianco to discuss why traditional synthetic data often falls short and how his new open-source project, EvidenceForge, is changing the game. 

Most prevalent malware files from Talos telemetry over the past week 

SHA256: 9f1f11a708d393e0a4109ae189bc64f1f3e312653dcf317a2bd406f18ffcc507  
MD5: 2915b3f8b703eb744fc54c81f4a9c67f  
Talos Rep: https://talosintelligence.com/talos_file_reputation?s=9f1f11a708d393e0a4109ae189bc64f1f3e312653dcf317a2bd406f18ffcc507  
Example Filename: VID001.exe  
Detection Name: Win.Worm.Coinminer::1201

SHA256: 96fa6a7714670823c83099ea01d24d6d3ae8fef027f01a4ddac14f123b1c9974  
MD5: aac3165ece2959f39ff98334618d10d9  
Talos Rep: https://talosintelligence.com/talos_file_reputation?s=96fa6a7714670823c83099ea01d24d6d3ae8fef027f01a4ddac14f123b1c9974 
Example Filename: d4aa3e7010220ad1b458fac17039c274_63_Exe.exe  
Detection Name: W32.Injector:Gen.21ie.1201 

SHA256: a31f222fc283227f5e7988d1ad9c0aecd66d58bb7b4d8518ae23e110308dbf91 
MD5: 7bdbd180c081fa63ca94f9c22c457376 
Talos Rep: https://talosintelligence.com/talos_file_reputation?s=a31f222fc283227f5e7988d1ad9c0aecd66d58bb7b4d8518ae23e110308dbf91 
Example Filename: d4aa3e7010220ad1b458fac17039c274_62_Exe.exe 
Detection Name: Win.Dropper.Miner::95.sbx.tg

SHA256: 9896a6fcb9bb5ac1ec5297b4a65be3f647589adf7c37b45f3f7466decd6a4a7f 
MD5: 38de5b216c33833af710e88f7f64fc98 
Talos Rep: https://talosintelligence.com/talos_file_reputation?s=9896a6fcb9bb5ac1ec5297b4a65be3f647589adf7c37b45f3f7466decd6a4a7f 
Example Filename: sample.exe  
Detection Name: Win.Tool.Procpatcher::1201 
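If you want to check a local file against the four SHA256 values listed above, a minimal Python sketch using only the standard library might look like this. The hashes and detection names are copied from the list. The function names `sha256_of` and `lookup_detection` are illustrative, and a real workflow would pull indicators from a maintained feed instead of a hard-coded dictionary.

```python
import hashlib

# SHA256 -> detection name, copied from the Talos list above.
KNOWN_BAD_SHA256 = {
    "9f1f11a708d393e0a4109ae189bc64f1f3e312653dcf317a2bd406f18ffcc507":
        "Win.Worm.Coinminer::1201",
    "96fa6a7714670823c83099ea01d24d6d3ae8fef027f01a4ddac14f123b1c9974":
        "W32.Injector:Gen.21ie.1201",
    "a31f222fc283227f5e7988d1ad9c0aecd66d58bb7b4d8518ae23e110308dbf91":
        "Win.Dropper.Miner::95.sbx.tg",
    "9896a6fcb9bb5ac1ec5297b4a65be3f647589adf7c37b45f3f7466decd6a4a7f":
        "Win.Tool.Procpatcher::1201",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def lookup_detection(path):
    """Return the detection name if the file's SHA256 is in the list, else None."""
    return KNOWN_BAD_SHA256.get(sha256_of(path))
```

A `None` result only means the file is not one of these four samples. It says nothing about whether the file is safe.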



from Cisco Talos Blog https://ift.tt/nLwlQ32
via IFTTT

Terraform MCP server is now generally available

Terraform MCP server enables AI assistants like GitHub Copilot, IBM Bob, and Claude Code to interact with Terraform through the Model Context Protocol (MCP). By connecting AI to your infrastructure workflows, teams reduce manual effort, eliminate context switching between tools, and accelerate delivery without compromising security.

Today, we're announcing the general availability of Terraform MCP server, now available for both HCP Terraform and Terraform Enterprise. This represents a milestone shaped by customer and community feedback since we first announced Terraform MCP server last year. In this post, we'll explore how Terraform MCP server improves infrastructure team productivity through AI-assisted workflows, maintains security by design, and provides flexible deployment options for teams of any size.

Accelerate infrastructure workflows with AI

Teams previously spent significant time on repetitive tasks: searching documentation, interpreting plan files, and auditing configurations. Terraform MCP server shifts this burden to AI assistants, allowing engineers to focus on strategic work rather than routine operations.

Generate code using your organization's standards

Before, engineers manually searched private registries for approved modules, copied examples, and verified compliance with organizational policies. This process was time-consuming and error-prone, often resulting in inconsistent infrastructure patterns across teams.

Now, AI assistants can connect directly to your HCP Terraform or Terraform Enterprise private registry. They discover approved modules, understand your organization's patterns, and generate compliant code automatically. This eliminates the need to manually search modules and ensures consistent infrastructure across your organization, reducing both development time and compliance risk.

Access Terraform workspace data and configurations

Managing infrastructure across multiple workspaces requires constant context switching between tools and interfaces. Traditionally, engineers navigate through web UIs or CLI commands to gather information about workspace configurations and variables, a fragmented workflow that slows down troubleshooting and decision-making.

Terraform MCP server provides AI assistants with direct access to workspace data and configurations. Users can ask questions like "Which workspaces haven't been updated in 90 days?" or "Show me workspaces managing more than 1,000 resources," and receive immediate answers. This unified access eliminates context switching, enabling teams to gain faster insights and make informed decisions without leaving their development environment.
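To show what those two example questions amount to once the workspace data is in hand, here is a small Python sketch. It does not call Terraform MCP server or any Terraform API: the workspace records are hypothetical dictionaries, and the field names `name`, `updated_at`, and `resource_count`, along with both function names, are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_workspaces(workspaces, now, max_age_days=90):
    """Names of workspaces whose last update is older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [w["name"] for w in workspaces if w["updated_at"] < cutoff]


def large_workspaces(workspaces, min_resources=1000):
    """Names of workspaces managing more than min_resources resources."""
    return [w["name"] for w in workspaces if w["resource_count"] > min_resources]


# Hypothetical sample data, not real workspace output.
now = datetime(2026, 6, 11, tzinfo=timezone.utc)
workspaces = [
    {"name": "network-prod", "updated_at": now - timedelta(days=120), "resource_count": 1500},
    {"name": "app-staging", "updated_at": now - timedelta(days=3), "resource_count": 240},
]

print(stale_workspaces(workspaces, now))  # ['network-prod']
print(large_workspaces(workspaces))       # ['network-prod']
```

The point of the MCP server is that an assistant can answer this kind of question from a natural-language prompt, so nobody has to write and maintain scripts like this by hand.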

Understand plan changes with context

Terraform plan output can be difficult to interpret, especially for complex infrastructure changes. Engineers have traditionally spent time manually parsing plan files, tracing resource dependencies, and assessing the impact of modifications before approval.

Terraform MCP server now enables AI assistants to analyze plan details and explain changes in natural language. This reduces the risk of misinterpreting plans and speeds up code review cycles, helping teams move faster while maintaining confidence in their infrastructure changes.

Security by design

For infrastructure teams, security is non-negotiable. Terraform MCP server acts as a controlled interface that enforces your existing Terraform authentication and authorization. AI assistants receive only the specific information needed to answer questions, not credentials or sensitive data, which reduces the risk of exposure while maintaining the security boundaries you've already established. The server includes CORS policies, rate limiting, and OpenTelemetry integration for monitoring and security auditing.

Flexible deployment options

Terraform MCP server supports deployment modes that fit how your team works. For individual developers, local execution provides the fastest setup and keeps all data on your machine, ideal for personal development and testing. For teams requiring centralized management, the server can be deployed as a shared service that team members access remotely while maintaining individual access controls through their own Terraform tokens.

Both deployment modes enforce the same authentication model: credentials remain in the deployment environment, while AI assistants receive only the metadata and configuration data needed to respond to queries.

Get started with Terraform MCP server

Terraform MCP server works with multiple AI assistants, including IBM Bob, Claude Desktop, GitHub Copilot, and other MCP-compatible tools. To get started:

  • Read the documentation on setting up the MCP server.
  • View the private registry tutorial.
  • Go to the GitHub repo.

New to Terraform? Sign up for an HCP account to get started today and check out our tutorials. HCP Terraform includes a $500 credit that allows users to quickly get started using features from any plan, including HCP Terraform Premium. Contact our sales team if you’re interested in trying our self-managed offering: Terraform Enterprise.



from HashiCorp Blog https://ift.tt/nfwIHMC
via IFTTT

The Gentlemen Ransomware Claims 478 Victims, Can Spread Like a Worm

A new analysis of The Gentlemen operation has revealed that the financially motivated threat group initially operated as an affiliate responsible for conducting double extortion attacks, while leveraging resources from various ransomware-as-a-service (RaaS) schemes like LockBit (aka Tenacious Mantis), Qilin (aka Pestilent Mantis), and Medusa (aka Venomous Mantis).

According to a detailed report published by PRODAFT, the group, which it tracks as Phantom Mantis, is led by a Russian-speaking cybercriminal tracked as LARVA-368, who goes by the monikers hastalamuerte, ArmCorp, zeta88, nobody0, and santamuerte. The Gentlemen has been active since March 2025 and has claimed a total of 478 victims to date, per data from Ransomware.Live.

"In July 2025, Phantom Mantis transitioned into The Gentlemen, an independent partnership program no longer dependent on other RaaS groups," the Swiss cybersecurity company said. "Additionally, LARVA-368 relies heavily on artificial intelligence for the development and maintenance of ransomware and tools, as well as for assistance with post-exploitation procedures."

As for LARVA-368, the threat actor is assessed to have been a member of the Embargo (aka Primeval Mantis) ransomware group before launching their own operation under the name ArmCorp. It was subsequently rebranded to The Gentlemen four months later.

The individual's identity has since been outed by cybersecurity journalist Brian Krebs as 36-year-old Alexander Andreevich Yapaev (Япаев Александр Андреевич) from the Russian city of Izhevsk. PRODAFT told The Hacker News that its findings match the same persona with "high confidence."

As detailed by Dark Atlas in August 2025, the shift coincided with a payment dispute between LARVA-368 and Qilin, with the threat actor accusing the RaaS operation of carrying out an exit scam and defrauding them of $48,000.

"Although Phantom Mantis was a very active affiliate group with over 20 targets registered on its affiliate panel in less than 30 days, the group's admin (LARVA-368) and LARVA-367 (aka DevMan), a former Phantom Mantis's member, claimed that Pestilent Mantis was scamming affiliates and that there was an alleged 'backdoor' within the Pestilent Mantis's affiliate panel victim chats," PRODAFT noted.

"Although we could not confirm these claims, there is a chance that LARVA-368 and LARVA-367 intentionally spread disinformation with the intent of recruiting Pestilent Mantis affiliates to Phantom Mantis by discrediting the group."

Phantom Mantis has also been observed paying for Premium accounts on underground forums to boost their visibility and fend off competition, with the group's communication and the technical support handled by a separate Russian-speaking persona named The Gentlemen Data.

Some of the other salient aspects of the extortion scheme compiled from various reports are as follows -

  • In an analysis of the ransomware late last year, LevelBlue's Cybereason team described The Gentlemen as a "highly adaptive, fast-moving ransomware operation" that combines mature ransomware techniques with RaaS features, double extortion, cross-platform lockers, flexible propagation, and affiliate support.
  • The group has emerged as one of the most active threat actors, accounting for 10% of ransomware activity in April 2026. "The Gentlemen follows an enterprise-focused chain beginning with initial access, via vulnerable internet-facing services or stolen credentials," NCC Group said. "Analysis suggests The Gentlemen can adapt and change tactics during an attack, such as manipulating GPOs, compromising privileged accounts, and using custom methods to bypass endpoint protections."
  • Only about 13% of their victims are based in the U.S. The majority of the victims are concentrated in Thailand, the U.K., Brazil, Germany, and India.
  • LARVA-368 uses The Gentlemen IM app accounts to support affiliates regarding encryption and any intrusion-related issue, such as providing EDR killers to bypass security solutions via the bring your own vulnerable driver (BYOVD) technique.
  • Support services for both The Gentlemen and The Gentlemen Data are available via the open-source messaging platforms Tox, SimpleX Chat, and Ricochet Refresh.
  • Potential affiliates are required to provide the administrator at least 1GB of data exfiltrated from a victim to gain access to the affiliate panel, a tactic designed to prevent researchers and law enforcement authorities from gaining access to the infrastructure under the guise of an affiliate. The affiliate panel supports user management, configuring new targets, and downloading ransomware to a specific target.
  • Phantom Mantis provides five versions of ransomware that are designed for Windows, Linux, ESXi, Windows XP+, and Logical Volume Manager (LVM).
  • The group courts affiliates with an aggressive profit-sharing model: 90% for affiliates and 10% for the operator.
  • Initial access is obtained via edge devices such as VPN appliances, firewalls, and other internet-facing systems, with a specific focus on platforms like Cisco and Fortinet FortiGate.
  • Infection chains involve the use of red team utilities like NetExec, RelayKing, TaskHound, PrivHound, and CertiHound to perform Active Directory discovery, certificate abuse, privilege escalation, and file share discovery. A separate set of tools, such as EDRStartupHinder, gfreeze, glinker, and DumpBrowserSecrets, are used for evading security programs, while Velociraptor is employed for command-and-control (C2).
  • The attacks also attempt to clear System, Application, and Security Windows Event Logs, disable Microsoft Defender, and add antivirus exclusions.
  • The ransomware makes use of a hybrid cryptographic scheme: X25519 key exchange combined with XChaCha20 symmetric encryption.
  • Microsoft, which is tracking the cluster under the moniker Storm-2697, said the ransomware is written in Go and obfuscated with Garble to target the Windows environment. "When enabled with the --spread argument, it turns the malware from a single-host encryptor into a self-propagating worm that attempts to deploy its encryptor to every reachable system on the network," the tech giant said. "If the --wipe argument is provided, The Gentlemen ransomware performs an additional post-encryption routine to eliminate recoverable artifacts from disk."
  • According to ZeroFox, the ransomware crew likely runs a multi-channel extortion operation, combining ransomware attacks with email outreach and phone-based pressure tactics targeting victims.
  • The group implements a "highly responsive development cycle," an aspect exemplified by the release of a same-day patch after a decryptor was released in April 2026.
  • The average dwell time of an intrusion ranges from two to six weeks from initial access to encryption, with the group particularly focusing on organizations running VMware infrastructure.

Last month, a leak of an internal Rocket.Chat database used by the group, comprising 3,366 messages from November 2025 to late April 2026, shed further light on the group's inner workings, including its use of known security flaws in VMware Aria Operations, Fortinet, Cisco, and Microsoft software, while painting a picture of a criminal enterprise whose members have a clear division of roles and responsibilities.

"The group actively tracks and evaluates modern vulnerabilities, including CVE-2024-55591, CVE-2025-32433, and CVE-2025-33073, and combines them with technique-driven paths like backup and management-controller abuse and NTLM relay workflows, giving them a flexible exploitation pipeline," Check Point said.

That's not all. In March 2026, Hunt.io said it discovered an open directory hosted at "176.120.22[.]127:80" on the Russian bulletproof hosting provider Proton66 that exposed 126 files containing a complete ransomware operator toolkit attributed to an affiliate of The Gentlemen RaaS.

This included tools for reconnaissance, privilege escalation, defense evasion, credential theft, lateral movement, persistence, and pre-encryption preparation, essentially spanning all phases of the intrusion lifecycle.

"LARVA-368 is a threat actor specializing in extortion-related activities and has been active since at least 2020," PRODAFT said. "The expertise acquired through previous collaborations with various RaaS groups provided the technical foundation necessary to establish The Gentlemen RaaS."



from The Hacker News https://ift.tt/QY8rldZ
via IFTTT

Enterprise Data Storage Solutions: Architectures, Features, and Trends

Enterprise storage requirements roughly double every few years. Organizations absorb new workloads faster than storage budgets grow. The storage layer is where availability and performance intersect – and where recovery either works or doesn’t. If your design doesn’t match the workload, the consequences show up fast: slow applications, missed backup windows, or ransomware recovery that drags on for weeks.

What is enterprise data storage?

Enterprise data storage is hardware and software built to store, manage, protect, and provide access to large volumes of business-critical data. Consumer storage optimizes for price and simplicity. Enterprise systems add redundant hardware paths, hot-swap components, consistent performance under concurrent load, and the management APIs that production environments depend on. A desktop NAS might hold the same terabytes as an enterprise filer, but a single controller failure on the desktop model takes everything down with it. We’ve seen it happen.

The main architectures fit different access patterns. I’ll explain why the choice matters in a moment.

Why enterprise storage matters

Ransomware has made storage architecture a security decision. Modern attacks target both primary storage and backup repositories. If you think air-gapped backups are overkill, wait until you need them. That assumption is expensive.

Regulatory compliance adds retention and access requirements that mid-market storage can’t meet reliably. Hospitals retain imaging data for years under HIPAA (which carries specific access and audit rules). Financial institutions produce trade records on demand under SOX. Manufacturers keep quality data for product liability periods. Each needs audit-capable storage that can demonstrate chain of custody.

Uptime requirements have tightened too. Applications that carried loose SLAs a decade ago now run payment systems and patient care workflows. Five nines availability is roughly 5.26 minutes of downtime per year. Achieving that typically requires redundant controllers, automatic failover, and often synchronous replication to a secondary site. It isn’t cheap, and it isn’t simple.
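The five-nines figure is simple arithmetic, and it helps to see how quickly the downtime budget shrinks with each added nine. Here is a minimal sketch, assuming a 365.25-day year; the function name is ours, purely for illustration.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes, averaging in leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


# Print the budget for three, four, and five nines.
for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_minutes_per_year(target):.2f} minutes/year")
```

Each extra nine cuts the budget by a factor of ten, which is why the jump from four nines to five usually changes the architecture rather than just the hardware bill.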

Block, file, and object: the access models

Most environments use all three, but that doesn’t mean you should treat them the same.

Block storage presents raw volumes to the operating system, which formats them as local disks. Databases write directly to blocks, and operating systems boot from block volumes. VMware vSphere, Hyper-V, Oracle, and SQL Server rely on block storage because it gives the lowest latency and lets applications control the I/O path directly.

File storage organizes data into a directory hierarchy accessed over NFS or SMB. Multiple users and services can read and write the same files simultaneously. Shared workspaces and home directories are typical file storage use cases.

Object storage treats data as discrete objects with metadata and a unique identifier, accessed via HTTP-based APIs like S3. Because there is no directory structure to maintain, object storage scales far beyond the practical limits of conventional file systems. The tradeoff is latency. This kind of storage isn’t designed for random block I/O and is generally unsuitable as primary storage for databases. It fits data lakes, backup repositories, and compliance archives that otherwise would’ve gone to tape. For a detailed comparison, see block vs object storage on the StarWind blog.

Six architectures that show up in production

Here’s where theory meets the hardware you’ll actually buy. We’ve worked with environments that ran four of these six types simultaneously, usually because different teams bought different things and nobody wanted to rip anything out. That mess is more common than vendors admit, and it’s why the “one platform” pitch never quite lands.

Figure 1: Enterprise storage types and architectures

DAS (direct-attached storage)

DAS connects drives directly to a single server with no network layer in between. It gives the fastest access for single-node workloads. The limitation is that DAS can’t be accessed by other servers without copying data. It’s most useful when raw local performance matters more than centralized access.

SAN (storage area network)

SANs present block-level volumes to servers over a dedicated high-speed network. The OS treats these volumes as local disks. Virtualization clusters and high-performance databases run on SAN infrastructure because it provides consistent low-latency block I/O.

That I/O can be shared across multiple hosts without the overhead of a file system layer or the contention that starts when NFS locks fight your database checkpoint threads. Pure Storage FlashArray, Dell PowerStore, and HPE Alletra represent the dedicated-appliance segment of the market – as opposed to the software-defined or white-box options.

NAS (network-attached storage)

NAS delivers file-level storage over Ethernet using NFS or SMB. It suits shared file environments, including home directories, collaborative workspaces, video production storage, and backup landing zones.

NetApp ONTAP and Dell PowerScale are widely used enterprise platforms. Mid-range NAS solutions typically include deduplication, compression, snapshots, and thin provisioning. Many enterprise NAS platforms also expose storage over iSCSI. That makes them dual-protocol devices that can handle both file and block workloads from the same hardware. If you’re supporting a small or midsize office, NAS is often all the shared storage infrastructure you need.

Object storage

Object storage manages unstructured data at scale through S3-compatible APIs. DataCore Swarm, for example, provides an on-premises S3-compatible platform with support for S3 Object Lock, which allows organizations to deploy immutable backup targets and compliance archives without sending data to public cloud.

At scale, object storage generally offers a lower cost per terabyte than block or file storage, while its flat namespace can grow well beyond the limits of traditional file systems. The tradeoff is latency.

SDS (software-defined storage)

SDS separates the storage control plane from physical hardware. (This is the same abstraction idea that made VMware popular in compute, but storage admins are often more skeptical of it.) The software layer manages storage services across commodity servers or existing arrays.

It presents a unified interface regardless of the hardware underneath. DataCore SANsymphony runs on standard servers and provides auto-tiering, caching, mirroring, and high availability across heterogeneous storage platforms, including Dell, HPE Alletra, Pure Storage, and NetApp ONTAP. This makes it possible to consolidate SAN services without replacing existing equipment. VMware vSAN and Red Hat Ceph cover similar ground for larger clusters with different trade-offs in management complexity and hardware requirements.

HCI (hyperconverged infrastructure)

HCI puts compute and storage on the same physical nodes, manages networking there too, and treats the whole stack as one system. It reduces hardware footprint and simplifies deployment for remote offices and edge locations where maintaining separate storage hardware isn’t practical. Nutanix AOS and StarWind HCI Appliance are both widely deployed in this segment.

StarWind HCI Appliance is designed for compact two-node or small-cluster configurations where storage and compute share the same hardware, high availability remains local, and there is no dependency on a dedicated storage network.

You can use the table below as a starting point to match your workload requirements with the storage architecture.

Storage type   | Best for                           | Scalability | Performance
DAS            | Single-server workloads            | Low         | High
SAN            | Virtualization and databases       | Medium      | High
NAS            | File sharing and collaboration     | Medium      | Medium
Object storage | Backups, archives, AI datasets     | Very high   | Low
SDS            | Hybrid environments, virtualization | High        | High
HCI            | ROBO and edge deployments          | Medium      | High

How to choose without buying the wrong thing

No single architecture fits every workload. Start with what you actually need.

A virtualization cluster serving dozens of VMs has completely different requirements than a backup repository, a surveillance archive, or a data lake holding training data for a model that only runs on Tuesdays. Block workloads need consistent low-latency I/O. Sequential bulk workloads such as AI training and video ingest require throughput. Archival workloads need low cost per terabyte at scale. Since no single platform optimizes all three equally well, tiered architectures remain common.

Storage deployed at 70% capacity at launch often reaches 90% within 18 months as backup sets grow and new workloads arrive. Prioritize platforms that can scale by adding nodes or shelves without requiring disruptive data migration. In many cases, the labor cost of a forced migration exceeds the initial price difference between a platform that scales out gracefully and one that doesn’t.
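To put a number on that 70%-to-90% curve, here is a rough sketch of the growth rate it implies and how little runway is left afterwards. It assumes steady compound growth, which is our simplification (real growth tends to arrive in steps), and the function names are hypothetical.

```python
import math


def monthly_growth_rate(start_util: float, end_util: float, months: int) -> float:
    """Compound monthly growth rate implied by moving from start_util to end_util."""
    return (end_util / start_util) ** (1 / months) - 1


def months_until(current_util: float, limit_util: float, rate: float) -> float:
    """Months until utilization reaches limit_util at a constant compound rate."""
    return math.log(limit_util / current_util) / math.log(1 + rate)


# The 70% -> 90% in 18 months example from the text.
rate = monthly_growth_rate(0.70, 0.90, 18)
print(f"Implied growth: {rate:.2%} per month")
print(f"Months from 90% to full: {months_until(0.90, 1.00, rate):.1f}")
```

Under that assumption, the array that took a year and a half to reach 90% has well under a year before it is full, which is usually less time than a procurement cycle takes.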

Performance planning is commonly underestimated. Teams benchmark storage under synthetic load and miss what happens when production workloads run in parallel. Checkpoint writes and backup operations running alongside peak database traffic can expose limitations that benchmarks never reveal. I’ve sat through vendor presentations where the benchmark numbers looked incredible, but the array fell over when we added backup traffic during a synthetic OLTP test. Ask for a mixed-workload demo. If they won’t do it, that tells you something.

Data protection requirements should define which features are non-negotiable before evaluation begins. The backup and DR architecture should be designed alongside the primary storage selection. Vendor support and ecosystem fit, including clean integration with your existing VMware, Hyper-V, or backup software, reduce implementation friction and day-to-day operational overhead. I’ve bought the wrong array before because the benchmark looked pretty and I didn’t ask about mixed workloads. Never again.

Backup storage and cyber resilience

Backup storage is a discipline of its own. You can’t afford to treat it as an afterthought.

The 3-2-1-1 strategy is the working baseline: three copies of data, on two different media types, one offsite, and one immutable or air-gapped. Immutability is the addition that ransomware recovery patterns made necessary. When attackers compromise primary storage and then locate and encrypt backup repositories, immutable backups with write-once semantics are often the only reliable recovery path left.
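The rule is easy to state and surprisingly easy to get wrong on paper, so here is a small sketch that checks a backup plan against it. The `BackupCopy` type and its fields are our own illustration, not any backup product's API, and we assume the production copy counts as one of the three.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str
    media: str        # e.g. "disk", "tape", "object"
    offsite: bool
    immutable: bool   # True for immutable or air-gapped copies


def check_3_2_1_1(copies: list[BackupCopy]) -> list[str]:
    """Return the 3-2-1-1 rules that the copies fail (empty list means pass)."""
    failures = []
    if len(copies) < 3:
        failures.append("fewer than three copies")
    if len({c.media for c in copies}) < 2:
        failures.append("fewer than two media types")
    if not any(c.offsite for c in copies):
        failures.append("no offsite copy")
    if not any(c.immutable for c in copies):
        failures.append("no immutable or air-gapped copy")
    return failures


# Hypothetical plan: production data, a local repository, and a locked offsite bucket.
plan = [
    BackupCopy("primary array", "disk", offsite=False, immutable=False),
    BackupCopy("backup repository", "disk", offsite=False, immutable=False),
    BackupCopy("object-lock bucket", "object", offsite=True, immutable=True),
]
print(check_3_2_1_1(plan) or "plan meets 3-2-1-1")
```

Drop the third entry and the same check reports all four rules as failed, which is roughly what a ransomware incident does to a plan that only has local disk copies.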

S3 Object Lock prevents overwriting or deleting objects for a defined retention period, regardless of credential compromise. DataCore Swarm supports Object Lock, so it works well as an immutable backup target if you’re running Veeam, Commvault, Rubrik, or comparable enterprise backup platforms. If you’re designing a cyber-resilient backup architecture, combining Object Lock, separate credentials, isolated backup access paths, and network segmentation can significantly reduce the impact of a storage-layer attack.
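For a sense of what a retention policy looks like in practice, the sketch below builds a bucket-level default retention rule in the JSON shape the Amazon S3 API uses for Object Lock configuration. The 30-day period is a made-up example, the helper names are ours, and whether a given S3-compatible platform accepts exactly this document is something to confirm in its own documentation.

```python
import json
from datetime import datetime, timedelta, timezone


def default_retention_config(days: int, mode: str = "COMPLIANCE") -> dict:
    """Build a bucket-level Object Lock configuration in the S3 API's JSON shape."""
    if mode not in ("COMPLIANCE", "GOVERNANCE"):
        raise ValueError("mode must be COMPLIANCE or GOVERNANCE")
    if days < 1:
        raise ValueError("retention must be at least one day")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Days": days}},
    }


def retain_until(written_at: datetime, days: int) -> datetime:
    """Earliest moment an object version written at written_at could be deleted."""
    return written_at + timedelta(days=days)


config = default_retention_config(days=30)  # 30 days is a hypothetical retention period
print(json.dumps(config, indent=2))
print(retain_until(datetime(2026, 6, 12, tzinfo=timezone.utc), 30).isoformat())
```

The mode matters for the credential-compromise case: in S3's model, governance mode can be bypassed by principals holding a specific permission, while compliance mode cannot be shortened or removed by anyone until the retention date passes.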

Restore testing is where backup strategies most often fail. Organizations that have never completed a full-scale restore at production data volumes usually discover weaknesses during an incident rather than during a planned exercise.

Healthcare organizations operating under HIPAA, financial institutions subject to SOX and PCI-DSS, and public sector entities all face specific retention and recovery requirements. The backup platform must support demonstrable compliance.

What is actually changing

NVMe and NVMe-oF are moving into mainstream enterprise deployments, not just hyperscale. NVMe gives significantly lower latency than SATA or SAS SSDs do, and NVMe over Fabrics extends that performance over the network. Shared all-flash storage can now approach the latency of directly attached drives, which isn’t something you could’ve said five years ago.

If you’re running a mid-size enterprise, NVMe-oF is no longer exotic. As AI inference and real-time analytics demand lower and more consistent I/O latency, it is increasingly common as a shared hot-tier architecture. Both StarWind Virtual SAN and DataCore SANsymphony support NVMe-oF as a transport layer. That makes software-defined deployments viable for environments that previously required dedicated NVMe SAN hardware.

AI and GPU workloads are creating storage demand patterns that traditional NAS and SAN platforms weren’t originally designed to handle. Training large models requires high-throughput parallel reads and burst checkpoint writes, while inference needs fast KV-cache access and low-latency metadata operations. Storage teams now design tiered AI storage separately from general-purpose shared storage, with NVMe close to compute, a parallel file system for the active training tier, and S3-compatible object storage for the data lake.

Hybrid and multi-cloud storage is the operational reality for most organizations. Primary data lives on-premises, cold data migrates to cloud tiers, and cloud compute handles overflow training runs. Storage platforms with native cloud tiering reduce the complexity of managing data movement between locations, which is why they’ve become popular.

Immutable storage and cyber resilience have moved from best-practice guidance to standard requirements. Some compliance frameworks now explicitly require demonstrable immutability for backup copies and tested air-gapped recovery environments. At the same time, HCI adoption continues to grow in remote and edge environments as edge computing expands in manufacturing and retail, though it’s still rare in heavy industry.

Mistakes that keep happening

Storage errors repeat across organizations of every size.

The most common error is underestimating scalability requirements. Data growth consistently outpaces what teams projected at procurement time, as new workloads and expanding backup sets pile up faster than budget cycles allow. Log retention periods stretch too, often without anyone updating the capacity model. Capacity shortages rarely emerge during planned upgrade cycles; they usually appear as operational emergencies. You can’t schedule your way out of exponential growth.

Teams often try to add backup immutability after deployment, which usually means they haven’t thought through recovery timelines. Immutable copies and backup network isolation are architectural decisions that need to be made before storage is purchased, not retrofitted after a recovery incident makes the gap obvious.

When you use the same platform for both primary and backup, you remove the separation that makes recovery possible when primary storage is compromised. Backup storage should be architecturally distinct, with separate credentials and a network path that production systems cannot reach. Relying on a single backup copy is equally problematic. True resilience comes from maintaining multiple copies and regularly validating restore procedures.

Insufficient performance testing before purchase remains a common oversight.

Synthetic benchmarks may look impressive, but checkpoint writes and backup operations running alongside peak database traffic can expose limitations that benchmarks never reveal. If you’re evaluating a storage platform, mixed-workload testing should be part of the decision process. I once watched a team skip mixed-workload testing because the vendor’s datasheet looked convincing. The array lasted a few months before the database team started complaining about latency spikes during backup windows. Don’t make that mistake.

Another frequent mistake is failing to integrate storage monitoring into the broader observability strategy. Latency spikes and capacity growth often go unnoticed until they trigger user-facing issues. Queue depths often climb quietly in the background until someone notices the application timeouts. Storage metrics should feed into the same monitoring platform used for compute and networking infrastructure, or you’ll miss the warning signs.
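As a rough illustration of the kind of check worth wiring into that shared monitoring platform, the sketch below flags a window of storage samples when tail latency or sustained queue depth crosses a limit. The thresholds (5 ms, queue depth 32) and function names are placeholders we made up; real limits depend on the array and the workload.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile of the samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def storage_alerts(latencies_ms: list[float], queue_depths: list[int],
                   p99_limit_ms: float = 5.0, queue_limit: int = 32) -> list[str]:
    """Return alert messages for one monitoring window (empty list means healthy)."""
    alerts = []
    p99 = percentile(latencies_ms, 99)
    if p99 > p99_limit_ms:
        alerts.append(f"p99 latency {p99:.1f} ms exceeds {p99_limit_ms} ms")
    # A queue that never drops below the limit points to saturation, not a burst.
    if queue_depths and min(queue_depths) > queue_limit:
        alerts.append(f"queue depth stayed above {queue_limit} for the whole window")
    return alerts


# Hypothetical window: mostly fast I/O with a slow tail and a persistently deep queue.
window_latencies = [1.0] * 98 + [9.0, 12.0]
window_queues = [40, 45, 38]
for message in storage_alerts(window_latencies, window_queues):
    print(message)
```

Alerting on the tail rather than the average is the point here: the average of that window still looks healthy, while the p99 is what the application actually times out on.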

Conclusion

If you have fewer than a hundred VMs and no dedicated storage admin, start with HCI or a dual-protocol NAS. You’ll get shared storage and replication without building a SAN fabric. Budget for NVMe block storage if you’re running Oracle, SQL Server, or anything that counts latency in milliseconds. And whatever you buy, test your restores before you sign the acceptance paperwork.

FAQ

What is enterprise data storage?

Enterprise data storage consists of hardware and software platforms designed to store, manage, protect, and provide access to large volumes of business-critical data. Unlike consumer-grade storage, enterprise solutions include redundancy, high availability, data protection capabilities, and centralized management tools designed for production environments.

What storage is best for AI workloads?

Active training datasets benefit from high-throughput parallel access, either a parallel file system or local NVMe staging. Data lakes and cold datasets suit S3-compatible object storage, while checkpoint writes need a tier built for burst write performance. Most AI deployments use a tiered architecture matched to each stage of the pipeline.

What is the difference between enterprise and consumer storage?

Enterprise storage includes dual controllers, hot-swap components, end-to-end error correction, consistent performance under concurrent multi-user load, snapshot and replication capabilities, and REST management APIs. Consumer storage lacks most of these features and is not designed for continuous operation under shared production workloads.



from StarWind Blog https://ift.tt/RON4n79
via IFTTT