We are a Data Security Platform (DSP) that allows organizations to discover, classify, and protect their most sensitive data and documents.
The proliferation of data across diverse environments and the relentless evolution of cybercriminal tactics have rendered traditional security paradigms severely inadequate.
As organizations strive for agility, innovation, and operational efficiency via digital transformation initiatives, they simultaneously expose themselves to numerous data breach risks that can have devastating financial, reputational, and regulatory consequences.
From the inherent vulnerabilities of distributed cloud architectures and aging legacy systems to the threats posed by compromised third-party services and the growing, uncontrolled use of AI, the attack surface of the modern enterprise has expanded dramatically.
As we delve into the leading data breach risks confronting organizations today, we’ll examine the root causes and potential impacts of each, ultimately underscoring the need for a paradigm shift toward more data-centric security approaches.
The shift to hybrid and multi-cloud environments has brought undeniable operational benefits, but it also introduces significant complexity and risk.
Today, 40% of data breaches involve data spread across multiple environments, including public clouds, private clouds, and on-premises infrastructure (IBM). This fragmentation expands the attack surface and makes it harder to enforce consistent security controls.
Public cloud environments, in particular, are seen as both a necessity and a liability. While these environments are valued for their speed and flexibility, 70% of cybersecurity leaders identify public cloud infrastructure as the most vulnerable part of their hybrid setup. The data supports this concern: breaches involving only public cloud systems were the most expensive in 2024, costing an average of USD 5.17 million, up 13.1% from the previous year (IBM).
In this increasingly dynamic landscape, where sensitive data is constantly in motion across multiple platforms, traditional Data Security Posture Management (DSPM) tools fall short. They are simply not built to continuously discover, classify, and protect data at the scale and speed required across complex, distributed environments.
Cloud misconfigurations remain one of the most common – and expensive – causes of data breaches. According to Gartner, they play a role in 80% of all data security incidents.
These breaches are typically the result of human error. In fact, Gartner projected that 99% of cloud security failures through 2025 will stem from mistakes made by customers rather than cloud providers (CSA).
Common misconfigurations include weak or overly permissive IAM policies, exposed API keys, insufficient security monitoring, and improper setup of data backup services, all of which can leave sensitive data dangerously exposed.
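As a simplified illustration of the first category, the sketch below flags customer-managed IAM policies that grant wildcard actions or resources. It assumes an AWS environment with the boto3 SDK installed and credentials configured; a production scanner would also cover inline policies, trust relationships, and pagination.

```python
# Minimal sketch: flag overly permissive customer-managed IAM policies.
# Assumes an AWS environment with boto3 installed and credentials configured;
# a real scanner would also cover inline policies, trust policies, and pagination.
import boto3

def overly_permissive(statement: dict) -> bool:
    """True if an Allow statement grants '*' actions or '*' resources."""
    if statement.get("Effect") != "Allow":
        return False
    actions = statement.get("Action", [])
    resources = statement.get("Resource", [])
    actions = [actions] if isinstance(actions, str) else actions
    resources = [resources] if isinstance(resources, str) else resources
    return "*" in actions or "*" in resources

iam = boto3.client("iam")
for policy in iam.list_policies(Scope="Local")["Policies"]:
    doc = iam.get_policy_version(
        PolicyArn=policy["Arn"], VersionId=policy["DefaultVersionId"]
    )["PolicyVersion"]["Document"]
    statements = doc.get("Statement", [])
    statements = [statements] if isinstance(statements, dict) else statements
    if any(overly_permissive(s) for s in statements):
        print(f"Review policy: {policy['PolicyName']} ({policy['Arn']})")
```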
Shadow data, defined as unmanaged data assets often invisible to IT departments, is also a contributing factor, as it is linked to one out of every three data breaches (IBM).
This type of data often originates when employees use unauthorized applications, personal cloud storage, or generative AI (GenAI) tools to share or process information, often in an effort to boost productivity outside approved channels.
Breaches involving shadow data take longer to detect and contain, and they cost more. According to IBM, these breaches average $5.27 million, which is 16.2% higher than the average cost of a data breach.
Compounding this issue is a widespread lack of visibility; nearly half of all organizations report insufficient visibility into lateral traffic – the traffic between systems in the same security zone – within their hybrid cloud environments, creating blind spots for detecting malicious activity (Gigamon).
Enterprises can attempt to regain visibility into their data flows to help existing DSPM tools keep up, but it’s a losing battle. New gaps will constantly emerge, and data will inevitably flow beyond the reach of their existing security controls.
Legacy IT systems, including mainframes, present substantial and evolving data breach risks.
These systems often struggle with limitations in scalability, flexibility, and compatibility with modern technologies, making it difficult to implement security controls effectively.
Common vulnerabilities include outdated authentication mechanisms, hardcoded credentials in older applications, limited API support for secure integration, and reliance on obsolete protocols.
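As one hedged illustration of the hardcoded-credentials problem, the sketch below scans a legacy codebase for common secret patterns. The regexes, file extensions, and path are illustrative only; dedicated secret-scanning tools cover far more formats and reduce false positives.

```python
# Minimal sketch: scan source files for likely hardcoded credentials.
# Patterns, extensions, and the path are illustrative, not exhaustive.
import re
from pathlib import Path

SECRET_PATTERNS = [
    re.compile(r"password\s*=\s*['\"][^'\"]{4,}['\"]", re.IGNORECASE),
    re.compile(r"api[_-]?key\s*=\s*['\"][A-Za-z0-9]{16,}['\"]", re.IGNORECASE),
    re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key ID format
]

def scan(root: str) -> None:
    for path in Path(root).rglob("*"):
        if path.suffix not in {".py", ".cbl", ".jcl", ".cfg", ".properties"}:
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for pattern in SECRET_PATTERNS:
            for match in pattern.finditer(text):
                print(f"{path}: possible hardcoded secret: {match.group(0)[:40]}")

scan("./legacy_app")  # hypothetical path to a legacy application's source tree
```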
Moreover, the operational criticality of these systems often means that applying modern security patches can be a complex undertaking, fraught with the risk of significant disruption to business operations.
The financial burden of maintaining legacy systems often competes with security modernization. Legacy systems are reported to consume as much as 70-80% of IT budgets on maintenance alone, potentially diverting critical funds from necessary security upgrades.
This situation is further exacerbated by a persistent skills gap related to legacy system security, particularly for mainframes. As experienced personnel who possess deep knowledge of these older technologies retire, organizations lose critical institutional knowledge.
This can lead to an increased reliance on insecure workarounds, incomplete or improperly secured integrations with newer technologies, or a general failure to implement necessary contemporary security controls.
The result? A growing attack surface that’s increasingly difficult to protect with traditional security approaches.
Data breaches originating from vulnerabilities within the software supply chain and insecure third-party APIs represent a significant and growing threat to organizations globally.
The frequency and impact of supply chain attacks have escalated alarmingly in recent years, and there are no signs of this trend slowing down.
Gartner predicts that by 2025, 45% of organizations will experience an attack on their software supply chain. Supporting this, the 2025 Verizon Data Breach Investigations Report (DBIR) highlights that third-party involvement in data breaches has doubled, from 15% to 30%, in just one year.
Attackers are becoming increasingly strategic, deliberately targeting weaker security links among an organization's network of vendors and partners to gain indirect access to high-value systems.
A primary tactic in supply chain attacks involves the exploitation of vulnerabilities in commonly used third-party code, including widely adopted open-source components.
The discovery of critical vulnerabilities in widely used software libraries illustrates the long-term systemic risks posed by dependencies deeply embedded within the software supply chain.
Malicious actors employ various techniques, including upstream server attacks, where they compromise code repositories to inject malicious code, and dependency confusion attacks, which trick development build systems into downloading and integrating compromised software dependencies instead of legitimate ones.
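A common mitigation is to pin dependencies to known-good hashes so that a substituted or tampered package fails verification before it is installed. The sketch below shows the basic idea; the allowlist and artifact names are hypothetical, and in practice the hashes would come from a lockfile or SBOM maintained in source control.

```python
# Minimal sketch: verify downloaded dependency artifacts against pinned hashes.
# The allowlist below is hypothetical; real pipelines use lockfiles with hashes
# (e.g., pip's --require-hashes mode) or an SBOM kept under source control.
import hashlib
from pathlib import Path

PINNED_HASHES = {
    # hypothetical artifact name and placeholder SHA-256 digest
    "example_lib-1.2.3.tar.gz": "9f86d081884c7d659a2feaa0c55ad015"
                                "a3bf4f1b2b0b822cd15d6c15b0f00a08",
}

def verify(artifact: Path) -> bool:
    expected = PINNED_HASHES.get(artifact.name)
    if expected is None:
        print(f"{artifact.name}: not in allowlist, refusing to install")
        return False
    actual = hashlib.sha256(artifact.read_bytes()).hexdigest()
    if actual != expected:
        print(f"{artifact.name}: hash mismatch, possible tampering")
        return False
    return True
```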
The 2025 OpenText Cybersecurity Threat Report found that 62% of organizations experiencing a ransomware attack attributed its origin to a software supply chain partner. The report also notes that third-party vendors and service providers can introduce major additional security risks if their own cybersecurity measures are inadequate.
Securing data and systems grows significantly more complex when factoring in third-party APIs and externally developed code integrated into an organization's environment.
While enterprises may implement robust security measures within their own perimeters, they often lack visibility into – and control over – the security practices of third-party vendors whose applications and services are deeply embedded in critical workflows. When these external vendors store sensitive data or expose APIs within core applications, any vulnerability or misconfiguration in those components becomes a direct attack vector.
APIs have emerged as a primary and increasingly targeted attack vector, with a reported 167% YoY increase in API-related security incidents between 2022 and 2023.
The root causes of these API security incidents often include a lack of robust API authentication controls, the limitations of traditional security tools in addressing API-specific threats, prevalent API misconfigurations, and critical authorization vulnerabilities.
There is also the concern of "zombie APIs" – i.e., outdated, unmaintained, and/or forgotten API endpoints that remain active and vulnerable. These legacy interfaces often lack basic security controls, making them low-effort/high-reward targets for attackers probing for lateral access or privilege escalation opportunities.
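One hedged way to surface such endpoints is to reconcile what the gateway actually serves against what is documented and maintained. The sketch below assumes you can export both lists; how you obtain them (gateway admin API, access logs, spec repository) is deployment-specific, and the route values shown are examples.

```python
# Minimal sketch: flag "zombie" endpoints that are live at the gateway
# but absent from the maintained API specification.
# The example routes below are illustrative; real inputs would come from
# gateway exports, access logs, or an OpenAPI spec repository.

def find_zombie_endpoints(live_routes: set[str], documented_routes: set[str]) -> set[str]:
    """Endpoints that are reachable but no longer documented or maintained."""
    return live_routes - documented_routes

live_routes = {"/v1/orders", "/v1/users", "/v0/export", "/internal/debug"}
documented_routes = {"/v1/orders", "/v1/users"}

for endpoint in sorted(find_zombie_endpoints(live_routes, documented_routes)):
    print(f"Undocumented live endpoint, review or retire: {endpoint}")
```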
Shadow AI is the unsanctioned or unmonitored use of AI applications/services by employees, departments, or even third-party collaborators, effectively bypassing established IT oversight, governance, and security protocols.
Shadow AI is conceptually an extension of the long-standing challenge of "Shadow IT", i.e., the unauthorized use of hardware or software within an organization. However, Shadow AI presents deeper and more complex risks due to how AI apps/services use data to train their models. The exposure of sensitive or proprietary data to public AI systems can lead not only to security breaches but also to legal challenges, such as intellectual property disputes and regulatory violations.
The primary drivers behind the rise of Shadow AI include the easy accessibility of powerful AI tools (many of which are free or offer low-cost tiers), the increasing integration of AI capabilities directly into existing enterprise software, persistent gaps in workforce AI fluency and readiness, and an overarching organizational pressure for enhanced efficiency and productivity.
Shadow AI usage introduces numerous data breach risks, including improper data handling and the disclosure of sensitive information.
Employees, often with good intentions to improve productivity, may upload proprietary company data, confidential documents, sensitive customer information (PII), unreleased product source code, or detailed financial forecasts into public or unvetted AI models.
This can lead to unintentional data exfiltration, irreversible data loss, and a complete lack of organizational control over how this sensitive data is subsequently stored, processed, or used for training the AI model.
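A data-centric control can intercept this flow before data leaves the organization. As a simplified, hypothetical sketch, the check below blocks prompts containing obvious PII patterns before they are forwarded to an external AI service; real deployments would rely on a full data classification engine rather than a handful of regexes.

```python
# Minimal sketch: block prompts containing obvious PII before they are sent
# to an external AI service. Patterns are illustrative only; production
# controls would use a full classification engine, not a few regexes.
import re

PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def safe_to_send(prompt: str) -> bool:
    findings = [name for name, pattern in PII_PATTERNS.items() if pattern.search(prompt)]
    if findings:
        print(f"Blocked: prompt appears to contain {', '.join(findings)}")
        return False
    return True

if safe_to_send("Summarize the Q3 forecast for customer jane@example.com"):
    pass  # forward to the approved AI service here
```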
Furthermore, the use of unvetted AI tools with regulated data can lead to significant regulatory and compliance violations.
Mandates such as GDPR, HIPAA, and others impose strict requirements on data handling and privacy. Feeding regulated data into non-compliant AI platforms can result in substantial fines, severe legal consequences, and lasting reputational damage.
Compounding this risk is the inherent lack of visibility and control that IT and security teams have over Shadow AI. Often, these departments are unaware of which AI tools are being used in the organization, what specific data is being shared with these external platforms, or how that data is being processed and protected.
Studies indicate that the volume of corporate data being fed into AI tools increased by 485% from 2023 to 2024. Moreover, the share of that data classified as sensitive increased from 10.7% to 27.4% in the same period. According to Cisco, 60% of IT leaders say they lack confidence in their ability to identify the use of unapproved AI tools within their environments.
Without a strong data-centric security strategy, Shadow AI becomes a critical blind spot capable of bypassing perimeter defenses and undermining even the most mature security programs.
The widespread adoption of distributed work models – remote employees, offshore teams, and third-party contractors – has expanded the enterprise’s potential attack surface.
Key vulnerabilities stem from the use of Bring Your Own Device (BYOD), reliance on insecure home networks, and the inherent challenges of securing remote access.
Personal devices often lack the enterprise-grade security configurations and controls present on corporate-managed assets. They may operate with outdated operating systems, host potentially malicious applications, or connect to inadequately secured home Wi-Fi networks.
Third-party contractors can also introduce risk if their security practices are lax or if they’re granted excessive, unmonitored access to sensitive corporate data. According to Verizon’s DBIR, 30% of breaches involve a third party, including contractors.
Managing security for a geographically dispersed workforce (e.g., one that includes offshore teams) adds layers of complexity concerning data sovereignty regulations, compliance with diverse international legal frameworks, and the consistent enforcement of security policies.
Effective risk reduction in this model requires visibility into user behavior, granular and attribute-based access controls, and the ability to enforce data protection policies regardless of geography, device type, or employment model.
The practice of replicating production data for use in non-production environments – such as development, testing, quality assurance (QA), and analytics – introduces a distinct and often underestimated set of data breach vulnerabilities.
While copying live production data can appear to be the most straightforward path to achieving realistic testing scenarios or enabling development agility, it carries significant risks.
Exposing sensitive production data, which can include customer PII, financial records, and valuable intellectual property, within these typically less secure non-production environments substantially increases the organizational attack surface.
Should these non-production systems be compromised, the sensitive copied data can be readily exfiltrated and exploited.
Common vulnerabilities that facilitate such data breaches include inherent design flaws in non-production systems, coding defects in test applications, and misconfigurations in the security settings of development or staging environments.
Mitigating this risk requires a deliberate Test Data Management (TDM) strategy. Best practices include dynamic data masking, synthetic data generation, and context-aware data subsetting. These methods provide developers and testers with realistic data sets without exposing sensitive information.
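As a simplified sketch of the masking approach, sensitive fields are replaced with consistent but non-identifying surrogates before records reach a test environment. The field names and hashing scheme below are illustrative; a real TDM pipeline would be driven by a data classification inventory and purpose-built masking tooling.

```python
# Minimal sketch: mask sensitive fields before production records are copied
# into a test environment. Field names are illustrative; a real TDM pipeline
# would drive this from a data classification inventory.
import hashlib

SENSITIVE_FIELDS = {"name", "email", "card_number"}

def mask_value(value: str) -> str:
    """Deterministic, non-reversible surrogate so joins still work in tests."""
    return "MASK_" + hashlib.sha256(value.encode()).hexdigest()[:12]

def mask_record(record: dict) -> dict:
    return {
        key: mask_value(str(value)) if key in SENSITIVE_FIELDS else value
        for key, value in record.items()
    }

production_row = {"id": 42, "name": "Jane Doe", "email": "jane@example.com", "plan": "gold"}
print(mask_record(production_row))
# {'id': 42, 'name': 'MASK_...', 'email': 'MASK_...', 'plan': 'gold'}
```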
Beyond the security implications, failure to implement proper TDM and masking controls can lead to serious compliance violations. Regulations like GDPR, CCPA, and HIPAA apply regardless of environment. A breach in a test system involving real production data is still a reportable event, carrying legal, financial, and reputational consequences.
In short, if your non-production environments handle real data, regulators and attackers alike will treat them as production. So should you.
Traditional perimeter-based security models are no longer sufficient to protect enterprises against sophisticated cyber threats and vulnerabilities.
The idea of a clearly defined and defensible security perimeter has effectively dissolved due to transformative shifts in IT infrastructure, including widespread cloud adoption, the normalization of remote and hybrid work models, deep reliance on third-party services and APIs, and the interconnectedness of digital supply chains.
In this new reality, where data is dispersed and perimeters are porous, the strategic imperative must shift from solely reinforcing external defenses to fundamentally securing the data itself.
If the "walls" can no longer guarantee protection, then the "treasure" – the data – must be inherently protected regardless of its location or how it is accessed.
This calls for a data-centric Zero Trust approach, where protection mechanisms are embedded directly into the data. Technologies like vaulted tokenization exemplify this principle by replacing sensitive data elements with non-sensitive equivalents (tokens).
Even if tokenized data is accessed by unauthorized parties or exfiltrated from a supposedly secure environment, the underlying sensitive information remains protected because the tokens themselves hold no intrinsic value, are not mathematically linked to the real data, and cannot be reverse-engineered through brute force or cryptographic analysis. As a result, vaulted tokenization is inherently resistant to harvest-now, decrypt-later attack strategies, including those enabled by future quantum decryption capabilities.
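To make that distinction concrete, here is a minimal, in-memory sketch of the vaulted pattern; a production vault is a hardened, access-controlled service, not a Python dict. The token is generated randomly, so nothing about the original value can be derived from it, and recovering the value requires authorized access to the vault mapping.

```python
# Minimal, in-memory sketch of vaulted tokenization. A real vault is a hardened,
# access-controlled service; this dict only illustrates the concept: tokens are
# random values with no mathematical relationship to the data they stand in for.
import secrets

_vault: dict[str, str] = {}  # token -> original sensitive value

def tokenize(sensitive_value: str) -> str:
    token = "tok_" + secrets.token_urlsafe(16)  # random, not derived from the value
    _vault[token] = sensitive_value
    return token

def detokenize(token: str) -> str:
    # In practice this path is gated by authentication, authorization, and audit.
    return _vault[token]

card = "4111 1111 1111 1111"
token = tokenize(card)
print(token)               # safe to store or pass downstream
print(detokenize(token))   # only possible with access to the vault
```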
By focusing on securing the data directly, organizations can build a more resilient defense against the multifaceted risks outlined above – from distributed cloud environments, SaaS, and legacy system vulnerabilities to third-party exposures, Shadow AI, and the challenges of managing copied production data.
Leverage a data security platform (DSP) like DataStealth.
Our vaulted tokenization technology doesn’t require any changes to your codebase or complex integrations; it works in the line of traffic. And because it operates at the network level, it is not only scalable but works readily across any environment – multi-cloud, on-premises, hybrid, or legacy systems (including mainframes).
DataStealth not only provides quantum-resistant data protection but also streamlines major compliance challenges, such as data residency.
Ready to see how it works in your environment? Contact DataStealth today to learn more and schedule a demo.