To Stop Cloud Data Breaches, Find and Fix Cloud Data Misconfigurations

Dave Cole
October 28, 2020

You need not be a cybersecurity expert to know that data breaches in the cloud have become a pervasive, persistent threat. From major enterprises like Pfizer and Razer, to their digital suppliers such as a cloud backup provider, companies of all sizes and stripes fall victim to a never-ending stream of cloud data breaches or related attacks.

Storing data in major public clouds like AWS, Google Cloud and Azure has become the norm, and IT security teams are well aware of the dangers of cloud misconfigurations. Yet it remains a big challenge for even the largest enterprises to secure their data every minute of every day. It almost doesn’t matter whether an internal or external error has led to a data breach, because the digital footprint for enterprises is expanding at such a fast pace that errors will occur and data will be exposed.

Faced with these challenges, data teams’ knee-jerk reaction may be to avoid storing data in the cloud. But that’s a self-defeating strategy, akin to early attempts to ban automobiles over safety concerns. For most companies today, the cloud is no longer optional; data’s home is now there.

The good news is that data can be stored and protected securely in the public cloud, even more securely than in enterprise data centers. Cloud data breaches often start with attackers scanning for exposed resources. They typically take advantage of exposures that result from common misconfigurations -- errors that could be thwarted altogether with an automated platform like the Open Raven Cloud-Native Data Protection Platform.

Here’s a look at the most common misconfigurations behind cloud data breaches, along with the steps that teams can take to mitigate these vulnerabilities while still benefiting from the flexibility, scalability and cost savings of cloud-based data storage.

Cloud storage configuration challenges

In today’s complex, sprawling cloud environments, there are a variety of ways in which organizations may inadvertently set themselves up for data breaches.

Access control complexity

By its very nature, the cloud is connected and accessible. The ability to access cloud data from anywhere and at any time is one of the core selling points of the cloud.

From a data security perspective, problems arise when anywhere, anytime access morphs into anytime, anywhere, anyone access. In other words, when you inadvertently leave an S3 bucket open to the public, for instance, or fail to apply granular access control for your cloud databases, you end up with serious data security vulnerabilities.

A common mistake is relying solely on the Block Public Access feature for AWS S3 buckets. Many think that when they enable this feature, they are automatically protected from attackers. But while it's a good step to take, it's often incomplete. An organization can have an exception to Block Public Access that inadvertently exposes its private information to the Internet.
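One way to catch such exceptions is to confirm that all four Block Public Access settings are enabled on every bucket. The sketch below assumes each bucket's settings have already been fetched into a dictionary (for instance, from the S3 GetPublicAccessBlock API); the function name and the sample bucket names are hypothetical.

```python
# The four flags that make up an S3 Block Public Access configuration.
REQUIRED_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def find_block_public_access_gaps(bucket_settings):
    """Return {bucket_name: [flags that are missing or disabled]}.

    bucket_settings maps a bucket name to its Block Public Access
    configuration, or to None when the bucket has no configuration.
    """
    gaps = {}
    for name, config in bucket_settings.items():
        config = config or {}
        missing = [flag for flag in REQUIRED_FLAGS if not config.get(flag, False)]
        if missing:
            gaps[name] = missing
    return gaps


# Hypothetical sample data for illustration only.
sample = {
    "billing-exports": {flag: True for flag in REQUIRED_FLAGS},
    "marketing-assets": {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": False,
        "RestrictPublicBuckets": False,
    },
    "legacy-uploads": None,
}

for bucket, flags in find_block_public_access_gaps(sample).items():
    print(f"{bucket}: not enforced -> {', '.join(flags)}")
```

A bucket flagged here is not necessarily public, since account-level settings may still apply, but it is an exception that deserves a review.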

Defining policies for data security and privacy is also a good step to take. But this alone is not enough; you also need to codify your policies and validate that they are not being circumvented in production with continuous monitoring. All it takes is one miscue for an attacker to get it all.

Varying data storage structures

Today’s public cloud environments allow for hundreds of ways to store data. There are object storage services, like AWS S3 and Google Cloud Storage. There are various types of cloud databases, like AWS RDS and Azure SQL. There are data lakes and data warehouses, like AWS Redshift and Google BigQuery. There are databases that can be lifted and shifted onto EC2 instances. There is non-persistent storage inside containers. And so on.

The flexibility of cloud storage models is part of what makes the cloud so appealing. But it also makes it challenging to audit and secure all of your cloud data. The data monitoring and auditing tools that work for S3 may not work for GCS, for example. Or, tooling for managing a cloud-based relational database may not jibe with NoSQL databases. Open source databases like MySQL may likewise require different data management and auditing tools than a proprietary database service like Redshift.

What this all amounts to is that disparate data storage strategies, and the varied security configurations that accompany them, make it quite challenging – though certainly not impossible – to implement a comprehensive data protection strategy.

Move fast, break data security

The cloud also lends itself to "move fast, break things" strategies wherein teams spin up new workloads, like databases or storage buckets, then forget to turn them off when they're no longer needed.

If you lose track of those resources – which is easy to do because cloud vendors themselves don't go out of their way to help you find and fix instances where you are consuming their services unnecessarily – they can become great vectors for data leaks or other data security problems.

For example, maybe your developers spin up a VM for testing, load some non-anonymized real-user data into it and then leave it running indefinitely. Or maybe the IT team performs a database backup before migrating from one cloud database service to another, but forgets to delete the backup after the migration successfully completes. In these examples, the flexibility that the cloud allows for creating new resources becomes a vector for data security problems.
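Forgotten resources like these can be surfaced with a simple inventory check. The following is a minimal sketch, assuming you already have an inventory that records each resource's purpose and when it was last used; the `Resource` fields, the 30-day threshold and the sample entries are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Resource:
    resource_id: str
    kind: str            # e.g. "vm", "db-backup", "database"
    purpose: str         # e.g. "test", "migration", "production"
    last_used: datetime


# Purposes that mark a resource as temporary (an assumption for this sketch).
TEMPORARY_PURPOSES = {"test", "migration"}


def find_stale_resources(resources, now, max_idle=timedelta(days=30)):
    """Return temporary resources that have been idle longer than max_idle."""
    return [
        r for r in resources
        if r.purpose in TEMPORARY_PURPOSES and now - r.last_used > max_idle
    ]


now = datetime(2020, 10, 28, tzinfo=timezone.utc)
inventory = [
    Resource("vm-test-17", "vm", "test", now - timedelta(days=90)),
    Resource("db-backup-premigration", "db-backup", "migration", now - timedelta(days=45)),
    Resource("orders-db", "database", "production", now - timedelta(days=400)),
    Resource("vm-test-22", "vm", "test", now - timedelta(days=2)),
]

for resource in find_stale_resources(inventory, now):
    idle_days = (now - resource.last_used).days
    print(f"{resource.resource_id} ({resource.kind}) idle for {idle_days} days")
```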

Disparate, disconnected tool sets

One of the most common configuration missteps – especially at a time when more than 90 percent of organizations are adopting multi-cloud strategies – is reliance on disparate sets of poorly integrated tools for managing cloud data and workloads.

It’s understandable why teams end up in this boat. When you use multiple clouds, the default management strategy tends to be to use the management tools that come built into each cloud platform.

The problem, however, is that this leaves you in the position of trying to juggle multiple tools. Not only does this add complexity to your operations, but it leaves you unable to use any one tool to achieve holistic visibility into where your data lives or which vulnerabilities it may be subject to. The lack of a single pane of glass for monitoring all of your data makes it easy for overlooked security issues to remain hidden. (PS: Don’t think that sticking with a single-cloud architecture avoids this issue: Even if you’re running on a single cloud provider, seeing in one place where and what your data is, who has access to it and where it can flow is still an ongoing challenge.)

Manual data protection activities

Breaches often happen when organizations are overwhelmed. Security teams are understaffed, yet they feel pressure to move fast along with the rest of the business.

However, activities like discovering sensitive data, identifying where it can flow, and preparing for security reviews often include time-consuming steps that are performed by hand and entered into spreadsheets.

Tools that can automate activities for security teams not only help to get more done with less, faster, but also help to reduce the errors and oversights that come with manual processes.

At the same time, the pace of change in the cloud is often hectic, given the complexity curve and the velocity with which new services, configurations and APIs evolve. All of this adds to the pressure to automate. Teams have no choice but to use automation to help them secure all of this change, continuously, at scale.

Mitigating cloud data misconfigurations

The key to avoiding the cloud data breaches that many companies face is to take steps to mitigate the misconfigurations that lead to those breaches. There are three key best practices toward this end.

Know the where, what and who of your data – at all times

First and foremost, you need to know in real-time where your data is stored, what data is sensitive, who has access to it, and how it can flow. After all, you can’t protect what you don’t know.

Critically, you must be able to identify this information not merely on an account-by-account basis, but across your entire organization (which can span thousands of accounts in the larger enterprises we work with). Likewise, you shouldn't have one level of visibility into data stored on AWS, while lacking the same level of insight into Google Cloud or elsewhere -- or, for that matter, into data that may exist in the SaaS platforms you use. You need an across-the-board understanding of the security posture of your data, at all times, as well as the ability to understand how data can move across different cloud native services, databases, storage buckets and other assets.
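To illustrate what across-the-board visibility can look like in practice, the sketch below normalizes inventory rows from two providers into one record type and then asks a single question of all of them. The record fields, the per-provider field names and the sample rows are hypothetical; real inventories would come from each provider's own APIs or exports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStore:
    provider: str
    account: str
    name: str
    sensitive: bool
    public: bool


def from_aws(record):
    # Hypothetical field names for an exported AWS inventory row.
    return DataStore("aws", record["account_id"], record["bucket"],
                     record["has_pii"], record["is_public"])


def from_gcp(record):
    # Hypothetical field names for an exported Google Cloud inventory row.
    return DataStore("gcp", record["project"], record["bucket_name"],
                     record["sensitive"], record["all_users_access"])


def exposed_sensitive_stores(stores):
    """Return every store, on any provider, that is both sensitive and public."""
    return [s for s in stores if s.sensitive and s.public]


aws_rows = [
    {"account_id": "111111111111", "bucket": "hr-exports", "has_pii": True, "is_public": True},
    {"account_id": "222222222222", "bucket": "static-site", "has_pii": False, "is_public": True},
]
gcp_rows = [
    {"project": "analytics-prod", "bucket_name": "raw-events",
     "sensitive": True, "all_users_access": False},
]

inventory = [from_aws(r) for r in aws_rows] + [from_gcp(r) for r in gcp_rows]
for store in exposed_sensitive_stores(inventory):
    print(f"{store.provider}/{store.account}/{store.name} is sensitive and public")
```

Once every store is described by the same fields, one query covers all accounts and providers instead of one query per tool.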

Manage data security policies as code, full cycle

Holistic visibility protects data best when it is paired with the ability to define data security policies as code, then monitor and enforce them automatically.

When you can write code to define rules such as how data should be structured, who should have access and how long data may be retained, and when you can apply those rules across your entire environment (even if it includes multiple clouds or a hybrid cloud), you eliminate the need to juggle multiple cloud-specific data security assessment primitives. You also, of course, avoid having to try to identify policy violations manually.

It's worth noting, too, that the data security policies you write can’t be too generic. A simple policy like "data should not be exposed to the public network" doesn't work in a cloud where some data needs to reside within a particular geographic boundary, but other workloads require public exposure -- or where some workloads (like an application release candidate) initially shouldn't be public-facing, but later will be. Instead, you need rich, context-aware policies that can distinguish between different types of data and different purposes, and that reflect different levels of data sensitivity.
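As a rough illustration of a context-aware policy, the sketch below evaluates each asset against its own sensitivity level and residency constraint rather than applying one blanket rule. The `Asset` fields, the sensitivity labels and the sample assets are assumptions made for this example, not the schema of any particular product.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    sensitivity: str              # "public", "internal", or "restricted"
    region: str
    publicly_exposed: bool
    allowed_regions: tuple = ()   # empty tuple means no residency constraint


def evaluate(asset):
    """Return a list of policy violations for one asset (empty if compliant)."""
    violations = []
    # Public exposure is acceptable only for data classified as public.
    if asset.publicly_exposed and asset.sensitivity != "public":
        violations.append(f"{asset.sensitivity} data is publicly exposed")
    # Residency applies only to assets that declare allowed regions.
    if asset.allowed_regions and asset.region not in asset.allowed_regions:
        violations.append(f"stored in {asset.region}, outside {asset.allowed_regions}")
    return violations


assets = [
    Asset("product-images", "public", "us-east-1", True),
    Asset("release-candidate", "internal", "us-east-1", True),
    Asset("eu-customer-records", "restricted", "us-west-2", False,
          ("eu-west-1", "eu-central-1")),
]

for asset in assets:
    for violation in evaluate(asset):
        print(f"{asset.name}: {violation}")
```

When the release candidate ships, reclassifying it as "public" makes the same exposure compliant without changing the policy code.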

Best practices dictate a just-enough-access (JEA) approach, often described as the principle of least privilege: users who have access to your cloud data should have access only to the data that they specifically need, rather than to everything. This helps to reduce the blast radius if a breach does occur.
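One simple way to spot access that exceeds need is to compare what each identity has been granted with what it has actually used. This is a minimal sketch under the assumption that both sets are available, for example from access policies and access logs; the permission strings and identities are hypothetical.

```python
def unused_permissions(granted, used):
    """Map each identity to the permissions it holds but has not exercised."""
    report = {}
    for identity, permissions in granted.items():
        extra = permissions - used.get(identity, set())
        if extra:
            report[identity] = sorted(extra)
    return report


# Hypothetical grants and observed usage.
granted = {
    "analyst": {"read:sales", "read:hr", "write:sales"},
    "etl-job": {"read:raw", "write:warehouse"},
}
used = {
    "analyst": {"read:sales"},
    "etl-job": {"read:raw", "write:warehouse"},
}

for identity, extra in unused_permissions(granted, used).items():
    print(f"{identity} could lose: {', '.join(extra)}")
```

Unused permissions are candidates for removal, not automatic removals; some access is legitimately needed only on rare occasions.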

Continuously monitor for risks as they emerge

When you use code-based policies to audit data, it's not enough to run reports periodically. Even an hourly audit won't necessarily suffice to alert you to a major data leak, like a new S3 bucket that contains sensitive data and was inadvertently set up to allow public access. You need to be able to monitor continuously and receive high-risk alerts immediately.

Equally important is the ability to build remediation checks into your monitoring process. Real-time alerting on data security violations is of little use if your team doesn't respond to alerts quickly enough and fails to close the identified security gap. When sensitive data is at risk, your monitoring tools must be able to validate that your team mitigates the issue in as timely a fashion as the nature of the threat requires.
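A remediation check can be as simple as comparing each open alert against a deadline tied to its severity. In the sketch below, the severity levels, the deadlines and the alert fields are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical remediation deadlines per severity.
REMEDIATION_SLA = {
    "high": timedelta(hours=1),
    "medium": timedelta(hours=24),
    "low": timedelta(days=7),
}


def overdue_alerts(alerts, now):
    """Return alerts that are still open past their remediation deadline."""
    overdue = []
    for alert in alerts:
        deadline = alert["opened_at"] + REMEDIATION_SLA[alert["severity"]]
        if alert.get("resolved_at") is None and now > deadline:
            overdue.append(alert)
    return overdue


now = datetime(2020, 10, 28, 12, 0)
alerts = [
    {"id": "public-bucket", "severity": "high",
     "opened_at": datetime(2020, 10, 28, 9, 0), "resolved_at": None},
    {"id": "stale-access-key", "severity": "medium",
     "opened_at": datetime(2020, 10, 28, 8, 0), "resolved_at": None},
    {"id": "open-database-port", "severity": "high",
     "opened_at": datetime(2020, 10, 28, 7, 0),
     "resolved_at": datetime(2020, 10, 28, 7, 30)},
]

for alert in overdue_alerts(alerts, now):
    print(f"escalate {alert['id']}: open since {alert['opened_at']}")
```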

Your continuous monitoring should be extended to verify that you are backing up your data, too. In the new era of ransomware, you don’t want to find out your backup system failed only after you’ve been attacked.
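Backup verification fits the same monitoring loop. The sketch below flags any data store whose most recent backup is missing, failed or older than a chosen threshold; the 24-hour threshold, the record fields and the store names are assumptions for the example.

```python
from datetime import datetime, timedelta


def backups_needing_attention(backups, now, max_age=timedelta(hours=24)):
    """Return names of data stores whose latest backup is missing, failed or too old."""
    flagged = []
    for name, last in backups.items():
        if last is None or not last["succeeded"] or now - last["finished_at"] > max_age:
            flagged.append(name)
    return flagged


now = datetime(2020, 10, 28, 12, 0)
# Hypothetical latest-backup records; None means no backup was ever recorded.
backups = {
    "orders-db": {"succeeded": True, "finished_at": now - timedelta(hours=3)},
    "users-db": {"succeeded": False, "finished_at": now - timedelta(hours=2)},
    "audit-logs": {"succeeded": True, "finished_at": now - timedelta(days=3)},
    "scratch-db": None,
}

for name in backups_needing_attention(backups, now):
    print(f"backup check failed for {name}")
```

A check like this only confirms that backups ran; confirming that they can be restored requires a separate restore test.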

In short, adopt a mentality that misconfigurations will happen. Deploy a data monitoring platform so that you are alerted to potential misconfigurations on a continuous basis. Identify data that is sensitive, personal or regulated and focus your alerting on that critical data, so as to minimize the alert noise.

Conclusion: Employ a data-centric cloud security approach

Small, simple mistakes can have major consequences. Look at all your cloud assets and data from the perspective of an outside attacker looking in. All it takes is one misconfiguration -- sometimes a simple one -- to expose a large volume of sensitive data and put your organization at risk.

To put all of the above another way, you might say that the key to managing cloud data security risks is to take a data-centric approach. Data is what attackers are ultimately after. And in the world of the cloud, data requires its own specialized protection platform.

In a data-centric approach, you use a single integrated platform and policies to discover, classify, monitor and protect all of your data, regardless of what the data is or where it's stored in the cloud. A data-centric security model is the basis of Open Raven’s cloud-native data protection platform.

Reduce these causes of misconfigurations and you will reduce your likelihood of a cloud data breach. Automate cloud data protection and get proactive, continuous monitoring to protect your data against breaches.

Request a demo of Open Raven today.
