Automate Data Classification to Combat Secret Sprawl

Chief Corvus Officer
July 27, 2021

Automate data classification to restore visibility and control of sensitive data

Major breaches often share a common thread: exposure of sensitive data. In particular, we’ve seen headline incidents, such as SolarWinds, in which exposed (and then stolen) developer secrets allowed attackers to harm both the organization and its customers. In another case, Eclipse had developer secrets exposed for 350 days, which required a costly investigation into over 100,000 files. These two organizations are far from alone.

Over the last few years, the exposure of developer secrets on GitHub tells the story of a growing problem rather than one of unique circumstances. Despite various code scanning tools, secret managers, and best practices, secrets get out. What else can be done to minimize such risks and to streamline incident response? First, the risk must be understood and prioritized at the top of the organization.

Why prioritize securing developer secrets?

Take it from SolarWinds CEO Kevin Thompson, who, when interviewed about the company’s now infamous breach, said “code can be an open door to your enterprise.” Developer secrets can provide access to entire code repositories, networks, and systems hosting additional sensitive information. While these secrets may not be a large percentage of overall sensitive data by volume, the risk they pose is among the most severe. GitGuardian’s State of Secrets Sprawl 2021 report found that in 2020 over 5,000 secrets per day were found on public GitHub repositories, a 20% increase compared to 2019. Of those, 85% involved developers’ personal accounts and 15% involved organizationally managed accounts. The data continues to point to a human problem, one common across industries.

"Human error is nothing you can avoid and prevent...add many layers to prevent it in your whole lifecycle." – David Dos Neves, Munich Re

Why are secrets so hard to secure?

Common practices for securing secrets include telling your team not to store secrets in code and using code scanning tools, secret managers, and code reviews. Despite all these efforts, the bad guys seem to find these nuggets of gold with ease. So why can’t we find and secure them just as easily?

It just takes one mistake. Frankly, it’s an unfair fight. Defenders must play perfect defense, whereas we never hear about the 99% of attempts by bad actors that fail. It just takes one lost secret to cause mayhem.

Secrets sprawl in the cloud. As with any major change, the advantages of moving to the cloud (faster growth and development) came with new problems requiring renewed approaches to security. It’s a daunting task to manage and locate secrets sprawled across the cloud, in various accounts, services, logs, snapshots, etc.

Humans aren’t perfect. Essentially, there’s one common factor: human error. An issue that cannot be prevented requires a solid and timely remediation strategy. So how do we keep issues from becoming problems?

Know your secrets, keep them safe

Fortunately, this is exactly why the Open Raven platform exists: to restore visibility and control of sensitive data to security and cloud teams. An automated data classification system that finds your secrets and compares where they live against the relevant infrastructure security controls can provide the very backstop needed to catch mistakes.
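
To make "automated classification" a little more concrete, here is a minimal sketch of pattern-based secret detection in Python. It is an illustration under stated assumptions, not a description of how the Open Raven platform works: the two patterns, the label names, and the `classify_text` helper are all hypothetical, and a production scanner would need many more rules plus validation to keep false positives down.

```python
import re

# Illustrative patterns only (assumed for this sketch); real scanners use far
# more rules and typically validate matches before reporting them.
PATTERNS = {
    # AWS access key IDs: a 4-letter prefix followed by 16 uppercase alphanumerics.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # PEM-encoded private key header.
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def classify_text(text):
    """Return (line_number, label) for every line that matches a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


if __name__ == "__main__":
    sample = "key = AKIAIOSFODNN7EXAMPLE\nnothing to see here"
    for lineno, label in classify_text(sample):
        print(f"line {lineno}: {label}")
```

Run against every object, log, and snapshot on a schedule, even a simple classifier like this starts to show where secrets have drifted.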

Automate your ‘secrets inventory’. You can’t protect what you don’t know. A data asset inventory (catalog) that’s automatically updated with each scan turns any question about secrets into a quick query rather than a lengthy investigation. Save time providing details for security audits, reviewing the status of your security posture, identifying where new regulatory controls must be implemented, or finding peace of mind after seeing yet another ‘secrets incident’ make headlines.
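
As a rough illustration of "a quick query rather than a lengthy investigation," the sketch below stores scan findings in an in-memory SQLite table and answers one question: which resources hold a given class of secret? The table layout, column names, and helper functions are assumptions made up for this example; they do not reflect any real product schema.

```python
import sqlite3


def build_inventory(findings):
    """Load (account, resource, data_class, last_scanned) rows into SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE findings ("
        "account TEXT, resource TEXT, data_class TEXT, last_scanned TEXT)"
    )
    conn.executemany("INSERT INTO findings VALUES (?, ?, ?, ?)", findings)
    conn.commit()
    return conn


def resources_with(conn, data_class):
    """Return (account, resource) pairs that contain the given data class."""
    rows = conn.execute(
        "SELECT account, resource FROM findings "
        "WHERE data_class = ? ORDER BY account, resource",
        (data_class,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    # Hypothetical scan results, for illustration only.
    inventory = build_inventory([
        ("prod", "s3://app-logs", "aws_access_key_id", "2021-07-01"),
        ("dev", "s3://scratch", "private_key_header", "2021-07-02"),
    ])
    print(resources_with(inventory, "aws_access_key_id"))
```

If each scan refreshes the table, an audit question becomes a one-line lookup.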

Prevent issues before they become problems. The great thing about automated data classification is that we can then monitor for mismatches between your sensitive data and the relevant security controls, so issues can be quickly identified and remediated. Data is the new endpoint. Identifying exposed secrets before the bad guys do is a big step in the right (secure) direction.
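
The mismatch idea can be sketched in a few lines of Python. This is a simplified example under assumptions of our own: the `Asset` fields, the set of sensitive labels, and the two checks (public readability and encryption at rest) are hypothetical stand-ins for whatever controls matter in your environment.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    data_classes: set        # labels produced by classification (assumed)
    publicly_readable: bool  # control state reported by the cloud provider
    encrypted_at_rest: bool


# Labels treated as sensitive in this sketch (hypothetical names).
SENSITIVE = {"aws_access_key_id", "private_key_header"}


def find_mismatches(assets):
    """Flag assets whose controls do not fit the sensitive data they hold."""
    issues = []
    for asset in assets:
        if not (asset.data_classes & SENSITIVE):
            continue  # nothing sensitive found, so no control is required here
        if asset.publicly_readable:
            issues.append((asset.name, "sensitive data is publicly readable"))
        if not asset.encrypted_at_rest:
            issues.append((asset.name, "sensitive data is not encrypted at rest"))
    return issues


if __name__ == "__main__":
    assets = [
        Asset("logs-bucket", {"aws_access_key_id"}, True, True),
        Asset("public-site", set(), True, False),
    ]
    for name, problem in find_mismatches(assets):
        print(f"{name}: {problem}")
```

Note that the public website bucket is not flagged: an open bucket is only a problem when classification says something sensitive lives in it.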

Know when secrets have been breached. In the event of a breach, it’s critical for incident response teams to know what data was involved so they can ensure an appropriate response. The blast radius of a ransomware attack that has stolen developer secrets is far greater than one that has not. Accelerate incident response and meet compliance obligations by giving your team rich data context for faster assessment and prioritization.

Sounds easy, but is it?

Work is work, but actually, yes, it is. It takes just a few minutes to connect your AWS account(s) to Open Raven and begin mapping and scanning your environment. The platform was born in the cloud, for the cloud: no sidecar deployments or agents. With a read-only account, we use serverless functions and APIs to take your cloud data protection to the next level while keeping costs predictable. All of this adds up to a small amount of work to harden the security of your data and save time doing it.

For a more in-depth discussion about securing developer secrets, register for our webinar. Amanda Walker (VP of Engineering at Nuna, and Google DLP before that) and our very own Mike Andrews (Head of Engineering, previously at Microsoft) discuss the challenges in securing developer secrets in the cloud and how we’ve approached building a platform you can trust, use and afford.
