Why Customization is Critical for Previewing Sensitive Data Findings

Jeff Binder
Security Researcher
December 20, 2022

What is a Sensitive Data Preview?

One of Open Raven’s goals is to help our users understand their data: where sensitive information is stored and who can access it. The scale of many cloud accounts makes it impossible to look through everything manually, so automation is essential. But sometimes you also have to get into the details. Our Sensitive Data Preview feature lets security teams view a narrow portion of each data finding in line with the results listed in the data catalog, so that they can quickly triage the finding and decide on a course of action.

Open Raven – Built for Security while Respecting Data Privacy

Our scanner handles data previews in a way that is easy to use, thorough, and respectful of privacy. When it records a finding, it creates a partial preview snippet that omits enough of the matching string to maintain privacy. For each data class, users can customize the scope and logic that determine which snippet the preview returns.

The data snippet, not the full finding, is stored in a database table that resides in the customer’s cloud account. The contents of this table are never transmitted to Open Raven’s environment, which provides a second layer of security around data findings.

This method makes the preview snippets easily accessible to authorized users, without requiring additional steps. The previews are available for all data types, for objects of any size, and for objects that have changed since they were scanned. They can be viewed immediately, with no wait for the software to retrieve data from the S3 buckets a second time.

Our Sensitive Data Preview feature makes it possible to get a clear sense of what the scanner is finding without exposing sensitive information. Users can customize the data preview generation method for each data class by choosing among three options. This is useful for organizations with specialized types of data that raise unusual privacy concerns. For example, our data preview generation method for the US Social Security Number data class shows only the first three digits, not the characters that identify an individual. The remaining characters are removed before the finding is recorded, so the scanner never creates a persistent copy of the full finding.
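To make the idea concrete, here is a minimal Python sketch of per-data-class preview generation. It is an illustration only, not Open Raven's implementation: the function names, the `us_ssn` key, the regular expression, and the two preview methods shown are all assumptions made for this example, and the three options the product actually offers are not described here. The sketch also masks the hidden characters with `*` so the shape of the match stays visible, whereas the scanner described above removes them.

```python
import re
from typing import Callable, Dict, List

# Hypothetical pattern for a US SSN written as 123-45-6789 or 123456789.
SSN_PATTERN = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")


def preview_keep_prefix(match_text: str, keep: int = 3, mask: str = "*") -> str:
    """Keep the first `keep` alphanumeric characters and mask the rest."""
    out = []
    kept = 0
    for ch in match_text:
        if not ch.isalnum():
            out.append(ch)  # separators such as '-' are left in place
        elif kept < keep:
            out.append(ch)
            kept += 1
        else:
            out.append(mask)
    return "".join(out)


def preview_full_mask(match_text: str, mask: str = "*") -> str:
    """Mask every character; only the length of the match is revealed."""
    return mask * len(match_text)


# Hypothetical registry: each data class maps to its own preview method.
PREVIEW_METHODS: Dict[str, Callable[[str], str]] = {
    "us_ssn": preview_keep_prefix,
    "default": preview_full_mask,
}


def make_preview(data_class: str, match_text: str) -> str:
    """Build the snippet to store; the full match is never returned."""
    method = PREVIEW_METHODS.get(data_class, PREVIEW_METHODS["default"])
    return method(match_text)


def scan_text(text: str) -> List[str]:
    """Return only preview snippets for SSN-like matches found in `text`."""
    return [make_preview("us_ssn", m.group(0)) for m in SSN_PATTERN.finditer(text)]
```

In this sketch, `scan_text` hands the full match to the preview function and keeps only its result, which mirrors the point above: what gets recorded is the snippet, not the finding itself. Swapping the entry for a data class in `PREVIEW_METHODS` changes how much of a match is revealed for that class alone.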

How AWS Macie Falls Short

Open Raven’s use of partial snippets sets us apart from the way AWS Macie handles data previews. As detailed in Macie’s documentation, Macie shows up to 128 characters of the finding in full. This text is encrypted, but it is still transmitted over the internet and displayed in the user interface. Retrieving findings this way increases the attack surface by creating one more avenue through which sensitive data could leak.

In addition to scanning S3 buckets, Open Raven has native support for scanning RDS databases. Macie can only scan RDS data by dumping it into S3 buckets and then scanning them. This creates an additional copy of potentially sensitive data and limits the scan to the state of the database at the time of the dump. Open Raven, in contrast, can directly scan RDS instances in their current state, without creating a copy of the data.

Macie’s approach also requires a multi-step process, in which the software accesses the S3 objects a second time in order to retrieve the text. Because of this, Macie cannot retrieve findings if an object changes after it is scanned. In addition, Macie’s preview feature works only for objects of up to 10 MB and only for certain data types.

Our work on data previews is only one of the ways Open Raven helps our customers understand their data without increasing the attack surface. All aspects of our product follow a central principle: sensitive data never leaves our customers’ accounts.
