Open Raven Platform Release: Data Catalog V2, Faster Data Scanning, Data Class Enhancements, And More Rules
It's officially summer, and our product and research teams turned up the heat in June. We published several releases full of new functionality and enhancements designed to help security teams pinpoint data security and compliance risk, apply data security guardrails, prevent incidents and streamline response.
The star of the show is even faster scanning with no size constraints, delivering results at scale. The supporting cast includes the next revision of our Data Catalog, with filters that make it easy to track scan progress. We also added more data security policies and rules and improved a number of data classes.
Data Catalog V2
We’ve updated the data catalog experience to make it easier to filter down into the data classification you need the most, whether it’s all the developer secrets in a region or German personal data seen anywhere. To achieve this, the Data Catalog now uses the same filtering system seen elsewhere in the platform that includes the ability to filter on accounts, regions, asset names, and more to view data findings.
The filters also let you find assets by specific scan job, making it even easier to determine which buckets have been, or still need to be, analyzed in your environment.
Faster Data Scanning - No size restrictions
This release marks serious changes in our core data scanning capabilities, starting with a dramatic speed improvement for file enumeration. Before files of any type can be opened for data classification, they first have to be listed or inventoried. When dealing with hundreds of millions or even billions of files, this enumeration phase alone can take many hours (or even days) to complete. To accelerate file listing and ultimately complete scans faster, we built a highly efficient file listing service that completes enumeration orders of magnitude faster than before. This means shorter scans and quicker data classification results for all scans, but especially for environments with vast numbers of objects.
You can now create scans for buckets of any size, making it easier to analyze large environments. Scans can still be bound by budget (the maximum spend on the serverless functions that power data classification), and a time limit applies: by default, a scan runs for a maximum of 10 days. If a scan is stopped after hitting the 10-day limit, it can be restarted, and it will only analyze objects that are new or changed. This allows even the largest environments to be thoroughly analyzed in a methodical fashion, with results available throughout the process in the Data Catalog and elsewhere.
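To illustrate the restart behavior, here is a minimal sketch of how "new or changed" objects might be selected. It is an illustration only, not the platform's implementation: the `ObjectVersion` type, the `objects_to_scan` function, and the use of an ETag-style fingerprint to detect changes are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectVersion:
    """Hypothetical record for one listed object."""
    key: str
    etag: str  # assumed content fingerprint reported by the storage listing


def objects_to_scan(listing, previously_scanned):
    """Return objects that are new, or whose fingerprint differs from the last scan.

    `previously_scanned` maps object key -> fingerprint recorded by the earlier run.
    """
    return [obj for obj in listing if previously_scanned.get(obj.key) != obj.etag]


# State recorded by the scan that hit its time limit (hypothetical data).
previous = {"a.csv": "v1", "b.csv": "v1"}

# Listing at restart: a.csv unchanged, b.csv modified, c.csv newly added.
current = [
    ObjectVersion("a.csv", "v1"),
    ObjectVersion("b.csv", "v2"),
    ObjectVersion("c.csv", "v1"),
]

pending = objects_to_scan(current, previous)
print([obj.key for obj in pending])  # ['b.csv', 'c.csv']
```

Under these assumptions, a restarted scan skips `a.csv` and spends its budget only on the modified and newly added objects.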
New Data Classes & Rules
In June, we updated several existing data classes based on customer feedback and telemetry which we use to continuously evaluate our classification efficacy:
- Credit card regex patterns have been updated with improved conformance to the credit card number format.
- VIN (Vehicle Identification Number) data classes have been revamped to offer better awareness of and conformance to the VIN standard (ISO 3779).
- TIN (Taxpayer Identification Number) data classes have been updated with entropy-based validation functions that help reduce false positives by requiring a minimum level of randomness in matches.
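As a rough illustration of entropy-based validation, the sketch below computes the Shannon entropy of a candidate match and rejects low-randomness strings such as repeated digits. The function names and the 2.0 bits-per-character threshold are assumptions chosen for this example; they are not the validation functions or thresholds the platform uses.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of `text`, in bits per character."""
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def passes_entropy_check(candidate: str, min_bits_per_char: float = 2.0) -> bool:
    """Accept a regex match only if its characters are sufficiently varied.

    The threshold is a hypothetical value, not a documented setting.
    """
    significant = "".join(c for c in candidate if c.isalnum())
    if not significant:
        return False
    return shannon_entropy(significant) >= min_bits_per_char


print(passes_entropy_check("111-11-1111"))  # False: a single repeated digit
print(passes_entropy_check("482-91-5736"))  # True: nine distinct digits
```

A pattern match like `111-11-1111` has the right shape but zero entropy, which is the kind of false positive a minimum-randomness check is meant to filter out.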
The team also published 13 new and upgraded data classes used to identify:
- Vehicle Identification Numbers (VINs) for Asia
- Manufacturer-specific Vehicle Identification Numbers (mVINs) for Mazda, Ford and Toyota
- Country-specific Tax Identification Numbers (TINs), primarily for the EU region
- Country-specific National Identification Numbers (NINs)
- Credit card numbers
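For context on what conformance to a number format can involve, payment card numbers carry a check digit computed with the Luhn algorithm, so a candidate that matches a regex can be rejected if the checksum fails. The sketch below shows that check in isolation; it is not Open Raven's data class logic, and the `luhn_valid` name and the 12 to 19 digit length bounds are assumptions for this example.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` satisfy the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    # Assumed length bounds for this sketch; separators such as spaces are ignored.
    if not 12 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True: a widely used test number
print(luhn_valid("4111 1111 1111 1112"))  # False: check digit is wrong
```

Combining a pattern match with a checksum like this is one common way to reduce false positives on random 16-digit strings.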
We also published 21 new rules, all of which fall under the category of configuration best practices for key AWS services. These rules check the security configuration of AWS Secrets Manager, Lambda, and EC2, as well as EBS and CloudFront assets that are persisted during asset discovery in the Open Raven platform.
Newly released policies focus on best practices for PCI DSS and CIS Benchmark controls, along with a set of data services configuration policies that cover RDS, Redshift, S3 and more.
Bugs & Enhancements
- The scan job completion rate now properly accounts for child files contained within archives
- Users will receive emails when a scan times out, in addition to completion events
- Users can now select multiple items when filtering using the contains statement