Release Notes

Open Raven Platform Release: Streamlined AWS S3 Data Classification

Bele
Chief Corvus Officer
March 12, 2021

We are pleased to push a release this week that includes a slick, streamlined AWS S3 scanning experience and a load of other improvements across the platform:

  • Streamlined AWS S3 data classification scan experience, which expedites the setup of sensitive data scanning in S3 buckets
  • Support for one-time scans
  • Extended Apache Parquet file scanning capabilities and improved handling of very large files
  • Filtering by supported MIME types when building a data inventory
  • Updates to data classes: ODBC/JDBC database connector strings and Facebook API tokens

Streamlined AWS S3 Data Classification Scanning

Our updated approach to data classification scan jobs provides an efficient way to configure discovery of sensitive data in your S3 buckets. This new interface, combined with our expansive predefined set of data classes, gives you the fastest way to start a scan and identify problems with minimal effort.

[Screenshot: the Data Scan Jobs page, with a table showing name, description, schedule, restrictions, data to find, and status.]

Navigate to Scans → Data Scan Jobs to find a new experience for creating data scan jobs.

[Screenshot: the Create new scan job panel, with job details to configure on the left and a table of S3 buckets to select on the right.]

Upon creating a new data scan job, you’ll be presented with a straightforward, single screen for setting up a scan. You can now specify the following, summarized in a brief sketch after the list:

  • Scan schedule. Set to run once or repeat as often as every hour.
  • Data to find by data collection. Use any one of our pre-configured data collections, or create your own.
  • File types or names to ignore. The job scans for all of the supported file formats by default, but you can narrow this down however you would like.
  • Sample size. The job scans all files by default, but you can scan a subset if you prefer (e.g., for faster completion).
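
The sketch below simply restates those four settings as plain data so they are easy to take in at a glance. It is purely illustrative: the field names and values are hypothetical and do not reflect Open Raven's actual job format.

```python
# Hypothetical illustration of the four scan-job settings described above.
# Field names are invented for clarity; they are not Open Raven's actual schema.
scan_job = {
    "name": "pii-scan-weekly",
    "schedule": {"repeat": "weekly"},       # or run once, or as often as hourly
    "data_collections": ["Personal Data"],  # pre-configured or custom collections
    "ignore": {
        "file_types": [".zip"],             # skip selected formats
        "file_names": ["*.bak"],            # or specific name patterns
    },
    "sample_size": "all",                   # scan everything, or a subset for speed
}
```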

Finally, select the S3 buckets you wish to analyze from the complete list already found by our discovery engine. You can filter the list by AWS account ID, AWS region, or a few familiar S3 bucket security configurations (the sketch after this list shows the underlying AWS settings):

  • Public accessibility
  • Encryption status
  • Backup status
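
If you are curious how those filters map onto AWS itself, the sketch below uses boto3 to check two of them, public accessibility and encryption status, for a single bucket. It illustrates the underlying AWS settings only; it is not how Open Raven's discovery engine works.

```python
# Illustrative only: inspect the S3 security settings the filters above refer to,
# using the AWS SDK (boto3). This is not Open Raven code.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def bucket_security_summary(bucket: str) -> dict:
    summary = {"bucket": bucket}

    # Public accessibility: is all public access blocked at the bucket level?
    try:
        pab = s3.get_public_access_block(Bucket=bucket)["PublicAccessBlockConfiguration"]
        summary["public_access_blocked"] = all(pab.values())
    except ClientError:
        summary["public_access_blocked"] = False  # no public access block configured

    # Encryption status: is default (server-side) encryption configured?
    try:
        s3.get_bucket_encryption(Bucket=bucket)
        summary["default_encryption"] = True
    except ClientError:
        summary["default_encryption"] = False

    return summary

print(bucket_security_summary("my-example-bucket"))
```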

You will also notice two bars at the top of the page that measure the volume of S3 data selected for scanning, so you can make sure the job is properly sized for successful completion.

Additional Improvements

Also included in this release:

  • Extended support for scanning Apache Parquet files and improved handling of very large files
  • Filtering by supported MIME types when building a data inventory
  • Updates to data classes: ODBC/JDBC database connector strings and Facebook API tokens

We have continued our progress in supporting the scanning of even larger Apache Parquet files, and very large files in general. Some of this work requires sophisticated techniques such as breaking files into chunks (“file chunking”). Read the multi-part blog post here.
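
To give a feel for the general idea (not our exact implementation), the sketch below uses pyarrow to walk a large Parquet file in batches instead of loading it into memory all at once:

```python
# Sketch of processing a large Parquet file in chunks with pyarrow.
# This illustrates the general "file chunking" idea, not Open Raven's scanner.
import pyarrow.parquet as pq

def scan_parquet_in_chunks(path: str, batch_size: int = 50_000):
    pf = pq.ParquetFile(path)
    for batch in pf.iter_batches(batch_size=batch_size):
        # Each batch is a pyarrow.RecordBatch holding at most `batch_size` rows;
        # classify its contents here instead of materializing the whole file.
        yield batch.num_rows

total_rows = sum(scan_parquet_in_chunks("large_dataset.parquet"))
print(f"Scanned {total_rows} rows without reading the file in one piece")
```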

We understand that not all files in your S3 buckets will have a helpful file suffix (e.g., .log, .parquet, or .txt); in many cases, objects have no file extension at all. No problem. We have improved the scanning engine’s ability to determine file type with extended MIME type analysis, while still using file extensions when they are available.
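
As a rough illustration of extension-free type detection (again, not our engine's internals), a content-sniffing library such as python-magic can infer a MIME type from the first bytes of an object and fall back to the extension only when needed:

```python
# Illustrative MIME-type sniffing from file content, with the extension as a
# fallback. Not Open Raven's implementation; python-magic wraps libmagic.
import mimetypes
import magic  # pip install python-magic

def detect_mime(path: str) -> str:
    with open(path, "rb") as f:
        head = f.read(8192)  # the first few KB are enough for most formats
    sniffed = magic.from_buffer(head, mime=True)
    if sniffed and sniffed != "application/octet-stream":
        return sniffed
    # Fall back to the file extension when content sniffing is inconclusive.
    guessed, _ = mimetypes.guess_type(path)
    return guessed or "application/octet-stream"

print(detect_mime("events_2021-03-12"))  # e.g. "text/plain" even with no extension
```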

Finally, and just as importantly, we made a few incremental updates to two key developer credential data classes: the JDBC/ODBC database connector string and the Facebook API token. The database connector class now captures a larger set of connection strings, including those used for Redshift, MySQL, PostgreSQL, SQL Server, and many more. Our team is committed to the accuracy of these data classes, and the changes here improve it further.
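
As a simplified illustration of what the database connector data class looks for (not the exact rule we ship), a pattern along these lines matches common JDBC connection strings:

```python
# Simplified, illustrative pattern for JDBC connection strings; the shipped data
# class is more thorough. Shown only to clarify what "database connector" means.
import re

JDBC_PATTERN = re.compile(
    r"jdbc:(redshift|mysql|postgresql|sqlserver|oracle):[^\s\"']+",
    re.IGNORECASE,
)

samples = [
    "jdbc:redshift://examplecluster.abc123.us-west-2.redshift.amazonaws.com:5439/dev",
    "jdbc:postgresql://db.internal:5432/customers?user=admin&password=hunter2",
    "no connector string in this line",
]

for line in samples:
    match = JDBC_PATTERN.search(line)
    print(match.group(0) if match else "no match")
```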
