We are pleased to push a release this week which includes a slick, streamlined AWS S3 scanning experience and a load of other improvements across the platform:
Our updated approach to data classification scan jobs provides an efficient way to configure discovery of sensitive data in your S3 buckets. This new interface, combined with our expansive predefined set of data classes, gives you the fastest way to start a scan and identify problems with minimal effort.
Navigate to Scans → Data Scan Jobs to find a new experience for creating data scan jobs.
Upon creating a new data scan job, you’ll be presented with a straightforward, single screen for setting up a scan. You can now specify:
Finally, select the S3 buckets you wish to analyze from the complete list already found by our discovery engine. You can filter the list by AWS account ID, AWS region, or by a few familiar S3 bucket security configurations:
You will also notice two bars at the top of the page. They measure the volume of S3 data selected for the scan, so you can confirm the job is properly sized to complete successfully.
Also included in this release:
We have continued our progress in supporting scanning of even larger Apache Parquet files, and very large files in general. Some of this work requires sophisticated techniques such as breaking files into chunks (“file chunking”). Our co-founder and Chief Product Officer Mark Curphey writes extensively about this in a multi-part blog post here.
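To give a feel for the general idea, here is a minimal Python sketch of byte-level chunking. It is not our scanning engine's implementation: the function name `iter_chunks`, the default sizes, and the overlap strategy are all assumptions made for illustration, and real Parquet handling would need to respect the format's internal layout rather than cut at arbitrary byte offsets. The sketch reads a stream in fixed-size pieces and repeats a few trailing bytes at the start of the next piece, so a value that straddles a boundary is still seen whole.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def iter_chunks(stream: BinaryIO,
                chunk_size: int = 8 * 1024 * 1024,
                overlap: int = 0) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, data) pairs covering the stream.

    Hypothetical helper: each chunk holds at most `chunk_size` bytes, and
    the last `overlap` bytes of one chunk are repeated at the start of the
    next, so a match spanning a boundary is not lost.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    offset = 0   # position in the stream of the first byte of the next chunk
    tail = b""   # bytes carried over from the previous chunk
    while True:
        block = stream.read(chunk_size - len(tail))
        if not block:
            break
        data = tail + block
        yield offset, data
        tail = data[-overlap:] if overlap else b""
        offset += len(data) - len(tail)


# Example: a 10-byte "file" in 4-byte chunks with a 1-byte overlap.
chunks = list(iter_chunks(io.BytesIO(b"abcdefghij"), chunk_size=4, overlap=1))
```

Because every chunk carries its starting offset, a finding inside a chunk can be mapped back to its position in the original file. In practice the overlap would be sized to the longest pattern you expect to match.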
We understand that not all files in your S3 buckets will have file suffixes (e.g. .log, .parquet, or .txt), and in many cases they have no file extension at all. No problem. We have improved the scanning engine’s ability to determine file type with extended MIME type analysis, while still using file extensions when desired.
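As a rough illustration of content-first type detection, the Python sketch below checks a file's leading bytes against a few well-known signatures, optionally falls back to the extension, and finally applies a simple text heuristic. This is an assumption-laden sketch, not the engine's actual logic: `detect_type`, the `use_extension` switch, the short signature table, and the MIME labels chosen are all hypothetical.

```python
import mimetypes

# A handful of well-known leading-byte signatures (illustrative, not exhaustive).
_SIGNATURES = [
    (b"PAR1", "application/vnd.apache.parquet"),
    (b"\x1f\x8b", "application/gzip"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
]


def detect_type(key: str, head: bytes, use_extension: bool = True) -> str:
    """Guess a MIME type from the first bytes of an object, then its key.

    `key` is the object name; `head` is the first few bytes of its content.
    """
    # 1. Content wins: a matching signature is trusted over the name.
    for magic, mime in _SIGNATURES:
        if head.startswith(magic):
            return mime

    # 2. Optionally consult the extension via the standard library.
    if use_extension:
        guessed, _encoding = mimetypes.guess_type(key)
        if guessed:
            return guessed

    # 3. Crude text heuristic: non-empty, no NUL bytes, valid UTF-8.
    #    (A multi-byte character cut off at the end of `head` would be
    #    misread as binary; a real detector would handle that case.)
    if head and b"\x00" not in head:
        try:
            head.decode("utf-8")
            return "text/plain"
        except UnicodeDecodeError:
            pass
    return "application/octet-stream"
```

With this ordering, an object named `data/part-0000` that begins with `PAR1` is treated as Parquet even though its key has no suffix, while passing `use_extension=False` ignores the name entirely.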
Finally, and just as important, a few incremental updates were made to two important developer credential data classes: the JDBC / ODBC database connector string and the Facebook API token. The database connector class now captures a larger set of strings, including those used for Redshift, MySQL, PostgreSQL, SQL Server, and many more. Our team is committed to ensuring the accuracy of these data classes, and these changes improve it.
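For readers curious what matching connector strings can look like, here is a small Python sketch. It is not the pattern our data class uses: the regular expression and the `find_jdbc_urls` helper are hypothetical, and the sketch covers only JDBC-style URLs of the general form `jdbc:<driver>:<rest>`, not ODBC strings.

```python
import re
from typing import List, Tuple

# Illustrative pattern: "jdbc:", a driver name, ":", then everything up to
# whitespace, a quote, or an angle bracket.
JDBC_URL = re.compile(r"jdbc:(?P<driver>[a-z0-9]+):[^\s\"'<>]+", re.IGNORECASE)


def find_jdbc_urls(text: str) -> List[Tuple[str, str]]:
    """Return (driver, url) pairs for each JDBC-style URL found in `text`."""
    return [(m.group("driver").lower(), m.group(0))
            for m in JDBC_URL.finditer(text)]


sample = ('url="jdbc:postgresql://db.example.com:5432/app" '
          "# jdbc:SQLServer://h:1433;databaseName=x")
found = find_jdbc_urls(sample)
```

A single general pattern keyed on the `jdbc:` prefix picks up Redshift, MySQL, PostgreSQL, and SQL Server URLs alike, at the cost of more false positives than driver-specific rules would produce.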
To stay on top of future releases and announcements, you can now subscribe to the Open Raven blog via an updated RSS feed, in addition to email subscription. To subscribe via email, visit the Open Raven blog and click “Subscribe to product release notes” to add your email address.
As a reminder, we would love to hear from you. Just drop us a line at email@example.com, or fill out the form.