
Open Raven Platform Release: Expanding Unstructured Data Classification With Metadata Scanning

Hamilton Yang
Director, Product Management
March 10, 2023

In this edition of our monthly release notes blog, we're highlighting a significant new capability: Metadata Scanning. File metadata is data about the file itself, such as its size, type, creation date, and author, as distinct from the content inside the file. Depending on the format, metadata may be stored in the file's header or tracked by the file system or object store, and it is commonly used to identify and organize files. Scanning and classifying metadata expands the power of our unstructured data analysis capabilities, allowing for quick, accurate classification of objects without opening them.
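To make the difference between metadata and content concrete, here is a minimal Python sketch that collects filesystem-level metadata using only the standard library. It is an illustration, not how Open Raven's scanner works, and `describe_file` is a hypothetical helper name. It reports modification time rather than creation time because Python does not expose a creation timestamp portably across platforms, and it cannot report an author, since authorship is stored inside formats such as .docx rather than by the file system.

```python
import mimetypes
import os
from datetime import datetime, timezone


def describe_file(path: str) -> dict:
    """Collect filesystem-level metadata for a file without reading its content."""
    info = os.stat(path)  # queries the file system; the file's bytes are never read
    mime_type, _encoding = mimetypes.guess_type(path)  # inferred from the extension only
    return {
        "name": os.path.basename(path),
        "extension": os.path.splitext(path)[1].lower(),
        "size_bytes": info.st_size,
        "type": mime_type,
        # st_mtime is the last-modification time, available on every platform.
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    import tempfile

    # Create a throwaway file, describe it, then clean up.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write("quarterly numbers")
        sample = handle.name
    print(describe_file(sample))
    os.remove(sample)
```

Every value in the result comes from the file's name or from the file system, which is why this kind of check is cheap even across very large stores.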

Metadata Scanning

A distinctive challenge in managing risk for unstructured data is that sensitive information can leak through a file's metadata. Much of this metadata is generated automatically by the application that created the file, so the people sharing the file may not know it is there. Common examples include the author of a Microsoft Word or PDF document and the geographic coordinates recording where a photograph was taken.

Digital images are one common source of automatically added metadata. The first example shows the metadata a smartphone adds when taking a photograph.

Another example is the metadata added by Microsoft Word.
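As an illustration of where Word keeps this information: a .docx file is a ZIP package, and under the Office Open XML standard its core properties, including the author (`dc:creator`), are stored in the `docProps/core.xml` part. The minimal sketch below reads a few of those fields with the Python standard library. It is not Open Raven's implementation, `docx_core_properties` is a hypothetical name, and legacy .doc files and PDFs store their metadata differently and are not handled here.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by the Office Open XML core-properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Friendly label -> element name inside docProps/core.xml.
FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
}


def docx_core_properties(data: bytes) -> dict:
    """Return author-related core properties from the raw bytes of a .docx file."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    result = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)  # direct child of cp:coreProperties
        result[label] = element.text if element is not None else None
    return result
```

Only the small properties part is parsed; the document body in `word/document.xml` is never touched.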

Metadata scanning also offers efficiency benefits when profiling large data stores. Because a file's extension is itself metadata, users can quickly identify data stores that contain CAD files by scanning for CAD file extensions such as .dwg or .dxf. Similarly, some document management systems automatically apply metadata such as sensitivity or restricted-use levels (e.g., red, yellow, green). Scanning metadata for this information helps SOC analysts and security engineers decide where to focus deeper data classification scans.
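The triage described above can be sketched in a few lines of Python. This is a hypothetical illustration, not Open Raven's scanner: it assumes each object is given as a `(store, key, metadata)` tuple, that CAD files are recognized by the assumed extensions `.dwg` and `.dxf`, and that a document management system writes its label into an assumed metadata field named `sensitivity`.

```python
import posixpath
from collections import defaultdict

# Assumptions for illustration only: which extensions count as CAD files,
# and which "sensitivity" labels should be prioritized for deeper scans.
CAD_EXTENSIONS = {".dwg", ".dxf"}
PRIORITY_LABELS = {"red", "yellow"}


def triage(objects):
    """Find stores holding CAD files and objects carrying priority labels.

    `objects` is an iterable of (store, key, metadata) tuples, where
    `metadata` is a dict of the object's metadata fields. No content is read.
    """
    stores_with_cad = set()
    priority = defaultdict(list)
    for store, key, metadata in objects:
        # posixpath.splitext only looks at the last path segment, so a dot
        # in a "directory" part of the key is not mistaken for an extension.
        extension = posixpath.splitext(key)[1].lower()
        if extension in CAD_EXTENSIONS:
            stores_with_cad.add(store)
        label = str(metadata.get("sensitivity", "")).lower()
        if label in PRIORITY_LABELS:
            priority[store].append(key)
    return stores_with_cad, dict(priority)
```

The output is a short list of stores and objects to hand to a full content scan, rather than a verdict on its own.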

To start scanning metadata, use one of the available default metadata data classes or set up a custom one.

Creating a custom data class that uses metadata classification. 

Custom metadata data classes can either be part of an existing custom data class or be set up as standalone, metadata-only data classes. Because metadata field names differ across file types, defining a metadata data class requires a file name pattern (such as the file extension), at least one condition (such as the name of a metadata field or a match pattern), and a data preview generation method. Once sensitive metadata has been found, users can review the results in the Data Catalog and apply rules, just as with other data types.
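To show how those three pieces fit together, here is a hypothetical Python model of a metadata data class. None of these names come from the Open Raven product: `MetadataCondition`, `MetadataDataClass`, and `preview_chars` are invented for illustration, conditions are assumed to match if any one of them matches, and the preview method is assumed to keep a short prefix of the value and mask the rest.

```python
import re
from dataclasses import dataclass, field


@dataclass
class MetadataCondition:
    field_name: str            # metadata field to inspect, e.g. "Author"
    value_pattern: str = ".*"  # regex the field's value must match


@dataclass
class MetadataDataClass:
    name: str
    file_name_pattern: str  # regex applied to the file name, e.g. r"\.docx$"
    conditions: list[MetadataCondition] = field(default_factory=list)
    preview_chars: int = 4  # assumed preview method: keep a prefix, mask the rest

    def __post_init__(self):
        if not self.conditions:
            raise ValueError("a metadata data class needs at least one condition")

    def _preview(self, value: str) -> str:
        hidden = max(0, len(value) - self.preview_chars)
        return value[: self.preview_chars] + "*" * hidden

    def match(self, file_name: str, metadata: dict) -> list[str]:
        """Return a masked preview for each matching condition ([] if none)."""
        # Extensions are compared case-insensitively (".DOCX" == ".docx").
        if not re.search(self.file_name_pattern, file_name, re.IGNORECASE):
            return []
        previews = []
        for condition in self.conditions:
            value = metadata.get(condition.field_name)
            if value is not None and re.search(condition.value_pattern, str(value)):
                previews.append(f"{condition.field_name}: {self._preview(str(value))}")
        return previews
```

For example, a class with `file_name_pattern=r"\.docx$"` and a single condition on the `Author` field would flag `plan.docx` with an author of "Jane Doe" and return the preview `Author: Jane****`, while ignoring PDFs entirely. The file name pattern is checked first because, as noted above, the same field name means different things in different file types.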

Thanks for reading this month's installment of our release notes series. Have questions? Drop us a line at hello@openraven.com. Or, schedule a live demo to see Metadata Scanning in action.
