Safe and Private Cloud Data Classification Without Backhauling

Michael Ness
Security Researcher
October 19, 2022

Finding and classifying cloud data at scale requires performing complex analysis on terabytes or even petabytes of information. Backhauling that data to a common hub or cloud service that provides a centralized home for classification and analytics creates significant security risks. Open Raven's unique serverless architecture allows deep analysis and accurate classification of cloud data where it lies without opening holes for potential attackers to exploit.

Backhauling Data Introduces Risk

In networking, the term "backhaul" refers to the link in the network that transmits data to the network core or backbone from smaller networks at the edge. When we talk about backhauling data for classification, we're discussing something similar: the practice of moving data from the customer's cloud repository to a core compute hub outside of the customer's cloud for analysis.

Backhauling data requires creating an entry point so that the external service can access your repositories for classification. Opening up access to data is always risky, and many things can go wrong in the process. For data to leave the confines of your environment for classification, you will need to ensure appropriate limitations are in place for specific types of data that may be too sensitive for external access. These limitations may derive from regulations like GDPR and CCPA or from internal security policy requirements. Either way, keeping your most sensitive data safe is not an easy task, especially when you may not know where it all resides in the first place.

Finally, remember that any time another entity has access to your data, you inherit that entity's risks and expand your attack surface. If the service provider's environment is breached, then potentially, so is your data. Backhauling your cloud data means you will need to spend time and effort carefully understanding the third-party risk the provider brings to your doorstep, including its vulnerabilities and misconfigurations, any of which can increase the risk to your data.

Open Raven Keeps Data Safe and Private

The Open Raven Data Security Platform solves these issues by avoiding data backhauling altogether. Instead of moving the data to the compute, we safely bring the compute power and analysis logic to the data by deploying serverless functions inside the customer's environment. We assume a role that the customer creates with read-only permissions; the scanner code runs under that role and sends back only metadata. Another benefit of a serverless architecture is that Open Raven can deploy exactly the right amount of compute for the task, e.g., massively parallel for baseline analysis and lightning quick for incremental analysis.
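
To make the "only metadata leaves" idea concrete, here is a minimal sketch in Python. It is not Open Raven's scanner: the handler shape, the `PATTERNS` table, and the `ScanMetadata` fields are all hypothetical, and the object content is passed in through the event only so the example runs on its own. In a real deployment the function would read each object from storage under the customer's read-only role.

```python
import re
from dataclasses import asdict, dataclass, field

# Hypothetical detection patterns for illustration only; a production
# classifier would use far more robust detection and validation.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass
class ScanMetadata:
    """What leaves the environment: names and counts, never matched values."""
    object_key: str
    size_bytes: int
    findings: dict = field(default_factory=dict)


def scan_object(object_key: str, content: str) -> ScanMetadata:
    """Classify one object in place and summarize what was found."""
    findings = {}
    for data_class, pattern in PATTERNS.items():
        count = len(pattern.findall(content))
        if count:
            findings[data_class] = count  # keep the count, drop the matches
    return ScanMetadata(object_key, len(content.encode("utf-8")), findings)


def handler(event, context=None):
    """Serverless-style entry point (assumed shape, not a vendor API).

    The event carries the content directly so the sketch is self-contained;
    a real function would fetch it using read-only credentials.
    """
    results = [scan_object(o["key"], o["content"]) for o in event["objects"]]
    return [asdict(r) for r in results]
```

Calling `handler` with an object that contains an email address and a Social Security number returns the object key, its size, and a count per data class. The matched strings themselves never appear in the result, which is the property that removes the need to backhaul the data.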


When it comes to protecting data at scale, backhauling potentially sensitive data introduces as much risk as it resolves, if not more, through unintended access and third-party exposure. Open Raven avoids these issues through a serverless architecture in which classification occurs within your environment and only metadata leaves. You can learn more about how Open Raven uses modern cloud architecture principles to find and secure sensitive data in our technical blog, "Finding and Determining What Data To Classify." While you're at it, you might like to check out the complete series, "Designing and Building Data Classification Systems for Security and Privacy," to see how Open Raven was built from the ground up to help organizations tackle their toughest cloud data challenges.
