Data Catalog V2, Faster Data Scanning, Data Class Enhancements, and More Rules
It's officially summer, and our product and research teams turned up the heat in June. We published several releases full of new functionality and enhancements designed to help security teams pinpoint data security and compliance risk, apply data security guardrails, prevent incidents and streamline response.
The star of the show is even faster scanning with no size constraints, delivering results at scale. The supporting cast includes the next revision of our Data Catalog, with filters that make it easy to track scan progress, plus new data security policies and rules and improvements to a number of data classes.
Data Catalog V2
We’ve updated the data catalog experience to make it easier to filter down to the data classifications you need most, whether that’s all the developer secrets in a region or German personal data seen anywhere. To achieve this, the Data Catalog now uses the same filtering system seen elsewhere in the platform, including the ability to filter on accounts, regions, asset names, and more to view data findings.
Additionally, the filters now allow you to find assets by specific scan job, making it even easier to determine which buckets have been, or need to be, analyzed in your environment.
Faster Data Scanning - No Size Restrictions
The release marks serious changes in our core data scanning capabilities, starting with a dramatic speed improvement for file enumeration. Before files of any type can be opened for data classification, they first have to be listed or inventoried. When dealing with hundreds of millions or even billions of files, this enumeration phase alone can take many hours (or even days) to complete. To accelerate file listing and ultimately complete scans faster, we built a highly efficient file listing service that completes enumeration orders of magnitude faster than before. This means shorter scans and quicker data classification results for all scans, but especially for environments with vast numbers of objects.
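Open Raven hasn’t published the internals of its listing service, but a common way to speed up enumeration like this is to shard the keyspace by prefix and list the shards in parallel. A minimal sketch, where `list_prefix` is a hypothetical stand-in for a paginated per-prefix listing call (e.g., S3 `ListObjectsV2`):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_enumerate(list_prefix, prefixes, max_workers=8):
    """Run one listing call per key prefix concurrently and merge the results.

    list_prefix: callable taking a prefix and returning a list of object keys
                 (in practice, a paginated ListObjectsV2 call per prefix).
    prefixes:    keyspace shards, e.g. ["0", "1", ..., "f"] for hex-named keys.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        chunks = pool.map(list_prefix, prefixes)
    # Flatten the per-prefix result lists into one inventory.
    return [key for chunk in chunks for key in chunk]
```

In practice the shard prefixes themselves can be discovered with a delimiter listing, so a bucket with billions of keys is walked by many workers at once instead of a single sequential cursor.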
You can now create scans for buckets of any size, making it easier to analyze large environments. Scans can still be bound by budget (maximum spend on the serverless functions that power data classification), and a time limit is imposed so that scans have a default maximum duration of 10 days. If a scan is stopped after hitting the 10-day limit, it can be restarted, and it will only analyze objects that are new or changed. This allows even the largest environments to be thoroughly analyzed in a methodical fashion, with results available throughout the process in the data catalog and elsewhere.
New Data Classes & Rules
In June, we updated several existing data classes based on customer feedback and telemetry which we use to continuously evaluate our classification efficacy:
- Credit card regex patterns have been updated with improved conformance to the credit card number format.
- VIN (Vehicle Identification Number) data classes have been revamped for better coverage of and conformance to the VIN standard (ISO 3779).
- TIN (Taxpayer Identification Number) data classes have been updated with entropy-based validation functions that help reduce false positives by requiring a minimum level of randomness in matches.
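Open Raven hasn’t detailed its exact validation functions, but entropy-based filtering typically means computing the Shannon entropy of a candidate match and rejecting strings that are too uniform (e.g., `000000000`) to be a real identifier. A minimal sketch; the 2.0-bit threshold is illustrative, not Open Raven’s actual value:

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character over the string's observed distribution."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def plausible_tin(candidate: str, min_entropy: float = 2.0) -> bool:
    """Reject candidates whose characters are too uniform to be a real TIN.

    The 2.0-bit threshold is an illustrative assumption.
    """
    return shannon_entropy(candidate) >= min_entropy
```

A string of repeated digits has zero entropy and is filtered out, while a genuinely assigned identifier with varied digits passes.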
The team also published 13 new and upgraded data classes used to identify:
- Vehicle Identification Numbers (VINs) for Asia
- Manufacturer-specific Vehicle Identification Numbers (mVINs) for Mazda, Ford and Toyota
- Country-specific Tax Identification Numbers (TINs), primarily for the EU region
- Country-specific National Identification Numbers (NINs)
- Credit card numbers
We also published 21 new rules, all of which fall under the category of configuration best practices for key AWS services. These rules check AWS Secrets Manager, Lambda, and EC2, as well as EBS and CloudFront assets persisted during asset discovery in the Open Raven platform, against security configuration best practices.
New policies just released are focused on best practices for PCI DSS and CIS Benchmark Controls as well as a set of data services configuration policies that cover RDS, RedShift, S3 and more.
Bugs & Enhancements
- The scan job completion rate now properly counts child files contained within archives
- Users will receive emails when a scan times out, in addition to completion events
- Users can now select multiple items when filtering using the “contains” statement
Granular Scan Control, Improved (and Expanded) Data Classes, Alongside a Ton of New Rules
This month, team Open Raven continued to expand the power and capabilities of the platform. Our latest release contains enhancements to existing functionality and several new capabilities. These include incremental scanning, scan budgeting, custom tags for Lambdas, improved data class accuracy, and a ton of new rules covering everything from data stores to compliance frameworks, IAM, and ransomware protection. Here are the highlights.
A unique aspect of Open Raven’s architecture is that we analyze data where it resides using serverless functions (e.g., AWS Lambda) within a customer’s own account; no data is ever moved or accessed by the Open Raven platform itself. This is not only better for safety and privacy, but it also delivers a cost/performance profile that alternative approaches cannot match. Since the functions run within the customer’s own account, the associated costs for execution must be both transparent and predictable. No matter how inexpensive we’ve made data scanning, in the short history of cloud computing, no one has ever had a “good” surprise on their AWS bill.
In this spirit, we’ve released a number of things to drive down costs, improve transparency, and prevent unhappy surprises. First, we’ve recently released a number of performance improvements along with incremental scanning so that only new objects are scanned. And we’re not even close to finished; more to follow in June. Second, we tag the functions running Open Raven data scanning so that it’s clear what they’re doing, and as of this month you can request a custom tag for Lambdas as well. Third, we’ve released a new feature that establishes a maximum “budget” for data scanning.
The budget feature works by setting a limit via a slider inside the scan UI.
Scans that have a defined max cost will be canceled once the budget is reached, as estimated using an average of Lambda pricing across AWS regions. If the max budget is reached and the scan is canceled, all scan results up to the point of cancellation remain available in the Data Catalog and elsewhere. Note that the final cost will be within ±10% of the set amount, so if you’re extra cautious, set the budget below the actual threshold you have in mind.
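The exact pricing model isn’t documented, but the mechanics can be sketched as tracking accumulated Lambda GB-seconds and invocations against the configured budget. The prices below are illustrative stand-ins for the cross-region average, not Open Raven’s actual figures:

```python
# Illustrative prices; Open Raven averages actual Lambda pricing across regions.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_REQUEST = 0.0000002

def estimated_cost(gb_seconds: float, invocations: int) -> float:
    """Estimate spend on the serverless functions powering a scan."""
    return gb_seconds * PRICE_PER_GB_SECOND + invocations * PRICE_PER_REQUEST

def over_budget(gb_seconds: float, invocations: int, budget: float) -> bool:
    """Cancel the scan once the running estimate reaches the configured budget."""
    return estimated_cost(gb_seconds, invocations) >= budget
```

Because the estimate is checked as the scan runs rather than settled after the fact, the final spend can land slightly above or below the limit, which matches the ±10% guidance above.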
Data classes are the key means of classifying customer data at scale into categories such as country-specific passports and phone numbers, API keys, credit card numbers, and more. They are created with scalability and expandability in mind to serve our petabyte-scale classification loads. Development is a continuous, two-fold effort: tuning the existing data classes and extending to new ones.
We are primarily focusing on tuning existing data classes in order to further improve accuracy (e.g., avoid unintended matches). We developed new methodologies to reduce these false positives, and earlier this month, we released 38 upgraded data classes used to identify:
- Country-specific National Identification Numbers (NINs)
- Country-specific Vehicle Identification Numbers (VINs)
- Country-specific Tax Identification Numbers (TINs)
- Country-specific Driver License Numbers (DLNs)
- Country-specific Phone Numbers (PNs)
- Provider-specific Developer Secrets and API Keys
These changes come as improvements based on customer feedback and telemetry which we use to continuously evaluate our classification efficacy.
Policies & Rules
Last August, we released Magpie, an open-source CSPM available on GitHub. This month, we enhanced the capabilities of the Open Raven Data Security Platform to discover data-related resource configurations and assess their security posture by integrating Magpie discovery capabilities and rules into the platform.
This integration introduces to the platform more than 260 new rules governing AWS Aurora, DynamoDB, RDS, Redshift, and S3; compliance with CIS AWS Foundations Benchmarks and PCI-DSS; IAM best practices; and includes additional rules for protecting against and recovering from ransomware attacks.
Bugs & Enhancements
- TSV files are now parsed similarly to CSV files instead of as raw text
- Auto-sort arranges all Data Catalog views from highest to lowest counts
- The Data Catalog now correctly shows only the findings from the data classes selected instead of all data classes
- In Maps, the backup icon now displays correctly for backed-up assets
- The AWS Account Name field in Data Scan Jobs is correctly filled with AWS Account ID numbers
- Fixed the Account/Settings modal so that the menu aligns correctly when hovering over the OR avatar, allowing users to make a selection
It’s all about scaling — core scanning engine, map, data catalog, across the UI, discovery, policies, and rules
This month’s theme is all about scaling — in the core scanning engine, map, data catalog, and across the UI in general. Beyond making the product faster, easier, and all-around better for large environments, we are also wrapping up the journey to make our main open-source project Magpie, a pipeline into the platform for new discovery features, rules, and policies.
The close of April brings differential scanning into the platform as a complement to the exhaustive baseline scans we have done to date. Scans will now check whether each target object has previously been scanned with the same data collection and skip it if so. If the object has been updated, it will be rescanned. Depending on the objects being scanned, this can immensely reduce the duration of the analysis as well as the associated Lambda costs.
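The per-object decision can be sketched as a metadata comparison against the previous run. Whether the platform keys off `LastModified`, the ETag, or both isn’t documented, so the ETag-based check below is an assumption:

```python
def needs_rescan(obj: dict, seen: dict) -> bool:
    """Decide whether an object is new or changed since the last scan.

    obj:  object metadata, e.g. {"Key": "logs/a.json", "ETag": "abc123"}.
    seen: mapping of key -> ETag recorded when the data collection last ran.
    """
    return seen.get(obj["Key"]) != obj["ETag"]

def plan_differential_scan(objects, seen):
    """Return only the new or changed objects from a full listing."""
    return [o for o in objects if needs_rescan(o, seen)]
```

Unchanged objects fall out of the work queue entirely, which is where the duration and Lambda-cost savings come from.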
Sampling & Scan Limits
Sampling a percentage of a bucket now reduces the amount of data and objects checked against scan limits such that even extremely large buckets can now be analyzed. Coupled with incremental scanning, this allows for eventual, full analysis of any sized bucket over time.
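The sampling semantics aren’t fully specified; one simple interpretation is drawing a fixed percentage of the object listing uniformly at random, which bounds work regardless of bucket size. A sketch under that assumption:

```python
import random

def sample_objects(keys, percent: float, seed=None):
    """Draw `percent`% of keys uniformly at random (at least one, if any exist).

    Deterministic when a seed is supplied, which helps make sampled scans
    reproducible between runs.
    """
    if not keys:
        return []
    rng = random.Random(seed)
    k = max(1, round(len(keys) * percent / 100))
    return rng.sample(list(keys), k)
```

Combined with differential scanning, repeated sampled runs skip objects already analyzed, so coverage of a large bucket accumulates over time.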
Reliability & Performance
Over the course of this month, we’ve made a bunch of minor “under the hood” improvements to accelerate scans and boost their reliability.
Visualizing thousands of cloud assets can be powerful — or overwhelming. With this release, we’ve improved the logic for how large environments are laid out within the Map for easier viewing. The default zoom level limits asset display, the assets are now much closer, and lines between them are as short as possible.
Open Source: Discovery, Policies & Rules
Our main open source project Magpie is now a functional pipeline for new discovery capabilities, policies, and rules. The benefits are most obvious for our rules and policies, which over this week and next will expand dramatically to include what’s now visible in the Magpie GitHub repository. At a quick glance, you’ll see many rules covering important data store configuration problems, as well as an early look at some Google Cloud Platform (GCP) rules. The GCP discovery and rule capabilities shown within the repo aren’t making the transition to the platform. Yet.
Bugs / Enhancements / Tweaks
- Sorting a column now defaults to descending first instead of ascending.
- Scanner settings are now found in the Settings area of the UI, along with everything else.
- Scheduled policies will now correctly start at their specified time instead of starting at the previously scheduled time.
- Toggle switches in the product now turn to green when enabled instead of black.
- A slight change in colors throughout the product. For example, blue to dark blue.
- The asset list now prioritizes the object name when rendering columns instead of partially hiding them.
- Fixed a bug that prevented violation icons from showing in the asset list.
- Tag tooltips in the asset list have been re-enabled.
- Tooltips with many values no longer run below the page.
- Tooltip sizes on the Data Scan Job page are now consistent.
- The default value for the DMAP Scan Queue setting now correctly states “Select”.
- Pending scans now correctly have the options settings available.
- The Ignored Violations tab in Policy Violations no longer cuts off text early.
- Data Catalog values now take advantage of the entire screen space.
- Data Catalog export now includes the Account ID field.
- Filtering on data classes that have no results now correctly filters out all results.
Redesigned Asset List & Data Scanning Performance Boost
Team Open Raven has been hard at work in 2022 on several significant platform enhancements focused on improving performance, efficiency, and usability. Here are the highlights:
- A fully redesigned Asset List with additional filters and controls
- Significant enhancements to scan efficiency, robustness, accuracy, and control
Asset List Redesign
We recently launched a redesigned Asset List that offers a more powerful way to filter and search cloud resources while also visualizing essential configuration details at a glance. Users can search for individual resources by entering an asset name or ID in the search bar. This is particularly useful when you have many assets of a given type.
New filtering controls enable you to select assets based on one or more descriptors, refine results using Boolean expressions, and take actions on multiple assets simultaneously.
Filters can be saved as an Asset Group with customized names for frequently grouped resources. For example, you can create a filter for AWS S3 buckets not part of an AWS Backup plan. The first action available is the ability to select multiple assets in an Asset Group and add them to an AWS Backup plan. Additional actions will be available in upcoming releases.
Leap Forward: Scan Efficiency, Robustness, and Control
Streamlined Data Classification
Scanning is at the core of our platform's data discovery and classification for data at rest. Scan optimization is an obsession for us: reducing the time needed, lowering the cost, improving accuracy, etc. We just wrapped up an important initiative that dramatically boosts the performance of data classification. In this release, we changed the core engine to actively update and process data class logic during a scan, eliminating the need for multiple classification passes over an object's data. This reduces scan times by up to 6x based on internal testing and cuts the associated Lambda costs since fewer are needed to do the same work.
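One way to see why a single pass helps: instead of running each data class’s pattern over an object separately, the patterns can be combined into one alternation and matched in a single traversal. The patterns below are deliberately simplified illustrations, not Open Raven’s production data classes:

```python
import re

# Illustrative patterns only -- real data classes are far more involved.
DATA_CLASSES = {
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
    "email": r"\b[\w.+-]+@[\w-]+\.\w{2,}\b",
}

def classify_single_pass(text: str, classes=DATA_CLASSES) -> dict:
    """Match every data class in one traversal of the text.

    Each class becomes a named group in a single combined pattern, so the
    engine walks the object's data once instead of once per class.
    """
    combined = re.compile("|".join(f"(?P<{n}>{p})" for n, p in classes.items()))
    findings: dict = {}
    for m in combined.finditer(text):
        findings.setdefault(m.lastgroup, []).append(m.group())
    return findings
```

With N data classes, the naive approach reads the object N times; the combined pattern reads it once, which is where multi-x speedups and lower Lambda counts come from.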
More Control Over Scanning
We added two improvements to understanding and controlling scans as part of this release cycle. First, we've updated Data Scan Jobs with filtering similar to the redesigned Asset List, making it easier to view and create scans. Second, we've added the ability to cancel a scan that's in progress.
Mark Data Scan Findings as False Positives
With this feature, users can mark objects or data class findings either temporarily as false positives (i.e., data in the file was over-matched but generally correct) or "always a false positive" (i.e., will never have data of that finding's data class). When marking something as a false positive, it not only updates your Open Raven platform but also provides anonymized telemetry back to Open Raven to improve data class accuracy for the future.
Stay Engaged and In Touch
Subscribe to our blog or RSS feed for news on the latest releases and announcements. As a reminder, we would love to hear from you. Just drop us a line at firstname.lastname@example.org.
Improved Scanning, Maps, and a Sneak Peek at JIRA Integration
In November, we released significant back-end improvements resulting in faster scans, improved scan efficacy, and faster map load times. Of course, a big thanks to our customers for providing valuable feedback and participating in our current Beta programs. The first is for an upcoming Auto-scan feature, and the second is a new integration with JIRA. Read below for highlights of recent updates and progress toward future releases.
Data Scanning - Faster, More Accurate, More Affordable
The changes that have improved data scan times span the entire classification engine. The result of these back-end improvements is that customers can scan larger data sets faster, with better results, and with as much as a 30% reduction in the cost of the AWS Lambda functions used to run the scans.
Improved Maps Load Times
We utilized ongoing advancements in the underlying 3D Maps architecture to reduce loading times for larger AWS Organizations from minutes to seconds.
Beta Program Updates
Auto Scan, scheduled for release in early 2022, reduces the time it takes to see valuable data classification results when the complexity and scale of an AWS organization are overwhelming. This feature suggests data scanning configurations automatically, provided as three options: Fast, Faster, and Fastest. If you, like others, wonder, "Where should I start?" don't worry. We have you covered.
A new JIRA integration, also slated for early 2022, adds another valuable option to fit into existing workflows. Before, if an issue required the attention of a DevOps Team using JIRA, our customers manually created JIRA Issues and used the "copy violation URL" action (among other exports) to share critical details, all gathered from Violations. This integration will add a 1-click escalation to "Create a JIRA Issue," saving valuable time. JIRA Issues populate with all the rich data context collected during asset mapping and data scanning—even down to censored previews of data findings. This integration streamlines response times to detected and escalated issues by taking the manual effort out of ticket creation, escalation, and updates.
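Jira’s REST API creates issues with a single POST to `/rest/api/2/issue`. A sketch of what the 1-click escalation might assemble; the violation fields, project key, and issue type here are hypothetical, and the real integration’s payload may differ:

```python
import json
import urllib.request

def violation_to_issue_fields(violation: dict, project_key: str = "SEC") -> dict:
    """Map a violation (illustrative shape) onto Jira's create-issue payload."""
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": f"[Open Raven] {violation['rule']} on {violation['asset']}",
            "description": violation.get("details", ""),
            "issuetype": {"name": "Task"},
        }
    }

def create_jira_issue(base_url: str, auth_header: str, violation: dict):
    """POST the issue to Jira; returns the HTTP response."""
    req = urllib.request.Request(
        f"{base_url}/rest/api/2/issue",
        data=json.dumps(violation_to_issue_fields(violation)).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": auth_header},
        method="POST",
    )
    return urllib.request.urlopen(req)
```

The value of the integration is exactly this mapping done for you: the rich context gathered during mapping and scanning lands in the issue description without anyone copying violation URLs by hand.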
If you're interested in participating in these or any future Beta releases, ask the Open Raven Service Team for access or contact your Account Representative.
We're very excited about our upcoming releases and look forward to sharing them with you.
Until that time, register for our upcoming webinar, a discussion led by customer and Apiture CISO, Sean Darragh, and Open Raven Co-Founder and CEO, Dave Cole.
Auto-Scan, Speed & Scale
Getting Started Just Got Easier with Auto-Scan (Beta)
Connecting an AWS organization to Open Raven for asset discovery and mapping is simple enough, but the question “what should we scan first?” can be hard to answer when looking across many accounts and regions. The new Auto-Scan feature offers 3 default options to make it easier to determine where to scan first, from a quick scan that will complete in a few hours to a more comprehensive scan that will complete over the course of a few days. Behind the scenes, Auto-Scan evaluates which of your data stores are most likely to contain sensitive data, and will begin data scans with selected, high-priority buckets so that the most important results come as quickly as possible.
Scan more data, faster
Given the sheer amount of data in a modern enterprise, scanning efficacy and speed will always be a top priority for us. In this release, we dug deep into our scan infrastructure to improve how we enumerate S3 objects. It worked. This change lets us begin scanning objects up to 4x faster. You should now see scans of larger data sets complete faster and with greater accuracy.
Are we there yet? Maps done faster
In a related speed improvement, we also enhanced Maps so environments with large numbers of resources and objects populate faster. Customers approaching the one-billion-object mark often had load time and performance issues that smaller environments did not. Again, we made some ground-breaking moves in the underlying technology used for Maps by replacing traditional mesh instances with thin instances. This change resulted in a significant performance increase, since thin instances do not create new objects in the scene and therefore do not incur compute overhead. Whether or not that detail matters to you, the result is significantly faster loading times, along with fixes for a handful of related UI issues.
Few things matter more than speed and scale when it comes to securing modern data in the cloud. This release focuses on both making it faster to configure a scan and significantly boosts the performance of any data scan, allowing Open Raven to scale faster and farther than before. Large environments understandably result in Open Raven drawing large Maps; this release also improves load times for the Maps view allowing faster visualization and filtering of results.
Dashboards and Search with Integrated Splunk
Splunk is one of the world's leading SIEMs, and for good reason. It provides a powerful Search Processing Language (SPL) that security teams use to search, report, and analyze log events across countless services, at scale. Until today, security teams would need to turn to other tools and services, or even interview service owners, to gather critical data context. By integrating Splunk's powerful search and analytics functions directly into Open Raven's modern data security platform, security teams can now include various security controls and data types in their searches, dashboards, and analytics for more valuable insights and quicker prioritization.
Full Asset Listing, Search and Reports
We've repeatedly heard the need to fully understand which data exists and where it is located. Why? Well, there’s no reason to keep data that no one’s using — an easy path to risk reduction. For data that is being used, how is it protected? If there is unusual behavior, is it with a system that has access to sensitive data? Such answers are important to properly assess and prioritize attention.
Asset listing in Splunk allows you to analyze the AWS asset discoveries made by Open Raven. Easily apply free-form searches to investigate assets using SPL and eventually convert the results into charts.
In addition to this, you can leverage the standard Splunk export, generating CSV, JSON, and XML files containing asset configuration details, data classification findings, and more. Plug this export into your own workflow, or use Splunk directly within Open Raven for flexible analytics.
Open Raven provides default dashboards to describe security misconfigurations, summarize policy violations, or highlight sensitive data in your environment. Our team is also happy to work with you to provide custom dashboards based on your data security initiatives.
You can now access Splunk search and dashboard capabilities by navigating to “Analytics” in the main menu.
Interested in fitting this new data into your existing workflows? No problem. Open Raven provides a number of easy-to-use integrations like AWS EventBridge that integrate directly into your SIEM or custom workflow.
A Data-Smart Map of Your Cloud
Earlier this summer, we released a first-of-its-kind, automated data catalog built specifically for security and cloud teams. This week, we are excited to announce that the details from the data catalog are now visible directly within Open Raven’s Maps view.
Maps help you quickly and easily view your cloud resources and find answers to common, but historically tough, questions about key security controls: VPC access, security group access, encryption status, etc. Today, we’ve made this map data smart. You can now browse S3 buckets and see summaries of the sensitive data types found within each, with quick links into the data catalog for more granular detail.
Note: Check out all Maps has to offer in our previous post, “Navigate Leaky S3 Buckets with Maps”.
Bringing data intelligence into an easy-to-understand, visual experience with detailed asset properties enables teams to quickly find answers about their data sans all the spreadsheets and interviews (to which teams have grown accustomed). For example, in just a few clicks, easily locate:
- All publicly accessible S3 buckets with personal data
- All data stores with financial data
- All data stores with EU personal data
- Region-specific stores with developer secrets
- Healthcare data not encrypted at rest
...the list goes on.
To view the findings of a data classification scan, navigate to the map on the left-hand menu, then select the “Data” layer to toggle S3 buckets with “Data Classes Found”. Then click on the S3 bucket marked with the “Data” icon. You'll see a details panel with a list of data classes found.
In the details panel, you will see a list of top five data classes found, as well as a link to view more detail in the Data Catalog. Let’s say you want to investigate the locations of developer secrets, specifically, AWS keys. Use the data filters in the map, select “Developer Secrets”, then click on the highlighted buckets for more information.
As you explore, you can identify specific objects and drill down to a preview of individual redacted data findings. This preview gives you a sense of the data found, where the finding is located within the file, and any keywords that triggered the match. For further investigation or action, click the direct link into the AWS Console next to the object in question (assuming those permissions are granted).
And because you just might need to put this data into some workflow or share findings with others, you can easily export the findings to CSV. We hope this will help security teams better understand their risk and, if necessary, prioritize actions for remediation.
Another big thanks to our customers for helping shape the first data platform purpose-built for security and cloud teams - keep the feedback coming! Stay tuned for our next update, where we’ll have both new additions and improvements: new search, updated scanning and classification, to name a few.
One-Click Preview of Data Findings
In our last post, we introduced the first ever data catalog for security teams. Today, we're excited to release a feature that came directly from you, our customers, that further improves use of the data catalog by adding a preview of data findings directly within Open Raven. With this addition, security teams can save even more time identifying and protecting sensitive data.
The first release of the catalog showed how many instances of specific data classes were found in each asset type. To investigate further, users would need to directly access the data asset, then manually look through it. Getting access isn’t always quick or easy, and sifting through potentially large JSON or Parquet files is a tall order. You spoke, we listened. This release puts users a click away from a preview that shows the locations of the findings within each asset.
How it works
Open Raven’s data preview feature allows you to see the instances of each data class directly in-line with the findings listed in the data catalog. Navigate to the data catalog and click through until you reach a specific object. Once located, you’ll be able to click an available row for each entry in the ‘data class’ column to open an object details panel. Within this panel, you’ll be able to preview the findings with the following details: instances of sensitive data (appropriately redacted), relevant keywords in proximity of the match, and (when applicable) the location: line or row number, page number, etc.
The side panel organizes data findings by data class. Each data class can be expanded to focus on a specific format. And because you just might need to put this data into some workflow or share findings with others, you can easily export the results to a CSV.
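Redaction like the previews describe is commonly implemented by masking all but a few trailing characters of each match. A minimal sketch; the keep-4 convention is an assumption, not necessarily what Open Raven displays:

```python
def redact_finding(match: str, keep: int = 4, mask: str = "*") -> str:
    """Mask a sensitive match, leaving only the last `keep` characters visible.

    Short matches are masked entirely so nothing sensitive leaks through.
    """
    if len(match) <= keep:
        return mask * len(match)
    return mask * (len(match) - keep) + match[-keep:]
```

The visible tail is enough for an analyst to confirm the finding against the source system without the preview itself becoming a second copy of the sensitive data.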
You speak, we listen
Thanks again to all our customers and partners for continuing to provide valuable feedback as we build out the platform.
Until then, come join our discussion next week: Keeping Developer Secrets, Secret
A New Data Catalog
The first data catalog for security teams
We’re excited to announce the first data catalog for security teams, automating the generation of cloud data asset inventories to save time and enrich information security solutions.
Data teams have catalogs to provide a unified, starting point for extracting useful data insights about their business. Shouldn’t security teams have the same for insights into securing the business? We agree. Without such information, teams designing and implementing information security solutions for threats like ransomware are left assuming where sensitive data is, rather than knowing. Knowing how much of what types of data you have, and where, impacts almost everything from preventative measures to incident response and recovery plans. In addition, such information is useful for streamlining efforts in data governance and compliance.
Whether a cursory look, one-time investigation, first steps toward compliance regulations, or applying for a cyber insurance policy, the data catalog makes answering historically tough questions about your data, fast and easy.
How does it work?
The data catalog is automatically updated after each scan is run. Scan schedules can be configured with various filters to better prioritize the frequency at which data inventories are updated: by depth, file type, location, account, data class, etc. The main page of the data catalog provides an overview by data collection, showing the total record and relevant storage bucket count for each. Quick filters are available to view by region, specific account, or data class. Click on an individual data collection to see details further segmented by data class.
Selecting “all” or a specific data class provides the list of relevant S3 buckets, the account ID, bucket size, and the number of records discovered within each. Deep links directly to the AWS console allow 1-click access to further investigate or take action.
Select a storage bucket to view a per object breakdown, again with a deep link directly into the AWS console from within our portal.
If we go back and switch to the “Regions” view, we'll find the same data broken down per AWS region, with filters to view region- or account-specific details. It's no surprise that engineering teams may use common regions like us-east-* and us-west-*, but with contractors, third-party engagements, and other external work your organization may be doing, sensitive data may end up in regions that are otherwise rarely used.
Having access to an automated data asset inventory allows security teams to spend less time discovering where work needs to be done, and more time doing it.
Key Use Cases
Streamline regulatory scope management
- With a readily available data asset inventory, security teams can quickly assess and reduce regulatory scope by seeing opportunities for consolidation, or simply identifying data residing in inappropriate places.
- Rather than spending time manually investigating where new security controls must be applied, teams can move quickly to implementation planning.
Automate cyber security insurance application and renewal requests
- The onslaught of ransomware has pushed insurance providers to require more information from organizations to better identify and limit risk, such as a full data asset inventory. Such information is now readily available and automatically updated with each scan.
Eliminate mistakes during mergers & acquisitions
- The effort to onboard an organization is massive, and that includes how to arrange and move information systems and the data therein. Often, this results in a further reduction in data visibility and costs a ton of time. An up-to-date view of what each side has streamlines identifying what should be moved, and to where. The catalog serves as a single, unified view of data across the entire cloud estate.
Use Case: Inventory Cloud Data and Assets
Learn about classification’s role in security
Policy Violations When Data Is the Endpoint
Historically, security teams have primarily relied on alerts from their infrastructure and endpoints to identify potential areas of data exposure. The downside of this infrastructure-centric approach is that it assumes knowledge of the data present on the system: an increasingly dangerous and often inaccurate assumption as data reaches massive scale and sprawl. As a result, as an organization becomes data-mature, relying on infrastructure-originated alerts rapidly starts to have negative implications for its security posture. Ransomware has brought this problem into clear focus: in the event of an incident, you need to know exactly what data is involved, as it has a profound impact on how you respond.
To solve this increasingly common security problem, Open Raven automatically discovers exposed data across your AWS estate by monitoring and alerting on mismatches between your data and infrastructure configurations.
In this post, we’ll describe how Open Raven pinpoints and helps you manage these data-driven alerts in our new “Violations” UX.
Data Exposure Alerting
Managing cloud risk without any data context leaves only vulnerability severity and threat modeling to drive prioritization. This approach often means missing some problems while chasing false alarms on others. For instance, an S3 bucket open to the Internet is expected if it holds marketing collateral for the company website, but a drop-everything problem if it instead houses a trove of unencrypted customer data. The data context that Open Raven uses to assess these critical data security criteria allows stretched security teams to do more in less time.
In the new Violations page, Open Raven provides a manageable and prioritized list of data and infrastructure misconfigurations, and includes details often missed by other security tools. With quick access to assets in the AWS Console and straightforward integrations into your workflow, you will now be able to remediate urgent problems faster than ever before.
How does it work?
After selecting or creating a policy, Open Raven monitors your AWS estate and builds a list of alerts with rich, data-driven context to help you focus on what matters most. The Violations page shows you a list of problematic assets prioritized by severity. In a single glance, you can focus on specific accounts or across multiple regions. Summary points like sensitive data findings and the violation scope help identify misconfiguration and drive next steps quickly.
Click on any violation row to view a list of assets and details that explain what needs attention. A direct link to the AWS Console makes navigating to the asset in question straightforward.
In the screenshot below, you’ll see an S3 bucket that has been previously scanned against the out-of-the-box policy named “Open Raven Security Policy.” One of the rules in that policy, “No unencrypted Personal Data,” was triggered, generating a “High Risk Violation.” This rule monitors for the presence of sensitive personal information (e.g. US Social Security numbers, credit card numbers, bank account numbers, etc.) in S3 buckets.
It appears that the “benchmarking-bucket-1” S3 bucket is in violation. The alert details show that the bucket contains US Social Security numbers and sensitive personal data belonging to individuals based in the EU. You will also notice that this bucket is located in the us-west-2 AWS region, which suggests a possible violation of the EU’s GDPR. This is extremely problematic, as compliance violations like these may be fined on a per-record basis.
Clicking on the asset row (the S3 bucket) reveals a details panel that contains the list of objects in violation of the rule. The list here makes it easy to prioritize remediation depending on the data or misconfiguration findings. Again, a direct AWS Console link to the object is provided for immediate action or further investigation. This makes it even easier to secure your assets and reach your compliance goals.
Furthermore, you can track the workflow state using Violation Status. Open Raven sets violations to “Open” by default and closes them automatically when the issues are resolved in a subsequent policy evaluation. If a violation status is manually set to “Closed,” Open Raven will reevaluate the policy to confirm that the remediation actually occurred and the issue was solved. In instances where you believe a violation was raised by mistake, you can mark it as “False Positive” (we promise not to spam you with these; we hate false positives too).
Closing the loop
We know every organization has its own workflow when it comes to taking action on alerts. As a result, we’ve designed Open Raven to be a highly flexible security solution.
If you prefer to move quickly and via the AWS Console, our violation events have deep-linking to the affected asset to make manual resolution as quick as possible. If you use Slack or email alerting, Open Raven has several built-in integrations for popular services. And if you have a custom or automated workflow, webhook and AWS Eventbridge integrations are configurable within the UI so that alerts can be sent wherever you need them most.
Yes, security can be this simple!
Thanks for reading
We appreciate all you do to make our platform and posts like these more useful. We welcome and encourage your feedback: email@example.com.
For more information:
Classification and Account Management
We are excited to share several updates to the Open Raven platform, including support for 23 additional default data classes and various improvements across the platform that make onboarding easier and lay the groundwork for bigger updates ahead.
Classify More in S3
A new set of data classes has been added for your immediate use in S3 data classification. These include country-specific bank account numbers; personal data classes used in Canada, Brazil, and Australia; and generic sensitive data classes commonly associated with individuals or assets (e.g. GPS coordinates, email addresses, and URLs).
Here is the full list of new data classes available on the Open Raven platform:
Bank account numbers
Canadian specific classes
- Permanent residence number
- Social insurance number
- Passport number
- Phone number
- Driver’s license identification number
- Personal Health Number
Brazilian specific classes
- Taxpayer identification number
- National ID Number
- Phone number
Australian specific classes
- Taxpayer identification number
- Driver’s license identification number
Generic sensitive data classes
- GPS coordinates
- Vehicle identification number (VIN)
- MAC address (local and universal)
- DNS name
- Email address
- IP address (IPv4 and IPv6)
Additionally, our team has added data classification support for Excel spreadsheets, i.e. files with an .xls or .xlsx extension. There is no action needed to start scanning Excel files: Open Raven automatically recognizes these formats today. If you prefer to ignore Excel files, or any other file types, you can easily exclude them in the “Scan Options” section when creating or editing your scan jobs.
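The “ignore these file types” behavior described above can be sketched in a few lines. This is an illustrative example, not Open Raven’s actual implementation; the function name and config shape are hypothetical:

```python
# Hypothetical sketch: filter S3 object keys by ignored extensions,
# mirroring the "Scan Options" exclusion behavior described above.
from pathlib import PurePosixPath

def filter_scan_targets(keys, ignored_extensions):
    """Return only the keys whose extension is not in the ignore list."""
    ignored = {ext.lower().lstrip(".") for ext in ignored_extensions}
    return [
        key for key in keys
        if PurePosixPath(key).suffix.lstrip(".").lower() not in ignored
    ]

keys = ["reports/q2.xlsx", "logs/app.log", "data/users.csv"]
print(filter_scan_targets(keys, [".xlsx"]))  # Excel files skipped
```

Note that keys with no extension at all pass through untouched, which matches the default of scanning everything unless told otherwise.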
Connecting Accounts Just Got Easier
We recently wrote about how you can use Terraform to set up Open Raven and AWS discovery, and that work continues as we push the AWS Accounts experience forward. Discovering AWS accounts and organizations can now also be done in the AWS Console, either manually or via CloudFormation. Your discovered accounts appear in a new table as well; the new design displays both individual accounts and AWS Organizations in one place and in a more user-friendly manner.
Even more has gone into improving the Accounts workflow including backend operational improvements and better status reporting. Check it out by adding a single account, or discover your entire AWS organization!
Over the next few weeks, we will continue to improve the AWS Accounts page, with major updates that make it easier to answer critical data security questions like “What kind of sensitive data do I have?”, “Where is that sensitive data?”, and “Where should it be?”
Expect to see our new Data Catalog, which lists sensitive data inventory based on data classification jobs, and a new Policy Violations interface that lists data and configuration violations. These updates are a testament to our mission of helping security and cloud teams know their data and keep it secure without breaking the bank.
Did You Know?
It takes minutes to connect Open Raven to your AWS accounts and run a data classification job across the 190+ out-of-the-box data classes in your S3 buckets. Check out how our customers are using the platform:
- Find and eliminate toxic data from logs (More info)
- Prevent Financial Data Exposure (More info)
- Audit and inventory sensitive data in S3 buckets (More info)
Reach out today and take control of your cloud data!
Streamlined AWS S3 Data Classification
New Release - Streamlined AWS S3 Data Classification
We are pleased to push a release this week which includes a slick, streamlined AWS S3 scanning experience and a load of other improvements across the platform:
- Streamlined AWS S3 data classification scan experience, which expedites the setup of sensitive data scanning in S3 buckets
- Support for one-time scans
- Extended Apache Parquet files capabilities and improved handling of very large files
- Filtering by supported MIME types when building a data inventory
- Updates to data classes: ODBC/JDBC database connector, Facebook API Tokens
Streamlined AWS S3 Data Classification Scanning
Our updated approach to data classification scan jobs provides an efficient way to configure discovery of sensitive data in your S3 buckets. This new interface, combined with our expansive predefined set of data classes, gives you the fastest way to start a scan and identify problems with minimal effort.
Navigate to Scans → Data Scan Jobs to find a new experience for creating data scan jobs.
Upon creating a new data scan job, you’ll be presented with a straightforward, single screen for setting up a scan. You can now specify:
- Scan schedule. Set to run once or repeat as often as every hour.
- Data to find by data collection. Use any one of our pre-configured data collections or create your own data collection.
- File types or names to ignore. The job scans for all of the supported file formats by default, but you can narrow this down however you would like.
- Sample size. The job scans all files by default, but you can scan a subset if you prefer (e.g., for faster completion).
Finally, select the S3 buckets you wish to analyze from the complete list already found by our discovery engine. You can filter the list by AWS account ID, AWS region, or by a few familiar S3 bucket security configurations:
- Public accessibility
- Encryption status
- Back-up status
You will also notice two bars at the top of the page which measure the volume of S3 data to be scanned, helping you size the job for successful completion.
Also included in this release:
- Extended support for scanning Parquet files and improved handling of scanning very large files
- Filtering by supported MIME types when building scan inventory
- Updates to data classes: ODBC/JDBC database connector, Facebook API Tokens
We have continued our progress in supporting scanning of even larger Apache Parquet files, and very large files in general. Some of this work requires sophisticated techniques such as breaking files into chunks (“file chunking”). Our co-founder and Chief Product Officer Mark Curphey writes extensively about this in a multi-part blog post here.
We understand that not all files in your S3 buckets will have meaningful file suffixes (e.g. .log, .parquet, or .txt); in many cases, files have no extension at all. No problem. We have improved the scanning engine’s ability to determine file type with extended MIME type analysis, while still using file extensions when they’re present.
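The idea of falling back from content analysis to file extensions can be sketched as follows. This is an illustrative example, not Open Raven’s engine; the MIME strings and function are hypothetical, though the magic-byte signatures are the well-known ones for each format:

```python
# Illustrative sketch: determine a file's type from its leading "magic"
# bytes, falling back to the extension only when no signature matches.
import mimetypes

MAGIC = {
    b"PAR1": "application/x-parquet",   # Apache Parquet file header
    b"PK\x03\x04": "application/zip",   # zip-based formats (.xlsx, .docx, ...)
    b"\x1f\x8b": "application/gzip",    # gzip-compressed data
    b"Obj\x01": "application/avro",     # Avro object container file
}

def detect_type(data: bytes, filename: str = "") -> str:
    for magic, mime in MAGIC.items():
        if data.startswith(magic):
            return mime
    # No magic match: fall back to the extension when one exists.
    guessed, _ = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"

print(detect_type(b"PAR1...", "part-0000"))  # application/x-parquet
```

Content sniffing wins over the extension here on purpose: an extensionless Parquet part file is still identified correctly, which is exactly the situation described above.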
Finally, and just as important, a few incremental updates were made to two important developer credential data classes: the JDBC / ODBC database connector string and the Facebook API token. The database connector class now captures a larger set of strings, including those used for Redshift, MySQL, PostgreSQL, SQL Server, and many more. Our team is committed to ensuring the accuracy of these data classes and the changes here boost accuracy.
Stay Engaged and In Touch
To stay on top of future releases and announcements, you can now subscribe to the Open Raven blog via an updated RSS feed, in addition to email subscription. To subscribe via email, visit the Open Raven blog and click “Subscribe to product release notes” to add your email address.
As a reminder, we would love to hear from you. Just drop us a line at firstname.lastname@example.org, or fill out the form.
New Integrations, Parquet & Avro Support, and More
The Open Raven team has pushed yet another release, including new features and continued improvements to the product that you’re going to love:
- Integrations including webhooks and email notifications
- Columnar support added for big data files like Avro and Parquet
- New asset group experience inspired by Spotify
- Expanded data classification with an additional 45 privacy data classes
Integrations and Webhooks
Webhooks unlock new and exciting ways to connect with external systems. This release allows users to set:
- HTTP method (We currently support GET and POST requests)
- Header parameters
- Body or query string parameters
Webhook configurations enable users to integrate with a variety of SaaS applications, and we are preconfiguring integrations for Slack, Jira, PagerDuty, 4ME, and ServiceNow in the coming weeks. AWS EventBridge provides a firehose-style API to pump your data to your favorite SOAR or SIEM.
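The configurable pieces listed above (HTTP method, headers, and body or query parameters) can be exercised with a small script. This is a hedged sketch using only the standard library; the URL, header name, and payload fields are made up for illustration:

```python
# Hypothetical sketch: assemble a webhook request from the configurable
# parts described above, without sending it.
import json
import urllib.parse
import urllib.request

def build_webhook_request(url, method="POST", headers=None, body=None, query=None):
    if query:
        url = f"{url}?{urllib.parse.urlencode(query)}"
    # POST requests carry a JSON body; GET requests carry only query params.
    data = json.dumps(body).encode() if (body and method == "POST") else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Content-Type", "application/json")
    for name, value in (headers or {}).items():
        req.add_header(name, value)
    return req  # send with urllib.request.urlopen(req)

req = build_webhook_request(
    "https://hooks.example.com/alerts",
    headers={"X-Api-Key": "secret"},
    body={"violation": "No unencrypted Personal Data", "severity": "high"},
)
print(req.get_method(), req.full_url)
```

Separating “build the request” from “send it” like this also makes the integration easy to test against a local endpoint before pointing it at a real SaaS application.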
Columnar Support added for Parquet and Avro Files
Two of the most commonly requested file formats for us to scan have been Apache Parquet and Avro: binary, schema-based file formats (Parquet columnar, Avro row-oriented) commonly used by big data analytics frameworks like Apache Spark and Hadoop.
In this latest release, when scanning a Parquet or Avro file, we now check whether any of the Data Class Keywords match any column/field names in the file. For any that do match, we compare all the values in that column/field against the Data Class Match Patterns, ignoring the Keyword Distance value. If no keywords match column/field names, we stop scanning the file. The result is higher accuracy in classifying data within these files.
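The keyword-then-pattern approach described above can be sketched in miniature. This is an illustrative simplification, not the production scanner; the data-class shape and function name are hypothetical:

```python
# Simplified sketch of column-level scanning: match data-class keywords
# against column names first, then run the match pattern over only the
# values in matching columns.
import re

data_class = {
    "name": "US Social Security Number",
    "keywords": {"ssn", "social_security"},
    "pattern": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

def scan_columns(table, data_class):
    """table: mapping of column name -> list of values, as read from a
    columnar file (e.g. one Parquet row group)."""
    findings = {}
    for column, values in table.items():
        if column.lower() not in data_class["keywords"]:
            continue  # keyword didn't match the column name: skip it entirely
        hits = [v for v in values if data_class["pattern"].match(str(v))]
        if hits:
            findings[column] = hits
    return findings

table = {"ssn": ["123-45-6789", "n/a"], "notes": ["123-45-6789"]}
print(scan_columns(table, data_class))  # only the "ssn" column is flagged
```

Note how the SSN-shaped string in the “notes” column is deliberately ignored: requiring the column name to match a keyword first is what cuts false positives in structured files.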
Asset Groups Drag and Drop
Everyone loves Spotify, right? We do, and we’ve launched a new drag-and-drop experience for building asset groups, organized into “fixed” groups and “live” groups inspired by the Spotify interface. Fixed groups are like a playlist of a specific album or fixed set of songs; live groups are like a playlist that always has the latest releases from your favorite artist.
Adding to any group is as easy as dragging asset(s) into the group’s navigation entry.
Data Classification and Privacy Classes
We’ve expanded our data classification functionality in this release by adding 45 privacy data classes used by countries in the EU. These include GDPR-protected classes such as phone numbers, passport numbers, and taxpayer identification numbers. This builds on an extensive list that already includes other sensitive privacy classes like driver’s license IDs and other government-issued identifiers.
We are excited to continue to broaden our data classification capabilities and provide protection that better aligns with GDPR. To get started, you can use our pre-built "Privacy" data collection, or build your own data collection specific to your use case or your data handling policy. Expect to see our list of supported data classes continue to grow.
Stay Engaged and In Touch
As a reminder, we always love to hear from our customers and users. We welcome your ideas.
Open Raven Platform 1.06
Santa has arrived early with a full stocking from Open Raven, now live in all of your workspaces. Updates in this release include:
- Improved asset views and filters
- Tracing external connectivity back to your data stores
- Integrations and notifications - Webhooks, AWS EventBridge Firehose, Slack and more
Read ahead if you want to know what we will be delivering in 2021.
Improved asset views and filters
As you can see from the screenshot below, we’ve given a makeover to asset lists. We kept the best parts of the previous list experience, and added a vastly improved filtering system that makes finding your assets and building asset groups easier and more intuitive.
In the new year, we will be moving to a new user experience for building your asset groups that’s as easy as creating a playlist on your favorite music streaming service. Drag and drop assets, or use smart groups that we build automatically based on what we think you should care about. Yeah, we’ve got your back!
Trace external connectivity back to your data stores
Customers asked and we delivered. You can now click on an external connection in our 3D maps and trace back to the assets and data stores that are connected. Neat, eh? We plan to add geo lookup to those external connections in the new year, making it easy for you to see where they are and who owns the IP addresses.
Integrations - Webhooks, AWS EventBridge Firehose, Slack and more
Yes, integrations are finally here. You can now set up webhooks for anything that supports them and get notified about policy violations as they occur. In today’s release the generic webhook interface is available so you can set up integrations yourself; in the new year we’ll document and add the integration-specific headers for things like Slack, Jira, PagerDuty, ServiceNow and, well… ask and we’ll do it for you.
You can now integrate with popular ticket and IT service management systems like Jira, ServiceNow and 4ME using email or use the email group alias feature to send an email to your team’s distribution list.
This release also brings AWS EventBridge firehose integration. Pump your policy events or asset changes to a firehose to analyze them or integrate with your SIEM or SOAR.
The road ahead
It’s been a big year for us and thanks for being part of the journey. We raised a Series A from Kleiner Perkins, won awards from several industry groups, hired a truly world-class team and most importantly have you as our first customers. Thanks! In 2021, you can expect:
- A global data catalog with sensitive data search
- Perimeter attack surface discovery
- SAML and MFA
- Data classification for apps like Salesforce, Slack, ServiceNow, and over 20 more
- Data classification on structured data in RDS, Snowflake, and most non-cloud-native databases, including MongoDB, PostgreSQL, and MySQL
- Data classification on cloud file systems
- Open Raven for Google Cloud Platform (GCP)
2021 is going to be a stellar year and we look forward to working with you to stop data breaches.
GA Release of the Open Raven Cloud-Native Data Protection Platform 1.0
Today is a major milestone for Open Raven: we announced the general availability (GA) release of our commercial product, the Open Raven Cloud-Native Data Protection Platform. It represents a colossal amount of work by the Open Raven team and owes a considerable debt to the fantastic group of design partners who have been piloting the platform. If you’ve taken the time to speak with us, or especially if you’ve used the earlier versions of the platform, we greatly appreciate it and want you to know that it’s made all the difference. In a year chock full of distractions and mayhem, we certainly don’t take it for granted.
Open Raven Platform 1.0 is a giant leap beyond the Community Edition version we announced in February that focused on cloud asset and data discovery.
Some of the key product capabilities in Platform 1.0 include:
- Data Inventory - We can now not only show you where all your data is located across AWS, but we quickly inventory the files, allowing for a macOS-like experience for navigating S3 contents. We’ve tested this to billions of files thus far. Using the same serverless analysis we introduced earlier this year for fingerprinting, it’s a true differentiator vs. alternative solutions that simply can’t horizontally scale in the same fashion (and stay within a reasonable budget).
- Data Classification - Using the same Lambda-based model, Open Raven can now automatically classify a wide variety of personal, sensitive and regulated data types. At launch we’ll offer a range of defaults (PII, Healthcare, Financial, Developer Credentials, etc.) alongside the ability to customize with pattern matching, data adjacency tuning, and the ability to call validation APIs during classification to assure accuracy. Why trust when you can verify?
- Policies & Monitoring - We now allow you to harness the rich context of nearly any AWS resource, plus an in-depth knowledge of your data to monitor data protection in real-time (e.g., for data exposure, data misuse, compliance violations, etc.). Policies can be chosen from a set of defaults or built from scratch as code-based statements.
- Real-time Maps - Our new 3D maps not only received a red-hot makeover, but show real-time status and support a number of important use cases. One of our favorites is the ability to quickly visualize how data can flow across an environment - especially geographic regions - which is more important than ever given that Privacy Shield was invalidated this past July.
With Open Raven Platform 1.0, we’re frequently asked how we compare to AWS Macie, Amazon’s native data classification service for S3. We’re dramatically different from Macie. If you’re only interested in data classification, then Open Raven is better, faster, and less expensive for identifying sensitive, personal or regulated data. But most SecOps, privacy and cloud teams we talk to want more than a data classification service. They want an end-to-end operational platform, tightly integrating discovery and classification with policy-based monitoring and enforcement to proactively protect their data in AWS from breaches and leaks. And they’d rather not have to build such a platform themselves.
Open Raven key differentiators from Macie include data location (e.g., on EC2), more powerful classification, end-to-end operationalization, visualization across an entire AWS estate, as well as short-term scope beyond S3 and mid-term scope beyond Amazon (i.e., Snowflake and Google Cloud Platform in 2021). Macie’s opaque pricing model is also a sticking point for many. Our straightforward pricing model (we bill by data store w/ 10GB included per store) has been well received by our design partners.
Our design partners are using Open Raven Platform 1.0 to solve a growing list of key challenges with protecting cloud data. The use cases we see most often include:
- Find every instance of a specific type of data (especially developer secrets)
- Know when data is exposed, and when the protection doesn’t match the data type (e.g. personal, sensitive, regulated)
- Run a quick assessment to identify data at risk
- Eliminate any customer data mistakenly left behind by customer facing teams
- Monitor where data can flow and who has access, then respond to violations
- Streamline data-related audits and reporting
- Automate privacy operations
Getting Started with Open Raven - A Few Changes
This month, we quietly launched a SaaS-based free trial of Open Raven Platform 1.0. While we still offer our hybrid deployment, where we deploy directly into a customer’s cloud environment, we now lead with a SaaS-based trial, which is quicker and easier for many. We received consistent feedback that SaaS works well in most environments, though the hybrid model remains available when circumstances require it.
With the trial version readily available, we have begun to transition our perpetual free entry point (what we called Community Edition) to a full open source offering. We are hammering out the details of a clearer, more compelling open source strategy to be announced early next year, now being driven by our recently hired Open Source Director, Dave Lester.
It’s Time for a Modern, Cloud-Native Approach to Data Protection
We are consistently reassured that the timing of the Open Raven Platform 1.0 release couldn’t be better. The need for visibility and control of cloud data security and privacy is pervasive as organizations accelerate their transition to the public cloud and accumulate massive amounts of data along the way. The amount of manual work, time, and expense involved, and the resulting number of incidents, are clear indications that existing solutions, from cloud providers and security vendors alike, aren’t working.
We’ve been working since early last year to reimagine data protection and we can’t wait to help you. To map your entire cloud infrastructure in minutes. To pinpoint critical data sitting in petabyte-scale S3 buckets. To make compliance absolutely painless and even a little pretty. Drop us a line to Request a Demo today.
Open Raven Platform 0.9 and a Few Release Updates
Open Raven Professional 1.0 is another week closer to general availability and yet again more functionality has been added into this week's preview, 0.9.
This week it's mainly under-the-hood stuff, but we're still on track to be “mainly code complete”™ on November 3rd, with GA on November 17th. “Mainly code complete”™ is not my attempt at an alternative-truth joke but a wink-wink acknowledgement that some things will still be in flight right up until GA… like webhooks for Slack and PagerDuty integration, for instance… things we never originally planned to do but think are important and users deserve, so we are doing them anyway. We are just like that.
As always, just go to your cluster URL, e.g. acmecorp.openraven.net/dev, and turn on the ProPreview feature flag; as if by magic, your UI will change in front of your eyes and data classification features will appear. It's like David Copperfield, but without the hairy chest.
As well as a metric ton of bug fixes and performance improvements (see below), this release is really about adding some new, tightly QA’d data classes to find developer credentials:
- AWS secret keys
- ODBC connection strings
- JDBC connection strings
- OpenSSH keys
For OpenSSH keys, we have unique data classes developed for various formats like PPK - the same technique we are applying to X.509 certs (i.e. expired certs, types of certs, etc.). There are 50 data classes across privacy data, financial data, health data and developer credentials in development right now, and of course you can write your own. Adding your own basic regex classes is as simple as, well, writing the regex, so go wild.
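To make “as simple as writing the regex” concrete, here is a hedged sketch of what a basic regex-based data class might look like. The config shape is illustrative, not our actual format, and the AWS access key ID pattern below is an example, not an exhaustive one:

```python
# Minimal sketch of a regex-based custom data class: a name plus a
# pattern, applied to text. AWS access key IDs follow the well-known
# AKIA-prefixed shape used here as the example.
import re

custom_class = {
    "name": "AWS Access Key ID",
    "pattern": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def find_matches(text, data_class):
    """Return every substring of text that matches the data class pattern."""
    return data_class["pattern"].findall(text)

sample = "key=AKIAIOSFODNN7EXAMPLE region=us-west-2"
print(find_matches(sample, custom_class))  # ['AKIAIOSFODNN7EXAMPLE']
```

A pattern alone only tells you something looks like a credential; confirming it is one is where validation comes in.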
We have (in R&D) a validator function as the first demo of the ability to actually test that data we’ve found is real. The AWS validator takes a discovered AWS secret key and attempts to log in to your AWS, validating it’s real and determining the account the key is associated with. What, you enjoyed spending your days trying to figure out if credentials were real and what they were for? Of course not…
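Conceptually, that validator maps onto AWS STS’s GetCallerIdentity call, which succeeds only for live credentials and reports the owning account. The sketch below is our own illustration, not the R&D implementation; the client is injected so the idea is testable without real credentials (with boto3 you would pass `boto3.client("sts", aws_access_key_id=..., aws_secret_access_key=...)`):

```python
# Hypothetical sketch of an AWS key validator: call STS GetCallerIdentity
# with the discovered credentials. Success proves the key is live and
# reveals the account it belongs to; any failure means it is not usable.
def validate_aws_key(sts_client):
    try:
        identity = sts_client.get_caller_identity()
        return {"valid": True, "account": identity["Account"]}
    except Exception:
        return {"valid": False, "account": None}

class FakeSts:  # stand-in for a real STS client, for demonstration only
    def get_caller_identity(self):
        return {"Account": "123456789012",
                "Arn": "arn:aws:iam::123456789012:user/demo"}

print(validate_aws_key(FakeSts()))  # {'valid': True, 'account': '123456789012'}
```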
I mentioned last week that we're working on a data fabrication tool we plan to open source. It's been a good week on that front. You can inject data sets into documents of various sizes, including inside formats such as Word tables and charts, so that we (and you) know exactly what we can find. It will be extended for every scenario we and you can think of, with the goal of being totally transparent: you can generate test data to both check our accuracy and compare us against other tools like Amazon Macie. To be clear, we plan to eat Macie's lunch and are not afraid to talk about it, Hannibal Lecter style: “Some Macie with a side of fava beans and a nice Chianti, anyone?” Joking (not joking) aside, the truth is we know the problem we're tackling is a hard one and we won't get it all right out of the gate. But by being open, transparent, and fast as hell, you can judge our performance from day one and predict what the best long-term solution will be. I think we'll beat them out of the gate anyway, so buy your Open Raven Halloween novelty mask here!
The next release will look the most complete to date with almost all UI features we are promising for the Platform 1.0 release so get ready. And yeah, you see AWS Marketplace in the change log. More on that next week.
And if you're interested in bugs, nits and minutia (aka engineering reality), we've been working on them as well. See below, but just know we've got it covered, so get back to looking at the very pretty maps!
- When the window size is too large some columns lose alignment in list view
- Dynamic asset-groups do not currently save.
There are a bunch of bug fixes including:
- Making the dmap scheduling toggle work properly
- Fix the UI bug that TCNA hit around AWS account polling
Still behind feature flags:
- Lots of updates to Data Class including lots of new classes
- Improvements to the way we handle scanning
- Movement on policy UI
Under the hood:
- Lots of improvements to enable SaaS
- Updates to the flatcar ami to address CVEs
- Update to kube 1.19 for new clusters
#### Raw List of Changes ####
- [!129] Add AWS SecretManager discovery
- [!133] Add AWS storagegateway discovery
- [!35] Of/fix concurrent account polling
- [!37] Of/configurable batch size
- [!38] Of/scheduling
- [!30] Latest S3 Jar (404 handling, regex fix)
- [!834] style CodeEditor and use AceEditor as implementation
- [ENG-4444] (Sub-task) Validation for required fields should not use the Red Error pattern
- [!838] initial policy list integration with clone, edit, delete, create, and sort
- [!839] initial rules integration with sorting, delete, edit, create
- [!840] Reload table after saving new data scanning item
- [!841] scaffolding for Triage
- [!842] AWS Discovery should show all accounts from cross-accounts API and ES
- [!843] add validation for the policy and rules forms and adjust their mappings to schema changes
- [!844] Policy pages tweaks
- [ENG-4506] (Task) Create Pro Edition v1.0 feature flag
- [!846] Feature/ENG-4488 3D-UI Design Updates
- [!848] Epic/maps v2
- [!785] update README.md file instructions to run product UI locally to account for...
- [!849] Triage Section
- [!209] Add policy service helmfile boilerplate and add utility script to help with boilerplate
- [ENG-4442] (Story) Write violation reports and audit log entries
- [!212] spread out the cluster_type usage
- [!213] turn down logging level on OPA and add /api/policy-rule to policy ingress
- [!214] Stop the noise
- [ENG-4286] (Initiative ) Clusters provisioned this way should have a `cluster_type:saas` in Datadog
- [ENG-4297] (Initiative ) AC: Stop publishing the `admin.conf` to the cloud formation outputs
- [!36] Fix some loose ends from !35
- [!37] Update kubernetes to v1.19.3
- [!38] Update OIDC to reflect SaaS environment
- [!39] Make DmapStack optional and fix a sentry kaboom
- [!55] Changes required to support cluster_type=saas
- [ENG-4302] (Initiative ) Product Changes: Update clusters to 1.19
- [!57] Initial bare-bones SaaS stack
- [!58] Apply ACM by accountId
- [!59] Flatcar 2605 7 0
- [!60] Update saas with new AMIs, fix patch process
- [ENG-4457] (Sub-task) Classifier Loading [S3 Scan Service]
- [!24] ENG-4483 : Clean up gitlab-ci.yml for s3-scan-service
- [!25] prevent sort errors on indexes that haven't been mapped
- [!26] Removed possibly troublesome DataClasses and added more credentials
- [!27] Of/fix idempotence check
- [!28] Of/fix object total size
Open Raven Platform 0.8 and a Few Related Updates
In an era of general doom and gloom, I bring you some good news to hopefully brighten up your week. The Open Raven Platform is another week closer to general availability (GA), and new functionality has been added in this week's release (0.8). Each Tuesday we'll be pushing features, bug fixes and tweaks; we plan to be code complete on November 3rd, with GA on November 17th.
I'll be posting regular blog updates until we go GA, but I invite you to be "in da club" and try out these features now. Just go to your cluster URL, i.e. acmecorp.openraven.net/dev, and you will see the week's feature flags. Toggle them on and off and have a play. It's that simple. If you want a guided tour or help, just contact us or mail us (email@example.com) and we’ll hop on a Zoom together. And yes, I did see the news this week, and no, it won't be that kind of Zoom. Next week, I’m threatening to include a video narrating some features, so get ready (or beware).
The first thing you will be able to play with is Data Classification. Cheaper, faster and better than AWS Macie. There, I said it. Sorry, AWS marketing, but it's true. Open Raven deploys as an AWS Lambda function, so “it scales”. It's better because you can customize data collections and data classes, the data matching is much stronger, and you can actually validate that the data is real using the validation API instead of just matching a pattern. Think about finding an AWS key and using the validation function to tell you which account it was for. Oh yeah!
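To make the key-validation idea concrete, here is a minimal sketch of the two steps: a regex that spots strings shaped like AWS access key IDs, followed by a (commented-out) lookup of the owning account via the real AWS STS `GetAccessKeyInfo` API. The regex and function names are illustrative, not Open Raven's actual implementation:

```python
import re

# AWS access key IDs start with AKIA (long-term) or ASIA (temporary)
# followed by 16 uppercase alphanumerics. Pattern is illustrative only.
AWS_ACCESS_KEY_RE = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

def find_aws_keys(text: str) -> list:
    """Return all substrings that look like AWS access key IDs."""
    return AWS_ACCESS_KEY_RE.findall(text)

# Validation step (needs AWS credentials, so shown as a comment):
# import boto3
# account = boto3.client("sts").get_access_key_info(AccessKeyId=key)["Account"]

sample = "config: aws_key=AKIAIOSFODNN7EXAMPLE token=xyz"
print(find_aws_keys(sample))  # ['AKIAIOSFODNN7EXAMPLE']
```

Pattern matching alone tells you something key-shaped is in a file; the validation call is what turns that into "this is a live key for account 1234…".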
In this week's release, we have a small number of data classes (names, addresses, SSNs) in the general PII category, and we will ship a full suite of PII, PHI, Financial and Credentials data classes for US and UK geos at GA. Healthcare and some credentials are already in QA. Support for text (including JSON, etc.) and office files is in the build now; compressed files and all the other formats Macie supports are coming in the next few weeks, with Apache Parquet and Avro files as a very fast follow. When we have Parquet, you will be able to look at big data from Spark, Amazon EMR, etc., and if you want to investigate Amazon AppFlow, you can pipe Slack, SFDC, logs, etc. into an S3 bucket and classify the data from your SaaS apps or app log files. Cool, eh? Yes, it is. We even know how to open images, do OCR with tesseract and classify the contents - a later build, but we'll do it. No more dumps of scanned credit cards, health records or employee files on your watch!
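At its simplest, a pattern-based data class is a named regex applied to file contents. The sketch below is a toy version of that idea; the class names and patterns are invented for illustration, and the real classifiers are far richer (and, as noted above, backed by validation):

```python
import re

# Toy data-class registry: name -> pattern. Illustrative only.
DATA_CLASSES = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def classify(text: str) -> list:
    """Return the names of data classes whose pattern appears in text."""
    return sorted(name for name, pat in DATA_CLASSES.items() if pat.search(text))

print(classify("Contact: jane@example.com, SSN 123-45-6789"))  # ['email', 'us_ssn']
```

A customizable registry like this is what lets you add your own data classes rather than being stuck with a fixed vendor list.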
Talking of data, we have the boffins (British phrase for propeller heads – US equivalent is geeks) working on a tool to generate very large data sets so you can test Open Raven. I’ll update you next week on progress, and if you are interested in getting your hands on a load of fake (but very realistic) data, let me know. Why there has never been an open-source project to create realistic fake data to test with, instead of trying to anonymize customer records, is beyond me. Another winter project, I guess.
Our engineering team (who really are amazing) have also added a feature that some people said wasn’t possible. You miss 100% of the shots on goal you never take, right? Now, when we initially analyze the buckets to see what files match the criteria we want to classify, we index every single file into an Elastic index. This means you can now search for files across your S3 fleet just like you would search from your laptop file finder: search for file duplicates, files of a certain type or name, and even of a specific size. And here is the cool part: it supports up to 1 trillion files! You can also see how much data you have, how much is in what type of files, what locations, etc. We'll wrap that up into a pretty report or visualization for you as soon as we are code complete, but it's super cool and very useful. And as if that isn't enough, I have added a teaser of the Live 3D maps that will ship with Platform 1.0. Give them a look. What started as a vanity project has become truly useful. You can see the security groups, peering relationships and account connectivity as well as security policy violations live as they happen. Instantly see all external connections to your AWS. SimCity for data security. Reserve some space on your SOC wall now!
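With every file in an Elastic index, a "find files of a certain type and size" search is just a filtered query. Here is a hedged sketch of what such a query might look like; the index name (`s3objects`) and field names (`extension`, `size_bytes`) are assumptions, not Open Raven's actual schema:

```python
def build_file_query(extension: str, min_size: int, max_size: int) -> dict:
    """Build an Elasticsearch bool/filter query for files of a given
    extension within a size range (field names are illustrative)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"extension": extension}},
                    {"range": {"size_bytes": {"gte": min_size, "lte": max_size}}},
                ]
            }
        }
    }

# Parquet files between 10 MB and 1 GB:
query = build_file_query("parquet", 10_000_000, 1_000_000_000)
# then e.g. client.search(index="s3objects", body=query) with the
# elasticsearch Python client (client setup omitted here)
```

Because these are filter clauses rather than scored queries, Elasticsearch can cache them, which is part of what makes fleet-wide file search fast at this scale.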
Finally, based on feedback from our users, we are moving to a pure SaaS model where Open Raven hosts the software for you. We will still support the current VPC model for customers that really need it, but this will make deployment and management easier for everyone. If you want us to convert your current deployment to the new SaaS hosting, just let us know. All new trials from next Tuesday will default to SaaS. And if you’re interested in bugs, nits and minutiae (aka engineering reality), we've been working on those as well. See below, but just know we've got it covered, so get back to looking at the very pretty maps!
- [ENG-4427] (Improvement) Enable single run discovery (start, discovery, shutdown)
- [!130] Add Amazon ElastiCache discovery
- [!131] Of/index s3objects
- [ENG-4229] (Improvement) Add quiet period to DMAP when scheduling requests
- [ENG-4161] (Story) Update audit log with scan status
- [!27] Enable s3 message listening by default
- [!828] Clear search on "clone", "edit", "delete" and "create"
- [!809] Data Analysis Data Classes edit / create
- [!812] Revert "Merge branch 'temporarily-remove-data-scanner-page' into 'master'"
- [!813] add prod env var for asset groups url
- [!814] fix cron parse and fix groups table filter on null
- [!816] Data analysis mr items
- [ENG-4388] (Task) Accounts flow minor updates
- [!818] close icon for modal
- [ENG-4385] (Task) Telemetry for connected accounts flow
- [!820] Redirect to Kibana console instead of Kibana home
- [!821] fast follows for Data Analysis
- [!822] fix audit log ScanName sort
- [!823] fix explore header actions delete button for asset groups
- [!824] fix capitalization on config page
- [!793] Asset groups v2 integration
- [!825] remove feature flags
- [!827] ENG-4423 - fix pagination change on explore list view
- [!826] temporarily fix DMAP assets in 3d mode
- [!201] make asset groups a released service instead of an unreleased one
- [!203] S3 bucket object scanning
- [!205] Add default profile
- [!206] Manual bump of AWS for failed automation
- [!207] Of/remove feature flag
- [ENG-3836] (Story) Add support for issuing s3 scheduling requests
- [ENG-4295] (Initiative) AC: Implement SSM for SaaS clusters
- [!21] Enable s3 by default
- [!19] Add missing property in prod profile
- [!13] Of/add elastic scheduling support