Securing Personal Data in Motion

A solution to discover, secure and govern personally identifiable information (PII) while “in flight”, that is, as it arrives from a batch or streaming data source or moves between compute platforms, has been launched by StreamSets Inc. Built with data privacy regulations in mind, StreamSets Data Protector is designed to reduce the risk of expensive and embarrassing violations.

Until now, solutions for handling personal data have relied on “after the fact” scanning of data stores, which, while valuable, can only discover sensitive data once it has landed and potentially already been shared. Companies are missing the opportunity to encrypt, mask, generalise or discard personal data as it arrives rather than storing it in the clear.

StreamSets Data Protector extends protection to the point of initial data ingestion, leveraging Dataflow Sensors that are part of StreamSets Data Collector. These sensors discover PII by comparing incoming data either to built-in patterns, such as national ID, tax ID, driver license, bank account and credit card numbers or IP addresses, or to additional patterns created by the customer.
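The pattern-matching idea described above can be illustrated with a minimal Python sketch. This is not StreamSets code or its API: the pattern names, the regular expressions and the `detect_pii` function are hypothetical and deliberately simplified, and real identifiers would need stricter validation (for example, checksum tests on card numbers).

```python
import re

# Hypothetical, simplified patterns for illustration only.
# These are NOT the built-in patterns shipped with any product.
PII_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "ipv4_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect_pii(record):
    """Return {field_name: [pattern_names]} for each string field that matches."""
    findings = {}
    for field, value in record.items():
        if not isinstance(value, str):
            continue  # only string values are scanned in this sketch
        matched = [
            name for name, pattern in PII_PATTERNS.items() if pattern.search(value)
        ]
        if matched:
            findings[field] = matched
    return findings


record = {
    "name": "Alice",
    "note": "card 4111 1111 1111 1111",
    "client_ip": "192.168.0.1",
    "age": 30,
}
print(detect_pii(record))
```

Because the sketch scans every field by value rather than by field name, a field that appears unexpectedly in the incoming data would still be checked, which is the kind of behaviour that matters when the structure of a source changes.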

Without the automation StreamSets provides, laborious hand-coding is required to continuously check each data source against dozens or hundreds of PII patterns. Hand-coding becomes unworkable as unstructured data and data drift (unexpected changes to the structure and semantics of the incoming data) become more common.

“Data protection is crucial in today’s increasingly regulated environment, where numerous rulesets apply and violations bring heavy fines and the potential of brand damage,” said Girish Pancha, CEO of StreamSets.

“Current solutions are insufficient, as they only deal with data after it has already landed, and are blind to data drift that can add new PII to the mix. StreamSets Data Protector closes this compliance gap by extending policy-based control over sensitive data out to the point of ingestion, while gracefully handling data drift whenever it occurs.”

StreamSets Data Protector gives enterprises an automatic, centralised and drift-resistant way to implement data protection policies across all inbound pipelines. Its key capabilities are to discover sensitive data, secure it “in flight” and provide centralised governance to ensure continuous policy compliance:

  • Discover: Dataflow Sensors detect sensitive data as it arrives. Incoming data is checked against hundreds of built-in identifiers or patterns defined in enterprise data catalogues. Enterprises can also customise protection by designing their own identifiers.
  • Secure: Once sensitive data is detected, processors can apply standardised operations such as reversible or irreversible obfuscation algorithms, and can also route, filter, quarantine or alert on the data.
  • Govern: Enterprise-wide policies are centrally managed and applied to pipelines, while audit reports trace where personal data came from and how it has been handled. Data Protector also introduces Security Zones, which allow security architects to design defence-in-depth strategies around data. It complements data governance solutions for data at rest, integrating with catalogues such as Alation, Apache Atlas, Cloudera Navigator, IBM Information Governance and Waterline Data.
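As a rough illustration of the secure and govern steps listed above, the following Python sketch applies a per-category policy to a record and keeps a simple audit trail. Everything here is an assumption made for the example: the `POLICY` table, the category names, the `SECRET_KEY` placeholder and the function names are hypothetical and do not reflect the product's actual configuration or interfaces. Only masking and irreversible hashing are shown; reversible obfuscation would typically rely on an encryption library and is left out.

```python
import hashlib
import hmac

# Hypothetical policy table: detected category -> action (illustrative names only).
POLICY = {
    "credit_card": "mask",
    "email": "hash",
    "us_ssn": "drop",
}

# Assumption: in practice the key would come from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"


def mask(value, visible=4):
    """Keep the last `visible` characters and replace the rest with '*'."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def irreversible_hash(value):
    """Keyed SHA-256 digest: stable for the same input, not reversible."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def apply_policy(record, findings):
    """Return (protected_record, audit_log) for {field: category} findings."""
    protected = dict(record)  # copy so the original record is left untouched
    audit = []
    for field, category in findings.items():
        # Categories with no explicit rule are quarantined by default.
        action = POLICY.get(category, "quarantine")
        if action == "mask":
            protected[field] = mask(str(protected[field]))
        elif action == "hash":
            protected[field] = irreversible_hash(str(protected[field]))
        elif action == "drop":
            protected.pop(field, None)
        else:
            protected["_quarantined"] = True  # flag the record for review
        audit.append({"field": field, "category": category, "action": action})
    return protected, audit
```

Keeping the policy in one table, separate from the code that applies it, mirrors the idea of centrally managed policies, and the audit list records which action was taken on which field.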

“We’re excited to continue to work with StreamSets to deliver Cloudera’s industry-leading modern platform for machine learning and analytics optimized for the cloud,” said Eddie Garcia, chief security officer at Cloudera.

“StreamSets Data Protector is yet another layer of defence, helping companies build robust dataflow pipelines that immediately detect and secure sensitive information to ensure it doesn’t get into the wrong hands. StreamSets’ direct integration with Cloudera Navigator uniquely enables us to deliver comprehensive, secure and compliant architectures required for meeting a wide range of regulations, including GDPR.”

www.streamsets.com