Author: Zeashan Pappa, Senior Solutions Architect, Privacera
We are excited to announce our newest partnership with StreamSets!
As the focal point of this partnership, we’ve introduced the Privacera-StreamSets connector, which encrypts and decrypts data in StreamSets ETL (extract, transform, load) pipelines to help joint customers harden data security at each stage of the data processing lifecycle. Whether customers are migrating their data to the cloud or accessing it for data processing, Privacera ensures data is accessible only to authorized users, decreasing the risk of data breaches and compliance or privacy violations.
Tightly integrated with the StreamSets DataOps Platform, Privacera’s Crypto Processor provides powerful format-preserving encryption algorithms for column- and key-level data security. Data is protected in motion as it passes through a pipeline, and its source format is preserved. Leveraging Apache Ranger’s KMS (key management service), Privacera gives joint customers centralized key management, externalization of master keys, and the ability to author policies on those keys, in order to meet regulatory or risk requirements and ensure sensitive PII/PHI/PCI data stays protected.
One of the most common use cases for Privacera’s Crypto Processor is encrypting sensitive data as it moves from on-premises networks to the cloud.
Privacera uses schemes, each a combination of an algorithm, a format, and an encryption key, that can be applied organization-wide or more granularly to encrypt or decrypt data, depending on business needs. Privacera supports several algorithms for schemes, such as standard 256-bit AES and FPE (format-preserving encryption), with various system schemes included out of the box.
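To make the idea of a scheme concrete, here is a minimal Python sketch. It is an illustration only: the `Scheme` class, its field names, and `toy_transform` are hypothetical and are not Privacera's API, and the digit-shifting routine is an insecure stand-in for a real FPE algorithm. It shows only what "format-preserving" means in practice: the output keeps the same length and separators as the input.

```python
import hashlib
from dataclasses import dataclass

DIGITS = "0123456789"


@dataclass(frozen=True)
class Scheme:
    """Hypothetical model of a scheme: algorithm + format + key."""
    name: str
    algorithm: str  # e.g. "FPE" or "AES-256" (labels only in this sketch)
    fmt: str        # e.g. "numeric"
    key: bytes


def _digit_shifts(key, count):
    """Derive `count` pseudo-random digit shifts from the key (toy keystream)."""
    shifts = []
    counter = 0
    while len(shifts) < count:
        block = hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        shifts.extend(b % 10 for b in block)
        counter += 1
    return shifts[:count]


def toy_transform(scheme, value, decrypt=False):
    """Shift each ASCII digit by a key-derived amount; leave other characters alone.

    NOT secure and NOT a real FPE algorithm; it only demonstrates that
    length and punctuation survive the transformation.
    """
    count = sum(ch in DIGITS for ch in value)
    shifts = iter(_digit_shifts(scheme.key, count))
    sign = -1 if decrypt else 1
    out = []
    for ch in value:
        if ch in DIGITS:
            out.append(str((int(ch) + sign * next(shifts)) % 10))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    scheme = Scheme(name="DEMO_SSN", algorithm="FPE", fmt="numeric", key=b"demo-key")
    protected = toy_transform(scheme, "123-45-6789")
    print(protected)
    print(toy_transform(scheme, protected, decrypt=True))
```

Because the protected value still looks like the original (digits and dashes in the same positions), downstream systems that validate the field's format can continue to accept it.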
The Privacera-StreamSets integration extends the default processor options available within the StreamSets DataOps Platform, enabling users to drag and drop the Privacera Crypto Processor into pipelines like any other processor and, once it is configured, have it operate on data.
Configuration can be done through the configuration screen of the Privacera Crypto Processor, allowing key fields to be entered, such as the location of the Crypto Config, which dictates the path to the Privacera environment from which users will pull scheme information.
Schemes are accessed by the Privacera Crypto Processor within StreamSets through RPC calls made to Privacera. Schemes are mapped to columns through the use of metadata files, or CSVs. Once a scheme is mapped to a column, that column is encrypted or decrypted as data moves through the pipeline.
Once the processor is configured, users can preview pipelines to see the encrypted or decrypted results. (It is important to note that encryption within the Privacera Crypto Processor happens directly on the StreamSets DataOps Platform worker nodes, so the ability to protect data scales directly with the number of worker nodes.)
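The column-to-scheme mapping can be pictured with a short Python sketch. Everything here is assumed for illustration: the two-column CSV layout (`column,scheme`), the scheme names, and the placeholder transforms (string reversal and ROT13) are hypothetical and stand in for the real schemes that the processor retrieves from Privacera. The sketch shows only the mechanics: mapped columns are transformed and unmapped columns pass through untouched.

```python
import codecs
import csv
import io

# Hypothetical mapping file: one row per protected column.
MAPPING_CSV = """column,scheme
ssn,TOY_REVERSE
email,TOY_ROT13
"""

# Placeholder transforms keyed by hypothetical scheme names.
# These are NOT encryption; they only mark which columns were touched.
PLACEHOLDER_SCHEMES = {
    "TOY_REVERSE": lambda value: value[::-1],
    "TOY_ROT13": lambda value: codecs.encode(value, "rot_13"),
}


def load_mapping(text):
    """Parse the CSV text into a {column_name: scheme_name} dict."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["column"]: row["scheme"] for row in reader}


def protect_record(record, mapping):
    """Return a copy of the record with each mapped column transformed."""
    out = dict(record)
    for column, scheme_name in mapping.items():
        if column in out:
            out[column] = PLACEHOLDER_SCHEMES[scheme_name](out[column])
    return out


if __name__ == "__main__":
    mapping = load_mapping(MAPPING_CSV)
    record = {"id": "1", "ssn": "123-45-6789", "email": "a@b.com"}
    print(protect_record(record, mapping))
```

Keeping the mapping in a separate file means the set of protected columns can change without editing the pipeline itself, which is consistent with the metadata-driven approach described above.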
The Privacera Crypto Processor provides a powerful, centralized way to encrypt sensitive data at scale as it moves through the data pipelines in the StreamSets DataOps Platform. Once landed, data can be decrypted and used by other connected products within the Privacera ecosystem, such as Databricks, Hive, EMR, or Snowflake.
Privacera also offers a powerful, horizontally scalable encryption-as-a-service product, Privacera Encryption Gateway (PEG), which allows users to integrate Privacera’s encryption capabilities into virtually any application through bulk and transactional RESTful APIs and provided client libraries.