Keep compliant and carry on handling sensitive data in GenAI apps

Thursday, August 29, 2024 - 15:44

by Susanne Richter-Wills

Remember the not-too-distant past, when AI was less a utility and more of a buzzword about the future of tech? Well, that future is now, and AI has become an integral part of operations in nearly every industry. As with all innovation, adopting this new tech comes with its own headaches.

This is especially true for industries dealing with sensitive data: healthcare, insurance, and financial services are where the stakes are highest. The kinds of protected data go by different names, such as Personally Identifiable Information (PII) or Protected Health Information (PHI), but the care you must take when using them with AI applications is the same.

It’s perhaps a little ironic that some of the industries that benefit most from AI are the ones most likely to hold sensitive data. Doctors are freed from cumbersome but necessary note-taking, insurance companies can process some claims in a fraction of the usual time, and banks can significantly speed up loan applications.

In all these scenarios, the extra time can result in happier customers/patients and improved outcomes across the board. However, these benefits come with significant responsibilities, particularly in ensuring the privacy and security of sensitive data.

Feeding sensitive data to traditional software is one thing: there are established compliance guidelines for how users can interact with the data and how it must be stored securely.

GenAI applications are another beast altogether: unless explicitly instructed otherwise, the software has some autonomy in how it handles sensitive information. This is why companies need to take extra care with PHI and PII when evaluating GenAI solutions and services.

Common GenAI pitfalls to avoid:

1. Don’t share data without proper safeguards - Software as a service (SaaS) is so prevalent that integrating GenAI applications into your operations usually means sharing data with third-party providers or vendors. However, doing so without ensuring that these parties have robust data protection measures and legal agreements in place can have serious consequences. Unauthorized data sharing can result in data breaches, legal penalties, and damage to your organization’s reputation. Ask your vendors whether they hold the relevant certifications, and don’t be afraid to research whether they’ve had violations in the past.

2. Beware of inadequate anonymization - Removing direct identifiers from data may seem like a straightforward way to protect privacy, but it’s often insufficient. Inadequately anonymized data can be re-identified, especially when processed by GenAI systems that excel at finding patterns in large datasets. To mitigate this risk, work with vendors who are experts in data anonymization. Ask about their methods, such as how they ensure data is accurately redacted, and pay attention to the answer. Is it all about their software? That’s a red flag. Without a human to verify redaction accuracy, there’s no way to guarantee that data has been properly anonymized. Also ask about their rate of over-redaction, since redacting too much unnecessarily compromises the utility of your data.

3. Non-compliance with data minimization principles - GDPR requires organizations to collect and process only the data necessary for a specified purpose. Ignoring this principle not only increases the risk of data breaches but also means non-compliance with regulations, which can result in substantial fines. Before deploying GenAI applications, conduct a thorough assessment to determine the minimum amount of data required. Over-collecting data might seem beneficial at first, but it exposes your organization to greater risks and legal challenges.

4. Ignoring retention requirements - Compliance with data retention policies is crucial when dealing with PII and PHI. Retaining sensitive data longer than necessary not only violates GDPR and other regulations but also increases the risk of data breaches. It’s essential to adhere to strict retention schedules and ensure that data is securely deleted once it is no longer needed. Partner with vendors who understand and comply with retention requirements.
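Two of these pitfalls, over-collection and inadequate anonymization, can be partly checked in code before any data leaves your systems. The Python sketch below is illustrative only: it assumes records held as a list of dicts, and the `ALLOWED_FIELDS` allow-list, the field names, the `claims_triage` purpose, the threshold `k`, and the sample records are all hypothetical. It first drops every field the stated purpose does not need, then applies a basic k-anonymity test, flagging any combination of the remaining quasi-identifiers (here, age band plus postcode district) shared by fewer than `k` records. Passing this check does not prove data is anonymous; it only catches the most obvious re-identification risks.

```python
from collections import Counter

# Hypothetical allow-list: the only fields each processing purpose may use.
ALLOWED_FIELDS = {
    "claims_triage": {"age_band", "postcode_district", "claim_type"},
}


def minimize(records, purpose):
    """Keep only the fields the stated purpose needs (data minimization)."""
    allowed = ALLOWED_FIELDS[purpose]
    return [{k: v for k, v in r.items() if k in allowed} for r in records]


def risky_groups(records, quasi_identifiers, k):
    """Return quasi-identifier combinations shared by fewer than k records.

    Each such combination could single out an individual when linked with
    outside data, even though direct identifiers were removed.
    """
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


if __name__ == "__main__":
    # Made-up sample records for illustration.
    raw = [
        {"name": "A. Jones", "age_band": "30-39", "postcode_district": "SW1", "claim_type": "dental"},
        {"name": "B. Smith", "age_band": "30-39", "postcode_district": "SW1", "claim_type": "optical"},
        {"name": "C. Brown", "age_band": "70-79", "postcode_district": "EH3", "claim_type": "dental"},
    ]
    shared = minimize(raw, "claims_triage")  # "name" is dropped
    print(risky_groups(shared, ["age_band", "postcode_district"], k=2))
    # {('70-79', 'EH3'): 1}
```

In the example, the single 70-79/EH3 record is flagged: even with the name removed, that combination points to one person. Common responses are to generalize the values (wider age bands, shorter postcodes) or suppress the record before sharing.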

Best practices for secure data handling in GenAI applications:

1. Choose vendors with strong data security protections - When selecting a vendor for your GenAI applications, prioritize those who offer comprehensive data security measures. Key features to look for include:

Snippeting/microtasking: This technique involves breaking down data into smaller, manageable pieces that are processed individually, reducing the risk of exposing sensitive information. It’s a crucial feature of our own service and ensures pure anonymity.

Secure automated classification: Automated classification is an essential part of any large-scale document processing operation, but requiring that someone view a document in its entirety (as is needed to classify a document) is a security risk. Replace sensitive data with synthetic labelled data before classification, however, and you protect data privacy without negatively impacting classification accuracy.

Certifications and compliance: We touched on this in the pitfalls section above, but it’s important enough to bear repeating: make sure your vendor holds the necessary certifications and can demonstrate compliance with the relevant standards and regulations, such as ISO standards, GDPR, and HIPAA. A certification demonstrates a commitment to high standards of data security and compliance, and it means an independent certifying authority vouches for the vendor.

2. Ensure precision in data handling - Precision is crucial when processing sensitive data in GenAI applications. Whether it’s redacting PII or adhering to data minimization principles, accuracy is key to maintaining compliance and security.

This goes beyond getting it right: vendors also need to work only with the data that is necessary (see the data minimization pitfall above). Extracting too much or too little can lead to significant problems, so accurate redaction and document processing are essential to safeguarding sensitive information.

3. Work with vendors who act as data pass-throughs - Data retention poses a significant challenge when handling sensitive information. If there’s no reason for your service provider to be holding onto the data, it’s your responsibility to make sure they don’t. Retaining PII or PHI longer than necessary not only violates GDPR and other regulations but also increases your organization’s risk. It’s crucial to collaborate with vendors who act solely as data pass-throughs, meaning they process but do not retain your data. This practice significantly reduces the risk of data breaches and ensures compliance with retention policies.
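The secure-classification practice described above, replacing sensitive values with synthetic labels before a document is classified, can be sketched as follows. This is a minimal illustration, not any vendor’s implementation: it assumes only email addresses and phone numbers need masking, detects them with deliberately simple regular expressions, and uses a hypothetical `pseudonymize` function and `[EMAIL_1]`-style label format. Real PII detection needs far broader coverage (names, addresses, policy and account numbers) and, as noted earlier, human verification.

```python
import re

# Deliberately simple, illustrative patterns (an assumption, not production-grade
# PII detection): they will miss many formats and may over-match long numbers.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d ()-]{7,}\d"),
}


def pseudonymize(text):
    """Replace detected values with synthetic labels such as [EMAIL_1].

    Returns the labelled text plus a value-to-label mapping that should stay
    in-house. The same value always receives the same label.
    """
    mapping = {}   # real value -> synthetic label
    counters = {}  # label kind -> number of distinct values seen so far

    def label_for(kind, value):
        if value not in mapping:
            counters[kind] = counters.get(kind, 0) + 1
            mapping[value] = f"[{kind}_{counters[kind]}]"
        return mapping[value]

    # Patterns run in dict order, so emails are masked before phone numbers.
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(lambda m, kind=kind: label_for(kind, m.group(0)), text)
    return text, mapping


if __name__ == "__main__":
    doc = ("Email jane.doe@example.com or call +44 20 7946 0958. "
           "Cc jane.doe@example.com and bob@example.org")
    labelled, mapping = pseudonymize(doc)
    print(labelled)
    # Email [EMAIL_1] or call [PHONE_1]. Cc [EMAIL_1] and [EMAIL_2]
```

The value-to-label mapping stays on your side, so a classifier or pass-through vendor only ever sees the labelled text. Because repeated values receive the same label, the document keeps structural cues a classifier may rely on, such as the same contact appearing twice.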

Securely integrating GenAI into compliance-heavy businesses is a top priority for us, and we know that the dos and don’ts outlined in this post are just a starting point. Whether you operate in healthcare, insurance, or financial services, the proper handling of PII, PHI, and GDPR-regulated data is critical. As GenAI continues to evolve, so must our approaches to data protection. For your organization to reap the benefits of GenAI while maintaining compliance and protecting sensitive information, a deeper conversation is required.

Susanne Richter-Wills is VP Partnerships EMEA at ScaleHub
