
In previous blogs, we have discussed various privacy and AI laws that create restrictions on the processing of personal information. Today, we discuss de-identified data, which is generally exempt from data processing restrictions under the various privacy laws. De-identified data is simply data that no longer contains a personal identifier that would link it to an individual.
De-identification of data may be your best bet to hedge against liability that stems from processing personal data. As noted, de-identified data is usually carved out as an exception under privacy laws. For example, California, Connecticut, Colorado, Texas, Virginia, and Utah, to name a few, state that “personal information does not include information that is deidentified.” In addition, federal laws like the Family Educational Rights and Privacy Act (FERPA) and the Health Insurance Portability and Accountability Act (HIPAA) provide similar exceptions for de-identified data.
So, what is de-identified data and how does an organization de-identify it?
Defining “De-Identified Data”
First, we must take a step back and understand how various jurisdictions define de-identified data in order to identify whether any gaps exist across state lines or even internationally. For this, we will look at high-water-mark state legislation (collectively, the “Acts”) in the privacy sphere, namely:
- California Consumer Privacy Act/California Privacy Rights Act (CCPA/CPRA)
  - “Deidentified” means information that cannot reasonably be used to infer information about, or otherwise be linked to, a particular consumer, provided that the business that possesses the information meets the obligations described below.
- Colorado Privacy Act (CPA)
  - “De-identified data” means data that cannot reasonably be used to infer information about, or otherwise be linked to, an identified or identifiable individual, or a device linked to such an individual, if the controller that possesses the data meets the obligations described below.
- Connecticut Act Concerning Personal Data Privacy & Online Monitoring (CTDPA)
  - “De-identified data” means data that cannot reasonably be used to infer information about, or otherwise be linked to, an identified or identifiable individual, or a device linked to such individual, if the controller that possesses such data meets the obligations described below.
- Texas Data Privacy and Security Act (TDPSA)
  - “Deidentified data” means data that cannot reasonably be linked to an identified or identifiable individual, or a device linked to that individual.
- Virginia Consumer Data Protection Act (VCDPA)
  - “De-identified data” means data that cannot reasonably be linked to an identified or identifiable natural person, or a device linked to such person.
- Utah Consumer Privacy Act (UCPA)
  - “Deidentified data” means data that cannot reasonably be linked to an identified individual or an identifiable individual.
For the most part, de-identified data is defined the same way across state lines. One distinction to note immediately is that the Colorado, Connecticut, Texas, and Virginia definitions also cover data that cannot reasonably be linked to a “device” that is linked to a person.
Obligations for Possession of De-Identified Data
The Acts also impose three obligations on an entity that wishes to keep or process de-identified data: (a) take reasonable measures to ensure that the information cannot be associated with a person; (b) publicly commit to process the data only in de-identified form and not to attempt to re-identify it; and (c) contractually obligate any recipients of the data to comply with the same requirements. The CCPA/CPRA is the only Act with a significant distinction: in addition to ensuring that information cannot be associated with a person, it requires a business to ensure that the data cannot be associated with a “household,” defined as a group of cohabitants who share common devices or services, thus expanding the zone of protection.
Standard of De-Identifying Data
The Acts, unfortunately, do not specify how to de-identify data. The key to understanding how to de-identify comes down to the organization’s commitment “not to attempt to re-identify data.” So the ultimate question is: how can an organization ensure that data cannot be re-identified? The answer may lie in the guidelines set forth in the HIPAA Privacy Rule at 45 C.F.R. § 164.514(b).
45 C.F.R. § 164.514(b) lays out “implementation specifications: requirements for de-identification of protected health information.” The first method is the Expert Determination Method, which calls for a qualified expert to apply generally accepted statistical and scientific principles and methods to determine whether a covered entity has sufficiently de-identified information. This method is fairly unpopular because its specifications are vague and broad, and because retaining an expert for their time and analysis can be costly. The second, more common method is the Safe Harbor Method. The Safe Harbor Method is the preferred choice because it explicitly lists 18 categories of identifiers that must be removed for data to be considered “de-identified.” In our next blog, we will discuss these 18 categories in more detail, including a couple of studies that have statistically analyzed the results of de-identification and attempts to re-identify data.
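To make the Safe Harbor idea concrete, here is a minimal Python sketch of what removing identifiers from a single record might look like. It is an illustration only, not a compliance tool: the field names (name, zip_code, birth_date, and so on) are hypothetical, the field list does not cover all 18 categories, and the sketch skips details of the rule such as the population threshold for three-digit ZIP codes and the handling of identifiers buried in free-text fields.

```python
from datetime import date

# Hypothetical field names for direct identifiers. A real schema will differ,
# and this set does not cover every Safe Harbor category.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "email", "phone", "fax", "ssn", "medical_record_number",
    "health_plan_number", "account_number", "license_number",
    "vehicle_id", "device_id", "url", "ip_address", "photo",
}


def deidentify(record: dict) -> dict:
    """Return a copy of record with direct identifiers dropped and
    dates, ZIP codes, and high ages generalized."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIER_FIELDS:
            continue  # drop the identifier entirely
        if isinstance(value, date):
            cleaned[field] = value.year  # keep only the year of any date
        elif field == "zip_code":
            # Keep the first three digits (population check omitted here).
            cleaned[field] = str(value)[:3]
        elif field == "age":
            # Ages of 90 and above are grouped into a single category.
            cleaned[field] = "90+" if value >= 90 else value
        else:
            cleaned[field] = value
    return cleaned


# Example with made-up data.
patient = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "zip_code": "94110",
    "birth_date": date(1980, 5, 17),
    "age": 44,
    "diagnosis": "asthma",
}
print(deidentify(patient))
```

Dropping and generalizing fields in this way is only part of the picture. Safe Harbor also requires that the entity have no actual knowledge that the remaining information could be used to identify an individual, so a script like this is a starting point for review rather than a guarantee.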
Concluding Considerations
If sensitive personal information, which includes health and children’s data, is generally considered the most sensitive category of personal information, then it should follow that the HIPAA Safe Harbor Method would be the ideal method not only to de-identify data, but also to keep the data de-identified. As a preview of our next blog, the limit of HIPAA’s Safe Harbor Method is that it may only apply in the United States. For international entities, particularly those operating in Europe, HIPAA may not be enough. Regardless, the benefits of de-identification cannot be overstated. Even after de-identification, the data that remains is still a valuable source for processing and even potentially for training LLMs, all while reducing the liability an entity carries under U.S. privacy laws.