HIPAA De-identification: Safe Harbor vs Expert Determination Methods
De-identification is the one place in HIPAA where protected health information can legally stop being protected. Once data is de-identified under the Privacy Rule, it is no longer PHI and the Rule’s restrictions fall away, which is exactly why the standard for getting there is precise and unforgiving. What makes this topic distinct is that the stakes are binary: do it correctly and you can share or analyze data freely; do it loosely and you have an unauthorized disclosure of PHI dressed up as anonymized data.
The two methods, and only two
45 CFR § 164.514(b) defines the two acceptable paths. Safe Harbor requires removing 18 categories of identifiers: names; geographic subdivisions smaller than a state (with specific ZIP code rules tied to population); all elements of dates other than year for dates directly related to an individual, plus ages over 89, which must be aggregated into a single category of 90 or older; telephone and fax numbers; email addresses; Social Security, medical record, health plan beneficiary, and account numbers; certificate and license numbers; vehicle and device identifiers and serial numbers; URLs and IP addresses; biometric identifiers such as finger and voice prints; full-face photographs and comparable images; and any other unique identifying number, characteristic, or code. Critically, the covered entity must also have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify the individual.
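To make a few of the mechanical Safe Harbor steps concrete, here is a minimal Python sketch, not a compliance tool. The column names (`name`, `ssn`, `birth_date`, `age`, and so on) and the function `safe_harbor_sketch` are hypothetical. It drops the listed identifier fields, reduces dates to year, and aggregates ages over 89; it drops ZIP and other sub-state geography outright rather than applying the three-digit exception, and it cannot evaluate the "no actual knowledge" condition.

```python
from datetime import date

# Hypothetical column names standing in for Safe Harbor identifier categories.
# A real schema needs its own, reviewed mapping to all 18 categories.
REMOVE_FIELDS = {
    "name", "street_address", "city", "county", "zip",  # geography below state level
    "phone", "fax", "email", "ssn", "medical_record_number",
    "health_plan_id", "account_number", "license_number", "vehicle_id",
    "device_id", "url", "ip_address", "biometric_id", "face_photo",
    "other_unique_id",
}
DATE_FIELDS = {"birth_date", "admission_date", "discharge_date", "death_date"}


def safe_harbor_sketch(record: dict) -> dict:
    """Apply a subset of Safe Harbor transformations to one record (a sketch).

    - drops every field in REMOVE_FIELDS (ZIP is dropped, not truncated)
    - reduces date fields to their year
    - aggregates ages over 89 into "90+" and drops the birth date for those
      records, since even the birth year would reveal the age
    Records without an integer `age` field are not checked for the over-89 rule.
    """
    age = record.get("age")
    over_89 = isinstance(age, int) and age > 89
    out = {}
    for key, value in record.items():
        if key in REMOVE_FIELDS:
            continue  # remove the identifier entirely
        if key == "birth_date" and over_89:
            continue  # the birth year alone would indicate an age over 89
        if key in DATE_FIELDS:
            if not isinstance(value, date):
                raise TypeError(f"{key} must be a datetime.date, got {type(value).__name__}")
            out[key] = value.year  # keep the year only
        elif key == "age" and over_89:
            out[key] = "90+"  # single top-coded age category
        else:
            out[key] = value
    return out
```

Removing more than Safe Harbor requires is always permissible, so the sketch errs in that direction for geography, trading analytical value for simplicity.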
Expert Determination is the alternative. A person with appropriate knowledge of and experience with generally accepted statistical and scientific methods applies those methods to determine that the risk of re-identification by an anticipated recipient is very small, and documents both the methods used and the results that justify the conclusion. This path can preserve more analytical value, such as full dates or finer geography, but it shifts the burden onto a defensible, written expert analysis rather than a fixed checklist.
Re-identification risk is the real test
Both methods exist to manage one danger: that supposedly anonymous data can be linked back to a person. Even after obvious identifiers are gone, combinations of birth date, gender, and ZIP code can single people out; Latanya Sweeney's widely cited research estimated that about 87% of the U.S. population could be uniquely identified by five-digit ZIP code, gender, and date of birth alone. That is why Safe Harbor is so aggressive about dates and geography and why Expert Determination demands a formal risk assessment. Before you release any data set, the question is not whether you removed names, but whether anyone could realistically reconnect a record to an individual.
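One common way to quantify the risk described above is k-anonymity: group records by their quasi-identifiers (here birth date, gender, and ZIP code) and look at the size of the smallest group. HIPAA does not mandate k-anonymity or any particular threshold, so treat this Python sketch as one illustrative measure an expert might use, not the Expert Determination method itself. The column names, helper functions, and sample rows are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence

# Hypothetical quasi-identifier columns for this sketch.
QUASI_IDENTIFIERS = ("birth_date", "gender", "zip")


def class_sizes(rows: Iterable[Mapping], qis: Sequence[str]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in qis) for row in rows)


def k_anonymity(rows: Iterable[Mapping], qis: Sequence[str]) -> int:
    """Size of the smallest group; each record matches at least k - 1 others."""
    sizes = class_sizes(rows, qis)
    return min(sizes.values()) if sizes else 0


def unique_fraction(rows: Iterable[Mapping], qis: Sequence[str]) -> float:
    """Fraction of records that are alone in their group (singled out)."""
    sizes = class_sizes(rows, qis)
    total = sum(sizes.values())
    singletons = sum(1 for n in sizes.values() if n == 1)
    return singletons / total if total else 0.0


if __name__ == "__main__":
    sample = [  # toy records, not real data
        {"birth_date": "1980-02-14", "gender": "F", "zip": "98101"},
        {"birth_date": "1980-02-14", "gender": "F", "zip": "98101"},
        {"birth_date": "1975-07-01", "gender": "M", "zip": "98102"},
    ]
    print(k_anonymity(sample, QUASI_IDENTIFIERS))      # 1: the third record is unique
    print(unique_fraction(sample, QUASI_IDENTIFIERS))  # 0.3333333333333333
```

A k of 1 means at least one person is singled out by those fields alone. Generalizing dates to year or ZIP codes to three digits can only merge groups, so k stays the same or rises, which is the intuition behind Safe Harbor's treatment of dates and geography.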
Where de-identification fits in your risk analysis
Decisions about de-identifying, sharing, and storing data belong inside your broader Security Risk Analysis. The risk analysis required by 45 CFR § 164.308(a)(1)(ii)(A) should account for where you create limited data sets, who performs Expert Determinations, and how de-identified outputs are governed, because a flawed de-identification process is itself a risk to PHI. Mapping these data flows keeps a research extract or analytics export from quietly becoming an uncontrolled disclosure.
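Mapping these flows can start as a simple register. The Python sketch below uses a hypothetical structure of our own, not a regulatory format: each `DataFlow` records its classification, and the hypothetical `governance_gaps` function flags a limited data set with no data use agreement, a de-identified output with no recognized method, or an Expert Determination with no written report.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical classification labels for this sketch.
IDENTIFIABLE = "identifiable_phi"
LIMITED_DATA_SET = "limited_data_set"
DEIDENTIFIED = "deidentified"
KNOWN_CLASSES = {IDENTIFIABLE, LIMITED_DATA_SET, DEIDENTIFIED}
RECOGNIZED_METHODS = {"safe_harbor", "expert_determination"}  # 45 CFR § 164.514(b)


@dataclass
class DataFlow:
    """One movement of health data between systems or organizations (hypothetical schema)."""
    name: str
    source: str
    destination: str
    classification: str
    data_use_agreement: Optional[str] = None  # reference to a signed DUA, if any
    deid_method: Optional[str] = None         # how a de-identified output was produced
    expert_report: Optional[str] = None       # reference to the written determination


def governance_gaps(flow: DataFlow) -> List[str]:
    """Return readable gaps for one flow; an empty list means none were found."""
    if flow.classification not in KNOWN_CLASSES:
        return [f"unknown classification: {flow.classification}"]
    gaps = []
    if flow.classification == LIMITED_DATA_SET and not flow.data_use_agreement:
        gaps.append("limited data set without a data use agreement")
    if flow.classification == DEIDENTIFIED:
        if flow.deid_method not in RECOGNIZED_METHODS:
            gaps.append("de-identified output without a recognized method")
        elif flow.deid_method == "expert_determination" and not flow.expert_report:
            gaps.append("Expert Determination without a written report")
    return gaps


if __name__ == "__main__":
    flows = [
        DataFlow("research extract", "EHR", "university partner", LIMITED_DATA_SET),
        DataFlow("analytics export", "EHR", "BI vendor", DEIDENTIFIED, deid_method="safe_harbor"),
    ]
    for flow in flows:
        print(flow.name, governance_gaps(flow))
```

Returning early on an unknown label keeps a mislabeled flow (for example, one tagged "anonymized") from silently passing the other checks.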
The proposed 2026 Security Rule update
Organizations that create or share health data sets should also track the proposed update to the HIPAA Security Rule. The Notice of Proposed Rulemaking (NPRM) was issued in December 2024 and has not been finalized; if adopted as proposed, it would provide a 240-day compliance window after the final rule is published. While the NPRM focuses on technical safeguards rather than rewriting de-identification standards, its emphasis on asset inventories and tighter access controls reinforces the need to know exactly where identifiable and de-identified data live across your systems.
How Medcurity helps
Medcurity helps healthcare organizations document data flows, run a thorough Security Risk Analysis, and keep policies, including de-identification and data-use practices, organized and audit-ready. Pricing is $499/year (about $42/month) for the core platform; larger organizations can request a quote. The result is a clear record of how identifiable data is protected and how de-identified data is produced and governed.
Frequently Asked Questions
What are the two HIPAA de-identification methods?
HIPAA recognizes exactly two methods under 45 CFR § 164.514(b): Safe Harbor, which requires removing 18 specified identifiers, and Expert Determination, in which a person with appropriate statistical and scientific expertise determines and documents that the risk of re-identification is very small. HIPAA does not require a particular degree or certification for that expert. There is no third shortcut; a data set that does not satisfy one of these methods is still PHI.
Is removing names and Social Security numbers enough to de-identify data?
No. Safe Harbor requires removing all 18 identifiers. Those include names and Social Security numbers, but also all date elements other than year, all but the first three digits of ZIP codes (and even those three digits must become 000 when the area they cover has 20,000 or fewer people), device identifiers, full-face photos, and any other unique identifying number or characteristic. Stripping only the obvious fields leaves the data identifiable and therefore still protected.
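The ZIP code portion of Safe Harbor is mechanical enough to sketch. Only the first three digits may be kept, and only when the geographic unit formed by all ZIP codes sharing those digits contains more than 20,000 people; otherwise the three digits become 000. In this Python sketch, `safe_harbor_zip` is a hypothetical helper, and `restricted_zip3` is an input you would have to build from current Census Bureau population data; the value used below is a placeholder, not a real list.

```python
from typing import AbstractSet


def safe_harbor_zip(zip_code: str, restricted_zip3: AbstractSet[str]) -> str:
    """Reduce a ZIP code to its Safe Harbor form.

    Keeps the first three digits, or returns "000" when that three-digit
    area is listed in `restricted_zip3` (20,000 or fewer residents).
    """
    zip3 = zip_code.strip()[:3]
    if len(zip3) != 3 or not zip3.isdigit():
        raise ValueError(f"unexpected ZIP code format: {zip_code!r}")
    return "000" if zip3 in restricted_zip3 else zip3


# Placeholder only: derive the real set from current Census population data.
EXAMPLE_RESTRICTED_ZIP3 = {"001"}

print(safe_harbor_zip("98101-1234", EXAMPLE_RESTRICTED_ZIP3))  # 981
print(safe_harbor_zip("00123", EXAMPLE_RESTRICTED_ZIP3))       # 000
```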
What is the difference between de-identified data and a limited data set?
A limited data set still contains some identifiers, such as dates and city or ZIP, so it remains PHI and requires a data use agreement under 45 CFR § 164.514(e). Properly de-identified data is no longer PHI and is not restricted by the Privacy Rule. They are not interchangeable, and treating a limited data set as fully de-identified is a common compliance error.
When should we use Expert Determination instead of Safe Harbor?
Use Expert Determination when you need to retain detail that Safe Harbor would strip, such as full dates for longitudinal research or finer geography. It allows richer data but requires a documented statistical analysis and the expert’s written rationale, which becomes the evidence you rely on if the determination is ever questioned.
Related reading: HIPAA compliance for research and our HIPAA compliance checklist.