Overview
Researchers are often unsure whether the dataset they are working with qualifies as de-identified. From an ethical perspective, sufficient de-identification of participant data is preferable in almost all cases and should be used whenever possible: it helps protect sensitive participant data and minimizes the risk of harm if private research data are unexpectedly exposed.
Removal of Identifiers
For a dataset to be considered sufficiently de-identified, the following identifiers of individuals must be removed, as applicable:
- Names
- All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and their equivalent geocodes. The initial three digits of the ZIP code may be retained if, according to the current publicly available data from the Bureau of the Census:
  - The geographic unit formed by combining all ZIP codes with the same three initial digits contains more than 20,000 people; and
  - The initial three digits of a ZIP code for all such geographic units containing 20,000 or fewer people are changed to 000
- All elements of dates (except year) for dates that are directly related to an individual, including birth date, admission date, discharge date, death date, and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older
- Telephone numbers
- Vehicle identifiers and serial numbers, including license plate numbers
- Fax numbers
- Device identifiers and serial numbers
- Email addresses
- Web Uniform Resource Locators (URLs)
- Social security numbers
- Internet Protocol (IP) addresses
- Medical record numbers
- Biometric identifiers, including finger and voice prints
- Health plan beneficiary numbers
- Full-face photographs and any comparable images
- Account numbers
- Certificate/license numbers
- Any other unique identifying number, characteristic, or code which, when used alone or combined with other readily accessible information, could lead to re-identification
Datasets that contain none of the above information can be considered sufficiently de-identified. Removing all such identifiers also meets the HIPAA Privacy Rule requirements for the sharing and research use of de-identified data.
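The ZIP code, date, and age rules in the list above are the items that call for transforming a value rather than simply deleting a field. The following is a minimal Python sketch of those three rules, not a complete or validated de-identification tool. All function names are hypothetical, and the set of low-population three-digit ZIP prefixes is assumed to be supplied by the caller from current Census data.

```python
from datetime import date


def generalize_zip(zip_code: str, restricted_zip3: set) -> str:
    """Reduce a ZIP code to its first three digits.

    restricted_zip3 is an assumed, caller-supplied set of three-digit
    prefixes whose combined population is 20,000 or fewer; those
    prefixes are replaced with "000".
    """
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    if len(digits) < 5:
        raise ValueError("expected a ZIP code with at least five digits")
    prefix = digits[:3]
    return "000" if prefix in restricted_zip3 else prefix


def age_on(birth: date, as_of: date) -> int:
    """Age in completed years on the reference date as_of."""
    years = as_of.year - birth.year
    if (as_of.month, as_of.day) < (birth.month, birth.day):
        years -= 1
    return years


def generalize_age(age: int) -> str:
    """Report ages over 89 only as the single category "90+"."""
    return "90+" if age >= 90 else str(age)


def generalize_birth_date(birth: date, as_of: date):
    """Keep only the birth year; return None when the year itself
    would indicate an age over 89."""
    return None if age_on(birth, as_of) >= 90 else birth.year


def generalize_event_date(event: date) -> int:
    """Keep only the year of an admission, discharge, or similar date."""
    return event.year
```

In this sketch, a call such as `generalize_zip("02139-4307", restricted)` keeps only the leading three digits, and `generalize_birth_date` drops the year entirely for participants aged 90 or older as of the chosen reference date. Which reference date to use (for example, the date of data collection) is a study-specific decision that the sketch leaves to the caller.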
Resources
Methods of Deidentification of PHI in Accordance with the HIPAA Privacy Rule