AI's Ability to Identify Individuals in Anonymized Data Sets
Understanding AI's Capability in Recognizing Social Patterns
It is well known that many companies gather user data and sell it to interested buyers. Despite efforts to anonymize this information, a neural network has been shown to pick out individuals among more than 40,000 anonymized users, succeeding more than 50% of the time.
The Study on Phone Service Users
Last year, a collaborative study from researchers in London and Switzerland, titled "Interaction data are identifiable even across long periods of time," delved into how social interaction patterns can be leveraged to identify individuals within supposedly anonymized datasets.
The researchers developed a neural network designed to identify patterns in users’ weekly social interactions. During the initial test, they utilized interaction data from 43,606 mobile phone subscribers over a span of 14 weeks. This dataset included timestamps, durations, types of interactions (calls or texts), pseudonyms of participants, and the initiator of each communication.
Subsequently, the interaction data was organized into graphs, where nodes represented users and their contacts, while edges depicted the interactions. This graphical representation was then analyzed by the neural network to find similar graphs within the anonymized dataset.
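The graph construction described above can be sketched roughly as follows. This is a minimal illustration, not the researchers' code: the record layout `(timestamp, duration_s, kind, initiator, receiver)`, the function names, and the idea of limiting the graph to a fixed number of hops around a target are all assumptions made for the example.

```python
from collections import defaultdict, deque

# Hypothetical record layout (an assumption for this sketch):
# (timestamp, duration_s, kind, initiator, receiver), all users pseudonymized.
records = [
    (1000, 60, "call", "u1", "u2"),
    (1100, 0, "text", "u2", "u3"),
    (1200, 30, "call", "u3", "u4"),
]

def build_graph(records):
    """Map each user to their contacts and the interactions on each edge."""
    graph = defaultdict(lambda: defaultdict(list))
    for timestamp, duration_s, kind, initiator, receiver in records:
        # Store the edge from both endpoints, keeping track of who initiated.
        graph[initiator][receiver].append((timestamp, duration_s, kind, "out"))
        graph[receiver][initiator].append((timestamp, duration_s, kind, "in"))
    return graph

def k_hop_nodes(graph, target, k):
    """Collect every user within k hops of the target (breadth-first search)."""
    distance = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand beyond k hops
        for neighbor in graph.get(node, {}):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)

graph = build_graph(records)
one_hop = k_hop_nodes(graph, "u1", 1)  # the target and direct contacts
two_hop = k_hop_nodes(graph, "u1", 2)  # also the contacts' contacts
```

With one hop the graph around `u1` covers only direct contacts; with two hops it also covers the people those contacts interact with, which gives a matcher more context about the target.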
The researchers assessed the neural network's effectiveness in matching targets to their anonymized data across three scenarios:
- One week after the latest records, using only the target’s own direct interactions: The neural network identified the target just 14.7% of the time.
- With information about the interactions of a target’s contacts as well: This scenario yielded the highest success rate, with 52.4% correct identifications, as the neural network had more context about each individual.
- Twenty weeks after the latest records: Even at this extended interval, the neural network managed to correctly identify individuals 24.3% of the time, indicating that social behavior remains identifiable over longer durations.
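To make these percentages concrete, here is a small sketch of how an identification rate like the ones above could be computed. It is not the paper's method: the profile vectors are invented by hand (in the study they are learned by the neural network), and matching by cosine similarity is an assumption chosen for simplicity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def identification_rate(known, anonymized, truth):
    """Fraction of known people whose closest anonymized profile is their own."""
    correct = 0
    for person, profile in known.items():
        # Pick the pseudonym whose profile is most similar to this person's.
        best = max(anonymized, key=lambda p: cosine(profile, anonymized[p]))
        if best == truth[person]:
            correct += 1
    return correct / len(known)

# Hypothetical profiles: what an attacker knows about three people...
known = {"alice": [5, 1, 0], "bob": [0, 2, 4], "carol": [1, 1, 1]}
# ...and the profiles found under pseudonyms in the anonymized dataset.
anonymized = {"p1": [0, 3, 5], "p2": [4, 1, 0], "p3": [3, 0, 1]}
# Ground truth, used only to score the matches.
truth = {"alice": "p2", "bob": "p1", "carol": "p3"}

rate = identification_rate(known, anonymized, truth)
print(f"Identified {rate:.1%} of targets")
```

In this toy data two of the three people are matched to the right pseudonym, while the third has a profile too generic to be told apart. A figure such as 52.4% is the same kind of ratio, measured over tens of thousands of candidates.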
Exploring University Student Data
To investigate whether these findings held elsewhere, the researchers repeated the experiment on Bluetooth proximity data collected over four weeks from 587 anonymized university students. The dataset included pseudonyms, encounter times, and signal strength (indicating proximity to others).
In this second experiment, the neural network correctly identified individuals in the dataset 26.4% of the time.
For a deeper understanding, check out the full paper.
This study highlights the urgent need for stronger methods of safeguarding individuals' anonymity and privacy. We welcome your thoughts or questions regarding this article!
The Role of Anonymization Techniques
This video titled "How to Anonymize Your Data Before Putting in ChatGPT (LangChain + Presidio)" discusses techniques for anonymizing data before it is processed by AI systems.
The Importance of Data Protection in AI
In the video "Personal Data Pseudonymization Versus Anonymization In The Age Of AI & Big Data," experts explore the differences between pseudonymization and anonymization, especially in the context of AI and big data.