Platform Developed to Protect Personal Data

Incorporates web scraping techniques, cutting-edge natural language processing technology, artificial intelligence, data science, and cybersecurity

Zenaida Alzaga

In order to protect personal data and prevent breaches of sensitive information, researchers at the Instituto Politécnico Nacional (IPN) have developed a digital tool called PICIS, which can identify, classify, monitor, and prevent the theft of data hosted on websites.

Dr. Eleazar Aguirre Anaya, professor and researcher at the Cybersecurity Laboratory of the Centro de Investigación en Computación (CIC), explained that data theft and breaches are global problems, largely because of a shortage of trained personnel capable of responding to cyber threats and attacks. These attacks can originate anywhere in the world and are carried out over the internet by criminal organizations that invest heavily in human and financial resources.

PICIS was recognized at the Cybersecurity LATAM Awards 2024, which honor leaders and projects that are transforming cybersecurity across Latin America.

The researcher noted that the COVID-19 pandemic accelerated the technological revolution through the massive adoption of digital tools. However, the global supply of specialized personnel and technologies capable of safeguarding sensitive personal data—such as health status, physical characteristics, DNA, fingerprints, facial features, ideology, and political or religious beliefs—remains limited. This shortfall can lead to crimes such as identity theft, extortion, or kidnapping.

To build the Platform for the Identification, Classification, and Monitoring of Sensitive Information (PICIS), a team of experts—including Drs. Eleazar Aguirre Anaya, Gina García Gallegos, Moisés Salinas Rosales, and Raúl Acosta Bermejo—developed machine learning–based AI models to identify and classify information.

Dr. Aguirre Anaya, a Level I member of the National System of Researchers (SNII) under the Secretariat of Science, Humanities, Technology, and Innovation (Secihti), explained that during the first phase of the project, data was collected and processed using pre-existing models developed at CIC. This information was classified into 55 types of personal data, including sensitive categories. The model was installed and configured on CIC equipment.

The first version of PICIS, he noted, supports multiple user roles—administrator, technical support, data analyst, supervisor, and service coordinator—each authenticated and authorized to perform role-specific actions within the system.
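
As an illustration of role-specific authorization, the following minimal Python sketch maps the five roles named above to sets of permitted actions. The role names come from the article; the action names (such as "run_scan") and the is_authorized helper are hypothetical, since the article does not describe how PICIS actually implements access control.

```python
from enum import Enum


class Role(Enum):
    """The five user roles named in the article."""
    ADMINISTRATOR = "administrator"
    TECHNICAL_SUPPORT = "technical support"
    DATA_ANALYST = "data analyst"
    SUPERVISOR = "supervisor"
    SERVICE_COORDINATOR = "service coordinator"


# Hypothetical permission sets: the article does not list the real ones.
PERMISSIONS = {
    Role.ADMINISTRATOR: {"manage_users", "view_reports", "run_scan"},
    Role.TECHNICAL_SUPPORT: {"view_logs"},
    Role.DATA_ANALYST: {"run_scan", "view_reports"},
    Role.SUPERVISOR: {"view_reports"},
    Role.SERVICE_COORDINATOR: {"schedule_scan"},
}


def is_authorized(role, action):
    """Return True only if the action is in the role's permission set."""
    return action in PERMISSIONS.get(role, set())
```

Under these assumed permissions, is_authorized(Role.DATA_ANALYST, "run_scan") is true, while is_authorized(Role.SUPERVISOR, "manage_users") is false. Authentication, that is, verifying who the user is, would happen before this check and is not shown.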

The platform incorporates web scraping to extract data for encryption and protection; natural language processing (NLP) to interpret text content; machine learning models to classify the 55 data types; and neural networks to detect sensitive data based on contextual analysis.
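
The extract, identify, and protect flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: simple regular expressions stand in for the machine learning and neural network models, which the article does not describe; the two categories shown ("email" and "phone") are illustrative and are not taken from the 55 PICIS data types; and masking stands in for the encryption step. All function and variable names are hypothetical.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the non-empty text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


# Hypothetical rules standing in for trained classifiers.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\b\d{10}\b"),
}


def extract_text(html):
    """Scraping step: reduce a fetched page to its text content."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def classify(text):
    """Identification step: return {category: [matches]} for each hit."""
    found = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[label] = matches
    return found


def mask(text):
    """Protection step: replace each detected value with its category tag."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text


page = "<p>Contact: ana@example.org, tel. 5512345678</p>"
text = extract_text(page)
print(classify(text))
print(mask(text))
```

Pattern rules like these only catch data with a fixed surface form. Categories such as health status or political beliefs depend on context, which is presumably why the platform relies on NLP and neural networks rather than rules alone.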

PICIS is currently being evaluated by the Centro Nacional de Cálculo (Cenac) and by the Transparency Agency for the People (formerly the National Institute for Transparency, Access to Information, and Protection of Personal Data), now under the Secretariat for Anti-Corruption and Good Governance, to validate its functionality and its impact on data protection.

Expanded Storage Capacity

Dr. Aguirre Anaya revealed that the second version of PICIS will feature enhanced storage, processing, and monitoring capabilities. It will be adapted to Google Cloud technology and aims to identify 160 types of unitary data, enabling advanced monitoring and traceability.

This version will incorporate data science algorithms, new AI models, NLP, big data, virtualization, and cybersecurity frameworks. Once complete, the platform is expected to serve as a Data Protection Management System for both public entities and private organizations.

This development is part of the multidisciplinary project titled "Sensitive Information Identification and Classification System in the Cloud", funded under Secretariat of Research and Graduate Studies (SIP) project No. 233, and led by Drs. Eleazar Aguirre Anaya, Raúl Acosta Bermejo, and Sandra Dinora Orantes Jiménez, along with Nidia Asunción Cortez Duarte from the Escuela Superior de Cómputo (Escom).

The project also involves 10 professors from CIC, Escom, and the Unidad Profesional Interdisciplinaria en Ingeniería y Tecnologías Avanzadas (UPIITA), as well as 30 undergraduate, master’s, and PhD students from these academic units.