Ensuring PII Security in Enterprises with an Azure Framework
In the digital age, companies increasingly rely on consumer data to create personalized customer experiences. As the use of consumers' personal data grows, it becomes crucial for businesses to prioritize the security of Personally Identifiable Information (PII). Inadequate protection of PII can have significant financial consequences, such as fines of up to 100 million euros (as per an Atlas VPN analysis) for violating GDPR in the event of a data breach.
Protecting PII is of paramount importance because data breaches and incidents that expose sensitive information can be highly damaging. Notably, a number of companies with established policies are unaware that those policies fall short in safeguarding PII.
In this blog, we will explore the importance of PII security measures for companies dealing with sensitive consumer data and provide insights into their implementation using Azure.
Key factors driving the need for PII security measures:
Several factors emphasize the urgency for companies to prioritize PII security and implement effective security measures:
- Escalating data breaches: The frequency and scale of data breaches have increased in recent years. Companies become susceptible to cyber attacks due to the vast amount of valuable PII they possess. As per Verizon, the frequency of ransomware attacks for the year 2022-23 was greater than the previous five years combined. Implementing robust security measures is crucial to mitigate these risks and protect against potential breaches.
- Regulatory compliance: Companies must adhere to various data protection regulations, such as the General Data Protection Regulation (GDPR) in the European Union and the California Consumer Privacy Act (CCPA) in the United States. The CCPA states that businesses can face a penalty as high as $7,500 per violation [source: TechTarget].
- Evolving consumer expectations: Consumers are becoming more aware of their data privacy rights and are demanding greater control over their PII. Companies need to address these evolving expectations by implementing stringent security measures and transparent privacy practices. Nearly 81% of consumers express some concern about the security of their PII available online, and 49% are ‘very’ or ‘extremely’ concerned about providing PII to online accounts, according to a study by PYMNTS.com.
A robust PII security framework using Azure
To protect users from identity theft and fraud, implementing robust data and information security is essential. Leveraging the Azure platform, Sigmoid has developed a PII security framework to ensure authorized access to sensitive data. Our framework allows access to consumer and employee PII data without compromising data security, in line with global data protection regulations.
Keeping in mind the sensitivity of such data, we recognize the necessity for a framework that is both secure and seamless. We ensure high security for PII by following best practices for data anonymization and data access. Our framework encompasses the following 7 approaches:
- Hosting and orchestrating the transformation pipeline with Azure Data Factory, using Azure Data Lake Storage as the persistence layer
- Configuring data from multiple source files and hashing PII and non-PII data formats through a custom workflow set up on Databricks
- Storing hashed PII and non-PII data into a sourced layer
- Holding the secure access keys for hashed data through Azure Key Vault
- Hosting a PII API on SQL Server through a customized Azure App
- Providing access to the anonymized information (hashed PII data) through an API gateway after an access token request
- Using Azure Active Directory to ensure a robust governance mechanism by managing identities, user access, user management with single sign-on, multi-factor authentication, and conditional access
Fig.1. Sigmoid’s PII security framework using Azure
Framework design
Our framework is designed to store users’ personal information and give data scientists, analysts, and other users secure access to it for computational and analytical purposes. The framework comprises a PII blob container, a database system, and an API portal. The blob container maintains configuration files for the entire framework. Custom code reads and writes the data, and the processed data is stored in the database system. The framework is designed to ensure that downstream applications and use cases receive the required data in compliance with regulations.
Workflow pipelines
Data is fed into the framework through .yaml configuration files filled in by application users. The vault pipeline detects newly added configuration files, extracts their details, and uploads them to blob storage for consumption by the Databricks workflow. The Azure Data Factory (ADF) pipeline triggers the Databricks workflow on a predefined schedule. Information is read from files in multiple formats (CSV, Parquet, Delta table), depending on the source. All configuration and pipelines are automated using DevOps practices.
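As a rough illustration of the configuration step, the sketch below validates one source entry and normalizes its file format before a workflow would pick it up. The field names (`name`, `path`, `format`, `pii_columns`) and the `SourceConfig` type are hypothetical, since the actual .yaml schema is not described here; the dictionary stands in for a configuration file that has already been parsed.

```python
from dataclasses import dataclass
from typing import Tuple

# Formats named in the text: csv, parquet, delta table.
SUPPORTED_FORMATS = {"csv", "parquet", "delta"}


@dataclass(frozen=True)
class SourceConfig:
    """Hypothetical shape of one source entry in a configuration file."""
    name: str
    path: str
    file_format: str
    pii_columns: Tuple[str, ...]


def parse_source_config(raw: dict) -> SourceConfig:
    """Validate an already-parsed configuration entry and normalize it."""
    missing = [key for key in ("name", "path", "format") if not raw.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    file_format = str(raw["format"]).lower()
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {file_format!r}")
    return SourceConfig(
        name=raw["name"],
        path=raw["path"],
        file_format=file_format,
        pii_columns=tuple(raw.get("pii_columns", [])),
    )


# Example entry (assumed values, for illustration only).
example = parse_source_config(
    {
        "name": "web_signups",
        "path": "landing/web_signups/",
        "format": "CSV",
        "pii_columns": ["email", "phone"],
    }
)
print(example.file_format, example.pii_columns)
```

Rejecting unknown formats and missing fields at this stage means a malformed configuration fails before any scheduled run touches the data.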
Data segregation and PII hashing
PII is identified and stored separately in the database. One copy is stored as-is, while a second copy is hashed using a deterministic algorithm, so the same input always yields the same hash. Hashing is a one-way operation: the original value cannot be read back from the hash, which supports irreversible anonymization in line with GDPR. The hashed PII, combined with non-PII data from internal teams, is stored in separate data marts.
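The text does not name the hashing algorithm, so the following is only a minimal sketch of deterministic one-way hashing, assuming keyed HMAC-SHA-256 from Python's standard library. The function names and the demo key are illustrative; in a setup like the one described, the key would more plausibly be fetched from Azure Key Vault than hard-coded.

```python
import hashlib
import hmac


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so equivalent inputs hash identically."""
    return email.strip().lower()


def hash_pii(value: str, secret_key: bytes) -> str:
    """Deterministic keyed hash: same value and key always give the same digest."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Demo key for illustration only (assumption); never hard-code a real key.
DEMO_KEY = b"demo-key-not-for-production"

h1 = hash_pii(normalize_email(" Jane.Doe@Example.com "), DEMO_KEY)
h2 = hash_pii(normalize_email("jane.doe@example.com"), DEMO_KEY)
print(h1 == h2)  # the two spellings normalize to the same input
```

A keyed hash is used in this sketch because an unkeyed hash of a guessable value such as an email address can be reproduced by anyone who hashes candidate addresses; keeping the key secret makes that harder.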
Data usage and access
Anonymized data supports analytics without revealing PII: hashed values are consistent across sources, so records can still be matched, while the underlying fields, such as email addresses, remain secure. Users from data science, business intelligence, and other functions can access anonymized data through the UI in the API module using authorized credentials from Azure Active Directory (AD) with role-based access control (RBAC). Users are given access only to aggregated data, with all technical safeguards in place. When PII deletion requests occur, we mask the corresponding data in the database so that pipelines continue to run without disruption.
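The role-based check described above can be sketched as a simple deny-by-default lookup. The role names, dataset names, and the `can_access` helper are hypothetical; in practice the roles would come from the caller's validated Azure AD credentials rather than from a hard-coded list.

```python
from typing import Iterable

# Hypothetical role-to-dataset mapping (assumption, not the actual policy).
ROLE_PERMISSIONS = {
    "bi_analyst": frozenset({"aggregated_metrics"}),
    "data_scientist": frozenset({"aggregated_metrics", "hashed_pii"}),
}


def can_access(user_roles: Iterable[str], dataset: str) -> bool:
    """Allow access only if at least one role explicitly grants the dataset."""
    return any(
        dataset in ROLE_PERMISSIONS.get(role, frozenset())
        for role in user_roles
    )


analyst_roles = ["bi_analyst"]
print(can_access(analyst_roles, "aggregated_metrics"))
print(can_access(analyst_roles, "raw_pii"))
```

Because unknown roles and unlisted datasets fall through to a denial, adding a new dataset does not expose it to anyone until a role is granted explicitly.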
How do companies benefit from a PII security framework?
Let us consider an example. CPG companies and retailers gain valuable information like attribution and lookalike audience insights without exposing individual customer data. They are increasingly leveraging DTC channels to collect consented first-party data (e.g., email addresses) by creating compelling digital experiences, offering discounts, or asking for event sign-ups. This helps these companies build rich databases of consumer PII. Extracting insights allows them to understand consumers better and strengthen relationships. However, tracking conversions and attributing them back to advertising campaigns becomes challenging when consumers use different delivery apps off-channel.
Using our secured framework, marketers can automatically push hashed first-party data from websites, form fills, etc., and anonymized transactional data from third-party delivery apps into a privacy-safe environment. Our solution recognizes that the hashed email address (or phone number) from the two sources is the same and records a conversion. Consumer PII is never revealed to either party, and the record-level data isn’t accessible. CPG brands can gain attribution insights to measure and optimize campaigns, and even power future marketing efforts through lookalike modeling.
Companies can leverage customer data analytics to derive valuable insights and inform strategic decision-making, all within this PII security framework. PII can be leveraged in use cases like personalized marketing, micro-segmentation, fraud detection, real-time customer service, employee analytics, and other analytics strategies, while protecting the privacy and rights of individuals. Ultimately, the responsible use of PII can lead to improved customer experiences, effective marketing strategies, and better business outcomes.
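The matching step in this example can be sketched as a set intersection over hashed identifiers. This is an illustration under assumptions, not the actual solution: the email addresses are made up, the helper names are hypothetical, and both sides are assumed to apply the same normalization and the same keyed hash (HMAC-SHA-256 here) before sharing anything.

```python
import hashlib
import hmac

# Shared demo key for illustration only (assumption).
SHARED_KEY = b"demo-shared-key"


def hash_identifier(value: str) -> str:
    """Normalize, then apply the same deterministic keyed hash on both sides."""
    normalized = value.strip().lower()
    return hmac.new(SHARED_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def match_conversions(exposed_hashes, purchase_hashes):
    """Return the hashed identifiers that appear in both data sets."""
    return set(exposed_hashes) & set(purchase_hashes)


# Brand side: consented first-party sign-ups reached by a campaign.
brand_side = [hash_identifier(e) for e in
              ["ana@example.com", "Ben@Example.com", "cy@example.com"]]
# Delivery-app side: transactions, hashed before leaving that party.
delivery_side = [hash_identifier(e) for e in
                 ["ben@example.com", "dee@example.com", "ANA@example.com "]]

matched = match_conversions(brand_side, delivery_side)
conversion_rate = len(matched) / len(set(brand_side))
print(len(matched), round(conversion_rate, 2))
```

Only hashes cross the boundary in this sketch, so a match can be counted as a conversion without either side seeing the other's raw email addresses.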
Conclusion
Fully understanding PII is tricky. Following all the regulations can be challenging. Implementing the systems in the right way requires sound technical expertise. Architectural knowledge and strong control over the PII data are necessary for any company to feel confident about their data, which requires strategic planning to get things right. With just the right governance, data protection rules, and awareness, PII can prove to be a very impactful asset for analysis.
About the Authors
Gitesh Shinde is a DevOps Lead at Sigmoid with a strong focus on DevOps practices and automation. His current role involves overseeing and driving the implementation of DevOps principles within the organization, optimizing software development and deployment processes, and fostering a culture of collaboration and efficiency.
Anand Muddebihal is an Associate Technical Lead at Sigmoid with 12 years of experience in data engineering across industries such as retail, banking, and financial services. He is a seasoned expert in designing and delivering data solutions across domains for leading F500 brands.