Thinking Through Personal Data Protection
Historically, iovation has collected little personal data; our approach has always depended on anonymous, non-identifying data. Think browser user agent strings and device screen resolutions, not social security or bank numbers. This data minimization strategy allowed us to build effective fraud-prevention and authentication services without collecting huge swathes of personally identifiable information (PII). In truth, as a business, iovation has no interest in personal data, only patterns indicative of malfeasance, and in streamlining authentication for trustworthy users.
As a result, we possess little data attractive to identity thieves or KBA looters, reducing the chances of being targeted for exploits. Still, in the event of a breach, even minimally-identifying data could be correlated with other sources to roughly identify individuals.
Given our commitment to responsible data stewardship, as well as the invalidation of Safe Harbor and the advent of the GDPR, we saw an opportunity to reduce these modest but very real risks without impacting the efficacy of our services. A number of methodologies for data protection exist, including encryption, strict access control, and tokenization. We undertook the daunting task of determining which approaches best address data privacy compliance requirements and best protect customers and users, all without unacceptable impact on service performance, infrastructure maintenance cost, or product usability.
The most straightforward method to protect private data is to encrypt it. Most technology companies today use industry-standard encryption algorithms, such as AES, to protect some — if not all — data in transit and at rest. Many organizations adopt TLS for all public communications, but rely on firewalls to protect internal data from exploits. Given the need to protect private data from internal and external exposure, this pattern is no longer sufficient. We recommend converting all internal and external network communications to TLS as soon as possible.
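As a rough illustration of what TLS on internal connections can look like, here is a minimal sketch using Python's standard ssl module. The function names, the file path parameters, and the choice of TLS 1.2 as a floor are our assumptions for the example, not a description of any particular production setup.

```python
import ssl
from typing import Optional


def internal_client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client-side context for calling an internal service over TLS.

    ca_file is a hypothetical path to an internal CA bundle; when it is
    None, the system's default trust store is used instead.
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    # The default context already verifies certificates and hostnames.
    # Refuse protocol versions older than TLS 1.2 (an assumed policy).
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def internal_server_context(cert_file: str, key_file: str,
                            ca_file: str) -> ssl.SSLContext:
    """Server-side context that also requires a client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Mutual TLS: internal callers must present a certificate signed by
    # the CA in ca_file, so the firewall is not the only line of defense.
    ctx.verify_mode = ssl.CERT_REQUIRED
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


client_ctx = internal_client_context()
print(client_ctx.verify_mode == ssl.CERT_REQUIRED, client_ctx.check_hostname)
```

Requiring client certificates on the server side is optional, but it means an internal service checks who is calling rather than trusting anything that happens to be inside the network perimeter.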
Encryption at rest is a less clear win. Although OS-supported storage encryption has become de rigueur in the industry, and is absolutely essential for protecting data on portable computers, its use for service platforms addresses only hardware theft. Most breaches happen online, taking advantage of network and service vulnerabilities to gain access to live systems. Encrypted storage hardly matters when exploits allow access to a running system with full access to decrypted data. Nevertheless, encryption of data at rest usefully de-scopes some services from compliance audit, so we strongly recommend encrypting the storage of all data not otherwise protected.
A central concept in PCI DSS is the “Cardholder Data Environment”, or CDE, where secure data storage and tightly controlled access policies minimize exposure and vulnerabilities. Possession of cardholder data demands such an environment, but even in the absence of such data, deploying services to a PCI-compliant architecture better protects any data. Require strong multifactor authentication to mediate access, and use firewalls and network partitioning to control the paths to access and minimize exposure of the services. Call it a Secure Data Environment, “SDE”.
One might opt to protect private data by moving all data and services into an SDE. Sadly, such an approach does not reduce compliance audit scope: all such services would still be in scope for auditing. Furthermore, an SDE makes it harder for engineering teams to diagnose and debug real-time performance issues, since a very limited number of employees can access the environment. Such an approach therefore makes for an expensive new architecture that, while better protecting personal data, otherwise provides limited advantages to justify the expense. Tokenization, as you’ll see below, better addresses privacy needs while protecting private data.
Still, some services cannot otherwise be protected without undue impact on their usability. For example, a UX providing full-text search capabilities would suffer serious usability regressions were its data to be tokenized, since partial searches and sort ordering cannot be performed on randomized token data. Building an SDE just large enough to host full-text search services and other user affordances ensures that data remains well-protected while allowing users to continue to work effectively.
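A small sketch makes the search problem concrete. It assumes a hypothetical in-memory vault (the RandomTokenVault class is invented for this example) that hands out a random token for each distinct value; real tokenization products differ in many details.

```python
import secrets


class RandomTokenVault:
    """Hypothetical in-memory vault mapping each value to a random token."""

    def __init__(self) -> None:
        self._tokens: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so equal values still compare equal.
        if value not in self._tokens:
            self._tokens[value] = secrets.token_hex(16)
        return self._tokens[value]


vault = RandomTokenVault()
names = ["alice", "alan", "bob"]
stored_tokens = [vault.tokenize(n) for n in names]

# Prefix search works on the plaintext values.
plaintext_hits = [n for n in names if n.startswith("al")]

# Exact-match lookup still works on tokens...
exact_hits = [t for t in stored_tokens if t == vault.tokenize("alice")]

# ...but the token for the prefix "al" is unrelated to the tokens for
# "alice" and "alan", so a prefix search over tokens finds nothing.
prefix_token = vault.tokenize("al")
token_hits = [t for t in stored_tokens if t.startswith(prefix_token)]

print(plaintext_hits)
print(len(exact_hits), len(token_hits))
```

Sorting has the same problem: ordering the tokens yields an arbitrary order that says nothing about the alphabetical order of the names behind them.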
Tokenization is the process of substituting data with an opaque value with no inherent meaning of its own. The token effectively replaces the original value, with no direct way to recover that value. When organizations protect private data with unique tokens, those tokens cannot be used to correlate data from other sources, because each organization uses different tokens for the same values.
Happily, compliance auditing traditionally de-scopes tokenized data sources, thanks to the meaninglessness of the tokens themselves, the improbability of correlation with other sources, and the infeasibility of recovering the original values. This model nicely fits the scoping and complexity requirements of a solid data protection strategy: as long as aggregation & analysis services like FraudForce can compare equivalent tokens for equivalent values, they will continue to operate just as well as before, but with no unprotected private data at all.
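To make the "equivalent tokens for equivalent values" property concrete, here is a minimal sketch that derives tokens with a keyed hash (HMAC-SHA-256). The keyed-hash scheme, the tokenize function, the normalization step, and the demo keys are all assumptions chosen for illustration; this is not a description of how FraudForce or any specific tokenization product works.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    """Derive an opaque, deterministic token from a value and a secret key."""
    # Normalize first so trivially different spellings of the same value
    # map to the same token (an assumed policy for this example).
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


# Demo keys only. In practice each organization would generate its own
# secret key and keep it in a tightly controlled environment.
ORG_A_KEY = b"org-a-demo-key-not-for-production"
ORG_B_KEY = b"org-b-demo-key-not-for-production"

email = "user@example.com"
token_a1 = tokenize(email, ORG_A_KEY)
token_a2 = tokenize("  User@Example.com ", ORG_A_KEY)
token_b = tokenize(email, ORG_B_KEY)

print(token_a1 == token_a2)  # same value, same key: tokens match
print(token_a1 == token_b)   # same value, different key: tokens differ
```

Within one organization, equal values always yield equal tokens, so analysis that only compares values for equality keeps working. A second organization with a different key produces unrelated tokens for the same values, which is what frustrates correlation across sources. The key itself becomes the sensitive asset: anyone holding it could test guesses against the tokens, so it deserves the strongest protection available.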
So which techniques should you adopt to improve personal data protection in your organization, thereby doing the right thing for your customers and users while ensuring best practices for GDPR compliance? The short answer is: all of the above. Some systems will be encrypted and gated by strict access controls. Others will live in an expanded and closely-regulated Secure Data Environment. The rest will process and store tokenized private data fields. In general, start with the assumption that systems will be tokenized, and work out the exceptions as necessary.
Therein lie the details, of course. Tokenization, for all its advantages, requires a huge undertaking. In future posts, we’ll look at the challenges of tokenization, how to select a solution, details of good design, and a vision for an overall data privacy architecture. Stay tuned.