Machine learning or artificial intelligence – which holds more potential for securing transactions in the future? In practice, the differences in the disciplines’ predictive capabilities blend together amid the hype. What really matters for today’s security pros is the ability to take trust beyond basic scoring and leverage everything we know from our Internet-scale experience.

Moving Beyond the Device

At first, we at iovation focused on uncovering patterns for device reputation. Our systems recognized users’ machines over time with high assurance. (And still do.) As devices took actions on our subscribers’ systems, our users cataloged evidence of fraudulent and abusive (i.e., ‘bad’) behavior. We augmented that data with the relationships we detected across multiple devices and accounts. Administrators could then write rules to flag devices showing associations with other bad devices or accounts, records of past bad behavior, or other risk signals.
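To make the idea concrete, here is a minimal sketch of the kind of association rule an administrator might write: flag a device if it has confirmed bad evidence, or if it shares an account with a device that does. The function and data structures are purely illustrative, not iovation’s actual API.

```python
def flag_device(device_id, evidence, device_accounts):
    """Return True if the device, or any device sharing one of its
    accounts, has confirmed evidence of fraud or abuse."""
    if evidence.get(device_id):
        return True
    shared_accounts = device_accounts.get(device_id, set())
    for other, accounts in device_accounts.items():
        if other != device_id and (accounts & shared_accounts):
            if evidence.get(other):
                return True
    return False

# Toy data: dev-1 has confirmed bad evidence; dev-2 shares an account with it.
evidence = {"dev-1": True, "dev-2": False, "dev-3": False}
device_accounts = {
    "dev-1": {"acct-A"},
    "dev-2": {"acct-A", "acct-B"},  # linked to dev-1 through acct-A
    "dev-3": {"acct-C"},            # no links to anything bad
}

print(flag_device("dev-2", evidence, device_accounts))  # True: guilt by association
print(flag_device("dev-3", evidence, device_accounts))  # False: clean
```

In production such rules would run over a graph store rather than in-memory dictionaries, but the shape of the logic is the same.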

In this early period, we used some machine learning techniques and simple predictive analytics to refine device recognition patterns with user input and device-account associations.

We also encountered a persistent challenge for machine learning: data quality. Some of our users were flagging devices without confirmed evidence of fraud or abuse. That created a self-fulfilling prophecy: when the flagged devices reappeared, they looked that much worse in the user’s system. In those instances, we would explain what was happening and recommend that evidence be reserved for confirmed cases of fraud or abuse.
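The fix amounts to basic label hygiene before training: only confirmed cases become labels, so unconfirmed suspicion can’t feed back into the model. A hypothetical sketch (field names are mine, not a real schema):

```python
def confirmed_labels(reports):
    """Keep only reports with confirmed evidence of fraud or abuse.
    Unconfirmed suspicion stays out of the training labels, which
    breaks the self-reinforcing feedback loop described above."""
    return [r for r in reports if r["confirmed"]]

reports = [
    {"device": "dev-1", "confirmed": True},   # verified chargeback fraud
    {"device": "dev-2", "confirmed": False},  # suspicion only: excluded
]
print(confirmed_labels(reports))  # only dev-1's report survives
```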

By 2009, we had 100 million devices in our data set (a number that has grown to more than 5 billion today). We could look closer at millions of transactions, analyzing every aspect of devices involved in fraud for patterns. As our database grew, customers expressed interest in additional axes of risk, such as anomalies in velocity, geo-location, suspicious combinations of device attributes, and other variables. They also wanted to be able to identify trustworthy devices.
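Velocity is the easiest of those axes to illustrate. A sliding-window check like the sketch below (thresholds and window size are invented for the example, not our production values) flags a device when too many events cluster in too little time:

```python
from datetime import datetime, timedelta

def velocity_exceeded(timestamps, max_events=5, window=timedelta(minutes=10)):
    """True if more than max_events fall within any sliding time window."""
    ts = sorted(timestamps)
    for i in range(len(ts)):
        j = i
        while j < len(ts) and ts[j] - ts[i] <= window:
            j += 1
        if j - i > max_events:
            return True
    return False

base = datetime(2009, 1, 1, 12, 0)
burst = [base + timedelta(seconds=s) for s in range(7)]   # 7 logins in 6 seconds
spread = [base + timedelta(hours=h) for h in range(7)]    # 7 logins over 6 hours

print(velocity_exceeded(burst))   # True: anomalous burst
print(velocity_exceeded(spread))  # False: normal pacing
```

Geo-location and device-attribute checks follow the same pattern: compute a feature, compare it to what’s plausible, and emit a risk signal.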

Using all the data

Our application of machine learning sharpened our subscribers’ fraud detection, but it tapped only the 2% of transactions that were fraudulent. TrustScore changed that: it was a predictive trust model that looked for devices with good reputations.

We applied machine learning to the 98% of transactions coming from legitimate customers. TrustScore provided value, but its approach was too narrow: it could only score devices with tenure in the system, not new devices, and it predicted only trustworthiness, not risk. Our subscribers wanted a complete predictive score that would call out both trustworthy and risky transactions. That became SureScore.
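Conceptually, the step from TrustScore to SureScore is collapsing two one-sided estimates into one signed signal. The actual SureScore model is not public; this is only a toy illustration of the idea, with made-up numbers:

```python
def combined_score(p_trust, p_risk):
    """Collapse a trust probability and a risk probability into a single
    signed score: positive leans trustworthy, negative leans risky.
    Purely illustrative -- not the real SureScore formula."""
    return round(p_trust - p_risk, 3)

print(combined_score(0.92, 0.03))  # 0.89  -> strongly trustworthy
print(combined_score(0.10, 0.85))  # -0.75 -> strongly risky
```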

Instead of focusing solely on the trustworthiness of a device, this approach lets us make real-time predictions without any knowledge of the particular user involved in the transaction. We analyze clues that the device alone doesn’t offer, such as transactional, contextual and behavioral indicators. That level of nuance exposed more opportunity.

We found a sizeable gray area in the space between clear threats and good customers. Some trustworthy users trigger fraud-prevention measures by happenstance but are otherwise harmless. Identifying the characteristics of honest customers – instead of scanning only for the bad ones – helped minimize this group’s time spent in the review queue. This brought a measurable benefit to the business, too. Beyond catching fraud, our modeling improved the efficiency of our customers’ workflow by reducing the number of cases that require manual intervention.
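The workflow benefit falls out of a simple three-way routing policy: scores past either extreme are decided automatically, and only the gray middle goes to a human. A sketch with illustrative thresholds:

```python
def route(score, deny_below=-0.5, approve_above=0.5):
    """Route a transaction by its predictive score. Thresholds are
    illustrative; in practice they are tuned per subscriber."""
    if score <= deny_below:
        return "deny"
    if score >= approve_above:
        return "approve"
    return "review"

scores = [0.9, 0.7, 0.1, -0.2, -0.8]
decisions = [route(s) for s in scores]
print(decisions)  # ['approve', 'approve', 'review', 'review', 'deny']
```

The wider the model can confidently push scores toward the extremes, the fewer transactions land in the review queue.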

Machine learning needs human judgment

As I mentioned earlier, the potential of machine learning models is influenced by the quality of the data and the decisions based upon them. For example, in banking, thousands of rapid log-ins and transactions from a single source are a hallmark of fraud rings. Or they could come from popular finance software like Yodlee or Mint. When writing policies and setting rules, institutions have to make judgments that go beyond what a predictive algorithm is capable of learning from transaction data alone.
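That kind of judgment often lands in policy code as explicit context the model never sees, for example a curated list of known aggregator sources consulted before a velocity rule fires. A hypothetical sketch (the addresses and threshold are invented):

```python
# Illustrative allowlist of sources belonging to account-aggregation
# services (the kind of traffic Yodlee or Mint generates). Hypothetical
# addresses; a real list would be curated and kept current.
KNOWN_AGGREGATORS = {"198.51.100.7", "203.0.113.42"}

def looks_like_fraud_ring(source_ip, logins_per_minute, threshold=100):
    """High-velocity logins from one source suggest a fraud ring --
    unless the source is a known aggregator exhibiting the same behavior."""
    if source_ip in KNOWN_AGGREGATORS:
        return False
    return logins_per_minute > threshold

print(looks_like_fraud_ring("198.51.100.7", 500))  # False: known aggregator
print(looks_like_fraud_ring("192.0.2.99", 500))    # True: suspicious burst
```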

You need contextual insight to recognize the difference between an aggregator and a fraud ring, even when they exhibit the same behavior. A lot of data cleansing goes into quality machine learning. Neglect that, and it will directly impact the algorithm’s efficacy.

This reminds me of a short and relevant story. While working at a different company, I once evaluated an intrusion-detection system. The system’s algorithm recognized 100% of intrusion attempts on what was then the industry’s standard sample dataset. However, as we dug into the results, we realized that almost all of the attacks in the sample data were automated and scripted. The intrusion-detection system’s machine learning had learned to key on network sessions with short durations.

When we simply slowed down the scripts, the same attacks went through undetected. The algorithm had no way to know how trivially session duration could be defeated.
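The failure mode is easy to reproduce in miniature. Imagine the model’s learned behavior boiled down to the spurious rule “short session means attack” (a deliberate caricature of what that system did):

```python
# Toy caricature of a detector that overfit to a spurious feature:
# it learned "short session => attack" because every attack in the
# training data happened to be scripted and fast.
def naive_detector(session):
    return session["duration_s"] < 2.0  # cutoff learned from scripted attacks

scripted_attack = {"duration_s": 0.5, "is_attack": True}
slowed_attack = {"duration_s": 30.0, "is_attack": True}  # same attack, slowed down

print(naive_detector(scripted_attack))  # True: caught
print(naive_detector(slowed_attack))    # False: sails straight through
```

A domain expert would have spotted immediately that session duration is under the attacker’s control and therefore a fragile feature.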

This is all to say that you can't blindly trust machine learning to solve your problems. It's going to help. It's going to catch problems that static rules might not. But predictive models still need tuning and oversight from experts in the systems at stake.