In the world of cross-device identification, there’s a lot of talk about comparing the performance of device maps (also known as device graphs). Claims typically involve terms such as “accuracy,” “precision” and “recall,” but many people don’t understand exactly what these terms mean or how they relate to one another. To evaluate how cross-device identification vendors describe the accuracy of their data, it is important to delve a bit into the relevant metrics and the relationships between them.

There are generally three terms used to describe the performance (or “correctness”) of a probabilistic device map: accuracy, precision and recall. This article will examine these terms, how they are related to each other and which ones are the most relevant in the context of device maps.

## Accuracy

“Accuracy” is one of the most commonly used terms to describe device map performance, although it is actually the least useful! Firstly, it is important to realize that this term is used in two very different ways:

- as a vernacular term intended to mean, generally, how “good” a device map is (this, for example, is how we at Crosswise use the term on our website)
- along with a numerical value, the result of a particular mathematical formula known as “accuracy” (in the realm of “binary classification”)

Any device map vendor stating a numerical value for the “accuracy” of its device map is necessarily using the mathematical definition. This definition can be described as “the percentage of all of a device map’s correct matches and correct non-matches from among all the possible device combinations in the map.” This is not easy to digest, so let’s look at the mathematical formula for “accuracy”:

$$
\text{Accuracy} = \frac{\text{true positives} + \text{true negatives}}{\text{true positives} + \text{true negatives} + \text{false positives} + \text{false negatives}}
$$

This formula gives a percentage figure indicating the correctly identified matches (true positives), plus the correctly identified non-matches (true negatives), out of all possible device pairs.

At first glance, this seems to make a lot of sense: it is a single percentage figure representing how “correct” the device map is, overall. However, there is a clear reason why this formula is nearly meaningless in the context of device maps: the extremely lopsided ratio between device matches and non-matches. We’ll explain.

The above accuracy formula works well when the numbers of matches and non-matches in a domain are reasonably balanced, perhaps 55:45 or even 70:30. However, in the field of matching consumer devices used by a single individual, the number of actual matches is a tiny fraction of all *possible* matches between all the devices in a region. Therefore, the “true negatives” number used in the formula will utterly dwarf all the other elements of the formula, meaning that *regardless of how many correct matches the device map identifies (the “true positives” figure), the accuracy result will always be very close to 100%!*

We can demonstrate this with a simple example: let’s assume that in a given geographical region, there are 100 devices. The total number of possible combinations among all these devices is $\binom{100}{2} = \frac{100 \times 99}{2} = 4{,}950$ (read as “100 choose 2”). Of these, let’s say that there are 30 actual matches (meaning that there are 30 people who each own two of the 100 devices, for a total of 60 matched devices), with the other 40 devices each being used by a single person.

Now, let’s look at the accuracy figures of three device maps of widely varying performance (for the sake of simplicity, none of these hypothetical device maps contains any false positives, i.e., incorrectly-identified matched devices where no match actually exists):

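## Device map 1: 100% correct match rate (all 30 actual pairs identified)

With no false positives, all 4,950 − 30 = 4,920 non-matching pairs are true negatives, and the 30 identified matches are true positives:

$$
\text{Accuracy} = \frac{30 + 4{,}920}{4{,}950} = 100\%
$$

## Device map 2: 50% correct match rate (only 15 of the 30 actual pairs identified)

$$
\text{Accuracy} = \frac{15 + 4{,}920}{4{,}950} \approx 99.7\%
$$

## Device map 3: 10% correct match rate (only 3 of the 30 actual pairs identified)

$$
\text{Accuracy} = \frac{3 + 4{,}920}{4{,}950} \approx 99.5\%
$$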

Despite the vast difference in the maps’ true performance, the resulting accuracy figures are nearly identical! Clearly, these “accuracy” figures do not provide useful information regarding the relative performance of these three very different device maps. As mentioned above, the reason for this is that the number of “true negatives” will always dwarf the rest of the formula’s components.
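
As a rough check on this claim, the short sketch below recomputes the accuracy of the three hypothetical device maps from the 100-device example. The function name `accuracy` and the variable names are illustrative choices rather than part of any vendor's tooling, and the counts assume, as the example does, that none of the maps contains false positives.

```python
from math import comb


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary-classification accuracy: correct decisions over all decisions."""
    return (tp + tn) / (tp + tn + fp + fn)


TOTAL_PAIRS = comb(100, 2)   # 4,950 possible device pairs among 100 devices
ACTUAL_MATCHES = 30          # pairs that truly belong to one person

# Number of true matches found by each hypothetical device map.
maps = {"Device map 1": 30, "Device map 2": 15, "Device map 3": 3}

results = {}
for name, found in maps.items():
    fp = 0                                   # the example assumes no false positives
    fn = ACTUAL_MATCHES - found              # true matches the map missed
    tn = TOTAL_PAIRS - ACTUAL_MATCHES - fp   # non-matches correctly left out
    results[name] = accuracy(found, tn, fp, fn)
    print(f"{name}: {results[name]:.1%}")
```

Even the map that finds only 3 of the 30 true pairs scores within about half a percentage point of the perfect map, because the 4,920 true negatives dominate both the numerator and the denominator.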

Stated differently, it is very easy for a device map to accurately indicate the overwhelming percentage of non-matches – *by simply leaving them out of the map!* – causing the accuracy percentage for almost any probabilistic device map to always be close to 100%.

(As a side note, given this understanding of what “accuracy” really means, the accuracy figures quoted by device map vendors of 91.2% or 97.3% or the like are surprisingly low!)

To summarize our discussion of accuracy: this is a valuable metric in many applications, but it is not useful when describing a device map. This is because the ratio of actual device matches to non-matches is so lopsided that the accuracy formula will almost always result in a figure approaching 100%.

## Precision and Recall

The best way to truly describe the performance of a device map is by using a pair of metrics known as “precision” and “recall”. Let’s start with quick definitions of these two key terms:

- Precision – the percentage of a device map’s matches which are correct matches
- Recall – the percentage of actual matches in a region which are correctly matched in a device map covering that region (“recall” is sometimes referred to as “reach”)

Right off the bat, it is important to note that “precision” and “recall” must be used together to indicate the true overall performance of any probabilistic device map. This is because if a device map contains a high percentage of correct matches, but they are a very small percentage of the actual matches in a region, its precision is very high, but it will be fairly useless as it covers so few consumers. On the other hand, a device map might contain all the possible matches, in which case its recall will be very high, but only a fraction of the matches it contains will be correct.

## What is Precision?

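Precision measures how many of the device pairs contained in a given device map are correctly matched, without considering how many other matched device pairs exist in the region. The formula is:

$$
\text{Precision} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}
$$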

Re-using our example from the Accuracy section, above (with 100 total devices and 30 true matches), let’s see how four hypothetical device maps fare in terms of precision:

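## Device map 1: All 30 true matches identified and no matches incorrectly identified

$$
\text{Precision} = \frac{30}{30 + 0} = 100\%
$$

## Device map 2: 25 true matches identified and no matches incorrectly identified

$$
\text{Precision} = \frac{25}{25 + 0} = 100\%
$$

## Device map 3: 25 true matches identified and an additional 10 matches incorrectly identified

$$
\text{Precision} = \frac{25}{25 + 10} \approx 71.4\%
$$

## Device map 4: All 30 true matches identified and another 20 matches incorrectly identified

$$
\text{Precision} = \frac{30}{30 + 20} = 60\%
$$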

If you think about it, this definition of precision is an excellent indicator of the performance of a device map. This is because it simply measures how “correct” the map is, both in terms of actual matches accurately identified and false matches which were incorrectly identified. However, as mentioned above, precision alone can be very misleading because it ignores how many of the region’s actual device matches are covered by the device map (more about this later). So we need to pair precision with recall. Read on!
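
The four precision cases above can be worked through in a few lines. This is only a sketch: the helper `precision` and the dictionary layout are illustrative assumptions, and the counts are taken directly from the four hypothetical device maps.

```python
def precision(tp: int, fp: int) -> float:
    """Share of the pairs contained in the map that are real matches."""
    return tp / (tp + fp)


# (true matches identified, incorrect matches included) for each hypothetical map
maps = {
    "Device map 1": (30, 0),
    "Device map 2": (25, 0),
    "Device map 3": (25, 10),
    "Device map 4": (30, 20),
}

precisions = {name: precision(tp, fp) for name, (tp, fp) in maps.items()}
for name, value in precisions.items():
    print(f"{name}: {value:.1%}")
```

Device maps 1 and 2 both reach 100% precision even though map 2 misses five of the true matches, which is exactly the blind spot that recall is meant to cover.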

## What is Recall?

Recall, sometimes called “reach,” is the complementary metric to precision. Recall simply measures how many of the actual device pairs in a region are identified within the device map. In formula form:

$$
\text{Recall} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}
$$

Continuing with our same example of 100 total devices and 30 true matches among them, we can revisit the same three examples we used in the Accuracy section, above:

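## Device map 1: 100% correct match rate (all 30 actual pairs identified)

$$
\text{Recall} = \frac{30}{30 + 0} = 100\%
$$

## Device map 2: 50% correct match rate (only 15 of the 30 actual pairs identified)

$$
\text{Recall} = \frac{15}{15 + 15} = 50\%
$$

## Device map 3: 10% correct match rate (only 3 of the 30 actual pairs identified)

$$
\text{Recall} = \frac{3}{3 + 27} = 10\%
$$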

You may have noted that recall ignores any incorrect matches contained in the device map – evaluating these false positives is the job of precision, not recall. Recall simply indicates how much reach, or coverage of a given region, a device map’s correct matches provides.
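
The same three device maps can be scored for recall with a minimal sketch. The helper name `recall` is an illustrative assumption; the counts come from the example of 30 true matches in the region.

```python
def recall(tp: int, fn: int) -> float:
    """Share of the region's real matches that the map contains."""
    return tp / (tp + fn)


ACTUAL_MATCHES = 30  # true device pairs in the example region

# Number of true matches found by each hypothetical device map.
found_by_map = {"Device map 1": 30, "Device map 2": 15, "Device map 3": 3}

recalls = {
    name: recall(found, ACTUAL_MATCHES - found)  # missed matches are false negatives
    for name, found in found_by_map.items()
}
for name, value in recalls.items():
    print(f"{name}: {value:.0%}")
```

Note that false positives never enter the calculation: adding incorrect pairs to any of these maps would leave its recall unchanged.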

## The Necessary Trade-off between Precision and Recall

As mentioned before, precision alone can be very misleading without considering recall as well. This is because a probabilistic device map matches devices to individual consumers by analyzing numerous “signals” which may indicate that multiple devices belong to a single person. If the device map vendor wanted to push precision as close to 100% as possible, all it would have to do is include *only* those matches where the signals were so clear that it is almost certain that two devices belong to one person. The map would then contain essentially only true matches, with few or no incorrect matches, and its precision would be at or near 100%.

In other words, one can shoot for super-high precision by including only a tiny number of matches in the device map. This will result in near-100% precision, but with so few results that using the device map in a real-world application would be pointless.

Likewise, a probabilistic device map vendor could shoot for super-high recall – by including every match that may possibly be correct – but then precision will suffer. Let’s illustrate with a final example.

If the device map vendor sets a very low threshold for the signals that identify a match between two devices, the device map in our sample region would contain all 4,950 possible combination pairs that exist among 100 devices. Given that there are only 30 *actual* matches in the region, this would result in a recall of 100% because all 30 of those matches are, in fact, included in the device map:

$$
\text{Recall} = \frac{30}{30 + 0} = 100\%
$$

However, the precision of this device map would be an absurdly low 0.6%:

$$
\text{Precision} = \frac{30}{30 + 4{,}920} = \frac{30}{4{,}950} \approx 0.6\%
$$

Of course, such an inaccurate device map would not be useful, despite the fact that it actually does map every actually-paired device in the region!
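
To see the two extremes side by side, the sketch below scores a very cautious map (Device map 3 from the recall example: 3 true matches, no incorrect ones) and the indiscriminate map that contains every possible pair. The helper `precision_and_recall` and the variable names are illustrative assumptions.

```python
from math import comb

ACTUAL_MATCHES = 30
TOTAL_PAIRS = comb(100, 2)  # 4,950 possible pairs among 100 devices


def precision_and_recall(tp: int, fp: int) -> tuple[float, float]:
    """Return (precision, recall) for a map with tp correct and fp incorrect pairs."""
    fn = ACTUAL_MATCHES - tp  # true matches missing from the map
    return tp / (tp + fp), tp / (tp + fn)


# Cautious map: only 3 of the 30 true matches, no incorrect ones.
cautious = precision_and_recall(tp=3, fp=0)

# Indiscriminate map: every one of the 4,950 possible pairs is declared a match.
everything = precision_and_recall(tp=ACTUAL_MATCHES, fp=TOTAL_PAIRS - ACTUAL_MATCHES)

print(f"cautious:   precision {cautious[0]:.1%}, recall {cautious[1]:.1%}")
print(f"everything: precision {everything[0]:.1%}, recall {everything[1]:.1%}")
```

Each map looks perfect on one metric and very poor on the other, which is why neither number is meaningful when quoted alone.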

The bottom line is that there will always be a trade-off between precision and recall. Achieving the optimum balance between the two is one of the main goals of the data science behind probabilistic device matching.
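
One way to picture this trade-off is to imagine that each candidate device pair receives a match score and that the vendor includes every pair scoring at or above some threshold. The sketch below is purely illustrative: the scores, the ground-truth labels, the thresholds and the function `precision_recall_at` are all made up for the example and do not describe any real matching system.

```python
# Hypothetical candidate pairs: (match score, whether the pair is a real match).
CANDIDATES = [
    (0.95, True), (0.90, True), (0.85, True), (0.80, False), (0.70, True),
    (0.60, False), (0.50, True), (0.40, False), (0.30, False), (0.20, False),
]


def precision_recall_at(threshold: float, candidates=CANDIDATES):
    """Score the map formed by keeping every pair whose score >= threshold."""
    included = [is_match for score, is_match in candidates if score >= threshold]
    tp = sum(included)                      # real matches kept in the map
    fp = len(included) - tp                 # non-matches kept in the map
    total_true = sum(is_match for _, is_match in candidates)
    # Precision is undefined for an empty map, so report NaN in that case.
    precision = tp / (tp + fp) if included else float("nan")
    recall = tp / total_true
    return precision, recall


for threshold in (0.90, 0.65, 0.25):
    p, r = precision_recall_at(threshold)
    print(f"threshold {threshold:.2f}: precision {p:.1%}, recall {r:.1%}")
```

With these invented numbers, lowering the threshold raises recall from 40% to 100% while precision falls from 100% to roughly 56%, which mirrors the balancing act described above.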

## Conclusion

We have demonstrated three important truths regarding how the performance of consumer device maps is measured:

- Accuracy is very misleading because, in the real world of cross-device mapping, the number of non-matching device pairs (from among all possible device-pair combinations in a region) is many orders of magnitude larger than the number of matching pairs. This leads to very high accuracy figures even when few (or no) correct device-pair matches are identified.
- Precision and recall are much more representative metrics with which to measure the performance of a device map.
- Understanding the trade-offs between precision and recall makes it clear that optimizing a device graph for either precision or recall alone is easy, but that its business value will be low. In other words, it is never helpful to look at one metric in isolation; the two must always be examined together.

Armed with an understanding of the terms accuracy, precision and recall, you are now better able to evaluate the claims made by device map vendors. If a vendor claims that a single number can reflect their device map’s overall performance, beware!
