Fine-grained Targeting Detection at Scale with Statistical Confidence

View the Project on GitHub columbia/sunlight

Our research.

Today’s Web services leverage users’ information – such as emails, search logs, or locations – to target advertisements, prices, or products at users. Presently, users have little insight into how their data is used for such purposes. To enhance transparency, we are building a new set of tools that detect what data – such as emails or searches – is used to target which ads in Gmail, which prices in Amazon, etc. The key insight is to compare the ads or prices witnessed by different accounts that hold similar, but not identical, subsets of the data.
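
The comparison idea above can be illustrated with a small sketch. This is not Sunlight's code: the account layout, the `exposure_rates` helper, and the item and ad names are all hypothetical, chosen only to show how differing data subsets let us contrast what accounts see.

```python
def exposure_rates(accounts, data_item, ad):
    """Compare how often `ad` appears in accounts with and without `data_item`.

    `accounts` is a list of dicts with two sets (a hypothetical layout):
      "inputs": data items placed in the account (e.g. emails)
      "ads":    ads the account was shown
    Returns (rate_with, rate_without) as fractions in [0, 1].
    """
    with_item = [a for a in accounts if data_item in a["inputs"]]
    without_item = [a for a in accounts if data_item not in a["inputs"]]

    def rate(group):
        # An empty group gives no evidence; report 0.0 rather than divide by zero.
        if not group:
            return 0.0
        return sum(ad in a["ads"] for a in group) / len(group)

    return rate(with_item), rate(without_item)


# Four accounts holding overlapping, but not identical, subsets of two emails.
accounts = [
    {"inputs": {"email_travel", "email_cooking"}, "ads": {"ad_flights"}},
    {"inputs": {"email_travel"}, "ads": {"ad_flights", "ad_shoes"}},
    {"inputs": {"email_cooking"}, "ads": {"ad_shoes"}},
    {"inputs": set(), "ads": set()},
]

rate_with, rate_without = exposure_rates(accounts, "email_travel", "ad_flights")
print(rate_with, rate_without)
```

A large gap between the two rates suggests, but does not prove, that the data item is used to target the ad; a real analysis would need many more accounts and a statistical test before drawing conclusions.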


Sunlight is an analysis pipeline that provides causal targeting detection at scale and with statistical confidence. In the paper, we propose a four-step pipeline to form and assess targeting hypotheses. Our pipeline is built in a modular way, which allows extensive comparison of different algorithms. We highlight a fundamental scalability trade-off between the number of hypotheses we can make and the confidence we have in these hypotheses.
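
One way to see the trade-off between the number of hypotheses and the confidence in each is through a multiple-testing correction. The sketch below uses the standard Bonferroni correction purely as an illustration; it is an assumption made for this example and is not necessarily the correction Sunlight applies. The function name and the p-values are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which hypotheses remain significant after Bonferroni correction.

    Each p-value is compared against alpha divided by the number of
    hypotheses tested, so the bar rises as more hypotheses are made.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# The same p-value (0.004) tested alongside 9 and then 99 other hypotheses.
few = bonferroni_significant([0.004] + [0.5] * 9)
many = bonferroni_significant([0.004] + [0.5] * 99)
print(few[0], many[0])
```

With 10 hypotheses the threshold is 0.005 and the first hypothesis passes; with 100 hypotheses the threshold drops to 0.0005 and the same evidence no longer suffices.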

We also conducted large studies of ad targeting on Gmail and the Web, and found some evidence of targeting that contradicts Google's privacy FAQs.


Authors and Contributors

Mathias Lecuyer, Riley Spahn, Yannis Spiliopoulos, Augustin Chaintreau, Roxana Geambasu, Daniel Hsu