MatomoCamp 2023

Maximizing Privacy through Clean Insights on Matomo
11-10, 12:00–12:45 (Europe/Paris), Livestream Room 1
Language: English

While adopting Matomo on its own improves the privacy of users and visitors, sometimes more must be done. We will talk about the kinds of privacy different situations require and explain, in simple language, our techniques and tools for achieving them.


While adopting Matomo on its own improves people's privacy, sometimes more must be done. It may be that your application, site, or service is being deployed within a different legal or ethical context, or with a higher-risk community of users or type of application and data. There may also be a desire or need to run a specific kind of time-limited study, with specific consent requested from the people interested in participating. Whatever the reason, it is clear there is a need for more than a binary “yes or no” question when it comes to measurement, analytics, and telemetry.

In this talk, we will discuss the kinds of privacy different situations require and explain, in simple language, the techniques and tools we have developed for achieving them with Clean Insights. Clean Insights gives developers a way to plug into a secure, private measurement platform through an alternative client SDK that works on top of the Matomo server. It is focused on helping answer key questions about app usage patterns, not on enabling pervasive, invasive surveillance of all user habits. Our approach provides programmatic levers to cater to specific use cases and privacy needs. It also provides methods for user interactions that are ultimately empowering instead of alienating.

Our techniques for this include:

DATA MINIMIZATION - Take only what you need
Only the minimum amount of usage and behavioral data should be gathered to answer a determined set of questions. The frequency, range, and level of details of measurements should be as small as possible.

SOURCE AGGREGATION - No Needles, Only Haystack
Potentially identifying data should not be held in any part of the system longer than necessary, and should be aggregated at the source at the earliest possible time.

DETAIL GENERALIZATION - Dilute, Rinse, Repeat
Dilute the attributes of data subjects by modifying the respective scale or order of magnitude (e.g. a region rather than a city, a month rather than a week).

ENGAGED TRANSPARENCY - Get Consent Early & Often
Always get consent, and make the scope of the data collection and the algorithms used publicly available and well explained.
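
To make the four techniques above concrete, here is a minimal Python sketch of how they might combine on the client side. It is not the Clean Insights SDK API: the `Campaign` and `LocalAggregator` names, their fields, and the row format returned by `flush` are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Campaign:
    """Hypothetical time-limited study with an explicit allow-list of events."""
    name: str
    start: date
    end: date
    allowed_events: frozenset
    consent_granted: bool = False  # engaged transparency: off until the person opts in


@dataclass
class LocalAggregator:
    """Hypothetical on-device store that keeps per-month counts, never raw events."""
    campaign: Campaign
    counts: Counter = field(default_factory=Counter)

    def record(self, event: str, day: date) -> bool:
        c = self.campaign
        # Consent and the study window gate every measurement.
        if not c.consent_granted or not (c.start <= day <= c.end):
            return False
        # Data minimization: drop anything the study's questions do not need.
        if event not in c.allowed_events:
            return False
        # Detail generalization: keep the month, discard the exact day.
        period = day.strftime("%Y-%m")
        # Source aggregation: increment a counter instead of storing the event.
        self.counts[(period, event)] += 1
        return True

    def flush(self) -> list:
        """Return aggregated rows for upload and clear local state."""
        rows = [
            {"period": period, "event": event, "count": n}
            for (period, event), n in sorted(self.counts.items())
        ]
        self.counts.clear()
        return rows
```

In this sketch, a measurement recorded before consent, outside the campaign dates, or for an event not on the allow-list is simply dropped, and only rows such as a month, an event name, and a count would ever leave the device.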

Nathan is the founder and director of Guardian Project, an award-winning, open-source, mobile security collaborative with millions of users and beneficiaries worldwide. Their best-known app is Orbot, which brings the Tor anonymity and circumvention network to Android devices and has been installed more than 20 million times. He was part of the original team that created Clean Insights as part of a Harvard University "Assembly" program run by the Berkman Klein Center.

John is a software engineer, data scientist, and policy wonk. He's a two-time Berkman Klein Assembly Fellow (cybersecurity & privacy; and mis- & disinformation), and has degrees in technology policy, AeroAstro, and mechanical engineering. In addition to working on a handful of fun and interesting things like Clean Insights at the Guardian Project, John is a pilot and a volunteer EMT. He has worked on things ranging from educational technology to flight test to mental health systems of care.