Linkage Monitor Analytics

Status: Draft Proposal, not implemented
Authors: @elharo
Contributors:
Last updated: 2020-12-10

Objective

Measure usage of the linkage monitor.

Background

We'd like to have a good overview of:

How many repositories and projects use the linkage monitor
How many PRs it checks
How many linkage errors it finds

Overview

We'll use Firelog ingestion to collect metrics about:

Number of runs
Linkage errors detected
Number of repositories the monitor is installed in
Number of artifacts

All of this will be behind the --send-analytics flag, which is off by default. We'll update com.google.cloud.tools.dependencies.linkagemonitor.LinkageMonitor to accept the flag, so the script runs it like this:

java -jar ${JAR} --send-analytics com.google.cloud:libraries-bom

If the flag is present, the monitor sends a ping through Firelog when it runs. If the flag is absent, it sends nothing.
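
The opt-in behavior described above can be sketched as follows. This is a minimal illustration, not the actual LinkageMonitor code: the class name MonitorArguments and its fields are hypothetical, and only the --send-analytics flag and the BOM coordinates argument come from this proposal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for parsed command-line arguments (assumption, not existing code). */
final class MonitorArguments {
  static final String SEND_ANALYTICS_FLAG = "--send-analytics";

  final boolean sendAnalytics;
  final List<String> bomCoordinates;

  private MonitorArguments(boolean sendAnalytics, List<String> bomCoordinates) {
    this.sendAnalytics = sendAnalytics;
    this.bomCoordinates = Collections.unmodifiableList(bomCoordinates);
  }

  /** Analytics stay off unless the flag appears somewhere in the arguments. */
  static MonitorArguments parse(String[] args) {
    boolean sendAnalytics = false; // off by default
    List<String> coordinates = new ArrayList<>();
    for (String arg : args) {
      if (SEND_ANALYTICS_FLAG.equals(arg)) {
        sendAnalytics = true;
      } else {
        // Everything else is treated as a BOM coordinate such as groupId:artifactId.
        coordinates.add(arg);
      }
    }
    return new MonitorArguments(sendAnalytics, coordinates);
  }
}
```

With this shape, the monitor would only construct and send an event when sendAnalytics is true, so runs without the flag never touch the network for analytics.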

Infrastructure

Clearcut for storage and reporting
Firelog over HTTPS for sending data from the monitor to Clearcut

Detailed design

We hook into com.google.cloud.tools.dependencies.linkagemonitor.LinkageMonitor. No other packages will include analytics code or depend on the analytics client in any way.

In particular, the Maven enforcer rule and the dependencies library will not have any dependencies on analytics.

We will collect:

URL of the GitHub repository
GitHub repository name
GitHub organization
Linkage monitor version
Java version
Maven version
PR number and URL
Linkage errors detected
Amount of time the tool ran

We do not include any user data or personally identifiable information. Everything we collect is already published in the publicly visible output of individual linkage monitor runs, whether the monitor is installed in Kokoro or GitHub Actions. We're simply aggregating this already public information across runs and repositories.
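
To make the shape of a single report concrete, here is a rough sketch of an event carrying the fields listed above. The class name, the field keys, and the choice to report linkage errors as a count and run time in milliseconds are all assumptions for illustration; the proposal does not fix a wire format.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical value object for one linkage monitor run (names are assumptions). */
final class LinkageMonitorEvent {
  private final String repositoryUrl;
  private final String repositoryName;
  private final String organization;
  private final String linkageMonitorVersion;
  private final String javaVersion;
  private final String mavenVersion;
  private final int pullRequestNumber;
  private final String pullRequestUrl;
  private final int linkageErrorCount;
  private final Duration runDuration;

  LinkageMonitorEvent(
      String repositoryUrl,
      String repositoryName,
      String organization,
      String linkageMonitorVersion,
      String javaVersion,
      String mavenVersion,
      int pullRequestNumber,
      String pullRequestUrl,
      int linkageErrorCount,
      Duration runDuration) {
    this.repositoryUrl = repositoryUrl;
    this.repositoryName = repositoryName;
    this.organization = organization;
    this.linkageMonitorVersion = linkageMonitorVersion;
    this.javaVersion = javaVersion;
    this.mavenVersion = mavenVersion;
    this.pullRequestNumber = pullRequestNumber;
    this.pullRequestUrl = pullRequestUrl;
    this.linkageErrorCount = linkageErrorCount;
    this.runDuration = runDuration;
  }

  /** Flattens the event into ordered key/value pairs; no field describes a person. */
  Map<String, Object> toMap() {
    Map<String, Object> fields = new LinkedHashMap<>();
    fields.put("repository_url", repositoryUrl);
    fields.put("repository_name", repositoryName);
    fields.put("organization", organization);
    fields.put("linkage_monitor_version", linkageMonitorVersion);
    fields.put("java_version", javaVersion);
    fields.put("maven_version", mavenVersion);
    fields.put("pull_request_number", pullRequestNumber);
    fields.put("pull_request_url", pullRequestUrl);
    fields.put("linkage_error_count", linkageErrorCount);
    fields.put("duration_millis", runDuration.toMillis());
    return fields;
  }
}
```

The Java version could be read from System.getProperty("java.version"); how the other values are obtained in Kokoro or GitHub Actions is left open here.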

TBD: how do we display and report on the data once it has arrived in Clearcut?

Caveats

Latency

HTTP requests will be asynchronous and should not block our existing code.

Scalability

Firelog and Clearcut handle much larger systems and traffic than this.

Dependency considerations

If Clearcut or Firelog goes down, metrics might be lost. However, because the client library sends asynchronously, an outage will not cause the linkage monitor itself to fail.
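
The latency and dependency notes above amount to "fire and forget, and swallow failures". A minimal sketch of that pattern follows, using only java.util.concurrent. The class AsyncAnalyticsSender is hypothetical, and the transport parameter is a stand-in for whatever call actually posts the payload to Firelog over HTTPS, since that client API is not specified in this proposal.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

/** Hypothetical wrapper that sends analytics off the caller's thread (assumption). */
final class AsyncAnalyticsSender {
  // Stand-in for the real HTTPS call to Firelog; may throw if the backend is down.
  private final Consumer<String> transport;
  private final Executor executor;

  AsyncAnalyticsSender(Consumer<String> transport, Executor executor) {
    this.transport = transport;
    this.executor = executor;
  }

  /**
   * Returns immediately. The future completes with true if the transport
   * succeeded and false if it threw, so a failure never propagates to the
   * linkage check itself.
   */
  CompletableFuture<Boolean> send(String payload) {
    return CompletableFuture
        .runAsync(() -> transport.accept(payload), executor)
        .handle((ignored, error) -> error == null);
  }
}
```

The caller can ignore the returned future entirely. One design point to settle is whether the JVM should wait briefly for pending sends before exiting, since a daemon executor would otherwise drop in-flight events.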

Data integrity

We rely on Clearcut to store and retrieve all data. In the worst case, this data is not critical and can be lost.

SLA requirements

Same as Clearcut and Firelog.

Security considerations

We need an API key for Firelog. The key is not published in the GitHub repository; instead, it is bundled into the jar file as part of the build process.
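
One way the monitor could read a key that the build bundled into the jar is from a classpath properties file. The sketch below assumes a resource named /analytics.properties and a property named firelog.api.key; both names are hypothetical, as is the ApiKeyLoader class.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;
import java.util.Properties;

/** Hypothetical loader for an API key bundled into the jar at build time (assumption). */
final class ApiKeyLoader {
  static final String RESOURCE = "/analytics.properties"; // hypothetical resource name
  static final String PROPERTY = "firelog.api.key"; // hypothetical property name

  /** Returns the bundled key, or empty if the resource is missing or unreadable. */
  static Optional<String> loadFromClasspath() {
    try (InputStream in = ApiKeyLoader.class.getResourceAsStream(RESOURCE)) {
      if (in == null) {
        return Optional.empty(); // e.g. a local build without the key
      }
      return read(in);
    } catch (IOException e) {
      return Optional.empty();
    }
  }

  /** Parses the key from a properties stream; blank or absent values yield empty. */
  static Optional<String> read(InputStream in) {
    Properties properties = new Properties();
    try {
      properties.load(in);
    } catch (IOException e) {
      return Optional.empty();
    }
    String key = properties.getProperty(PROPERTY);
    if (key == null || key.trim().isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(key.trim());
  }
}
```

Returning an empty Optional when the key is absent lets builds made from the public repository run normally with analytics simply disabled.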

Privacy considerations

We collect information about open source repositories and build systems only. We do not collect any information about any people.

Furthermore, we collect information only from an allowlist of GitHub organizations. Organizations include:

GoogleCloudPlatform
googleapis
google
census-instrumentation
grpc

We may further restrict this by repository; for instance, to allow collection of information from Apache Beam but not all Apache projects.
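
The organization and repository restriction above could be checked before any event is sent. The following is a sketch under assumptions: the RepositoryAllowlist class is hypothetical, the organization names are the ones listed in this section, and the single repository entry apache/beam only illustrates the per-repository case mentioned above.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical allowlist check for GitHub repository URLs (assumption). */
final class RepositoryAllowlist {
  // Stored in lower case so the comparison is case-insensitive.
  private static final Set<String> ORGANIZATIONS =
      Collections.unmodifiableSet(
          new HashSet<>(
              Arrays.asList(
                  "googlecloudplatform", "googleapis", "google", "census-instrumentation", "grpc")));

  // Illustrative per-repository entry: one Apache project, not the whole organization.
  private static final Set<String> REPOSITORIES = Collections.singleton("apache/beam");

  /** Returns true only for github.com URLs whose organization or repository is listed. */
  static boolean isAllowed(String repositoryUrl) {
    URI uri;
    try {
      uri = URI.create(repositoryUrl);
    } catch (IllegalArgumentException e) {
      return false; // not a parseable URL
    }
    if (!"github.com".equalsIgnoreCase(uri.getHost())) {
      return false;
    }
    // The path looks like "/organization/repository", so segment 0 is empty.
    String[] segments = uri.getPath().split("/");
    if (segments.length < 3) {
      return false;
    }
    String organization = segments[1].toLowerCase(Locale.ROOT);
    String repository = segments[2].toLowerCase(Locale.ROOT);
    return ORGANIZATIONS.contains(organization)
        || REPOSITORIES.contains(organization + "/" + repository);
  }
}
```

Defaulting to false for anything unrecognized means a run in an unlisted repository sends nothing even when the flag is set.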

Testing plan

TBD

Work estimates

TBD

Launch plans

TBD
