Linkage Monitor Analytics

Elliotte Rusty Harold edited this page Dec 10, 2020 · 13 revisions
  • Status: Draft Proposal, not implemented
  • Authors: @elharo
  • Contributors:
  • Last updated: 2020-12-10

Objective

Measure usage of the linkage monitor.

Background

We'd like to have a good overview of:

  • How many repositories and projects use the linkage monitor
  • How many PRs it checks
  • How many linkage errors it finds

Overview

We'll use Firelog ingestion to collect metrics about:

  • Number of runs
  • Linkage errors detected
  • Number of repositories it is installed in
  • Number of artifacts

All of this will be behind the --send-analytics flag, which is off by default. We'll update com.google.cloud.tools.dependencies.linkagemonitor.LinkageMonitor so that the script runs it like this:

java -jar ${JAR} --send-analytics com.google.cloud:libraries-bom

If the flag is present, the monitor sends an analytics event when it runs. If the flag is absent, it sends nothing.
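The opt-in behavior described above could be sketched as follows. This is only an illustration: MonitorArguments is a hypothetical class, not part of the existing LinkageMonitor code, and the real argument handling may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for parsed arguments; not the real LinkageMonitor code. */
final class MonitorArguments {
  private final boolean sendAnalytics;
  private final List<String> bomCoordinates;

  private MonitorArguments(boolean sendAnalytics, List<String> bomCoordinates) {
    this.sendAnalytics = sendAnalytics;
    this.bomCoordinates = Collections.unmodifiableList(bomCoordinates);
  }

  /** Treats --send-analytics as an opt-in switch; every other argument is a BOM coordinate. */
  static MonitorArguments parse(String[] args) {
    boolean sendAnalytics = false; // analytics are off unless the flag is present
    List<String> coordinates = new ArrayList<>();
    for (String arg : args) {
      if ("--send-analytics".equals(arg)) {
        sendAnalytics = true;
      } else {
        coordinates.add(arg);
      }
    }
    return new MonitorArguments(sendAnalytics, coordinates);
  }

  boolean sendAnalytics() {
    return sendAnalytics;
  }

  List<String> bomCoordinates() {
    return bomCoordinates;
  }
}
```

With this shape, the monitor would check sendAnalytics() exactly once and skip every analytics call when it returns false.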

Infrastructure

  • Clearcut for storage and reporting
  • Firelog over HTTPS for sending data from the monitor to Clearcut

Detailed design

We hook into com.google.cloud.tools.dependencies.linkagemonitor.LinkageMonitor. No other packages will include analytics code or depend on the analytics client in any way.

In particular, the Maven enforcer rule and the dependencies library will not have any dependencies on analytics.

We will collect:

  • URL of the GitHub repository
  • GitHub repository name
  • GitHub organization
  • Linkage monitor version
  • Java version
  • Maven version
  • PR number and URL
  • Linkage errors detected
  • Amount of time the tool ran
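The collected fields could be grouped into a single event object along these lines. This is a sketch under assumptions: the class name LinkageMonitorRunEvent and the key names are hypothetical, "linkage errors detected" is assumed to be a count, and the format Firelog actually expects is not specified in this proposal.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical event carrying the fields listed above; key names are illustrative. */
final class LinkageMonitorRunEvent {
  private final Map<String, String> fields = new LinkedHashMap<>();

  LinkageMonitorRunEvent(
      String repositoryUrl,
      String repositoryName,
      String organization,
      String monitorVersion,
      String javaVersion,
      String mavenVersion,
      int pullRequestNumber,
      String pullRequestUrl,
      int linkageErrorCount,
      long durationMillis) {
    fields.put("repository_url", repositoryUrl);
    fields.put("repository_name", repositoryName);
    fields.put("organization", organization);
    fields.put("monitor_version", monitorVersion);
    fields.put("java_version", javaVersion);
    fields.put("maven_version", mavenVersion);
    fields.put("pull_request_number", Integer.toString(pullRequestNumber));
    fields.put("pull_request_url", pullRequestUrl);
    fields.put("linkage_error_count", Integer.toString(linkageErrorCount));
    fields.put("duration_millis", Long.toString(durationMillis));
  }

  /** Returns a read-only view of the fields in insertion order. */
  Map<String, String> asMap() {
    return Collections.unmodifiableMap(fields);
  }
}
```

Keeping every field in one class makes it easy to audit that nothing beyond this list is sent.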

None of the fields listed above contains user data or personally identifiable information. Everything we collect is already published in the publicly visible output of individual linkage monitor runs, whether the monitor is installed in Kokoro or GitHub Actions. We simply aggregate this already public information across runs and repositories.

TBD: how do we display and report on the data once it's arrived in Clearcut?

Caveats

Latency

HTTP requests will be asynchronous and should not block our existing code.

Scalability

Firelog and Clearcut handle much larger systems and traffic than this.

Dependency considerations

If Clearcut or Firelog goes down, metrics might be lost. However, because the client library sends data asynchronously, the linkage monitor itself will still run.
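One way to get both properties, non-blocking sends and tolerance of backend outages, is a fire-and-forget wrapper like the sketch below. AsyncAnalyticsSender is a hypothetical name, and the upload is passed in as a Runnable because the actual Firelog client API is not described here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical fire-and-forget sender; the upload itself is supplied by the caller. */
final class AsyncAnalyticsSender {
  // A daemon thread, so a slow or hung upload never keeps the JVM alive.
  private final ExecutorService executor =
      Executors.newSingleThreadExecutor(
          runnable -> {
            Thread thread = new Thread(runnable, "analytics-sender");
            thread.setDaemon(true);
            return thread;
          });

  /**
   * Runs the upload off the calling thread. The returned future completes with true on
   * success and false if the upload threw, so a failure never reaches the caller.
   */
  CompletableFuture<Boolean> send(Runnable upload) {
    return CompletableFuture.runAsync(upload, executor)
        .thenApply(ignored -> Boolean.TRUE)
        .exceptionally(failure -> Boolean.FALSE);
  }
}
```

The monitor can ignore the returned future entirely. With a daemon thread, an upload still in flight when the JVM exits is dropped, which matches the stance that this data is not critical.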

Data integrity

We rely on Clearcut to store and retrieve all data. In the worst case, this data is not critical and can be lost.

SLA requirements

Same as Clearcut and Firelog.

Security considerations

We need an API key for Firelog that is not published in the GitHub repository but is bundled into the jar file as part of the build process.
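Reading a key that the build bundles into the jar might look like the sketch below. The resource name passed in and the property name firelog.api.key are assumptions for illustration; the proposal does not say how the key is packaged.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;
import java.util.Properties;

/** Hypothetical loader for an API key stored in a properties file inside the jar. */
final class ApiKeyLoader {
  /** Returns the key, or empty if the resource or the property is missing. */
  static Optional<String> load(String resourceName) {
    try (InputStream in = ApiKeyLoader.class.getClassLoader().getResourceAsStream(resourceName)) {
      if (in == null) {
        // Builds from a plain checkout have no bundled key; analytics simply stay off.
        return Optional.empty();
      }
      Properties properties = new Properties();
      properties.load(in);
      return Optional.ofNullable(properties.getProperty("firelog.api.key"));
    } catch (IOException ex) {
      return Optional.empty();
    }
  }
}
```

Returning an empty Optional rather than throwing keeps a missing key from failing the monitor run.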

Privacy considerations

We collect information about open source repositories and build systems only. We do not collect any information about any people.

Furthermore, we collect information only from an allowlist of GitHub organizations. Organizations include:

  • GoogleCloudPlatform
  • googleapis
  • google
  • census-instrumentation
  • grpc

We may further restrict this by repository; for instance, to allow collection of information from Apache Beam but not all Apache projects.
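The organization check could be sketched as below, assuming the repository is identified by an https://github.com/<organization>/<repository> URL. OrganizationAllowlist is a hypothetical class; the per-repository restriction mentioned above is not modeled, and organization names are compared exactly for simplicity.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical allowlist check based on the organization segment of a GitHub URL. */
final class OrganizationAllowlist {
  private static final Set<String> ALLOWED =
      Collections.unmodifiableSet(
          new HashSet<>(
              Arrays.asList(
                  "GoogleCloudPlatform", "googleapis", "google", "census-instrumentation", "grpc")));

  /** Returns true only for github.com repository URLs whose organization is allowlisted. */
  static boolean isAllowed(String repositoryUrl) {
    if (repositoryUrl == null) {
      return false;
    }
    URI uri;
    try {
      uri = URI.create(repositoryUrl);
    } catch (IllegalArgumentException ex) {
      return false; // malformed URL: do not collect
    }
    if (!"github.com".equalsIgnoreCase(uri.getHost())) {
      return false;
    }
    // The path looks like "/organization/repository"; the first segment is empty.
    String[] segments = uri.getPath().split("/");
    return segments.length >= 3 && ALLOWED.contains(segments[1]);
  }
}
```

Anything that cannot be parsed or matched is rejected, so the default is to collect nothing.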

Testing plan

TBD

Work estimates

TBD

Launch plans

TBD

Rollback strategy