Linkage Monitor Analytics
- Status: Draft Proposal, not implemented
- Authors: @elharo
- Contributors:
- Last updated: 2020-12-10
Measure usage of the linkage monitor.
We'd like to have a good overview of:
- How many repositories and projects use the linkage monitor
- How many PRs it checks
- How many linkage errors it finds
We'll use Firelog ingestion to collect metrics about:
- Number of runs
- Linkage errors detected
- Number of repositories installed in
- Number of artifacts
All of this will be behind the `--send-analytics` flag, which is off by default. We'll update com.google.cloud.tools.dependencies.linkagemonitor.LinkageMonitor so that the script runs it like this:

```
java -jar ${JAR} --send-analytics com.google.cloud:libraries-bom
```

If the flag is present, the monitor pings GA each time it runs. If the flag isn't present, it sends nothing.
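A minimal sketch of how the monitor might recognize the opt-in flag, assuming arguments arrive as a plain `String[]` and the flag can appear anywhere on the command line. The class `AnalyticsFlag` is hypothetical; the real argument handling in `LinkageMonitor` may differ. The point it shows is that analytics stay disabled unless the flag is explicitly passed, and that the flag is removed before the remaining arguments (such as the BOM coordinates) are processed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for the proposed --send-analytics flag (off by default). */
final class AnalyticsFlag {
  static final String SEND_ANALYTICS = "--send-analytics";

  final boolean sendAnalytics;
  final List<String> remainingArgs;

  private AnalyticsFlag(boolean sendAnalytics, List<String> remainingArgs) {
    this.sendAnalytics = sendAnalytics;
    this.remainingArgs = remainingArgs;
  }

  /** Strips the flag from args; analytics are enabled only if it was present. */
  static AnalyticsFlag parse(String[] args) {
    boolean send = false; // default: no data leaves the machine
    List<String> rest = new ArrayList<>();
    for (String arg : args) {
      if (SEND_ANALYTICS.equals(arg)) {
        send = true;
      } else {
        rest.add(arg); // e.g. com.google.cloud:libraries-bom
      }
    }
    return new AnalyticsFlag(send, List.copyOf(rest));
  }
}
```

With this shape, existing invocations that omit the flag behave exactly as before.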
- Clearcut for storage and reporting
- Firelog over HTTPS for sending data from the monitor to Clearcut
We hook into com.google.cloud.tools.dependencies.linkagemonitor.LinkageMonitor. No other packages will include analytics code or depend on GA in any way.
In particular the Maven enforcer rule and the dependencies library will not have any dependencies on analytics.
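One way to keep analytics confined to the linkage monitor is a small reporting interface declared only in the `linkagemonitor` package, with a no-op implementation used whenever the flag is absent. The names `AnalyticsReporter`, `NO_OP`, and `select` are hypothetical and only illustrate the isolation idea; nothing here is taken from the existing code.

```java
import java.util.Map;

/**
 * Hypothetical reporting seam living only in the linkagemonitor package, so the
 * Maven enforcer rule and the dependencies library never reference analytics code.
 */
interface AnalyticsReporter {
  /** Records one run; implementations should never throw into the caller. */
  void report(Map<String, String> event);

  /** Used when --send-analytics is absent: silently discards every event. */
  AnalyticsReporter NO_OP = event -> {};

  /** Picks the network-backed reporter only when the user opted in. */
  static AnalyticsReporter select(boolean sendAnalytics, AnalyticsReporter networkReporter) {
    return sendAnalytics ? networkReporter : NO_OP;
  }
}
```

Because the rest of the monitor only sees this interface, a disabled run never touches the networking code path.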
We will collect:
- URL of the GitHub repository
- GitHub repository name
- GitHub organization
- Linkage monitor version
- Java version
- Maven version
- PR number and URL
- Linkage errors detected
- How long the tool ran
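A sketch of assembling one run's event from the fields above, assuming the monitor runs in GitHub Actions. `GITHUB_SERVER_URL`, `GITHUB_REPOSITORY` (`owner/repo`), and `GITHUB_REF` (which is `refs/pull/<number>/merge` for pull request events) are standard Actions variables; Kokoro would need its own mapping, which is not shown. The class `RunEvent`, the key names, and reporting linkage errors as a count are all illustrative assumptions, not a defined schema.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical per-run event; key names are illustrative, not a fixed schema. */
final class RunEvent {
  // For pull_request events, GitHub Actions sets GITHUB_REF to refs/pull/<n>/merge.
  private static final Pattern PR_REF = Pattern.compile("refs/pull/(\\d+)/merge");

  private RunEvent() {}

  static Map<String, String> fromGitHubActions(
      Map<String, String> env,
      String monitorVersion,
      String mavenVersion,
      int linkageErrorCount, // a count here; the proposal may want error details
      Duration elapsed) {
    Map<String, String> event = new LinkedHashMap<>();
    String server = env.getOrDefault("GITHUB_SERVER_URL", "https://github.com");
    String ownerAndRepo = env.getOrDefault("GITHUB_REPOSITORY", "");
    int slash = ownerAndRepo.indexOf('/');
    if (slash > 0) {
      String repoUrl = server + "/" + ownerAndRepo;
      event.put("repository_url", repoUrl);
      event.put("organization", ownerAndRepo.substring(0, slash));
      event.put("repository", ownerAndRepo.substring(slash + 1));
      Matcher pr = PR_REF.matcher(env.getOrDefault("GITHUB_REF", ""));
      if (pr.matches()) { // only present when the run checks a PR
        event.put("pr_number", pr.group(1));
        event.put("pr_url", repoUrl + "/pull/" + pr.group(1));
      }
    }
    event.put("linkage_monitor_version", monitorVersion);
    event.put("java_version", System.getProperty("java.version"));
    event.put("maven_version", mavenVersion);
    event.put("linkage_errors", Integer.toString(linkageErrorCount));
    event.put("elapsed_millis", Long.toString(elapsed.toMillis()));
    return event;
  }
}
```

Every value here comes from the build environment or the tool itself, which matches the intent that no user data is collected.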
We do not include any user data or personally identifiable information, as the list above shows. Everything we include is already published in the publicly visible output of individual linkage monitor runs, whether the monitor runs in Kokoro or GitHub Actions. We simply aggregate this already public information across runs and repositories.
TBD: how do we display and report on the data once it's arrived in Clearcut?
HTTP requests will be asynchronous and should not block our existing code.
Firelog and Clearcut handle much larger systems and traffic than this.
If Clearcut or Firelog goes down, metrics might be lost. However, because the client sends asynchronously, such an outage will not prevent the linkage monitor from running.
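A minimal sketch of the non-blocking, failure-tolerant send described above, using the JDK 11+ `java.net.http.HttpClient`. It does not model the real Firelog protocol: the endpoint, the JSON body, and the names `AsyncSender`, `sendQuietly`, and `postJson` are placeholders. The key property is that `sendQuietly` returns immediately and its future always completes normally, so a network failure only loses the metric.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical fire-and-forget sender; endpoint and payload are placeholders. */
final class AsyncSender {
  private AsyncSender() {}

  /** Starts the send; the returned future completes with false instead of failing. */
  static CompletableFuture<Boolean> sendQuietly(Supplier<CompletableFuture<?>> send) {
    CompletableFuture<?> inFlight;
    try {
      inFlight = send.get();
    } catch (RuntimeException e) {
      // Building the request failed: drop the metric rather than fail the run.
      return CompletableFuture.completedFuture(false);
    }
    return inFlight
        .thenApply(response -> true) // request completed (HTTP status not checked here)
        .exceptionally(error -> false); // network error: the metric is lost, nothing more
  }

  /** Posts a JSON body without waiting for the response. */
  static CompletableFuture<?> postJson(HttpClient client, URI endpoint, String json) {
    HttpRequest request = HttpRequest.newBuilder(endpoint)
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(json))
        .build();
    return client.sendAsync(request, HttpResponse.BodyHandlers.discarding());
  }
}
```

A caller would use something like `AsyncSender.sendQuietly(() -> AsyncSender.postJson(HttpClient.newHttpClient(), endpoint, json))`. Since the JVM may exit before an in-flight request finishes, a short, bounded wait at the end of the run (for example via `completeOnTimeout`) is one option worth considering.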
We rely on Google Analytics to store and retrieve all data. In the worst case the data is lost, which is acceptable because it is not critical.
Same as GA.
We need an API key for Firelog that is not published in the GitHub repository but is bundled into the jar file during the build.
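One possible way to read a key injected at build time is from a properties file on the classpath that the build generates from a secret, so the key never appears in source control. The resource name `/linkage-monitor-analytics.properties`, the property `firelog.api.key`, and the class `ApiKeyLoader` are hypothetical. If the key was not bundled (for example in a local developer build), the sketch returns an empty `Optional`, which a caller could treat as "analytics disabled".

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;
import java.util.Properties;

/** Hypothetical loader for an API key bundled into the jar at build time. */
final class ApiKeyLoader {
  static final String RESOURCE = "/linkage-monitor-analytics.properties";
  static final String PROPERTY = "firelog.api.key";

  private ApiKeyLoader() {}

  /** Reads the key from the jar; empty if the build did not bundle one. */
  static Optional<String> fromClasspath() {
    // try-with-resources skips close() when getResourceAsStream returns null.
    try (InputStream in = ApiKeyLoader.class.getResourceAsStream(RESOURCE)) {
      return load(in);
    } catch (IOException e) {
      return Optional.empty(); // closing the stream failed; treat as no key
    }
  }

  static Optional<String> load(InputStream in) {
    if (in == null) {
      return Optional.empty(); // resource not present in this build
    }
    Properties props = new Properties();
    try {
      props.load(in);
    } catch (IOException e) {
      return Optional.empty(); // unreadable file: disable analytics
    }
    String key = props.getProperty(PROPERTY, "").trim();
    return key.isEmpty() ? Optional.empty() : Optional.of(key);
  }
}
```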
We collect information about open source repositories and build systems only. We do not collect any information about any people.
Furthermore, we whitelist the GitHub organizations we collect information from. The allowed organizations include:
- GoogleCloudPlatform
- googleapis
- census-instrumentation
- grpc
We may further restrict this by repository; for instance, to allow collection of information from Apache Beam but not all Apache projects.
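The allowlist can be expressed as a simple membership check run before any event is sent. In this sketch the organization names come from the list above, while the repository-level entry `apache/beam` is an illustrative guess at how the Apache Beam example might be encoded; the class `CollectionAllowlist` is hypothetical. Names are compared in lower case because GitHub treats organization and repository names case-insensitively.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical allowlist check performed before any event is sent. */
final class CollectionAllowlist {
  // Whole organizations from the proposal, stored in lower case.
  static final Set<String> ORGANIZATIONS =
      Set.of("googlecloudplatform", "googleapis", "census-instrumentation", "grpc");
  // Individual repositories allowed outside those organizations (illustrative).
  static final Set<String> REPOSITORIES = Set.of("apache/beam");

  private CollectionAllowlist() {}

  static boolean isAllowed(String organization, String repository) {
    String org = organization.toLowerCase(Locale.ROOT);
    String repo = repository.toLowerCase(Locale.ROOT);
    return ORGANIZATIONS.contains(org) || REPOSITORIES.contains(org + "/" + repo);
  }
}
```

Runs in repositories that fail this check would send nothing, even with `--send-analytics` set.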
TBD
TBD
TBD