RFC for signing/verifying remotely referenced taskcluster.yml files #187

Open
wants to merge 1 commit into
base: main

bhearsum
Contributor

This is an addendum to #182. I'll note that the contents of the RFC only cover verification, because that's the only part that Taskcluster the platform cares about.

In the Firefox CI cluster, I expect that we'll be signing these through Autograph (most likely via https://github.com/mozilla-releng/adhoc-signing at first), and copying the signatures into wherever we publish the .taskcluster.yml files.

@bhearsum bhearsum requested a review from a team as a code owner October 16, 2023 15:30
@bhearsum bhearsum requested review from lotas, petemoore and matt-boris and removed request for a team October 16, 2023 15:30
@bhearsum bhearsum force-pushed the 182-fix branch 2 times, most recently from c2fa32c to 077b489 Compare October 16, 2023 15:33
Member

@petemoore petemoore left a comment

The RFC doesn't cover how the service obtains the key(s) to validate the signatures.

For a multitenant environment, I think it would be better for the repo to stipulate if it requires a signature, and which signing keys it accepts, rather than have a single global key that can be used for signing across the entire deployment, or a single set of keys that apply to all projects. This feels like it should be repo config, empowering the project users who the CI is for.

For example, the .taskcluster.yml in the repo that wants to include the shared .taskcluster.yml file could look like this:

---
version: 1
config-from:
  source: github.com/taskcluster/taskgraph/data/taskcluster-yml-github.yml@main
  signature:
    required: true
    source: taskgraph.sig
    accepted-keys:
      - ed25519: <base64 encoded key> 
      - ....
      - ....
context:
  project-name: mozillavpn
  scopes:
    - secrets:get:project/mozillavpn/*

I think this design is much more flexible and more transparent, and it puts control in the hands of the projects that use it. My concern with the platform deployment approach is that it assumes a Taskcluster deployment is controlled by a central team, blocks project teams when those staff are not available, and does not support multi-tenant environments. It is also more opaque: it is harder to troubleshoot why the wrong signing key might be in use, and harder to change the signing key(s) if they need updating (because they are hidden behind platform config and only visible to operational staff).

I think having it in the .taskcluster.yml makes each .taskcluster.yml a little bit bigger, but the config in there is unlikely to change frequently. If a key is rotated, the change is much more visible: the git history provides an audit trail of what changed and who changed it. It also allows you to roll out changes gradually if required, while a script can still update all repos in one go. This also supports the environment changing at Mozilla, if it stops being a single team that controls all the CI pipelines of the whole company and some teams need to move quickly but would like to adopt the same security approach. It is more flexible regarding changes to the organisation.
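To make the per-repository proposal above concrete, here is a minimal Python sketch of how a `config-from.signature` block could be evaluated. It is not Taskcluster-GitHub code: the names `SignaturePolicy`, `parse_policy`, and `check_remote_config` are hypothetical, the actual ed25519 check is passed in as a callable rather than implemented, and treating a missing `signature` block as "not required" is an assumption the proposal does not state.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical verifier shape: (base64 public key, payload, signature) -> bool.
# A real deployment would back this with an ed25519 implementation.
Verifier = Callable[[str, bytes, bytes], bool]


@dataclass(frozen=True)
class SignaturePolicy:
    required: bool
    accepted_keys: Tuple[str, ...]  # base64-encoded ed25519 public keys


def parse_policy(config_from: Mapping) -> SignaturePolicy:
    """Read the `signature` block of a `config-from` mapping."""
    sig = config_from.get("signature") or {}
    keys = tuple(
        entry["ed25519"]
        for entry in sig.get("accepted-keys", [])
        if isinstance(entry, Mapping) and "ed25519" in entry
    )
    # Assumption: no `signature` block means signatures are not required.
    return SignaturePolicy(required=sig.get("required", False) is True,
                           accepted_keys=keys)


def check_remote_config(policy: SignaturePolicy, payload: bytes,
                        signature: Optional[bytes], verify: Verifier) -> bool:
    """Return True if the remote file may be used under this policy."""
    if signature is None:
        # Unsigned files pass only when the repo did not require a signature.
        return not policy.required
    # A signature that is present must verify against an accepted key,
    # even when the policy does not strictly require one.
    return any(verify(key, payload, signature) for key in policy.accepted_keys)
```

With this shape, rotating a key is a one-line change to `accepted-keys` in the repository, which is the visibility argument made above.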


To accommodate integrity checks, Taskcluster-GitHub will require that any remotely referenced `.taskcluster.yml` files have an associated detached GPG signature which can be verified by a public GPG key that it has been configured with.

Integrity checks will be on by default, but can be disabled by setting `allow-unsigned-remote-references` to `True`.
Member

This is pretty releng-specific; let's invert this and have the feature disabled by default.

Contributor Author

My concern with this being off by default is that a misconfiguration will result in integrity checks being lost, with no easy way to notice it (things will just silently continue to work). Maybe I'm overconcerned about this though? I've asked the SecOps folks for their opinion as well.
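A small Python sketch of the failure mode described here, under stated assumptions: the deployment config is modelled as a plain dict, `allow-unsigned-remote-references` is the option name from the RFC text, and `require-signed-remote-references` is a hypothetical name for what an opt-in variant might look like.

```python
from typing import Mapping


def checks_enabled_default_on(config: Mapping[str, object]) -> bool:
    """On by default: checks are skipped only on an explicit opt-out."""
    return config.get("allow-unsigned-remote-references") is not True


def checks_enabled_opt_in(config: Mapping[str, object]) -> bool:
    """Off by default (hypothetical option name): checks need an explicit opt-in."""
    return config.get("require-signed-remote-references") is True


# A misspelled key behaves differently under the two defaults.
typo_opt_out = {"allow-unsigned-remote-refrences": True}
typo_opt_in = {"require-signed-remote-refrences": True}

print(checks_enabled_default_on(typo_opt_out))  # checks stay on (fails closed)
print(checks_enabled_opt_in(typo_opt_in))       # checks silently off (fails open)
```

Under the default-on design a typo leaves verification running, so the mistake surfaces as a rejected file; under the opt-in design the same typo removes verification without any visible error.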

@lotas
Contributor

lotas commented Oct 27, 2023

We were discussing this with Pete.. :)

The biggest question so far was what problem are we trying to solve? Protect what and from whom?

Some extra ideas that popped up: use scopes

We can add a github:allow-includes scope and have the GitHub repo roles include it. So if some repo needs this, you can just add the scope. This way you stay flexible and don't lock this into the deployment.
Going further, you can also add more control with scopes that name the allowed URLs: github:allow-includes:github.com/releng/baseline, etc.
just a thought

@bhearsum
Contributor Author

The RFC doesn't cover how the service obtains the key(s) to validate the signatures.

The current draft has them specified in Taskcluster-GitHub's config.yml. I can see that it is perhaps not specific enough though; maybe that's what you're referring to?

For a multitenant environment, I think it would be better for the repo to stipulate if it requires a signature, and which signing keys it accepts, rather than have a single global key that can be used for signing across the entire deployment, or a single set of keys that apply to all projects. This feels like it should be repo config, empowering the project users who the CI is for.

We're getting close to a point where what I believe Firefox CI needs is close to incompatible with what Taskcluster in general wants. Specifically, I think we want all of the following for Firefox CI:

  • Repositories should not be able to opt out of signature checks if they are using a remotely referenced .taskcluster.yml
  • Some (possibly all) repositories should not be able to specify their own keys (I'm thinking of level 3 repositories here, where we are very strict about things that go into CI.)
  • Some (possibly all) repositories should only be allowed to pull remotely referenced .taskcluster.yml files from locations specified by the Taskcluster-GitHub deployment. (This was a SecOps ask in the RRA.)
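
The three requirements above can be sketched as a deployment-level check. This is an illustration only: `DeploymentPolicy` and `keys_for_remote_reference` are hypothetical names, and the shape of the deployment config is assumed rather than taken from the RFC.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class DeploymentPolicy:
    # Locations the deployment allows remote .taskcluster.yml files to come from.
    allowed_sources: FrozenSet[str]
    # Public keys configured by the deployment operators.
    trusted_keys: FrozenSet[str]


def keys_for_remote_reference(policy: DeploymentPolicy, source: str,
                              repo_keys: Iterable[str] = ()) -> FrozenSet[str]:
    """Return the keys a remote reference must verify against.

    There is no "unsigned" return value, so a repository cannot opt out,
    and `repo_keys` is deliberately ignored so a repository cannot
    substitute its own keys.
    """
    if source not in policy.allowed_sources:
        raise PermissionError(
            f"remote reference {source!r} is not allowed by this deployment")
    return policy.trusted_keys
```

A caller would verify the detached signature against the returned keys and reject the remote file if none of them matches.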

Many (all?) of these are quite at odds with what Taskcluster in general seems to want. I'm struggling to come up with a viable path forward here. I'm tempted to say that SecOps and the Taskcluster team need to work together to come up with it. I feel that I'm largely acting as an intermediary here.

I think this design is much more flexible and more transparent, and it puts control in the hands of the projects that use it. My concern with the platform deployment approach is that it assumes a Taskcluster deployment is controlled by a central team, blocks project teams when those staff are not available, and does not support multi-tenant environments. It is also more opaque: it is harder to troubleshoot why the wrong signing key might be in use, and harder to change the signing key(s) if they need updating (because they are hidden behind platform config and only visible to operational staff).

I think having it in the .taskcluster.yml makes each .taskcluster.yml a little bit bigger, but the config in there is unlikely to change frequently. If a key is rotated, the change is much more visible: the git history provides an audit trail of what changed and who changed it. It also allows you to roll out changes gradually if required, while a script can still update all repos in one go. This also supports the environment changing at Mozilla, if it stops being a single team that controls all the CI pipelines of the whole company and some teams need to move quickly but would like to adopt the same security approach. It is more flexible regarding changes to the organisation.

I understand what you're saying about flexibility, but we're not talking about something here that has no workarounds. If you want to include a .taskcluster.yml from a non-approved source, you would have two options:

  1. Talk to RelEng and either get that source added, or move the .taskcluster.yml to an already approved source.
  2. Live without the remote reference, and copy in the contents.

There is no hard block stopping work here in any case - you can always do whatever you want in the .taskcluster.yml in a repo you control.

@bhearsum
Contributor Author

We were discussing this with Pete.. :)

The biggest question so far was what problem are we trying to solve? Protect what and from whom?

The goal is to ensure that the remote .taskcluster.yml that is processed was authored and published by a known good source. (To guard against man-in-the-middle attacks, compromised GitHub accounts, etc.)

Some extra ideas that popped up: use scopes

We can add a github:allow-includes scope and have the GitHub repo roles include it. So if some repo needs this, you can just add the scope. This way you stay flexible and don't lock this into the deployment. Going further, you can also add more control with scopes that name the allowed URLs: github:allow-includes:github.com/releng/baseline, etc. Just a thought.

I'm not sure I fully understand this suggestion...are you saying that these scopes would control which repositories remotely referenced .taskcluster.yml files could come from? If so, that seems like a reasonable alternative to mapping project repositories to these repos in the Taskcluster-GitHub configuration. (It doesn't solve the integrity checking part of this - but it does address another thing that SecOps wanted.)
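
One way to read the scopes suggestion, sketched in Python. The `github:allow-includes:<url>` scope is only the name floated in this thread, not an existing Taskcluster scope, and `may_include` is a hypothetical helper. The matching rule (a held scope ending in `*` satisfies any scope with that prefix) follows Taskcluster's usual scope semantics; role expansion is not modelled here.

```python
from typing import Iterable


def satisfies(held_scopes: Iterable[str], required: str) -> bool:
    """True if any held scope equals `required` or is a `*`-suffixed prefix of it."""
    for scope in held_scopes:
        if scope == required:
            return True
        if scope.endswith("*") and required.startswith(scope[:-1]):
            return True
    return False


def may_include(repo_scopes: Iterable[str], url: str) -> bool:
    """Check whether a repo's scopes allow including a remote file from `url`."""
    return satisfies(repo_scopes, f"github:allow-includes:{url}")


# A repo role granted one specific include location:
strict = ["github:allow-includes:github.com/releng/baseline"]
# A repo role granted everything under github.com/releng/:
broad = ["github:allow-includes:github.com/releng/*"]

print(may_include(strict, "github.com/releng/baseline"))
print(may_include(strict, "github.com/other/config"))
print(may_include(broad, "github.com/releng/baseline"))
```

As noted above, this only controls where remote files may come from; it does not replace the signature check.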

@ahal
Contributor

ahal commented Nov 13, 2023

I think the disconnect here is stemming from the fact that the Taskcluster team are approaching this with the lens of developers as the target users and a "hacker ethos" (empower them as much as possible).

I think normally that's the right approach, but in this case our aim is to lock things down, the opposite of empowering them. Think of it from the lens of selling Taskcluster to an enterprise user and the request makes a lot more sense. Enterprise users (and fxci) need controls to prevent footguns and security oopsies. I think Taskcluster is best suited for large enterprises, so IMO it makes a ton of sense to build these controls directly into the platform.

That's not to say we need to enforce these controls on anyone. Every instance can be free to use or not use them as they see fit.

With that in mind, @petemoore is there any compelling reason not to specify the keys as a deployment configuration?

@ahal
Contributor

ahal commented Nov 13, 2023

Also, there's no reason they couldn't be configurable in both the deployment and the .taskcluster.yml if you wanted, but I don't think fxci would use the .taskcluster.yml version, so it would likely be a case of YAGNI.

4 participants