Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: requests to remote always include user-agent #21684

Merged
merged 6 commits into from
Dec 3, 2024

Conversation

lukevmorris
Copy link
Contributor

I have a grpc Remote Execution API server that I use as a backend remote cache for both pants and bazel. I want to be able to distinguish between requests that came from bazel and those that came from pants. I originally relied on the user agent header but found that it was omitted if I configured any other headers.

In the reference sections for remote_execution_headers and remote_store_headers, the pants docs state that "Pants may add additional headers." I think User-Agent is one header that would be useful if always present.

This PR removes User-Agent from DEFAULT_EXECUTION_OPTIONS and opts instead to merge it with user-provided headers. That is, the following configuration:

[GLOBAL]
remote_store_headers = {"Authorization" = "Basic abc123"}

would result in the following HTTP Headers:

Authorization: Basic abc123
User-Agent: pants/2.21.0

These changes are intended to allow the user, if needed, to provide a custom user agent by setting it explicitly in remote_store_headers or remote_execution_headers. I'm not deeply familiar with python, so please let me know if there are any changes you'd like me to make. Thank you!

Copy link
Contributor

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time to contribute. In the short-term, I imagine a work-around might be explicitly specifying the user-agent in your pants.toml?


This change makes sense to me, but I think we may need to handle case-insensitivity better: if an existing user has set remote_store_headers = {"User-Agent" = "x"}, this change will result in the headers dictionary containing {"user-agent": "pants/...", "User-Agent": "x"} and I don't know what the behaviour of that will be!

One approach would be to check if the key already exists by traversing the whole dictionary. It's not great, but I can't think of any other option that's not at least that expensive (and... these dicts will almost certainly not be that large):

    @classmethod
     def with_user_agent(cls, headers: dict[str, str]) -> dict[str, str]:
         # Check if user-agent has already been set (with any casing)
         has_user_agent = any(k.lower() == "user-agent" for k in headers.keys())
         if has_user_agent:
             return headers

         return {"user-agent": f"pants/{VERSION}"} | headers

Given this somewhat subtle behaviour, it might be good to add a test at least of this new ExecutionOptions.with_user_agent to global_options_test.py that validates various combinations of the input headers. I'd be imagining something like:

_DEFAULT_USER_AGENT = f"pants/{VERSION}"

@pytest.mark.parametrize(
    ("input", "expected")
    [
        ({}, {"user-agent": _DEFAULT_USER_AGENT}),
        ({"not-user-agent": "foo"}, {"not-user-agent": "foo", "user-agent": _DEFAULT_USER_AGENT}), 
        ({"User-Agent": "title-case"}, {"User-Agent": "title-case"}),
        # anything else that makes sense?
    ]
)
def test_execution_options_with_user_agent_should_match_table(input: dict[str, str], expected: dict[str, str]) -> None:
    assert ExecutionOptions.with_user_agent(input) == expected

(I'm spelling this out because you mentioned you're not familiar with Python so I'm inferring you're not familiar with Pytest and its approach too, sorry if it's info you already know 😄 )


Once we get this working, we'll also need to add a note to docs/notes/2.25.x.md. I suggest adding a new heading after General, e.g. ### Remote caching/execution, with a sentence just describing the change.

@lukevmorris lukevmorris force-pushed the feat/always-set-user-agent branch 5 times, most recently from e915fb4 to f73974c Compare November 26, 2024 19:27
@lukevmorris lukevmorris force-pushed the feat/always-set-user-agent branch from f73974c to 4895a85 Compare November 26, 2024 19:27
@lukevmorris
Copy link
Contributor Author

@huonw great point, and thanks for your feedback. The python walkthrough is very helpful -- the end result looks almost identical to your suggestions, as I couldn't find ways to improve it. I added a couple quick sentences to the release notes for 2.25. Let me know what you think!

@tdyas
Copy link
Contributor

tdyas commented Nov 29, 2024

Should this code also deal with multiple key/value pairs of varying case for a single header?

I ask because on the Rust side in headers_to_http_header_map the conversion should non-deterministically merge them or the "last" one in iteration order will overwrite. However, on the Python side, it would probably be good to validate headers and ensure that only key/value pair exists per header, so that the user gets a good error message.

Doing that would also allow normalizing headers before then checking for user-agent.

Thoughts?

Copy link
Contributor

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change @lukevmorris, looks great

@huonw
Copy link
Contributor

huonw commented Dec 2, 2024

@tdyas that sounds pretty sensible, but I reckon it's a bigger/separate change than this one. I've filed #21704 covering that specifically.

@huonw huonw enabled auto-merge (squash) December 2, 2024 01:57
@huonw huonw disabled auto-merge December 2, 2024 03:13
@tdyas
Copy link
Contributor

tdyas commented Dec 2, 2024

@tdyas that sounds pretty sensible, but I reckon it's a bigger/separate change than this one. I've filed #21704 covering that specifically.

Makes sense then to land this PR to fix the immediate problem then.

@huonw huonw enabled auto-merge (squash) December 2, 2024 21:13
auto-merge was automatically disabled December 2, 2024 23:13

Head branch was pushed to by a user without write access

@lukevmorris lukevmorris force-pushed the feat/always-set-user-agent branch from 79edc7a to 6f98b66 Compare December 2, 2024 23:13
@lukevmorris
Copy link
Contributor Author

Sorry @huonw, I think I properly fixed these lint errors. Turns out there's a helpful pants fmt goal, I just needed to learn how to invoke it correctly. :-)

@huonw huonw merged commit bcd892c into pantsbuild:main Dec 3, 2024
24 checks passed
@huonw
Copy link
Contributor

huonw commented Dec 3, 2024

Thanks for taking the time to contribute!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants