EPIC: Implement Solution For Running Jenkins Agent On Solaris with Java 17 #3742

Open
1 of 3 tasks
steelhead31 opened this issue Sep 24, 2024 · 15 comments

@steelhead31
Contributor

steelhead31 commented Sep 24, 2024

See: #3741 

A solution will need to be investigated and implemented for running Jenkins agents on Solaris with Java 17.

Since JDK 17 is not supported on Solaris, this will need additional work.

Specifically, these four agents are affected:

  • build-siteox-solaris10u11-sparcv9-1
  • test-siteox-solaris10u11-sparcv9-1
  • build-azure-solaris10-x64-1
  • test-azure-solaris10-x64-1

Whatever solution is found will likely also need to be applied to the TCK Solaris nodes.

Investigation and initial prototyping have been done in the comments on this issue; the continuation work is in:

@sxa
Member

sxa commented Sep 25, 2024

Options:

  1. Attempt to rebuild the new LTS agent ourselves such that it will run on JDK11.
  2. Attempt to build a JDK17 on Solaris 10 (There are patches to make it work with gcc but I struggled last time I tried to use them)
  3. Attempt to build a JDK17 on Solaris 11 and switch to that (also with GCC as most Solaris support was removed in JDK15), while building against a Solaris 10 sysroot (subject to that being feasible in JDK8).
  4. Same as the point above, but run a Solaris 10 zone under Solaris 11, similar to what we do with docker on Linux. (This would also potentially allow us to satisfy "Investigate feasibility of ephemeral build systems for non-Linux platforms", temurin-build#3264 (comment).) This would require a bit of 'roll your own', since I doubt there's a docker plugin that can make this easy...
  5. Running a non-Solaris JDK17 using qemu on Solaris
  6. Running some proxy machine to run the jenkins agent and pass commands to the remote machine (likely hard because of the amount of shell invocations in our jenkins pipelines)
  7. Run a second jenkins server running an older version which can communicate with the Solaris node.

Anything else?

Notes:

  • All of the "build JDK17" choices would likely force us into building each JDK up from 11 to get there.
  • We could probably survive with a 'zero' implementation instead of one with a full JIT if required. I suspect that may be all we could get on SPARC ...

@sxa
Member

sxa commented Sep 25, 2024

IMHO Options 1 or 3 would be the cleanest ones if we can make them work.

@steelhead31
Contributor Author

  1. A quick test of the tribblix Solaris 11 / JDK17 proved it was incompatible with Solaris 10
  2. Reached out and confirmed that there is 100% no Solaris 10 / JDK17 package.

@sxa
Member

sxa commented Nov 15, 2024

Diagram from a discussion this morning:
[Image: IMG_20241115_140719.jpg]

The diagram in the previous comment is intended to summarise the different ways in which builds can be run (also applicable to test, but I'll start with build).
Lines from top to bottom:

  1. The normal situation where the jenkins agent runs on a system, and it runs the actual work on the same machine.
  2. The 'docker build' situation where the agent runs in an environment (host that can run docker) but the actual work is done in a container managed by the jenkins plugins.
  3. Agent runs on a machine, then connects into another environment on the machine, on which it runs the work over a connection (e.g. ssh). This is what we do with VagrantPlaybook check, where the VM is created dynamically.
  4. Use an isolated old jenkins server to connect to the Solaris agents, so they can run with the older version. This version would not be updated, so needs to be firewalled off from anything that might connect to it (exception for the main jenkins server so it can initiate and copy artifacts back)
  5. The jobs are run on a schedule to perform a build and then initiate tests (potentially over ssh to a remote machine to ensure build/test machine isolation) The output can be put in a staging area that can be retrieved by a jenkins job which has the artifacts in a way that 'looks' like the existing pipelines.
  6. Run a Solaris 11 host (where it is technically possible to have a JDK17 - third parties have patched it to work) and then use that to run the build (using a shared file system with the host to allow the jenkins agent's file workspace to be shared with the Solaris 10 environment)

Given the above, option 6 is probably the cleanest and gives isolation of the Solaris 10 environment from the internet. However, this would require updating our machines and adjusting the pipelines to be 'zone aware' (unless there is a jenkins plugin for this as there is for docker, but that seems optimistic!). The simplest solution, which we discussed today, is option 3, where we replace the existing build and test pipelines - for Solaris only - with a timed pipeline that will ssh to the Solaris machine, run "git clone temurin-build; run make-adopt-build.farm.sh", copy the build artifacts back, and archive them in the jenkins job. We will look at prototyping this.
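
For illustration, the ssh-driven flow described above could be assembled roughly as follows. This is a minimal sketch under stated assumptions, not the prototype script itself: the host name, the remote artifact directory, the clean-checkout step and the script path (`build-farm/make-adopt-build-farm.sh`, assumed from the temurin-build repository layout) are all placeholders, and the commands are only built here, not executed.

```python
import shlex

# Hypothetical placeholders; the real host and paths are not given in this thread.
SOLARIS_HOST = "vagrant@solaris-build-host"
REPO_URL = "https://github.com/adoptium/temurin-build"
BUILD_SCRIPT = "./build-farm/make-adopt-build-farm.sh"  # assumed script path
REMOTE_ARTIFACTS = "temurin-build/workspace/target"     # assumed output directory


def remote_build_command(repo_url=REPO_URL, script=BUILD_SCRIPT):
    """Return the shell command line to run on the Solaris machine."""
    steps = [
        "rm -rf temurin-build",  # assumption: always start from a clean checkout
        f"git clone {shlex.quote(repo_url)} temurin-build",
        "cd temurin-build",
        script,
    ]
    return " && ".join(steps)


def build_plan(host=SOLARIS_HOST, local_dir="artifacts"):
    """Return the commands the proxy agent would run, as argument lists."""
    return [
        # 1. Run the clone and build on the Solaris machine over ssh.
        ["ssh", host, remote_build_command()],
        # 2. Copy the build output back so the jenkins job can archive it.
        ["scp", "-r", f"{host}:{REMOTE_ARTIFACTS}", local_dir],
    ]


if __name__ == "__main__":
    for command in build_plan():
        print(" ".join(shlex.quote(part) for part in command))
```

Each argument list could be passed to `subprocess.run(command, check=True)` on the proxy agent; keeping the plan as data makes it easy to log or dry-run before touching the remote machine.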

@sxa
Member

sxa commented Nov 25, 2024

Machine defined as dockerhost-azure-solaris-proxy (Currently another jenkins agent on the host machine) and will be prototyped with the jdk8u-solaris-x64-temurin-simple job

@sxa
Member

sxa commented Dec 9, 2024

https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-solaris-x64-temurin-simple/
First job which ran green and collected some artifacts. Need to make sure it includes anything else required (testimages etc.) and ideally ensure the artifact layout is the same as a normal build. At the moment it doesn't have the timestamps etc. in the filenames (although that's probably supplied from the parent job in the normal pipelines).

@sxa
Member

sxa commented Dec 9, 2024

Todo:

  • Ensure metadata files are archived
  • GPG signing
  • Test case execution
  • Ensure test .tap results can be parsed

@sxa
Member

sxa commented Dec 9, 2024

I had a discussion with @smlambert today where I shared the scripts I'm using for this prototype activity. It sounds like what I've got will be feasible. There is a script that runs on the proxy machine which copies the test execution script (dotests.sh) across to the machine and runs it with two parameters to indicate the test suite it should run (e.g. ./dotests.sh sanity openjdk). It then copies the output* directory back to the proxy workspace, where it can be processed as normal.
There are a few options which we can decide on once an initial jenkins job (does not yet exist at the time of writing) has been verified to work:

  1. Have one jenkins job that just sequentially runs all of the test suites (e.g. invokes dotests.sh multiple times)
  2. Have a parameterised jenkins job that is invoked once for each test suite (sanity openjdk, extended openjdk etc.) so you have multiple separate runs of the same jenkins job
  3. Have individual test jobs for each, which is the most consistent with the existing way the jobs are structured and therefore is likely easier to process the results.

@smlambert
Contributor

Any of those 3 approaches will work @sxa (with 2. or 3. being preferred so that the TAP files map to the 9 top-level targets that are required for AQAvit verification).

@sxa
Member

sxa commented Dec 9, 2024

I'm starting with option 2 as a proof-of-concept then we can adapt later if desired:

Note that the TEST_JDK_PARAMETER in that job does nothing - the place to download from is currently hardcoded in the dotests.sh script that it calls (which is stored in ~solaris on the machine). Currently it's the latest artifact of jdk8u-solaris-x64-temurin-simpletest.

(For reference to any infra people who need to read this - the two jenkins agents being used for the proxies are running as the same user running the vagrant box on the azure ubuntu dockerhost machine. They are not configured to restart themselves if the machine is rebooted)

@sxa
Member

sxa commented Dec 17, 2024

I've now set up jdk8u-solaris-x64-temurin-simple to trigger jdk8u-solaris-x64-temurin-simpletest afterwards, and have adjusted the test job so that (for now) it ignores the parameters specifying which test to run and runs a fixed list of tests (currently just sanity.openjdk and special.functional, for quickly testing the process). The full AQA set can be added later.

First run with the modified test job is at https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-solaris-x64-temurin-simpletest/10/console
First run of the build job triggering the test job afterwards is at https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-solaris-x64-temurin-simple/33/

FYI @andrew-m-leonard for awareness. Once this is working adequately (and giving us a TAP test results summary) we'll ideally want to add this to the jdk8 trigger job (along with an equivalent for the SPARC builds)

@sxa
Member

sxa commented Dec 18, 2024

The above jobs worked ok and https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-solaris-x64-temurin-simpletest/11/artifact/ (triggered from build job 33) successfully created two separate directories with the TAP file from the two suites. I've now modified the test job and am running another build which has the full set of AQA tests at:

which should trigger this test job:

This new run should include TAP parsing so will have the TAP Test Results on the left hand side of the job page when it is complete.
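
For readers unfamiliar with the format, the sketch below shows the kind of summary that can be derived from a TAP file. It is not how the jenkins TAP parsing works internally; it is a simplified illustration that assumes flat (non-nested) TAP output, and the test names in the usage example are made up.

```python
import re

# A TAP result line starts with "ok" or "not ok"; anything else
# (the "1..N" plan, "#" diagnostics, YAML blocks) is ignored here.
_RESULT = re.compile(r"^(not ok|ok)\b(.*)$")
_SKIP = re.compile(r"#\s*skip", re.IGNORECASE)


def summarise_tap(text):
    """Count passed, failed and skipped results in flat TAP text."""
    summary = {"passed": 0, "failed": 0, "skipped": 0}
    for line in text.splitlines():
        match = _RESULT.match(line)
        if not match:
            continue
        status, rest = match.groups()
        if _SKIP.search(rest):
            summary["skipped"] += 1  # "ok N # SKIP reason" is a skipped test
        elif status == "ok":
            summary["passed"] += 1
        else:
            summary["failed"] += 1
    return summary


if __name__ == "__main__":
    sample = "\n".join([
        "1..3",
        "ok 1 - example_test_a",
        "not ok 2 - example_test_b",
        "ok 3 - example_test_c # SKIP not applicable",
    ])
    print(summarise_tap(sample))
```

A check like this can be handy on the proxy machine to confirm that each copied-back output directory actually contains usable results before the job archives them.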

@sxa
Member

sxa commented Dec 19, 2024

Differences between EA and release jobs:

  • special.openjdk and dev.functional are not included in the GA build runs
  • BUILD_REF, CI_REF and HELPER_REF are specified
  • RELEASE is true and WEEKLY is false
  • PUBLISH_NAME does not have the -ea suffix (so is e.g. jdk8u432-b06)

The logic to set the filename is in the jdk8u432-b06 so that will likely need to be pulled out and replicated somehow unless we do it at publish time.

@sxa
Member

sxa commented Dec 20, 2024

Purely for consistency at the moment I have created a vagrant user on the SPARC machines. I've set the number of executors on the proxy agent to 2 so it can be used for the SPARC and x64 systems. The vagrant user on the SPARC machines has the same authorized_keys entry as the real vagrant machines on x64.
Equivalent jobs for SPARC have been created to mirror the x64 ones:

@sxa
Member

sxa commented Dec 20, 2024

Note that we will still need a solution for enabling GPG signing of these builds. It may require a small piece of pipeline code to enable that (which would possibly be better for queuing up the test jobs too)
Sub-issues have now been created for handling the follow-on tasks.

sxa changed the title from "EPIC: Investigate Solution For Running Jenkins Agent On Solaris with Java 17" to "EPIC: Implement Solution For Running Jenkins Agent On Solaris with Java 17" on Dec 20, 2024