Solaris/x64 test failures on new Vagrant environment #5127

Closed
sxa opened this issue Mar 6, 2024 · 13 comments

@sxa
Member

sxa commented Mar 6, 2024

Ref: adoptium/infrastructure#3347 (comment)

We are seeing some failures on the new Solaris/x64 machines which have been provisioned on x64 hosts using Vagrant/Virtualbox virtualisation:

https://ci.adoptium.net/job/Grinder/9047/testReport/ is the latest run, which was against the latest release, although that will be superseded by a run against the nightly build at https://ci.adoptium.net/job/Grinder/9051/testReport (not yet complete; I will update this with the specific test case failures once it's done).
The comparable run on one of the previous ESXi machines is https://ci.adoptium.net/job/Grinder/9052 but that is expected to pass.

@sxa
Member Author

sxa commented Mar 6, 2024

The latest run against the latest nightly build (so in sync with the test material) had five failures on the new system, all of which were green on the old ones:

| Suite | Test | Failure |
| --- | --- | --- |
| hotspot_jre_0 | GetObjectSizeOverflow | Insufficient memory for mmap |
| jdk_io_0 | LargeCopyWithMark | Insufficient memory |
| jdk_net_0 | ADatagramSocket | Address already in use (Bind failed) |
| jdk_security3_0 | SSLEngineExplorerMatchedSNI | Input record too big: max = 16709 len = 31792 |
| jdk_management_0 | SSLConfigFilePermissionTest | Bind failure on port 4999 |

Hotspot targets re-run at Grinder#9054

Re-run with RAM increased to 6GB: https://ci.adoptium.net/job/Grinder/9056/ which has reduced the failures to the last two in the table above.

@Haroon-Khel
Contributor

Haroon-Khel commented Apr 3, 2024

Reran the failing tests on the new azure build solaris machine build-azure-solaris10-x64-1 using the rerun link from the comment above:

https://ci.adoptium.net/job/Grinder/9349/testReport/

Failing tests:

- java/io/BufferedInputStream/LargeCopyWithMark.java.LargeCopyWithMark
- java/net/DatagramSocket/SetDatagramSocketImplFactory/ADatagramSocket.java.ADatagramSocket
- javax/net/ssl/ServerName/SSLEngineExplorerMatchedSNI.java.SSLEngineExplorerMatchedSNI
- sun/management/jmxremote/bootstrap/SSLConfigFilePermissionTest.sh.SSLConfigFilePermissionTest

@sxa
Member Author

sxa commented Apr 3, 2024

> Reran the failing tests on the new azure build solaris machine build-azure-solaris10-x64-1 using the rerun link from the comment above:

All the same except for the mmap one, which may be intermittent then.

@steelhead31
Contributor

Taking a look.

@steelhead31
Contributor

Down to 2 failing tests after increasing memory to 8GB and the CPU cap to 75%.

Test Result (2 failures / -2)

@steelhead31
Contributor

I can get this test to pass by killing the DHCP agent, which is blocking port 4999:

sun/management/jmxremote/bootstrap/SSLConfigFilePermissionTest.sh.SSLConfigFilePermissionTest

I'll attempt to find a solution to get dhcpagent to use a different port.
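
A quick way to confirm that a port is already taken before re-running the test is to try binding it. The sketch below is illustrative only: `port_is_free` is a hypothetical helper, not part of the test suite. It assumes IPv4 and reports only whether a bind succeeds, not which process holds the port.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0", udp: bool = False) -> bool:
    """Return True if host:port can be bound right now, False otherwise.

    Hypothetical helper for illustration; it does not identify the owner
    of the port, only whether a fresh socket can bind to it.
    """
    kind = socket.SOCK_DGRAM if udp else socket.SOCK_STREAM
    with socket.socket(socket.AF_INET, kind) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically EADDRINUSE, but any bind failure means "not usable".
            return False
    return True
```

Calling `port_is_free(4999)` and `port_is_free(4999, udp=True)` before and after stopping the DHCP agent should show whether the agent is what holds the port.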

@steelhead31
Contributor

@sxa
Member Author

sxa commented Apr 10, 2024

Interesting - I'd suggest re-running that Grinder with 1000 iterations and seeing how many times it passes, to get a feel for what the suitable course of action would be.
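
For a local equivalent of that kind of repeat run, a minimal sketch might look like the following. It assumes the test can be launched as a single command whose exit code is 0 on a pass; `count_passes` and the command passed to it are hypothetical and are not how Grinder itself iterates.

```python
import subprocess
import sys


def count_passes(cmd: list[str], iterations: int) -> int:
    """Run cmd `iterations` times and return how many runs exited with 0."""
    passes = 0
    for _ in range(iterations):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode == 0:
            passes += 1
    return passes


# Stand-in command that always succeeds; replace with the real test launcher.
always_ok = [sys.executable, "-c", "pass"]
print(count_passes(always_ok, 3), "of 3 runs passed")
```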

@steelhead31
Contributor

I've found a root cause: it's related to Xvfb not being able to start and stop cleanly and write to /tmp and /tmp/.X11-pipe, mostly what look like permission-related problems.
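
One way to check the permission theory is to look at the mode and ownership of those directories as the user that runs the tests. This is a rough sketch only: `describe_dir` is a hypothetical helper, and it reports what the current user can do, which is only meaningful when run as the same user that starts Xvfb.

```python
import os
import stat


def describe_dir(path: str) -> str:
    """Summarise mode, owner, and write access for a directory."""
    try:
        st = os.stat(path)
    except OSError as exc:
        return f"{path}: cannot stat ({exc.strerror})"
    mode = stat.filemode(st.st_mode)            # e.g. 'drwxrwxrwt'
    writable = os.access(path, os.W_OK | os.X_OK)
    sticky = bool(st.st_mode & stat.S_ISVTX)    # sticky bit, usual on /tmp
    return f"{path}: {mode} uid={st.st_uid} writable={writable} sticky={sticky}"


for directory in ("/tmp", "/tmp/.X11-pipe"):
    print(describe_dir(directory))
```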

@sxa
Member Author

sxa commented Apr 10, 2024

Is there an error in the log related to xvfb? I'm somewhat surprised that if Xvfb is unable to start we're only seeing an issue with one SSL test (I'm also a touch surprised that such a test needs Xvfb but 🤷🏻 )

@jiekang jiekang moved this from Todo to In Progress in 2024 2Q Adoptium Plan Apr 10, 2024
@steelhead31
Contributor

The final issue appears to pass intermittently on both machines.

The build machine:

One pass out of 5 runs here: https://ci.adoptium.net/job/Grinder/9448/console

and the same result on the test machine:

https://ci.adoptium.net/job/Grinder/9450/console

@sxa
Member Author

sxa commented Apr 10, 2024

Good to know - I've just kicked off 500 on each of the three machines to see how intermittent it is and whether there is a difference in the ESXi vs Vagrant/VirtualBox hosted ones.

@steelhead31
Contributor

steelhead31 commented Apr 15, 2024

This can be closed, as the failure has been determined to be intermittent, with around 48/500 test attempts passing.
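
To put 48/500 in context, the pass rate is 9.6%. The sketch below works out an approximate 95% Wilson score interval for that rate and the chance of seeing at least one pass in a small batch of runs. It assumes the runs are independent with a fixed pass probability, which may not hold on a shared CI machine; the function names are illustrative.

```python
import math


def wilson_interval(passes: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate Wilson score interval for a pass rate (z=1.96 ~ 95%)."""
    p = passes / runs
    denom = 1 + z * z / runs
    centre = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return centre - half, centre + half


def chance_of_at_least_one_pass(pass_rate: float, attempts: int) -> float:
    """Probability of one or more passes in `attempts` independent runs."""
    return 1 - (1 - pass_rate) ** attempts


rate = 48 / 500
low, high = wilson_interval(48, 500)
print(f"pass rate {rate:.1%}, approx 95% interval {low:.1%} to {high:.1%}")
print(f"chance of >=1 pass in 5 runs: {chance_of_at_least_one_pass(rate, 5):.1%}")
```

Under those assumptions, a single pass in 5 runs, as seen in the earlier Grinder jobs, is consistent with a pass rate of this size.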
