An APS Jenkins build of the windows-x64-static target on the Base-7.0 branch (with pvAccess at 754a1d2) froze this afternoon while running the tests. The Windows Task Manager reported a deadlock in pvget.exe, which is run by netget.t to fetch a string PV value from softIocPVA.
At the time of the freeze, the netget.tap file contains this:
1..3
ok 1 - Got BaseVersion '7.0.2-DEV' from iocsh
# CA server configuration:
# Channel Access Server V4.13
# No clients connected.
# CAS-TCP server on 127.0.0.1:55064 with
# CAS-UDP name server on 127.0.0.1:55064
# Sending CAS-beacons to 1 address:
# 127.0.0.1:55065
ok 2 - Got same BaseVersion from caget
# PVA server configuration:
# VERSION : pvAccess Server v6.1.0
# PROVIDER_NAMES : QSRV,
# BEACON_ADDR_LIST : localhost
# AUTO_BEACON_ADDR_LIST : 0
# BEACON_PERIOD : 15
# BROADCAST_PORT : 55076
# SERVER_PORT : 55075
# RCV_BUFFER_SIZE : 16384
# IGNORE_ADDR_LIST:
# INTF_ADDR_LIST : 127.0.0.1
At this point, the netget.t script is running pvget to fetch a string PV using the pva provider.
The Windows Task Manager shows perl.exe, pvget.exe and softIocPVA.exe all running. Its "Analyze wait chain" window for pvget.exe said "pvget.exe is in deadlock" and showed several threads, with the first 3 and the last 3 lines highlighted in red:
The equivalent window for softIocPVA showed 9 threads, but none were in red or indented. I have not examined that process any further.
The Debug action on pvget did nothing, but I was able to create a dump file for the process. After I killed the pvget process, the Base build completed normally.
The Visual Studio debugger couldn't show any symbols because this wasn't a debug build, so there was no symbol database (I have turned off host optimization for future Jenkins builds, which is supposed to enable debug symbols). The dump file showed 6 threads, one more than the wait chain above (thread 2340 was not shown in the Windows Task Manager).
I can see the call stack for each of these threads; they have different lengths. The deadlock reported above is between the Main Thread 8056 and thread 3440. These are the call stacks for thread 3440 and then for thread 8056:
I still have the dump file and can try to extract more information if it would help, although without symbols it may not be worth it. I will watch out for this happening again and hopefully with the debug symbols the result will be more useful.
I assume main() runs in the [8056] Main Thread, which has the longer of the two stack traces above.
The code in netget.t looks like this:
my $pvget = "$bin/pvget$exe";
SKIP: {
    skip "softIocPVA not available", 1
        if $softIoc eq "$bin/softIoc$exe";
    note("PVA server configuration:\n",
        map("$_\n", $ioc->cmd('pvasr')));
    skip "pvget not available", 1
        unless -x $pvget;
    my $pvaVersion = `$pvget -w5 $pv`;    # <===== Hang occurred here
    like($pvaVersion, qr/$pv\s .* \Q$version\E/x,
        'Got same BaseVersion from pvget');
}
The .tap output I posted shows the "PVA server configuration:" note followed by the expected output of running pvasr on the IOC. After that the test runs "pvget.exe -w5 $pv" in back-ticks and captures its stdout so the value can be extracted and checked.
Too bad we don't set thread names on Windows.
Apparently we do (modules/libcom/src/osi/os/WIN32/setThreadName.cpp), but only in debug builds, and the way we do it only works if the debugger is attached at the time the thread gets created, which is not terribly helpful here. There is a newer way, and apparently both approaches can be used together, but the routine it uses isn't available in my MinGW headers yet, and it takes a wide-char string, which is a Pandora's Box that I don't feel like opening.