Apparent deadlock in pvget (seen on windows-x64-static) #138

Open
anjohnson opened this issue Mar 7, 2019 · 3 comments

@anjohnson
Member

An APS Jenkins build of the windows-x64-static target on the Base-7.0 branch (with pvAccess at 754a1d2) froze this afternoon while running the tests. The Windows Task Manager reported a deadlock in pvget.exe, which is run by netget.t to fetch a string PV value from softIocPVA.

At the time of the freeze, the netget.tap file contained this:

1..3
ok 1 - Got BaseVersion '7.0.2-DEV' from iocsh
# CA server configuration:
#   Channel Access Server V4.13
#   No clients connected.
#   CAS-TCP server on 127.0.0.1:55064 with
#       CAS-UDP name server on 127.0.0.1:55064
#   Sending CAS-beacons to 1 address:
#       127.0.0.1:55065
ok 2 - Got same BaseVersion from caget
# PVA server configuration:
#   VERSION : pvAccess Server v6.1.0
#   PROVIDER_NAMES : QSRV,
#   BEACON_ADDR_LIST : localhost
#   AUTO_BEACON_ADDR_LIST : 0
#   BEACON_PERIOD : 15
#   BROADCAST_PORT : 55076
#   SERVER_PORT : 55075
#   RCV_BUFFER_SIZE : 16384
#   IGNORE_ADDR_LIST:
#   INTF_ADDR_LIST : 127.0.0.1

At this point, the netget.pl script is running pvget to fetch a string PV using the pva provider.

The Windows Task Manager shows perl.exe, pvget.exe and softIocPVA.exe all running. The pvget.exe "Analyze wait chain" window said pvget.exe is in deadlock and showed several threads, the first 3 and the last 3 lines being highlighted in red:

pvget.exe (PID: 568) Thread: 8056
   pvget.exe (PID: 568) Thread: 3440
      pvget.exe (PID: 568) Thread: 8056
pvget.exe (PID: 568) Thread: 6372
pvget.exe (PID: 568) Thread: 1492
pvget.exe (PID: 568) Thread: 5996
pvget.exe (PID: 568) Thread: 3440
   pvget.exe (PID: 568) Thread: 8056
      pvget.exe (PID: 568) Thread: 3440
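
The listing above can be read as a small waits-for graph. The following Python sketch assumes that each indented line means "the thread one level up is waiting for this thread", which is an interpretation of the Task Manager display rather than something stated in the report. The helper names `waits_for_edges` and `find_cycle` are hypothetical, chosen only for this illustration.

```python
import re

# Wait chain as transcribed in the report (PID 568 throughout).
WAIT_CHAIN = """\
pvget.exe (PID: 568) Thread: 8056
   pvget.exe (PID: 568) Thread: 3440
      pvget.exe (PID: 568) Thread: 8056
pvget.exe (PID: 568) Thread: 6372
pvget.exe (PID: 568) Thread: 1492
pvget.exe (PID: 568) Thread: 5996
pvget.exe (PID: 568) Thread: 3440
   pvget.exe (PID: 568) Thread: 8056
      pvget.exe (PID: 568) Thread: 3440
"""


def waits_for_edges(listing):
    """Return (waiter, waited_on) pairs, assuming indentation means 'waits for'."""
    edges = set()
    chain = []  # (indent, thread_id) entries of the chain being read
    for line in listing.splitlines():
        match = re.search(r"Thread:\s*(\d+)", line)
        if not match:
            continue
        indent = len(line) - len(line.lstrip())
        tid = int(match.group(1))
        # Drop entries that are not ancestors of this line.
        while chain and chain[-1][0] >= indent:
            chain.pop()
        if chain:
            edges.add((chain[-1][1], tid))
        chain.append((indent, tid))
    return edges


def find_cycle(edges):
    """Return one list of thread IDs forming a cycle, or None if there is none."""
    graph = {}
    for waiter, target in edges:
        graph.setdefault(waiter, []).append(target)

    def visit(node, path):
        if node in path:
            return path[path.index(node):]
        for nxt in graph.get(node, []):
            cycle = visit(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in sorted(graph):
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


print(find_cycle(waits_for_edges(WAIT_CHAIN)))
```

Under that reading, the only threads that appear in a cycle are 8056 and 3440; threads 6372, 1492 and 5996 have no indented entries and so contribute no edges.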

The equivalent window for softIocPVA shows 9 threads but none were in red or indented. I have not looked at that process at all.

The Debug action on pvget did nothing, but I was able to create a dump file for the process. On killing the pvget process the build of Base completed normally.

The Visual Studio debugger couldn't show me any symbols because this wasn't a debug build so there was no symbol database (I have turned off host optimization for future Jenkins builds, which is supposed to turn on debug symbols). The dump file showed 6 threads running, one more than above (thread 2340 was not shown in the Windows Task Manager).

[8056] Main Thread
[2340] pvget.exe thread
[6372] pvget.exe thread
[1492] pvget.exe thread
[5996] pvget.exe thread
[3440] pvget.exe thread

I can see the call stack for each of these threads, which have different lengths. The deadlock above is between the Main Thread 8056 and thread 3440. This is the call stack for 3440:

 	[External Code]	
>	pvget.exe!00007ff73ccdcd83()
 	pvget.exe!00007ff73cccdfd0()
 	pvget.exe!00007ff73cbf2e4d()
 	pvget.exe!00007ff73cbed4a6()
 	pvget.exe!00007ff73cbee8b8()
 	pvget.exe!00007ff73cbec290()
 	pvget.exe!00007ff73cbedb60()
 	pvget.exe!00007ff73cc13448()
 	pvget.exe!00007ff73cc0ede6()
 	pvget.exe!00007ff73ccced02()
 	pvget.exe!00007ff73ccd06fc()
 	pvget.exe!00007ff73ccfc5d8()
 	[External Code]	

and this is the stack for 8056:

 	[External Code]	
>	pvget.exe!00007ff73ccdcd83()
 	pvget.exe!00007ff73cccdfd0()
 	pvget.exe!00007ff73cc0cb48()
 	pvget.exe!00007ff73cc0e55a()
 	pvget.exe!00007ff73cc111a2()
 	pvget.exe!00007ff73cbf3812()
 	pvget.exe!00007ff73cbf3000()
 	pvget.exe!00007ff73cbf352c()
 	pvget.exe!00007ff73cbf3221()
 	pvget.exe!00007ff73cbed4a6()
 	pvget.exe!00007ff73cbee8b8()
 	pvget.exe!00007ff73cbd2564()
 	pvget.exe!00007ff73cbd042d()
 	pvget.exe!00007ff73cba48be()
 	pvget.exe!00007ff73cce2e50()
 	[External Code]	
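
Without symbols, one thing that can still be done with the two stacks is to compare their addresses. The Python sketch below lists the addresses that occur in both stacks, in the order they appear in the stack for 3440. The addresses are copied from the two listings above; the function name `shared_frames` is hypothetical. An address that appears in both stacks points at the same location in pvget.exe, but what code lives there cannot be determined from this dump alone.

```python
# Frame addresses copied from the two call stacks, innermost frame first.
STACK_3440 = [
    "00007ff73ccdcd83", "00007ff73cccdfd0", "00007ff73cbf2e4d",
    "00007ff73cbed4a6", "00007ff73cbee8b8", "00007ff73cbec290",
    "00007ff73cbedb60", "00007ff73cc13448", "00007ff73cc0ede6",
    "00007ff73ccced02", "00007ff73ccd06fc", "00007ff73ccfc5d8",
]
STACK_8056 = [
    "00007ff73ccdcd83", "00007ff73cccdfd0", "00007ff73cc0cb48",
    "00007ff73cc0e55a", "00007ff73cc111a2", "00007ff73cbf3812",
    "00007ff73cbf3000", "00007ff73cbf352c", "00007ff73cbf3221",
    "00007ff73cbed4a6", "00007ff73cbee8b8", "00007ff73cbd2564",
    "00007ff73cbd042d", "00007ff73cba48be", "00007ff73cce2e50",
]


def shared_frames(first, second):
    """Return the addresses of `first` that also occur in `second`, in order."""
    in_second = set(second)
    return [addr for addr in first if addr in in_second]


for addr in shared_frames(STACK_3440, STACK_8056):
    print(addr)
```

For these two stacks the shared addresses are the two innermost frames (00007ff73ccdcd83 and 00007ff73cccdfd0) and the pair 00007ff73cbed4a6 / 00007ff73cbee8b8. The identical innermost frames suggest, but do not prove, that both threads are blocked in the same routine.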

I still have the dump file and can try to extract more information if it would help, although without symbols it may not be worth it. I will watch out for this happening again and hopefully with the debug symbols the result will be more useful.

@mdavidsaver
Member

Not much to go on, but I'll think about it.

Also, too bad we don't set thread names on Windows.

@mdavidsaver
Member

Any idea which is the main() thread?

> At this point, the netget.pl script is running pvget to fetch a string PV using the pva provider

So this is a Get operation, not a Monitor? No -m?

@anjohnson
Member Author

I assume main() is called from the [8056] Main Thread, so the longer of the two stack traces above.

The code in netget.t looks like this:

my $pvget = "$bin/pvget$exe";
SKIP: {
    skip "softIocPVA not available", 1
        if $softIoc eq "$bin/softIoc$exe";
    note("PVA server configuration:\n",
        map("  $_\n", $ioc->cmd('pvasr')));

    skip "pvget not available", 1
        unless -x $pvget;
    my $pvaVersion = `$pvget -w5 $pv`;                  # <===== Hang occurred here
    like($pvaVersion, qr/$pv \s .* \Q$version\E/x,
        'Got same BaseVersion from pvget');
}
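
Because the back-tick call blocks until the child exits, a deadlocked pvget stalls the whole build. The general remedy is to run the child under an external deadline. The sketch below shows that idea in Python rather than Perl, so it is not a patch for netget.t; the helper name `run_with_deadline` is hypothetical and the child commands are stand-ins, not pvget.

```python
import subprocess
import sys


def run_with_deadline(argv, seconds):
    """Run argv and return its stdout, or None if it exceeds the deadline.

    subprocess.run() kills the child and reaps it when the timeout expires,
    so a hung child cannot stall the caller indefinitely.
    """
    try:
        done = subprocess.run(
            argv, stdout=subprocess.PIPE, timeout=seconds, text=True
        )
    except subprocess.TimeoutExpired:
        return None
    return done.stdout


if __name__ == "__main__":
    # Stand-in children: one that answers promptly, one that never returns in time.
    quick = [sys.executable, "-c", "print('ok')"]
    stuck = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(repr(run_with_deadline(quick, 30)))
    print(repr(run_with_deadline(stuck, 1)))
```

A test using this pattern would report a failure for the pvget step when the result is None, instead of hanging until someone kills the process by hand. The deadline would need to be longer than the wait that `-w5` requests, assuming that option sets pvget's own timeout in seconds.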

The .tap output I posted shows the PVA server configuration: line and the expected output from running pvasr on the IOC, after which the test calls pvget.exe -w5 $pv in back-ticks and captures the stdout string for value extraction and checking.

> too bad we don't set thread names on Windows.

Apparently we do (modules/libcom/src/osi/os/WIN32/setThreadName.cpp), but only in debug builds, and the way we do it only works if the debugger is attached at the time the thread gets created, which is not terribly helpful here. There is a newer way, and apparently both approaches can be used together, but the routine it uses isn't available in my MinGW headers yet, and it takes a wide-char string, which is a Pandora's box that I don't feel like opening.
