Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure threads are labelled in cardano-node #6086

Open
jasagredo opened this issue Jan 24, 2025 · 4 comments
Open

Ensure threads are labelled in cardano-node #6086

jasagredo opened this issue Jan 24, 2025 · 4 comments
Assignees

Comments

@jasagredo
Copy link
Contributor

Most of the threads in a running node are unlabelled. They can be seen with the -Ds flag, threads only show their address in the heap.

In order to be able to properly debug things we should try to have those threads named, such that we can then observe who throws exceptions to who or which thread gets blocked.

I built a custom GHC 9.10.1 which auto-labelled threads with a callstack, and did the same in the async package, to end up with this debugging traces:

threads on capability 0:
	thread    3 @ 00007ff4d9dfa000 ["main"] is not blocked (TSO_DIRTY)
threads on capability 1:
other threads:
	thread    2 @ 00007ff4d9dfb088 ["IOManagerThread"] is blocked on an external call
	thread    8 @ 00007ff4d9ed7000 ["AutoUpdate"] is blocked on an MVar @ 00007ff4d9ed5ef8
	thread    9 @ 00007ff4d9ed57f0 ["CallStack (from HasCallStack):
  rawForkIO, called at .\Control\Concurrent\Async\Internal.hs:102:27 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  async, called at src\Katip\Core.hs:788:46 in katip-0.8.8.2-inplace:Katip.Core"] is blocked on an STM operation
	thread   10 @ 00007ff4db506370 ["CallStack (from HasCallStack):
  rawForkIO, called at .\Control\Concurrent\Async\Internal.hs:102:27 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  async, called at src\Cardano\BM\Backend\Switchboard.lhs:237:13 in iohk-monitoring-0.2.0.0-inplace:Cardano.BM.Backend.Switchboard"] is blocked on a black hole 00007ff4db5061d0
	thread   11 @ 00007ff4db0a5500 ["CallStack (from HasCallStack):
  forkIO, called at .\Control\Concurrent\Async\Internal.hs:858:8 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  forkRepeat, called at .\Control\Concurrent\Async\Internal.hs:509:10 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  linkOnly, called at src\Cardano\BM\Backend\Switchboard.lhs:265:5 in iohk-monitoring-0.2.0.0-inplace:Cardano.BM.Backend.Switchboard"] is blocked on an STM operation
	thread   12 @ 00007ff4da0ee488 ["CallStack (from HasCallStack):
  forkIO, called at libraries\base\src\Control\Concurrent.hs:199:5 in base:Control.Concurrent"] is blocked on an MVar @ 00007ff4da0f0090
	thread   13 @ 00007ff4da0ef4d0 ["Date cacher (AutoUpdate)"] is blocked on an MVar @ 00007ff4da0f2bc8
	thread   14 @ 00007ff4d9dfa088 ["CallStack (from HasCallStack):
  forkIO, called at .\Network\Wai\Handler\Warp\Windows.hs:21:12 in warp-3.4.7-inplace:Network.Wai.Handler.Warp.Windows"] is blocked on an external call
	thread   15 @ 00007ff4d9ed30a8 ["CallStack (from HasCallStack):
  rawForkIO, called at .\Control\Concurrent\Async\Internal.hs:102:27 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  async, called at src\Cardano\BM\Backend\EKGView.lhs:347:5 in lobemo-backend-ekg-0.2.0.0-inplace:Cardano.BM.Backend.EKGView"] is blocked on an STM operation
	thread   16 @ 00007ff4d9ed4230 ["CallStack (from HasCallStack):
  forkIO, called at .\Control\Concurrent\Async\Internal.hs:858:8 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  forkRepeat, called at .\Control\Concurrent\Async\Internal.hs:509:10 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  linkOnly, called at .\Control\Concurrent\Async\Internal.hs:492:8 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  link, called at src\Cardano\BM\Backend\EKGView.lhs:250:9 in lobemo-backend-ekg-0.2.0.0-inplace:Cardano.BM.Backend.EKGView"] is blocked on an STM operation
	thread   17 @ 00007ff4d9dfb548 ["CallStack (from HasCallStack):
  rawForkIO, called at .\Control\Concurrent\Async\Internal.hs:102:27 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  async, called at src\Cardano\BM\Backend\Prometheus.lhs:75:5 in lobemo-backend-ekg-0.2.0.0-inplace:Cardano.BM.Backend.Prometheus"] is blocked on an MVar @ 00007ff4da0ee3a8
	thread   18 @ 00007ff4da0ee678 ["CallStack (from HasCallStack):
  forkIO, called at .\Control\Concurrent\Async\Internal.hs:858:8 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  forkRepeat, called at .\Control\Concurrent\Async\Internal.hs:509:10 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  linkOnly, called at .\Control\Concurrent\Async\Internal.hs:492:8 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  link, called at src\Cardano\BM\Backend\EKGView.lhs:258:21 in lobemo-backend-ekg-0.2.0.0-inplace:Cardano.BM.Backend.EKGView"] is blocked on an STM operation
	thread   19 @ 00007ff4d9ed24c8 ["CallStack (from HasCallStack):
  rawForkIO, called at .\Control\Concurrent\Async\Internal.hs:102:27 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  async, called at src\Cardano\BM\Backend\Aggregation.lhs:150:5 in lobemo-backend-aggregation-0.1.0.0-inplace:Cardano.BM.Backend.Aggregation"] is blocked on an STM operation
	thread   20 @ 00007ff4d9ed3908 ["CallStack (from HasCallStack):
  forkIO, called at .\Control\Concurrent\Async\Internal.hs:858:8 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  forkRepeat, called at .\Control\Concurrent\Async\Internal.hs:509:10 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  linkOnly, called at .\Control\Concurrent\Async\Internal.hs:492:8 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  link, called at src\Cardano\BM\Backend\Aggregation.lhs:120:9 in lobemo-backend-aggregation-0.1.0.0-inplace:Cardano.BM.Backend.Aggregation"] is blocked on an STM operation
	thread   21 @ 00007ff4da0f0a38 ["Date cacher (AutoUpdate)"] is blocked on an MVar @ 00007ff4da0ee5d8
	thread   22 @ 00007ff4d9ed23c8 ["CallStack (from HasCallStack):
  rawForkIO, called at .\Control\Concurrent\Async\Internal.hs:102:27 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  async, called at src\Cardano\BM\Backend\Monitoring.lhs:162:5 in lobemo-backend-monitoring-0.1.0.0-inplace:Cardano.BM.Backend.Monitoring"] is blocked on an STM operation
	thread   23 @ 00007ff4d9dfb000 ["CallStack (from HasCallStack):
  forkIO, called at .\Network\Wai\Handler\Warp\Windows.hs:21:12 in warp-3.4.7-inplace:Network.Wai.Handler.Warp.Windows"] is blocked on an external call
	thread   24 @ 00007ff4d9ed4d38 ["CallStack (from HasCallStack):
  forkIO, called at .\Control\Concurrent\Async\Internal.hs:858:8 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  forkRepeat, called at .\Control\Concurrent\Async\Internal.hs:509:10 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  linkOnly, called at .\Control\Concurrent\Async\Internal.hs:492:8 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  link, called at src\Cardano\BM\Backend\Monitoring.lhs:131:9 in lobemo-backend-monitoring-0.1.0.0-inplace:Cardano.BM.Backend.Monitoring"] is blocked on an STM operation
	thread   25 @ 00007ff4da0f3ef0 ["CallStack (from HasCallStack):
  rawForkIO, called at .\Control\Concurrent\Async\Internal.hs:102:27 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  async, called at src\Cardano\Node\Configuration\Logging.hs:288:13 in cardano-node-10.2-inplace:Cardano.Node.Configuration.Logging"] is blocked on atomic MVar read @ 00007ff4db506e30

Now it is time to get those threads named.

@jasagredo
Copy link
Contributor Author

@jutaro will take care of the threads in cardano-node, namely:

  • src\Cardano\Node\Tracing\Tracers\Resources.hs:20
  • src\Cardano\Node\Tracing\Tracers\Peer.hs:53
  • src/Cardano/Node/Configuration/Logging.hs:288

@jasagredo
Copy link
Contributor Author

I seem to be missing these:

	thread   52 @ 00007ff4da16ddd8 ["CallStack (from HasCallStack):
  forkIO, called at .\Control\Concurrent\Async\Internal.hs:663:16 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  concurrently', called at .\Control\Concurrent\Async\Internal.hs:615:19 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  race, called at src\Control\Monad\Class\MonadAsync.hs:405:27 in io-classes-1.5.0.0-inplace:Control.Monad.Class.MonadAsync
  race, called at src\Control\Monad\Class\MonadAsync.hs:322:42 in io-classes-1.5.0.0-inplace:Control.Monad.Class.MonadAsync"]

which come from an Alternative instance that starts a race inside.

and this:

	thread   63 @ 00007ff4dcad2198 ["CallStack (from HasCallStack):
  forkIO, called at libraries\ghc-internal\src\GHC\Internal\Conc\Sync.hs:316:23 in ghc-internal:GHC.Internal.Conc.Sync"] is blocked on an external call

which comes from forkIOWithUnmask

@jasagredo
Copy link
Contributor Author

The cardano node built from this branch #6089 demonstrates the result. If run with -Ds and built with async +debug-auto-label the only threads that are unnamed are Timeout threads from base (see) and a couple of threads related to race, namely:

        72a0: cap 0: thread 52 has label CallStack (from HasCallStack):
  forkIO, called at .\Control\Concurrent\Async\Internal.hs:663:16 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  concurrently', called at .\Control\Concurrent\Async\Internal.hs:615:19 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  race, called at src\Control\Monad\Class\MonadAsync.hs:412:27 in io-classes-1.5.0.0-inplace:Control.Monad.Class.MonadAsync
  race, called at src\Control\Monad\Class\MonadAsync.hs:325:42 in io-classes-1.5.0.0-inplace:Control.Monad.Class.MonadAsync



        8da4: cap 0: thread 56 has label CallStack (from HasCallStack):
  forkIO, called at .\Control\Concurrent\Async\Internal.hs:660:16 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  concurrently', called at .\Control\Concurrent\Async\Internal.hs:615:19 in async-2.2.5-inplace:Control.Concurrent.Async.Internal
  race, called at src\Control\Monad\Class\MonadAsync.hs:412:27 in io-classes-1.5.0.0-inplace:Control.Monad.Class.MonadAsync
  race, called at src\Ouroboros\Network\Server2.hs:181:44 in ouroboros-network-framework-0.15.0.0-inplace:Ouroboros.Network.Server2

(apart from the ones Jurgen will take care of)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant