-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Start LLMQContext early to let VerifyDB()
check ChainLock signatures in coinbase
#5752
Conversation
…atures in coinbase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK for squash merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK
d8d1be9
CI looks good, pls re-review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK for squash merge
…atures in coinbase (dashpay#5752) ## Issue being fixed or feature implemented Now that we have ChainLock sigs in coinbase `VerifyDB()` have to process them. It works most of the time because usually we simply read contributions from quorum db https://github.com/dashpay/dash/blob/develop/src/llmq/quorums.cpp#L385. However, sometimes these contributions aren't available so we try to re-build them https://github.com/dashpay/dash/blob/develop/src/llmq/quorums.cpp#L388. But by the time we call `VerifyDB()` bls worker threads aren't started yet, so we keep pushing jobs into worker's queue but it can't do anything and it halts everything. backtrace: ``` * frame #0: 0x00007fdd85a2873d libc.so.6`syscall at syscall.S:38 frame #1: 0x0000555c41152921 dashd_testnet`std::__atomic_futex_unsigned_base::_M_futex_wait_until(unsigned int*, unsigned int, bool, std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) + 225 frame #2: 0x0000555c40e22bd2 dashd_testnet`CBLSWorker::BuildQuorumVerificationVector(Span<std::shared_ptr<std::vector<CBLSPublicKey, std::allocator<CBLSPublicKey> > > >, bool) at atomic_futex.h:102:36 frame #3: 0x0000555c40d35567 dashd_testnet`llmq::CQuorumManager::BuildQuorumContributions(std::unique_ptr<llmq::CFinalCommitment, std::default_delete<llmq::CFinalCommitment> > const&, std::shared_ptr<llmq::CQuorum> const&) const at quorums.cpp:419:65 frame #4: 0x0000555c40d3b9d1 dashd_testnet`llmq::CQuorumManager::BuildQuorumFromCommitment(Consensus::LLMQType, gsl::not_null<CBlockIndex const*>) const at quorums.cpp:388:37 frame dashpay#5: 0x0000555c40d3c415 dashd_testnet`llmq::CQuorumManager::GetQuorum(Consensus::LLMQType, gsl::not_null<CBlockIndex const*>) const at quorums.cpp:588:37 frame dashpay#6: 0x0000555c40d406a9 dashd_testnet`llmq::CQuorumManager::ScanQuorums(Consensus::LLMQType, CBlockIndex const*, unsigned long) const at quorums.cpp:545:64 frame dashpay#7: 0x0000555c40937629 dashd_testnet`llmq::CSigningManager::SelectQuorumForSigning(Consensus::LLMQParams const&, llmq::CQuorumManager const&, uint256 const&, int, int) at signing.cpp:1038:90 frame dashpay#8: 0x0000555c40937d34 dashd_testnet`llmq::CSigningManager::VerifyRecoveredSig(Consensus::LLMQType, llmq::CQuorumManager const&, int, uint256 const&, uint256 const&, CBLSSignature const&, int) at signing.cpp:1061:113 frame dashpay#9: 0x0000555c408e2d43 dashd_testnet`llmq::CChainLocksHandler::VerifyChainLock(llmq::CChainLockSig const&) const at chainlocks.cpp:559:53 frame dashpay#10: 0x0000555c40c8b09e dashd_testnet`CheckCbTxBestChainlock(CBlock const&, CBlockIndex const*, llmq::CChainLocksHandler const&, BlockValidationState&) at cbtx.cpp:368:47 frame dashpay#11: 0x0000555c40cf75db dashd_testnet`ProcessSpecialTxsInBlock(CBlock const&, CBlockIndex const*, CMNHFManager&, llmq::CQuorumBlockProcessor&, llmq::CChainLocksHandler const&, Consensus::Params const&, CCoinsViewCache const&, bool, bool, BlockValidationState&, std::optional<MNListUpdates>&) at specialtxman.cpp:202:60 frame dashpay#12: 0x0000555c40c00a47 dashd_testnet`CChainState::ConnectBlock(CBlock const&, BlockValidationState&, CBlockIndex*, CCoinsViewCache&, bool) at validation.cpp:2179:34 frame dashpay#13: 0x0000555c40c0e593 dashd_testnet`CVerifyDB::VerifyDB(CChainState&, CChainParams const&, CCoinsView&, CEvoDB&, int, int) at validation.cpp:4789:41 frame dashpay#14: 0x0000555c40851627 dashd_testnet`AppInitMain(std::variant<std::nullopt_t, std::reference_wrapper<NodeContext>, std::reference_wrapper<WalletContext>, std::reference_wrapper<CTxMemPool>, std::reference_wrapper<ChainstateManager>, std::reference_wrapper<CBlockPolicyEstimator>, std::reference_wrapper<LLMQContext> > const&, NodeContext&, interfaces::BlockAndHeaderTipInfo*) at init.cpp:2098:50 frame dashpay#15: 0x0000555c4082fe11 dashd_testnet`AppInit(int, char**) at bitcoind.cpp:145:54 frame dashpay#16: 0x0000555c40823c64 dashd_testnet`main at bitcoind.cpp:173:20 frame dashpay#17: 0x00007fdd85934083 libc.so.6`__libc_start_main(main=(dashd_testnet`main at bitcoind.cpp:160:1), argc=3, argv=0x00007ffcb8ca5b88, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007ffcb8ca5b78) at libc-start.c:308:16 frame dashpay#18: 0x0000555c4082f27e dashd_testnet`_start + 46 ``` Fixes dashpay#5741 ## What was done? Start LLMQContext early. Alternative solution could be moving bls worker Start/Stop into llmq context ctor/dtor. ## How Has This Been Tested? I had a node with that issue. This patch fixed it. ## Breaking Changes Not sure, hopefully none. ## Checklist: - [x] I have performed a self-review of my own code - [x] I have commented my code, particularly in hard-to-understand areas - [ ] I have added or updated relevant unit/integration/functional/e2e tests - [ ] I have made corresponding changes to the documentation - [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_
…erministicMNManager`, `LLMQContext` member ctors, reduce use in `CJContext` 7d26061 refactor: move `CConnman`, `PeerManager` out of `CCoinJoinClientQueueManager` ctor (Kittywhiskers Van Gogh) 953ba96 refactor: move `CConnman` out of `CoinJoinWalletManager` ctor (Kittywhiskers Van Gogh) ac930a8 refactor: remove unused `CConnman` from `CDeterministicMNManager` ctor (Kittywhiskers Van Gogh) a14e604 refactor: remove `CConnman`, `PeerManager` from `LLMQContext` ctor (Kittywhiskers Van Gogh) d9e5cc7 refactor: move `PeerManager` out of `CInstantSendManager` ctor (Kittywhiskers Van Gogh) 82d1aed refactor: move `CConnman`, `PeerManager` out of `CSigSharesManager` ctor (Kittywhiskers Van Gogh) 7498a38 refactor: move `PeerManager` out of `CSigningManager` ctor (Kittywhiskers Van Gogh) 7ebc61e refactor: move `CConnman` out of `CQuorumManager` ctor (Kittywhiskers Van Gogh) c07b522 refactor: move `CConnman` out of `CDKGSession{,Handler,Manager}` ctor (Kittywhiskers Van Gogh) 01876c7 refactor: move `PeerManager` out of `CDKGSession{,Handler,Manager}` ctor (Kittywhiskers Van Gogh) cc0e771 refactor: start BLS thread in `LLMQContext` ctor, move `Start` downwards (Kittywhiskers Van Gogh) Pull request description: ## Additional Information * Depends on #6425 * Dependency for #6304 * In order to reduce the logic used in chainstate initialization (that will be split out of `init.cpp` in [bitcoin#23280](bitcoin#23280)), the spinning up of threads (as done in `LLMQContext::Start()`) needs to be moved down. * They were moved up in [dash#5752](#5752) as `CBLSWorker` is a part of `LLMQContext` and `CBLSWorker` is needed during chainstate verification. As suggested in dash#5752, an alternate fix to the one already merged in was to move `CBLSWorker` `Start()`/`Stop()` to the constructor, which is done here. * Another alternate fix is that we move it out of `LLMQContext` entirely and let it remain in `NodeContext` though this approach has not been taken. * The reason we cannot retain the status quo is because `bitcoin-chainstate` (the binary introduced in [bitcoin#24304](bitcoin#24304) that's built on the code split off in [bitcoin#23280](bitcoin#23280)) aims to be devoid of P2P logic and this is reflected in the source files used to build it ([source](https://github.com/bitcoin/bitcoin/pull/24304/files#diff-4cb884d03ebb901069e4ee5de5d02538c40dd9b39919c615d8eaa9d364bbbd77R794)). (Also, there's no `NodeContext`, [source](https://github.com/bitcoin/bitcoin/pull/24304/files#diff-4cb884d03ebb901069e4ee5de5d02538c40dd9b39919c615d8eaa9d364bbbd77R795-R798)) * This means need to separate P2P and validation components from Dash-specific logic in order for the split to work as expected. This PR is a step in that direction by moving P2P elements (`CConnman` and `PeerManager`) out of constructors. * As it stands, there are two sources for Dash-specific components to have access to P2P components, initialization (e.g. through `LLMQContext::Start()` or `PeerManagerImpl::ProcessMessage()`). ## Breaking Changes None expected. While changes are present in initialization order, consensus behaviour should remain unchanged. ## Checklist: - [x] I have performed a self-review of my own code - [x] I have commented my code, particularly in hard-to-understand areas **(note: N/A)** - [x] I have added or updated relevant unit/integration/functional/e2e tests **(note: N/A)** - [x] I have made corresponding changes to the documentation **(note: N/A)** - [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_ ACKs for top commit: knst: utACK 7d26061 UdjinM6: utACK 7d26061 Tree-SHA512: 4f12cbda935cad3a186acb31fed513cea489b0d3a55aa80be8e1336d10cbe1579d6d3db862a78a167134650c9e97816732acaf0c85ab759f6555b1b6be99ec02
Issue being fixed or feature implemented
Now that we have ChainLock sigs in coinbase
VerifyDB()
have to process them. It works most of the time because usually we simply read contributions from quorum db https://github.com/dashpay/dash/blob/develop/src/llmq/quorums.cpp#L385. However, sometimes these contributions aren't available so we try to re-build them https://github.com/dashpay/dash/blob/develop/src/llmq/quorums.cpp#L388. But by the time we callVerifyDB()
bls worker threads aren't started yet, so we keep pushing jobs into worker's queue but it can't do anything and it halts everything.backtrace:
Fixes #5741
What was done?
Start LLMQContext early. Alternative solution could be moving bls worker Start/Stop into llmq context ctor/dtor.
How Has This Been Tested?
I had a node with that issue. This patch fixed it.
Breaking Changes
Not sure, hopefully none.
Checklist: