[ARCHIVED] Lodestar Planning & Standup Meetings 2023‐2024
The Lodestar team hosts planning/standup meetings weekly on Tuesdays at 2:00pm UTC. These meetings allow the team to conduct release planning, prioritise tasks, sync on current issues and implementations, and provide status updates on the Lodestar roadmap.
Note that these notes are transcribed and summarized by AI language models and may not accurately reflect the context discussed.
Agenda: https://github.com/ChainSafe/lodestar/discussions/7289
- December 17th will be the last standup of the year.
v1.24.0 Release
- Deployment of v1.24.0-rc.0 was done on December 6th.
- After the meeting, some issues related to the gossip validation queues were detected.
- The team continues to investigate what may have caused the regression and may delay the release to address the regression.
Devnet-5
- The alpha.10 specifications will be released and are assessed as containing minimal changes.
- It is likely that devnet-5 will be launched in mid to late December.
- Lodestar has minimal implementation work remaining to complete the alpha.10 changes, including increasing the blob target and max.
PeerDAS
- We need to merge unstable into PeerDAS.
- There was a TODO PR to make the column subnet stable and not to subscribe to our own subnet column.
- The reconstruction PR is mostly done and will be finished up this week by Matt.
- There needs to be a rebase on Electra also.
Lodestar Ethereum Consensus Client - Benchmarking Tool Discussion
- The PR aims to refactor the benchmarking tool, removing the dependency on Mocha.
- The core architecture remains unchanged, but the tool no longer requires Mocha as a peer dependency.
- The smallest components required for benchmarking are now embedded within the tool itself.
Vitest vs. Custom Benchmarking Tool
- Vitest has its own benchmarking feature, but it lacks the ability to persist benchmarking results.
- The team prefers maintaining their own tool for flexibility in making improvements and fixes.
Functionality and API
- The semantics of the benchmarking tool remain the same, with test files using `describe` blocks and `itBench` functions.
- There's a discussion about whether to keep the `describe` and `it` blocks, which were legacy from Mocha.
- The team considers removing the `it` block as part of a breaking change release.
Hooks and Setup
- Individual benchmarks support `before` and `beforeEach` hooks.
- There's a debate about whether to include global `before` and `beforeEach` hooks for test suites.
- Some existing performance tests use global `before` blocks for setup, such as caching states.
Peer Dependencies and Exports
- The team discusses whether to use Vitest as a peer dependency or re-export necessary functions.
- There's a preference to avoid peer dependencies to reduce unnecessary library imports.
Conclusions and Next Steps
- The team agrees to maintain their custom benchmarking tool for greater flexibility.
- They will likely remove the `it` block as part of a breaking change release.
- There's ongoing discussion about whether to include global hooks (`before`, `beforeEach`) in the benchmarking tool.
- The team is considering re-exporting necessary functions to maintain compatibility with existing tests.
The final decision on these points is pending further discussion and consensus among team members.
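For illustration, the per-benchmark hook semantics discussed above can be sketched in TypeScript. This is a minimal sketch under assumptions: `runBench`, `BenchOpts`, `BenchResult`, and the injectable `now` clock are hypothetical names chosen for this example and are not the API of the team's benchmarking tool. The point is only that `before` runs once, `beforeEach` runs before every iteration, and neither is included in the timing.

```typescript
// Hypothetical shapes for illustration only; not the real benchmarking tool's API.
type BenchOpts<TSuite, TRun> = {
  id: string;
  runs: number;
  // Runs once before all iterations, e.g. to build or cache an expensive state.
  before: () => TSuite;
  // Runs before every iteration; its cost is excluded from the measurement.
  beforeEach: (suite: TSuite, run: number) => TRun;
  // The function being measured.
  fn: (input: TRun) => void;
};

type BenchResult = {id: string; runs: number; averageMs: number};

// `now` defaults to Date.now so the sketch has no runtime-specific dependency.
// A real tool would use a higher-resolution clock.
function runBench<TSuite, TRun>(
  opts: BenchOpts<TSuite, TRun>,
  now: () => number = Date.now
): BenchResult {
  const suite = opts.before();
  let totalMs = 0;
  for (let run = 0; run < opts.runs; run++) {
    const input = opts.beforeEach(suite, run);
    const start = now();
    opts.fn(input);
    totalMs += now() - start;
  }
  return {id: opts.id, runs: opts.runs, averageMs: totalMs / opts.runs};
}
```

A global `before` hook, as debated above, would sit one level higher and share its result across several such benchmarks.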
Ethereum Coordination Issues
- Discussions scattered across multiple platforms:
- Telegram groups
- ACD (All Core Devs) calls
- Discord
- Ethereum Magicians
- Ethereum Research
Problems Identified
- Fragmentation: Difficult to track ongoing discussions and decisions
- Lack of Centralization: No single source for all information on EIPs or features
- Private vs. Public Discussions: Some occur in private Telegram groups, limiting transparency
- Signal-to-Noise Ratio: Platforms like Twitter criticized for low signal-to-noise due to bots and trolls
- Inconsistent Platform Usage: Confusion about which platform to use for specific purposes
Proposed Solutions
- Standardization: Develop guidelines for where different types of discussions should occur
- Discord as Primary Forum: Suggestion to use Discord as the main public discussion platform
- Limit Telegram Use: Reserve for private, security-related matters
- Clarify Platform Roles: Ethereum Research for data collection and theoretical ideas; Ethereum Magicians for EIP debates and coordination
- Threaded Discussions: Implement a system similar to Scroll's Discord channel for better organization
Additional Considerations
- Need for balance between openness and noise reduction
- Importance of preserving historical discussions
- Potential for bringing proposal to ACD call after async discussion and consensus
Cayman:
Lodestar with Bun: Debugging and Progress Approach
- Iterative debugging: Run, hit bug, patch, repeat
- Goal: Understand extent of problems
Current Status
- Lodestar running and syncing
- Some issues persist
Next Steps
- Deploy to feature group
- Analyze metrics
NAPI Modules
- Working:
- PubkeyIndexMap
- BLST
- Hashing (hashtree)
- Not working:
- LevelDB backend (reason unknown, possibly old NAPI version)
Debugging Information
- Stack trace available in GitHub issue: "Running Lodestar using bun"
- Stack trace not very helpful
Future Plans
- Gather more metrics by end of week
- Treat as experimental
- Iterate based on benchmarks and metrics
Agenda: https://github.com/ChainSafe/lodestar/discussions/7219
Non-Finality Devnet-0
The team discussed the non-finality Devnet that EthPandaOps was working on, addressing several issues discovered during testing:
Issues and Fixes
- Heap Increase Workaround: Over the weekend, the heap size was increased as a temporary workaround for an issue.
- Checkpoint State Cache: A bug was identified where nodes that stopped and restarted after a non-finality period kept adding state to the checkpoint state cache. Tuyen submitted a PR to fix this issue, successfully tested on a node in their infrastructure.
- State Transition Issue: Another issue involved a "too many promises" error during state transition when syncing nodes. This was linked to processing old blocks and multiple forks simultaneously.
Current Status
- The main remaining issue is the "too many promises" error, which is under investigation. Matt suggested bumping a number related to old block processing as a potential fix.
- Discussions on why shuffling occurs during syncing raised questions about state regeneration processes.
- https://github.com/ChainSafe/lodestar/pull/7251
Observations and Future Work
- The data from this DevNet is being used to assess whether to adjust blob targets and maximums. Pari has documented observations in a shared link for further analysis: https://notes.ethereum.org/@parithosh/nft-devnets
Mekong Testnet Issue
The team discussed a deposit bug identified in the Mekong testnet, where double deposits for the same validator within an epoch led to unexpected behavior. NC is investigating the issue and working on a hotfix.
Nature of the Bug
- The issue arises when two new deposits for the same validator occur within an epoch and that validator does not yet exist in the registry.
- The intended behavior is for the first deposit to add the validator to the registry and the second to act as a top-up. Instead, due to a bug in the spec, each deposit adds the validator, so it is registered twice.
Technical Details
- In Electra, deposit processing is queued until the epoch transition instead of processing immediately at the slot level.
- This results in two outcomes:
- The validator registry is updated twice.
- The pubkey cache becomes inconsistent, with two indices pointing to the same pubkey but only one pubkey-to-index mapping.
Challenges and Fixes
- NC is developing a hotfix to specifically address this double deposit scenario, but acknowledges it as a temporary brute force solution.
- The issue wasn't caught in spec tests because they typically test single operations rather than scenarios involving multiple blocks within an epoch.
Spec Test Considerations
- There is a suggestion to enhance spec tests to cover scenarios with multiple operations across different blocks within an epoch, such as multiple deposits or withdrawals.
- Note that there are also multi-block spec tests and we should see how to potentially improve those.
- The team is considering potential implications for other operations like withdrawals and exits but requires more analysis.
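The double-deposit scenario described above can be illustrated with a small TypeScript model. This is a sketch under stated assumptions, not the spec or Lodestar code: the `Registry` shape, both function names, and the snapshot-based check in the buggy variant are hypothetical and show only one plausible way queued deposits can register the same pubkey twice.

```typescript
// Hypothetical, simplified model of a validator registry and its pubkey cache.
type PendingDeposit = {pubkey: string; amount: number};
type Validator = {pubkey: string; balance: number};
type Registry = {validators: Validator[]; pubkeyToIndex: Map<string, number>};

function emptyRegistry(): Registry {
  return {validators: [], pubkeyToIndex: new Map<string, number>()};
}

// Buggy variant (assumed mechanism): "is this pubkey known?" is answered from a
// snapshot taken before the queue is processed, so the second deposit for a
// brand-new pubkey is also treated as a new validator.
function applyQueuedDepositsBuggy(reg: Registry, queue: PendingDeposit[]): void {
  const knownBefore = new Set<string>(Array.from(reg.pubkeyToIndex.keys()));
  for (const deposit of queue) {
    const index = reg.pubkeyToIndex.get(deposit.pubkey);
    if (knownBefore.has(deposit.pubkey) && index !== undefined) {
      reg.validators[index].balance += deposit.amount;
    } else {
      reg.validators.push({pubkey: deposit.pubkey, balance: deposit.amount});
      // The second insert overwrites the first mapping: two registry entries,
      // but only one pubkey-to-index entry.
      reg.pubkeyToIndex.set(deposit.pubkey, reg.validators.length - 1);
    }
  }
}

// Intended behavior: look up the live registry for every deposit, so the
// second deposit becomes a top-up.
function applyQueuedDeposits(reg: Registry, queue: PendingDeposit[]): void {
  for (const deposit of queue) {
    const index = reg.pubkeyToIndex.get(deposit.pubkey);
    if (index === undefined) {
      reg.validators.push({pubkey: deposit.pubkey, balance: deposit.amount});
      reg.pubkeyToIndex.set(deposit.pubkey, reg.validators.length - 1);
    } else {
      reg.validators[index].balance += deposit.amount;
    }
  }
}
```

A multi-deposit test of this kind (two deposits for one new pubkey in the same queue) is the sort of case the spec test suggestions above are aiming at.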
PeerDAS Update
The team discussed the current status and challenges related to PeerDAS, focusing on synchronization issues and strategies for improving peer selection and data availability. The discussion also touched on the potential need to rebase the PeerDAS branch onto Electra.
Synchronization Challenges
- An issue was identified where Lodestar nodes were attempting to sync data from one fork while syncing blocks from another, leading to synchronization problems.
- This behavior is due to the current sync mechanism, which does not differentiate between peers from different forks. The system shuffles between peers until it finds one that can provide the required data.
Peer Selection and Scoring
- Gajinder highlighted the need to improve peer scoring strategies to avoid retrying failed peers and to better manage peer selection based on column subnets.
- Tuyen suggested using a mechanism similar to attestation subnets for balancing data columns, ensuring that peers providing unique or needed columns are prioritized.
DevNet and Testing
- Barnabas plans to launch a Devnet soon, which will help in further debugging and testing these issues.
- The team discussed the importance of maintaining a healthy balance of peers across subnet samples to ensure data availability and network stability.
N-Historical State Feature
- The N-historical state feature, which helps manage long non-finality periods, is enabled on the unstable branch but not in PeerDAS. The team considered rebasing PeerDAS onto Electra to incorporate this feature, given its potential usefulness in long non-finality scenarios.
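The peer selection ideas raised under Peer Selection and Scoring (skip peers that recently failed, prefer peers that provide columns still needed) can be sketched as a greedy selection. This is only an illustrative sketch: `PeerInfo`, `recentlyFailed`, and `selectPeersForColumns` are hypothetical names, and the greedy strategy is an assumption made for this example, not Lodestar's actual sync or scoring logic.

```typescript
// Hypothetical peer description: which data columns it custodies and whether
// a recent request to it failed.
type PeerInfo = {id: string; columns: number[]; recentlyFailed: boolean};

// Greedy selection: repeatedly pick the peer that covers the most
// still-missing columns, ignoring peers that recently failed.
function selectPeersForColumns(
  peers: PeerInfo[],
  needed: number[]
): {selected: string[]; uncovered: number[]} {
  const missing = new Set<number>(needed);
  const candidates = peers.filter((peer) => !peer.recentlyFailed);
  const selected: string[] = [];

  while (missing.size > 0) {
    let best: PeerInfo | undefined;
    let bestGain = 0;
    for (const peer of candidates) {
      const gain = peer.columns.filter((column) => missing.has(column)).length;
      if (gain > bestGain) {
        best = peer;
        bestGain = gain;
      }
    }
    // No remaining peer provides any missing column.
    if (best === undefined) break;
    selected.push(best.id);
    for (const column of best.columns) missing.delete(column);
  }

  return {selected, uncovered: Array.from(missing).sort((a, b) => a - b)};
}
```

Columns left in `uncovered` indicate where more peers on the corresponding subnets would be needed.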
Merging vs Rebasing PeerDAS feature branch
The team discussed the decision to use merge instead of rebase for handling long-lived feature branches, particularly those involving multiple contributors. The conversation highlighted the trade-offs between the two approaches and the implications for project workflow.
Key Arguments for Merge
- Ease of Coordination: Merging is preferred for long-lived feature branches as it simplifies coordination among multiple contributors. It allows developers to continue their work without needing to constantly rebase their changes onto the latest branch state.
- Preservation of Work: Merging helps preserve in-progress work without requiring contributors to resolve conflicts that arise from rebasing.
- Team Consensus: The general consensus among team members, including Cayman and Nico, was in favor of merging due to its practicality in collaborative environments.
Key Arguments for Rebase
- Clean Commit History: Rebasing offers a cleaner commit history by applying changes sequentially, which can be beneficial when integrating a feature branch back into the main branch.
- Final Integration Complexity: While merging simplifies ongoing development, it can complicate the final integration into the main branch. Rebasing avoids cyclic dependencies and maintains a linear history.
Decision and Implementation
- Preference for Merge: The team decided to proceed with merging for long-lived feature branches, acknowledging that while it may complicate final integration, it facilitates easier collaboration during development.
- Handling Final Merge: When merging back into the main branch, the team will evaluate whether to squash commits or preserve specific changes based on their relevance and impact.
- Case-by-Case Basis: The decision to merge or rebase will be made on a case-by-case basis, considering factors such as the complexity of conflicts and the number of contributors involved.
Gas Fee Mechanism for EIP-7742
Gajinder provided an update on his work related to EIP-7742, focusing on the gas fee mechanism for blob base fees. The discussion centered around the proposed changes to dynamically adjust network blob gas limits and the implications for the Ethereum protocol.
EIP-7742 and Target Changes
- EIP-7742 is designed to scale target gas limits dynamically without requiring changes in the execution layer (EL).
- The current proposal does not include target changes, but Gajinder's amendment ensures that if target changes occur, fee computations remain correct and aligned with consensus layer (CL) directives.
Proposed Target Adjustments
- Paritosh has proposed a radical change to increase the target and max from 3/6 to 6/9. Gajinder suggests a more conservative approach of 4/6, where 4 is the target and 6 is the current max.
- The amendment aims to update fee mechanisms so that any target changes are seamlessly integrated without further EL modifications.
Consensus Layer Control
- Once the Engine API passes target and max values, the CL can independently manage fee mechanisms without concern for EL operations.
- This allows for flexible strategies in target adjustments, whether time-based, hard fork-based, or coordinated client updates.
Implementation Details
- The change involves including target and max values in block production and new payloads within the Engine API.
- The execution payload header may be extended with "target blobs per block" to reflect these changes, although it could also be determined by hard fork configurations.
Strategic Implications
- This mechanism provides the CL with greater autonomy in managing network parameters, potentially enhancing scalability and efficiency.
- The upcoming All Core Developers Consensus (ACDC) call will decide on these proposals, influencing how future updates are implemented across Ethereum clients.
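To make the fee discussion concrete, the sketch below shows how a blob base fee could be computed when the target is supplied as a parameter rather than hard-coded. The constants and the `fake_exponential` and excess blob gas formulas follow EIP-4844; passing `targetBlobsPerBlock` into the excess calculation is an assumption about the direction of the amendment discussed above, and the final EIP-7742 wording may differ. Function names are chosen for this example.

```typescript
// Constants as defined in EIP-4844.
const GAS_PER_BLOB = 131072n; // 2**17
const MIN_BASE_FEE_PER_BLOB_GAS = 1n;
const BLOB_BASE_FEE_UPDATE_FRACTION = 3338477n;

// Integer approximation of factor * e^(numerator / denominator), per EIP-4844.
function fakeExponential(factor: bigint, numerator: bigint, denominator: bigint): bigint {
  let i = 1n;
  let output = 0n;
  let accumulator = factor * denominator;
  while (accumulator > 0n) {
    output += accumulator;
    accumulator = (accumulator * numerator) / (denominator * i);
    i += 1n;
  }
  return output / denominator;
}

// Excess blob gas for the next block. Assumption: the target comes from the
// consensus layer as a parameter instead of being a fixed execution-layer constant.
function calcExcessBlobGas(
  parentExcessBlobGas: bigint,
  parentBlobGasUsed: bigint,
  targetBlobsPerBlock: bigint
): bigint {
  const targetBlobGas = targetBlobsPerBlock * GAS_PER_BLOB;
  const total = parentExcessBlobGas + parentBlobGasUsed;
  return total < targetBlobGas ? 0n : total - targetBlobGas;
}

function blobBaseFee(excessBlobGas: bigint): bigint {
  return fakeExponential(MIN_BASE_FEE_PER_BLOB_GAS, excessBlobGas, BLOB_BASE_FEE_UPDATE_FRACTION);
}
```

With a full block of 6 blobs, a target of 3 leaves three blobs' worth of excess gas while a target of 4 leaves two, which is why fee computation has to track whatever target the consensus layer supplies.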
v1.24 Planning
Release Timeline
- The target is to have v1.24 released before the Christmas break, with an RC planned for early December to ensure adequate testing time.
Reverted Features from v1.23
- The team discussed the possibility of reintroducing features that were reverted from v1.23, specifically focusing on js-libp2p 2.0 and async aggregate with randomness.
- The plan is to test the js-libp2p 2.0 update in two phases, starting with the libp2p update alone: gossipsub will be downgraded to leave out the IDONTWANT feature because of its potential network impact.
- The team will deploy this on a feature branch for testing.
Async Shuffling and Snappy WASM
- Async shuffling with randomness was previously reverted due to network thread congestion.
- There is a proposal to combine this with the Snappy WASM PR, which improves network thread performance, to see if both changes together balance out performance issues.
- Testing will be conducted on a feature branch to evaluate these combined changes.
Parallel Testing and Deployment
- Both the js-libp2p 2.0 and async shuffling with Snappy WASM will be tested in parallel over the next week.
- The aim is to gather data and ensure stability before including these updates in the final v1.24 release.
Consensus-Spec Alpha 9
Status
- The spec tests are failing many test cases, so this work will take more time than expected.
- Mekong testnet fixes take priority.
JavaScript Runtime Discussion
The team discussed the potential benefits and challenges of experimenting with different JavaScript runtimes, specifically Bun and Deno, to improve performance and efficiency in Lodestar. The conversation focused on benchmarking results, compatibility, and strategic considerations for future development.
Benchmarking and Performance
- Tuyen conducted benchmarks comparing Bun, Deno, and Node.js. Bun demonstrated superior efficiency in memory usage and speed, particularly in tasks like deserializing blocks and updating Merkle trees.
- Bun required significantly less memory for operations (e.g., 48 bytes for a Uint8Array) compared to Deno (200 bytes) and Node.js (240 bytes).
Strategic Considerations
- Nazar suggested not limiting the project to a single runtime due to the rapidly evolving nature of these technologies. Both Bun and Deno should be supported to allow flexibility as performance characteristics change over time.
- The team discussed publishing packages on JSR.io, a modern TypeScript package registry developed by the Deno team, which could facilitate multi-runtime support.
Compatibility and Implementation
- Initial tests showed that SSZ packages run well on Bun, indicating compatibility with existing codebases.
- There was a discussion about the compatibility of Node-specific APIs like buffer classes. Bun appears to be largely API compatible with Node.js, easing the transition.
Future Testing and Development
- Nazar is working on updating Dapplion's benchmark packages to run tests across different runtimes, providing real-time data for comparison.
- The team acknowledged that while Bun currently leads in performance, ongoing testing is essential to adapt to future changes in runtime capabilities.
JavaScript Engine Differences
- Cayman emphasized that the differences in JavaScript engines (V8 for Node/Deno vs. JavaScriptCore for Bun) fundamentally affect performance and memory characteristics. This distinction supports prioritizing Bun for its unique advantages.
- Continue Benchmarking: Update and run comprehensive benchmarks across Bun, Deno, and Node.js to gather detailed performance data.
- Package Registry: Publish Lodestar packages on JSR.io to facilitate multi-runtime support and streamline package management.
- Monitor Runtime Developments: Keep track of updates in JavaScript runtimes to reassess their performance characteristics regularly.
- Evaluate Engine Compatibility: Ensure that critical APIs used in Lodestar are compatible across all considered runtimes to prevent integration issues.
- Strategize Runtime Adoption: Develop a strategic plan for potentially adopting Bun as a primary runtime while maintaining flexibility to switch as needed based on future developments.
Concerns with Native Dependencies
- The primary concern with transitioning to new JavaScript runtimes like Bun and Deno is the handling of native dependencies. These dependencies, particularly those related to cryptographic operations and system resources (e.g., TCP/UDP sockets), may not be fully supported or optimized in WebAssembly (WASM).
- Performance Limitations: While WASM offers a universal compilation target, it may not match the performance of native extensions for specific tasks that require low-level system access or specialized instructions, such as cryptography.
- Compatibility Challenges: Current limitations in Bun and Deno regarding NAPI support pose challenges for running Lodestar, which relies on native modules. The roadmap for these runtimes includes NAPI support, but it is not yet fully implemented.
- FFI (Foreign Function Interface): Both Bun and Deno have their own methods for FFI, which may require additional work to wrap existing NAPI-based native modules for compatibility.
Experiments and Next Steps
Benchmarking with FFI:
- The team plans to experiment with using FFI in Bun for specific functions like `HashTree` and `HashTreeInto`. This will help assess the feasibility and performance of using FFI as an alternative to NAPI.
Dual Runtime Support:
- Given the rapidly evolving nature of JavaScript runtimes, the team intends to support both Bun and Deno to maintain flexibility and adaptability as performance characteristics change over time.
JSR.io Registry:
- Regardless of the chosen runtime, the team plans to publish Lodestar packages on JSR.io, a modern TypeScript package registry developed by the Deno team, to streamline package management across different environments.
Performance Monitoring:
- Continuous benchmarking and monitoring of runtime performance will be essential to determine the most efficient runtime for Lodestar's needs. This includes updating benchmark packages to run tests across different runtimes.
Community Engagement:
- Engaging with the broader JavaScript community to stay informed about developments in runtime capabilities and best practices for optimizing performance across different engines.
Agenda: https://github.com/ChainSafe/lodestar/discussions/7136
- The last standup before Devcon is scheduled for October 22 (next week), after which synchronous meetings become hard to coordinate due to travel.
v1.23 Planning
- The v1.23 release is tentatively planned for early next week, pending further testing this week.
- Initial issues were identified with the rc.0 release, particularly related to libp2p, which need addressing before finalizing the release.
- A specific issue was noted where libp2p hangs during startup on Nico's German server but not on other test servers or environments.
- The problem seems to occur when opening a TCP socket, preventing the `libp2p.start` function from completing.
- The team discussed investigating network-related causes and potential differences in server configurations that might contribute to the issue.
- More detailed information about the network setup on the affected server is needed to diagnose the problem.
- An issue may be opened with libp2p for further insights, and discussions will continue in a call after the standup.
- There is caution around deploying v1.23 without understanding the root cause of the libp2p issue, as it could potentially affect other users.
- The team will continue to monitor progress asynchronously and make decisions based on further findings.
Checkup on Additional features
- A fix for BLST-related illegal instructions has been completed and could be included in the release if time permits.
- The team agreed to focus on resolving the libp2p issue first before expanding the scope of changes in this release.
- The `getBlobsV1` feature is still under testing, primarily with Nethermind.
- Current tests have not returned any non-null responses, indicating potential issues or bugs that need to be addressed.
- Gajinder plans to coordinate with Marek to identify and resolve these issues and may test with ChainSafe's Geth nodes if necessary.
Blobs Availability and Network Performance
- Concerns were raised about the availability of blobs before they can be imported via gossip, which could affect network performance and latency.
- The team discussed the importance of testing this feature on solo staker setups to understand its behavior and impact better.
Standup Time Change Post-Devcon
- The team agreed that standups after the Devcon conference will be pushed by one hour to start at 15:00 UTC to account for timezone differences and daylight savings ending.
SSZ Release and New Features
- Nazar confirmed that everything is ready for the SSZ release, with automatic release processes in place to facilitate the update.
- All necessary PRs have been merged, including a new API for upgrading trees to new types, which is considered essential for the release.
- A draft feature for batch hash tree root is included but marked as "do not merge" for further testing and refinement.
- Cayman plans to review the new API for upgrading trees and proceed with merging and releasing once confirmed.
Electra: Separate Type for Unaggregated network attestations
- The pull request introduces a `SingleAttestation` type to address complexities and inefficiencies in the current on-chain attestation format introduced by EIP-7549.
- By placing validator and committee indices directly in the attestation type, the proposal aims to simplify processing, reduce hash computations, and increase security.
- The change is intended to affect only network encoding, minimizing implementation impact on clients.
Implementation Challenges
- Nico expressed a desire to adopt this change but noted delays due to other priorities and difficulties in implementation.
- There is a need for further review and consideration of suggestions from Tuyen and others.
Community and Client Feedback
- Some client teams, including Teku and Lighthouse, are already implementing the changes, though the PRs are extensive.
- There are mixed opinions about the benefits versus the complexity introduced by this change. While it offers a cleaner spec perspective, it adds chaos from an implementation standpoint.
Consensus and Next Steps
- The team generally favors the single attestation change but is only moderately in favor of additional changes related to aggregated attestations.
NC:
Alpha-8 Spec and DevNet 4
- Consensus was reached on the engine API regarding execution requests, unblocking further development.
- The Alpha-8 or DevNet 4 specification is now frozen, with all changes implemented and consolidated into a single branch and PR draft.
- The implementation is passing all spec tests but experiencing performance degradation in block processing, failing benchmark CI tests.
Performance Investigation
- NC is investigating the source of performance degradation to address store performance issues before finalizing the draft.
Client Testing and Readiness
- Lucas has begun testing with an EL client, with EthereumJS being the only EL client currently ready for DevNet 4.
- NC plans to test with EthereumJS to ensure compatibility and functionality.
Single Attestation
- Progress is being made on single attestation, with validation and optimization parts left to implement. Completion is expected in three to four days.
Tuyen:
Fork Choice Improvements
- Implemented enhancements in fork choice by partitioning queued attestations into slots. This allows for efficient removal by slot, improving the handling of queued attestations.
Attestation Metrics
- Addressed inconsistencies in queue attestation metrics, where discrepancies were noted in the expected number of attestations per slot across different nodes.
- The fix involves computing the queue attestation numbers from the previous slot to ensure consistency in metrics reporting.
SSZ Batch Hash
- Progress is being made on the SSZ batch hash with a draft PR nearly complete. Testing is ongoing on the Lodestar side.
- Encountered challenges with outdated PRs and manual cherry-picking, leading to spec test failures that require further investigation.
- Despite these challenges, the main branch testing the batch hash has remained stable for over two months.
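The slot partitioning described under Fork Choice Improvements can be sketched as a map from slot to the attestations queued for that slot, so that everything for a slot is removed in one step instead of scanning a flat list. This is an illustrative sketch: `SlotPartitionedQueue`, `QueuedAttestation`, and the method names are hypothetical and do not reflect Lodestar's fork choice code.

```typescript
// Hypothetical minimal attestation shape for illustration.
type QueuedAttestation = {validatorIndex: number; blockRoot: string};

class SlotPartitionedQueue {
  // One bucket per slot, so removal by slot does not need to scan all entries.
  private readonly bySlot = new Map<number, QueuedAttestation[]>();

  add(slot: number, attestation: QueuedAttestation): void {
    const bucket = this.bySlot.get(slot);
    if (bucket) {
      bucket.push(attestation);
    } else {
      this.bySlot.set(slot, [attestation]);
    }
  }

  // Remove and return every attestation queued for a slot <= upToSlot,
  // ordered by slot and then by insertion order.
  drain(upToSlot: number): QueuedAttestation[] {
    const readySlots = Array.from(this.bySlot.keys())
      .filter((slot) => slot <= upToSlot)
      .sort((a, b) => a - b);
    const ready: QueuedAttestation[] = [];
    for (const slot of readySlots) {
      ready.push(...(this.bySlot.get(slot) ?? []));
      this.bySlot.delete(slot);
    }
    return ready;
  }

  // Total number of queued attestations across all slots.
  get size(): number {
    let count = 0;
    this.bySlot.forEach((bucket) => {
      count += bucket.length;
    });
    return count;
  }
}
```

Keeping a per-slot bucket also gives a natural place to read a per-slot count, which relates to the metrics consistency point above.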
Nazar:
Lint Setup Migration
- The team has successfully migrated to a new lint setup using different rules from Biome, aimed at improving efficiency.
- Nazar encouraged team members to reach out if they encounter any issues with the new setup.
Differential Hierarchical Backup
- Nazar is revisiting long-standing issues related to differential hierarchical backup of states.
- A new PR is being prepared to enable this feature under a feature flag, allowing it to be tested as an experimental feature in real environments without being set as default.
Testing and Edge Cases
- Testing on the feat4 node has shown that upgrading Xdelta3 to their own package has resolved previous errors.
- An edge case involving checkpoint sync was identified, where issues arise if a node is stopped and restarted during a checkpoint sync that is behind a snapshot slot. Nazar is working on resolving this issue.
Gajinder:
getBlobsV1 Issue
- Gajinder encountered issues with not receiving blobs and raised this with the Nethermind team for further investigation.
- Progress on other fronts, including PeerDAS, has been limited due to these challenges.
PeerDAS Synchronization
- A key issue identified is Lodestar attempting to range fetch from two different forks on the same batch, causing synchronization problems.
- Some Lodestar nodes are synced to the major fork, while others are not due to this issue.
- The discussion in the PeerDAS call focused on using a super node-based DevNet to simplify debugging and address data availability issues.
DevNet Strategy
- The current strategy involves ensuring all clients are working fine on a super node-based DevNet before introducing full nodes.
- Gajinder plans to resolve the issue preventing some nodes from syncing and aims to stabilize Lodestar's presence on DevNet.
Super Node Functionality
- There was a discussion about whether reconstructing columns is necessary for becoming a super node.
- While not required for super node functionality, reconstructing columns can improve network stability and performance by reducing the need to fetch all columns.
Nico:
v1.23 Release Preparations
- Finalized several pull requests (PRs) tagged for the v1.23 release, focusing on proposal duties for historical epochs.
DevNet 4 and Specification Reviews
- Conducted reviews related to DevNet 4 and specifications, ensuring that both the beacon and builder specs are finalized for the current phase.
Grafana Metrics and HTTP Timeout
- Addressed outdated panels in builder metrics on Grafana, updating them to better monitor the newly added HTTP timeout of one second for header requests as per the specification.
libP2P Issue Investigation
- Actively investigating issues related to the recent release candidate (RC) and collaborating with Cayman to resolve a libP2P issue. Nico is open to testing any potential solutions or suggestions.
Matt:
Segfaults and Tokio Async Issue
- Matt addressed segfaults related to Tokio async in shuffling. The issue was identified and fixed by Brooklyn, but now the system hangs due to a deeper issue in the Tokio runtime.
- Matt has documented the issue further but finds it challenging to debug due to its complexity.
BLST and API Issue Resolution
- Resolved an issue related to Afri by creating a portable build, ensuring compatibility across different client builds.
- Performance tests showed minimal difference between portable and compatible builds, allowing for deployment on production for further observation.
Bits of Randomness Discussion
- A question arose regarding the use of bits of randomness in cryptographic operations. Matt proposed a fix that allows toggling between 64-bit and 128-bit randomness with minimal performance impact.
- The team debated the necessity of increasing randomness to 128 bits, with concerns about whether it truly enhances security or is just perceived as better due to larger numbers.
- Matt plans to rerun tests to confirm performance metrics before finalizing changes.
Performance Metrics
- Aggregating signatures showed a significant performance hit (5%), while other operations had minimal impact (1-3%).
- The discussion emphasized the need for clear mathematical justification for changes in randomness levels, rather than relying on assumptions about larger bit sizes being inherently more secure.
Cayman:
Issue Management
- Cayman and Phil reviewed and closed numerous stale and unrelated issues across ecosystem repositories and historical Lodestar issues.
- The goal was to streamline and organize remaining issues to better inform prioritization and roadmap discussions during their upcoming offsite.
Library Improvements
- Investigated issues within the Yamux library and worked on getting the CI passing for the QUIC library to facilitate a release.
- Recent merges into NAPI RS now support generating ESM directly, simplifying the build process.
QUIC Library Progress
- A fix in NAPI RS also addressed a SEGFAULT issue in the QUIC implementation, though it remains unstable and can crash computers during certain operations.
- Plans to deploy the updated QUIC on a feature branch to further test stability and functionality.
- Aiming to finalize a version of QUIC before traveling to Devcon, as it is highly anticipated by the Libp2p community.
Agenda: https://github.com/ChainSafe/lodestar/discussions/7096
JS-libp2p offsite
Cayman attended an offsite with the js-libp2p team to discuss the roadmap and future developments. The discussions focused on browser connectivity improvements, transport protocols, and potential impacts on client development.
-
Browser Connectivity
- Significant progress has been made towards making browser connectivity feasible, with discussions centered on three web-compatible transports: WebSockets, WebRTC, and WebTransport.
- WebSockets: Reliable but require server configuration with SSL/TLS certificates for browser dialability.
- WebRTC: Allows browser-to-browser connections via circuit relays but is limited by its availability only on the main thread and slower performance compared to WebSockets.
- WebTransport: Built on QUIC and aims to improve over WebSockets by allowing self-signed certificates. However, a Chrome bug limits its usability due to a connection cap.
-
Future Developments
- Efforts are underway to improve WebRTC using QUIC and to address the Chrome bug affecting WebTransport.
- The LibP2P Foundation is working with Let's Encrypt to create a workflow for auto-configuring TLS certificates, enhancing browser dialability.
-
Implications for Lodestar
- While current developments are not directly related to consensus, they could impact client-side implementations in the future.
- Discussions included potentially skipping Yamux in favor of QUIC for improved performance.
-
Metrics and Observations
- The team discussed adding histograms and summaries to available metric types in libp2p, focusing on connection upgrade times and other performance metrics.
Persisting Help Information
-
Discord Forum Channel
- The idea of using a Discord forum channel was proposed to organize help discussions better than the current chat format, where information can easily get lost.
- A forum format allows for easier recall of solutions, as discussions are categorized and persistent.
-
GitHub Discussions
- GitHub Discussions was suggested as an alternative for persisting questions and answers, as it is web-searchable and can appear in Google search results.
- This platform could serve as an interactive wiki, where resolved issues from Discord can be documented for broader accessibility.
-
Workflow Suggestions
- When a problem is solved on Discord, the team can create a new post in GitHub Discussions with the question and answer to persist the solution.
- Alternatively, a link to the relevant Discord chat can be included in a GitHub issue with a brief summary of the resolution.
-
Considerations
- The team acknowledged that users might be more accustomed to the informal nature of Discord chats but emphasized the importance of documenting valuable information.
- Using GitHub Discussions or issues provides a structured way to store information that is easily accessible and searchable.
IPv6 Status in Lodestar
-
Current Setup and Proposed Change
- Currently, the system defaults to listening on IPv4 unless a specific listen address is configured.
- The proposal is to change the default setting to listen on both IPv4 and IPv6, which would require code adjustments at the CLI layer.
-
Considerations and Concerns
- There are concerns about performance impacts and potential attack vectors, such as DDoS threats, when enabling IPv6 by default.
- Testing is recommended before implementing this change to assess any performance issues or security vulnerabilities.
-
Community Input
- The suggestion came from a community member advocating for wider adoption of IPv6.
- The team acknowledged the importance of community input but emphasized the need for thorough testing.
- A node must also advertise its IPv6 address in its ENR so that it can be dialed over IPv6.
-
Technical Details
- If implemented, the system would listen on both protocols by default, using addresses like 0.0.0.0 for IPv4.
- DiscV5 would handle public IP discovery to populate the ENR (Ethereum Node Record), allowing nodes to be dialed over both protocols.
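The proposed default could look roughly like the following sketch, which builds libp2p-style multiaddr strings. The option names, the `dualStackByDefault` switch, and the port used in the usage lines are assumptions made for illustration, not Lodestar's actual CLI code.

```typescript
// Hypothetical option shape; the field names are assumptions for illustration.
interface ListenOptions {
  listenAddress?: string; // explicit IPv4 listen address, if configured
  listenAddress6?: string; // explicit IPv6 listen address, if configured
  port: number;
}

// Build the multiaddr strings the node would listen on.
function listenMultiaddrs(opts: ListenOptions, dualStackByDefault: boolean): string[] {
  const addrs: string[] = [];
  const explicit = opts.listenAddress !== undefined || opts.listenAddress6 !== undefined;
  if (explicit) {
    // Respect whatever the operator configured and add nothing else.
    if (opts.listenAddress !== undefined) addrs.push("/ip4/" + opts.listenAddress + "/tcp/" + opts.port);
    if (opts.listenAddress6 !== undefined) addrs.push("/ip6/" + opts.listenAddress6 + "/tcp/" + opts.port);
    return addrs;
  }
  // No explicit address: current behaviour is IPv4 only,
  // the proposal adds the IPv6 wildcard as well.
  addrs.push("/ip4/0.0.0.0/tcp/" + opts.port);
  if (dualStackByDefault) addrs.push("/ip6/::/tcp/" + opts.port);
  return addrs;
}

console.log(listenMultiaddrs({port: 9000}, false));
console.log(listenMultiaddrs({port: 9000}, true));
```

Keeping the explicit-address branch unchanged means operators who already configure a listen address would see no difference.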
Offload Shuffling Computation to Worker Threads
-
Current Status and Observations
- The PR related to this change has been merged, but initial observations indicate that while epoch transition times have decreased, the `prepareNextEpoch` function has not shown improvement.
- The team noted that benefits from this change would only be realized once the computation is offloaded to a separate thread.
-
Implementation Options
- Two main options were considered: using NAPI-RS or implementing the offloading with a custom worker thread.
- Tuyen suggested copying Lighthouse's shuffling logic and wrapping it in NAPI-RS for asynchronous execution.
-
Timeline and Release Planning
- The team discussed targeting this improvement for version 1.23, with a general two-week planning period.
- Matt expressed confidence that wrapping the shuffling logic in NAPI-RS could be accomplished quickly, potentially within a week.
-
Next Steps
- Matt plans to work on the implementation and aims to have a preliminary version ready for testing by the end of the week.
- The team agreed to monitor progress and adjust timelines as needed, with flexibility to push the milestone if necessary.
Enabling N-Historical State Feature on Production
The team discussed the proposal to enable the n-historical state feature by default in Lodestar, based on the analysis and testing outcomes from the related pull request (PR #7104). This feature aims to improve state management during network operations.
-
Current Testing and Observations
- Tuyen highlighted that during testing, the CIP node did not require any state reloading. However, the Holesky production node experienced a state reload due to receiving an old orphan block.
- The reload was triggered by a gossip block with a parent 32 slots ago, prompting an investigation into parameter settings.
-
Parameter Adjustments
- The default parameter for `maxSkipSlots` was initially set at 32, focusing on minimizing memory usage.
- Tuyen proposed increasing the `maxBlockStates` to 64 to align with other parameters like `max_skip_slot` and `latest_permissible_slot`, ensuring no state reloads are necessary under normal conditions.
-
Feature Stability and Confidence
- The feature has been tested for three months, providing confidence in its stability.
- The team discussed whether to enable the feature by default in the upcoming release, given its robustness and alignment with Lodestar's goals to handle long periods of non-finality.
-
Implementation Plan
- The feature is currently behind a feature flag and can be enabled by turning on this flag.
- Tuyen suggested reviewing the parameter changes in the PR to validate conclusions before enabling the feature.
-
Team Consensus
- There was general agreement on enabling the feature by default, as it aligns with the goal of improving Lodestar's resilience during network splits or non-finality periods.
- The PR is tagged for milestone 1.23, allowing time for further testing and adjustments if needed.
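To illustrate why the cache size matters, here is a deliberately simplified sketch: a first-in-first-out cache of block states where a miss stands in for a state reload. The class and function names are hypothetical, and Lodestar's real cache is more involved than this.

```typescript
// Hypothetical FIFO cache of block states, keyed by block root.
class BlockStateCache {
  private roots: string[] = [];
  private maxBlockStates: number;

  constructor(maxBlockStates: number) {
    this.maxBlockStates = maxBlockStates;
  }

  add(root: string): void {
    this.roots.push(root);
    // Evict the oldest entries once the cache is over capacity.
    while (this.roots.length > this.maxBlockStates) this.roots.shift();
  }

  // A miss means the state would have to be reloaded.
  has(root: string): boolean {
    return this.roots.indexOf(root) !== -1;
  }
}

// Is the parent's state still in memory after `blocksSinceParent` newer blocks?
function stillCached(maxBlockStates: number, blocksSinceParent: number): boolean {
  const cache = new BlockStateCache(maxBlockStates);
  cache.add("parent");
  for (let i = 0; i < blocksSinceParent; i++) cache.add("block-" + i);
  return cache.has("parent");
}

console.log(stillCached(32, 32)); // false: the parent state was evicted
console.log(stillCached(64, 32)); // true: the parent state is still held
```

Under these assumptions, a block whose parent is 32 blocks back misses a 32-entry cache but hits a 64-entry one, which matches the direction of the proposed change.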
Agenda: https://github.com/ChainSafe/lodestar/discussions/7087
-
Agenda and Release Plans
- The team is planning to release version 1.22 of Lodestar. It has been in beta for testing, with no significant issues reported.
- Improvements noted include better garbage collection and enhanced epoch transition performance, with a 200-millisecond improvement over version 1.21.
- The Electra branch merge into unstable did not negatively impact pre-Electra functionality, indicating readiness for release.
-
Devcon Impact Booth Success and Objectives
- The impact booth at DevCon in Bogota was successful, leading to the hiring of two protocol engineers.
- The goal is to replicate this success in Bangkok by showcasing Lodestar and engaging with attendees.
-
Planning and Preparation
- A television will be used to display metrics and dashboards, potentially highlighting Lodestar's competitiveness against other clients (to be discussed further).
- Marketing collateral such as banners, t-shirts, USB sticks with Lodestar binaries, and other unique swag items were considered to attract visitors.
-
Logistics and Staffing
- Existing marketing materials from previous events will be utilized, with local options available for printing additional items if needed.
- Staffing will involve both Lodestar and ChainSafe team members, ensuring a mix of technical expertise and broader company representation.
-
Engagement and Networking
- The booth will serve as a hub for networking and collaboration, with opportunities to connect with other developers and projects.
- Plans include participating in protocol workshops prior to DevCon to engage with core developers and the Ethereum Foundation.
-
Feedback and Ideas
- The team brainstormed various swag ideas, emphasizing the importance of unique and memorable items that attendees will appreciate.
- Suggestions included practical items like AirTags or socks branded with Lodestar, balancing creativity with budgetary constraints.
Tuyen:
-
Memory Optimization
- Tuyen identified a method to reuse balance history from the pre-finalized state for the current epoch, which reduces the need to create new branches for balances at every epoch transition.
- This approach, while initially unconventional due to its mutative nature, addresses significant memory usage issues during epoch transitions.
-
Performance Improvements
- The optimizations have resulted in reducing the time for preparing the next epoch to approximately 1.3 to 1.4 seconds.
- Tuyen aims to further reduce this time to 500-600 milliseconds after merging Matthew's PR and rebasing his branch.
-
Additional Work
- Tuyen is continuing work on applying new APIs, `toHex` and `fromHex`, as part of ongoing improvements.
- A PR has been submitted to add metrics for state regeneration, which was included in the last release but lacked Grafana metrics.
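For readers unfamiliar with these helpers, the following is a minimal sketch of what a `toHex`/`fromHex` pair typically does. It is an assumption about their general shape (0x-prefixed, lowercase output), not Lodestar's actual implementation, which is tuned for performance.

```typescript
// Minimal sketch; not Lodestar's actual implementation.
// Encode bytes as a 0x-prefixed lowercase hex string.
function toHex(bytes: Uint8Array): string {
  let hex = "0x";
  for (let i = 0; i < bytes.length; i++) {
    const byte = bytes[i];
    hex += (byte < 16 ? "0" : "") + byte.toString(16);
  }
  return hex;
}

// Decode a hex string (with or without the 0x prefix) into bytes.
function fromHex(hex: string): Uint8Array {
  const stripped = hex.indexOf("0x") === 0 ? hex.slice(2) : hex;
  if (stripped.length % 2 !== 0) throw new Error("hex string must have an even length");
  if (!/^[0-9a-fA-F]*$/.test(stripped)) throw new Error("hex string contains invalid characters");
  const bytes = new Uint8Array(stripped.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(stripped.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

console.log(toHex(new Uint8Array([0, 255, 16])));
console.log(fromHex("0x00ff10"));
```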
Matt:
-
BLST Repository Issue
- Matt identified and resolved an issue in Supranational's BLST repo related to platform identification. The problem was due to assembly code being sent incorrectly for certain CPUs, specifically older Celeron models.
-
Pull Requests and Merges
- Several PRs are ready for merging following the release of version 1.22. Some PRs have already been merged into the PeerDAS branch by Gajinder.
- Matt is working on reconstructing the matrix once the threshold for having columns is met, which may require system refactoring of the PeerDAS branch.
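The threshold check itself can be sketched as below. It assumes that the erasure coding lets the full matrix be recovered from any half of the columns and that there are 128 columns in total; both values are assumptions about the PeerDAS spec at the time, and the function name is hypothetical.

```typescript
// Assumed total number of columns in the extended data matrix.
const NUMBER_OF_COLUMNS = 128;

// True once at least half of the distinct, in-range columns are held.
function canReconstruct(heldColumnIndices: number[], numberOfColumns: number = NUMBER_OF_COLUMNS): boolean {
  const seen: {[index: number]: boolean} = {};
  let distinct = 0;
  for (let i = 0; i < heldColumnIndices.length; i++) {
    const index = heldColumnIndices[i];
    if (index >= 0 && index < numberOfColumns && !seen[index]) {
      seen[index] = true;
      distinct++;
    }
  }
  return distinct * 2 >= numberOfColumns;
}

// Helper for the usage lines: the first `n` column indices.
function firstColumns(n: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < n; i++) out.push(i);
  return out;
}

console.log(canReconstruct(firstColumns(63))); // false
console.log(canReconstruct(firstColumns(64))); // true
```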
-
Testing Challenges
- Unit tests for blinding and unblinding are passing, but simulation tests continue to fail.
- Matt plans to write an end-to-end test to diagnose why simulation tests are not passing and will revisit this after focusing on PeerDAS.
-
Future Goals
- Continue making progress on reconstructing the matrix for PeerDAS.
- Collaborate with Cayman to investigate potential related issues encountered by Afri and others.
NC:
-
DevNet 3 Issue Investigation
- Lodestar was stuck in DevNet 3 due to issues with withdrawal requests, specifically related to large amounts rather than validator indices.
- NC is working on adding test cases to the spec tests to cover scenarios with larger withdrawal amounts.
-
Electra Light Client Spec Tests
- NC enabled spec tests for the Electra light client, which revealed some bugs. These have been fixed, and all tests are now passing.
-
ePBS Implementation
- Progress has been made on the ePBS front, particularly in the PTC (Payload Timeliness Committee) aspects.
- NC has completed the client-side PTC service and duty service and calculated PTC validator indices on the Beacon Node side.
- The next step involves handling incoming PTC attestations.
-
Upcoming Focus
- NC plans to shift focus back to upcoming DevNets due to significant spec changes that have been merged into the consensus spec repo. These changes need to be implemented promptly to stay on schedule.
Nazar:
-
Deployment Script Issue
- Nazar identified a problem with the Ansible deployment scripts, which preserve chain database data even when switching networks during deployment.
- This led to debugging challenges when switching from mainnet to the Holesky testnet without resetting the database, causing slot number mismatches.
- Nazar suggested updating the scripts to reset the database when changing networks to prevent similar issues in the future.
-
Differential Library Limitation
- The differential library used for generating binary differences faced issues with state sizes over 250 MB due to fixed-length data limitations in the Node.js binding.
- Nazar explored alternatives but found none that support streaming data, which is necessary for handling larger state sizes on mainnet.
-
Custom Binding Development
- To address the limitation, Nazar began writing a custom binding for the C library (Xdelta3) to support streaming data.
- He plans to share this implementation with Matthew for review once it's ready.
-
Discussion on Binding Libraries
- Matthew and Nazar discussed using NAPI-RS versus traditional NAPI for creating Node.js bindings.
- Matthew expressed caution about using NAPI-RS due to potential unpredictability and suggested sticking with NAPI for simplicity and reliability.
- Nazar considered using the NAN library but acknowledged its deprecation and agreed to explore replacing it with C++ functions using NAPI.
Nico:
-
Pull Requests and Testing
- Nico spent time reviewing PRs and engaging in discussions to ensure smooth progress.
- He conducted testing on the latest release candidate, noting improvements and ensuring stability.
-
Electra Branch Finalization
- The primary focus for the week is finalizing Electra-related updates on the Beacon API.
- This includes moving requests out of the execution payload and aligning with updates in the builder spec, particularly concerning blinded blocks.
- Nico plans to verify Lodestar's implementation to ensure compliance with these changes.
-
Remote Signing and Spec Extensions
- Work is ongoing for implementing remote signing capabilities, although it has not yet been completed.
- A new PR has emerged with extensions to the specification, which Nico plans to review.
Agenda: https://github.com/ChainSafe/lodestar/discussions/7068
Cayman:
-
JSLibP2P Maintainers Retreat
- Cayman is attending the js-libp2p maintainers retreat and invited team members to share any thoughts or topics they would like him to discuss.
- The release of js-libp2p 2.0 is imminent, with no major new features but some breaking changes, such as decoupling peer IDs from private keys.
-
QUIC Binding Development
- Cayman has been focused on debugging the QUIC binding, which is now fully functional but has encountered issues such as segmentation faults.
- The problem appears to be related to a bug in the NAPI-RS code generation, particularly with high-level types like Uint8 arrays and buffers.
- He plans to investigate further and explore different techniques to resolve these issues.
-
Current Challenges and Next Steps
- While gossip messages are being received correctly, there is an issue with request-response functionality, suggesting a bug in Lodestar's code rather than the QUIC implementation.
- Cayman intends to continue debugging this week to address these issues before the retreat.
Matt:
-
Branch Management and PRs
- Matt has successfully reshuffled a branch and is awaiting final approval.
- He completed three pull requests (PRs) for the PeerDAS branch, including two rebases and one validation of columns.
- These PRs are ready for Gajinder's review, and Matt is open to responding to any additional feedback.
-
Protocol Knowledge Enhancement
- Matt spent time reading through various annotated Ethereum specifications to fill gaps in his protocol knowledge.
- He found resources like Ben's annotated spec, Vitalik's annotated spec, and others valuable for understanding protocol intricacies.
-
Fixture Creation for Testing
- Developed functions to create fixtures for all block types to test blinding processes.
- Plans to use these fixtures in testing and may include them in a separate PR.
-
Collaboration and Debugging
- Matt expressed interest in collaborating with Cayman to investigate potential related issues encountered by Afri.
- He is curious about whether these issues are interconnected and aims to narrow down the problems through joint debugging efforts.
Tuyen:
-
Snapshot API Implementation
- The main focus is on implementing the `toSnapshot` and `fromSnapshot` APIs to facilitate efficient data snapshots.
- The implementation is straightforward in the persistent-merkle tree but presents challenges in SSZ due to the introduction of a new type that represents a partial tree rather than a full tree.
- Tuyen has created new methods, such as `DUPartialTree`, and has made adjustments where certain methods throw errors due to the unique nature of this type.
-
Review and Feedback
- The pull request for this implementation has been submitted and reviewed by Gajinder. Tuyen is seeking further review from Cayman to refine the approach.
- The goal is to finalize and release the snapshot API before proceeding with additional tasks like batch hashing.
-
Next Steps
- Tuyen plans to continue working on batch hashing, focusing on `HashTreeRoot`, while awaiting feedback on the current PR.
- The release of the snapshot API is prioritized to ensure a smooth transition to subsequent tasks.
Gajinder:
-
PeerDAS Development
- Gajinder has been actively working on fixing issues and bugs in PeerDAS to improve its functionality.
- The availability of other client builds has allowed for more extensive debugging and testing against different setups.
-
EPBS State Transition
- Initial work has been done on EPBS to get the state transition working, but it remains a work in progress.
- The focus shifted back to PeerDAS due to the availability of other clients for testing.
-
Review and Collaboration
- Gajinder reviewed Matt's pull requests related to PeerDAS and is in the process of merging them.
- He also conducted a peer review of Tuyen's snapshot API work, praising its generic implementation that enhances Lodestar's capabilities.
NC:
-
Consensus Specification Review
- NC spent considerable time reviewing consensus spec changes, which are promising for DevNet4.
- The changes discussed in the ACDC call appear to have minimal resistance, indicating a smooth path for adoption.
-
Major Spec Changes
- Some of the spec changes are substantial, such as finalizing deposits after processing and moving execution requests from the beacon block body to the execution payload.
- These changes are anticipated to be significant for DevNet4, although they will not impact DevNet3 as its spec is already frozen.
-
Implementation Efforts
- NC has begun implementing smaller spec changes to stay on schedule for DevNet4.
- Work on larger spec changes is also underway, preparing for their integration into the development network.
Agenda: https://github.com/ChainSafe/lodestar/discussions/7054
Retrospective of electra-fork branch
-
Large Branch Rebasing Challenges
- Rebasing was initially chosen for its ability to maintain a clean and linear commit history.
- However, rebasing can disrupt collaboration as it requires force pushes, altering the branch history and affecting in-progress PRs.
- The reliance on a single person to manage rebases can create bottlenecks and complicate collaboration.
-
Merging as an Alternative
- Merging offers a less pristine commit history but provides a consistent view of the branch for all collaborators.
- It allows GitHub to serve as a reliable source of truth, reducing confusion about the branch's status.
- Merging avoids the destructive history updates associated with rebasing, making it easier to track changes and debug issues.
-
Tradeoffs and Considerations
- The team discussed the tradeoff between maintaining a clean history and facilitating easier collaboration.
- Merging can result in a less organized commit history, but this can be mitigated by performing an interactive rebase at the end to clean up the history before the final merge.
- For large, long-running branches with contributions from multiple developers, merging may be more practical.
-
Developer Preferences and Tools
- Some developers prefer rebasing for its linear history, which simplifies command-line operations like `git diff`.
- Others find merging preferable for public branches due to its non-destructive nature, especially when multiple developers are involved.
- The discussion included suggestions for using tools like VS Code for visual diffs to aid in debugging and understanding changes.
-
Action Items and Experimentation
- The team considered experimenting with merging on the PeerDAS branch to evaluate its effectiveness in improving collaboration.
- There was a consensus to try merging for the next large feature branch and assess the impact on collaboration and workflow.
Monthly PR Cleanup/Updates
-
- The Snappy WASM implementation is functional but has shown mixed performance results.
- It is more performant in terms of WebAssembly (WASM) execution, but there are concerns about increased node overload.
- Initial metrics indicated improved performance for nodes with fewer validators, while nodes with more validators or those subscribed to all subnets experienced performance degradation.
- The plan is to merge the latest unstable branch and redeploy to gather more specific metrics.
- The goal is to better understand the performance impact and identify any issues that need addressing.
- The team aims to analyze these metrics to determine the feasibility of incorporating Snappy WASM into the main branch.
- The core motivation for the PR is to leverage SnappyWASM's speed, as Snappy compression is a frequent operation in Lodestar.
- Optimizing this process is expected to yield significant performance benefits, particularly in scenarios with high compression demands.
-
make multiple api errors spec compliant
- Matthew reached out to see if the external contributor would like to complete this or if we should push it to the finish line.
-
- Determined to be low priority. It doesn't hurt anything to still have this.
- External contributor didn't complete the PR. It is now stale and will be closed.
-
- The PR has been successfully rebased on top of the Electra branch, and all tests are passing.
- It is ready for a final review to ensure the rebase was smooth and that no issues have arisen from the integration with Electra.
- Matthew plans to deploy the updated branch to a feature server to gather metrics and further validate its performance.
- Previous deployments showed mixed results, with smaller nodes performing better and larger nodes experiencing some performance issues. These will be re-evaluated with the latest changes.
-
Use binary diff to persist finalized states
- Still in its initial state.
- More TO-DOs to be completed before review.
-
- Will try new suggestion with the pubkey index map and deploy another branch for more investigation.
-
Archive state using BufferPool if provided is ready for review.
-
Improve performance of getExpectedWithdrawals ready for review. Follow up on spec naming considerations.
-
Single state tree at start up is ready for review. Would like to include this with v1.22.0 release to save time and memory during startup.
-
toPubkeyHex ready for review.
-
Add electra support for light-client is rebased with tests fixed and ready for review
-
CKZG PRs
- Update ckzg to final DAS version: This PR involves publishing the PeerDAS KZG implementation now that it is available in the main repository. It also involves removing redundant code related to the trusted setup, as this is now included in the library itself.
- Integrate peerdas-kzg: Focuses on implementing the Rust KZG for Kev, which includes adding new functions and addressing a stub function discussed in previous threads. This work is aimed at completing the necessary functionality for Gajinder.
-
PeerDAS Integration
- The integration of CKZG into the PeerDAS branch is a significant step forward, as it enhances the data availability sampling process, which is crucial for Ethereum's consensus layer.
v1.22.0 Release Prep
- PRs added include: 7065, 7064, 7063, 7056
- Nice to haves: 7042, 7033
- Waiting on more feedback for 7022
- More to follow on: 6483
Cayman:
- Cayman has been focused on developing the QUIC binding and reports significant progress, with most libp2p compliance tests now passing.
- The code is shaping up well, indicating that the integration is on track for success.
- The plan is to continue testing to pass additional compliance tests and to integrate the QUIC binding into Lodestar.
- Cayman intends to deploy a branch with the QUIC integration to observe performance metrics and ensure stability.
- The integration of QUIC is expected to improve Lodestar's network performance by leveraging QUIC's capabilities for faster and more reliable data transmission.
- The deployment and testing phase will provide insights into real-world performance and help identify any remaining issues.
Gajinder:
-
PeerDAS Interoperability
- Gajinder attempted to achieve interoperability between Lodestar's supernode and full node using Kurtosis, but encountered failures with other clients.
- Efforts to connect Lodestar's supernode with supernodes from other clients also failed, indicating broader issues with client interoperability.
- Debugging is ongoing to resolve these issues, and a spec change regarding the CSC type on the get metadata call was noted. This change was reverted, and further testing with Kurtosis devnets is planned.
-
Verkle Tree Branch Rebase
- The Verkle branch has fallen behind, and Gajinder is working on rebasing it to incorporate performance gains.
- There have been reports of node instability on the current Verkle testnet, prompting efforts to deliver improvements from the Verkle branch.
- Work is also being done on ePBS (enshrined Proposer-Builder Separation) state transition tasks to enhance performance.
NC:
-
ePBS Implementation
- NC has started working on ePBS but identified several refactoring tasks that need to be completed on the unstable branch before proceeding.
- The ePBS integration is described as "destructive" to the protocol, indicating that it involves substantial changes that could impact existing functionality.
-
Refactoring Tasks
- One of the primary refactoring tasks involves removing the assumption of three intervals per slot, which is currently hardcoded in various parts of the codebase.
- This change is necessary to accommodate the ePBS requirements and ensure the codebase is flexible enough to support future protocol updates.
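A sketch of what removing the hardcoded value could look like: derive the interval boundaries from a parameter instead of assuming three. The function name is hypothetical, and the four-interval case is shown only as an example of a different value, not as the ePBS design.

```typescript
// Start offset, in milliseconds, of each interval within a slot.
// `intervalsPerSlot` is passed in rather than hardcoded to 3.
function intervalOffsetsMs(secondsPerSlot: number, intervalsPerSlot: number): number[] {
  if (intervalsPerSlot <= 0) throw new Error("intervalsPerSlot must be positive");
  const offsets: number[] = [];
  for (let i = 0; i < intervalsPerSlot; i++) {
    offsets.push(Math.floor((i * secondsPerSlot * 1000) / intervalsPerSlot));
  }
  return offsets;
}

console.log(intervalOffsetsMs(12, 3)); // [0, 4000, 8000]
console.log(intervalOffsetsMs(12, 4)); // [0, 3000, 6000, 9000]
```

With 12-second slots and three intervals this reproduces the familiar 0s, 4s and 8s boundaries, so existing behaviour would be unchanged when the parameter stays at 3.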
Phil:
Phil discussed the potential value and adoption of web user interfaces (UIs) for controlling Ethereum nodes, referencing existing UIs like Lighthouse's Siren and Prysm's deprecated web UI. The conversation focused on assessing the demand and utility of such interfaces among node operators and how they might enhance Lodestar's usability.
-
Current Perception and Use
- Web interfaces for node control have not been widely popular or adopted, as seen with Prysm's UI, which was eventually deprecated due to low usage.
- Lighthouse's Siren UI presents a visually appealing and seemingly useful tool, but its actual necessity and demand among users remain uncertain.
-
Research and Exploration
- Andrew has been tasked with incorporating questions about web UI usefulness into his ongoing research with node operators to gather insights on demand and potential features.
- The goal is to determine whether such interfaces are desired by users and what specific functionalities would be most beneficial.
-
Potential Features and Benefits
- Ideas for useful UI features include pruning the archive database, displaying validator duties, and providing real-time state snapshots.
- A web UI could serve as a local block explorer, offering insights into the current chain state, forks, and other critical metrics, which could aid in debugging and monitoring.
-
Challenges and Considerations
- Many settings require node restarts, which might limit the immediate utility of a web UI for configuration changes.
- The discussion highlighted the need for a UI that balances technical detail with accessibility, potentially leveraging existing tools like Forkmon for visualizing fork choices.
-
Next Steps and Community Feedback
- The team is interested in exploring existing open-source tools that could be integrated into a potential UI for Lodestar.
- Gathering feedback from node operators about their experiences and needs will be crucial in determining the feasibility and design of a web UI.
Agenda: https://github.com/ChainSafe/lodestar/discussions/7040
SemVer and Lodestar Internal Packages
-
The discussion was initiated by concerns over how Lodestar handles semantic versioning (SemVer) for its internal packages. See: https://github.com/ChainSafe/lodestar/issues/7052
-
There was a recognition that major version upgrades have been reserved for specific events, not necessarily reflecting breaking changes, which could lead to issues for users unaware of these changes.
-
Current Practice and Challenges
- Lodestar uses a single version across the monorepo, focusing on the top-level CLI package for versioning decisions.
- This approach has led to breaking changes in underlying packages without corresponding SemVer updates, causing issues for users.
- There is a need to revisit dependencies on sub-packages to decide if it's worth maintaining separate versioning.
-
Potential Solutions and Considerations
- The team discussed the possibility of breaking apart the published flow to allow for independent version updates.
- Using tools like Lerna, which supports different versions for different packages, was suggested as a potential solution.
- Implementing a process to track breaking changes, possibly using conventional commits with package-specific tags, was also considered.
-
Community and External Feedback
- Andrew from EthereumJS shared insights on managing SemVer in their monorepo, highlighting the challenges of maintaining different version numbers and the importance of adhering to SemVer to avoid breaking downstream dependencies. However, EthereumJS ships only one breaking change per year and has much longer release cycles.
- The importance of supporting other teams and projects, like Ultralight, that rely on Lodestar's packages was emphasized.
-
Action Items and Next Steps
- The team agreed on the importance of adhering more strictly to SemVer for Lodestar's packages.
- Much of the communication about breaking changes is geared towards node operators rather than library users.
- Release notes aren't sufficient if a user without a `.lock` file can be broken simply by updating to the latest minor version, for example via `npm install`.
- An action item was proposed to explore what changes would be necessary to implement more granular versioning without causing disruptions. We can continue the discussion on how to approach this via the issue here: https://github.com/ChainSafe/lodestar/issues/7052
- The discussion concluded with a commitment to avoid breaking changes until a more robust versioning strategy is in place.
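The lockfile concern comes from how caret ranges resolve: a dependency declared as `^1.21.0` accepts any later 1.x release, so a breaking change shipped in a minor version reaches library users on a fresh install. The sketch below is a simplified caret check for versions with major 1 or higher; it is not the `semver` package, and the version numbers are only examples.

```typescript
// Parse "major.minor.patch" into three numbers.
function parseVersion(version: string): number[] {
  const parts = version.split(".");
  if (parts.length !== 3) throw new Error("expected major.minor.patch");
  const out: number[] = [];
  for (let i = 0; i < parts.length; i++) {
    const n = parseInt(parts[i], 10);
    if (isNaN(n)) throw new Error("invalid version component");
    out.push(n);
  }
  return out;
}

// Simplified caret semantics for major >= 1:
// same major version, and not older than the base version.
function satisfiesCaret(version: string, base: string): boolean {
  const v = parseVersion(version);
  const b = parseVersion(base);
  if (b[0] < 1) throw new Error("this sketch only covers major >= 1");
  if (v[0] !== b[0]) return false;
  if (v[1] !== b[1]) return v[1] > b[1];
  return v[2] >= b[2];
}

console.log(satisfiesCaret("1.22.0", "1.21.0")); // true: a new minor is accepted
console.log(satisfiesCaret("2.0.0", "1.21.0")); // false: a new major is not
```

This is why a breaking change needs a major bump to stay out of the range that downstream projects already accept.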
Current Status of .era file compatibility in Ultralight
-
Andrew has partially developed .era files and has a halfway functioning package.
-
A version 0.0.1 has been published, but there is no commitment to maintaining backward compatibility at this stage.
-
The implementation is incomplete, and Andrew is open to discussions if others are interested in utilizing this work.
-
Dependencies and Challenges
- Andrew intends to use one of Lodestar's libraries as a dependency due to its Snappy decompression capabilities.
- The challenge lies in the fact that Snappy frame compression is not directly compatible with Snappy.js, leading to the use of an older Node.js library.
- There is a need to add an API from Rust, but Andrew lacks the time for WASM testing required for this addition.
-
Future Plans
- Andrew plans to further develop the .era files over the next month or two.
- He is considering contributing a pull request to the relevant library to avoid using C++ underlying code, even though it might not be used in Node.js or browser environments.
Electra-Fork Rebase Branch
-
Current Status and Importance
- The Electra-fork rebase branch is currently blocking several other pull requests (PRs), making its progress critical for ongoing development.
- Most tests are passing, and there are no conflicts, indicating readiness for a fresh rebase and merge.
-
Rebase and Merge Plan
- A fresh rebase is planned, followed by a non-squash merge to maintain commit history.
- There is a minor rebase mistake identified, which can be addressed in a subsequent PR.
- The team discussed merging the branch and then incorporating performance improvement PRs into the unstable branch.
-
Performance and Fork Awareness
- Two PRs from Tuyen aim to fix performance issues, with one already approved.
- The importance of ensuring new compounding validators are fork-aware was discussed, particularly in relation to block forking and cache management.
- The effective balance process is confirmed to be fork-aware, addressing concerns about cache updates during epoch transitions.
-
Final Steps and Approval
- The final plan involves completing the rebase, obtaining a final approval, and merging without squashing.
- The team agreed to wait for the final rebase and ensure there are no issues before proceeding with the merge.
Devnet-3 Readiness
-
Current State and Objectives
- The team discussed the preparations for DevNet-3, which is based on the Alpha 4 version of the specifications for the Electra fork.
- Gajinder mentioned a forthcoming consolidation PR to address remaining spec-related changes.
- The team considered bumping the spec version to identify any breaking changes, with a focus on PeerDAS-related updates.
- Discussion included whether to include the `max blobs per block` variable for PeerDAS testing, noting that it has been moved from preset to config in the latest release.
- Gajinder indicated that this adjustment should be quick and there was no immediate urgency.
- The team aimed to cut a release for Barnabas by the following day, aligning with plans to start DevNet-3 preparations.
Discussion and Feedback on Product Research with Andrew Levy
-
Survey and Data Collection
- Andrew has initiated a survey targeting node operators to identify trends and gather initial insights.
- The survey results are intended to guide deeper discussions and are not operator-specific to maintain privacy.
- Andrew highlighted the importance of understanding operators' expectations versus their experiences with Lodestar, particularly concerning reliability and performance.
-
Feedback on Survey Approach
- Participants suggested focusing on specific questions related to setup, configuration, and performance metrics to uncover underlying issues.
- Emphasis was placed on understanding operators' methods for measuring performance and reliability.
- The need for open-ended questions to allow operators to express specific issues or suggestions was highlighted.
-
Improving Communication and Support
- There was a discussion on improving communication channels with operators beyond Discord, to facilitate better support and feedback mechanisms.
- Suggestions included identifying preferred communication methods and establishing more direct lines of contact.
-
Additional Questions for Consideration
- What version of Lodestar are operators running, and what does their upgrade cycle look like?
- What type of hardware and deployment methods (e.g., Docker vs. bare metal) are they using?
- Are there specific features or improvements operators wish to see in Lodestar?
- How do operators define and measure reliability and performance, and what benchmarks are they using for comparison?
-
Engagement Strategy
- Andrew was advised to engage with operators who are already running multiple clients, as they may be more open to diversifying and adopting Lodestar.
- The importance of refining the survey based on initial feedback before expanding outreach was discussed.
- There was a suggestion to leverage insights from operators who are eager to provide feedback to refine the approach before reaching out to less engaged operators.
-
Action Items
- Andrew will incorporate the feedback and additional questions into his research process.
- He plans to continue engaging with operators, focusing on those who are open to discussions and have shown interest in running multiple clients.
- Andrew will update the team on progress and refine the survey as needed based on ongoing discussions with operators.
Tuyen
- Improve archive state using BufferPool, this is a prerequisite for state diff PR, PR is ready to review
- electra: fix performance regression with processEffectiveBalancesUpdate, PR was merged
- electra: improve getExpectedWithdrawals, PR is under review
- improve state regen: now it takes around 120ms (instead of ~375ms, 3x faster) to reprocess a block, PR is under review
- no progress for ssz batch hash, PR is under review
- Working on having single state tree at start up, this should save us 25%-30% of blocking time at start up as analyzed here https://github.com/ChainSafe/lodestar/issues/7027#issuecomment-2311468547
- Other TODO: Snapshot APIs EIP-4881
Cayman
- Spent last week working on a js-libp2p-quic implementation. It's coming along slowly but surely.
- Current design is to use napi-rs to wrap quinn and rust-libp2p-tls and rust-libp2p-identity. Work has been done to figure out workable patterns to facilitate having rust-created data be emitted into js.
- Cayman has a sanity test running locally that creates a quic server and client, connects and creates connections, and opens a stream and sends and receives data -- all controlled from js.
- Current work is in implementing the js-libp2p interfaces (there is/will be a decent amount of glue code), as well as exposing all necessary things from the rust side and adding some polish there.
-
Rebase of Electra-Fork Branch:
- Proposal: Gajinder proposed rebasing the Electra-Fork branch.
-
Status:
- The branch is close to being mergeable.
- Most features for Phase 0 and Pre-Electra are implemented.
- Remaining tasks include addressing PR comments and polishing.
- There are some merge conflicts, likely related to recent changes in hex formatting.
-
Coordination with Upcoming Releases:
-
1.21 Release:
- The team agreed that v1.21 is stable.
- There's no urgent need for further updates before merging Electra.
-
1.22 Release:
- The plan is to merge the Electra-Fork branch after the 1.21 release.
- This merge will prevent further rebasing cycles and help streamline feature development.
- The team prefers to merge Electra sooner rather than later to test its impact on the fleet pre-Electra.
-
Impact on Workflow:
-
PeerDAS and ePBS:
- Both will be based on the Electra-Fork, so merging it will clear the way for further work.
-
Production Considerations:
- Merging Electra will align the devnet and production environments, aiding in the productionization process.
- Being among the first to merge Electra will position the team well as devnets progress.
-
Ansible Maintainability and Best Practices:
- Current Issues: The team faces challenges due to varying Ansible versions across different machines, which causes inconsistencies in deployments.
-
Proposal:
- Move to running Ansible within a Docker container to ensure consistency.
- Use Ansible Navigator, a tool that simplifies running Ansible in a container, ensuring that all Ansible tools and versions are pinned and consistent across environments.
- Update the internal Lodestar ansible repositories to include an Ansible Navigator YAML file, which will specify the Ansible version.
- The transition will require Docker and Ansible Navigator as dependencies on the host machines.
-
Benefits of Using Ansible Navigator:
- Consistency: Ensures everyone is using the same Ansible version and tools, avoiding issues caused by version discrepancies.
- Linting and Testing: Opens the door for consistent linting and testing using the same methods.
-
Future Exploration:
- Potentially explore using Ansible Tower (AWX), which provides a web interface for deploying playbooks.
- Ansible Navigator provides playbook artifacts that enhance auditability, showing what was run and its outcomes.
-
Security Enhancements:
- Non-Standard SSH Ports: The team plans to update the SSH configuration on dev servers to use non-standard ports to mitigate security risks.
- Ansible Configuration: The Ansible playbooks will be updated to include the new SSH port, so deployment processes should remain unaffected.
-
Manual SSH Access:
- Users will need to update their SSH configurations or use the `-p` flag to specify the port when SSH-ing into machines manually.
- An update will be made to the "update host" playbook to also change the SSH port configuration.
Nazar:
Differential Backups PR
-
Current Status:
- Nazar has submitted a PR for differential backups and received initial feedback from Tuyen.
- Most of the feedback has been addressed.
-
Critical Feedback:
- The default values for differential backup intervals (e.g., snapshots every 1000 epochs, differential snapshots every 256 epochs) were based on initial assumptions.
- Tuyen suggested conducting more thorough research to justify these default values.
Research and Simulation
-
Algorithm Development:
- Nazar is developing an algorithm to generate a score for different sets of parameters to determine the optimal settings for differential backups.
-
Simulations:
- Currently running simulations to visualize and analyze the effectiveness of different default values.
- The goal is to ensure that the selected parameters are optimal based on system performance and assumptions.
-
Documentation:
- Nazar plans to document the research process, including the algorithm, simulations, and final recommendations.
- This documentation will serve as a reference for why specific default values were chosen.
Next Steps
-
Complete Research and Finalize Defaults:
- Finalize the default values for differential backup parameters after completing the simulations.
- Document the rationale behind the chosen values.
-
PR Review and Deployment:
- Update the PR with the finalized values and open it for further review from the team.
- Once finalized, deploy the feature to a feature group for further testing and metric collection.
-
Gathering Metrics:
- Deploy the finalized feature to a beacon node with validators in a feature group to gather real-world metrics and validate the effectiveness of the chosen parameters.
Note: Nazar emphasized the importance of these defaults as they are immutable once set, requiring a full resync if changed later. Therefore, thorough validation is necessary before final deployment.
Matt:
PRs Submitted
-
Async Shuffling PR:
-
Current Status:
- Submitted the PR and received review comments.
- Currently addressing those comments.
-
Block Deduplication PR:
-
Current Status:
- Substantially complete but encountering a few issues during sim tests.
- Actively debugging those issues.
Goals for the Week
-
PR Comments:
- Aim to fully address all review comments on the Async Shuffling PR.
-
Block Deduplication:
- Plan to fully debug the block deduplication issues and get it running on a feature branch.
-
PeerDAS Branch Update:
- Kev from research requested that the PeerDAS branch be updated to use his library.
- Intends to prioritize this task and possibly complete it ahead of other tasks this week.
Gajinder:
- Focused on PeerDAS and ePBS
-
Bug Fix in Electra Devnet 2:
-
Issue:
- Lodestar beacon node encountered a bug in Electra DevNet 2.
- The bug was related to using `withdrawalIndex` instead of `withdrawal.index`.
-
Resolution:
- Debugged and fixed the issue.
- NC also provided an explanation on why the spec test did not catch the bug:
- The spec test used a very specific index (`0`), which wouldn't have highlighted the mistake.
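The following sketch shows why a test that starts from index `0` can hide this class of bug. It is purely illustrative and is not the Lodestar code: the `Withdrawal` shape, the helper names, and the idea that the wrong value was a loop position are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Withdrawal:
    index: int            # global, monotonically increasing withdrawal index
    validator_index: int


def make_withdrawals(next_withdrawal_index: int, validator_indices: list[int]) -> list[Withdrawal]:
    """Build consecutive withdrawals starting at the state's next withdrawal index."""
    return [
        Withdrawal(index=next_withdrawal_index + i, validator_index=v)
        for i, v in enumerate(validator_indices)
    ]


def next_index_correct(withdrawals: list[Withdrawal]) -> int:
    """Uses the index stored on the last withdrawal (withdrawal.index)."""
    return withdrawals[-1].index + 1


def next_index_buggy(withdrawals: list[Withdrawal]) -> int:
    """Hypothetical bug: uses the position in the list instead of withdrawal.index."""
    withdrawal_index = len(withdrawals) - 1
    return withdrawal_index + 1


# Starting from index 0, position and stored index coincide, so the bug is invisible.
from_zero = make_withdrawals(0, [5, 6, 7])
print(next_index_correct(from_zero), next_index_buggy(from_zero))

# Starting from any other index, the two diverge.
from_hundred = make_withdrawals(100, [5, 6, 7])
print(next_index_correct(from_hundred), next_index_buggy(from_hundred))
```

A test case that starts from a non-zero index would distinguish the two, which is the kind of additional coverage the notes mention proposing for the consensus spec tests.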
N.C.:
-
Light Client Workflow for Pectra:
- Reviewed the Light Client workflow.
- Identified missing pieces in Lodestar for a functional Light Client for Electra.
-
PR Draft:
- Created a draft PR with necessary patches.
- Waiting for the July 30th Electra branch to be merged before marking the PR as ready for review.
-
Beacon API:
- Realized the Beacon API is not up-to-date for Light Client features in Electra.
- Engaged in discussions with the API Discord channel.
-
Discussion:
- It was suggested that the only required change might be adding the new `Electra.lightclientUpdate` type, as the Light Client API is already fork-aware.
- The API likely doesn't need a major change, just the inclusion of the new container type for Electra.
- NC will proceed by adding the necessary types to the Beacon API.
Attestor Slashing Workflow for Electra
- Currently reviewing the attestor slashing workflow for Electra.
- Noted that some minor changes may be needed in Lodestar.
Bug Fix for DevNet-2
- Plans to open a PR to the consensus spec to add additional test cases to cover the bug found in Lodestar, which was related to DevNet-2.
-
ePBS Branch:
- Gajinder created an ePBS branch in the Lodestar repo.
- Began working on the PTC (Payload Timeliness Committee) portion of ePBS.
- Wrote some initial code and plans to focus more on it this week.
Nico:
-
Electra Branch Review
- Completed the review for the Electra branch.
- Added a few additional differences, though not significant.
- Ensured compatibility with pre-Electra versions.
- Implemented v1 APIs and added support for attestation and attestation slashings.
-
Event Stream and API Considerations
- Uncertainty about the usage of the event stream, but it could be useful for data observation or debugging.
- Open question regarding whether to update the entire API or just add new events.
- Current workaround involves extracting the fork type from the slot of the data, which works due to the lack of SSZ support, simplifying the process.
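As a rough sketch of the workaround described above, the fork can be derived from a slot by converting it to an epoch and looking it up in the fork schedule. The function name is hypothetical; the schedule lists mainnet activation epochs up to Deneb, and an Electra entry would be appended once its epoch is known.

```python
SLOTS_PER_EPOCH = 32

# Mainnet fork activation epochs, in ascending order. An Electra entry is
# deliberately left out here because its epoch is not given in these notes.
FORK_EPOCHS = [
    ("phase0", 0),
    ("altair", 74240),
    ("bellatrix", 144896),
    ("capella", 194048),
    ("deneb", 269568),
]


def fork_at_slot(slot: int, schedule: list[tuple[str, int]] = FORK_EPOCHS) -> str:
    """Return the name of the latest fork whose activation epoch is <= the slot's epoch."""
    epoch = slot // SLOTS_PER_EPOCH
    active = schedule[0][0]
    for fork_name, activation_epoch in schedule:
        if epoch >= activation_epoch:
            active = fork_name
    return active


print(fork_at_slot(269568 * SLOTS_PER_EPOCH))  # first Deneb slot
```

This only works when the slot is available in the payload, which is the case for the JSON event data discussed here.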
-
Continuous Integration (CI) and Release Automation
- Made updates to the CI in the beacon spec, leading to fully automated releases.
- Previously, releases required opening two pull requests and approvals, which was cumbersome.
- Now, releases can be done by simply pushing a tag, potentially leading to more frequent releases.
-
Keymanager Overhaul
- Conducted a major overhaul of the key manager, with the last release being one or two years ago.
- Current CI is broken for releases, but efforts are being made to enable a new release.
-
MEV Boost and SSZ on Builder API
- Participated in a joint MEV Boost call to promote SSZ on the builder API.
- Current process involves receiving a blinded block as SSZ and unnecessarily serializing it to JSON, which is inefficient for publishing or proposal flows.
- There is support from MEV Boost for this change, and it is simple to implement on Lodestar.
Tuyen:
-
Message ID Conversion Improvement
- Developed a PR to improve the conversion of a message ID from gossipsub to a string in Lodestar by reusing a buffer, resulting in better garbage collection (GC) performance.
- Implemented the `toRootHex` function, which reduced GC from 2.9% to 2.3% in the test mainnet node.
- Plans to follow up with additional PRs to refine consumer-side implementations, as some areas still use SSZ to hex string conversion.
-
Refactoring Bytes to Utils
- Submitted a PR to refactor bytes into the utils package, separating import paths for browser and Node.js to ensure the correct utility is used.
- Discussion with Nico on whether to merge this before the Electra branch and the implications for browser compatibility.
- Nico suggested using the optimized version for Node.js while maintaining compatibility for browser applications.
-
State Regeneration Improvement
- Working on improving state regeneration, which currently takes longer when reprocessing blocks.
- Proposed loading blocks simultaneously from the database to avoid unnecessary delays, with the PR nearly ready.
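The idea of loading blocks concurrently rather than one at a time can be sketched as follows. This is a generic illustration using `asyncio`, not the Lodestar implementation: the in-memory `db`, the simulated I/O delay, and the function names are all assumptions.

```python
import asyncio


async def fetch_block(db: dict[str, bytes], root: str) -> bytes:
    """Simulate one database read; the sleep stands in for I/O latency."""
    await asyncio.sleep(0.01)
    return db[root]


async def load_sequential(db: dict[str, bytes], roots: list[str]) -> list[bytes]:
    """Each read waits for the previous one: total time grows with len(roots)."""
    return [await fetch_block(db, root) for root in roots]


async def load_concurrent(db: dict[str, bytes], roots: list[str]) -> list[bytes]:
    """All reads are in flight at once; results come back in request order."""
    return await asyncio.gather(*(fetch_block(db, root) for root in roots))


db = {"0xaa": b"block-1", "0xbb": b"block-2", "0xcc": b"block-3"}
roots = ["0xaa", "0xbb", "0xcc"]
blocks = asyncio.run(load_concurrent(db, roots))
print(len(blocks))
```

Because `asyncio.gather` preserves argument order, the blocks can still be replayed in sequence after being fetched together.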
-
Bug Fix in SSZ SliceFrom
- Identified and fixed a bug in SSZ `slicefrom`, though it was not caught in the spec test.
- Applied the latest SSZ updates to the Electra branch.
-
SSZ PR Review and Testing
- Continuing to review and test the SSZ PR on the Lodestar side, focusing on code cleanup.
-
Future Plans
- Opened an issue to address loading two versions of state at startup, aiming to streamline the process by using the log state API.
-
Discussion on Utils Package and Conditional Exports
- Nico raised concerns about using subpaths for imports due to limitations in browser compatibility.
- Proposed using conditional exports to differentiate between browser and Node.js implementations, allowing for optimized use in both environments.
- Tuyen agreed on separating implementations for browser and Node.js initially, with plans to explore isomorphic solutions later.
Cayman:
-
BLS Optimization and Collaboration
- Met with Sebastian from ChainSafe to discuss BLS optimization, receiving informal approval.
- Sebastian will draft a formal document detailing the optimization in mathematical terms for cryptography experts.
- The document aims to facilitate broader vetting and potential adoption by other teams, including sharing on ETH Research.
- The optimization, known as the Tuyen optimization, involves adding randomness to public keys and is similar to Vitalik's approach of adding randomness to messages.
- Internal and external security audits are planned to ensure the optimization's security guarantees, especially as more clients adopt it.
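One way to read "adding randomness to public keys" for signatures over the same message is the randomised batch check sketched below. This is an illustrative interpretation of the notes, not the formal write-up being drafted. It assumes the minimal-pubkey BLS variant (public keys $pk_i$ in $G_1$ with generator $g_1$, signatures $\sigma_i$ in $G_2$), a hash-to-curve function $H$, and random non-zero scalars $r_i$ chosen by the verifier.

```latex
% Individual checks: one pairing equation per signer i = 1..n on the same message m
e(g_1, \sigma_i) = e(pk_i, H(m))

% Randomised batch check: scale each key and signature by the same random r_i,
% sum them, and verify a single pairing equation
e\left(g_1, \sum_{i=1}^{n} r_i \sigma_i\right)
  = e\left(\sum_{i=1}^{n} r_i \, pk_i,\; H(m)\right)
```

By bilinearity, the batch equation holds whenever every individual equation holds, and the random scalars make it unlikely that invalid signatures cancel each other out.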
-
GossipSub Update
- Opened a PR in the GossipSub repository to implement a new control message, "IDONTWANT", to optimize bandwidth usage by preventing redundant message transmissions.
- The feature is particularly useful for large messages, such as blocks, by sending minimal data to avoid unnecessary bandwidth consumption.
- The PR includes mechanisms to manage lists of "I don't want" messages and requires further review and testing.
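A minimal sketch of the bookkeeping involved: a router records which message IDs each peer has said it does not want and skips those peers when forwarding. The class and method names are hypothetical and do not reflect the js-libp2p-gossipsub API; expiry of old entries and size limits are left out.

```python
from collections import defaultdict


class GossipRouter:
    """Toy router that honours IDONTWANT when choosing forward targets."""

    def __init__(self, mesh_peers: list[str]) -> None:
        self.mesh_peers = list(mesh_peers)
        # peer id -> message ids that peer asked not to receive
        self.dont_want: dict[str, set[str]] = defaultdict(set)

    def on_idontwant(self, peer: str, message_ids: list[str]) -> None:
        """Record an IDONTWANT control message received from `peer`."""
        self.dont_want[peer].update(message_ids)

    def forward_targets(self, message_id: str, received_from: str) -> list[str]:
        """Mesh peers to forward to: skip the sender and anyone who opted out."""
        return [
            peer
            for peer in self.mesh_peers
            if peer != received_from and message_id not in self.dont_want[peer]
        ]


router = GossipRouter(["peer-a", "peer-b", "peer-c"])
router.on_idontwant("peer-b", ["block-123"])
print(router.forward_targets("block-123", received_from="peer-a"))
```

The saving comes from not sending the full payload to `peer-b`, which matters most for large messages such as blocks.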
-
TLS and QUIC Support
- Investigated TLS support for libp2p, noting the absence of current support.
- Started developing a TLS module using self-signed certificates, incorporating peer ID as per libp2p specifications.
- The TLS module will also support QUIC, which uses similar certificate-based handshakes.
-
EIP 4881 Deposit Snapshot Interface - Current Status and Concerns
- The team discussed the status of the deposit contract snapshot interface (EIP-4881) in the context of EIP-4444, which proposes pruning historical data older than one year.
- There is concern about not being prepared if EIP-4444 is implemented, as it could affect the ability to access historical deposit data necessary for syncing.
- The deposit contract snapshot interface was deprioritized previously due to perceived phasing out with in-protocol deposits and reliance on a contributor who did not follow through.
-
Importance and Priority
- The team acknowledged the importance of implementing EIP-4881 to ensure continuity in deposit data management, especially if historical data is pruned under EIP-4444.
- It was noted that while the implementation is important, it is not currently urgent unless there is an imminent timeline for EIP-4444's adoption.
-
Potential Impact of EIP-4444
- EIP-4444 would eliminate the requirement for storing historical blocks, affecting the ability to fetch historical deposits if not addressed.
- The implementation of EIP-4881 would simplify the user experience by reducing the need for extensive deposit syncing when starting a node.
-
Industry Context and Next Steps
- There is uncertainty about the finalization of EIP-4444, with ongoing discussions about how to manage historical data, including potential solutions like the portal network.
- The team discussed the status of other clients regarding EIP-4881, noting that Lighthouse has implemented it, and there is a desire not to be the last client to adopt it.
-
Action Items
- Tuyen volunteered to work on implementing the deposit contract snapshot interface while awaiting other reviews, indicating a shift in priority to address this gap.
Agenda: https://github.com/ChainSafe/lodestar/discussions/6970
The CommitBoost team, led by Drew and Kubi, presented their project aimed at providing a unified sidecar for Ethereum validators to make commitments safely and efficiently. The initiative focuses on decentralizing block construction and enhancing proposer autonomy while maintaining high operational security and compatibility with existing infrastructure.
Key Points from the Presentation
CommitBoost Vision and Goals
-
Independence and Nonprofit Nature:
- CommitBoost is an independent nonprofit entity, initially funded by Eigenlayer.
- The project aims to remain a public good, supported by grants, and avoid monetization.
Problem Addressed
-
Current Issues with Multiple Sidecars:
- Validators face complexity and operational security risks when running multiple sidecars.
- Home stakers and sophisticated actors alike prefer a unified solution to minimize complexity and risks.
CommitBoost Features
-
Validator Autonomy:
- Provides tools for validators to set conditions and constraints on block construction.
- Enables proposers to reclaim some control over block building, beyond relying on a few builders and relays.
-
Modular Design:
- Acts as an app store for validators, allowing them to opt into various proposer commitment protocols through modules.
- Unopinionated about downstream enforcement mechanisms, supporting a variety of approaches.
-
Operational Security and Insights:
- Standardizes module instantiation and resource allocation to enhance security.
- Offers real-time and historical insights into module performance via Prometheus and Grafana.
Governance and Development
-
Team Structure and Focus:
- A dedicated team of 3-5 people, distributed globally, will manage CommitBoost.
- Focus on sustainability, development, and research.
-
Governance:
- A neutral party with experience in Ethereum infrastructure will oversee governance.
- Aims to maintain transparency and community involvement in decision-making processes.
Technical Details
-
Compatibility and Integration:
- Short-term focus on using existing key managers (Web3 Signer, Dirk) to avoid consensus client changes.
- Future integration with consensus clients is planned to enable broader commitment functionalities.
-
Testing and Deployment:
- Currently testing on Holesky with plans for a significant release by mid-August.
- Targeting a production-ready version by the end of Q3 or early Q4.
Q&A Session Summary
Timeline and Testing
-
Timeline for MEV Portion:
- Simplified version of MEV boost is currently in Holesky testing.
- Full functionality, including additional extensions, expected by mid-August.
- Production readiness targeted for end of Q3 or early Q4.
Technical Clarifications
-
Replacement for MEV Boost:
- CommitBoost acts as a unified sidecar, replacing the need for separate MEV boost instances.
- It facilitates execution payload construction based on user-configured commitments.
-
Slashing Conditions:
- Concerns raised about potential slashing conditions due to commitments.
- CommitBoost provides the framework, but enforcement and slashing conditions are determined by the individual protocols.
- Validators must ensure due diligence when opting into commitments.
-
Compatibility with ePBS:
- CommitBoost aims to remain compatible with ePBS and is in ongoing discussions to ensure alignment with evolving specs.
Client Team Collaboration
-
Support and Feedback:
- Engagement from client teams is crucial for feedback on design and implementation.
- Specific areas for collaboration include testing, reviewing design decisions, and helping spec out future client-side changes.
-
Communication Channels:
- The CommitBoost team will share progress via GitHub and ETH Research.
- Future governance calls and community engagement planned for August.
Conclusion
The CommitBoost team provided a comprehensive overview of their project, addressing key issues related to validator sidecars and proposing a robust, modular solution. The presentation was well-received, with the team expressing eagerness to collaborate with client teams and the broader Ethereum community to refine and implement their vision.
Contact Information
- GitHub: CommitBoost Repository
Summary and Action Points for Merging "electra-fork" with "unstable"
Discussion Overview
The team discussed the strategy and concerns around merging the "electra-fork" branch into the "unstable" branch. The primary considerations were to ensure a clean rebase, maintain passing CI tests, and address any compatibility issues. The goal is to integrate the changes from "electra-fork" while preserving the stability of the "unstable" branch.
Key Points and Action Steps
-
Rebasing Strategy:
- Gajinder proposed that the "electra-fork" branch should be rebased onto the "unstable" branch to resolve any conflicts.
- This would involve moving the head of "unstable" to the top of "electra-fork," effectively making a fast-forward inclusion.
-
Ensuring Stability:
- Nico and Gajinder emphasized the importance of making sure that all spec tests and CI checks pass on the "electra-fork" branch before merging.
- Any remaining issues identified during the rebase should be resolved, ensuring that the baseline from previous forks remains unaffected.
-
Rebase and Cleanup:
- Gajinder will handle the rebase and clean up any conflicts that arise.
- Post-rebase, a thorough review and cleanup of the code will be necessary. This includes addressing any suboptimal decisions made in the "electra-fork" branch.
-
Intermediate Commits:
- The merge should be a non-squash merge to preserve all intermediate commits, ensuring that all contributions and credits are maintained.
-
Draft PR for Merge:
- A draft PR for the merge already exists. This placeholder PR will be updated with the rebased changes.
- The team will use this PR to track and discuss any issues that need to be resolved before the final merge.
-
Testing and Benchmarking:
- The team noted the need to run benchmarks on the new functions in state transitions, especially given the small validator dataset currently used for benchmarking.
- Tests that download real mainnet blocks need to be updated to reflect the current network situation.
-
Timeline and Release Planning:
- The current plan is to target the integration for version 1.22, not to impact the upcoming 1.21 release.
- This approach allows time for thorough testing and ensures that the changes are stable before being included in a release.
-
Potential Issues and Considerations:
- Any suboptimal decisions made in the "electra-fork" branch that do not affect the "Deneb" branch will be noted and addressed separately.
- The team should avoid rushing the merge and ensure that the changes remain on the "unstable" branch for sufficient testing before final release.
Next Steps
- Gajinder will perform the rebase and address initial conflicts.
- Team to review the rebased "electra-fork" branch, ensuring all tests pass and identifying any remaining issues.
- Team to create issues or follow-up PRs for any additional cleanup needed post-rebase.
- Coordinate on scheduling the merge to avoid impacting the 1.21 release and to plan for inclusion in 1.22.
- Continue testing and updating benchmarks to ensure the robustness of the changes.
By following this structured approach, the team aims to merge the "electra-fork" branch into "unstable" effectively, maintaining stability and ensuring all contributions are recognized.
v1.21 Release Candidate - Target Release Date
- We aim to decide on the release candidate (RC) by the next stand-up.
- The goal is to cut an RC next week.
Key Features and Updates
-
NAPI-RS Bindings:
- Cayman's work on NAPI bindings is nearing completion.
- Metrics look promising, and the work is targeting inclusion in this release.
-
Historical State Regen:
- Another review has been called for.
- The diff has been simplified and is ready for review.
-
YAML Updates:
- Deployed to feature three.
- Rebased on unstable from a few days ago and running well.
- Review scheduled for tomorrow.
-
ELClient in Graffiti:
- This task has been delayed but is fairly close to completion.
- Targeting inclusion in this release.
-
SSZ Batch Hash:
- Encountered memory issues that require refactoring.
- Testing needs more time.
- Will likely be deferred to the next release to avoid rushing.
Other Considerations
- The current set of features seems substantial.
- Further updates and decisions on the RC will be discussed in the next week's stand-up.
ePBS Updates
Breakout Room Insights
- Last Friday's breakout room had minimal discussions left on ePBS.
- Other client teams like Nimbus and Prysm are starting implementations.
Action Plan
- Gajinder and NC have discussed work distribution.
- Implementation work is set to start this week or next week.
- Gajinder and NC will collaborate to advance the implementation.
Agenda: https://github.com/ChainSafe/lodestar/discussions/6893
Vision for Lodestar v2
-
Milestones Approach:
- Lodestar v1 signified readiness for mainnet with audits and block proposals.
- Lodestar v2 aims to mark the point where Lodestar is competitive with other clients.
- The goal is to attract users and major pools who previously didn't consider Lodestar due to performance concerns.
-
Competitiveness:
- V2 will be released when Lodestar's performance is on par with other clients.
- The focus is on genuine performance improvements, as users will independently verify these claims.
-
Comparison:
- Prysm releases major versions with each hard fork.
- Teku follows strict semantic versioning with frequent major releases.
- Lodestar uses major releases as significant milestones for PR and community engagement.
-
Incremental Improvements:
- Continuous performance improvements through minor releases.
- Aim to reach a point where data and profitability metrics show Lodestar's competitiveness.
Proposed Actions
-
Feedback and Iteration:
- Team members are encouraged to provide feedback to Cayman.
- Plan to iterate and improve continuously based on performance data.
-
Deep Dive Meeting:
- Plan a separate meeting to deeply analyze performance metrics and identify gaps.
- The goal is to convert broad ideas into concrete steps for improvement.
-
Rust and WASM:
- Explore using Rust and WebAssembly (WASM) for performance-critical functions.
- Leveraging WASM for both browser and server-side improvements.
-
Planning and Preparation:
- Schedule a dedicated meeting in 3-4 weeks to analyze performance metrics and brainstorm improvement ideas.
- Gather data and ideas beforehand to ensure productive discussions.
Matt:
Docker and LibUV Issues
-
Discussion with Ben:
- Ben provided insights into Docker issues.
- Key Insight: Docker is not cgroup or OS parallelism aware, leading to improper work scheduling and insufficient worker threads.
- LibUV Compatibility: LibUV does not play well with Docker, causing performance issues.
- Solution Path: There is a PR to address this, but it has not been merged yet.
- Alternative Approach: Utilizing Rust threads, which are native threads not relying on LibUV, is crucial.
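To make the cgroup point concrete, the sketch below shows how the CPU count a container is actually allowed to use can differ from the host core count that a runtime may report. It parses the cgroup v2 `cpu.max` format (`<quota> <period>`, or `max <period>` when unlimited). The function name and the sample values are assumptions for illustration.

```python
import math


def cpus_from_cgroup_v2(cpu_max_text: str, host_cpus: int) -> int:
    """Effective CPU count from cgroup v2 cpu.max contents, capped by the host count."""
    quota, period = cpu_max_text.split()
    if quota == "max":
        # No CPU limit configured for this cgroup.
        return host_cpus
    limit = math.ceil(int(quota) / int(period))
    return max(1, min(host_cpus, limit))


# A 16-core host running a container limited to 2 CPUs.
print(cpus_from_cgroup_v2("200000 100000", host_cpus=16))
```

Sizing worker pools from the host core count instead of this limit is one way work ends up scheduled poorly inside a container.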
-
Blinding/Unblinding Features: Rebase and Release:
- Rebased, fixed, and debugged the blinding/unblinding features.
- Released and deployed on feature two to gather metrics.
-
Ansible Documentation: Compatibility Issues:
- Encountered issues with Ansible not working properly due to breaking changes in the latest version.
- Documentation Updates: Updated the documentation to specify the correct version to install using pip instead of Homebrew.
- Merged the updated documentation.
-
Rust and mult Library:
- Collaborated on Rust thread implementation and added enhancements.
- Attempted to get the mult library working, considering an upstream PR to the Supranational repo to add missing exports.
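To make the cgroup point from the Docker discussion above more concrete: a container limited through cgroup v2 exposes its CPU quota in a cpu.max file as "<quota> <period>", or "max <period>" when no limit is set. The sketch below derives an effective CPU count from that string and the host core count. The function name is hypothetical, and this is only an illustration of the idea, not the pending PR or Lodestar's actual logic.

```typescript
// Hypothetical helper: derive the CPU count a container may actually use
// from the contents of a cgroup v2 `cpu.max` file and the host core count.
function effectiveCpuCount(cpuMax: string, hostCpus: number): number {
  const [quota, period] = cpuMax.trim().split(/\s+/);
  // "max" means no quota is set, so the host core count applies.
  if (quota === "max") return hostCpus;
  const quotaUs = Number(quota);
  const periodUs = Number(period);
  if (!Number.isFinite(quotaUs) || !Number.isFinite(periodUs) || periodUs <= 0) {
    return hostCpus; // unparseable input: fall back to the host value
  }
  // A quota of 200000us per 100000us period allows two full cores.
  const limit = Math.max(1, Math.ceil(quotaUs / periodUs));
  return Math.min(hostCpus, limit);
}

console.log(effectiveCpuCount("200000 100000", 16)); // quota of two cores on a 16-core host
console.log(effectiveCpuCount("max 100000", 16)); // no quota set
```

A worker pool sized from the host core count alone would oversubscribe the first container above, which is the kind of mismatch described in the discussion.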
Cayman:
Historical State Regen PR
- Status: No updates on the historical state regeneration PR. Plans to review it later in the week.
Experimentation with Rust Bindings
-
NAPI-RS:
- Experimented with great success on a BLST binding built with NAPI-RS, the Rust version.
- Found Rust bindings preferable to C++ due to reduced manual operations, fewer bugs, and improved developer experience.
-
Benchmarks:
- Using the BLST Rust library, which utilizes multithreading within each operation, resulted in significant performance improvements.
- Observed a roughly 10x improvement on a multi-core machine.
-
Draft PR:
- Opened a draft PR in Lodestar to integrate the new Rust library.
- Metrics showed promising results, with signature verification time reduced to one-third (from 1.5 milliseconds to 0.5 milliseconds).
- Observed reductions in job time for gossip and processing time for gossip blocks, improving by 30-50 milliseconds.
-
Promising Metrics: Performance Gains
- Significant improvements in time to process gossip blocks and overall job time for gossip.
- Results indicate a smoother and faster performance with the new implementation.
-
Tooling and Code Efficiency:
- The Rust implementation required only 300 lines of code compared to 3,000 lines in C++.
- Emphasized the superior tooling and efficiency provided by Rust, recommending it for similar coding tasks.
Nico:
API Fixes
-
Minor Issues:
- Addressed several minor issues related to APIs.
- Engaged in discussions on handling empty requests according to the specification.
- Fixed Lodestar to comply with these specifications.
Builder Spec and Timing
-
Timeout Enforcement:
- Reviewed the builder specification and noted discrepancies in enforcing proper timeouts.
- Other clients enforce a stricter one-second timeout for builder responses to getHeader requests, as suggested by the specs.
- Recognized that while it's not highly relevant due to Lodestar's current 1% mainnet usage, aligning with stricter timeout enforcement is still necessary.
Metrics Improvements
-
Lack of Metrics:
- Identified a lack of metrics to assess the feasibility of enforcing the one-second timeout.
- Noted the absence of timing metrics for getHeader requests.
-
Builder Metrics:
- Plans to address and improve builder metrics.
- Referred to an issue Tuyen opened related to minimum bid metrics and their accuracy.
- Emphasized the need for proper metrics to monitor and enforce specifications accurately.
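The two points above (a one-second builder timeout and missing timing metrics) can be combined in one wrapper. The following is a minimal sketch, assuming a generic promise-returning request; `withTimeout`, `Timed`, and `BUILDER_TIMEOUT_MS` are hypothetical names, not Lodestar APIs. It races the request against a deadline and always reports the elapsed time, which is the value a histogram metric would observe.

```typescript
// Result of a timed call: either the value or a timeout, plus elapsed time.
type Timed<T> =
  | {ok: true; value: T; elapsedMs: number}
  | {ok: false; elapsedMs: number};

// Hypothetical wrapper: resolve with the response if it arrives before the
// deadline, otherwise resolve with a timeout marker. Errors are passed through.
function withTimeout<T>(request: () => Promise<T>, timeoutMs: number): Promise<Timed<T>> {
  const start = Date.now();
  return new Promise<Timed<T>>((resolve, reject) => {
    const timer = setTimeout(() => resolve({ok: false, elapsedMs: Date.now() - start}), timeoutMs);
    request().then(
      (value) => {
        clearTimeout(timer);
        resolve({ok: true, value, elapsedMs: Date.now() - start});
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      }
    );
  });
}

// Usage with a stand-in request; a real caller would pass the builder call.
const BUILDER_TIMEOUT_MS = 1000; // the one-second limit discussed above
withTimeout(() => Promise.resolve("header"), BUILDER_TIMEOUT_MS).then((result) => {
  console.log(result.ok ? `ok in ${result.elapsedMs}ms` : `timed out after ${result.elapsedMs}ms`);
});
```

Recording `elapsedMs` for every call, including timeouts, would give the data needed to judge whether enforcing the stricter limit is feasible.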
NC:
Pectra Devnet-1 Preparation
-
PRs Submitted:
- Three PRs have been submitted related to DevNet 1.
- Two PRs involve updating the code according to spec changes.
- The third PR fixes issues to ensure DevNet 1 spec tests pass.
-
Test Status:
- All spec tests for the minimal and mainnet presets are passing except for one.
- The failing test appears to be generated incorrectly, as it passes with an invalid beacon state.
- Investigating the cause of this issue is ongoing.
-
Bug Fix:
- The spec tests revealed a long-standing bug related to computing the proposer index for certain slots.
- Implemented a temporary workaround, but its permanence needs discussion.
- Comments will be added to the PR for further review and discussion.
-
Request for Review:
- Encouraged team members to review the submitted PRs to expedite joining DevNet 1.
- Plans to coordinate with Pari for DevNet 1 participation.
Anti-Correlation Penalty Logging
-
Logging Strategy:
- Developed a list of items to log, divided into two groups:
- Variables involved in the penalty calculation (related to the EIP).
- Data to investigate validator correlation for Lido and the EF.
Gajinder:
-
Data Handling for Forks Implementation:
- Completed a PR to make block data fork-aware, allowing for extension of data types based on forks or other defined boundaries.
- Added an enum field to indicate database types, useful for PeerDAS and ePBS/ILs.
- The PR has been reviewed, finalized, and merged.
Obol Validator Performance Issues
-
Issues Observed:
- Validator responses to Obol's APIs showed poor scores on Holesky and Mainnet nodes.
- Performance degradation noted with the latest version 1.0.0 RC5, potentially due to Obol's middleware timeouts.
- Encountered an uncaught exception causing validator duties to freeze, impacting performance.
- Deployed a custom Docker image that improved attestation performance by addressing the uncaught exception.
- Ongoing investigation into other causes of latency and failures.
Rebase and API Refactor for Electra:
-
Working on rebasing the Electra PR and familiarizing with the SSZ API changes.
-
Identified unmodified APIs that need to be made fork-aware, such as getAttestation.
-
API Updates:
- Will align with the latest beacon API spec and refactor APIs accordingly.
- Plans to implement new V2 APIs, adhering to the updated spec for attestation and other functionalities.
PeerDAS Design Decisions
-
Custody Consistency:
- Discussed two ways to keep data custody consistent: either not changing the peer ID on restart unless data is wiped, or brute-forcing a peer ID that lands in the same data shard.
- Plan to use the flag --persistNetworkIdentity to store and reuse the peer ID on each boot.
-
Syncing Mechanism:
- Considered two approaches for syncing data columns:
- Sync blocks and incomplete data, then request remaining columns through request-response.
- Continuously run sync cycles until a peer with the required columns is found.
- Leaning towards the first approach for its simplicity and ease of coding and maintenance.
-
Development for DevNet:
- Initial focus on custodying only the minimum required data to simplify development and avoid complex syncing issues.
- Will later extend custody requirements and refine syncing mechanisms.
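For the first syncing approach above (sync blocks with incomplete data, then fetch the rest through request-response), the core bookkeeping is a set difference between the columns a node must custody and the columns it has already received. A minimal sketch, with a hypothetical function name and plain column indices standing in for the real data structures:

```typescript
// Hypothetical helper for the first approach: after syncing a block with only
// part of its data, work out which custody columns still need to be requested.
function missingColumns(custodyColumns: number[], receivedColumns: number[]): number[] {
  const received = new Set(receivedColumns);
  return custodyColumns.filter((index) => !received.has(index)).sort((a, b) => a - b);
}

console.log(missingColumns([3, 17, 42, 99], [17, 99])); // → [3, 42]
```

The returned indices are what a follow-up request to peers would ask for.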
Tuyen:
Hash Tree Consumer Optimization
-
Memory Allocation:
- Worked on the hash tree consumer to allocate memory more efficiently.
- Benchmark results showed a 5x speed improvement over SIMD SHA-256 hashing, even on an M1 Mac.
- Noted performance discrepancies between M1 and Intel Macs but identified no immediate consumer-side solutions.
Batch Hashing Experiments
-
Current Approach:
- Implemented a proof of concept (POC) for lazy tree computation of each validator and batch processing.
- Resulted in saving 100 milliseconds per 850 validators, with significant time (over 50%) spent on validator tree creation.
-
New Experiment:
- Maintained both validator value and validator tree to optimize the write side.
- Achieved faster write operations with reduced hashing and no need to recreate the tree.
- Benchmark showed a 2x improvement over the original method.
Spec Tests and Deployment
-
Out of Memory Issue:
- Faced out-of-memory errors during deployment, requiring a Node.js max heap size of 16 GB to load the state.
- Achieved one-second pre-compute epoch transition on Holesky but found the memory usage unsustainable.
-
Alternative Approaches:
- Will explore other methods for batch hashing next week due to the current approach's unsustainability.
-
HackMD Document:
- Shared a detailed document on Improve BeaconState hashTreeRoot().
- Encouraged team members to review the document for more detailed insights.
Agenda: https://github.com/ChainSafe/lodestar/discussions/6855
v1.20 Planning
- Timeline: Tentative release planned for two weeks from now.
- Issue Tagging: No specific issues have been tagged for v1.20 yet. Team members are encouraged to discuss or add any targets for the release.
Proposed Additions and Targets
-
Historical State Regen:
- Proposal: Add historical state regen in the next release.
- Action: Target to get the related PR fixed and included.
-
Client Interoperability Fixes:
- Context: Nico's recent PR fixes remaining client interoperability issues.
- Urgency: No immediate rush as there have been no recent user reports of interoperability issues with Prysm VC or Nimbus.
-
Implement EL info into graffiti
- Nice to have for v1.20
- Marked for review
-
Get BLST-TS back into mainnet deployment
-
Refactor Improve Types to Use Forks as Generics:
- Status: Currently in draft, but a significant and valuable update.
- Timeline: Expected to be out of draft soon, with a hopeful inclusion in the next release after a thorough review.
- Context: Presented by Nazar in the previous week’s meeting. Team members are invited to review and provide feedback.
- Team members who missed the generic fork types presentation are encouraged to review the proposed changes and share any questions or concerns.
- Specific Feedback: Matt suggested considering splitting the changes, but a decision was made to review the whole change first to understand its impact and address potential issues.
EIP-7716: Anti-Correlation Penalties Implementation
- Timeline: NC was asked if the implementation of the anti-correlation penalty could be completed within two weeks.
- Data Logging: The need to log data from the mainnet to understand the impact of the penalties was emphasized.
Logging Strategy: Data Dimensions:
- NC highlighted that the penalty covers various dimensions such as ISP, geographic locations, etc.
- More comprehensive logging is needed beyond just proposed penalties and missed attestations.
- Team members were encouraged to suggest additional data points to log for a thorough analysis.
- Machine-Readable Logs: Cayman suggested that logs should be machine-readable to facilitate data visualization and post-processing.
- Raw Data: Logging raw data would help in interpreting penalties and their correlation with different factors like geography and infrastructure.
Current Status of the EIP: The penalty calculation for the EIP is still in the early stages and not finalized.
- Additional Factors: Logging should include more variables that contribute to the penalty calculation.
Implementation Considerations
- Separate Log Stream: Discussed whether to use a separate log stream for this data. A child log with a specific prefix was suggested as a convenient solution.
- Non-Critical for Release: The logging implementation does not need to be part of the next release and can be done in a separate branch.
- Transparency: The team agreed on the importance of transparency, ensuring that penalties and their data are not kept secret.
- Support for EIP: Using the logged data to support and refine the EIP, contributing to community research.
-
Potential Negative Impact:
- Concerns were raised about the EIP potentially discouraging client diversity as operators might favor clients with the best attestation performance.
- The team acknowledged the need to consider these unintended consequences.
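The suggestion above of a machine-readable child log with a specific prefix can be sketched as a logger that writes one JSON object per line. The record fields and names below are hypothetical, since the penalty calculation is not finalized; the point is only the shape of the log stream.

```typescript
// Hypothetical record shape; the real fields depend on the still-evolving EIP.
interface PenaltyLogRecord {
  slot: number;
  validatorIndex: number;
  missedAttestation: boolean;
  penaltyGwei: number;
}

// Build a "child" log function that tags every line with a fixed prefix and
// emits one JSON object per line, so the stream can be filtered and parsed.
function createChildLog(prefix: string, sink: (line: string) => void) {
  return (record: PenaltyLogRecord): void => {
    sink(JSON.stringify({prefix, ...record}));
  };
}

// Collect lines in memory here; a real sink would write to the log output.
const lines: string[] = [];
const logPenalty = createChildLog("anti-correlation", (line) => lines.push(line));
logPenalty({slot: 100, validatorIndex: 7, missedAttestation: true, penaltyGwei: 1234});
console.log(lines.join("\n"));
```

Because every line is valid JSON with the same prefix, post-processing and visualization tools can filter on the prefix and load the raw values directly.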
New Testing Infrastructure Goals and Benefits
-
Standardized Naming:
- Opportunity to standardize naming conventions for the servers.
-
Enhanced Coverage:
- Ensure coverage of Docker, system, and binary deployments.
- Aim to catch problematic commits much earlier in the process.
Weekly Checkpoints
-
Tuyen's Proposal for weekly checkpoints:
- Utilize the beta environment to deploy weekly checkpoints at the end of each week.
- This approach is intended to identify any significant performance deviations from the week's changes.
Infrastructure Changes
-
Server Groups:
- The new layout involves adding a couple of additional servers to the existing groups to support the enhanced coverage and weekly checkpoints.
-
Feedback and Request:
- Team members are encouraged to provide feedback on the proposed layout and any additional considerations.
Proposer Boost Reorg Test Plan
-
Plan Confirmation:
- There was a need to finalize the Proposer Boost Reorg test plan.
- Cayman agreed with the proposed plan.
- Coordination with the infrastructure team is necessary to implement the plan effectively.
-
Ideal Node Selection:
- Holesky testnet nodes were identified as ideal for deployment, with one node having up to 10,000 validators connected.
- Holesky-prod nodes with 5,000 validators each were also suggested as suitable options for deployment.
- Deploying on Holesky prod nodes, even if issues arise, would not significantly impact the larger Holesky network.
Cayman:
SSZ Library in Golang
-
Development:
- Completed work on a small SSZ library in Golang.
- The library will be handed off to lightclient (Matt from the Geth team) once he returns to the office.
-
Challenges:
- Attempted to use Go generics to improve the library but faced limitations.
- Go generics do not support variadic generics, which are needed for containers with an unknown number of fields.
- Ended up not using generics in the core library but created a type-safe wrapper.
Collaboration with Matt
-
HashTree:
- Worked with Matt on publishing HashTree.
- Completed a final review and merged the SSZ API PR, a significant update.
EIP-7688 StableContainer
-
Integration Tasks:
- Focused on integrating stable container on the consensus layer side.
- Identified additional tasks, including making parts of the code fork-aware where gindices (generalized indices) were hard-coded as constants.
- Working on a PR to update the EIP with these necessary changes.
Matt:
HashTree and SSZ Integration
-
Merges and Implementation:
- Merged the PR for HashTree in Potuz’s repository and also in the team's repository.
- Published the changes and integrated them into Tuyen's SSZ branch.
- Implemented the hasher using the HashTree library.
-
Testing and Results:
- Built specification and performance tests, posting results in the developer channel.
- Observed mixed results: batch hashing is faster, but not as fast as Tuyen’s AssemblyScript implementation for individual runs.
- Need to test on an Intel machine with SIMD instructions to understand the performance fully.
-
Blinding and Unblinding Branch Rebase:
- Worked on rebasing the blinding/unblinding branch, which was stale and had many changes.
- Aimed to update the branch, deploy it, and gather metrics.
- Expected to complete the rebase and run unit tests, acknowledging potential issues due to the rebase process.
-
Upcoming Shuffling Task:
- Plan to address the shuffling task, which has been on the back burner.
- Intends to work on it if time permits.
-
Upcoming BLST Debugging:
- Awaiting feedback from Ben to start debugging BLST-related issues.
- Will prioritize this task based on the feedback received.
Nico:
SSZ APIs Branch
-
Completion:
- Finished the SSZ APIs branch after a detailed self-review, which took about six hours.
- Addressed the remaining points to ensure compatibility and stability.
-
Documentation Updates:
- Identified missing documentation updates to be addressed after fixing issues in the README examples.
- Planned to fix README verification before updating the docs.
-
Light Client Routes Testing:
- Planned to add SSZ to the light client routes and test them with Etan’s website to ensure functionality.
Electra EIPs
-
Implementation Planning:
- Reviewed the Electra EIPs/APIs that need to be implemented.
- Discussed the need to rebase the Electra branch to remove hacks added for EIP-7549.
-
Rebase Process:
- Agreed to rebase against unstable, removing previous API changes to simplify the process.
- Confirmed that the SSZ refactor has been merged to unstable.
- Planned to merge a pending PR related to making blob data fork-aware before rebasing.
-
Timeline:
- Acknowledged that there is no immediate rush but aimed to be prepared for DevNet 1.
- Discussed the readiness of clients for EIP-7002, with an expected timeline of the following week.
-
Attestation Pool:
- Agreed with Tuyen's suggestion to avoid having two attestation pools, requiring reworking to align with proper APIs.
Gajinder:
PeerDAS
-
Node ID Consistency:
- Issue: Each time Lodestar starts, it assigns a new peer ID. This is inconsistent with data column sharding because a new node ID each time means losing custody of previous data, leading to potential bans.
-
Proposed Solutions:
- Do not change the ENR (Ethereum Node Record) on restart unless there is a data wipe.
- On each restart, brute-force ENR private keys until one yields the same data shard assignments as before.
- Considerations: This issue was discussed in the PeerDAS breakout and requires a decision.
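The brute-force option above can be illustrated with a toy model. In the sketch below, `toySubnetFor` is a made-up stand-in for the chain "private key, then node ID, then custody subnet"; the real derivation uses secp256k1 keys and the custody function from the PeerDAS spec, neither of which is reproduced here. Only the search loop is the point.

```typescript
// Toy stand-in for "private key -> node ID -> custody subnet". The real
// derivation uses secp256k1 keys and the PeerDAS spec's custody function.
function toySubnetFor(key: number, subnetCount: number): number {
  // FNV-1a over the four bytes of the key, kept in unsigned 32-bit range.
  let hash = 0x811c9dc5;
  for (let shift = 0; shift < 32; shift += 8) {
    hash ^= (key >>> shift) & 0xff;
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash % subnetCount;
}

// Try candidate keys until one maps to the subnet whose data is already stored.
function bruteForceKey(targetSubnet: number, subnetCount: number, maxTries: number): number | null {
  for (let candidate = 0; candidate < maxTries; candidate++) {
    if (toySubnetFor(candidate, subnetCount) === targetSubnet) return candidate;
  }
  return null; // no match within the allowed number of attempts
}

const key = bruteForceKey(5, 32, 100000);
console.log(key !== null && toySubnetFor(key, 32) === 5);
```

The expected number of attempts grows with the number of subnets that must match, which is one reason persisting the identity (the first option) is the simpler path.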
ePBS (Enshrined Proposer-Builder Separation)
-
Context:
- ePBS Without ILs: Reviewing specs that exclude Inclusion Lists (ILs), specifically focusing on MaxEB and EIP-7002.
-
Beacon Block and Payload Coupling:
- Current tight coupling between beacon blocks and payloads will become looser.
- Beacon blocks can continue building a chain even without a valid payload. The next proposer will build the payload on the last valid payload.
- There may be a series of invalid payloads, requiring bookkeeping for withdrawals and other data from the last valid payload.
-
Payload Validation Time:
- Payload validation time is extended to the next attestation (three seconds into the slot).
- Only block producers need immediate payload validation to produce the next payload.
- Validating nodes, which form the majority, will have more time for payload verification. The impact on staking performance remains uncertain.
- Takeaways: These changes and their implications were discussed in the ePBS breakout calls.
EIP-6493: Stable Container
-
EIP-6493 Implementation:
- Introducing Stable Containers: EIP-6493 proposes introducing stable containers at both block and transaction levels.
-
Impact on CL (Consensus Layer):
- The consensus layer will need to be aware of the new stable container.
- CL will need to process transactions using the stable container for serialization.
-
Discussion with Etan:
- Future possibilities include having transactions as a separate mandatory field in the header to simplify processing.
- Current implementation might introduce complexity, but the consensus layer must adapt to the stable container if EIP-6493 is activated.
- Observation: The need for CL to handle the transaction stable container was noted, with potential future simplifications considered.
Tuyen:
Batch Hash Computation
-
Implementation:
- Worked on implementing batch hash computations by a single traversal.
- The previous approach involved two steps: committing from top-down and a separate hash computation traversal.
- The new strategy combines committing and hash computation in one step.
-
Benchmark Results:
- Added benchmarks simulating epoch transition with partially modified trees.
- Results showed only a slight improvement in speed, which was unsatisfactory.
-
Integration Challenges:
- When integrating the batch hash computation, it was not as fast as the previous two-step approach.
- Planning to spend more time understanding the cause of the performance issues.
-
Special Node Considerations:
- Needs to address the branch node struct in SSZ, which lazily computes and stores values.
- If unresolved, may switch to using HashTree and revisit this optimization later.
n-Historical State and Other PRs
-
End-to-End Test PR:
- Developed an end-to-end test for n-historical state.
- This PR was blocked by the proposer boost reorg API, which has now been merged.
-
Pruning Invalid SSZ Objects:
- Worked on a PR for pruning invalid SSZ objects.
- The PR is currently under review, with comments from Nico.
NC:
Reviews
-
Stable Container Consensus Spec PR:
- Reviewed the PR related to the stable container consensus specifications.
-
Anti-Correlation Penalty PR:
- Reviewed the PR for the anti-correlation penalty EIP.
- Spent additional time focusing on the details of the anti-correlation penalty.
-
Next Release and DevNet 1:
- Reviewed all content and PRs slated for inclusion in the next release.
- Implemented the EL-triggered consolidation as part of the preparations.
- Planned to post another PR for an updated flow on the validator top-up.
-
Spec Test:
- Awaiting the release of the next consensus spec to run the necessary spec tests.
-
Proposer Boost Reorg:
- Noted a follow-up PR needed for the proposer boost reorg.
Nazar:
Type Refactoring
-
Completion:
- Completed type refactoring across all packages.
- Presented the refactoring work to the team.
-
PR Conflicts:
- After merging the API SSZ PR, encountered two conflicts.
- Currently resolving these conflicts to make the PR ready again.
SIM Test Refactoring
-
User-Friendly and Configurable:
- Working on making the SIM test more user-friendly and configurable.
- Aiming to make it a standalone package soon.
ESLint Research
-
Latest Version:
- Investigated the major overhaul in the latest version of ESLint.
- The latest version is a complete rewrite of ESLint.
- Conducting comparisons and research on the new ESLint features.
-
Future Updates:
- Plans to share findings about the ESLint research next week.
Agenda: https://github.com/ChainSafe/lodestar/discussions/6829
The team discussed the findings from the testing of version 1.19.0-rc.0, highlighting a key issue identified with performance degradation in Docker environments.
-
Issue Identification:
- Detection: The issue was discovered in feature four, where performance metrics showed significant degradation.
- Root Cause: The problem was traced to the last commit checked, which was identified through a thorough debugging process starting from the most recent commits and working backwards.
- Environment Specific: The performance issue occurs specifically in Docker environments, not on bare metal deployments. This was due to differences in how libuv operates in Docker.
-
Debugging and Resolution:
- Efforts: A significant debugging effort led by Tuyen revealed that the issue only manifested in Docker, which explained why it was not detected during initial testing on bare metal.
- Reversion: The problematic changes have been reverted, although some minor CPU time issues remain due to how SWIG operates, which are being addressed.
-
Next Steps:
- Inquiry: A letter is being drafted to Ben to seek insights on why the performance degradation occurs in Docker and whether it is related to Node 22 or Docker's thread scheduling.
- Beta Testing: The team plans to revert the problematic PRs and conduct a beta test to ensure stability. Metrics will be monitored to confirm the fix.
-
Release Timeline:
- Tentative Schedule: If the beta testing shows positive results, the release could be ready by tomorrow. However, a cautious approach may lead to a release on Friday, allowing a few days for thorough testing and verification.
Current Testing Challenges
- Limited Coverage: Existing testing groups and processes may not be comprehensive enough to catch all issues, especially those specific to Docker deployments.
- Feature Groups: Testing primarily happens in feature groups, which may not be adequately structured to catch all types of issues.
- Deployment Types: There's a need to cover various deployment types (system, Docker, binary) and different settings (e.g., subscribing to all subnets).
Proposed Improvements
-
Beta and Stable Groups:
- Restructuring: Consider restructuring the beta and stable testing groups to ensure better coverage across different environments.
- Beta Group: Make the beta group a Docker-only deployment.
- Native Layer Testing: For anything that touches the native layer (crypto libraries, hashing, network), deploy it to a feature group as a service deploy and then wrap it in a Docker container on beta.
-
Service Deployments:
- Definition: Service deploy refers to building from source using a GitHub tag versus a Docker tag, ensuring that issues specific to Docker deployments are identified.
- Current Gap: The issue with BLST surfaced only in Docker and was missed earlier because previous deployments were built from GitHub tags.
-
Workflow Enhancements:
- Ad Hoc Docker Builds: Create a workflow that allows for ad hoc Docker builds based on specific commits. This would fill a current gap where Docker builds are only triggered on the latest unstable, RC tags, and stable tags.
- Triggering Builds: Implement a system to trigger Docker container builds from specific commits for more targeted testing.
Next Steps
- Open Discussion: Initiate an open discussion to brainstorm and refine the restructuring of the testing process.
- Async Collaboration: Continue the conversation asynchronously to finalize the necessary changes and improvements.
PR Scrubbing June 2024
- PR 6528: Awaiting decision on MaxEB for Electra because we don't know if we're going to reuse MaxEB queues and may not need an unfinalized key cache.
- PR 6479: Awaiting response from external contributor.
- PR 6033: Needs fixing up and dusting off. Tagged for v1.20.0 target
- PR 6483: Worth redeploying to see if there are any changes in metrics. Will also need to test docker deployment on this one.
- PR 5886: Must also docker deploy test this. More PRs coming through to deprecate mplex. Looks ok, but memory is higher. In a state of unknowns where we should deploy and do more testing. Higher priority than 6483.
- PR 6652: Review completed, one more comment to address. Apply a CLI flag for testing and can be used for testing on a Holesky production node.
- PR 6693: Ready for reviews from Tuyen and Cayman. Once merged, we will rebase Electra fork.
- PR 6824: Consensus is that we would feel much more comfortable enabling this by default only if we had DoS protection measures in place. It was also suggested that we make the documentation note about having to manually turn on the debug API more prominent. The argument is that experienced power users of Lodestar will be able to enable it via a flag, rather than burdening inexperienced users with this risk.
- PR 6801: Needs a review.
- PR 6796: New way of doing things is using Package Manager. You just need to explicitly set a precise version.
- PR 6669: The team had an extensive discussion about the release page and the overall structure of the documentation, focusing on the need for proper layout and the introduction of a section for binaries.
PR 6669 Key Points
Installation Section and Binaries
-
Immediate Needs:
- With the upcoming release including binaries for the first time, there is an immediate need to properly lay out the installation section to accommodate this addition.
- There was consensus on the importance of having a well-defined installation section to guide users effectively.
Documentation Structure
-
Current Structure:
- The conversation diverged into broader considerations about the overall structure of the documentation.
- There was a consensus on the importance of having a coherent and unified structure for the documentation.
-
Proposed Approaches:
-
Bundling with New Layouts:
- One suggestion was to bundle the new pages, including the binary installation, with the new documentation layout that is close to completion (referencing issue 6550).
- This approach aims to prevent the need for future redirects and broken links, promoting a more seamless and user-friendly experience.
-
Piecemeal vs. Unified Update:
- There was a preference for updating the documentation in a unified manner rather than a piecemeal approach to avoid incoherent structure and ensure a cohesive user experience.
Input and Consensus
-
Team Preferences:
- The team largely agreed on implementing the documentation updates in a unified manner, as this is close to being finalized and would offer a more organized and logical structure.
- Nico's input was acknowledged as pending, but there was a general agreement that the proposed approach aligns with the overall goals for the documentation.
Content Layout for Binaries
-
Binary Installation Structure:
- Discussion on whether to list binaries by version or by platform. Most team members leaned towards listing by version as it is common practice and straightforward for users.
- Temporary Solution: Until a final structure is agreed upon, the team decided to point users to the GitHub release page where binaries are hosted, providing a simple and immediate solution.
- Future Adjustments: The team agreed to revisit and refine this structure if necessary, adding new PRs to implement any changes.
Proposal from Tuyen after v1.19.0-rc.0 investigation
-
Current Challenge:
- Investigating performance issues can be time-consuming, particularly when the beta environment is not actively used.
- A recent issue related to file build problems, identified about a month ago, highlighted the instability of the "unstable" branch.
Proposed Solution
-
Weekly Checkpoint Commits:
- Tuyen suggested deploying checkpoint commits to the beta environment at the end of each week.
- This approach would help trace which pull requests (PRs) cause performance issues by the time of release.
- By maintaining a history of these checkpoints, it becomes easier to identify the origin of performance issues based on weekly deployments.
Team Feedback
-
Clarification and Support:
- The idea involves using checkpoint commits to track performance changes over time.
- Deploying both source and Docker versions for any native-related or special PRs was discussed as a complementary strategy.
- The proposal is seen as a fallback to ensure issues do not slip through the cracks, providing an additional layer of monitoring.
Implementation
-
Practical Application:
- If implemented, the team would deploy to beta weekly, providing a clear timeline to identify potential performance regressions.
- This method would enhance the ability to pinpoint when and where issues arise, making debugging more efficient.
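The weekly checkpoint idea amounts to keeping a history of (week, commit, metric) entries and finding the first week where a metric jumps. A minimal sketch follows; the type, the function name, the 20% tolerance, and all sample values are made up for illustration and are not real Lodestar measurements.

```typescript
interface Checkpoint {
  week: string; // label of the weekly deployment
  commit: string; // commit deployed to beta that week
  metricMs: number; // a tracked metric, e.g. a processing time
}

// Return the first pair of consecutive checkpoints where the metric grew by
// more than `tolerance` (0.2 = 20%); the regression landed between them.
function firstRegression(
  history: Checkpoint[],
  tolerance: number
): {from: Checkpoint; to: Checkpoint} | null {
  let prev: Checkpoint | undefined;
  for (const current of history) {
    if (prev !== undefined && current.metricMs > prev.metricMs * (1 + tolerance)) {
      return {from: prev, to: current};
    }
    prev = current;
  }
  return null;
}

// Made-up sample history for illustration only.
const history: Checkpoint[] = [
  {week: "W1", commit: "aaa111", metricMs: 100},
  {week: "W2", commit: "bbb222", metricMs: 104},
  {week: "W3", commit: "ccc333", metricMs: 150},
  {week: "W4", commit: "ddd444", metricMs: 152},
];
const hit = firstRegression(history, 0.2);
console.log(hit ? `regression between ${hit.from.commit} and ${hit.to.commit}` : "no regression");
```

Narrowing a regression to one week of commits leaves a much smaller range to bisect than the full span between two releases.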
Nazar completed an overview presentation on improve types package to use forks as generics #6825. Video available upon request.
Agenda: https://github.com/ChainSafe/lodestar/discussions/6803
Stable Container Update from Interop
- Status: Received unofficial approval for stable container signaled by client teams.
- Action: Planning to push changes on Thursday ACDC call.
Relevant EIPs
-
EIP-7688:
- Description: Adds stable containers for various components (beacon state, beacon block body, execution payload, attestation, indexed attestation).
- Scope: Affects consensus data structures.
- Effort: Minimal, completed in less than a day.
-
Consensus:
- Lighthouse: Favorable
- Nimbus: Favorable
- Lodestar: Implemented and interopping with Nimbus on a fork of Pectra with this EIP added.
- Next Steps: Push changes for inclusion on ACDC Thursday.
-
EIP-7493:
- Description: Converts EL transactions into SSZ-ified transactions.
- Scope: Mainly affects the EL side.
-
Consensus:
- Seen as a "nice-to-have" but not critical.
- Lacks strong consensus for inclusion in Pectra.
-
Next Steps:
- Push for inclusion in the next all-core devs call.
- Potential outcomes:
- First Best: Included in Pectra.
- Second Best: Included in Verkle.
-
Rationale:
- SSZ transactions align with the theme of formatting the state tree.
- Suggested as a secondary feature alongside Verkle.
Conclusion
- The stable container feature is progressing well, with EIP-7688 poised for implementation.
- EIP-7493 faces challenges in gaining consensus but will be pursued in upcoming discussions.
Interop Post-Mortem Reference: https://github.com/ChainSafe/lodestar/issues/6781
- Docker Image builds slowly:
- https://github.com/ChainSafe/lodestar/pull/6787 will allow us to build it offline, which helps under bad network conditions and speeds up builds. May want to rely on another base image.
- GitHub CI is not caching spec tests and downloads them every time:
- Issue: GitHub CI is not caching spec tests and is downloading them every time.
- Context: This issue arose during debugging but is still observed intermittently.
- Target Branch CI: CI is not running on the target branch.
-
Cache Access:
- Cache results from another branch are not utilized.
- If two branches have the same test and cache key, the cache is not shared unless it’s on the target branch.
-
Branch-Specific CI:
- CI works on the unstable branch but not on other branches.
- Possible that CI hasn’t been run on the electra-fork branch due to merge conflicts.
-
Observations:
- Even on the same branch, publishing does not seem to share the cache.
- Cache sharing appears to be dependent on running CI on the target branch or the default branch.
-
Next Steps:
-
Testing: Test running CI on the
electra fork
branch to see if caching works.
-
Testing: Test running CI on the
-
Action: Test running CI on the
electra fork
branch to confirm if it resolves the caching issue. - Comment: Further discussion and testing are needed to identify a reliable solution based on GitHub's caching behavior.
- Electra-fork cleanup and merging:
- We should complete all remaining TO-DOs in that branch
- We can discuss having PRs to unstable once the implementation is more stable.
- Need to also focus on unit, E2E and sim tests for these new features.
- We should consider breaking up the PRs and review carefully.
- We should start merging stable items like SSZ types as an example
- State upgrade issue:
- Issue: The function to update the state to `electra-fork` is causing segmentation faults or other breaking issues.
- Context: The current method involves getting a state view for Electra and assigning default values, which is not working.
- Current Method Process:
- Obtain state view for Electra.
- Assign default values.
- Problem: This approach leads to segmentation faults or other errors.
- Temporary Fix:
- Hard coding by creating an empty default view.
- Manually copying required fields from the Deneb state.
- Detailed Observations: State Upgrade Tests:
- Using the function `upgradeStateToElectraOriginal` (the original method) in state upgrade tests results in breaking when accessing properties of the Electra view.
- Temporary Solution:
- Created a new function `upgradeStateToElectra`.
- This function creates an empty default view and copies data from the Deneb state.
- Issues with State Transitions:
- Similar issues noted in `upgradeStateToDeneb`.
- Comment in `upgradeStateToDeneb` indicates a hack to avoid failures when accessing certain properties.
- Both `upgradeStateToDeneb` and `upgradeStateToElectra` need further investigation.
- Action Items:
- Investigate why obtaining an Electra preview causes errors.
- Determine if changes in state trees over time are affecting the process.
- Assignees:
- Cayman or Tuyen: Investigate the preview issue.
- Tuyen: Look into creating a separate issue for this in Lodestar or SSZ repositories.
- Immediate Fix: Continue using the hard-coded method to ensure state upgrades work until the issue is resolved.
- Add APIs to dump caches to more easily observe the cached items:
- Agreed by Cayman. We have endpoints to dump certain caches, we have previously used this technique.
- Conformance testing of Engine APIs:
- One of the bugs discovered at interop was naming differences. Doing conformance testing here, similar to the Beacon APIs, would give us more certainty about errors.
- Need e2e tests with more than 1 node + multiple committees per slot for electra
- Need to add oppool metrics to grafana
- Investigate poor sync committee block packing after skip slots
- Tuyen believes that the highest priority here is to get Electra fork branch to a good state to not break current spec tests.
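The temporary fix for the state upgrade issue described above (create an empty default view, then manually copy the required fields from the Deneb state) can be sketched as follows. This is a minimal illustration using plain objects rather than Lodestar's SSZ tree views. The type names, the `electraOnlyField` placeholder and the function name `upgradeStateSketch` are assumptions made for illustration and are not the actual Lodestar code.

```typescript
// Hypothetical, heavily simplified state shapes (not the real BeaconState types).
interface Fork {
  previousVersion: string;
  currentVersion: string;
  epoch: number;
}

interface DenebStateLike {
  slot: number;
  fork: Fork;
  balances: number[];
}

interface ElectraStateLike extends DenebStateLike {
  // Placeholder for fields that only exist from Electra onwards.
  electraOnlyField: number;
}

// Step 1: build an empty default Electra state instead of deriving it from a Deneb view.
function defaultElectraState(): ElectraStateLike {
  return {
    slot: 0,
    fork: {previousVersion: "0x00", currentVersion: "0x00", epoch: 0},
    balances: [],
    electraOnlyField: 0,
  };
}

// Step 2: copy the required fields from the Deneb state by hand.
function upgradeStateSketch(pre: DenebStateLike, electraVersion: string, epoch: number): ElectraStateLike {
  const post = defaultElectraState();
  post.slot = pre.slot;
  post.balances = [...pre.balances];
  // The old current version becomes the previous version of the new fork.
  post.fork = {previousVersion: pre.fork.currentVersion, currentVersion: electraVersion, epoch};
  return post;
}
```

The trade-off is that every carried-over field must be listed explicitly, so the copy list has to be kept in sync with the state definition until the underlying view issue is understood.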
v1.19.0 Planning
- Upcoming SSZ Beacon APIs should be pushed to v1.20.0 to not coincide with large upgrades for v1.19.0. Let's achieve it on `unstable` within the next 2 weeks.
- Awaiting checkpointz integration of some PRs, will also need those serving checkpoints to upgrade
- Potentially add workaround
- Large upgrades include:
- BLST-TS upgrade
- Binaries release
- Aim to have v1.19.0-rc.0 by Friday for weekend testing
- Merge BLST spec tests
- Merge switch from BLS to BLST
- Merge Node 22 upgrade (It has been tested by Nazar. He will capture metrics and post in the PR)
Agenda: https://github.com/ChainSafe/lodestar/discussions/6678
v1.18 Concerns and libp2p TCP Upgrade
- Primary concern discussed was the beacon attestation performance issue related to the subscribe on subnet, which appears to be a consequence of upgrading libp2p TCP.
- Currently deployed a reverted version on feature one for comparison against the Release Candidate (RC) running on CIP validators.
- Need more data to make a decision as the changes have only been live for 12 hours.
- Matthew has deployed nodes with and without the upgraded libp2p TCP to compare performance. Noted differences in performance indicating potential issues caused by the upgrade.
- Concerns about other libraries updated in the `yarn.lock` file which might also affect performance.
100 Peer Count Discussion
- Discussion on whether to revert the TCP fix and the increase to a 100 peer count due to a memory leak issue not being resolved by the updated libp2p TCP.
- Despite the memory leak, the increased peer count appears to improve the inclusion of beacon aggregates and general network performance.
- A proposal to keep the 100 peer count based on better performance metrics and feedback from validators, although there's an ongoing concern about memory leaks potentially causing node crashes.
Rocketpool Validators
- We are now running Rocketpool validators for the team, which have been performing well.
Grant Applications
- Lodestar decided to apply for the Libp2p RetroPGF Grant
- Lodestar is also participating in Gitcoin Grants Round 20 featured in the Infrastructure category.
Pectra Devnet-0 Updates:
EIP-7549 Implementation:
- Status: Approximately halfway completed.
- Details:
- Beacon node implementations are complete.
- P2P/Gossip implementations are nearly complete.
- Validator-related implementations are pending, expected to take an additional 3-4 days before starting specification tests.
EIP-6110:
- Status: Implementation is complete and included in DevNet Zero.
- Details:
- Future changes may involve the reuse of the MaxEB deposit queue for processing deposits, but these are not part of the initial DevNet testing.
EIP-7002:
- Potential Changes: Inclusion of partial withdrawals by MaxEB.
- Details:
- Implementing partial withdrawals associated with MaxEB is considered straightforward and can be added quickly depending on the scope decided for MaxEB in the DevNet.
EIP-7251 MaxEB Implementation:
- Waiting for a new release of SSZ (Simple Serialize) needed for MaxEB implementations.
- Spec tests to start following minor code adjustments post-SSZ update.
Devnet-0 Readiness:
- Lodestar is nearing readiness to integrate with an Execution Layer (EL) client that has implemented all necessary features, although it's unclear which EL client (potentially Geth or EthereumJS) is also DevNet ready.
Matt:
Review and Revision Process
- Extensive Review: Matt has undergone 17 rounds of reviews from Nico, appreciating the thoroughness provided.
- Final Touches: The discussions are down to final considerations, such as whether to throw an error or use a console warning in the apply process discussed earlier.
- Detailed Collaboration: A significant review session was conducted with Cam, including a couple of hours on the phone, reviewing this and other repositories to ensure all aspects were covered.
- Completion: With the last comment now addressed on GitHub, the PR is ready to be approved and merged.
NC:
- EIP-7549: NC's primary focus has been on understanding and implementing this EIP.
- Attestation Mechanics: He is delving into the current caching mechanisms used for attestations, including the attestation seen cache and attestation data encoding (base 64).
- Code Familiarization: NC needs to spend more time understanding the public code base to continue effectively implementing the EIP.
- Consensus Spec Fixes: While implementing EIP-7549, NC has also made some minor fixes to the consensus specifications.
Tuyen:
- Historical State Issues:
- Initial Problem: Encountered an issue with n-historical state when configuring zero checkpoint state and one single block state, which led to difficulties in reaching a head state due to assumptions in the reload process.
- Resolution: The issue was resolved, and further improvements were made to the state caching mechanisms.
- New Caching Strategy: Renamed and restructured caches into `BlockState` cache and `CheckpointState` cache to enhance clarity and efficiency.
- Persisted State Metrics: Implemented a new metric related to persisted states. However, Tuyen expressed a preference to avoid testing this n-historical state in CIP nodes for the upcoming release, opting instead for a two-week stability period without issues before proceeding.
- Single Instruction, Multiple Data: The SIMD PR is ready and awaits review. Tuyen had discussions with Gajinder regarding the naming within the PR, signaling it's prepared for evaluation by Cayman or Gajinder.
- Merkle Tree Hash Optimization:
- Inspiration: Tuyen was inspired by a presentation at DevCon about the optimal methods for hashing Merkle trees, particularly the advantages of batch hashing for flat array structures.
- Current Approach: Previously, Tuyen considered a different approach involving grouping hash computations by tree level from the root to the leaf node.
- New Potential Approach: After learning about batch hashing, Tuyen is considering grouping hash computations by level before committing, which could enhance performance and efficiency but requires closer examination due to its complexity.
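To make the level-by-level idea above concrete, here is a minimal sketch that computes a Merkle root by hashing one whole level at a time, so each level is a natural unit for a batched hasher. It is not the Lodestar or SSZ implementation. The function names are hypothetical, `hashLevel` is a plain loop standing in for a real batch/SIMD hasher, and the sketch assumes a power-of-two number of 32-byte leaves.

```typescript
import {createHash} from "node:crypto";

/** Hash two sibling nodes into their parent (SHA-256 of the concatenation). */
function hashPair(left: Uint8Array, right: Uint8Array): Uint8Array {
  return createHash("sha256").update(left).update(right).digest();
}

/**
 * Hash every adjacent pair of one level in a single pass.
 * A real batch hasher would process these pairs together; this loop is a stand-in.
 */
function hashLevel(nodes: Uint8Array[]): Uint8Array[] {
  const parents: Uint8Array[] = [];
  for (let i = 0; i < nodes.length; i += 2) {
    parents.push(hashPair(nodes[i], nodes[i + 1]));
  }
  return parents;
}

/** Compute the root by collapsing the tree one level at a time, from leaves to root. */
export function merkleRootByLevel(leaves: Uint8Array[]): Uint8Array {
  const n = leaves.length;
  if (n === 0 || (n & (n - 1)) !== 0) {
    throw new Error("sketch expects a power-of-two number of leaves");
  }
  let level = leaves;
  while (level.length > 1) {
    level = hashLevel(level);
  }
  return level[0];
}
```

Because all hashes within a level are independent of each other, the per-level grouping is what allows them to be handed to a batch routine in one call.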
Nico:
- v1.18 Finalization: Nico has been finalizing the necessary components for the v1.18 release.
- REST API Performance Investigation: Noted outliers in performance metrics for some REST API calls, indicating a need for further data and analysis to identify the causes of delayed response times.
- Data Conversion: Identified an issue related to how buffers are converted in multiple areas, leading to opening a review issue to address this properly.
- Nimbus Compatibility: Most compatibility issues have been resolved. However, there remains a problem with Nimbus, partly due to their side and partly because Lodestar does not support SSZ for the request body. More debugging is needed to resolve these issues fully.
- SSZ Refactor:
- Nico plans to pick up the SSZ refactor branch, aiming to prepare it for a pull request and ensure it's build-ready.
Gajinder:
PeerDAS Development
- Generalization of Block Input:
- Identified a need to make block input handling more generic and fork-aware to accommodate the requirements of PeerDAS and potential future needs in ePBS/IL.
- Implemented changes through a series of pull requests:
- The first PR includes the addition of blob metrics and some cleanup.
- A subsequent PR generalizes the block data to be fork-aware.
- Gajinder plans to continue his development work on PeerDAS.
- Electra Readiness:
- Alongside ongoing projects, Gajinder will also focus on ensuring that systems and implementations are ready for the Electra upgrade.
Cayman:
- BLS PR Collaboration:
- Worked closely with Matt on the BLS PR, conducting an in-depth review and collaboration session.
- Spent time catching up on notifications and conducting a number of smaller code reviews throughout the week.
- Yamux Branch:
- Re-pushed the yamux branch to feature 3, expecting to gather relevant data soon.
- Highlighted an urgency due to potential deprecation of Mplex, stressing the need for expedited resolution and implementation.
- Electra EIPs Engagement:
- Started to delve deeper into the Electra EIPs to better understand the upcoming changes and enhancements.
- Expressed a desire to become more actively involved in specification discussions and PRs related to Electra.
- Aims to be well-prepared and hands-on for upcoming interop events, knowing the importance of being closely familiar with the code.
- Dependabot PRs:
- Managed to merge several dependabot PRs after resolving a permissions issue related to code coverage token, which is managed separately for dependabot in the repository.
Agenda: https://github.com/ChainSafe/lodestar/discussions/6653
- Discussion on the progress of v1.18, nearing completion, awaiting final updates.
BLST Integration Update
- Matt/Tuyen updated on the BLST integration:
- Workflow issues resolved with recent commits addressing bugs in the publishing flow.
- A small PR is up for approval to finalize testing code integration.
- Target to merge and move to beta testing phase soon.
API Updates
- Finalized Property to API Responses:
- Nico has implemented suggested fixes and is awaiting further review (from Tuyen or NC).
- POST Methods for State Validators and Balances:
- Discussion on implementing POST methods to address Teku incompatibility issues.
- This includes a fallback mechanism for 404 errors which Teku handles incorrectly with the current setup.
- Possible integration of these methods pending review of Beacon API PR 440 and consensus on implementation.
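As an illustration of the POST-with-fallback pattern discussed above, the sketch below shows a client that tries the POST variant of the state validators endpoint first and falls back to the GET variant on a 404. This is one reading of the notes, not the agreed design. The `HttpFn` abstraction and the function name `fetchStateValidators` are hypothetical, and the request shapes should be checked against the Beacon API specification.

```typescript
// Minimal HTTP abstraction so the sketch does not depend on a specific client library.
interface HttpResponse {
  status: number;
  json(): Promise<unknown>;
}

type HttpFn = (
  url: string,
  init: {method: "GET" | "POST"; headers?: Record<string, string>; body?: string}
) => Promise<HttpResponse>;

export async function fetchStateValidators(
  http: HttpFn,
  baseUrl: string,
  stateId: string,
  ids: string[]
): Promise<unknown> {
  const url = `${baseUrl}/eth/v1/beacon/states/${stateId}/validators`;

  // Preferred path: send the ids in a POST body, avoiding very long query strings.
  const postRes = await http(url, {
    method: "POST",
    headers: {"content-type": "application/json"},
    body: JSON.stringify({ids}),
  });
  if (postRes.status === 200) return postRes.json();
  if (postRes.status !== 404) throw new Error(`POST failed with status ${postRes.status}`);

  // Assumption: a 404 is read as "this server does not implement the POST variant".
  const getRes = await http(`${url}?id=${ids.join(",")}`, {method: "GET"});
  if (getRes.status !== 200) throw new Error(`GET failed with status ${getRes.status}`);
  return getRes.json();
}
```

A 404 can also mean that the requested state was not found, so a fallback keyed purely on the status code is ambiguous. That ambiguity is the kind of edge case the compatibility discussion was concerned with.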
Release Planning
- The team discussed pushing the current build to beta as an RC1 soon, even if some features (like POST methods) are added later.
- Considerations for including the builder boost factor in the release:
- The current setup will proceed with default boost factor settings unless explicitly configured otherwise.
- A warning might be added for users setting the boost factor flag without using the max profit setting, as it will be overridden.
Heap Memory Snapshots Issue
- Nico was thanked for obtaining the heap snapshot, which is critical for diagnosing memory issues.
- Tuyen suggested the need for specific infrastructure to effectively manage heap dumps.
- The team is using Hetzner AX-11 servers, which are currently unable to handle the heap dump requirements.
- Discussion centered on the errors encountered during the heap dump process:
- The main issue identified was related to memory requirements when writing the heap dump to the file system, leading to a system crash due to out-of-memory errors.
- The process requires doubling the memory temporarily, which was observed in metrics.
- There was confusion about the term "swap" memory, clarified as a method to extend memory capacity using disk space, which is not currently configured on the servers.
- Proposed Solutions:
- Suggestions to adjust server configuration to include swap memory or to utilize streaming the heap dump to disk to avoid high memory consumption all at once.
- The feasibility of using API functions that allow heap dumps to be written as streams rather than a single large block was discussed.
- Matt will review the code to identify potential changes in the function calls used for generating heap dumps.
- Plans to collaborate with Faith to explore necessary infrastructure adjustments to facilitate ongoing heap capture without system overloads.
- The Ethereum Foundation (EF) has released a specification for Pectra Devnet zero, which includes several EIPs.
- Gajinder and NC are focusing on implementing EIP-7002 and Max EB respectively, as part of the preparations for an upcoming interop event.
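The streaming option raised in the heap snapshot discussion above could look roughly like the sketch below. It uses Node's `v8.getHeapSnapshot()`, which returns a readable stream, and pipes it to a file instead of building the whole dump in one step. The function names are hypothetical, and this is not Lodestar's current heap dump code. Whether streaming lowers peak memory enough on the affected servers was still an open question in the meeting.

```typescript
import {getHeapSnapshot} from "node:v8";
import {createWriteStream} from "node:fs";
import {Readable, Writable} from "node:stream";
import {pipeline} from "node:stream/promises";

/**
 * Stream a heap snapshot into any writable destination.
 * The snapshot source is injectable so the function can be exercised without a real dump.
 */
export async function streamHeapSnapshot(
  destination: Writable,
  source: () => Readable = getHeapSnapshot
): Promise<void> {
  // pipeline handles backpressure and closes both streams on error.
  await pipeline(source(), destination);
}

/** Convenience wrapper: stream the snapshot straight to a file on disk. */
export async function writeHeapSnapshotToFile(path: string): Promise<void> {
  await streamHeapSnapshot(createWriteStream(path));
}
```

A caller would await `writeHeapSnapshotToFile("/tmp/node.heapsnapshot")` (the path is only an example) and load the resulting file in Chrome DevTools.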
Pectra-Devnet-0
- Gajinder's Progress:
- EIP-7002 is mostly completed; remaining tasks include testing with an execution layer that supports EIP-7002.
- Current work involves fixing types and resolving any failing tests.
- The implementation is expected to be pushed for further progress within the week.
- NC asked if the current submission of EIP-7002 is ready for review, to which Gajinder confirmed its readiness and highlighted the simplicity of the required changes, mainly processing exits.
- Gajinder suggested that if NC handles the attestation updates, he could take on the PeerDAS implementation.
- Inclusion Lists: Technical Challenges and Solutions
- Reorganizations (Reorgs) and IL Construction: The necessity of constructing an IL during a blockchain reorganization (reorg) was highlighted if one is not already available. This is particularly significant when proposing new blocks after a reorg.
- ILs and Blobs: Treating ILs similarly to blobs could simplify their synchronization and maintenance, ensuring they are readily available as needed.
- EIP-3074 Complications: Gajinder addressed complexities related to EIP-3074, which includes handling transactions that could drain wallets and those that are interdependent. These complexities affect the dynamics of transaction processing and the potential for balance-draining transactions that alter the conditions under which subsequent transactions are processed.
- Synchronization of ILs: By always synchronizing ILs, the need to reconstruct them is eliminated, simplifying the handling of reorgs.
- Validity Conditions Update: Validity checks for new blocks with respect to ILs have been adjusted to consider scenarios where balances might be drained by EIP-3074 compliant transactions. This update reflects changes in transaction eligibility based on account balances at different block heights.
- POC Development: Gajinder has developed a POC that does not treat ILs like blobs, but he noted that this approach could be adapted with minimal adjustments. He also mentioned the potential for restarting development based on the settled design of ILs.
- Focus Shift: If Electra development stabilizes without further requirements for ILs, Gajinder plans to concentrate on enhancing the IL implementation and possibly extending the POC.
MaxEB Discussion
- NC and another team member participated in a call about MaxEB, deciding to continue with the current specification despite ongoing discussions about potential modifications.
- Stakeholder Concerns: Lido has raised issues regarding the inability to control validator consolidations due to a misconception that all validators with the same withdrawal credentials are fungible.
- Lido's need for more control over consolidations has led to a proposal for execution-initiated consolidations, which is seen as adding complexity to the system.
- Complexity Debate: There's a disagreement on the complexity introduced by execution-initiated consolidations. Some believe it adds significant complexity, while others, including Gajinder, see it as manageable and similar to existing mechanisms.
- MaxEB Code Updates: The MaxEB specification has been rapidly changing, requiring continual updates to the codebase to stay aligned with the latest consensus specifications.
- Current implementation includes consolidation features, which may be removed depending on future spec changes.
EIP-6110 Considerations
- Potential Integration: Discussion on integrating EIP-6110 to utilize a pending deposit queue in the beacon state, which could simplify handling of new validators and reduce the need for an unfinalized public key cache.
- This approach would delay the processing of deposits but is seen as potentially reducing code complexity.
- Attestation Data and Tracking:
- The only other major pending update is related to attestation data.
- A tracking system for changes, particularly around epoch calculations, is discussed to ensure the code remains up-to-date with spec modifications.
Increasing Default Peer Count to 100
- The team discussed the final decision on peer counts, specifically whether to increase the default setting to 100 peers. This follows from experiments some team members conducted but had not yet been tried on the CIP validators.
- Performance Improvements: Individuals like Nico and another team member have manually increased their peer counts to 100, observing notable improvements in network performance. For instance, beaconcha.in showed an increase in effectiveness from around 96-97% to 99%.
- Memory Concerns: Increasing peers to 100 has been associated with a slow memory leak, with memory usage incrementally rising by about 3-4GB over a month. However, the system has not crashed due to memory overflow, as the usage seems to cap (e.g., max observed at 10GB).
- Risk vs. Benefit: The main discussion point was whether the improvement in data availability and network performance justifies the potential memory leak risk.
- It was noted that memory issues were also present at lower peer counts (e.g., 50 peers), and manual restarts or container reboots every 10-15 days have been effective mitigations.
- Community Practices: Evidence suggests that many users are already manually setting their peer count to 100, indicating a common practice and preference within the community.
- Compatibility and Testing:
- The team has not yet tested the 100-peer setting with the new BLST on memory in the network thread.
- Tuyen noted that the increased `seen_ttl` setting already has an impact on heap memory.
- Version Updates: Concerns were raised about users who do not frequently update their software, potentially facing issues if they run older versions with higher default peer counts.
Decision
- Given the benefits and common user practices, the consensus was to increase the default peer count to 100 in the upcoming v1.18 release.
- The team agreed to continue investigating and addressing the underlying memory issues in parallel with this change.
Agenda: https://github.com/ChainSafe/lodestar/discussions/6615
Agenda and Release Planning
- The team is preparing for the v1.18 release and is assessing which features can be included as progress has been slower than expected.
- A release candidate is targeted for the end of the week, with hopes to push the update next week, marking three weeks since the last release.
Peer Count Increase
- There was a discussion about increasing the default peer count despite a known memory leak. The possibility of making this change optional via a flag was discussed, allowing users to increase the peer count at their discretion.
- Concerns about memory impact persist, and further investigation into heap snapshots is needed to better understand the issue.
- Agreed to test this on a smaller subset of mainnet nodes to see performance impacts, such as our CIP fleet.
Feature Flags and Testing
- The team considered allowing users to control settings such as peer count increases through the UI of platforms like DappNode.
- We will defer our default peers increase to 100 until we fix the memory leak and have better metrics on the impact.
- The feasibility of implementing feature flags for testing and gradual deployment of new features was discussed.
Memory and Performance Issues
- Tuyen expressed concerns regarding memory usage and has requested heap snapshots from DevOps to diagnose issues.
- Discussions covered the logistics of obtaining these snapshots, including access and permissions, to ensure they can be analyzed effectively.
Metrics and Testing
- Ongoing testing and review of new features' impact on system performance and stability are crucial, especially concerning memory usage and peer connections.
- The team is cautious about implementing changes that could lead to significant issues, particularly in areas like memory management, and seeks to gather more data before proceeding with certain updates.
Gossip Sub Batch Publish Review
- The Gossip Sub Batch Publish feature needs a final review to be included in the upcoming v1.18 release. It is nearly ready and just requires final checks.
Remote Signer and Token File Configuration
- Nico's Update: The Remote Signer feature is almost ready, pending a final review.
- Token File as an Alias: There was a finalized discussion about using a token file as an alias to simplify key manager configuration for DevOps, aligning with specifications.
Invalid Signatures and Block Production
- Tuyen's PR: A pull request is ready that addresses an issue with invalid signatures by ensuring correct head data when producing blocks. This PR is crucial for maintaining block production integrity and is straightforward.
Proposer Boost Reorganization
- Clarification that the proposer boost reorganization is not blocked by other PRs and is not a top priority right now.
- It is considered ready for merging as it does not interfere with current operations or fork choice performance and acts as standalone code.
Additional PRs and Cleanup
- Several PRs are open, including one for n-historical state issues related to end-to-end testing, which is still in draft but aimed to be finalized soon.
- A quick review of open PRs, including dependabot updates, is needed to clean up the repository before the next release.
Release Planning
- The team plans to push a release candidate for v1.18 by the end of the week, aiming to include all reviewed and ready features.
- A cautious approach is being taken regarding increasing default peer counts due to unresolved memory leak concerns, opting instead to allow users to adjust this setting manually.
Builder Boost Factor Discussion
- There was general consensus from a Twitter thread that setting healthy defaults for the builder boost factor is favorable. This allows clients some control over the settings, even if these are not entirely neutral.
- Some participants felt that setting a 90% boost factor is ineffectual and does not substantively alter the current dynamics, suggesting it might be redundant to implement.
- Lighthouse, another client, allows users to set their builder boost factor, which could be a preferable approach as it empowers users to determine their settings rather than having a preset default.
- It was suggested to rename the boost factor setting to something like "default" or use another alias that indicates setting the boost factor at 90% as the default. This renaming would clarify the purpose and expectation of the setting.
- The discussion highlighted the technical challenges of using a relative percentage value for the boost factor. It was noted that such a setting impacts high-value and low-value blocks equally, which might not be optimal. The suggestion was to consider using a minimum bid (min-bid) setting or a maximum delta flag as additional parameters to refine the decision-making process for block selection.
- The boost factor is seen as a tool to favor local blocks when there is no significant maximal extractable value (MEV) to be gained from builder blocks. This approach supports local block production when it aligns closely in value with builder blocks, promoting fairness and reducing potential censorship.
- A detailed analysis of block values during high and low network traffic times was suggested to better inform the setting of the boost factor. This would help establish a more nuanced approach that could dynamically adjust based on network conditions.
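The concern above about a relative percentage can be made concrete with a small sketch. The first function follows the general shape of the Beacon API boost factor comparison: the builder value is scaled by the percentage and compared with the local value. It ignores the special handling of extreme values, and the specification should be consulted for exact rounding. The second function adds a hypothetical absolute `maxDeltaGwei` threshold along the lines of the "maximum delta" idea. That parameter and both function names are assumptions for illustration, not existing Lodestar flags. Values are plain numbers in gwei for readability.

```typescript
/**
 * Relative rule: pick the builder block only if its value, scaled by the boost
 * factor (a percentage), still exceeds the local block value.
 */
export function preferBuilder(builderValueGwei: number, localValueGwei: number, boostFactorPercent: number): boolean {
  // Multiply instead of dividing to keep the comparison in whole numbers.
  return builderValueGwei * boostFactorPercent > localValueGwei * 100;
}

/**
 * Hypothetical variant: also pick the builder block whenever the absolute gain
 * exceeds maxDeltaGwei, regardless of the relative rule.
 */
export function preferBuilderWithMaxDelta(
  builderValueGwei: number,
  localValueGwei: number,
  boostFactorPercent: number,
  maxDeltaGwei: number
): boolean {
  if (builderValueGwei - localValueGwei > maxDeltaGwei) return true;
  return preferBuilder(builderValueGwei, localValueGwei, boostFactorPercent);
}

const GWEI_PER_ETH = 1e9;

// At a 90% boost factor a builder bid must beat the local block by roughly 11%.
// The same ratio applies to a 10 ETH block and to a 0.01 ETH block:
const highValue = preferBuilder(10 * GWEI_PER_ETH, 9.5 * GWEI_PER_ETH, 90); // false, forgoes 0.5 ETH
const lowValue = preferBuilder(0.01 * GWEI_PER_ETH, 0.0095 * GWEI_PER_ETH, 90); // false, forgoes 0.0005 ETH

// With an absolute cap of 0.1 ETH, only the low-value case still favours the local block.
const highValueCapped = preferBuilderWithMaxDelta(10 * GWEI_PER_ETH, 9.5 * GWEI_PER_ETH, 90, 0.1 * GWEI_PER_ETH); // true
```

The relative rule treats both cases identically even though the absolute amount forgone differs by three orders of magnitude, which is why an absolute parameter was suggested as a refinement.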
Docs Versioning Update (PR 6559)
- Current Status: The migration PR for the documentation versioning has been merged.
- Next Steps: The Docusaurus implementation will be published in the next release, enabling version-specific documentation features.
- Technical Detail: The PR includes an empty array setup in a JSON file, which is a preliminary step. This setup allows Docusaurus to start displaying a versions feature on the documentation website. Future updates can populate this array to specify which versions are supported.
- PR 6528: This PR involves renaming cache and related functions to better indicate their purposes post-finalization. It is part of a series of follow-ups planned after the initial implementation of EIP-6110.
- Branch Concern: The PR is aimed to merge into the Electra fork branch, not the unstable branch, which may require adjustments or specific reviews.
Electra Fork Branch Maintenance
- The branch is currently named `electra-fork`. There was a suggestion to rename it to `feature/electra-fork` for better visibility, but it was noted that this branch is specifically a fork branch, not just a feature branch.
- The Electra fork branch is being maintained separately from the main unstable branch due to the ongoing changes and uncertainties in the specifications and EIPs related to the Electra upgrade.
- Longevity of the Branch: There is concern about the branch being long-running, which could complicate maintaining parity with the unstable branch due to the need for frequent updates and rebases.
- The integration of the Electra fork into the main unstable branch is delayed until the specifications are more stable and finalized. This approach avoids the complexities and potential errors that could arise from premature integration.
- Lead Maintainer: Gajinder is noted as the primary maintainer of the Electra fork branch, ensuring it stays updated.
- Merging Strategy: The current plan is to avoid squash merging the Electra fork into unstable. Instead, a rebase strategy will be employed to maintain a clear commit history. This method helps in keeping individual contributions visible and simplifies the management of changes.
- Conflict Management: Regular rebasing is performed to minimize conflicts. When conflicts do occur, they are managed on a case-by-case basis to maintain a clean branch that can be merged into unstable when appropriate.
- Clean History: Rebasing is preferred because it keeps the git log clean and straightforward, which is beneficial for reviewing historical changes and conducting diffs.
- Ease of Cherry-Picking: A well-maintained rebase flow makes it easier to cherry-pick changes as needed without the clutter of unrelated modifications.
- Visibility of Contributions: By not squash merging, all developers' commits remain visible in the branch’s history, acknowledging their contributions.
- Ease of Integration: A clean and regularly updated branch through rebasing allows for smoother eventual integration into the main unstable branch.
- Previous Merges: The approach taken with the Electra fork is similar to past practices, such as the integration of significant features well before mainnet forks to ensure stability and thorough testing.
- The maintenance of the Electra fork branch is a strategic choice to cope with the fluid nature of upcoming network upgrades. By keeping this branch separate and employing a careful rebase strategy, the team ensures that the main unstable branch remains stable and that the Electra upgrades can be integrated smoothly once specifications are finalized.
Cross-Client Compatibility Issues
- Cross-Client Testing: The Ethereum Foundation (EF) DevOps team is emphasizing cross-client testing to identify compatibility issues among different beacon nodes and validator clients. This testing aims to uncover discrepancies in protocol implementations or misinterpretations of specifications that could hinder interoperability with clients like Lighthouse or Teku.
- Vouch Compatibility: There are intermittent issues with Vouch, particularly with aggregates. It's suspected that Vouch's issues stem from not consistently interacting with the same beacon node for attestation and aggregate requests, leading to cache misses and errors.
- Priority of Compliance: Identifying and resolving deviations from the protocol specifications is considered a high priority. An example given was a misinterpretation of a query parameter default value, which was corrected to align with the spec.
SSZ Beacon API Support
- Current State: There are ongoing efforts to address issues with the SSZ beacon API, particularly around supporting SSZ for V2 blocks. The complexity of these issues has led to a suggestion to temporarily remove SSZ support from certain APIs to expedite other updates.
- Implementation Strategy: It's suggested that SSZ support could be simplified by limiting it to essential APIs, reducing the immediate workload and focusing on stabilizing the core functionalities.
Obol and Diva Staking Compatibility
- Testing Infrastructure: Plans are being made to integrate consistent testing for Obol compatibility and to include Diva Staking in internal tests. This is to ensure Lodestar maintains compatibility with these implementations and to quickly identify any integration issues.
- Diva Staking Setup: There are challenges in integrating Diva Staking into continuous integration (CI) systems due to its closed-source nature and the lack of a simple setup for development environments. It's mentioned that Diva might provide a simpler setup in future releases which could be integrated into CI.
Tuyen:
- Gossipsub Improvement: Implemented a PR to improve publish delays by ensuring messages are published to at least a certain number of mesh peers, inspired by Lighthouse.
- n-Historical State: Addressing a bug in the n-historical state feature with a PR still in draft, planning to finalize the entry and conduct tests.
- Shuffling Optimization: Collaborating with Matthew to explore offloading the shuffling process to a worker thread, a task more complex than initially thought, requiring further brainstorming.
- Project Planning: The shuffling optimization work has been moved to a future version target to allow for thorough development and testing.
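The gossipsub publish change noted above can be pictured with a minimal sketch. It assumes that, when the mesh for a topic has fewer peers than a chosen floor, the publisher tops up with other peers subscribed to the topic. The function name and the `minPublishPeers` parameter are hypothetical and are not taken from the actual PR or from the gossipsub library.

```typescript
// Hypothetical helper: choose the peers a message is published to.
// `minPublishPeers` is an assumed tuning parameter, not a real gossipsub option.
function selectPublishPeers(
  meshPeers: string[],
  otherTopicPeers: string[],
  minPublishPeers: number
): string[] {
  // Always publish to every mesh peer.
  const selected = new Set<string>(meshPeers);
  // Top up from non-mesh peers subscribed to the topic until the floor is met.
  for (const peer of otherTopicPeers) {
    if (selected.size >= minPublishPeers) break;
    selected.add(peer);
  }
  return Array.from(selected);
}

const publishPeers = selectPublishPeers(["a", "b"], ["b", "c", "d", "e"], 4);
console.log(publishPeers);
```

With a healthy mesh the top-up loop does nothing, so the extra fan-out only applies when the mesh is thin.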
Nazar:
- EIP 4844 Testing: Opened a PR to integrate EIP 4844 tests into the SIM environment, moving away from separate EIP interop tests.
- Web3.js Support: Added support for a new transaction type in web3.js that accommodates blob transactions, enhancing testing capabilities in SIM environments.
- Builder PR Assistance: Requested help for a long-standing PR related to the builder, planning to consult with Gajinder to resolve issues regarding the Flashbots builder's performance.
- Light Client and Prover Packages: Plans to open an issue to manage multi-version packages for the light client and the prover in both package managers and browsers, aiming to start this task within the week.
NC:
- MaxEB Completion: Focused on completing the final 10% of the MaxEB implementation.
- SSZ Repository Update: Plans to add a sliceFrom function to the list composite view in the SSZ repository, noting that the implementation will differ significantly from the existing sliceTo function and will require dedicated time to develop.
Julien:
- Docusaurus migration is merged, waiting for next release to be deployed; experimented with light-client usage in Docusaurus, some complexity left
- Added missing headers for publish block requests
- Some DX improvements
- Started looking at unexpected high latency for some requests
Agenda: https://github.com/ChainSafe/lodestar/discussions/6553
BLST Updates
- BLST Rebuild Branch: Close to completion, awaiting another round of review. Despite the large diff, much of it has been reviewed by Gajinder. The integration of this branch is seen as a catalyst for the next release.
n-Historical States and Target Peers
- n-Historical States: There's interest in merging the n-historical states work behind a feature flag for v1.18.
- Target Peers Increase: The default target peers are likely to be increased to 100 based on preliminary reviews showing no significant increase in heap size, just more garbage collection in the network thread.
Proposer Boost and Testing
- Proposer Boost: Deployed on a feat2 group for testing, with coding and PR comments addressed. Further testing is planned to ensure block production with the n-2 parent root, and Tuyen may write an end-to-end test for late block reorg.
Optional pushes for v1.18+
- PR 6033 Historical State Regen: Cayman plans to address comments and prepare the PR for a more mergeable state before his leave. The inclusion of this feature in v1.18 or v1.19 is up for discussion, seen as a nice-to-have rather than critical.
- Potentially look at including Matt's async shuffling refactor.
Issues with SSZ Releases
- With Cayman being away, Phil now has access to manually publish releases for SSZ in case of CI failure.
Cayman:
Merkle Tree NAPI-RS Experiment
- The experiment aimed to migrate all persistent Merkle tree code into Rust, utilizing the napi-RS ecosystem. This involved creating a Rust implementation of the persistent Merkle tree and wrapping it with a small NAPI layer for JavaScript interaction.
- The work was done in a branch named cayman/napi-merkle-node within the SSZ repository: https://github.com/ChainSafe/ssz/tree/cayman/napi-merkle-node
- Unfortunately, the experiment did not yield the expected improvements. The Rust implementation resulted in significant slowdowns, with the beacon node struggling to sync and keep up with the chain due to the reduced performance.
- A suspected cause of the slowdown is the overhead of allocating temporary pointers to nodes, which negates the memory savings from pre-allocating nodes. This creates a dilemma where the goal of saving memory leads to decreased performance, without an apparent solution to balance both aspects efficiently.
- Cayman plans to document the experiment's details and findings in an issue for future reference. This will allow the team or interested individuals to revisit the experiment and potentially explore alternative approaches.
- The experiment is currently on hold, with the branch available for review. The team may consider revisiting this approach in the future if new insights or strategies emerge to address the identified challenges.
Tuyen:
n-Historical State Feature Flags:
- Tuyen has been working on one of the final tasks for the n-historical state, which involves creating feature flags to utilize new state caches. This work is progressing well.
SSZ Serialization Improvements:
- A PR that changes the type of balances for faster serialization has been merged.
- Another PR aims to change the type of validators to speed up serialization further. Incorporating these changes has shown promising results:
- Serialization of the Holesky state takes around 350ms, and less than 300ms for the Mainnet state, which is considered efficient enough to perform per epoch.
- This serialization typically occurs during the last third of the first slot of each epoch, ensuring it does not interfere with critical processing times.
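As a rough illustration of the timing described above, the start of the last third of an epoch's first slot can be computed from the genesis time. This is a sketch using mainnet constants; the function name is hypothetical and does not come from the Lodestar codebase.

```typescript
const SECONDS_PER_SLOT = 12; // mainnet value
const SLOTS_PER_EPOCH = 32; // mainnet value

// Returns the Unix time (in seconds) at which the last third of the first
// slot of `epoch` begins. Hypothetical helper for illustration only.
function epochSerializationTime(genesisTime: number, epoch: number): number {
  const firstSlot = epoch * SLOTS_PER_EPOCH;
  const slotStart = genesisTime + firstSlot * SECONDS_PER_SLOT;
  // Two thirds of the slot (8 seconds on mainnet) have passed at this point.
  return slotStart + (2 * SECONDS_PER_SLOT) / 3;
}

// With a genesis time of 0, epoch 1 starts at slot 32, i.e. 384 seconds in.
const serializeAt = epochSerializationTime(0, 1);
console.log(serializeAt);
```

A serialization taking roughly 300 to 350ms then finishes well inside the remaining 4 seconds of that slot.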
Memory Usage Improvement:
- The n-historical state implementation appears to reduce the heap memory usage by approximately 1GB, indicating a significant optimization.
Proposer Boost PR Review:
- Tuyen has also reviewed the proposer boost PR code, contributing to its refinement.
Gossipsub Metric Fix:
- A PR has been submitted to address a broken metric in gossipsub, aiming to improve the accuracy and reliability of network metrics.
Gajinder:
- Current Focus: Gajinder is deeply involved in the inclusion lists (IL) proof of concept (PoC) work, aiming for initial integration this week. The work is based on a primitive but workable spec.
- Spec Status: Most of the design for IL has been finalized, with ongoing iterations on specific details, such as whether certain elements within the execution header need to be signed. The consensus is leaning towards requiring signatures to validate the execution payload's summary at the beacon layer.
- Design Changes:
- The design has shifted from a bundled approach for transmitting the inclusion list P2P to an unbundled approach, as advocated by Potuz.
- Recent modifications include removing the need for the execution layer to keep parent spenders for validating the parent inclusion list summary. This change simplifies the execution layer's requirements.
- Inclusion List Sync Mechanism:
- The mechanism for syncing inclusion lists differs from that of blobs. Blobs are essential for block import, whereas inclusion lists are not required unless operating at the head of the chain.
- During sync, inclusion lists are not needed for beacon blocks by range, as the child block should satisfy the parent's inclusion list. However, inclusion list availability is crucial on gossip when importing a block, as blocks cannot be attested to or built upon without available inclusion lists.
- Fork Choice Extension:
- Gajinder is extending the fork choice with an additional flag to indicate the inclusion list status, which will help manage the new scenarios introduced by IL.
- The fork choice will confirm the validity of a block and its ancestors' inclusion lists. However, the availability of an inclusion list for a block validated by the execution layer must be independently verified.
- Handling Invalid Child Blocks:
- A scenario where a chain has an invalid child block could occur if the execution layer initially indicates syncing status before later deeming the block invalid.
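The inclusion list rules summarized above (not needed during range sync, required on gossip before attesting or building) can be sketched as a small status flag. All type and function names here are hypothetical; the IL spec was still changing at the time, so this is an illustration of the described behavior rather than the PoC's actual design.

```typescript
// Hypothetical names throughout; not taken from the IL spec or the PoC branch.
type ImportSource = "gossip" | "range_sync";
type InclusionListStatus = "available" | "unavailable" | "not_required";

// Blocks imported during range sync do not need their inclusion list,
// because the child block is expected to satisfy the parent's list.
function inclusionListStatus(source: ImportSource, ilSeen: boolean): InclusionListStatus {
  if (source === "range_sync") return "not_required";
  return ilSeen ? "available" : "unavailable";
}

// A block imported from gossip cannot be attested to or built upon
// until its inclusion list is available.
function canAttestOrBuildOn(status: InclusionListStatus): boolean {
  return status !== "unavailable";
}

const gossipWithoutIl = inclusionListStatus("gossip", false);
const syncedBlock = inclusionListStatus("range_sync", false);
console.log(gossipWithoutIl, canAttestOrBuildOn(gossipWithoutIl));
console.log(syncedBlock, canAttestOrBuildOn(syncedBlock));
```

Storing such a status next to each fork choice node is one way the extra flag mentioned above could be represented.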
NC:
- Plans to draft the engine API for the Electra PoC. A PR is expected soon.
- Reviewing Technical Writing: NC is reviewing detailed educational material by Emmanuel, a technical writer active in the ETH R&D community. Emmanuel's work includes in-depth coverage of EIPs, including Max EB and inclusion lists.
- MaxEB Spec Review: NC reviewed Lion's MaxEB spec, noting the absence of slashing penalty calculations, suggesting it's not finalized. Despite this, the overall spec appears solid, and NC has started coding a PoC for MaxEB.
- Focus for Coming Week: NC plans to focus on testing the proposer boost with the aim of merging it soon.
- Slashing Penalty Concerns: The discussion highlighted concerns from large node operators regarding the slashing penalty under MaxEB. Suggestions include adjusting the penalty calculation to be less severe for operators consolidating validators, possibly using a logarithmic scale related to the maximum effective balance or a fixed scale based on the number of validators.
- MaxEB Discussion: The MaxEB topic seems less contentious than inclusion lists (IL), but there's uncertainty about proposed changes to slashing penalty calculations. NC is not aware of any new proposals addressing these concerns.
Nazar:
- Integration of Production Ready Builder: Nazar has been working on integrating a production-ready builder into Lodestar's simulation tests to ensure future stability. This process revealed a bug in the attestation API related to incorrect slot numbers for future epochs, which has since been fixed.
- Flashbots Builder and MEV Boost Layer: Nazar discovered that the Flashbots builder is designed to interact with consensus clients indirectly through an MEV Boost layer, rather than direct calls from the consensus client to the builder. This requires running the MEV Boost alongside the Flashbots builder in the simulation test environment to facilitate proper communication and payload delivery.
- Experimentation and Communication: Nazar is experimenting with this setup and has sought clarification from the Flashbots team on Discord. The goal is to fully integrate this flow into Lodestar's simulation tests, enabling comprehensive testing of builder functionality, including scenarios where engine API is down or block proposals are builder-exclusive.
- Pending PR Review: Nazar highlighted a pending PR (#6507) that has been awaiting review for two weeks. This PR involves moving the withdrawal test to the simulation test structure.
Julien:
- Docusaurus Migration: Julien has successfully merged the Docusaurus migration into the Lodestar repository. The documentation is now on par with the previous setup, with a few minor adjustments remaining. The new documentation setup will be officially released with the next Lodestar release.
- Documentation Layout Reconsideration: With the migration complete, Julien suggests it might be a good time to reevaluate the documentation layout, potentially introducing higher-level categories and exploring Docusaurus's capabilities, such as embedding JavaScript directly into the documentation.
- Light Client Demo Enhancements: Julien fixed several bugs in the light client demo and explored integration with Farcaster. He suggests that the light client demo could be a good match for Farcaster frames, showcasing the light client's capabilities in a webcast environment.
- P2P Implementation for Light Client: Julien has begun discussions with Cayman about implementing P2P for the light client, aiming to deepen his understanding of P2P and libp2p technologies before Cayman's temporary departure.
- Pending PR Review Request: Julien also mentioned a pending PR (#6507) that has been awaiting review for two weeks, moving the withdrawal test to the sim test structure.
Matt:
- BLST-TS PR Completion: Matt has finished addressing all comments from Hubert on the BLST-TS PR (#124). He's ready to merge and was waiting for Hubert to double-check the changes for any final updates.
- Migration from Rebuild Folder: Matt has been working on migrating from the rebuild folder and removing all SWIG code. The PR for this work is #125. This involved refactoring the build/install system and updating the workflow action. He's aiming to finalize this work, ensuring there are no issues from the refactor. The PR is significant, with 11k lines added and 15k lines removed, but most of the code changes were previously reviewed by @g11tech during the PR process for merging individual pieces to the rebuild branch. The changes in PR #125 are mainly related to repo configuration and the publishing process.
- Next Steps: Matt plans to activate the tests to check for any typos from the refactor and complete the PR with a full list of changes made during the process.
Agenda: https://github.com/ChainSafe/lodestar/discussions/6510
v1.17 Release and Deneb Upgrades
- v1.17 Released: Thanks to everyone's efforts, v1.17 is out, ensuring readiness for the final Deneb upgrades.
- Node Upgrades: The infrastructure team has upgraded all nodes. Team members are encouraged to rebase their branches for compatibility with Mainnet feature groups.
Ethereum Protocol Fellows Cohort
- Upcoming Cohort: Discussion on preparing for the next cohort of Ethereum protocol fellows, with a focus on identifying potential projects for fellows interested in client development.
- Mentorship Experience: Sharing experiences from previous mentorships, highlighting the low time commitment and the benefits of mentoring, including potential recruitment opportunities.
- Project Ideas: Considering projects like a beacon chain harness for testing or integrating Lodestar with the portal network as potential tasks for fellows. Please highlight any issues that may be useful for EPF fellows to pursue.
- Potential Fellowship Task: Discussion on integrating Lodestar with the portal network as a suitable project for a protocol fellow.
- Use Case for Archiving Goerli Data: Exploring the idea of using Lodestar and the portal network to archive and distribute historical states of deprecated networks like Goerli.
- Clarification on Integration: Questions about the specifics of integrating with the portal network, including the mechanism of data provision and the potential benefits of such integration.
Action Items and Considerations
- Mentorship Participation: Encouragement for team members to consider becoming mentors for the protocol fellowship program.
- Project Identification: Need to identify and outline specific projects that would be suitable for fellows, ensuring they align with Lodestar's goals and can provide meaningful contributions.
- Understanding Portal Network Integration: Further discussion required to clarify the technical aspects and potential impact of integrating Lodestar with the portal network.
Beacon Chain Harness for Testing
- The main topic revolves around the development of a beacon chain harness for testing. NC seeks input on the list of features or requirements for the beacon chain harness, including suggestions for additions or removals.
- Purpose of the Harness: The harness aims to generate test fixtures for use in testing, rather than dynamically generating fixtures. The consensus leans towards the utility of fixed scenarios over random elements for testing purposes, with Sim tests and other extensive tests covering the randomness aspect.
- Proposed Scenarios: The discussion suggests having a few specific chain scenarios for testing, such as:
- A simple linear chain that finalizes.
- A forky chain that doesn't finalize.
- Possibly one or two more scenarios with distinct characteristics.
- Value and Resource Allocation: The team acknowledges the value of embarking on this project. The next steps involve figuring out resource allocation to develop the harness. Discussion to follow on issue: https://github.com/ChainSafe/lodestar/issues/6518
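To make the fixed-scenario idea concrete, the two chain shapes mentioned above can be generated deterministically. This is only a sketch: the `FixtureBlock` shape, the placeholder string roots, and both function names are assumptions, and finalization itself is not modeled.

```typescript
// Hypothetical fixture shape; real fixtures would carry full signed blocks.
interface FixtureBlock {
  slot: number;
  root: string;
  parentRoot: string;
}

// Linear chain: one block per slot, each building on the previous block.
function linearChain(length: number): FixtureBlock[] {
  const blocks: FixtureBlock[] = [];
  let parentRoot = "genesis";
  for (let slot = 1; slot <= length; slot++) {
    const root = `block-${slot}`;
    blocks.push({slot, root, parentRoot});
    parentRoot = root;
  }
  return blocks;
}

// Forky chain: two competing branches ("a" and "b") that both extend genesis,
// with one block per branch in every slot.
function forkyChain(length: number): FixtureBlock[] {
  const blocks: FixtureBlock[] = [];
  const parents = {a: "genesis", b: "genesis"};
  for (let slot = 1; slot <= length; slot++) {
    for (const branch of ["a", "b"] as const) {
      const root = `block-${slot}${branch}`;
      blocks.push({slot, root, parentRoot: parents[branch]});
      parents[branch] = root;
    }
  }
  return blocks;
}

const linear = linearChain(3);
const forky = forkyChain(2);
console.log(linear, forky);
```

Because the output depends only on the arguments, the same fixtures can be regenerated or checked into the repository and reused across tests.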
- Proposal to Increase Memory Limit: There was a proposal to bump the Node.js max old space size to 16GB, considering that most staking individuals likely have at least 16GB of RAM. However, concerns were raised about overallocation and its impact on garbage collection cycles and overall system performance.
- Concerns and Suggestions:
- Optimal Allocation: A suggestion was made to set the limit to 12GB instead of 16GB to avoid overallocation and potential performance issues.
- Server Impact: Concerns were raised about setting the limit too high, potentially affecting servers with only 16GB of RAM, leading to crashes or performance degradation.
- Dynamic Allocation: The idea of dynamically setting the memory limit based on the server's total memory was discussed, with a minimum of 12GB suggested.
- Default Settings Alignment: It was noted that the default memory limit should align with default settings, and adjustments to settings should be accompanied by corresponding memory limit adjustments.
- Increasing Default Peers: The discussion also touched on increasing the default peer count to 100, aligning with other clients, and potentially improving network effectiveness.
- Action Items:
- Testing with More Peers: The team agreed to test running nodes with more peers (around 100) and observe the impact on memory usage and network effectiveness.
- Observation and Analysis: Before making any changes to the memory limit, the team decided to analyze memory usage with increased peers to identify any potential bugs or unusual memory consumption patterns.
- Potential PR: A PR to increase both the memory limit and peer count was considered, with a focus on first understanding the implications of increasing peer count on memory usage.
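The dynamic allocation idea discussed above could look roughly like the sketch below. The 12GB floor comes from the discussion; the 75% fraction and the function name are assumptions made for illustration, and no such policy was agreed on in the meeting.

```typescript
const MB_PER_GB = 1024;

// Hypothetical policy: use a fraction of total RAM for the V8 old space,
// but never go below the floor suggested in the discussion (12GB).
// The default fraction of 0.75 is an assumption, not a team decision.
function maxOldSpaceSizeMb(totalMemGb: number, floorGb = 12, fraction = 0.75): number {
  const proportionalGb = Math.floor(totalMemGb * fraction);
  return Math.max(floorGb, proportionalGb) * MB_PER_GB;
}

const limitFor16Gb = maxOldSpaceSizeMb(16);
const limitFor32Gb = maxOldSpaceSizeMb(32);
console.log(limitFor16Gb, limitFor32Gb);
```

The result is in megabytes, which is the unit Node.js expects for `--max-old-space-size`; the total memory could be read with `os.totalmem()`. Note that with a fixed floor, a machine with less RAM than the floor still gets a limit above its physical memory, which is one of the concerns raised above.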
SSZ Version Release
- There's an upcoming optimization based on a new SSZ version, including a PR for caching the root for the list type.
Block and Blobs Pulling Technique
- A new pulling technique for blocks and blobs is being developed, targeting edge cases where the block isn't seen within a specific time window. This technique starts aggressively pulling blobs through request-response when the block has been seen but not all of its blobs have arrived. Another scenario covered is when a blob is seen but not the block, which implies that the block exists. A PR for this is almost ready and aims for another RC, though not targeting the mainnet immediately.
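The two scenarios above can be sketched as a small decision function. The names, the `cutoffMs` parameter, and the choice to apply the same cutoff to both cases are assumptions for illustration; the actual PR may structure this differently.

```typescript
// Hypothetical view of what a node has seen for the current slot.
interface SlotView {
  blockSeen: boolean;
  blobsExpected: number; // only meaningful once the block has been seen
  blobsSeen: number;
  msIntoSlot: number;
}

type PullAction = "wait_for_gossip" | "pull_blobs" | "pull_block";

// `cutoffMs` is an assumed threshold after which gossip is no longer trusted
// to deliver the missing pieces in time.
function decidePull(view: SlotView, cutoffMs: number): PullAction {
  if (view.msIntoSlot < cutoffMs) return "wait_for_gossip";
  // Block arrived but some blobs are missing: fetch them via request-response.
  if (view.blockSeen && view.blobsSeen < view.blobsExpected) return "pull_blobs";
  // A blob without its block implies the block exists somewhere on the network.
  if (!view.blockSeen && view.blobsSeen > 0) return "pull_block";
  return "wait_for_gossip";
}

const lateBlock = decidePull({blockSeen: true, blobsExpected: 3, blobsSeen: 1, msIntoSlot: 5000}, 4000);
const orphanBlob = decidePull({blockSeen: false, blobsExpected: 0, blobsSeen: 1, msIntoSlot: 5000}, 4000);
console.log(lateBlock, orphanBlob);
```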
v1.18 Release Planning
- The upcoming v1.18 release will include significant improvements, such as the optimization mentioned above and the proposer boost feature, which is being held off until v1.18 for testing.
- Another important inclusion will be the addition of point randomness to sameMessage verification, aiming to enhance security.
- The BLST merge is close to completion and is expected to be part of v1.18, marking it as a substantial release with numerous improvements and new features.
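For context on the randomness item above, the general idea behind randomized batch verification of signatures over the same message can be written as follows. This is a sketch of the standard technique, assuming Ethereum's BLS setup (public keys $PK_i \in G_1$, signatures $S_i \in G_2$, generator $g_1$, message hash $H(m) \in G_2$) and random nonzero scalars $r_i$; it is not a description of the exact code in the PR.

```latex
\begin{aligned}
\text{per-signature check:}\quad & e\big(PK_i,\ H(m)\big) = e\big(g_1,\ S_i\big) \\
\text{randomized batch check:}\quad & e\Big(\sum_i r_i\, PK_i,\ H(m)\Big) = e\Big(g_1,\ \sum_i r_i\, S_i\Big)
\end{aligned}
```

If every signature is valid, the batch check holds by bilinearity. Without the scalars $r_i$, invalid signatures whose errors cancel out (for example $S_1 + \Delta$ and $S_2 - \Delta$) would still pass the aggregated check, which is the gap the randomness closes.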
Matt:
Swig Version of Multiplication
- Implemented the swig version for multiplication of randomness. The metrics showed good peering, handling twice the amount of traffic due to better peering compared to unstable. However, the attestation queue was significantly longer than expected, raising concerns about some underlying issues needing tuning. It was decided not to rush this into v1.17 and instead aim for inclusion in the next release alongside the BLST PR.
Shuffling and Spec Tests
- Completed the shuffling work, passing all unit and sim tests, and running on a node. However, spec tests are failing, indicating a potential issue with the test setup rather than the code itself. This led to a pause in this work to focus on the multiplication feature.
BLST TS Refactor
- Received extensive and valuable feedback from Hubert, with 95 comments on the BLST TS refactor PR. This feedback is currently being addressed, with a high level of confidence in the quality of the refactor due to Hubert's expertise in C and Rust.
- The primary focus is on finalizing the BLST integration, addressing the remaining feedback, and preparing to merge the rebuild branch into main. This work is expected to extend beyond the current week, with a target of completing it, uninterrupted, by the next standup.
Julien:
Docusaurus Migration
- Worked on migrating to Docusaurus, aiming for a quick merge. The migration is at a stage where it can be merged, with the last comments related to links addressed. This should facilitate easier documentation management and improvements.
Light Client in Alternate Runtimes
- Made progress with running the light client in alternate runtimes, specifically noting improvements with the latest Lodestar release allowing compatibility with Vite. However, some polyfills related to buffer are still needed. Julien shared ideas for further improvements, aiming for a seamless integration of the light client library without the need for users to worry about polyfills or configurations.
Light Client Demo Enhancements
- Continued work on enhancing the light client demo, including fixing bugs and conceptualizing improvements. Working on mockups to share with Nazar for collaborative planning on the demo's future direction. Suggested leveraging the demo for showcasing capabilities with Farcaster or Warpcast, noting the potential to engage a vibrant developer community interested in L2 solutions and decentralized applications.
Cayman:
Merkle Tree Experiment with Rust
- Cayman has been working on an experiment to transition the Merkle tree implementation into Rust. The goal is to see if memory usage can be significantly reduced by utilizing Rust's more efficient memory allocation compared to JavaScript. This could potentially lead to performance improvements.
- The experiment involves creating Rust objects for branch and leaf nodes of the Merkle tree and wrapping them with JavaScript for necessary interactions. This approach aims to minimize the memory overhead associated with JavaScript objects.
- The Rust-based implementation has been integrated into the Persistent Merkle Tree library. Cayman is currently addressing bugs related to tree navigation in Rust to avoid unnecessary JavaScript object creation.
- If successful, this experiment could pave the way for further performance enhancements, including optimized hashing and the possibility of parallelizing hashing operations.
Tuyen:
- State Cache Clone Method: Introduced a clone method for items retrieved from the state cache to address an issue that led to increased block processing times. This was fixed by implementing a transfer cache option, and the fix was included in the recent release.
- SSZ Serialization Improvement: Focused on improving SSZ serialization, particularly for the n-historical state work which requires serialization per epoch. Improvements include leveraging cache nodes within the ViewDU for validator cache after epoch transitions and caching balances to enhance block processing and serialization speeds.
- SSZ Release Incorporation: Plans to incorporate the new SSZ release and use the new type for BeaconState balances to further improve serialization efficiency.
- New Validators Type: Working on a local branch to create a new validators type within BeaconState to facilitate faster serialization.
- Epoch Transition Tracking: Submitted a PR to track epoch transitions by reason, which is ready for review.
Nico:
Rewards Calculations Review
- Nico spent time reviewing NC's branch on attestation rewards to deepen his understanding of rewards calculations in the spec and Lodestar's implementation. He suggests that the branch could benefit from additional reviews before merging.
Token Path Testing Improvement
- Following a suggestion from the Ethereum Foundation, Nico prepared a PR to add a flag for configuring the token path to facilitate easier testing between clients. The discussion is ongoing, and the change may be included in the v1.18 release.
SSZ Refactor Progress
- Nico has been working on the SSZ refactor, updating tests, and ensuring they pass. The HTTP client has been made more complex but also more powerful, allowing for per-request settings. This opens up possibilities for future features like fetching blocks from multiple beacon nodes and selecting the most profitable block, which could be attractive to large node operators.
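The multi-node idea mentioned above is a possible future feature, not something that exists yet. As a sketch of the selection step only, assuming each beacon node returns a block together with its value to the proposer, the choice could be as simple as the following. The `ProducedBlock` shape and field names are hypothetical.

```typescript
// Hypothetical shape: a produced block candidate and where it came from.
interface ProducedBlock {
  beaconNodeUrl: string;
  // Total value of the block to the proposer, in wei (assumed field).
  valueWei: bigint;
}

// Pick the candidate with the highest value; returns undefined for no candidates.
function selectMostProfitable(candidates: ProducedBlock[]): ProducedBlock | undefined {
  let best: ProducedBlock | undefined = undefined;
  for (const candidate of candidates) {
    if (best === undefined || candidate.valueWei > best.valueWei) {
      best = candidate;
    }
  }
  return best;
}

const bestBlock = selectMostProfitable([
  {beaconNodeUrl: "http://node-a:9596", valueWei: BigInt("30000000000000000")},
  {beaconNodeUrl: "http://node-b:9596", valueWei: BigInt("45000000000000000")},
]);
console.log(bestBlock);
```

Per-request settings in the HTTP client would matter here because each candidate request targets a different base URL and may need its own timeout.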
Finality Issue on Goerli and Attestation Processing
- Investigated a finality issue on Goerli related to a bug in Prysm and checked Lodestar's attestation processing. Lodestar does not seem to have the same bug as Prysm, but Nico identified a minor issue where not all attestations were included in post and app blocks, which is resolved in v1.17.
- Nico reported that his Goerli node was initially up but started struggling after a restart, facing issues with finding peers and syncing due to the lack of finalized state.
- Gajinder suggested using a checkpoint sync from a known peer as a workaround for syncing issues during periods of non-finality. However, he noted a problem with the current block fetching algorithm, which could stall sync if too far behind due to peer disconnections.
- The discussion highlighted difficulties in finding operational nodes to sync from, with most nodes, including ChainSafe's, being taken down. Prysm and Lighthouse might still have nodes running, but the network's state is fragmented with very low participation rates.
- There was mention of possibly using a historical state for syncing, but the focus shifted towards learning from the situation to improve future responses to similar scenarios. The conversation touched on the importance of having strategies for syncing in testnets and handling long periods without finality.
NC:
- Beacon Chain Harness: NC mentioned working on the beacon chain harness, with more updates to come.
- EIP-6110: NC is involved in work related to EIP-6110, contributing to the development and discussions.
- Support for Naito's PR: Assisted Naito with a PR regarding expected withdrawals, showcasing collaborative efforts within the team.
- New Specs Review: Engaged in reviewing new specifications released last week concerning Inclusion List (IL) and Max Effective Balance (Max EB). Plans to delve deeper into these topics and potentially start working on a Max EB Proof of Concept in the upcoming week.
- Engine API for Electra Drafting: Collaborating with Mikhail to draft the engine API for Electra, initially focusing on incorporating EIP-6110 and EIP-7002 into the drafting phase.
Nazar:
- Continuing to work on adding Flashbots builder support to our sim tests. There are some compatibility issues between client versions, which makes this a bit troublesome.
- The latest version of Geth now supports only post-merge networks, so we are looking for a way to use different Geth versions for different sim tests, since we test merge scenarios as well.
Agenda: https://github.com/ChainSafe/lodestar/discussions/6493
Optimizations Update:
- Main optimization for inclusion is related to pulling blobs, with a PR ready and under review by Tuyen.
- Mentioned a PR in SSZ for adding cache, pending review, which will optimize transaction-related routes in the beacon.
- Running tests on Holesky showed considerable improvement in blob handling.
- Proposed further optimizations for handling blobs and blocks when critical time has passed in a slot.
Release Planning:
- Discussed the possibility of doing an RC release today, considering the significant optimization front.
- Considered targeting Monday for the release to include additional optimizations if possible.
- Agreed on proceeding with an RC release today, with the possibility of another RC by Friday if further optimizations are completed.
- Discussed the importance of making clear that the upcoming release is recommended but not mandatory, with Tim Beiko indicating flexibility in updating the blog post with the latest release information.
- Reviewed the commits since v1.16 and discussed whether to cherry-pick specific optimizations or push new changes from unstable.
- Preference expressed for bumping to v1.17 to avoid patch releases unless absolutely necessary, noting that most changes are small and not feature-heavy.
Open PRs Discussion
- Historical State Regen (PR 6033): Waiting on Cayman to address comments from Matt's review. No external blockers.
- Proposer Boost Reorg: Needs review; recently updated to resolve conflicts and is now ready for review.
- Late Block Handling: Discussion on the readiness for review, with recent conflict resolution.
- Proposer Boost Merge Timing Concerns: Suggestion to defer merging Proposer Boost due to sensitive changes, like fork choice modifications, until after v1.17 release to maintain stability.
- Julian's eth_getBlockByNumber Fix: Awaiting review completion by Nazar, who has started looking into it.
- Dependabot Updates: Plan to handle asynchronously; may close stale external contributor PRs if no response is received.
- Use Uint32Array for Shuffling Committees: Requires review.
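To show why a Uint32Array suits shuffled committees, the sketch below splits one shuffled index array into committees using views instead of copies. The boundary formula follows the consensus spec's `compute_committee`; the function name is hypothetical and this is not the code from the PR.

```typescript
// Split shuffled validator indices into `count` committees.
// Boundaries follow the consensus spec's compute_committee:
//   start = floor(n * i / count), end = floor(n * (i + 1) / count)
function splitCommittees(shuffled: Uint32Array, count: number): Uint32Array[] {
  const committees: Uint32Array[] = [];
  const n = shuffled.length;
  for (let i = 0; i < count; i++) {
    const start = Math.floor((n * i) / count);
    const end = Math.floor((n * (i + 1)) / count);
    // subarray() returns a view on the same buffer, so no indices are copied.
    committees.push(shuffled.subarray(start, end));
  }
  return committees;
}

const shuffledIndices = new Uint32Array([7, 2, 9, 4, 1, 8, 3]);
const committees = splitCommittees(shuffledIndices, 3);
console.log(committees);
```

Each validator index takes 4 bytes in a Uint32Array, and every committee shares the single underlying buffer, which keeps memory use compact compared with nested arrays of numbers.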
Support for Docusaurus Migration
- General Consensus: Strong support for migrating to Docusaurus due to its flexibility, React-based framework, and compatibility with current web technologies.
- Advantages:
- Integration capabilities, allowing for direct inclusion of components like the prover as example pages.
- Enhanced flexibility and developer familiarity due to its React basis.
- Web3JS Team Experience: Positive feedback on Docusaurus for static documentation, with notes on challenges related to dynamically generated documentation from source code.
- Focus on Documentation: Acknowledgment that Docusaurus is primarily for documentation, fitting the project's current needs.
- Webpack Compatibility: Docusaurus's reliance on Webpack 5 is advantageous for working with light clients.
Concerns and Observations
- Migration Goals: Emphasis on ensuring the migration brings additional value beyond just a platform change, with specific improvements outlined for the documentation.
- Routing and References: Importance of maintaining stable routing and references to avoid breaking external links to the documentation.
Action Items
- Outcome-Oriented Migration: Agreement on pursuing tangible benefits through the migration, not just changing platforms for the sake of it.
- Continued Discussion: Plan to continue the conversation on documentation improvements in an existing issue started by Matthew, focusing on collective effort and planning.
- Analytics and Insights: Interest in exploring tools or plugins for analytics to understand documentation usage and reader interests, potentially through Docusaurus plugins or Google Analytics.
Lodestar Developer Blog
- The idea of establishing a developer blog for Lodestar was discussed, aimed at publishing technical posts for contributors and users.
- Decision: To integrate Lodestar posts within a dedicated section of the ChainSafe blog.
- Rationale: This approach leverages SEO benefits and avoids starting from scratch, fostering a symbiotic relationship with ChainSafe.
Metrics Discussion
- During the refactoring process, it was discovered that approximately 80% of collected metrics are not being utilized in dashboards, raising concerns about the performance burden of collecting and scraping unused metrics.
- The team proposed evaluating which metrics are actually useful by first adding them to dashboards for visualization. This would help determine their utility before deciding to remove any unused metrics.
- A significant gap identified is the lack of documentation explaining the meaning and use of various metrics. Enhancing documentation could improve understanding and utilization of metrics.
- A script exists to assert metrics usage, but a more thorough review is needed to decide on the inclusion of metrics in dashboards.
- The team suggested listing all metrics and creating a checklist to determine their usefulness, with the aim of making informed decisions on which metrics to retain or remove.
- Sharing knowledge and experiences with metrics, as exemplified by a productive discussion between team members in Ho Chi Minh, was highlighted as immensely valuable. Such exchanges can deepen understanding of how different metrics interrelate and impact the system's performance.
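A check like the metrics usage script mentioned above could be sketched as follows. This is not the existing script: the function name and the metric names in the example are made up, and the matching is a simple whole-word search over dashboard query expressions that also accepts the usual Prometheus histogram suffixes.

```typescript
// Returns the metrics that no dashboard expression refers to.
// Metric names are assumed to contain only characters valid in Prometheus names.
function findUnusedMetrics(metricNames: string[], dashboardExprs: string[]): string[] {
  return metricNames.filter((name) => {
    // Whole-word match, also accepting histogram/summary suffixes.
    const pattern = new RegExp("\\b" + name + "(_bucket|_sum|_count)?\\b");
    return !dashboardExprs.some((expr) => pattern.test(expr));
  });
}

// Illustrative names only.
const unused = findUnusedMetrics(
  ["example_peers", "example_gossip_queue_length", "example_block_import_seconds"],
  ["sum(example_peers)", "rate(example_block_import_seconds_sum[5m])"]
);
console.log(unused);
```

The output of such a check is a natural starting point for the checklist proposed above, since it separates metrics that already appear on a dashboard from those that still need a decision.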
Next Steps for Metrics
- Organize an issue to list all metrics and facilitate a collaborative review process. This will allow the team to share insights on the relevance and utility of each metric.
- Continue the conversation on metrics through this organized issue, aiming to enhance both the dashboard and documentation with meaningful and useful metrics.
Tuyen:
n-Historical State Progress
- Regen PR Merged: Continued work on the n-historical state with the successful merge of the regen PR.
- State from Cache and getStateOrBytes API: The next step involves pulling binary data of the upcoming finalized checkpoint, already persisted with the n-historical flag, and persisting that to the state DB.
SSZ Serialization Improvement
- Current Serialization Time: Noted that serialization of 1.5 million validators takes around seven milliseconds.
- Optimization Goal: Aiming to reduce serialization time by simplifying the process, which in minimal testing scenarios, reduced the time to 1.5 to 2 milliseconds.
- Initial Findings: Identified that serialization currently operates based on type structure without caching nodes, despite using ViewDU where fields are cached during state transition. Plans to utilize cached nodes for improved efficiency.
SIMD for sha-256
- Initial Success: Achieved promising results on personal computer and mainnet node (feature four node) using SIMD (Single Instruction, Multiple Data) for sha-256, aiming for parallel processing improvements.
- Benchmarking Hesitation: Despite initial success, benchmarking results did not reflect expected improvements, leading to a pause on further development due to lack of confidence in the approach.
NC:
Progress on Reward Endpoints
- Sync Committee Reward Endpoint: Successfully merged the PR for the sync committee reward endpoint.
- Attestation Reward Endpoint: Completed coding for the attestation reward endpoint and has the PR ready for review.
EIP 6110 and Effective Balance Increment Issue
- Investigated the effective balance increment issue related to EIP 6110. Conducted local testing but indicated the need for more extensive testing before opening a PR.
Block Reward Unit Tests Optimization
- Explored solutions to address the issue of block reward unit tests taking excessive time, as highlighted by Tuyen.
- Attempted to use the unit test version of Altair states following Tuyen's suggestion but found it insufficient for the requirements, which include needing a mock block that can undergo state transition.
- Plans to develop a more comprehensive solution, potentially enhancing the test utilities to support this method in the future.
Gajinder:
- Blobs Pool: Continued work on optimizing the blobs pool.
- SSZ Merkle Caching: Submitted a PR for caching the SSZ Merkle of lists, which is currently up for review.
- Inclusion List: Reviewed the ongoing discussions about the inclusion list but did not have significant updates from the previous week.
- Electra Proposals: Engaged in reading and understanding new proposals for the upcoming Electra upgrade, including Lion's proposal for inactivity score improvements among other topics.
Nico:
- SSZ Refactor Cleanup: Focused on cleaning up the SSZ refactor branch, preparing it for further development.
- PR Reviews: Dedicated time to reviewing PRs from contributors and other team members. Plans to review NC's PR and provide feedback.
- SSZ Refactor Progress: Awaiting the merge of NC's PR to perform another rebase on the SSZ refactor branch. Anticipates addressing and resolving numerous build issues post-rebase.
- Focus: The main focus remains on advancing the SSZ refactor to a buildable state, navigating through the challenges of integrating changes and ensuring compatibility.
Cayman:
Prysm's Hash Tree Library for JavaScript
- New Repository: Created a new repository named `hashtree-js` to make Prysm's hash tree library accessible in JavaScript.
- CI Issues: Encountered difficulties with CI builds, particularly for Windows and macOS, where builds are failing.
- Project Details: The project is a straightforward `napi-rs` project, involving a C program built through Cargo's build process, with a Rust wrapper around the C library.
- Performance: The performance of the library is reported to be very good.
- PR Reviews: Continued reviewing PRs and is available for assistance. Urges team members to reach out if needed.
- Upcoming Leave: Cayman announced he will be getting married on the 23rd and will be off for several weeks, approximately three to four weeks, starting from the 20th.
Nazar:
- Moved the withdraw interop test to the SIM test suite. The PR is now open and ready for review.
- Working on a PR related to the builder, expected to be opened by Tuesday. This effort aims to transition most existing sim tests within the beacon node repository to the sim test suite and some to end-to-end tests, concluding this particular chapter of work.
UI/UX for Light Client Demo
- Engaged in brainstorming sessions to enhance the UI/UX for the light client demo. The goal is to make the demo more attractive and user-friendly with improvements such as sync committee animations and better user experience designs.
- Decided against merging the light client demo with the prover demo into a single page, opting to keep them separate for clarity and focus.
- Plans to implement suggestions from Julien to improve the demo's UX by simplifying configuration and setting default values, making it more accessible to users.
- Future work will focus on refining the light client demo and developing a separate prover demo.
Julien:
- Creation and Management: Spent time creating "good first issues" for external contributors. Noticed an increase in external contributions, suggesting a need for a strategy to manage these issues effectively, considering the team's overhead in overseeing them.
- Long-term Engagement: Emphasized the importance of engaging contributors for long-term involvement rather than one-off contributions.
Light Client Library Testing
- Environment Testing: Continued testing the light client library across various environments, including React Native and Cloudflare Workers, noting compatibility issues.
- Docusaurus Testing: Tested with Docusaurus, based on Webpack 5, showing potential compatibility. Optimistic about making the light client work with Docusaurus due to its Webpack 5 basis.
Light Client Demo Improvements
- UX Improvements: Worked on fixing longstanding issues in the light client demo and enhancing user experience. Acknowledged the demo's potential but noted its current UX complexity.
- Collaboration with Nazar: Plans to collaborate with Nazar on refining the demo, focusing on making it more user-friendly and showcasing the value of the light client.
Matt:
- Shuffling Refactor Issue Diagnosis: Identified issues with sim tests failing, suspecting incorrect shuffling cache assignments. Currently investigating the cause, which seems related to cache management rather than the shuffling algorithm itself.
- Code Review with Hubert: Received valuable feedback from Hubert, a C developer, on the BLS refactor. Focused on addressing comments related to buffer safety and other C-specific concerns to enhance code quality.
- BLS Refactor Progress: Worked on incorporating Hubert's feedback to prepare the BLS refactor for merging. Aims to integrate these changes into the main branch, eliminating the need for a separate rebuild folder and dual testing.
Agenda: https://github.com/ChainSafe/lodestar/discussions/6467
Update on Optimizations and Patches for Deneb
Optimizations:
- Handling big blocks by changing the SSZ library to not cache roots for composite lists, aiming to optimize for full blocks in memory.
- Implementing cached root optimizations for transactions and withdrawals to avoid recalculating roots.
- Considering extending the publish API for validators to include the block root they signed over, potentially saving recalculations.
- Aggressive Pull of Blobs: Addressing a ~2% blob drop rate on Holesky by proposing an aggressive pull strategy for blobs to improve efficiency.
Patch Release Considerations:
- Debated whether to include the aggressive block pull optimization in the next patch release or to wait.
- Gajinder suggested focusing first on the blobs part, as big blocks are unlikely on mainnet due to optimizations already merged.
Release Timing:
- Discussed the possibility of waiting for the aggressive block pull strategy to be ready before releasing the next patch, targeting completion by Friday before the fork date on the 13th.
Strategy Going Forward:
- The team leaned towards waiting for the next set of optimizations before pushing out a new release, aiming to avoid confusion before the hard fork.
- Considered following a similar approach to Geth, offering optional upgrades for optimizations that are unlikely to impact mainnet performance.
Light Client Roadmap for Electra
- Sync Committee Slashing
- The team acknowledged the importance of introducing sync committee slashing to enhance trust and security in the light client infrastructure. This addition is seen as crucial for making the client infrastructure more reliable by penalizing misbehavior.
- Light Client Data Backfill
- There was a debate on the necessity and complexity of adding light client data backfill to the protocol. The current scheme does not support backfilling data efficiently, which is a limitation that the proposal aims to address.
- Security and Use Cases
- Questions were raised about the security benefits of adding slashing and how it would enable new use cases, especially considering its potentially low impact on securing high-value applications like bridges.
- Canonical Data and Weak Subjectivity
- The discussion touched on the need for light clients to sync from periods before the weak subjectivity period, questioning the practicality and security implications of such a feature.
- Meeting with Etan
- The team considered organizing a breakout session with Etan to address questions and gain further context on the proposal, potentially leading to a monthly light client-focused meeting.
Stale PRs Cleanup
- Suggested cleaning up old, stale PRs across repositories, noting that some repos are neglected with a lot of stale content.
- Discussed establishing basic guidelines on handling robot dependency upgrade PRs.
- Consideration on whether to accept automatic dependency upgrade PRs, especially noting that most stale PRs are related to these.
- Observed that many stale PRs are for dev dependencies, which are less critical, especially if they involve indirect dependencies of tools like webpack.
- Suggested that accepting PRs for dev dependencies might not significantly impact security and could be a straightforward way to manage some of the backlog.
- Cayman proposed tackling the cleanup process asynchronously, possibly in a one-on-one session, to efficiently address and close unnecessary PRs.
- Recommended creating an issue with a checklist of PRs to be reviewed and closed, allowing for a systematic approach to the cleanup process.
Move ENR-app to discv5
- Action approved to move it into the discv5 monorepo
Removing Support for Older Node Versions
- Existing practice is to remove support for older Node.js versions when a new LTS version is adopted.
- Suggested supporting one major even-numbered version back, implying Node.js 18 should not be dropped until version 22 is released.
- Consideration of when support for version 18.17 was inadvertently broken, despite being listed as supported in `package.json`.
- Proposed running unit tests against two Node.js versions to ensure compatibility and facilitate a rolling update strategy.
- Discussed the impact on CI time, with a suggestion to perform these checks with each release rather than continuously, to avoid prolonging CI processes.
- Mentioned that unit tests are relatively fast and wouldn't significantly add to total CI time.
- Emphasized the importance of identifying the specific commit or PR that breaks compatibility with an older Node.js version for accurate release notes.
Action Items for deprecating Node.js support
- Update the `engines` field in `package.json` to warn users of incompatible Node.js versions during installation.
- Plan to prepare for the release of Node.js 22, ensuring continued support for Node.js 20 until then.
- A PR will be created to test the proposed strategy on CI and evaluate the impact on test duration.
- Prepare for the upcoming Node.js 22 release by ensuring compatibility with Node.js 20 is maintained.
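The `engines` field only takes effect at install time, so a runtime version comparison can complement it. The sketch below is illustrative only: `parseVersion`, `isAtLeast` and the minimum version shown are assumptions, not Lodestar code or an agreed support policy.

```typescript
// Hypothetical runtime guard complementing the "engines" field in package.json.
// The minimum version used below is an illustrative assumption.
type Version = {major: number; minor: number; patch: number};

function parseVersion(version: string): Version {
  const [major, minor, patch] = version
    .replace(/^v/, "")
    .split(".")
    .map((part) => parseInt(part, 10));
  return {major: major || 0, minor: minor || 0, patch: patch || 0};
}

// Returns true when `current` is the same as or newer than `minimum`
function isAtLeast(current: string, minimum: string): boolean {
  const a = parseVersion(current);
  const b = parseVersion(minimum);
  if (a.major !== b.major) return a.major > b.major;
  if (a.minor !== b.minor) return a.minor > b.minor;
  return a.patch >= b.patch;
}

// At startup this could be called as isAtLeast(process.versions.node, minimumNode)
const minimumNode = "20.1.0";
const oldNodeSupported = isAtLeast("18.17.0", minimumNode);
const newNodeSupported = isAtLeast("v22.0.0", minimumNode);
```

Logging the result at startup would also make it easier to spot which release first broke an older Node.js version.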
Matt:
- Implemented a `multiply by` function to address a security vulnerability identified in the crypto library.
- Successfully integrated the fix into BLST and BLS, and subsequently into Lodestar.
- The implementation and performance have exceeded initial expectations, with no performance issues reported.
- Deployed the update approximately 6 hours prior to the standup, noting that it's performing better than the unstable version.
- Awaiting a full day's metrics to comprehensively assess the impact and stability of the changes.
- Finalizing the last unit tests and integrating metrics into the shuffling PR, with completion expected by tomorrow.
- Plans to create a PR to move the rebuild branch onto the main in the block repository, indicating readiness for broader use.
Tuyen:
Exploration of SIMD with as-sha256
- Investigated the use of SIMD (Single Instruction, Multiple Data) to enhance the performance of sha256 by processing multiple inputs in parallel.
- Achieved a performance improvement: 25% better on a MacBook and 70-80% better on a server.
Proposed Improvements
- Identified potential improvements to the digest 64 process by:
- Chaining the extended block with the main loop inside a hash block.
- Using `Uint8Array.slice` instead of getting a subarray to separate data from `Uint8Array`.
- These improvements could result in a 60-70% performance enhancement.
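For context on the second point, the two calls differ in what they return: `subarray` creates a view over the same underlying buffer, while `slice` copies the bytes into a new buffer. The sketch below only demonstrates that semantic difference. It does not reproduce the reported benchmark, and which call is faster depends on the engine and the data size.

```typescript
const data = new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8]);

// subarray: a new view over the same ArrayBuffer, no bytes are copied
const view = data.subarray(0, 4);
// slice: a new Uint8Array backed by its own, freshly copied buffer
const copy = data.slice(0, 4);

// Mutate the original after both were created
data[0] = 99;

const viewSharesMemory = view[0] === 99; // the view sees the write
const copyIsIndependent = copy[0] === 1; // the copy keeps the old byte
const sameBuffer = view.buffer === data.buffer; // the view reuses the original buffer
```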
Julien:
Light Client Libraries in Browsers
- Explored integrating light client libraries with modern bundlers like Vite and Parcel for browser usage.
- Faced challenges making it work seamlessly, indicating the process is not trivial.
- Created issues with ideas to outline potential improvements.
Dependency Cleanup
- Identified the need for cleaning up dependencies, either due to lack of maintenance or minimal value to the project.
- Initiated some pull requests to remove such dependencies, suggesting these tasks could be good first issues for newcomers.
Light Client and Prover Learning
- Continued learning about light client and prover functionalities.
- Addressed a prover issue and is in the process of responding to comments from Nazar to determine the best solutions.
Cayman:
- Committees Optimization PR: Introduced an experiment converting committees into a single `uint32` array for slicing, achieving a 5% reduction in total running memory.
- Snappy for Gossip Messages PR: Implemented encoding and decoding of gossip messages using Snappy for larger payloads and snappy JS for smaller payloads, aiming to deploy to a feature branch for testing.
- Explored using shared array buffers for data transfer from the networking thread to the main thread to reduce event handling. The initial deployment showed poor performance, with ongoing investigation into the cause.
- Discussed previous work on optimizing SHA-256, showing potential for significant performance improvements based on benchmarks against Prysm's implementation.
- Open to further developing this work for production use, noting Prysm's capability to hash full state in milliseconds.
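The committees idea can be pictured as one flat typed array plus an offsets table, with each committee read back as a view. This is a rough sketch under that assumption: `FlatCommittees` is a hypothetical name and the layout is not taken from the PR.

```typescript
// Hypothetical layout: all committees' validator indices packed into one Uint32Array,
// with an offsets array marking where each committee starts and ends.
class FlatCommittees {
  private readonly indices: Uint32Array;
  private readonly offsets: Uint32Array;

  constructor(committees: number[][]) {
    const total = committees.reduce((sum, committee) => sum + committee.length, 0);
    this.indices = new Uint32Array(total);
    this.offsets = new Uint32Array(committees.length + 1);
    let position = 0;
    committees.forEach((committee, i) => {
      this.offsets[i] = position;
      this.indices.set(committee, position);
      position += committee.length;
    });
    this.offsets[committees.length] = position;
  }

  // Returns a view into the shared buffer; no per-committee array is stored
  getCommittee(i: number): Uint32Array {
    if (i < 0 || i + 1 >= this.offsets.length) throw new RangeError(`No committee at index ${i}`);
    return this.indices.subarray(this.offsets[i], this.offsets[i + 1]);
  }
}

const flat = new FlatCommittees([[5, 9, 2], [7], [1, 3]]);
const second = flat.getCommittee(1);
const third = flat.getCommittee(2);
```

Keeping a single typed array avoids one JavaScript array object per committee, which is one plausible source of the memory saving mentioned above.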
Nico:
Beacon API Release and Updates
- Announced the release of beacon APIs and updated Julian's branch to align with the latest spec, enabling merging and testing against updated specifications.
- Confirmed that all desired fixes are now incorporated, allowing for the removal of workarounds previously implemented in API tests.
SSZ Refactor Progress
- Completed the translation of all routes for the SSZ refactor, including the previously missing key manager.
- Working on refactoring the spec tests and planning to rebase the refactor branch again.
- Mentioned the potential need to merge the rewards APIs (pending confirmation from NC) but indicated no rush, as refactoring for those routes can be addressed later.
NC:
Beacon API Development
- Sync Committee Reward Endpoints: Submitted a PR for review, with Nico providing extensive feedback. Plans to address these comments.
- Attestations Reward Endpoint: Initiated a draft PR for the last reward endpoint related to attestations, currently halfway through development.
- Issue 6110 on Effective Balance Increments: Investigated the effective balance increments issue raised by Gajinder, with an initial solution in mind. Coding and testing of the solution are pending.
- Inclusion List PoC for ACDC: Participated in last week's ACDC discussion where Lion highlighted the need for an Inclusion List Proof of Concept (PoC).
- Engaged in Discord channel discussions related to the PoC, noting that immediate development has not yet commenced.
Gajinder:
Optimizations and Protocol Enhancements
- Focused on optimizations and delving into ePBS (Enshrined Proposer Builder Separation), IL (Inclusion Lists), and pre-confirmations, updating knowledge with the latest information.
- Aimed to implement aggressive blob pull and complete the 7002 implementation for IL support.
Questions and Concerns
- Raised pertinent questions regarding the design and rationale behind forward Inclusion Lists (IL), particularly questioning the emphasis on optimizing builder payouts over addressing censorship concerns.
- Expressed reservations about the forwardness of IL, preferring a model where proposers can directly inform builders of their CR (Censorship Resistance) list or inclusion list for the same slot.
- Noted the absence of clear explanations for discarding same-slot IL proposals in favor of forward IL, aiming to fully understand the proposed design's alignment with ePBS.
Agenda: https://github.com/ChainSafe/lodestar/discussions/6434
v1.16 Release Discussion
- Progress on v1.16: Discussion on the current status and remaining tasks for the v1.16 release.
- A dependency upgrade PR has been merged and included in the RC, specifically the discv5 update, which is expected to fix previously identified errors.
- Several PRs tagged for 1.16 are mostly merged, with a few remaining open for fixes and enhancements, including typo fixes and adding logs for HTTP retries.
- The team plans to merge these PRs promptly to cut another RC from the unstable branch, focusing on small changes like CLI flags and logs.
Performance and Memory Usage
- Performance Issues: A performance issue was noted in Holesky, potentially due to a recent increase in validators. The team plans to confirm this on mainnet.
- Memory Usage: Increased RSS memory usage observed, up to 9GB on `beta-mainnet` server. The team is considering this in the context of the 1.16 release and potential impacts.
Release Strategy and Versioning
- Handling Breaking Changes: Discussion on managing breaking changes, especially in CLI flags, and the versioning scheme for customer-facing products.
- The team considers major version bumps or separating the Prover from the monorepo to avoid versioning conflicts.
- The possibility of independent versioning for specific packages within the monorepo was discussed, with a focus on the implications for the Prover and other unique dependencies.
- Contingency Plans: Should significant performance issues be confirmed in v1.16, the team discussed contingency plans, including:
- Potential for a 1.15.2 Release: If necessary, a patch release (1.15.2) could be considered to incorporate critical commits without the changes introduced in 1.15 that may be causing issues. However, this approach is complicated by the fact that 1.15.1 also exhibited similar problems.
- Reverting Changes in a 1.16.1 Release: Another approach could involve releasing 1.16.1, reverting problematic changes from 1.15, and including only the Deneb config. This would allow for a focused investigation into the cause of the issues, to be addressed in a subsequent 1.16.2 release.
Project Updates
- Shuffle Refactor: Substantial completion of the shuffle refactor, with ongoing work on metrics for cache hits and misses.
- BLST Library: Approval and positive feedback on the BLST library work, with plans to address a security bug identified by the Lighthouse team related to message optimization.
- Blinding Blocks: Progress on rebasing and cleaning up the blinding blocks PR, with plans to finalize and address any outstanding issues.
Matt:
- Shuffle Refactor: The shuffle refactor is substantially complete, with attention now turning to metrics for cache hits and cache misses. A slight refactor may be needed to integrate these metrics fully.
- Collaboration with Julian: Discussed the "past monster" issue with Julian, with plans to solidify the approach used.
- BLST Approval: Received approval on BLST work, which is now looking very promising. There's excitement about the progress and the quality of the work.
- Security Bug Mitigation: A security bug identified by the Lighthouse team is being addressed. Tuyen has developed a proof of concept for mitigation, and Matt plans to implement a function to address this.
- Cleanup and Publication: Additional cleanup is required before the changes can be merged into the main branch and published. Progress is close to completion, with optimism about nearing the finish line.
- Blinding Blocks and PR Rebasing: The rebasing for blinding blocks has been cleaned up after discussions with Julian. Matt plans to finalize this work and then consult with Gajinder about an issue related to the sim merge test. The test failed due to a missing API function (`get payload body`) in the execution layer container, which needs to be addressed either by updating the container or adjusting the test strategy.
- Looking Ahead: The focus for the upcoming week includes finalizing the blinding blocks work and addressing the sim merge test issue, with a positive outlook on the progress made and the tasks ahead.
Cayman:
- Tree Hashing Exploration: Investigated alternative methods for tree hashing in Lodestar, focusing on memory efficiency and performance improvements. Experimented with different SHA implementations and storage methods for hashes.
- HackMD Documentation: Shared findings and exploratory work in a HackMD document, accessible here.
- Exploration Outcomes:
- Initial attempts using NAPI-RS and exploring various storage and implementation strategies did not yield memory or performance improvements over the current setup.
- AssemblyScript SHA-256 remains the most efficient for hashing two 32-byte arrays compared to Rust implementations and a ported version of AssemblyScript SHA-256 into Rust.
- The hash tree library from Prysm, designed for bulk hashing, showed potential for speed improvements when hashing large contiguous data arrays.
- Potential for Bulk Hashing Optimization: Identified a possible optimization for parts of the tree that change entirely at once (e.g., balances) by using bulk hashing techniques, which could significantly improve performance.
- Memory Usage Comparison: Noted that Lighthouse also consumes around 8 GB on Holesky, suggesting that Lodestar's memory usage is competitive. Further exploration into Rust and NAPI wrapping indicated that memory efficiency challenges are partly due to the inherent overhead of Node.js structures.
- Next Steps: Cayman plans to continue exploring hashing optimizations and invites team members interested in this area to join the discussion in the developer channel.
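To make the distinction concrete, the sketch below contrasts hashing one pair of 32-byte nodes with hashing a whole tree level stored as one contiguous byte array. It uses Node's built-in `crypto` module for SHA-256 rather than the AssemblyScript implementation or Prysm's library, and `hashPair` and `hashLevel` are hypothetical names. It illustrates the data layout a bulk hasher consumes and says nothing about performance.

```typescript
import {createHash} from "node:crypto";

// Hash two 32-byte nodes into their 32-byte parent
function hashPair(left: Uint8Array, right: Uint8Array): Uint8Array {
  return new Uint8Array(createHash("sha256").update(left).update(right).digest());
}

// Hash one tree level stored contiguously: every 64 input bytes yield 32 output bytes
function hashLevel(level: Uint8Array): Uint8Array {
  if (level.length % 64 !== 0) throw new Error("Level length must be a multiple of 64 bytes");
  const parents = new Uint8Array(level.length / 2);
  for (let offset = 0; offset < level.length; offset += 64) {
    const digest = createHash("sha256").update(level.subarray(offset, offset + 64)).digest();
    parents.set(digest, offset / 2);
  }
  return parents;
}

// Four 32-byte leaves (filled with 1, 2, 3, 4) reduced to a single root
const leaves = new Uint8Array(128);
for (let i = 0; i < 4; i++) leaves.fill(i + 1, i * 32, (i + 1) * 32);
const root = hashLevel(hashLevel(leaves));
```

A dedicated bulk hasher would process many 64-byte blocks per call instead of looping one digest at a time, which is where the speedup on large contiguous arrays would come from.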
NC:
- Research Catch-up: Focused on catching up with discussions on Inclusion Lists (IL), Enshrined Proposer Builder Separation (ePBS), and max effective balance (MaxEB) from the past two weeks.
- PR Merges: Thanked Gajinder for merging the 6110 PR into the Electra branch and outlined plans for follow-up PRs related to 6110.
- Notify New Payload: Plans to start work on the early call for the notify new payload feature proposed by Tuyen.
- Optimization of Effective Balance Array: Gajinder requested NC to look into optimizing the effective balance array fixed in the 6110 PR, which currently isn't optimized for memory.
- Notify New Payload Latency: Discussed the latency involved in calling notify new payload, with observations indicating a 10-20 millisecond latency upon block reception. The discussion extended into the efficiency of this process and comparisons with Nethermind's response times.
Nethermind Latency Observations:
- Initial Observations: Noted that Nethermind quickly responds to payloads it has already seen, but exhibits a significant latency of approximately 300 milliseconds for new payloads. This latency was identified through detailed analysis of the time from making a call to receiving the first streaming response.
- Investigation Findings:
- For already seen payloads, Nethermind's response time is very fast, approximately 10-20 milliseconds, indicating efficient handling of known data.
- For new payloads, the latency before Nethermind starts sending data back is around 300 milliseconds. This delay aligns with Nethermind's logs, which show a 200-millisecond gap before acknowledging receipt of a new block.
- Analysis of Latency Causes:
- The latency for new payloads is not attributed to Lodestar's network thread being busy or any inefficiencies in Lodestar's handling of network responses.
- The delay primarily occurs on Nethermind's side, from the moment Lodestar sends out the call to when Nethermind begins to respond. This was corroborated by matching timings with Nethermind's logs.
- Communication Efficiency:
- For repeated blocks, the observed low latency demonstrates the actual communication efficiency between Lodestar and Nethermind, excluding block processing time.
- The increased latency for new payloads suggests a delay on Nethermind's part in processing and responding to new block information.
Implications and Next Steps
- Optimization Focus: The findings suggest that while Lodestar's notify new payload calls are made efficiently, the optimization focus should perhaps shift towards understanding and reducing the latency of payload processing on Nethermind's side.
- Further Analysis: Additional comparisons with other clients like Geth, and further metrics implementation on Lodestar's side, could help pinpoint optimization opportunities and improve overall response times for new payloads.
- Comparison with Other Clients: Highlighted the need to compare Lodestar's performance with other clients, particularly focusing on the latency of payload processing and the efficiency of notify new payload calls.
Nico:
- Issue Review for Release: Reviewed all issues tagged for the upcoming release, including those assigned during the retreat. Picked up a variety of tasks, focusing on smaller, scattered issues.
- PR Status: Noted that some PRs are still open. While Nazar reviewed some, they are not critical for merging in the 1.16 release, except for two tagged PRs.
- Beacon API Spec Review: Conducted a review of the beacon API specification to finalize adjustments before the release. The focus was on cleaning up details to prepare for the 3.0 release.
- API Deprecation and Release Planning: Discussed the approach to API deprecation, noting that newly deprecated APIs from Capella will not be removed until after one more hard fork, adhering to the consensus on API removal timing. This strategy aims to ensure stability and backward compatibility.
- Release Timeline: Anticipated cutting the 3.0 release possibly within the week, pending final reviews and adjustments.
- Miscellaneous Fixes: Engaged in fixing minor issues across various aspects of the project, contributing to overall improvements and readiness for the upcoming release.
Tuyen:
- Gossipsub Migration: Worked on migrating protobufs to protons for gossipsub during vacation. The work has been merged.
- N-Historical State PR: Plans to divide the large n-historical state PR into smaller PRs for easier review. The first one focuses on regeneration logic and when to use the API after state cache.
- Bug Fixes: Addressed a bug seen in Sepolia related to block head size and added the error fix in the PR.
- BLS as a Map Update: Compared Matt's branch for BLS implementation and identified the need for randomizing factors, which, however, doubles the processing time. Plans to refactor for cleaner implementation and maintain two versions for optimization.
- Performance Concerns: Noted the traffic through the worker boundary and the double queuing of attestations as areas for potential optimization. Also highlighted the need to prepare for EIP 7549 changes affecting attestation data.
Discussion Highlights
- EIP 7549 Changes: Discussed the ongoing discussions around EIP 7549, particularly moving the attestation index out of the attestation data, and how it might affect Lodestar's optimizations.
- Latency Between Main and Worker Threads: Raised concerns about the 20-millisecond latency from main to worker threads, suggesting it adds significant delay. Discussed exploring other worker libraries that use Atomics for performance optimization.
- Optimization Strategies: Agreed on the importance of optimizing the latency between main and worker threads and considered prioritizing workers CPU-wise. Also discussed the need for breaking down the BLS work into multiple PRs for clearer performance impact assessment.
Gajinder:
- 6110 PR for Electra: Focused on making the 6110 PR mergeable into the Electra branch by resolving issues, ensuring test files pass, and overall preparation for integration.
- Deneb Related PRs: Worked on PRs related to the upcoming Deneb upgrade, alongside extensive research on Enshrined Proposer Builder Separation (ePBS), Inclusion Lists (ILs), and pre-confirmations.
- 7002 PR and Devnet Preparations: Aiming to address EIP-7002 PR next, to ensure the Electra branch is ready for any upcoming Devnet activities. This involves achieving parity with the latest consensus specs, particularly 6110 and 7002 merges.
- ePBS and ILs in Electra: Advocated for starting ePBS trials with Prysm, despite its likely exclusion from the Electra upgrade due to scope considerations. However, shifted stance to support the inclusion of ILs in Electra, citing benefits for censorship resistance and pre-confirmations.
- MaxEB Considerations: Discussed the potential inclusion of Max Effective Balance (MaxEB) in Electra, noting its relative ease of incorporation if decided.
Agenda: https://github.com/ChainSafe/lodestar/discussions/6313
Sepolia/Holesky Hard Fork Release:
- Focus on finalizing the release for the Holesky and Sepolia hard fork.
- Also Chiado hard fork ready
- Deploying v1.15.0-rc.0 after Chiado merge is complete and ready for inclusion
- Block production update PR not required for v1.15, needs reviews
Memory Limit Increase Discussion:
- Debating the need to increase the default memory limit. Agreement on raising the limit to 8GB to ensure stability during periods of non-finality.
- Decision to set the limit commensurate with Holesky's requirements and communicate it as a risk mitigation step.
Invalid Block Errors from Execution
- A community member reported a `PROTO_ARRAY_INVALID_LVH_EXECUTION_RESPONSE` error in Lodestar during the Nethermind consensus issue. The team sought to understand the nature of this irrecoverable error and communicate it effectively to the community.
Explanation of the Error:
- Block Validation Process: When Lodestar sends a block to the execution engine (EL), it expects a response on the block's validity. The EL indicates whether the block is valid or invalid and provides the latest valid hash in its canonical chain.
- Error Propagation: If a block is marked invalid, this status is propagated up to the latest valid hash in the fork choice. An inconsistency arises if a block previously marked valid is later deemed invalid (or vice versa).
- Fork Choice Poisoning: This inconsistency poisons the fork choice, creating uncertainty about the validity of blocks. Since it's challenging to revert these changes, Lodestar is designed to shut down and restart in such cases.
- Dependency on EL Client: If the EL client continues to exhibit the same behavior, the issue recurs, leading to repeated fork choice poisoning. The problem is essentially irrecoverable without human intervention or a fix in the EL client.
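The consistency check described above can be sketched roughly as follows. This is a simplified illustration, not Lodestar's fork choice code: the `BlockNode` shape and the `applyInvalidResponse` helper are hypothetical, and only the error code string comes from the report. The sketch validates the whole chain segment before mutating anything, so a contradictory EL response is rejected without leaving partially updated statuses behind.

```typescript
// Hypothetical, simplified model of execution status tracking in fork choice.
type ExecutionStatus = "syncing" | "valid" | "invalid";

interface BlockNode {
  blockHash: string;
  parentHash: string | null;
  status: ExecutionStatus;
}

/**
 * Apply an INVALID response from the EL. Every block from `invalidBlockHash`
 * back to (but excluding) `latestValidHash` must become invalid. If any of
 * them was already reported valid, the EL has contradicted itself.
 */
function applyInvalidResponse(
  nodes: Map<string, BlockNode>,
  invalidBlockHash: string,
  latestValidHash: string
): void {
  const toInvalidate: BlockNode[] = [];
  let current = nodes.get(invalidBlockHash);
  while (current !== undefined && current.blockHash !== latestValidHash) {
    if (current.status === "valid") {
      // The EL changed its mind about a block it had already validated.
      throw new Error(
        `PROTO_ARRAY_INVALID_LVH_EXECUTION_RESPONSE: block ${current.blockHash} was valid, now invalid`
      );
    }
    toInvalidate.push(current);
    current = current.parentHash === null ? undefined : nodes.get(current.parentHash);
  }
  // Only mutate once the whole segment has been checked.
  for (const node of toInvalidate) node.status = "invalid";
}
```

In this sketch the caller decides what to do with the thrown error; the notes describe Lodestar shutting down in that situation.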
Proposed Actions:
- Error Logging Improvement: Enhance error logging to provide clearer information about the issue, including specifics about the blocks causing the problem.
- Communication Strategy: Explain to users that the EL client has changed its decision on a block's validity, leading to a critical error in Lodestar's fork choice.
- Documentation: Develop a handbook detailing scenarios where specific error codes may arise, aiding in user understanding and troubleshooting.
Eth1Data Deposits for Block Production
- The current process involves attempting to retrieve and process ETH1 deposits at the time of forming the block body. This approach has been identified as problematic and is likely contributing to the missed block proposal issue.
- Pre-Triggering Deposit Processing: The team suggests that the process of retrieving and processing ETH1 deposits should be triggered beforehand, rather than during the block proposal phase.
- Data Caching: It is recommended that the relevant data be cached and prepared in advance, rather than being processed in real-time during block proposal.
- Fallback Mechanism: In cases where pre-triggering is not feasible or the data is not ready, the system should default to using whatever data aligns with the current last block to run the proposal.
- While the root cause of the delay in processing ETH1 data and deposits is still under investigation, implementing the proposed changes to the deposit retrieval process is considered a priority, even though the edge case is quite rare.
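A minimal sketch of the pre-triggering, caching and fallback ideas above, under stated assumptions: `Eth1DataAndDeposits`, `Eth1Fetcher` and `Eth1ProposalCache` are hypothetical names, and the real eth1 provider and types in Lodestar differ. The point it illustrates is that the proposal path reads from a cache synchronously and never waits on the eth1 provider.

```typescript
// Hypothetical shapes for illustration only.
interface Eth1DataAndDeposits {
  eth1BlockHash: string;
  deposits: string[];
}

type Eth1Fetcher = (slot: number) => Promise<Eth1DataAndDeposits>;

class Eth1ProposalCache {
  private readonly prepared = new Map<number, Eth1DataAndDeposits>();
  private readonly fetchEth1: Eth1Fetcher;
  private fallback: Eth1DataAndDeposits;

  constructor(fetchEth1: Eth1Fetcher, fallback: Eth1DataAndDeposits) {
    this.fetchEth1 = fetchEth1;
    this.fallback = fallback;
  }

  /** Call ahead of the proposal slot, e.g. once proposer duties are known. */
  async prepare(slot: number): Promise<void> {
    try {
      this.prepared.set(slot, await this.fetchEth1(slot));
    } catch (_e) {
      // Leave the slot unprepared; getForProposal() will use the fallback.
    }
  }

  /** Keep the fallback aligned with the latest imported block. */
  onBlockImported(data: Eth1DataAndDeposits): void {
    this.fallback = data;
  }

  /** Synchronous on the proposal path: returns prepared data or the fallback. */
  getForProposal(slot: number): Eth1DataAndDeposits {
    const data = this.prepared.get(slot) ?? this.fallback;
    this.prepared.delete(slot);
    return data;
  }
}
```

Whether the fallback data yields a valid block depends on the state at proposal time; the sketch does not check that.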
NC:
- Spent time reviewing Electra EIP candidates to form opinions on each EIP.
- Conducted a detailed review of the MaxEB (Maximum Effective Balance) specifications, preparing for potential implementation.
- Continued follow-up on the existing PR for block reward endpoints, addressing comments and refining the code.
- Investigating and addressing comments related to the proposal boost feature.
- Aim to stabilize the code for block reward endpoints and prepare the synchronization from the endpoint PR for the upcoming week.
- Begin implementation of the attestation reward endpoints.
Gajinder:
- Reviewing PRs and EIPs for Electra inclusion
Tuyen:
- Completed the last part of the n-historical state work related to the buffer pool.
- Rebased the n-historical state against stable and addressed the invalid state root issue.
- Updated the cache to always return the current version of the state to prevent issues if the state is mutated. No issues observed after four days of testing.
- Implemented a debug version to persist block states at the last slot of an epoch and the dialed state to the next epoch for further investigation if the issue recurs.
- Addressed an issue with an unknown client sending bad SSZ responses regarding metadata. Merged a fix to block unknown clients.
- Created an issue to downscore peers that send SSZ errors, which Matt is reviewing.
- Investigated an issue related to block production and ETH1 deposits. Created an issue for further analysis as the problem is rare.
- Noticed frequent delays in attestations taking up to one second. Successfully reproduced the issue in PerfTest and working on a fix.
Nico:
- Addressed issues where errors were thrown on the API for already known attestations. Updated the system to ignore these based on the spec and avoid 500 responses. This update should help both in DVT setup and for those running fallback nodes.
- Noticed some configuration parameters were set as constants. Reviewed and aligned presets and configs with the latest spec.
- Updated networking terms to support the Gnosis Chiado fork, which recently moved certain values to the config. This change aligns with other clients like Teku.
- Discussed the possibility of moving more parameters to configs for customization and closer adherence to the spec, though not considered urgent.
- Investigated better .lock file handling methods to address issues where power loss or unexpected shutdowns prevent restarting a validator client. Exploring libraries like LevelDB for smarter lock file management.
- Delayed rebasing the SSZ branch due to many open branches modifying API stuff. Suggested merging these first.
- Conducted reviews on various open branches and planned to review the rewards API after the meeting.
Cayman:
- Made small PRs in the libp2p monorepo and Ethereum.js, focusing on cleaning up discv5.
- Addressed an issue reported by Nico regarding unhandled promise rejections in discv5.
- Opened a PR in the Discv5 repo that refactors callback handling and adds try-catch blocks to prevent unhandled promise rejections. The PR is ready for review.
- Worked on the add historical state regen PR.
- Integrated the newly published classic-level and added panels in a new dashboard for timing analysis.
- Deployed the feature to feat3 server group for testing. It functions but is slow, taking about two minutes to retrieve a historical state.
- The PR is marked as ready for review, though it requires further investigation to improve performance.
- The slow retrieval time for historical states is a concern. The issue might be related to the storage frequency of historical states (every ~thousand epochs), potentially leading to numerous epoch transitions during retrieval.
- Metrics did not clearly indicate the cause of the delay, suggesting the need for more in-depth investigation.
Julien:
- Continuing work on cleaning the beacon API repository. Nearing completion for a new release, though unclear about the release schedule and management in the repo. Seeking assistance for the release process.
- Implemented beacon API-related updates in Lodestar, including:
- Filtering on the blob sidecar, addressing a missing feature in recent beacon APIs.
- Ensuring the light client event has the proper shape and adding version information.
- Investigating the addition of ERC55 support to execution addresses, though decision-making on adding dependencies and addressing security concerns is needed.
- Proposed adding test cases to the beacon API repository using OpenAPI syntax.
- The goal is to integrate automated tests directly into the specification, ensuring all implementers share the same test cases and have a clearer understanding of API intentions.
- Initiated discussions with Nazar to understand and potentially contribute to the project, aligning with the team's goals for the year.
Nazar:
- Completed the last PR for cleaning up the test structure.
- Transitioned all integration tests, sim tests, spec tests, etc., to use Vitest, moving away from Mocha and other dependencies.
- The only remaining area using Mocha is the performance test. Plans to contribute to a third-party library to make it test runner agnostic, but this is not urgent.
- Sim test issues have been resolved, leading to stable test outcomes. Encourages team members to investigate any failures and suggest improvements.
- Working on making the sim merge tests more stable and relevant. Plans to review each test case, moving some to sim tests and fixing others.
- Engaging in discussions and work on the block production update PR.
- Preparing to open PRs for each case in the sim merge tests for easier discussion and resolution.
- Will open the first PR for sim merge tests after the call.
- Noticed the issue created by Tuyen regarding performance on ETH1 data and deposits. Previously worked on a dashboard that generated charts for this data. Plans to investigate further if the issue is not already assigned.
Agenda: https://github.com/ChainSafe/lodestar/discussions/6274
Version 1.14 Release
- Completion of v1.14 release.
- Pending announcement and documentation update needed for last-minute builder boost factor inclusions.
Planning for v1.15
- Suggestions for v1.15 release contributions.
- Aim to release in about two weeks.
- Tag potential items for v1.15 and discuss asynchronously if needed.
Discussion on Builder Selection Query Parameter
Context and Current Implementation
- The team discussed the current use of the builder selection query parameter in beacon nodes.
- This parameter enforces old behavior and works alongside the builder boost parameter for selecting builders.
- It's particularly relevant when no viable builder block is available, allowing the system to error out instead of providing an execution block.
Debate on Removal
- The main focus was whether to keep or remove this additional parameter.
- Given the new builder boost functionality and spec compliance, the team deliberated on its continued necessity and utility.
Use Cases and Impact on DVT
- Examined specific use cases for DVT (Distributed Validator Technology).
- Discussed potential shifts in handling execution blocks, especially when a builder block request is not viable.
- Middleware solutions like Charon or SSV nodes might need to locally reject such blocks.
Exception Scenarios
- The parameter currently allows for two exception scenarios:
- Execution only block: Errors if there's a viable builder block but no viable execution block.
- Builder only block: Errors if there's a viable execution block but no viable builder block.
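The two exception scenarios can be illustrated with a small selection function. This is a sketch only: the `BuilderSelection` values, the `CandidateBlock` shape and `selectBlock` are hypothetical and do not mirror Lodestar's actual signatures, and the default branch ignores the builder boost factor for brevity.

```typescript
// Hypothetical names mirroring the scenarios in the notes.
type BuilderSelection = "maxprofit" | "executiononly" | "builderonly";

interface CandidateBlock {
  source: "engine" | "builder";
  payloadValue: number;
}

function selectBlock(
  selection: BuilderSelection,
  engineBlock: CandidateBlock | null,
  builderBlock: CandidateBlock | null
): CandidateBlock {
  if (selection === "executiononly") {
    // Errors even when a viable builder block exists.
    if (engineBlock === null) throw new Error("no viable execution block");
    return engineBlock;
  }
  if (selection === "builderonly") {
    // Errors even when a viable execution block exists.
    if (builderBlock === null) throw new Error("no viable builder block");
    return builderBlock;
  }
  // Default: take whatever is viable, preferring the higher value.
  if (engineBlock === null) {
    if (builderBlock === null) throw new Error("no viable block");
    return builderBlock;
  }
  if (builderBlock === null) return engineBlock;
  return builderBlock.payloadValue > engineBlock.payloadValue ? builderBlock : engineBlock;
}
```

With the default selection the function never errors while at least one block is viable, which is why only the two "only" modes give middleware a hard failure to react to.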
Specification Compliance
- Noted that with default builder selection (max profit and set builder boost parameters), the system remains spec compliant.
Conclusion and Next Steps
- No final decision was made on the removal of the builder selection query parameter.
- Highlighted the need for further evaluation of its impact in light of new functionalities and spec compliance.
The discussion reflects the team's commitment to balancing system functionality with evolving requirements and specifications, ensuring alignment with overall goals and user needs.
Handling Builder Boost Zero Parameter
Overview of Builder Boost Zero Parameter
- Focused on the implementation specifics of the builder boost zero parameter in beacon node operation.
- Key in determining how the node selects blocks under specific conditions.
Implementation Challenges
- Discussed the beacon node's behavior when a validator passes a boost factor of zero.
- This scenario implies the node should always select the local block, effectively skipping value comparison with builder blocks.
Current System Behavior
- Currently, the beacon node waits for the resolution of two promises (local and builder) before making a comparison.
- With a builder boost of zero, this comparison becomes redundant as the local block would always be preferred.
Proposal for Early Resolution
- Suggestion to modify the system for an early resolution in cases of builder boost zero.
- This would involve the node producing a block immediately once the local promise resolves, without waiting for the builder promise.
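A rough sketch of the early-resolution idea, assuming the boosted builder value is computed as the builder payload value multiplied by the boost factor and divided by 100. `ProducedBlock`, `settle` and `chooseBlock` are hypothetical names, not the race helper discussed in the meeting, and values are plain numbers here where a real implementation would likely use integers of a wider type.

```typescript
interface ProducedBlock {
  source: "engine" | "builder";
  payloadValue: number;
}

type Settled<T> = {ok: true; value: T} | {ok: false; error: unknown};

// Convert a promise into one that never rejects.
function settle<T>(p: Promise<T>): Promise<Settled<T>> {
  return p.then(
    (value): Settled<T> => ({ok: true, value}),
    (error: unknown): Settled<T> => ({ok: false, error})
  );
}

async function chooseBlock(
  local: Promise<ProducedBlock>,
  builder: Promise<ProducedBlock>,
  builderBoostFactor: number
): Promise<ProducedBlock> {
  if (builderBoostFactor === 0) {
    // Boosted builder value is always 0, so the comparison is redundant.
    // Swallow builder errors to avoid an unhandled rejection.
    builder.catch(() => undefined);
    return local;
  }
  const [l, b] = await Promise.all([settle(local), settle(builder)]);
  if (!l.ok) {
    if (!b.ok) throw new Error("no viable block");
    return b.value;
  }
  if (!b.ok) return l.value;
  const boosted = (b.value.payloadValue * builderBoostFactor) / 100;
  return boosted > l.value.payloadValue ? b.value : l.value;
}
```

The zero-factor branch adds one conditional to the existing flow, which speaks to the concern about keeping the logic simple; whether it belongs inside the race helper or in the caller is left open here.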
Concerns and Considerations
- Discussed the efficiency of the current system for this scenario and the need for a new approach.
- Raised concerns about potential complications in introducing new logic and maintaining system simplicity.
Decision and Next Steps
- Agreed on further investigation and potential iterations on the race helper function to efficiently handle the builder boost zero parameter.
- Aim to refine the system to accommodate this scenario, ensuring overall functionality and system integrity.
Potential Use of RxJS
- Suggestion to use RxJS for timing issues and race conditions in builder flow.
- Decision to focus on existing solutions for efficiency and maintainability.
Decision on Builder Selection Parameter
- Agreement to keep the builder selection parameter for now.
- Consideration for retiring it if it becomes a hindrance or redundant.
Further Iterations on Race Helper Function
- Agreement to iterate further on the race helper function.
- Focus on efficient handling of builder boost zero parameter.
Status of Historical State Regeneration
- Completed implementation of the historical state regeneration feature.
- Currently blocked by an issue with Level DB.
- A lack of response on a blocking pull request (PR) submitted to Level DB.
- No communication from the Level DB team for a month.
- Intend to reach out to the Level DB team again, informing them of the decision to fork.
- The approach aims to unblock the impasse and progress with the historical state regeneration feature.
Discussion on Increasing Peer Count to 100
- Question raised about whether gossipsub version 1.10 has resolved the libp2p peer count issue.
- The problem involved having more peers than necessary, acting as a blocker for the increase in peer count.
- Peer Rotation in libp2p 1.0: Observed smaller peer rotation in the libp2p 1.0 branch.
- Quicker peer rotation previously led to spikes in peer count.
- Newer branch showed longer peer connections with fewer disconnects and reconnects.
- Suggested testing the increase to 100 peers on the new branch.
- Noted that the branch is nearly ready for merging, with only minor issues left.
- Mentioned performance problems in Lighthouse with flood publish when increasing peer count.
- Questioned if Lighthouse has batch publishing, which might mitigate issues.
- Agreed that batch publishing should bring improvements.
- Next Steps: Decided to first proceed with the libp2p upgrade, followed by rebasing and further testing.
Discussion on the self-hosted runner
- Identified a problem with the unknown block sync simulation tests, being addressed with Gajinder.
- Concerns about whether issues with sim tests were related to unknown block sync.
- Regardless of immediate fixes to stabilizing sim tests, there was agreement to change the self-hosted runner.
- If not needed, the expenses for the machine might be reconsidered.
- Suggested running benchmarks on the same machine.
Discussion on Validator Config Documentation
- Proposed removing excessive content and directing users to official sources.
- Additional Feedback
- Direct users to Ethereum Foundation's staking website for initial steps.
- Guide users to use Lodestar after completing initial steps.
- Add the mainnet beacon contract address.
- Include the launchpad address for better accessibility and awareness.
- General agreement on updating the documentation to make it more streamlined and relevant.
- Acknowledged the need to link important resources and update terminology.
Julien:
- Julien continued work on the Beacon API, focusing on the Beacon API repo itself.
- Noted that error handling in the Beacon API is still a bit confusing and needs attention.
- Proposed to migrate examples and tests from the Lodestar repo into the spec directly.
- Goal to ensure consistent understanding and implementation across different platforms.
- Suggested creating a script or infrastructure to test on a running node for validation and comparison of implementations.
Lodestar Light Client Development
- Continued work on integrating Lodestar code directly into documentation for demonstration purposes.
- Encountered issues with node abstractions leaking in the browser, such as process and buffer objects.
- Identified potential improvements on the browser side and Lodestar side to facilitate browser development.
- Plans to discuss with Nazar for insights on integrating the light client in browser applications.
- Noted the existence of a Chrome extension by Fireblocks that successfully runs in the browser.
Matt:
- Three PRs are up for the BLST library, covering memory, fuzzing, and performance tests. Comments on these PRs have been addressed and are ready for further review.
- Created a fork of the Node repository to release a debug build and attach binaries to releases. Addressed the issue of binaries not being consistently available by hosting them more officially on GitHub. Provided a URL for pulling Node builds, ensuring reliable access to necessary binaries.
- Noted the GitHub runner's limitation of 8GB, insufficient for linking Node together. Discussed the requirement for access to larger instances to use the solution effectively. Added a recipe to the unofficial builds repo of Node to build and host non-standard builds of Node debug. Preparing a PR for this alternative solution but faced access rights issues.
- Updated the git checkout action to include a debug flag. Attempting to host debug builds either through the releases from the forked repo or the unofficial builds repo. Plans to integrate the download URL for debug builds into the git checkout action.
- Requested a final review of the PRs for the BLST library.
- Awaiting the build and approval for the Node debug unofficial builds, or access to use BuildJet for testing the checkout action fork.
- Continuing work on QUIC while waiting for resolution on the Node debug build issue.
Cayman:
- Finalizing the libp2p v1.0 PR which had been pending for a while. With recent updates to libp2p dependencies, the PR has reached a stable state.
- Addressed a bug in multi-stream select causing protocol stream errors, which has now been resolved.
- Plans to update the gossipsub version to the latest release and finalize the PR.
- Recognized that the SSZ (Simple Serialize) API PR has not been progressing on his end and needs attention. Considering either advancing the PR himself or handing it off due to the risk of it becoming outdated.
Nico:
- Finalized the last fixes for the Goerli fork, ensuring stability and functionality.
- Collaborated with Julien on multiple merges in the Beacon API spec. Noted only one PR left to be merged in the Beacon API spec.
- Emphasized the need to carefully handle errors to avoid incorrect attestation publishing.
- Examined updates required for the Chiado Deneb hardfork, particularly for the Gnosis testnet. Identified an issue with Deneb values being set as constants rather than part of the configuration, affecting network support flexibility.
- Highlighted necessary updates before supporting other networks like Gnosis. Mentioned that for mainnet tests, scheduling the fork epoch should suffice.
- Plans to rebase the SSZ branch as part of ongoing work. Aims to clean up all server handlers for improved functionality. Anticipates updating the client side and achieving a buildable state within the week.
NC:
- Specifically focused on the 6110 PR and the block reward API PR related to it.
- Conducted testing on the proposal boost reorg. Believes the PR is now ready for review. Expressed uncertainty about the approach and is open to opinions and suggestions.
Tuyen:
- Completed the implementation of the confirmation rule prerequisite in fork choice. Awaiting the latest release of spec tests for further validation. Noted a small improvement to recalculate the total balance of the justified checkpoint.
- Addressed the issue of redundant block downloads by root at every slot. Implemented a catch to avoid downloading blocks already available through gossip but in the process of being processed. The Pull Request for this solution has been merged.
- Received the first round of review from Cayman on the latest N-Historical States PR. Encountered a hurdle with vitest not supporting explicit resource management.
- Configured zero historical states in a test branch for n-historical state. Discovered a bug causing an invalid state root error after three days of running. Acknowledged the bug's complexity and the time required to resolve it.
- Deployed a node for batch publish analysis and noticed unusual metrics, including missed attestations and significant balance delta increases. Intends to revisit the issue after merging the libp2p changes for further analysis.
Gajinder:
- Actively working on cleanup Pull Requests for Lodestar in preparation for the Deneb release. Continues to prioritize cleanup tasks leading up to the mainnet release.
- Reviewed new candidate EIPs for the next hard fork, including EIP-7002, EIP-7549, EIP-7251, and PeerDAS. Most EIPs appear easy to implement, with a special focus planned for PeerDAS.
- Discussed some of the EIPs, with NC possibly working on EIP-7002.
- Consulted Lion about EIP implications, specifically on pulling the committee index out of attestations.
- Understanding EIP Implications:
- Clarified that the committee index removal from the signing root doesn't eliminate different committees.
- Noted that although it may seem redundant, it still benefits signature aggregation and verification.
Topic: Invalid State Root Issue Analysis
- Issue Overview: During the weekend, an issue was encountered with the invalid state root in block proposals. This involved two types of blocks: one from the builder and the other from execution.
- Block Publishing Discrepancy: When the builder block failed to build, the execution block was chosen, signed by the validator, and submitted for publishing. However, there was a mismatch between the state root at publishing and the calculated state root.
- Initial Assessment: The issue initially seemed related to state transition but was later identified as a change in block hash during serialization between the validator and the beacon node.
- Investigative Approach:
- Observed that the block produced at the beacon had a different hash root compared to what appeared at the validator.
- It was theorized that the body root was incorrect, which led to the state root mismatch.
- Resolution Steps:
- Introduced patches to take a hash tree root of the execution body immediately upon receipt, even before sending it for state transition computation.
- Added caching for the body root hash to ensure consistency across computations.
- Implemented additional logging for the body root, the block hash calculated by the beacon, and the block hashes signed by the validator.
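The resolution steps can be sketched as a small tracker that records the body root at production time and compares it at publish time. This is an illustration under assumptions: `BodyRootTracker` is a hypothetical class, and `hashTreeRoot` is injected as a plain function rather than taken from a real SSZ library.

```typescript
type Root = string;

class BodyRootTracker<Body> {
  private readonly rootsBySlot = new Map<number, Root>();
  private readonly hashTreeRoot: (body: Body) => Root;
  private readonly log: (msg: string) => void;

  constructor(hashTreeRoot: (body: Body) => Root, log: (msg: string) => void) {
    this.hashTreeRoot = hashTreeRoot;
    this.log = log;
  }

  /** Call as soon as the body is received, before any state transition. */
  onBodyReceived(slot: number, body: Body): Root {
    const root = this.hashTreeRoot(body);
    this.rootsBySlot.set(slot, root); // cached so later steps reuse the same root
    this.log(`slot=${slot} bodyRoot=${root} stage=produced`);
    return root;
  }

  /** Call when the signed block comes back from the validator. */
  checkAtPublish(slot: number, signedBodyRoot: Root): boolean {
    const produced = this.rootsBySlot.get(slot);
    const matches = produced === signedBodyRoot;
    this.log(
      `slot=${slot} producedBodyRoot=${produced} signedBodyRoot=${signedBodyRoot} match=${matches}`
    );
    return matches;
  }
}
```

Logging both roots side by side is what gives the trail mentioned in the outcome: a mismatch points at serialization between validator and beacon node rather than at the state transition.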
- Outcome: These changes provide a robust logging trail for any future occurrences of similar issues, enabling more efficient debugging and resolution.
- As of standup, this issue has not been seen again.
- The lowered min-bid could be a reason why, but we should reinstate 0.07, give it a few days of observation and release the patch.
- There have been deployments where the builder block consistently errors, triggering fallbacks. This issue was observed on Holesky, where produceBlindedBlock was running instead of produceBlockV3.
- If such errors occurred, they would manifest as mismatches in the roots because the full block is reconstructed from the local cache when published through publishBlindedBlock.
- Theory: The error might only occur when the fallback is triggered. For instance, when a builder block request fails, it falls back to another block type.
- This issue was not observed in some deployments where only local blocks were used without a builder connected.
- Devnet 11: The flow being discussed was operational on DevNet 11 for an extended period. Despite frequent builder errors, the specific issue never surfaced there.
- The plan is to monitor the issue starting today, with a decision to be made by Friday on whether to push out the patch. If no incidents are observed by then, the issue could be considered resolved.
Patch Release Planning:
- Infrastructure teams will be brought in to monitor the situation. The intention is to release a patch on Friday, incorporating several suggested fixes.
- Additional suggestions for the patch include addressing the Auth headers encoding issue raised by Jacob. This requires review before inclusion.
Latest Gossipsub Version: The recent version of Gossipsub (version 10) may potentially resolve the memory leak issue.
- Heap Snapshots Analysis: After examining different heap snapshots, it appears that the leak may originate from the outbound stream within Gossipsub.
- ESLint Fix: A recent PR addressed an ESLint issue by catching an error during the closure of outbound streams. This fix is believed to address the memory leak by ensuring proper removal of peer-related streams from the peer map.
- Draft PR for Confirmation: A draft PR is in place to validate this fix. It requires time to conclusively prove its effectiveness.
- Awaiting Confirmation: If the fix proves successful, it could be included in a hotfix version.
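The suspected leak pattern, and the shape of the fix, can be sketched as follows. This is not the gossipsub code: `OutboundStream` and `OutboundStreamRegistry` are hypothetical, and only the `streamsOutbound` name is taken from the heap snapshot findings. The idea shown is that the map entry is released even when closing the stream fails.

```typescript
// Hypothetical minimal stream interface.
interface OutboundStream {
  close(): Promise<void>;
}

class OutboundStreamRegistry {
  private readonly streamsOutbound = new Map<string, OutboundStream>();

  add(peerId: string, stream: OutboundStream): void {
    this.streamsOutbound.set(peerId, stream);
  }

  count(): number {
    return this.streamsOutbound.size;
  }

  async remove(peerId: string): Promise<void> {
    const stream = this.streamsOutbound.get(peerId);
    if (stream === undefined) return;
    // Delete first so the entry cannot outlive a failed close().
    this.streamsOutbound.delete(peerId);
    try {
      await stream.close();
    } catch (_e) {
      // Closing failed (e.g. the peer already reset the stream); the entry is
      // gone either way, so nothing is retained.
    }
  }
}
```

If the deletion only ran after a successful `close()`, every rejected close would leave a stream referenced by the map, which matches the growth seen in the heap snapshots.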
Tuyen:
- PR Merges: Successfully merged PRs to improve process slashing and implement a shuffling cache, which is part of the n-historical state project.
- Historical State Work: Progressing on a method to persist only one checkpoint state per epoch, work is ongoing.
- Invalid State Root Issue Investigation:
- Deployed a MEV-Boost hardcoded version to always return no min-bid received, mimicking their setup.
- Despite the deployment on mainnet, the invalid state root issue did not occur; the investigation continues.
- Memory Leak Investigation:
- Identified that streamsOutbound was holding a significant amount of memory.
- Observed that outbound streams might not be removed properly from peers.
- Noted that the latest gossipsub version addressed this issue.
- Additional Work: Created a PR to track the input generation step.
Gajinder:
- Invalid Proposals Issue: Focused on resolving the issue with invalid proposals on the engine.
- DevNet Twelve Performance:
- Successfully participated in DevNet Twelve, featuring the new format of Blobsidecars with inclusion proofs instead of signed Blobsidecars.
- The builder flow is functioning well in this new setup.
- Hive Test Results:
- Mario from Hive conducted tests against the build, with most sanity cases passing.
- Identified a few edge cases related to the equivocation of Blobsidecars.
- Lion's suggestion on the PR about not accepting new sidecars could potentially resolve some issues.
- Another noted issue is the late gossiping of Blobsidecars in the block, currently only waiting for 4 seconds as per the PR.
NC:
- Consensus Block Value for Produce Block V3:
- Completed implementation and received a review from Gajinder.
- Identified some ambiguities in the API spec, which require clarification from the API discord channel.
- Rewards API Development:
- Progress made on the Rewards API, but currently on hold due to the need for API spec clarification.
- Reorganizing Late Blocks:
- Spent time understanding the spec for reorganizing late blocks.
- Initially planned to start working on it later this week, but this may depend on the progress with the Rewards API and spec clarifications.
Cayman:
- js-libp2p 1.0 Integration:
- Actively working on integrating js-libp2p 1.0.
- Inadvertently addressed a memory leak by adding more linter rules to gossipsub.
- Collaborating with Alex to debug some persisting issues, likely related to noise.
- SSZ API Progress:
- Making significant progress with Nico on the SSZ API.
- Most route definitions have been transitioned; only the key manager routes remain.
- Approximately 50% completion on the project.
- Next steps involve tweaking tests and updating client call sites.
- Exploring ChatGPT Plugins:
- Experimented with ChatGPT and its plugins, particularly the 'ask the code' plugin.
- Successfully used the plugin to query information about the Beacon Chain class in Lodestar.
- Suggests the potential utility of the plugin for querying Consensus specs, Lighthouse, and other projects.
Lion:
- EIP-7549 Proposal:
- Proposed EIP-7549 to move the committee index from the Attestation data to the Attestation body.
- The change significantly reduces the pairing cost (64 times cheaper) and network complexity.
- This simplifies processes for beacon nodes by reducing the number of signatures they need to check.
- Max Effective Balance (MaxEB) Design Suggestion:
- Francesco, a consensus researcher at EF, proposed a simpler design for MaxEB.
- Instead of consolidating stakes, introduced a new concept called a "cluster."
- Entities can register as a cluster and aggregate their signatures, sending only one aggregate signature to the network.
- This approach simplifies network complexity without consolidating state size.
- The main goal of MaxEB is to allow single slot finality by reducing computational and networking costs.
- Clustering Concept Explained:
- Clustering is about reducing the number of attestations sent to the network.
- It involves entities forming clusters and agreeing to aggregate their signatures.
- This does not impact the state size, focusing more on operational efficiency.
- Compounding and Clustering Impacts:
- Discussed the impact of compounding rewards and the difference it makes.
- The compounding effect may not be as significant, especially when considering the frequency of compounding.
- Clustering at the network level is expected to have minimal negative impacts.
- Benefits of clustering include reduced operational costs and the facilitation of single slot finality.
- Operational Changes with Clustering:
- Large node operators could potentially scale down the number of beacon nodes they're running.
- Clustering by a factor of 100 would significantly reduce the network burden.
Matt:
- PR for Race Condition Fix:
- Posted a PR to fix a race condition that appeared in Electron during package publishing.
- It was a simple issue of an asynchronous function hidden in synchronous constructors.
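The class of bug described here, and one common way to avoid it, can be sketched as below. The notes do not say how the PR fixed it, so this is an assumption: `RacyResource`, `SafeResource` and `Loader` are hypothetical names, and the static async factory is a general pattern rather than the actual change.

```typescript
type Loader = () => Promise<string>;

// Problem: the constructor returns before init() has finished, so callers
// can observe a half-initialized object. A rejection from load() would also
// surface as an unhandled rejection.
class RacyResource {
  value: string | undefined = undefined;

  constructor(load: Loader) {
    void this.init(load); // async work hidden inside a sync constructor
  }

  private async init(load: Loader): Promise<void> {
    this.value = await load();
  }
}

// One common alternative: do the async work in a static factory and keep
// the constructor purely synchronous.
class SafeResource {
  readonly value: string;

  private constructor(value: string) {
    this.value = value;
  }

  static async create(load: Loader): Promise<SafeResource> {
    return new SafeResource(await load());
  }
}
```

With the factory, the only way to obtain an instance is to await `SafeResource.create(...)`, so there is no window in which the object exists but is not ready.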
- Native Heap Analysis:
- Explored various tools for native heap analysis, including Valgrind, Massif, and GDB.
- Settled on Heaptrack, which works well for both C++ and Rust.
- Documented the process of using Heaptrack for analyzing memory usage, which includes creating flame graphs.
- Analysis on Lodestar Unstable:
- Ran Heaptrack on Lodestar Unstable to identify components contributing to RSS.
- Found that BLST points, node internals, and LevelDB are major contributors.
- The analysis provided insights but more work is needed to fully understand memory usage.
- Debugging BLST Rebuild:
- Detected a segmentation fault in the BLST rebuild and identified its cause.
- Working on a fix for the issue, which stemmed from the structure of a function/class.
- Heap Dumps Analysis:
- Analyzed heap dumps provided by Nazar, focusing on tracking function pointers and memory addresses.
- Using LLDB for analysis, which has a steep learning curve.
- Writing documentation on how to analyze core dumps.
Nazar:
- Benchmark Performance Analysis:
- Investigated the performance of phase zero block attributes.
- Added numerous metrics to block production processes for detailed step-by-step measurement.
- Concluded that phase zero attributes are not the primary contributors to block production time.
- A PR related to this analysis has been approved and will be merged soon.
- Migration of Prover Tests to Vitest:
- Preparing to migrate Prover tests to Vitest, with an emphasis on browser-based testing.
- This migration will facilitate easier transfer of other tests in the future.
- An older PR in this regard is being finalized for regression fixes and updates.
- Upcoming Work:
- Planning to open additional PRs next week, focusing on unit tests for other packages.
- Will address other high-priority issues as they arise.
- Segmentation Fault (Segfault) Issues:
- Shared insights from the Vitest community and Matthew regarding segfaults in projects using native code.
- The problem may relate to thread implementation details in Node.js.
- Suggested moving from threads to child process forks for better stability with native code.
- Plans to share a discussion thread for further examination and potential solutions.
Nico:
- Debugging DAppNode Issues:
- Resolved an issue with Key Manager API not returning proper responses, which disrupted synchronization between validator keys and web3 signer keys.
- Fixed a minor issue where duties were not being deleted upon key deletion.
- Synchronization of Local and Remote Signer Keys:
- Explored native implementation for syncing local public keys with remote signer keys.
- Noted that third-party solutions typically use scripts or services for this purpose, questioning the necessity of a native implementation.
- Related discussion in the Lodestar help channel about maintaining sync.
-
Improvements and Reviews:
- Made minor fixes and log improvements.
- Reviewed Luca's PR for voluntary exit, noting a useful refactor for the web3 signer that can be leveraged in tests.
- Continued work on the Server implementation refactoring for SSZ, with some aspects still pending and cleanup required.
The planning meeting focused on addressing performance concerns and setting goals for upcoming Objectives and Key Results (OKRs). Key points discussed included:
- Performance and Future Requirements: Discussion on how Ethereum roadmap and evolving specifications might affect Lodestar's performance. Current performance is within acceptable limits, but future updates might increase requirements.
- Metrics and Iterative Improvement: Emphasized the importance of having good metrics, especially performance-oriented metrics, for iterative improvement.
- Deneb Hard Fork: Discussed the need for metrics related to the Deneb hard fork, acknowledging a lack of insight into how this would affect the node's runtime.
- Garbage Collection Concerns: Discussion on the importance of monitoring garbage collection metrics, particularly spikes in port condition and their impact.
- Debugging and Tooling: Highlighted the need for effective debugging tools for understanding memory usage spikes and main thread performance.
- Actionable Goals: Consensus on setting actionable goals for performance improvement, focusing on developing tooling for diagnosis and incorporating these goals into the OKRs.
Matt:
- Documentation Update: Completed a documentation update and planned to run a verification script against the Light client example.
- Debugging BLST Memory Usage: Investigating memory usage of individual objects in BLST, which are heavier than in the existing library.
- Heap Debug Research: Researching GDB for heap debugging to understand heap dump contents, addressing memory usage concerns.
- Level DB PR and Race Condition: Addressing a race condition discovered during publishing of the Level DB PR, which was not detected in unit testing but appeared in Electron testing.
- Goals for the Week: Aims to respond to PR comments, continue heap debugging research, resolve the Level DB race condition, and potentially collaborate with Cayman on reviewing Quinn project aspects.
Nico:
- Packaging Solutions Research: Investigated different solutions to compile binaries on GitHub without ARM runner support. However, no successful solution has been found yet.
- caxa Package Evaluation: Examined the caxa package, previously mentioned by Cayman, as a potential solution. Despite its recent deprecation, it's considered a better option than its replacement. Nico plans to review the caxa code in-depth and potentially discuss its long-term viability with the package author.
- Ethereum on ARM Solution: Explored how Ethereum on ARM currently handles Lodestar releases. They publish new packages for each release, including deb packages, using caxa on an ARM server. Nico has contributed updates to their repository to streamline this process. The goal is to simplify their workload by enabling Lodestar to publish binaries directly, eliminating the need for Ethereum on ARM to run the script for Lodestar in future releases.
- General Activities: In addition to this research, Nico also addressed various issues and conducted code reviews.
- Ethereum on ARM's Current Process: They currently run a script in their repository to build and publish Lodestar binaries using the caxa package. This process is executed on an ARM server. Nico has updated this process to ensure efficiency and correctness. The aim is for Lodestar to take over binary publication to reduce the workload for the Ethereum on ARM team.
Lion:
- Published EIP-6914 Document: Successfully published the EIP 6914 document, with input and comments from the team.
- Blob Sharing Protocol: Released the Blob sharing protocol spec, developed in collaboration with Dankrad. The protocol promises a trustless solution, and Lion plans to implement it when time permits.
- Max Effective Balance Work: Plans to resume work on the max effective balance feature, especially in light of the upcoming Electra hard fork.
- Slashing Risks Analysis: Prepared a document analyzing slashing risks, which he discussed during the team retreat. Lion is now taking a mathematical approach to this analysis and is keen to receive feedback from the team.
- Consolidation Challenges: Addressing challenges related to consolidation, which has received mixed responses. Lion anticipates this will be a complex and time-consuming task.
- Worked with Jimmy at Lighthouse to implement a light client server. So far, 1 of 9 PRs has been merged.
Tuyen:
- Validator Generation and Flare CLI: Attempted to generate a larger validator set for testing, but faced challenges due to the time required. Also used the Flare CLI to simulate validator slashing and discovered a bug related to block production when handling a large number of attestations and slashings. A fix has been identified but needs further review.
- Memory Leak in Network Thread: Submitted a pull request to address a memory leak in the network thread. The tests are close to completion and the PR is pending review.
- Shuffling PR Testing: Tested the shuffling pull request and found it to perform comparably to the current unstable version. This testing is part of the ongoing historical stack work and is crucial for future development.
NC:
-
Benchmarking for 6110 Pubkey Cache: Published the initial benchmark results for the 6110 pubkey cache improvements. Adjustments were made based on the findings, and the benchmark was shared in the PR comments for further opinions and feedback.
-
Implementation for DevNet 12: Started working on implementing the addition of the consensus block value to the produce block v3 endpoint. This feature is critical for the upcoming DevNet 12, which is scheduled to take place in approximately three days. The implementation process revealed a significant overlap with the rewards API, suggesting that once the current task is complete, the rewards API implementation would be more straightforward.
Nazar:
-
Core Dump in CI Server:
- Successfully merged a PR that enables core dump storage on the CI server.
- If there are any segmentation fault errors on the CI server, core dump files will now be attached as PR artifacts.
- Team members are encouraged to download these artifacts and share them on Discord for further investigation.
- This approach was adopted due to the difficulty in manually detecting core dumps during trials.
-
Investigation of Slow Block Production:
- Ongoing investigation into the causes of slow block production, particularly when it exceeds 1 second.
- Preliminary findings suggest the delay isn't due to the speed of the builders or execution APIs. Instead, it may be related to the sequence of execution, where the process waits for one execution before moving to the builder.
- The issue doesn't occur uniformly across all servers but is limited to a few.
- Creating initial benchmarks for block production to serve as reference points for future performance assessments.
- Plans to open a draft PR and test on servers, although the exact cause of the slowdown is still under investigation.
Cayman:
-
SSZ API Refactoring:
- Currently at the halfway point of the refactoring process.
- Completed route definition refactoring for all endpoints except validator and key manager ones.
- Plans to implement changes across all call sites once these remaining endpoints are refactored.
- Future work includes enhancing tests and making request/response/header validation more robust.
-
Collaboration with Libp2p:
- Working with Alex from libp2p on improvements for the upcoming 1.0 release of js-libp2p.
- Key performance enhancements include:
- Reduction of buffer copies in the stack.
- Elimination of one round trip in session establishment.
- Halving the time to first byte for a peer, as observed in tests.
-
Quinn JS Binding Scaffold Project:
- Created a scaffold project for Quinn JS binding.
- Intends to share this with Matt for further development.
Transcript: https://pastebin.com/PPgaTfWK
Reprioritizing Yamux: Yamux was initially deprioritized as Mplex was removed. However, there is an initiative by Pawan to reintegrate it. The team discussed the potential challenges and the need to reintegrate Yamux before the upcoming release. Reason for Change: It was mentioned that the older system (Mplex) was deprecated by the libp2p specs, making way for Yamux, which is touted to be better and faster. However, our specific implementation had issues, notably a potential memory leak.
Memory Leak in discv5: There was mention of a discv5 memory leak. The connection or difference between this memory leak and the Yamux issue was clarified. Status: The memory leak issue was noted as something they wanted to work on previously, but other priorities kept pushing it aside.
Libp2p upgrade to 0.46: There was an upgrade to libp2p, and the team discussed its relevance to the Yamux issue. It was stated that the upgrade unblocked Yamux's integration. Challenges: The team discussed the challenges of deprecating Mplex and whether the integration of Yamux should be expedited for v1.11. We will try to get it in depending on the results seen.
Holesky Testnet: The speaker mentioned their focus on the Holesky testnet, emphasizing that the keys for it have been generated and uploaded for its Genesis. Infrastructure Challenges: It was noted that they plan to use 5,000 keys per server, a task never done before. The infrastructure team is prepared for potential breaks and might reduce the number of keys if issues arise.
Beacon Data Pruning: A significant focus was given to dealing with finalized archived states in the beacon data. An issue was created just before the meeting to discuss potential pruning features, aiming to limit the growth of beacon data.
Boot Node CLI Command: Update: There's a draft PR for a Lodestar boot node, which would simply run a discv5 server without running a full beacon node. The idea is to contribute to the boot node ecosystem, aiming for better distribution among different jurisdictions and providers.
Potential Features for v1.11: Fork Choice Improvements: Tuyen was working on performance improvements in this area. Protons Upgrade: A mention of the Protons upgrade in Gossipsub which had been pending for a long time.
BLS (Boneh-Lynn-Shacham) Implementation: A significant part of the meeting was devoted to discussing the integration of BLS, a cryptographic method, into their system. It's a large change, requiring meticulous review. Status: While promising metrics were observed, there's some apprehension about rolling it out without thorough testing and review.
Node Performance & Improvements: Target Peers: Tuyen had plans to work on increasing target peers post the testing of the Protons feature. GC Time Reduction: Adjusting the 'new space' had resulted in reduced garbage collection time, impacting block times positively.
Issue of Aggregated Attestation Errors: There had been observed errors in Lodestar beacon node's aggregated attestation. Status: Nico provided an update, and Nazar mentioned working on a PR that would enable different validator and beacon node combinations in simulation tests to detect if the aggregate isn't produced.
Tuyen:
- Bigboy Testnet Issue: Problem: Observed a long update head call taking up to eight seconds, resulting in a notably low number of peers. Cause: In the devnet test, many unfinalized proto nodes were detected, causing the updateHead call to grow exponentially. This was primarily due to excessive checks to verify if nodes shared the same finalized checkpoint. Solution: The main fix applied was within the 'node verify for head' which significantly reduced the time consumed. With the introduction of this fix, Tuyen believes the next version of Lodestar, when used on that testnet, won't reproduce the same problem.
- Performance Improvement: Tracked votes by index to enhance the compute delta function. Results: While performance tests showed a 2x or 3x improvement, the mainnet node testing exhibited an even more remarkable 8x speed-up. The typical updateHead call duration reduced from 240 milliseconds to about 30-40 milliseconds. Tuyen requests reviews on this matter.
- Protons Migration: Protobuf encoding in gossipsub is currently implemented with protobufjs. There had been a plan to transition to protons, which was postponed. Current Status: Tuyen is now looking to implement this migration and is currently testing it within a group.
- Gossipsub Metrics Issue: A small PR in gossipsub intended to de-duplicate metrics. An earlier PR aimed to unbundle metrics, but it resulted in two validation phases: one in gossipsub and another at the application level. The naming of these metrics (like the count of invalid or valid messages) became confusing due to duplication. Solution: In Tuyen's PR, he proposes renaming metrics at the system level (perhaps as 'pre-validation result') to distinguish them from application-level results. This de-duplication is essential since running Lodestar with the latest gossipsub becomes unfeasible without this PR. He welcomes alternative suggestions and feedback on his proposed changes.
- Index Gossip Queue: The queue's implementation has been merged. Tuyen is currently working on a PR to utilize this, marking the final PR for this work segment.
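
The "tracked votes by index" change to the compute delta function can be illustrated with a simplified sketch. This is not Lodestar's implementation: the `VoteTracker` shape and the function signature are assumptions, and it assumes that each vote stores the fork choice node index directly so that no per-vote lookup by block root is needed.

```typescript
// Hypothetical vote record: indices point into the fork choice node array.
type VoteTracker = {currentIndex: number | null; nextIndex: number | null};

// Compute the per-node weight change caused by moved votes or changed balances.
// votes, oldBalances and newBalances are all indexed by validator index.
function computeDeltas(
  numNodes: number,
  votes: VoteTracker[],
  oldBalances: number[],
  newBalances: number[]
): number[] {
  const deltas = new Array<number>(numNodes).fill(0);
  for (let v = 0; v < votes.length; v++) {
    const vote = votes[v];
    if (vote === undefined) continue; // validator has never voted
    const oldBalance = oldBalances[v] ?? 0;
    const newBalance = newBalances[v] ?? 0;
    if (vote.currentIndex !== vote.nextIndex || oldBalance !== newBalance) {
      // Remove the old weight from the previous target, add the new weight to the next one.
      if (vote.currentIndex !== null) deltas[vote.currentIndex] -= oldBalance;
      if (vote.nextIndex !== null) deltas[vote.nextIndex] += newBalance;
      vote.currentIndex = vote.nextIndex;
    }
  }
  return deltas;
}
```

Because the loop only touches flat arrays, its cost is linear in the number of validators, which is one plausible reason an index-based layout would be faster than a root-keyed one.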
Nico:
- Nico investigated a problem related to Lighthouse and has documented his findings.
- Nico updated the bootnodes. This involved pulling the latest changes and updating the hard-coded values, especially the ENRs (Ethereum Node Records) present in their code.
- Nico addressed an issue in which the system didn't support the authorization header, or more precisely, basic authentication was not supported. This deviation from the specification was due to a bug in their past implementation. An observation was made: had they upgraded to node-fetch version 3, they would've faced the same problem. Nico's solution was to build the header in advance since the fetch library doesn't do it by itself.
- Nico has been exploring the transition of moving the network from being a thread to being a process. He managed to get it operational, but a new issue arose: after some time, the main thread stops receiving any message events. Nico has identified that there is a high volume of events being dispatched from the worker to the main thread, which could be causing this issue. He's unsure if it's a bug in the child process's execution or another issue, and he's currently focused on debugging this problem.
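
The basic authentication fix described above (building the header in advance) might look roughly like the following. This is a minimal sketch, not Lodestar's code: `toFetchTarget` is a hypothetical helper, and it assumes credentials are supplied in the URL and contain only Latin-1 characters.

```typescript
// Hypothetical helper: move credentials out of the URL and into an
// Authorization header, so the caller can pass both to fetch().
function toFetchTarget(rawUrl: string): {url: string; headers: Record<string, string>} {
  const parsed = new URL(rawUrl);
  const headers: Record<string, string> = {};
  if (parsed.username !== "" || parsed.password !== "") {
    // URL components are percent-encoded; decode before base64-encoding.
    const user = decodeURIComponent(parsed.username);
    const pass = decodeURIComponent(parsed.password);
    headers["Authorization"] = "Basic " + btoa(`${user}:${pass}`);
    // Strip the credentials from the URL that is actually requested.
    parsed.username = "";
    parsed.password = "";
  }
  return {url: parsed.toString(), headers};
}
```

A caller would then use something like `fetch(target.url, {headers: target.headers})`.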
Gajinder:
- Gajinder has been focusing on the integration of "produce block V3." This new version serves as a unified API for both the execution block and the builder block. The main intent behind this development is to shift the race between the block builder and execution over to the beacon.
- Challenges:
- Beacon Node URL Treatment: Deciding if each beacon node URL should be treated distinctly or as fallbacks.
- Handling "Produce Block" Actions: Issues arise especially if some URLs don't respond in time.
- Variability in Connections: The connections of beacon nodes to builders can influence block optimization.
- Beacon URL Perspective: Consideration on whether all beacon URLs should be seen as separate block producers and raced with a specific cutoff.
- Fallback URL Racing Structure:
- Gajinder was prompted about his opinions on this structure and its design trade-offs.
- Vouch's Strategy: While Vouch races multiple beacon nodes to get the most optimal proposal, many validators usually link to a single node.
- Variation in Setup: Different beacon node setups can have diverse builder attachments, affecting the block's value.
- Two-Second Cutoff Strategy: A method where there's a two-second cutoff and then picking the first to resolve seems optimal. This might be beneficial even if the race is transferred to the beacon node.
- Publishing Blocks: Broadcasting to all, as opposed to depending on a fallback system, might prove more efficient.
- Proposed Solution:
Gajinder is contemplating introducing a mechanism where:
- The HTTP client will race all URLs with a specified cutoff and timeout.
- This would provide a generic interface for any calls they'd want to execute in this mode.
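
One way to read the proposed "race all URLs with a cutoff and timeout" behaviour is sketched below. The helper name `raceWithCutoff` and its exact semantics are assumptions for illustration: responses arriving before the cutoff are collected, the first response after the cutoff ends the race, and the timeout is a hard limit.

```typescript
// Hypothetical generic helper, not Lodestar's HTTP client API.
function raceWithCutoff<T>(
  tasks: Array<() => Promise<T>>,
  cutoffMs: number,
  timeoutMs: number
): Promise<T[]> {
  return new Promise<T[]>((resolve, reject) => {
    const results: T[] = [];
    let pending = tasks.length;
    let cutoffReached = false;
    let settled = false;

    // Finish the race exactly once with whatever has been collected so far.
    const settle = (): void => {
      if (settled) return;
      settled = true;
      clearTimeout(cutoffTimer);
      clearTimeout(timeoutTimer);
      if (results.length > 0) resolve([...results]);
      else reject(new Error("no successful response"));
    };

    // At the cutoff, return early if at least one task has already resolved.
    const cutoffTimer = setTimeout(() => {
      cutoffReached = true;
      if (results.length > 0) settle();
    }, cutoffMs);
    // The timeout ends the race regardless of how many tasks are pending.
    const timeoutTimer = setTimeout(settle, timeoutMs);

    if (pending === 0) settle();

    for (const task of tasks) {
      task().then(
        (value) => {
          if (settled) return;
          results.push(value);
          pending--;
          if (cutoffReached || pending === 0) settle();
        },
        () => {
          // A failed task only matters once nothing else is left to wait for.
          pending--;
          if (pending === 0) settle();
        }
      );
    }
  });
}
```

The caller would then choose among the returned results, for example by block value. Whether to return all responses collected by the cutoff or only the first one is a design choice that the notes leave open.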
N.C.:
- ePBS Discussion:
- N.C. and the team last week concentrated on the design of the inclusion list for ePBS.
- Various inclusion list designs exist:
- Forward inclusion list
- Same slot or same block design
- Top of block
- Bottom of block
- Despite these variations, when it comes to the engine API, most of these designs have similar specifications.
- N.C. documented these specifications and offered to share the link for others to review.
- Validator and Builder Spec:
- The team from Prysm is still contemplating and delving into many intricate details that need to be addressed.
- Analysis on 6110 - Pubkey Cache in Lodestar:
- N.C. analyzed the pubkey cache in Lodestar's current codebase.
- Personal Opinion:
- N.C. believes that there's no requirement for the unfinalized index to pubkey cache. The rationale is:
- The beacon API doesn't use the index to pubkey. Instead, it uses pubkey to index.
- For any use cases that leverage index to pubkey, it always mandates an active validator, rendering the unfinalized cache unnecessary.
- Feedback:
- N.C. acknowledged comments from Gajinder, particularly one pointing towards non-finality. This aspect still requires some thought.
- N.C. is looking forward to receiving more insights from Lion regarding the pubkey cache design.
- Once a consensus on the design is achieved, N.C. plans to commence with coding.
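
N.C.'s argument about the pubkey cache can be pictured with a small sketch. The class and method names are hypothetical and this is not Lodestar's cache: it only shows a layout where unfinalized validators get a pubkey-to-index entry but no index-to-pubkey entry. It does not address the non-finality concern raised by Gajinder.

```typescript
// Hypothetical two-tier cache keyed by hex-encoded pubkeys.
class PubkeyCache {
  private readonly finalizedPubkey2index = new Map<string, number>();
  private readonly finalizedIndex2pubkey: string[] = [];
  private readonly unfinalizedPubkey2index = new Map<string, number>();

  // Register a validator whose deposit is not yet finalized.
  addUnfinalized(pubkeyHex: string, index: number): void {
    this.unfinalizedPubkey2index.set(pubkeyHex, index);
  }

  // Promote a validator to the finalized tier once finality is reached.
  finalize(pubkeyHex: string): void {
    const index = this.unfinalizedPubkey2index.get(pubkeyHex);
    if (index === undefined) return;
    this.unfinalizedPubkey2index.delete(pubkeyHex);
    this.finalizedPubkey2index.set(pubkeyHex, index);
    this.finalizedIndex2pubkey[index] = pubkeyHex;
  }

  // Pubkey to index lookups consult both tiers.
  getIndex(pubkeyHex: string): number | undefined {
    return this.finalizedPubkey2index.get(pubkeyHex) ?? this.unfinalizedPubkey2index.get(pubkeyHex);
  }

  // Index to pubkey lookups are only served from the finalized tier.
  getPubkey(index: number): string | undefined {
    return this.finalizedIndex2pubkey[index];
  }
}
```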
Cayman:
- Libp2p Update: Cayman successfully updated libp2p to its latest version during the past week. The update primarily involved a significant amount of package renaming.
- A notable change in the update is that js-libp2p has transitioned to a monorepo. Consequently, all the relevant content is now housed within the libp2p/js-libp2p repository.
- Exceptions include ChainSafe maintained packages such as gossipsub, noise, and yamux.
- Impact of Libp2p Update: The update to libp2p has paved the way for Tuyen to proceed with tasks associated with gossipsub and yamux, as well as other upgrades. The team was previously unable to upgrade certain dependencies due to the breaking changes present. Upgrading libp2p was a necessary precursor to address those.
- Boot Node CLI Command: Cayman introduced a boot node CLI command during the past week. While the command is currently functional, Cayman wishes to conduct further refinements. He is particularly interested in streamlining the initialization processes for both the beacon node CLI and the boot node. Cayman aims to complete the enhancements on this feature in the coming week.
- Investigation into Yamux: Cayman expressed concerns over the performance of Yamux, which, in his observation, is lagging behind Mplex. The performance differential is roughly in the range of 5% to 10%. In order to enhance Yamux's performance, Cayman has been incorporating several tweaks, paralleling those used in Mplex. However, he has yet to achieve a comparable performance between the two in initial tests.
- Memory Leak Issue: Cayman identified a potential memory leak during his comparison tests. After taking a heap snapshot, he encountered challenges in viewing it. The sole tool Cayman is aware of for visualizing heap snapshots is Chrome DevTools. However, it seems unresponsive, getting stuck in the "building the dominator tree" phase. A suggestion was made to try Brave's DevTools, which had previously been effective when Chrome DevTools was not.
Nazar:
- Prover with Web3.js 4: Nazar dedicated his efforts last week to refine Prover to make it compatible with the Web3.js 4 version. During the process, he encountered some problems and subsequently opened a PR to address and rectify these issues. With the corrections in place, Prover is now functioning smoothly.
- Bug in Browser Logger: While working on Prover, Nazar identified a glitch in the browser logger, which rendered it incapable of logging anything within the browser. He successfully rectified this error.
- Unit Testing for Logger: Nazar observed that there was an absence of unit tests for various logger components, such as the environment logger and browser logger. He identified this gap as a potential reason why the aforementioned bug in the browser logger went unnoticed initially. To rectify this oversight, he has written unit tests explicitly for the logger package.
- Lightclient Demo PR: Nazar completed the Lightclient demo PR, which is now primed for review. This version uses Prover instead of directly employing the Light client. He extended an invitation for anyone available to review the PR.
- Decoupling in Simulation Test: Currently, Nazar is working on a PR that separates the beacon node and validator within the simulation test. This introduces flexibility, allowing the execution, beacon and validator clients to be interchanged in the simulation test, much like how execution and content can currently be mixed. Nazar anticipates wrapping up this PR shortly, as it is nearing completion.
- Upcoming Work - Integration with MetaMask: Nazar shared his plans for the forthcoming week, which primarily revolve around integrating with MetaMask. He mentioned that he had previously commenced work on a draft related to this integration. Consequently, he already possesses some groundwork that he can expand upon. His objective is to finalize this draft and produce an initial demo that showcases Prover's functionality within MetaMask.
Matt:
- BLST PR and Lodestar: Matt completed the BLST PR and incorporated it into Lodestar. All pending PRs in the BLST repo received approval and are only waiting on CI and a few other processes.
- Metrics in Lodestar: Noted a performance decrease in Lodestar when it ran with four LibUV threads due to an insufficient number of worker threads. Once the number of worker threads was adjusted to match, the performance noticeably improved, though not drastically. Several metrics, including aggregated keys, signature sets, block epoch transition times, and block production times, showed improvement. A few metrics worsened slightly, but overall, the changes seem either neutral or slightly beneficial. CPU usage decreased by 30%, providing more resources for other tasks, even though memory usage increased slightly.
- Garbage Collection and New Space: Matt experimented with different settings related to garbage collection and found that increasing the new space leads to better performance. He plans to deploy another version to validate his findings and ensure the assumptions are accurate, particularly regarding the performance impact of setting garbage collection limits higher than necessary.
- Research on Error Zero: Matt conducted research on GitHub to understand the root of the "error zero." He suspects the issue might arise from serialization/deserialization in cache management during the startup phase. Matt posted some of his findings on Discord for reference and will continue investigating this lead.
- Work on Blinding Blocks: Matt reviewed Blind's suggestions for updating blinding blocks and is working on integrating these ideas into his branch. He plans to start with unit tests for transition functions, then incrementally work on the more complex aspects.
- Feedback Request: Matt seeks feedback on the metrics of feature two to ensure he's focusing on the right aspects. He expressed a desire for candid feedback to better his understanding and asked for collaborative insights from his team.
- Infrastructure: In the context of utilizing feature groups for testing, Matt has been using three feature groups, but feels he might be monopolizing them. He plans to release some of these groups now that he's collected sufficient data.
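
For the new space experiments mentioned above, a small helper can report how large V8's new space currently is. This is a sketch only: `newSpaceUsage` is a hypothetical name, and it relies on Node's `v8.getHeapSpaceStatistics()` reporting a space called `new_space`.

```typescript
import {getHeapSpaceStatistics} from "node:v8";

// Hypothetical helper: report current size and usage of V8's new space in MiB.
function newSpaceUsage(): {sizeMiB: number; usedMiB: number} {
  const space = getHeapSpaceStatistics().find((s) => s.space_name === "new_space");
  if (space === undefined) throw new Error("new_space not reported by this runtime");
  const toMiB = (bytes: number): number => bytes / (1024 * 1024);
  return {sizeMiB: toMiB(space.space_size), usedMiB: toMiB(space.space_used_size)};
}

console.log(newSpaceUsage());
```

In Node.js the semi-space size can be raised with the V8 flag `--max-semi-space-size` (in MiB), for example `node --max-semi-space-size=64 app.js`. Whether this is the exact flag and value used in these experiments is an assumption.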
Transcript: https://pastebin.com/tiTnAVVJ
Big Boy Testnet Issues: Lion detailed certain issues related to Big Boy testnet in issue 5855 and 5857. The participants discussed these problems, particularly focusing on memory usage and cache-related matters. A significant observation was that memory usage spiked up to 12 GB in certain scenarios. Lion indicated that a new DevNet mimicking the mainnet environment was being set up, which would provide more insight. Cayman brought up the topic of reducing cache beacon state size as a potential solution. This approach would be major, and alternatives were discussed. Cayman emphasized the importance of efficient representation of pubkeys and withdrawal keys.
Memory Leak Issues: A potential memory leak was identified when upgrading to Node.js 18.17, although this was not confirmed. It was noted that there was no memory leak in Node.js 20. A discussion about whether to continue supporting Node 18 arose, with most agreeing to support it until Node 20 becomes Long-Term Support (LTS). Nevertheless, they also acknowledged that Node 20 might be preferable, especially if it simplifies things.
Interoperability with Other Clients: There were concerns about the compatibility of Lodestar with other popular clients like Lighthouse, Prysm, and Nimbus. The fallback logic on the Lodestar Validator Client (VC) seemed to be a significant cause of incompatibility, as other VCs expect the beacon node to handle it. Nico highlighted that using Lodestar with Prysm might be challenging since Prysm employs a distinct API for communication.
Network Worker Thread Updates: Adjusting memory by bumping the "new space" was mentioned; this setting affects the event loop.
Performance Issues: A memory leak was identified, which led to exploring various solutions. There's an experiment to deploy with the "new space update" on version 20 to see if it offers a solution.
Worker Threads vs. Child Process: There's a debate about whether using worker threads or child processes would be more effective. Worker threads are suggested for short-lived, CPU-intensive tasks, while the network task in question is long-lived and I/O-intensive. Using child processes might lead to better OS resource allocation and prevent memory sharing with the main process.
Existing Tools: The Node.js cluster module, which uses child processes, was mentioned, along with a suggestion to try it.
Desire for Understanding: There's a quest for a deeper understanding of why worker threads might be superior or inferior to forks at a basic level.
Memory and IPC: A point is raised about memory overhead with child processes and the possible increase in Inter-Process Communication (IPC) cost. However, it's also mentioned that, on Linux, the performance difference is hardly noticeable.
Event Loop Times: Splitting tasks between threads reduces the event loop time, but doesn't necessarily improve API response times due to latency. This suggests that while individual loops perform better, the overall system doesn't see a significant boost.
Performance Expectations: The introduction of worker threads aimed to improve performance. However, changes in the landscape, such as deterministic long-lived subnets, may have made the anticipated gains less noticeable.
Testing: There's consensus that more testing is required, especially under conditions with all subnets subscribed. The current test data may not reflect real-world performance.
Child Processes for Performance: One participant highlights their experience using detached child processes in another project to maximize hardware performance. They emphasize the importance of using detached child processes without IPC connection for full-core utilization. They also mention using a third-party serialization library.
Action Points: More testing is needed to gather data about performance benefits. An exploration of the differences between worker threads and fork processes is essential. Sharing of relevant implementation details is awaited. The conversation emphasizes the importance of deep understanding, thorough testing, and making informed decisions for optimal system performance.
NC:
- Met with the Prysm team for the ePBS. Progress is slow but tasks are being divided, and weekly meetings are established.
- Started exploring the 6110 implementation on Lodestar. Recognized a dependency on the pubkey cache.
- Refactoring needed for pubkey cache integration.
- Current focus is on the PTC design.
Lion:
- Worked extensively on Whisk, discussing optimizations and security aspects.
- Identified a potential optimization to reduce state size increase.
- Furthered work on the devnet and addressed 'big boy' issues.
Nazar:
- Developed an EL provider proxy which assigns 100 ETH to any connected account.
- Discovered that the prover wasn't working correctly due to Web3.js version 4.x's different RPC implementation.
- A PR is in progress to make the provider compatible with Web3.js version 4.x.
- Working on documentation and addressing an issue related to hiding simulation tests.
Gajinder:
- Worked on Verkle and successfully read local genesis after type adjustments.
- Identified issues while attempting a sync with lighthouse.
- Finished a PR about fee recipient and conducted mock tests.
- Assisted EF developers in using Lodestar as a boot node.
- Plans to address interop issues and continue working on syncing the verkle testnet.
Nico:
- Investigated worker threads vs. child processes for performance benefits.
- Looked into boot node maintenance.
- Addressed an issue about enabling/disabling doppelganger protection.
- Plans include further research on state cache and reviewing code.
Matt:
- Addressed a bug in Ansible, updated dependencies.
- Investigated a memory leak issue.
- Updated BLST code in Lodestar. Noticed a 40% reduction in CPU usage.
- Investigated new space and semi-space.
- Aims to finalize ongoing work, address feedback from Ben, and work on deduplicating payloads.
Tuyen:
- Addressed an issue where Lodestar had more than the maximum peers due to only counting inbound connections.
- Raised an issue with Nethermind syncing.
- Worked on updating protobufs to protons in gossipsub.
- Investigated a memory leak.
- Plans to finalize the index Gossip queue and study Lighthouse's method of maintaining zero historical state.
Cayman:
- Worked with Alex from the libp2p team on the varint library.
- Discussed strategies for using varint across various libraries and how to consolidate to a single implementation.
- Achievements include: 10x improvement in decoding speed. 5x improvement in encoding speed.
- The varint library accounts for only 3% to 5% of total CPU time, but Cayman believes there's potential for further optimization.
- Cayman and Nazar examined the kind of JavaScript produced through TypeScript.
- They decided to switch from ES 2019 to ES 2021 output.
- This change addressed issues like the inefficient output for nullish coalescing.
- Cayman observed that the current max mesh peer count was set to 9. The spec recommends a count of 12. It had previously been reduced from 12 to 9 due to performance issues. Cayman believes it's worth re-evaluating a return to 12, but emphasized the need for thorough testing before merging.
- Plan to update to the latest version of libp2p.
- This update is crucial to incorporate all modifications and fixes in the gossipsub library.
- The latest libp2p version is also necessary for testing Yamux, which Cayman intends to resume.
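
For context on the varint work mentioned above, unsigned varints store an integer in 7-bit groups, least significant group first, with the high bit of each byte marking continuation. The following is a plain illustrative implementation, not the optimized library code discussed in the meeting, and it is limited to JavaScript safe integers.

```typescript
// Encode a non-negative safe integer as an unsigned varint (LEB128 style).
function encodeVarint(value: number): Uint8Array {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError("value must be a non-negative safe integer");
  }
  const bytes: number[] = [];
  while (value >= 0x80) {
    bytes.push((value % 0x80) | 0x80); // low 7 bits plus continuation bit
    value = Math.floor(value / 0x80);
  }
  bytes.push(value); // final byte has the continuation bit clear
  return Uint8Array.from(bytes);
}

// Decode an unsigned varint starting at offset; returns the value and bytes consumed.
function decodeVarint(buf: Uint8Array, offset = 0): {value: number; bytesRead: number} {
  let value = 0;
  let scale = 1;
  for (let i = offset; i < buf.length; i++) {
    const byte = buf[i];
    value += (byte & 0x7f) * scale;
    if (!Number.isSafeInteger(value)) throw new RangeError("varint exceeds safe integer range");
    if ((byte & 0x80) === 0) return {value, bytesRead: i - offset + 1};
    scale *= 0x80;
  }
  throw new RangeError("truncated varint");
}
```

For example, 300 encodes to the two bytes `0xac 0x02`.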
Transcript: https://pastebin.com/1LkydEhH
Block production times: There have been issues recently with missed blocks on the mainnet. This is due to delays at the validator client side, particularly in pulling for proposer duties at the start of an epoch. To address this, Tuyen has a proposal (PR 5409) to pull proposer duties earlier. We should also look into why there was a huge delay (14s) in getting an execution block on missing slot 6940832.
Tuyen's PRs: Tuyen presented two PRs during the meeting. The first PR was designed to address the delay on the validator client side by polling for proposer duties one second in advance. The second PR was still under investigation and aimed to reduce the delay in producing the phase 0 beacon block body.
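The "poll one second in advance" idea can be sketched as a small timing calculation. This is an illustration under stated assumptions, not the code from the PR: the function name `msUntilNextDutiesPoll` is hypothetical, and the constants assume mainnet timing (12-second slots, 32 slots per epoch).

```typescript
// Illustrative sketch only: NOT the validator client code from the PR.
// Assumed mainnet timing parameters.
const SECONDS_PER_SLOT = 12;
const SLOTS_PER_EPOCH = 32;

/**
 * Milliseconds to wait before polling proposer duties for the next epoch,
 * so that the request goes out `leadMs` before the epoch boundary.
 */
function msUntilNextDutiesPoll(genesisTimeSec: number, nowMs: number, leadMs = 1000): number {
  const epochMs = SECONDS_PER_SLOT * SLOTS_PER_EPOCH * 1000;
  const genesisMs = genesisTimeSec * 1000;
  const currentEpoch = Math.floor((nowMs - genesisMs) / epochMs);
  const nextEpochStartMs = genesisMs + (currentEpoch + 1) * epochMs;
  const pollAtMs = nextEpochStartMs - leadMs;
  // If we are already inside the lead window, poll immediately.
  return Math.max(0, pollAtMs - nowMs);
}
```

For example, with genesis at time 0 and the clock at the very start of epoch 0, the function returns 383000 ms, which is one second before the 384-second epoch boundary.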
Long epoch transition time: There was a concern about the length of the epoch transition time, with it sometimes taking more than three seconds. This was not deemed a blocker but still a significant issue to look into.
Network thread status: One of the main challenges with the network thread was event loop lag. There were ongoing efforts to address this through metrics and exploring the message queue between the worker and the main thread. The meeting also discussed the addition of new BLS APIs and their consumption, the reduction of IO traffic related to the thread, and other measures to improve network thread stability.
Release planning: The team considered the readiness of the next release, including whether to cut a release immediately or wait to fix ongoing issues. Potential solutions included creating a hotfix for some of the recently merged PRs or cutting a scaled-down 1.10 release. The team also discussed dependency issues with node 20 and cross-fetch, and potential ways to resolve this.
Cross-fetch and Node-fetch dependencies: The meeting closed with a discussion on cross-fetch and node-fetch dependencies. The team agreed to take this offline and figure out a solution, which may include downgrading cross-fetch to resolve issues related to connection close headers.
Code coverage tests: The team discussed a pending code coverage test (PR 5225) and decided to merge it, even though it was not in immediate use. The test was self-contained and could be deleted if desired.
Matt:
- NodeFetch issue: There's an issue with the NodeFetch upgrade to version 20. The problem arises due to a conflict between an existing bug in Node.js and the bug in NodeFetch, which pertains to a "close connection" header addition that conflicted with the "keep-alive" option. This issue originally arose with Node 8, got resolved by Node 12, but has resurfaced. It's a structural problem related to how the socket, agent, and readable stream interact within Node. Matt contributed information to the ticket addressing this issue and contacted the developer who was supposed to submit a PR for it. He offered help and suggested possible fixes, but the developer said he had already prepared a PR, it just hadn't been submitted yet. Matt indicated the issue is complicated and will require time to resolve.
- NodeFetch Header Issue: This problem was caused by a header addition in NodeFetch, imported via CrossFetch. However, a PR has been merged that removes this header as default, reverting back to the Node agent's behavior, which should prevent sockets from auto-closing. While this has been tested on the NodeFetch side, Matt hasn't personally tested it yet.
- PR for DU Command: Matt submitted a PR (Pull Request) for the du (disk usage) command, which had failed in a unit test after a computer restart.
- Network Worker Message Latency: Matt submitted another PR to capture metrics on network worker message latency. He intends to discuss these metrics with Ben.
- Run Micro Task Function: Following a question from Tuyen, Matt plans to examine how to break up the run micro task function to better schedule it and improve network performance.
- setTimeout vs. setImmediate: In response to a question from Nico, Matt will look into strategies for using scheduling methods such as setTimeout and setImmediate.
- BLST work: Matt is close to finishing his work on the BLST project, which has shown promise in stabilizing the network by freeing up the main thread to process other tasks. He specifically mentioned reducing the need to serialize and deserialize keys for state transition validations, which he expects will conserve resources and enhance stability. He intends to focus on this during the week, assuming everyone on the team is agreeable.
- Follow Up with Ben: Matt plans to follow up with Ben regarding the metrics collected and discuss the progress made on the issues he's working on.
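As background for the scheduling questions raised by Tuyen and Nico, here is a minimal sketch of one way to break a long task into chunks and yield to the event loop between them. It is not Lodestar code; `chunk`, `yieldToEventLoop`, and `processInChunks` are hypothetical names. In Node.js, `setImmediate` callbacks run in the check phase right after pending I/O is polled, whereas `setTimeout(fn, 0)` is clamped to a minimum delay of 1 ms, so `setImmediate` is the usual choice for this kind of yielding.

```typescript
// Illustrative sketch only: hypothetical helpers, not Lodestar's scheduler.

/** Split `items` into consecutive batches of at most `size` elements. */
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

/** Resolve on the next check phase, letting pending I/O callbacks run first. */
function yieldToEventLoop(): Promise<void> {
  return new Promise((resolve) => {
    setImmediate(() => resolve());
  });
}

/** Handle every item, yielding to the event loop after each batch. */
async function processInChunks<T>(
  items: readonly T[],
  size: number,
  handle: (item: T) => void
): Promise<void> {
  for (const batch of chunk(items, size)) {
    for (const item of batch) handle(item);
    await yieldToEventLoop();
  }
}
```

Smaller batches reduce event loop lag at the cost of more scheduling overhead, so the batch size would need to be tuned against the latency metrics Matt is collecting.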
Cayman:
- P2P Protocol Update: Cayman got a minor PR merged in the P2P that allows for manually dialing the identify protocol. This could potentially improve the identification of peers and client versions in Lodestar, reducing instances of encountering "unknown" peers.
- Closing Old PRs: Cayman has been working on closing out old PRs in the queue and plans to continue this effort throughout the week. Specifically, he mentioned the PR concerning the discv5 using vanilla events.
- Multi-fork Types PR: Cayman expressed interest in revisiting a PR regarding multi-fork types, which was previously blocked by a type error. He believes improving the organization of types will be beneficial as more forks are introduced.
Nico:
- Network Worker Issue: Nico spent time investigating issues raised last week, particularly one involving a hanging process. This was discovered to be unrelated to any IPv6 updates, instead, it was found to be an issue with the network worker.
- Metrics Configuration: He discovered that metrics were not configured to listen on localhost. He has since resolved this issue with a PR.
- Simulation Tests: Nico delved into investigating why simulation tests were hanging on a particular PR where the order of shutting down the peer manager was altered. This led to the discovery of numerous "cannot set header" errors. Upon further investigation, he discovered that this was due to a race condition in closing the event stream, which occasionally resulted in the event stream still receiving emitted events even after it was closed or was no longer writable. He has now fixed this issue.
- Node Health API PR: Nico aims to finalize another open PR regarding the node health API. He intends to implement an approach suggested by Nazar, which is designed to improve the current system.
- Regen Strategy & State Caching: He aims to make progress in reviewing and possibly improving the current regen strategy and state caching system by studying strategies used by other clients. He plans to discuss this further with Lion, as he needs to understand some points better before making decisions on potential improvements.
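The event stream race Nico describes (events still being emitted to a stream after it was closed) is commonly avoided with a closed-flag guard. The following is a minimal sketch of that pattern, not the actual fix; the `EventSink` interface and `GuardedEventStream` class are hypothetical.

```typescript
// Illustrative sketch only: a generic guard pattern, NOT the fix that was merged.

/** Hypothetical minimal sink, standing in for an HTTP response stream. */
interface EventSink {
  write(chunk: string): void;
}

class GuardedEventStream {
  private closed = false;
  private readonly sink: EventSink;

  constructor(sink: EventSink) {
    this.sink = sink;
  }

  /** Write a server-sent event; returns false if the stream was already closed. */
  send(event: string, data: string): boolean {
    if (this.closed) return false; // drop late events instead of writing to a closed stream
    this.sink.write(`event: ${event}\ndata: ${data}\n\n`);
    return true;
  }

  /** Mark the stream closed so that any later emissions are ignored. */
  close(): void {
    this.closed = true;
  }
}
```

Setting the flag before tearing down the underlying response ensures an emitter that fires during shutdown cannot trigger a write on a stream that is no longer writable.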
Nazar:
- Prover Package Issue: Nazar had been facing difficulties using the prover package in the React application due to problems with the package's conditional exports (a mechanism by which building tools like the TypeScript compiler or Webpack can detect the runtime environment and switch import paths accordingly). This issue arose because these conditional exports were not standardized or properly utilized by most libraries.
- Fix for Conditional Exports: After facing challenges with the above issue, Nazar made changes to make the conditional exports work for webpack. However, a bug surfaced in a package in their repo that was used to lint readme files because it was only detecting one level of conditional exports and not nested ones as webpack could.
- Named Export Solution: As a solution, Nazar used named export for the browser when using the prover. This method is more streamlined in all building tools.
- Beacon Node Shutdown Issue: There was a problem with an error message being displayed inaccurately when a beacon node was shut down. The error message indicated that execution had gone offline, while in reality, the execution was still there, but the node was shut down. This issue was due to an abort error being detected as a communication error between the execution layer and the beacon layer. Nazar has opened a PR to address this and is currently writing tests for it.
- Logical Error in Prover Implementation: Nazar discovered a logical error in the prover's implementation when there weren't enough finalized blocks. If the prover was initialized and there was only one finalized block at that time, this error limited fetching some payloads. He plans to open a PR to address this.
- Upcoming PRs: Nazar mentioned that he is preparing three PRs which he expects to release either today or tomorrow, including the one addressing the logical error he found in the prover implementation, and the one addressing the beacon node shutdown issue. The third PR is expected to be for the React application, which is almost done and was held up due to the logical error found earlier.
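The beacon node shutdown issue above comes down to telling an intentional abort apart from a real communication failure. A minimal sketch of that distinction follows. It is not the code from Nazar's PR: the `EngineState` values and the `nextEngineState` function are hypothetical, and the only assumption about the runtime is that aborted requests surface as errors whose `name` is `"AbortError"`.

```typescript
// Illustrative sketch only: hypothetical names, NOT the code from the PR.
type EngineState = "ONLINE" | "OFFLINE";

/** True if `err` looks like an abort triggered by an AbortSignal. */
function isAbortError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as {name?: unknown}).name === "AbortError"
  );
}

/**
 * An abort means the request was cancelled on our side (for example during shutdown),
 * so it should not change what we believe about the execution engine.
 */
function nextEngineState(current: EngineState, err: unknown): EngineState {
  return isAbortError(err) ? current : "OFFLINE";
}
```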
Tuyen:
- New BLS API: Tuyen completed the new BLS API and will now start working on the index. A PR is expected by tomorrow.
- Proposer Duties and Subnet Subscriptions: Tuyen submitted two PRs. The first one is to handle proposer duties before the next epoch, and the second one is to avoid subscribing to too many subnets. Tuyen noted that when they joined a sync committee, there were around 50 long-lived subnets on average, leading to a considerable increase in bandwidth usage due to 120K message IDs received in gossipsub IHAVE messages. This led to significant IO lag. Tuyen has proposed a solution to restrict the subscription to six subnet peers to manage this.
- Subscriptions to Short-lived Subnets: The next task Tuyen plans to work on is to avoid subscribing to short-lived subnets too early, since early subscription increases bandwidth usage. Instead, Tuyen is looking to subscribe only a few slots ahead of a duty that falls in the next epoch.
- ChaCha-Poly Update: Tuyen mentioned that the "noble guys" have a new ChaCha-Poly update, which will now support the destination as an optional parameter. Tuyen plans to run a performance test to see if this is better than their current AssemblyScript implementation, and if so, they may switch to it.
Lion:
- Processing Attestations: After a conversation with Terence regarding how long it takes Lodestar to process all the attestations in the aggregate moment, Lion admitted that they currently don't do it. Lion spent significant time trying to understand the extent of this problem and created a new dashboard called "Lodestar Good Behaviour" to monitor things that do not directly affect Lodestar but affect others.
- Network Impact: Lion expressed concern that Lodestar is growing while potentially being a detriment to the network. One of the problems Lion highlighted is that Lodestar tends to drop messages, creating a situation where messages that are propagated through the network don't get through. If Lodestar had a significant share of the network, this could potentially have catastrophic effects. However, at the current rate, the redundancies in the network minimize this impact.
- Coordination with Tuyen and Strategy Shift: Lion is coordinating with Tuyen to address this issue. They're considering a more radical approach: if they're not processing a distinguish (an important element) in time, they might as well not do it at all. Lion proposed the idea of potentially turning off their aggregator completely, focusing instead on making Lodestar more performant. Lion sees this as a compromise to give them time to address the overload in Lodestar while they develop a more permanent solution such as networking threads.
N.C.:
- ePBS Project: N.C. started working on the ePBS (Enshrined Proposer-Builder Separation) project last Friday.
- Collaboration with Prysm and Lion: Terence has invited N.C. and Lion to join the ePBS discussion on the Prysm Discord, and it seems like future discussions on ePBS will occur there. An initial meeting with Terence, Lion, and a few individuals from Prysm has been set up for the following Wednesday.
- Draft P2P Spec on ePBS: Terence has posted his first draft on the P2P (peer-to-peer) spec on ePBS, which N.C. still needs to review.
- Learning Goals for the Coming Week: N.C. aims to familiarize himself with P2P, particularly libp2p and the gossipsub protocol, to better understand what Terence is doing on the P2P side of ePBS.
- Project Documentation: Over the next two weeks, N.C. plans to create project documentation for ePBS to formalize the project. The intention is to set objectives, goals, and break the project into phases for better organization and progress tracking.
- Current Focus: N.C. mentioned that their current focus is on the P2P aspects of ePBS, and more in-depth discussion on other areas is yet to occur.
Gajinder:
- PRs for forkChoiceUpdate v3: Gajinder worked on creating pull requests for forkChoiceUpdate v3 for DevNet 8.
- Broadcast Validation PR: He also addressed the concerns raised by Cayman and Lion on a broadcast validation pull request.
- Syncing Constantine, the Verkle TestNet: Gajinder experienced issues with loading the genesis while trying to sync Constantine, a Verkle TestNet. After spending a significant amount of time debugging, he discovered a discrepancy regarding the payload header. It now has an execution witness header, as opposed to an execution witness. He plans to address this change and attempt to run the network again.
- Discussions and PRs on Consensus Specs for publishBlockV3: Gajinder also engaged in discussions and raised pull requests about publishBlockV3 on consensus specs. It seems the current process of builder vs execution race will need to be moved to the beacon, as opposed to the validator which is the current practice. The reason is that the current API format assumes this race and selection are happening in the beacon.
- Parent Beacon Block Header PR: Lastly, he discussed a PR about the parent beacon block header on consensus specs. While the execution layer (EL) team was in favor of the PR, the consensus layer (CL) team was not. This issue will most likely be resolved in a meeting scheduled for the next day.
Transcript: https://pastebin.com/A6K4RLX2
During the July 18th stand-up, the team introduced a new member, NC, who will be working as a freelance contributor on enshrined PBS projects. NC has been in the Ethereum space for approximately 8-10 months and has previously contributed to Lighthouse and Besu on the execution layer side. He is now working on the ePBS project with the Lodestar team.
The team also discussed scheduling a demo for the prover. Nazar, who is working on the prover, proposed the following Wednesday for the demo, but since not everyone may be available, they decided to schedule it when the whole team can attend. A follow-up was made to ensure everyone got the invitation.
The team pushed a hotfix release v1.9.2 to their CIP fleet, which included several fixes related to reducing the race time. Team members had a chance to observe how their CIP nodes have been performing in the last 12 hours, and they considered whether the hotfix was suitable for release. They didn't see any significant performance changes, but they noticed irregularities with the attestation subnet count of their peers. They decided to proceed with the release and continue monitoring the metrics, particularly block production, on Lido nodes after upgrading them to 1.9.2.
They also discussed scheduling a Grafana education session next week to refresh their knowledge and understanding of Grafana. This session could include topics like understanding and reading heat maps and line graphs, the significance of different metrics, and PromQL.
They discussed fixing bugs that Nico filed over the weekend before cutting a release candidate. They also talked about a potential issue with the upgrade of libp2p which resulted in consistently having more than 55 peers. They agreed to continue monitoring this situation to see if it presents an issue.
There is a proposal to extend the block deadline in the slot and compress the attestation and aggregate sections of the slot. Currently, the team drops a majority of attestations upon subscribing to a subnet. The aim is to become a better network participant.
With the subnet refactor merged, it's hoped that the team can reduce traffic substantially. They now only subscribe to two subnets, a change from the previous approach of subscribing to one subnet per validator. This change should reduce bandwidth significantly, which can be tested in the upcoming 1.10 release.
For a successful trial of the subnet refactor, the team is considering increasing the peer count to 100. However, they feel there are not enough mainnet nodes currently to support this. The team realized the lack of a feat-3 mainnet node in their infrastructure. They plan to rectify this to ensure better testing and deployment. The team plans to release v1.9.2 and start deploying it to Lido nodes after the call. The issues raised by Nico over the weekend will be prioritized, and once resolved, a potential release candidate will be thrown into beta for data collection.
Transcript: https://pastebin.com/U1xS8PqV
Nazar finished creating the Lodestar test utils package and the end-to-end test for the prover. He's now working on a feature to track the execution engine status without dependency on the ETH_ namespace.
There is a request to fill out a Protocol Guild survey for the Protocol Guild members. There's also a proposal to change some significant eligibility requirements, and the team was asked to review it.
There's an ongoing issue regarding the deployment of Lighthouse and Prysm nodes. There's a problem with downscoring due to lack of backfill enabled. The proposed solution is to sync Lighthouse from genesis, but this would significantly affect deployment speed.
There's an issue with memory leak monitoring that needs to be addressed.
Matt provided an update on the network thread investigation. The latency issue is being caused by page faults at the kernel level due to increased RSS from using the worker thread. He is currently researching more on this to find a resolution.
They discussed planning for version 1.10 with a focus on performance upgrades. They proposed to have a beta version for testing by the end of the week and then look at the results by the next standup.
The team agreed to target the end of the week for cutting the release candidate. They also want to merge a couple of things, like the subnet work Tuyen is doing, and close the Peer Manager issue #5746. Cayman's IPv6 support was also merged, and they need to upgrade discv5. They'll continue the release planning on Discord.
Protocol Berg, the upcoming conference, has received a large number of high-quality applications, making it challenging for the organizers to select the final lineup. Despite having limited spots, they are considering extending the event to two days to accommodate more speakers and topics. The organizer encourages the team members to attend, and if they are planning to come to Berlin as a team, the organizer can help arrange for a workshop room in a coworking space for them. This would offer the team an opportunity for onsite work or a subset of team meetings.
Gajinder:
- Gajinder is working on the integration of the Verkle trie with the Shanghai testnet, aiming to have it ready for the next launch of the testnet. While there were discussions on the gas cost concerning the transition during the second half of the Verkle call, it doesn't impact the CL side significantly.
- Gajinder has been keeping up with the developments in devnet-7. He created a PR for attestation validation updates in the devnet. After some discussions and input from Lion, Gajinder is planning to implement gossip validations per spec. It was clarified that the transition to the Deneb validations will be based on the current slot and not the attestation slot.
- Besides this, Gajinder has been reviewing the BLST node API update and the second PR by Matthewkeil. He plans to run Lion's current Verkle branch on the existing Verkle testnet, Constantinople, in the coming week. This will help decide the next steps.
Nico
- Investigated a user-reported problem concerning the Lodestar API package being used to extract the state from the Beacon node and calculate the tree root. Nico determined that the issue was not in Lodestar itself but rather stemmed from a problem with Prysm's state API returning an invalid response. In the process, he also discovered a confirmed bug in Lighthouse's new archive implementation, which yields a different result when calculating the hash tree root from the state than the value in the block.
- Nico revisited an unresolved problem concerning the Beacon node failing to shut down properly. He has a potential fix for this and has submitted a PR, but he's still testing it and would like someone else to review it.
- He has been trying to diagnose an issue where some users report that their Beacon node takes an extended time to find peers. Nico suspects the problem might be related to range sync, as there are frequent disconnections when performing block range sync network requests.
- In addition to troubleshooting, Nico has been working to understand how 'regen' works, focusing on the components that trigger and consume it to understand how it all works together and where improvements might be made.
Tuyen
- He has rebased a PR related to subscribing to two subnets per node. A new flag, --deterministicLongLivedAttnets, has been added. By default, it's false. When set to true, a node will subscribe to exactly two subnets based on the node ID, which changes per 256 epochs. This flag reduces traffic because the node subscribes to just two subnets rather than subscribing to random subnets based on the connected validators. He plans to include this in version 1.10.
- Tuyen has been working on verifying signature sets with the same signing root. He has been discussing this with Matthewkeil, and it's expected that Lion will review the PR soon.
- He has also been working on multi-address support, specifically on catching the path in the constructor when a string is received. He has received some comments from Nazar and will address them soon.
- Next, Tuyen plans to work on prioritizing signature sets from the API.
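To illustrate the idea behind the deterministic long-lived subnet flag described above, here is a simplified sketch of deriving two subnets from a node ID with a rotation every 256 epochs. It is deliberately not the consensus-spec algorithm or Lodestar's implementation: the spec-defined computation additionally shuffles the node ID prefix with a hashed seed, which is omitted here. The function name `sketchSubscribedSubnets` and the constant values are assumptions for illustration.

```typescript
// Illustrative sketch only: a SIMPLIFIED stand-in, NOT the consensus-spec algorithm.
// Assumed parameters for illustration.
const ATTESTATION_SUBNET_COUNT = 64;
const SUBNETS_PER_NODE = 2;
const EPOCHS_PER_SUBNET_SUBSCRIPTION = 256;

/**
 * Derive the long-lived subnets for a node from its 32-byte ID and the current epoch.
 * The result only changes once per 256-epoch period, and the node-specific offset
 * staggers the rotation so that nodes do not all switch at the same epoch.
 */
function sketchSubscribedSubnets(nodeId: Uint8Array, epoch: number): number[] {
  if (nodeId.length !== 32) throw new RangeError("nodeId must be 32 bytes");
  const prefix = nodeId[0] >> 2; // top 6 bits of the ID, a value in [0, 63]
  const nodeOffset = nodeId[31]; // ID modulo 256 (big-endian), staggers rotation
  const period = Math.floor((epoch + nodeOffset) / EPOCHS_PER_SUBNET_SUBSCRIPTION);
  const subnets: number[] = [];
  for (let i = 0; i < SUBNETS_PER_NODE; i++) {
    // The real design permutes the prefix per period; this sketch simply adds the period.
    subnets.push((prefix + period + i) % ATTESTATION_SUBNET_COUNT);
  }
  return subnets;
}
```

Since the subnet set depends only on the node ID and the epoch, it stays at two subnets no matter how many validators are attached, which is the source of the traffic reduction described above.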
Matt
- Matt has expressed appreciation for the work done by Tuyen on the multi-signature PR and Lion for updating the dashboards. He suggests that making the dashboards easier to understand and better organized should be a focus for next quarter.
- Matt's progress this week was not as far as he anticipated on the blinded and non-blinded blocks. The work was bigger than expected, affecting various sections of code, including Regen, backfill, the API, and two repositories. Though these are mostly built out, some testing still needs to be done.
- Matt's focus was shared with supporting Ben, which took a significant amount of his time. He also spent some time working with Gajinder and celebrated the approval of their first PR. The second piece is actively being reviewed, and Matt hopes it can be incorporated soon as he believes it will bring significant improvements.
Cayman
- Cayman has been involved in discussions about converting the multi formats library to TypeScript. The current maintainers prefer JavaScript, so a meeting is being arranged to resolve the issue.
- A user-reported bug introduced with the new libp2p, where objects that don't conform to their types were being emitted, has been fixed. Cayman has a PR open for it, which he probably should have already merged.
- Cayman has been working on fixing the end-to-end tests for the node 20 update as some of the error types being thrown differ between node 18 and node 20.
- He has merged the discv5 IPv6 support and is working on integrating it locally. He plans to push this integration to Lodestar for version 1.10. He intends to finalize these two tasks this week.
Lion
- Lion has opened a PR for testing Whisk and the spec is now executable. He is currently testing the Proof of Concept (POC) that was rewritten in Rust. If this can be run on a testnet successfully, they will then move forward to tackle politics and future steps.
- The library that Whisk uses was originally written in Rust, which is why the original POC was also in Rust. A Python version for the specs existed but was very slow, so Lion shifted to a faster crypto backend. Interestingly, an unidentified individual has now written the entire library in Go, opening up the possibility for a Go implementation. At some point, they will need to take the Rust version, change the backend to BLST, and establish some bindings. However, this work won't be undertaken until there is tentative inclusion of the feature somewhere.
- In addition, Lion and his team have continued with the MaxEB proposal. They believe they have found a way to handle execution layer partial withdrawals, which was identified as a necessary task during previous discussions. They have moved away from their initial designs, which were deemed unappealing, towards a solution they are happy with.
Transcript: https://pastebin.com/rhniTqBQ
A proposal for a patch release was made, aiming to include PRs 5714 and 5708, which address issues related to logs from duplicate blocks and syncing logs. There was a discussion on whether the patch should include anything else.
It was suggested to include Gnosis, a fix for a bug in metrics (5715), and a fix for the beacon node not shutting down in certain cases (5716). However, there was a concern about the potential risk of this latter fix, but ultimately it was decided to consider it closer to when the PR gets merged.
There was discussion on including Node 20 in their work. It was reported that Node 20 was able to process more attestations and had a more efficient metric till becoming head. However, there were concerns about the garbage collection pause time rate.
An update was given on implementing deterministic long-lived subnets, which would reduce the subnet mesh peers, thus reducing the I/O lag issue. This was highlighted as particularly beneficial for home stakers.
Tuyen provided notes about his work on deterministic long-lived attestation subnets (5704). The main change is to always subscribe to exactly 2 subnets per node instead of a number based on the validator count. This reduced subnet mesh peers considerably, and with them the I/O lag issue.
Gajinder:
- Addressed a previous issue with DevNet 6 where Lodestar was unable to sync blobs by range, suspecting that it was an issue with Lighthouse. However, similar issues were noticed with other clients too. Upon investigating, he found that the 'count' value wasn't being multiplied by 'blocks per slot', causing a mismatch. This problem has now been resolved with a PR.
- Even after this issue was resolved, the system was not syncing to the head. Upon further investigation, Gajinder found that the chain wasn't finalizing because only a few nodes were up. Consequently, after syncing about 11 to 15 thousand slots, the system would stall. It was noted that every time a new peer was added, syncing would start from the last finalized epoch, causing repeated attempts to sync from the same point every time a new peer connected.
- Gajinder proposed a potential solution to address the issue of non-finalizing chains over many slots. The suggestion is to update the sync process so that a new peer can join the chain that's already synced, rather than always starting from the last finalized point.
- The issues with DevNet 6 have been resolved, and it is set to be relaunched as DevNet 7.
- The previous DevNet 7 is now DevNet 8 and is scheduled for launch in two weeks. Two PRs for DevNet 8 are already in, with Gajinder working on an additional PR.
- Gajinder mentioned his involvement in a PEEP and EEP presentation for direct changes. He anticipates the recording of this presentation will be available soon.
Nico
- Investigating the issue of false positives in the doppelganger protection mechanism. Nico found that the current implementation is similar to that of Lighthouse, where attestations in blocks can occur in the next epoch. This is what triggers false positives. One possible resolution could be to increase the wait epoch time by one more epoch, but this might negatively impact user experience.
- Nico is exploring the possibility of implementing zero downtime doppelganger protection. This could be done by checking the local attestations produced by the client. If there's an attestation in the previous epoch, doppelganger protection could be skipped for that validator. However, there are some potential downstream issues with making the registration of signers async, which would need to be addressed.
- Nico pointed out that his approach might even improve security because two validator clients cannot connect to one database. This would prevent two validator clients from starting attestations at the same time, which is a scenario in which the current doppelganger protection would fail.
- He also spent some time reviewing the latest beacon API spec and created issues based on his findings. Nico investigated why the spec tests are failing.
- His focus for the coming week will be mainly on the topic of regen.
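The zero-downtime doppelganger idea Nico describes can be sketched as a simple check against locally recorded attestations. This is an illustration of the reasoning only, not Lodestar's implementation; `canSkipDoppelgangerCheck` is a hypothetical name, and treating an attestation in the current epoch the same as one in the previous epoch is an assumption.

```typescript
// Illustrative sketch only: hypothetical helper, NOT Lodestar's doppelganger service.

/**
 * If this client's own records show the validator attested in the previous epoch
 * (or already in the current one, by assumption), the validator was evidently
 * running here, so waiting to observe the network for a doppelganger can be skipped.
 *
 * @param lastLocalAttestationEpoch epoch of the newest locally recorded attestation, or null if none
 */
function canSkipDoppelgangerCheck(
  lastLocalAttestationEpoch: number | null,
  currentEpoch: number
): boolean {
  if (lastLocalAttestationEpoch === null) return false;
  return lastLocalAttestationEpoch >= currentEpoch - 1;
}
```

A validator with no recent local record still goes through the normal waiting period, so the existing protection remains the fallback.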
Nazar
- Nazar has been working on a PR that aims to finalize end-to-end test cases for the prover package and introduce a new package called 'testutils'. This new package is designed to consolidate code used for testing that was previously scattered across various packages. The team is encouraged to incorporate any useful testing elements into the 'testutils' package.
- After the end-to-end testing for the prover is finalized, it will be made public and the first release of the prover package will be done.
- Nazar plans to incorporate this first release of the prover package into the light client demo. This will help to reduce the amount of code in the light client demo, make troubleshooting easier, and demonstrate the practical use of the prover package.
- He will then work on creating a MetaMask snap as a proof of concept for the prover. This will help initiate discussions with the MetaMask team on whether it is the right approach for production to use MetaMask snaps or whether it should be integrated into MetaMask itself. This is Nazar's primary task for the week.
Matt
- Matt has finalized the blst package and resolved the associated bug. The new format of the package has been reviewed by Gajinder and has been successfully deployed to Feature 2. Metrics from the deployment indicate that everything is working well.
- The worker was activated on Feature 2 and this further improved the metrics, which Matt views as a positive outcome.
- The blst package has also been deployed on mainnet Feature 2, which Tuyen was using. This will allow the team to observe the package's performance in a mainnet environment.
- Matt has taken up the task of addressing a database duplication issue. This involves digging into the state and SSZ (Simple Serialize), which has turned out to be a slightly tricky problem. However, he suggests that progress is smooth.
- In the coming week, Matt plans to conduct another round of review with Gajinder on the next part of blst, finalize the associated PR, and potentially pick up the next task. He notes that this might be a task that involves "similar state stuff" and is akin to what Lighthouse is doing.
Transcript: https://hackmd.io/@philknows/Bk0tuWedn
Version 1.9 Observations: Tuyen reported his observations of version 1.9: no sync issues; mesh peers somewhat less stable but similar to version 1.8; more attestations processed with fewer dropped; increased gossip attestation processing time, CPU usage, and REST API time; and missed attestations unchanged. The team discussed whether the increase in event loop lag for the beta node is a concern. It was also observed that the Goerli testnet is becoming less reliable for gauging performance and metrics.
Decisions on RC3 as a Release Candidate: The team agreed that the observations were acceptable and there was nothing to prevent RC3 from being a potential release. They decided to continue running it for another 36 hours to observe if anything changes drastically. There was also a discussion regarding relying more on the CIP nodes for final-stage testing and potentially splitting the nodes further to simulate different setups (e.g., home stakers).
Worker Thread and Network Thread: Tuyen mentioned that the worker thread has improved thanks to additional zero-delay sleeps ("sleep zero") that yield to the event loop, and that it might be ready to be enabled in the unstable release. The team also discussed whether to wait for version 1.10 to include several large upgrades (such as the libp2p upgrade) or to ship a patch release.
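As a rough illustration of the "sleep zero" idea, the sketch below processes a batch of items and periodically awaits a zero-delay timer so that other event loop work (timers, socket I/O) can run in between. The names `sleepZero`, `processWithYield`, and the `yieldEvery` parameter are hypothetical and are not taken from the Lodestar codebase.

```typescript
// Hypothetical helper: resolves on a later turn of the event loop.
function sleepZero(): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Process items in order, yielding to the event loop every `yieldEvery` items
// so that a long batch does not monopolize the thread.
async function processWithYield<T, R>(
  items: readonly T[],
  handle: (item: T) => R,
  yieldEvery = 100
): Promise<R[]> {
  const results: R[] = [];
  let sinceYield = 0;
  for (const item of items) {
    results.push(handle(item));
    sinceYield++;
    if (sinceYield >= yieldEvery) {
      sinceYield = 0;
      await sleepZero(); // let pending timers and I/O callbacks run
    }
  }
  return results;
}
```

The trade-off is throughput versus responsiveness: a smaller `yieldEvery` keeps the loop more responsive but adds timer overhead to each batch.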
Engaging Libuv Maintainer for Network Thread Issues: Approval has been granted to re-engage the maintainer of libuv for help with network thread issues. The budget approved is up to $10,000. Lion was identified as the person who will compile questions for the libuv maintainer and continue communications with him.
Planning for Version 1.10: The team discussed possible inclusions for version 1.10 which include enabling the network worker thread by default, libp2p update, supporting Node 20, and as a nice-to-have, including Yamux.
Additional Testing Setup for Solo Stakers: It was suggested to have a beacon node running with a single validator to simulate a home staker setup, with more modest hardware, in order to have more realistic testing for such users.
Transitioning from Goerli to Holesky Testnet: Due to Goerli becoming less reliable, there was a suggestion to move testing to a new, larger testnet launching in September (Holesky). The team discussed the importance of testing in an environment that has a large number of validators in the active validator set and how many validators are connected to the beacon node.
Matt
- Completed a small PR that involved spell checking for all of the documentation and Readmes.
- Worked on cache checking and wrote documentation on how to look for cache hits and misses. After verifying, he found only about 3% extra cache misses on the network thread, which he does not consider a significant difference. Hence, he believes it’s not necessary to dig deeper into which cache level the misses occur at.
- Faced a segmentation fault issue. After investigation, he discovered that the issue was due to keys moving during garbage collection when he started to bundle all of the aggregates and attestations. He realized that he had been looking in the wrong place, and it wasn’t about mixing old and new keys.
- To resolve the segmentation fault, Matt plans to refactor the relevant function and write unit tests in blst that trigger the garbage collector to ensure stability.
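A test that deliberately triggers the garbage collector, as Matt describes above, might look roughly like the sketch below. It assumes Node.js is started with the `--expose-gc` flag, which makes a global `gc()` function available. The helper names `tryForceGc` and `survivesGc` are hypothetical, and a plain `Uint8Array` stands in for the native key objects that the real blst tests would use.

```typescript
// `gc` is only defined when Node.js is started with the --expose-gc flag.
type GcHost = {gc?: () => void};

// Returns true if a collection was actually requested.
function tryForceGc(): boolean {
  const maybeGc = (globalThis as unknown as GcHost).gc;
  if (typeof maybeGc !== "function") return false;
  maybeGc();
  return true;
}

// Hypothetical test helper: create a value, churn the heap and force GC
// several times, and check after each round that the value is still intact.
function survivesGc<T>(create: () => T, check: (value: T) => boolean, rounds = 5): boolean {
  const value = create();
  for (let i = 0; i < rounds; i++) {
    // Allocate short-lived garbage so the collector has work to do.
    const garbage = Array.from({length: 10_000}, (_, n) => ({n}));
    void garbage.length;
    tryForceGc();
    if (!check(value)) return false;
  }
  return true;
}

// Stand-in for a native-backed object.
const stillValid = survivesGc(
  () => new Uint8Array([1, 2, 3]),
  (bytes) => bytes[0] === 1 && bytes[2] === 3
);
```

Without `--expose-gc`, `tryForceGc` returns `false` and the loop only exercises ordinary allocation, so a real test suite would probably want to fail or skip loudly in that case.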
Cayman
- Identified and patched a bug in gossipsub last week.
- Has been testing the new libp2p library with various patches to achieve stability or equivalent performance. However, he hasn't reached that point yet. He conducted tests on feature one the previous week.
- Has several Pull Requests (PRs) that are ready to be merged, but he has been holding off on merging them until version 1.9 is stable. He doesn't want to affect performance or introduce any risks, so he is leaving them open for now.
- His primary focus for the week is working on getting Lodestar Node 20 ready.
- As a secondary objective, he will continue working on the libp2p library.
Tuyen
- Took over a PR (Pull Request) from Lion to eliminate the re-serialization of blocks after they are received via request/response or gossip, which resulted in some minor improvements in gossip.
- Tuyen’s next task involves working on the integration of yamux. He also mentions that if there are any specific things needed in libp2p, he is available to assist; otherwise, he will focus on yamux.
- The integration of yamux is expected to be straightforward and can be used in place of Mplex. However, there might be some considerations regarding the compatibility of versions.
- Tuyen recalls that the code changes needed for yamux integration are small, but he notes that the performance is not as good as with Mplex.
Gajinder
- Reported issues with DevNet 6, which led to it becoming non-functional. The nodes were not syncing correctly, particularly with the Lighthouse client. Lighthouse had problems serving the blobs correctly.
- Additionally, Geth had issues with block proposals. There was also a problem with how one of the execution layer (EL) clients included block transactions with blobs, causing the testnet to break. Because of skewed validator allocations, DevNet 6 is deemed non-recoverable, and work will shift to DevNet 7, which will focus on EIP-4844.
- Gajinder reviewed the BLST code from a PR that Matt worked on, and gave feedback. He is waiting for Matt to update the PR based on the comments.
- He cleaned up his own PR called "Free the Blobs" and rebased it with the latest changes. Most of the work is done, but a few critical pieces are missing, which he plans to push during the week.
- He submitted a PR for fixing the proposal flow for DVT validators, making sure that local execution engine blocks are not produced or used against the blocks received from the relay.
- For the upcoming week, he plans to work on Deneb-focused PRs, including adding the beacon block root to the execution payloads so that proofs against the beacon state can be verified in contracts in the Ethereum Virtual Machine (EVM). He also aims to align with the EIP that makes voluntary exits non-expirable.
- He mentioned an important change regarding deposit snapshots for WSS (Weak Subjectivity Sync). Currently, when doing a WSS sync, the execution client has to backfill all logs to provide the deposit tree to the beacon client. With deposit snapshots, this will no longer be necessary. The execution client will receive a deposit snapshot tree from the CL (Consensus Layer) so it won’t have to sync all history since the deposit contract was deployed.
- Gajinder will also work on Lion’s PR related to metric proposals.
Nico
- Nico addressed a problem related to the beacon node not shutting down, which was linked to a libp2p issue. Cayman identified that it was due to an update in a sub-library of libp2p. Nico plans to further debug this once Cayman's fix is implemented.
- He fixed minor API-level issues, including one that was identified during DevNet 6, and added end-to-end tests for them. He mentioned that he found these tests efficient and thinks they can serve as good sanity checks. He’s considering adding more end-to-end tests for other APIs.
- Nico added a feature that allows users to force a checkpoint sync, which is mostly useful for development purposes. This feature can also help users who have been offline for an extended period and are facing lengthy sync times.
- He aims to run all other Validator Clients (VCs) against their beacon node to check for compliance and to identify any issues.
- Nico observed a potential issue with doppelganger protection, where it produced a false positive. He's uncertain if this can be prevented and plans to investigate further. There should be no false positives according to Lion.
- He intends to start working on the state and region topic either this week or early next week.
Lion
- Lion has been involved in a bunch of spec initiatives. Out of scope for meeting notes, but transcripts provide context and discussions can be viewed here: https://ethresear.ch/t/increase-the-max-effective-balance-a-modest-proposal/15801
Transcript: https://pastebin.com/6FUM4PbH
There was an observation that the network was stable except for some mesh peers that were dropped. A branch was deployed previously to test batch delete and it appeared stable based on the 7-day chart. It was agreed to monitor the network for any additional issues. The team discussed deploying on some mainnet validators, particularly the CIP validators, to obtain better metrics. The suggestion was agreed upon as it could provide useful data to the network. The consensus was that the risks were low and the validators could be a good resource. The team agreed that the priority was pushing out version 1.9 RC2. Several people, including larger node operators and relayers, are awaiting the release.
Performance Based on Keys: The conversation began with an observation that nodes with a higher number of keys seem to experience issues, while nodes with fewer keys are performing relatively well. For instance, CIP validators with a lower number of keys appear to be more effective compared to Lido nodes, which have around 200 keys per beacon node.
Network Thread Enablement: The team discussed whether the network thread should be enabled by default. The consensus was positive, as enabling the network thread seems to significantly improve performance, especially for nodes connected to more subnets. The network thread allows for the processing of more messages, and the team believes that this is critical for being a good network participant. Despite the benefits, there were concerns about network threads getting overloaded. The reason why the network thread is clogged compared to the main thread is not clear. It was hypothesized that spinning up a second isolate is creating overhead and that context switching at the CPU level might be the issue. The network thread introduces a thread for the first time, which is different from the main process.
Self-regulation of Network Thread: The team pondered whether the network thread could self-regulate to avoid choking. They considered reintroducing a mechanism that drops messages to reduce the load when the thread detects that it is overloaded.
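One simple form of the message-dropping mechanism discussed above is a bounded queue that rejects new items once it is full and counts what it dropped. The sketch below is only an illustration. The class name `DroppingQueue` and its drop-newest policy are assumptions, not a description of what Lodestar or gossipsub actually implement.

```typescript
// Hypothetical bounded queue that sheds load instead of growing without limit.
class DroppingQueue<T> {
  private readonly items: T[] = [];
  private readonly maxLength: number;
  private dropped = 0;

  constructor(maxLength: number) {
    this.maxLength = maxLength;
  }

  // Returns false (and counts a drop) when the queue is already full.
  push(item: T): boolean {
    if (this.items.length >= this.maxLength) {
      this.dropped++;
      return false;
    }
    this.items.push(item);
    return true;
  }

  // Oldest-first removal; returns undefined when the queue is empty.
  shift(): T | undefined {
    return this.items.shift();
  }

  get length(): number {
    return this.items.length;
  }

  get droppedCount(): number {
    return this.dropped;
  }
}
```

Exposing `droppedCount` as a metric would make it visible when the thread is shedding load, which matters because silently dropped gossip messages can otherwise look like a network problem.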
Backpressure and Yamux: Backpressure, which regulates data flow, was mentioned as a key issue. There was a mention of Yamux, a stream multiplexer, which has built-in backpressure. However, it was noted that Yamux was previously blocked due to memory leak issues.
Upgrading libp2p: The team discussed the necessity of upgrading to the latest version of libp2p (0.45). We should prioritize this after releasing v1.9.0. This upgrade would provide several fixes in the TCP library and improvements in gossipsub. It would also pave the way for retesting Yamux. Future versions of libp2p might replace the underlying implementation of the streams with WHATWG streams, which promise better performance and built-in backpressure.
Async Iterables and Buffering Strategy: The team talked about async iterables and buffering strategies. They considered whether they could eliminate abort sources with the current design. The conversation also touched on the implementation of streams using async iterables to avoid additional memory copies when dealing with binary data.
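To make the connection between async iterables and backpressure concrete, the sketch below uses plain async generators. A generator only produces its next chunk when the consumer asks for it, so a slow consumer naturally slows the producer, and chunks are passed along by reference without extra copies. This is a generic illustration, not libp2p's stream implementation, and the function names are hypothetical.

```typescript
// Source: only runs when the consumer requests the next value.
async function* produceChunks(count: number, producedLog: number[]): AsyncGenerator<Uint8Array> {
  for (let i = 0; i < count; i++) {
    producedLog.push(i); // record when each chunk is actually produced
    yield new Uint8Array([i]);
  }
}

// Transform stage: reads each chunk in place, without copying it.
async function* firstBytes(source: AsyncIterable<Uint8Array>): AsyncGenerator<number> {
  for await (const chunk of source) {
    yield chunk[0] ?? 0;
  }
}

// Slow consumer: tracks how far the producer ever ran ahead of consumption.
async function consumeSlowly(
  source: AsyncIterable<number>,
  delayMs: number,
  producedLog: number[]
): Promise<{maxAhead: number; sum: number}> {
  let consumed = 0;
  let maxAhead = 0;
  let sum = 0;
  for await (const value of source) {
    consumed++;
    sum += value;
    maxAhead = Math.max(maxAhead, producedLog.length - consumed);
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return {maxAhead, sum};
}
```

In this pull-based setup the producer never gets ahead of the consumer. Any buffering strategy (for example, prefetching a few chunks) would have to be added explicitly, which is where the buffering discussion above comes in.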
Event Loop Lag and Microtask Queue: The team identified that there was event loop lag and that a backed-up microtask queue might be causing the network thread to underperform. The hypothesis is that if the microtask queue is clogged, the data from the sockets could end up being loaded into L2 or L3 cache, causing delays. However, the team wasn't sure how to test this hypothesis.
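The cache hypothesis itself is hard to test from JavaScript, but event loop lag can at least be quantified. A minimal sketch, assuming a simple timer-drift approach: schedule a timer and measure how much later than requested it fires. The helper names `measureLagOnce` and `blockFor` are hypothetical. `blockFor` simply simulates a task that hogs the thread.

```typescript
// Schedule a timer and report how much later than requested it actually fired.
function measureLagOnce(intervalMs: number): Promise<number> {
  const start = Date.now();
  return new Promise((resolve) => {
    setTimeout(() => {
      const lag = Date.now() - start - intervalMs;
      resolve(Math.max(0, lag));
    }, intervalMs);
  });
}

// Simulate a task that hogs the thread (a stand-in for a clogged task queue).
function blockFor(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // busy wait
  }
}
```

Node.js also ships `monitorEventLoopDelay` in the `perf_hooks` module, which samples the same kind of delay into a histogram and is likely a better fit for long-running metrics.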
Performance Testing: Finally, the team expressed the need for libp2p performance tests to be conducted to investigate the hypothesis that their stack might be slow. This would help them to understand if their observations are due to inherent issues with the stack or if there are other factors at play.
Gajinder
- With the launch of pre-DevNet 6, they increased the number of blobs (data packets) that can be utilized in the network from one to six per block.
- However, this change caused some issues which Gajinder is debugging. A fix has been prepared, and they are working on syncing back to DevNet 6 to stabilize the network.
- One issue emerged when Gajinder sent 500 transactions with 500 blobs to the network. While each block could handle six blobs, the EL clients started facing issues.
- One particular problem was with Lodestar which, due to a typo, was sending all six blobs together instead of processing them one by one. A PR (pull request) has been generated to resolve this issue.
- Gajinder mentions that the current network is running in a single data center, so there haven't been any network latency issues, and he doesn't expect any such issues in the future.
Matt
- He has successfully integrated BLST (BLS signature library) which is now handling attestations, aggregates, and proofs. He has created a draft pull request and aims to deploy it.
- He plans to work remotely and attempt to deploy the BLST version to a feature node to gather metrics. He hopes that this deployment will alleviate some of the load from the main CPU and possibly help with other issues they are experiencing.
- Matt has the second piece ready for Gajinder but understands that Gajinder has been occupied with getting the next step of BLST approved.
Cayman
- Cayman has been working on the libp2p branch and plans to push any fixes or the latest updates to it.
- Cayman has been working on getting their system ready for Node 20.
- He has an open PR (pull request) for simplifying the snappy frame decompression by replacing some old libraries with a simpler solution.
- Additionally, he is updating Snappy, a native library they are using, to the latest version that is compatible with Node 20.
- Cayman shares an interesting technique he learned called branded types. It's a method for creating unique types (nominal types) in programming, which can help in distinguishing between similar data types. For example, distinguishing between a regular string and a special ID that is also in string format. This can be helpful in avoiding mistakes where a simple string is used where an ID should be used, as it requires explicit typecasting. Cayman has written a comment on this and provides an example in a library for anyone interested in learning more about branded types.
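For readers unfamiliar with the branded types that Cayman describes, here is a minimal sketch of the pattern. The type and function names (`Brand`, `ValidatorId`, `PeerIdStr`, `toValidatorId`, and so on) are made up for illustration and are not the example Cayman referenced.

```typescript
// A brand is a phantom property that exists only at the type level.
type Brand<T, Name extends string> = T & {readonly __brand: Name};

type ValidatorId = Brand<string, "ValidatorId">;
type PeerIdStr = Brand<string, "PeerIdStr">;

// The only sanctioned way to create a ValidatorId: validate, then cast once.
function toValidatorId(value: string): ValidatorId {
  if (!/^\d+$/.test(value)) {
    throw new Error(`not a validator index: ${value}`);
  }
  return value as ValidatorId;
}

function toPeerIdStr(value: string): PeerIdStr {
  return value as PeerIdStr;
}

function describeValidator(id: ValidatorId): string {
  return `validator #${id}`;
}

const id = toValidatorId("42");
describeValidator(id); // ok
// describeValidator("42");                  // compile error: plain string is not a ValidatorId
// describeValidator(toPeerIdStr("peer-a")); // compile error: brands do not match
```

At runtime a branded value is still an ordinary string, so the pattern costs nothing beyond the single cast inside the constructor function.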
Tuyen
- Tuyen investigated an issue regarding external memory in version 1.9.0. He has implemented a fix related to batch deletion, which seems to resolve the issue.
- Tuyen investigated the network thread and identified some minor optimizations in gossipsub.
- He found that avoiding the peer ID conversion when calling certain functions could potentially save around 4% of CPU time.
- However, these optimizations are not the root cause of the network thread issue and Tuyen plans to continue the investigation.
Nico
- Nico made the thread pool used for decrypting key sources reusable, improved error handling, and fixed issues related to terminating the decryption process. Before these fixes, the decryption process could not be terminated without forcefully closing the process.
- Nico also submitted a Pull Request (PR) to integrate these improvements into the key manager API.
- Nico observed that, in some instances, the beacon node does not exit cleanly after running for an extended period. The process continues to run, and Nico has not yet been able to identify the cause or the handler that keeps it active. He mentioned that this issue seems random and is hard to test.
- Nico is considering adding an explicit process exit once the beacon node is closed as a solution, although he prefers to avoid it if possible. He mentioned this issue occurs on Linux and has also been observed in Docker.
Nazar
- Nazar worked on incorporating batch requests into the prover. This proved challenging because the prover needs to be compatible with both Ethers.js and Web3.js: Ethers.js does not have a public interface for batch requests, while Web3.js does.
- Nazar opened a PR that includes several final features and refactoring for the prover. Once this PR is merged, he plans to close the epic issue and will open a separate issue for the P2P interface for the light client.
- Nazar enabled an ESLint rule for detecting unnecessary typecasting and found that there was a lot of unnecessary typecasting in the source code. He opened a PR to address this and is waiting for feedback on whether to keep or remove the unnecessary typecastings.
- Nazar conducted research on integrating the prover with MetaMask through MetaMask snaps. However, he discovered that MetaMask snaps may not be the right framework for this integration. According to MetaMask documentation, snaps are not intended for long-running processes, whereas running a light client would require a long-running process. Nazar is continuing research on this topic to determine an alternative solution or how snaps could be adapted to fit the use case.
- Nazar will continue researching MetaMask snaps integration and will update the Light Client demo to use the Prover package after v1.9 is released.
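For context on the batch requests mentioned in Nazar's update: in JSON-RPC 2.0, a batch is a JSON array of request objects sent in a single call, and the responses may come back in any order, so they are matched to requests by `id`. The sketch below shows that shape only. The helper names `buildBatch` and `matchResponses` are hypothetical and do not reflect the prover's actual API.

```typescript
type JsonRpcRequest = {jsonrpc: "2.0"; id: number; method: string; params: unknown[]};
type JsonRpcResponse = {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: {code: number; message: string};
};

// Build one batch payload (a JSON array) from several calls.
function buildBatch(calls: {method: string; params: unknown[]}[]): JsonRpcRequest[] {
  return calls.map(
    (call, index): JsonRpcRequest => ({
      jsonrpc: "2.0",
      id: index + 1,
      method: call.method,
      params: call.params,
    })
  );
}

// Responses in a batch may arrive in any order, so match them to requests by id.
function matchResponses(
  requests: JsonRpcRequest[],
  responses: JsonRpcResponse[]
): (JsonRpcResponse | undefined)[] {
  const byId = new Map<number, JsonRpcResponse>();
  for (const response of responses) {
    byId.set(response.id, response);
  }
  return requests.map((request) => byId.get(request.id));
}
```

A caller would send `JSON.stringify(buildBatch(calls))` in one HTTP POST and then use `matchResponses` to line the results back up with the original calls.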
The team mainly discussed issues related to the 1.9 update, including the RSS memory leak problem and batch delete anomalies in LevelDB, which are currently hindering the deployment of the 1.9 version. Tuyen shared that he found the batch delete was causing the memory spike issue, and despite changing the approach to delete each slot separately, the problem persisted in one or two nodes. Tuyen is considering changing his approach to address this issue.
There was a discussion about the possibility of underlying C and C++ code causing random memory leaks, as the team heavily relies on it. Notably, LevelDB, which is handling the database, had leak issues in the past. The team decided that this should be reported. Matt expressed concern that the leak might be due to HandleScopes not being used correctly, which would prevent the garbage collector from freeing objects during lengthy processes. He suggested looking further into this.
Tuyen mentioned that they've had the same LevelDB dependencies as in 1.8.0, where there wasn't an issue. However, an enhancement by Gajinder on May 23 appears to have sparked the issue. After its reversion, memory still spiked periodically (referred to as "barting"). Tuyen proposed deploying different versions between May 23 and May 31 to identify the exact commit causing the problem.
The team decided that this was a good plan, although it was noted that the issue was mainly seen in one medium node, and there could be some inconsistency. Tuyen thinks the behavior is quite consistent when batch delete is used. The plan is to deploy the exact commits with no patches on top, and then revert Gajinder’s PR. The issue was recognized as a blocker for deploying 1.9, but as there's no immediate rush, the team has the necessary time to work through it.
Gajinder
- He updated the specifications for DevNet 6 to 1.4.0-alpha.1 and updated KZG to big endian in the 4844 branch.
- He merged the PR to change the blob transaction encoding to RLP and also changed the request/response handling to the new methods. Pending tasks include changing the network gossip methods and the block input.
- He also will be pushing a pre-devnet6 build with EthereumJS this week.
Tuyen
- Will proceed with deploying commits between May 23-28 on the feat1, feat2, and feat3 nodes to diagnose the memory "barting" issue.
Lion
- Lion created a PR to automate the process of analyzing why a node is missing attestations.
- He also experimented with perf and a profiler with the aim of automating the rendering of SVGs for the network thread.
Nico
- He fixed two issues noticed on the 1.9 release and started looking into doing decryption in a thread pool for the key manager API.
- He also wrote a script to analyze attestations.
Nazar
- Nazar worked on two PRs containing small features within the Lodestar Prover.
- He added whitelisting support and batch request support.
- This week, he plans to focus on adding test coverage to the prover package and move file names to camel case.
Cayman
- He worked on getting the libp2p upgrade unblocked and deployed it to feat2.
- He also looked into updating to Node 20 and worked on updating the native dependencies for this.
- Notes from Matt indicate that Node 20 fixes many regressions from Node 18. We should upgrade as soon as we can.
Matt
- He managed to get stack traces and flame graphs working on worker threads.
- He's working on analyzing them and using them for better telemetry.
- He also worked on a PR with BLST and will push the next one when ready.
The primary discussion centered around the team's challenges with the 1.9 release, specifically a memory leak issue that is preventing the release. Tuyen has narrowed the issue down and found that external memory has jumped since May 23rd. Curiously, this seems related to the "Change Archiving Strategy to Always Store Last Finalized" work, which is strange because archiving was already being performed before, just less frequently than it is now. The PR related to this issue is not big, and the team was invited to take a look to see if anything can be found. The team seemed skeptical about this PR being the cause of the issue, as it mostly deals with fetching some keys from the DB and deciding what to delete, and doesn't appear to involve substantial memory consumption. There was some discussion about the leak being in external memory rather than heap memory, and the possibility of the issue being in LevelDB, given that external memory is affected by buffers and array buffers. The team's next steps involve further investigating this issue, and considering reverting the PR or changing the frequency of the archive state calls to help narrow down the problem.
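To illustrate the distinction between heap memory and external memory drawn above, the sketch below reads Node's `process.memoryUsage()`, which reports `rss`, `heapUsed`, `external`, and `arrayBuffers` separately. The helper names `toMb` and `takeMemorySnapshot` are hypothetical. Lodestar's own metrics are collected differently.

```typescript
type MemorySnapshot = {
  rssMb: number;
  heapUsedMb: number;
  externalMb: number;
  arrayBuffersMb: number;
};

// Convert bytes to megabytes, rounded to one decimal place.
const toMb = (bytes: number): number => Math.round((bytes / 1024 / 1024) * 10) / 10;

// `external` covers memory held by C++ objects bound to JS objects;
// `arrayBuffers` (included in `external`) covers ArrayBuffers and Node Buffers.
function takeMemorySnapshot(): MemorySnapshot {
  const usage = process.memoryUsage();
  return {
    rssMb: toMb(usage.rss),
    heapUsedMb: toMb(usage.heapUsed),
    externalMb: toMb(usage.external),
    arrayBuffersMb: toMb(usage.arrayBuffers),
  };
}

// Allocating a large Buffer shows up under arrayBuffers, not under the JS heap.
const beforeAlloc = takeMemorySnapshot();
const bigBuffer = Buffer.alloc(64 * 1024 * 1024);
const afterAlloc = takeMemorySnapshot();
console.log(bigBuffer.length, beforeAlloc, afterAlloc);
```

A leak that grows `external` or `arrayBuffers` while `heapUsed` stays flat points toward buffers or native code rather than ordinary JavaScript objects, which is consistent with the team's suspicion of the database layer.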
Besides the memory leak, other issues for the 1.9 release were mentioned, including unstable peers and request response handlers. One of the main concerns was the instability of peers when the network thread is enabled, resulting in numerous ban messages from other peers. There was a suggestion to temporarily disable the worker thread, release a 1.9 RC1 for testing, and continue to work on the issues as part of 1.10.
Nazar
- Nazar closed the PR for "estimate gas for the prover", which completed the list of web3 methods planned for the version 0 of the prover.
- He is currently working on some improvements before the prover package is ready for publication.
- Two PRs are ready for review and have been shared on Discord.
- An issue regarding file naming convention was discussed, with the consensus leaning towards snake_case for file and directory names.
- Nazar will open a PR for the file renames soon, to avoid potential issues with others.
- A linter rule has been added to ensure the snake_case file naming convention is adhered to in the future. However, directory names cannot be lint-checked automatically, so manual review is required to ensure the naming convention is adhered to.
Tuyen
- Tuyen spent most of his time investigating version 1.9, but there are not many results yet apart from identifying a memory leak issue.
- He has added some metrics to the unknown block sync panel.
- A performance issue was identified with calling the fork choice `hasBlock` check, which was found to be the main CPU consumer in the network processor.
- He also has a PR for deduplicating notifier log, which has been merged.
Gajinder
- Gajinder reported that DevNet 5 was resurrected after being in limbo for more than two weeks.
- The resurrected nodes could sync using checkpoint sync without being rate-limited by Lighthouse nodes.
- DevNet 6 spec has a few open PRs, and most of the spec for 4844 is finalized.
- He completed a PR in Lodestar to deserialize the blob transactions, with only a small part pending to add example RLP transactions.
- He also separated the request response from the "Free the Blobs/Decouple" master PR.
- He created a Blobfish banner for Deneb at the request of Barnabas from EF DevOps.
Nico
- Nico addressed minor issues from version 1.9 and opened some PRs to fix them. He mentioned problems with event listener warnings, especially after upgrading the worker.
- Nico also reported compatibility issues with Lighthouse VC, including a time discrepancy error with the latest release.
- He also noted interop issues with Teku.
- The team managed to get the overall cluster running and successfully made some attestations, but acknowledged that the cluster broke again over the weekend.
- His goal for the upcoming period is to continue fixing issues and close existing ones.
Cayman
- Spent the last week looking at metrics from 1.9, particularly the libp2p worker. His analysis suggests that increased workloads might be causing timing issues due to increased data processing.
- While working with the libp2p team, they found bugs in the Rust Yamux implementation. He suggested trying Yamux again in Lodestar after version 1.9 is released.
An upcoming protocol program presentation was announced which will cover project updates from the last quarter and future plans.
The team discussed the v1.9 planning with a focus on outstanding PRs. The libp2p 0.45 upgrade will be deferred to a future release due to potential complexities, but the network thread merge has been completed. Several other PRs were reviewed and discussed, including "Use Proper State to Verify Attestations," "Generating and Using Flame Graphs," "Improved Error Handling in Attestation Service," and "Change Archiving Strategy to Always Store Last Finalized." Most of these were close to being merged or have already been merged. The team is looking to commit to a v1.9 release within the next 24 hours so that the "cleanly exit process on graceful shutdown" PR (5330) can be included.
The team also discussed the need to ensure the beacon node is shutting down cleanly. There was a need for the threads package to be updated and published, which Cayman can likely do when he returns.
One of the PRs (5521) aims to fix the browser test, and the finalized proposal log, recently added, is under review for its continued inclusion.
The team aims to release v1.9 by the end of the week and subsequently update their production nodes. It is hoped this will address some issues with lower effectiveness and missed attestations.
BLST integration in Lodestar: Matt has successfully integrated BLST into Lodestar, which has resulted in good metrics. However, Matt admits he made a mistake with a promise return in one of the tests, which Gajinder caught. Matt thanked Gajinder for his efforts in reviewing the pull requests and for his valuable feedback. Matt has updated his work based on Gajinder's comments. Over the weekend, Matt worked on refactoring the code and making some additional changes. He has started to rework the code to demonstrate different parts and iterations. Matt has made progress with the memory model and class hierarchy, which he hopes to finish soon. Matt also worked on flame graph testing. He has simplified the code significantly and tested to ensure everything looks good. Matt has put up another metrics pull request for block errors. After integrating BLST, Matt noticed improvements in the system's performance, with no negative gossip scores or bad behavior reports. However, he discovered that he had implemented it incorrectly, which could have skewed the metrics. He plans to refactor this after Gajinder's next PR review. During a performance test, Matt found a mistake where the test was bypassing the promises, resulting in falsely high gains. Once the error was corrected, the performance was on par with previous measurements. Matt will redesign the performance tests to better account for multithreading and will attempt to make them work on CI. If that does not work, he will run them locally on his machine.
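The notes do not spell out the exact mistake in Matt's performance test, but one common way a benchmark ends up "bypassing" promises is timing an async call without awaiting it, so only the cost of scheduling the work is measured. The sketch below illustrates that failure mode. `fakeVerify` and the two timing helpers are hypothetical stand-ins, not code from the blst PR.

```typescript
// Stand-in for an async verification job that takes roughly `ms` milliseconds.
function fakeVerify(ms: number): Promise<boolean> {
  return new Promise((resolve) => setTimeout(() => resolve(true), ms));
}

// Buggy: the promise is created but never awaited, so only the cost of
// scheduling the work is timed and the result looks far too fast.
function timeWithoutAwait(task: () => Promise<unknown>): number {
  const start = Date.now();
  void task();
  return Date.now() - start;
}

// Correct: the timer stops only after the async work has actually finished.
async function timeWithAwait(task: () => Promise<unknown>): Promise<number> {
  const start = Date.now();
  await task();
  return Date.now() - start;
}
```

The same issue tends to get worse with worker threads, where the measured section can finish long before the off-thread work completes, which may be part of why the performance tests are being redesigned for multithreading.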
Nazar has been working on several pull requests (PRs) for the prover, with the latest one implementing the eth_estimateGas. He is currently fixing some failing browser tests related to this PR. With the conclusion of the eth_estimateGas, the initial plan for the version 0 of the prover would be finished. The future work on this will involve refining and fixing any remaining issues with the package. No new features will be added at this stage. With the completion of Prover's v0, Nazar plans to communicate with MetaMask to test the prover. He suggested setting up an alpha or testing ground with MetaMask for this purpose. In addition to his work on the prover, Nazar is also working on a PR for continuous integration (CI) improvements. Nazar mentioned that due to public holidays on Friday and Monday in Germany, and his plan to travel during this time, it will be a short work week for him. He aims to finish up all the open PRs before Friday and do some fine-tuning of the package to get it ready for discussion with the MetaMask team. He also mentioned a recent change where a new logger package was created. However, a lot of the code is still referencing the logger type from the utils package. Nazar suggested updating the references to point to the new logger package whenever developers come across them during their work.
Nico has mainly been focusing on ensuring a clean shutdown for the beacon node. He is currently dealing with a few remaining errors that get thrown on shutdown, some of which may be due to the database closing prematurely. However, he expects the PR to be ready once the threads package is released. He has made a few minor updates, including improving some locks and fixing some ESLint issues, although he mentioned that the latter is not particularly important to merge right now. Nico plans to consider what could be included in version 1.9. One potential feature is decrypting in a thread pool when keys are imported via the API. However, he is unsure how easy this would be to implement, given the current implementation, and will need to investigate it further.
Gajinder has been working on and contributing to these PRs, incorporating feedback and adjustments. The result is a tentative DevNet 6 spec. However, many PRs are still open and, once closed, Gajinder plans to implement some of the spec items for DevNet 6. He also created two PRs. The first integrates the new database for the Blob sidecars, and the second enables a spec test by adding the remaining types. His plan is to extract network PRs once the database PRs are out. He has integrated the part where blobs were signed, and the block contents are now referred to in the Beacon API spec. Gajinder studied the Node add-on API and Matt's work, and came to a conclusion that some aspects can potentially be simplified. He emphasized that the memory management is critical and should not cause the main Lodestar process to crash. Therefore, he suggested the need for incremental development to prevent any memory issues. He wrote some specs for the custody game, which was mentioned in the context of discussions around endianness of KZG libraries. Gajinder will work on the DevNet 6 preparation, dig deeper into verkle, and aim to extract network work from the "Free the Blobs" initiative, which primarily involves request response and gossip work.
- Lido's v2 Upgrade Deployment: Lido deployed their version 2 upgrade successfully, and the Lodestar team was prepared for it. The team will go up to 8,000 keys with Lido this week.
- Hiring: The team is in the process of hiring an additional person to work alongside Faith. Several candidates have submitted assignments for the Technical Project Manager (TPM) infrastructure position.
- Promotion of Lodestar: The team is working on promoting Lodestar more effectively. They plan to set up a developer blog with the help of Mark Hans and use AI like ChatGPT to help publicize the work done at Lodestar, aiming to attract more developers.
- Writing and Storytelling: The team was encouraged to take notes on their problem-solving processes and use ChatGPT to write them up. The goal is to generate content that illuminates the unique challenges they tackle and the solutions they come up with. This could also serve as a recruiting tool.
- Content Management System: A content management system is expected to be up within the next two weeks, with assistance from Cindy. This is part of the effort to attract more people to their website and work.
- Obol Cluster: The team is still working on the Obol cluster and is awaiting a signed message from Cayman for the Distributed Key Generation (DKG).
- Lodestar's Integration into Docker Scripts: The team appreciated Nico's work on integrating Lodestar into Docker scripts.
- Networking Thread Updates: An update was given on the networking thread. The branch is up to date and will be merged after fixing the failing end-to-end tests. After the merge, any reorganization work will be carried out in a separate refactoring PR.
- Release Planning: The team is aiming for a new release or at least a Release Candidate (RC) next week. While not all milestones for version 1.9 will be reached, they want to determine the essential features to include. The network thread updates were deemed a critical component of the next release.
- Gajinder's Work: Gajinder worked on the proposal stats PR, which has been merged, and also worked on rebasing the "Free the Blobs" project. A part of this project, the block signing section, was extracted into a separate PR, on which Tuyen has provided feedback. Gajinder also mentioned the ongoing spec changes around blob transactions and the debate around endianness in the underlying blob library and in the blobs themselves.
- DevNet 5 Issues: On DevNet 5, most nodes are out of sync except for the Lighthouse nodes, which were on a different fork. Lighthouse was having issues serving the Lodestar node, leading to slow slot syncing. Gajinder has flagged these issues to the Lighthouse team.
- Restarting Lodestar: Gajinder is investigating an issue where restarting Lodestar causes it to start from the last finalized state, many epochs back. A potential solution could involve saving the last finalized state and clearing previous finalized states within a specific window, ensuring a more efficient restart.
- Work on BLST Implementation: Gajinder, Matthew, and Cayman are collaborating on the BLST implementation for multi-threading.
- Implications of RLP in CL: The change from SSZ to RLP for blob transactions could have implications for the current CL implementation, specifically for matching commitments against the versioned hashes in the transaction. However, a workaround is proposed: the CL sends the versioned hashes it computes to the EL, which verifies them against the transactions. This avoids the need to deserialize the transaction in the CL, so for the time being RLP support might not be required in the CL.
- Lion's Work: Lion has completed the PR for the network thread, in particular improving the functionality of the loggers. He has also worked on some DevOps issues and is currently working to improve debug logs.
- Nico's Work: Nico has been looking into incompatibility issues reported by RockLogic and has updated the way changelogs are generated in preparation for v1.9. He is also investigating issues such as a high-priority one where the node does not shut down cleanly. He mentioned that the team can now publish the DAppNode packages themselves, but this involves on-chain transactions, which can be costly.
- Tuyen's Work: Tuyen has finished the work on stable P-scoring and is now working to improve block import by batching I/O operations. He has also worked on a solution to search for unknown block roots when an attestation references an unknown root.
- Nazar's Work: Nazar has merged the eth_call implementation for the prover and is now working on the last method, eth_estimateGas. He is also extending the simulation test with Capella support and planning to run the simulation environment in CI for end-to-end tests.
- Matthew's Work: Matthew has made significant progress with the BLST library, with all specs now passing. He is also beginning the implementation in Lodestar and has created his first PR, which included metrics and a Grafana dashboard. Matthew is also working on finalizing the Flamescope work for inclusion in v1.9.
- Cayman's Work: Cayman has been working on unblocking the networking thread PR and has started reviewing Matthew's BLST PRs. He is also planning to work on IPv6 support for Lodestar.
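
The workaround noted under "Implications of RLP in CL" can be illustrated with a minimal sketch. It assumes the versioned-hash derivation defined in EIP-4844 (a version byte followed by the last 31 bytes of the SHA-256 digest of the 48-byte KZG commitment). The function `versioned_hashes_match` is a hypothetical stand-in for the check the EL would perform; it is not Lodestar's code or an actual Engine API call.

```python
import hashlib
from typing import List

# Version byte for KZG-based versioned hashes, as defined in EIP-4844.
VERSIONED_HASH_VERSION_KZG = b"\x01"
KZG_COMMITMENT_LENGTH = 48  # compressed BLS12-381 G1 point


def kzg_to_versioned_hash(commitment: bytes) -> bytes:
    """Derive the 32-byte versioned hash for one KZG commitment."""
    if len(commitment) != KZG_COMMITMENT_LENGTH:
        raise ValueError("KZG commitment must be 48 bytes")
    # Replace the first digest byte with the version byte.
    return VERSIONED_HASH_VERSION_KZG + hashlib.sha256(commitment).digest()[1:]


def versioned_hashes_match(commitments: List[bytes], tx_hashes: List[bytes]) -> bool:
    """Hypothetical EL-side check: hashes computed by the CL from the block's
    commitments must equal, in order, the hashes carried by the transactions."""
    computed = [kzg_to_versioned_hash(c) for c in commitments]
    return computed == tx_hashes
```

In this arrangement the CL only hashes the commitments it already holds in the block body, and the side that can decode the transactions does the comparison, which is why the CL would not need an RLP decoder.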