Commit 5bb5c1f: Merge main

Signed-off-by: Edwin Greene <[email protected]>
edwin-greene committed Jan 17, 2025
2 parents 0e125e1 + 0a8166b

Showing 62 changed files with 2,325 additions and 912 deletions.
102 changes: 83 additions & 19 deletions docs/database/bootstrap.md
@@ -7,6 +7,7 @@ This guide provides step-by-step instructions for setting up a fresh PostgreSQL
## Table of Contents

- [Prerequisites](#prerequisites)
- [1. Optional High-Performance Decompressors](#1-optional-high-performance-decompressors)
- [Database Initialization and Data Import](#database-initialization-and-data-import)
- [1. Download the Required Scripts and Configuration File](#1-download-the-required-scripts-and-configuration-file)
- [2. Edit the `bootstrap.env` Configuration File](#2-edit-the-bootstrapenv-configuration-file)
@@ -15,9 +16,15 @@ This guide provides step-by-step instructions for setting up a fresh PostgreSQL
- [3.2. List Available Versions](#32-list-available-versions)
- [3.3. Select a Version](#33-select-a-version)
- [3.4. Download the Data](#34-download-the-data)
- [Download Minimal DB Data Files](#download-minimal-db-data-files)
- [Download Full DB Data Files](#download-full-db-data-files)
- [4. Check Version Compatibility](#4-check-version-compatibility)
- [5. Run the Bootstrap Script](#5-run-the-bootstrap-script)
- [6. Monitoring and Managing the Import Process](#6-monitoring-and-managing-the-import-process)
- [6.1. Monitoring the Import Process](#61-monitoring-the-import-process)
- [6.2. Stopping the Script](#62-stopping-the-script)
- [6.3. Resuming the Import Process](#63-resuming-the-import-process)
- [6.4. Start the Mirrornode Importer](#64-start-the-mirrornode-importer)
- [Handling Failed Imports](#handling-failed-imports)
- [Additional Notes](#additional-notes)
- [Troubleshooting](#troubleshooting)
@@ -37,6 +44,16 @@ This guide provides step-by-step instructions for setting up a fresh PostgreSQL
- `realpath`
- `flock`
- `curl`
- `b3sum` (used for BLAKE3 hash verification of the data files)

### 1. Optional High-Performance Decompressors

The script automatically detects and uses faster alternatives to `gunzip` if they are available in the system's or user's PATH:

- [rapidgzip](https://github.com/mxmlnkn/rapidgzip) - A high-performance parallel gzip decompressor (fastest option, even for single-threaded decompression)
- [igzip](https://github.com/intel/isa-l) - Intel's optimized gzip implementation from ISA-L (second fastest option)

These tools can significantly improve decompression performance during the import process. If neither is available, the script will fall back to using standard `gunzip`.
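
As an illustration, this kind of detection typically reduces to a `PATH` probe with a fallback chain (a sketch only; the actual selection logic lives inside `bootstrap.sh`):

```bash
# Probe the PATH for the fastest available gzip decompressor
if command -v rapidgzip >/dev/null 2>&1; then
  decompress="rapidgzip -d -c"
elif command -v igzip >/dev/null 2>&1; then
  decompress="igzip -d -c"
else
  decompress="gunzip -c"
fi

# Stream a compressed CSV to stdout with whichever tool was found
$decompress some_table.csv.gz
```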

4. Install the [Google Cloud SDK](https://cloud.google.com/sdk/docs/install), then authenticate:

@@ -144,7 +161,7 @@ gcloud config set project YOUR_GCP_PROJECT_ID
To see the available versions of the database export, list the contents of the bucket:
```bash
-gsutil -m ls gs://mirrornode-db-export/
+gcloud storage ls gs://mirrornode-db-export/
```
This will display the available version directories.
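
The listing will look something like this (version directories shown are illustrative):

```
gs://mirrornode-db-export/0.110.0/
gs://mirrornode-db-export/0.111.0/
gs://mirrornode-db-export/0.113.0/
```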
@@ -162,13 +179,32 @@ This will display the available version directories.

#### 3.4. Download the Data

-Create a directory to store the data and download all files and subdirectories for the selected version:
+Choose one of the following download options based on your needs:

##### Download Minimal DB Data Files

Create a directory and download only the minimal database files:

```bash
mkdir -p /path/to/db_export
export CLOUDSDK_STORAGE_SLICED_OBJECT_DOWNLOAD_MAX_COMPONENTS=1 && \
VERSION_NUMBER=<VERSION_NUMBER> && \
gcloud storage rsync -r -x '.*_part_\d+_\d+_\d+_atma\.csv\.gz$' "gs://mirrornode-db-export/$VERSION_NUMBER/" /path/to/db_export/
```

##### Download Full DB Data Files

Create a directory and download all files and subdirectories for the selected version:

```bash
mkdir -p /path/to/db_export
-gsutil -m cp -r gs://mirrornode-db-export/<VERSION_NUMBER>/* /path/to/db_export/
+export CLOUDSDK_STORAGE_SLICED_OBJECT_DOWNLOAD_MAX_COMPONENTS=1 && \
+VERSION_NUMBER=<VERSION_NUMBER> && \
+gcloud storage rsync -r "gs://mirrornode-db-export/$VERSION_NUMBER/" /path/to/db_export/
```

For both options:

- Replace `/path/to/db_export` with your desired directory path.
- Replace `<VERSION_NUMBER>` with the version you selected (e.g., `0.111.0`).
- Ensure all files and subdirectories are downloaded into this single parent directory.
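
For example, a full download of version `0.111.0` into an illustrative `/opt/db_export` directory:

```bash
mkdir -p /opt/db_export
export CLOUDSDK_STORAGE_SLICED_OBJECT_DOWNLOAD_MAX_COMPONENTS=1 && \
VERSION_NUMBER=0.111.0 && \
gcloud storage rsync -r "gs://mirrornode-db-export/$VERSION_NUMBER/" /opt/db_export/
```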
@@ -207,18 +243,30 @@ The `bootstrap.sh` script initializes the database and imports the data. It is d
# Should list bootstrap.sh and bootstrap.env
```
-2. **Run the Bootstrap Script Using `nohup` and Redirect Output to `bootstrap.log`:**
+2. **Run the Bootstrap Script Using `setsid` and Redirect Output to `bootstrap.log`:**
To ensure the script continues running even if your SSH session is terminated, run it in a new session using `setsid`. The script handles its own logging, but we redirect stderr to capture any startup errors:
For a minimal database import (default):
```bash
setsid ./bootstrap.sh 8 /path/to/db_export > /dev/null 2>> bootstrap.log &
```
-To ensure the script continues running even if your SSH session is terminated, run it using `nohup`, redirect stdout and stderr to `bootstrap.log`, and save its process ID (PID) to a file.
For a full database import:
```bash
-nohup setsid ./bootstrap.sh 8 /path/to/db_export > /dev/null 2>> bootstrap.log &
+setsid ./bootstrap.sh 8 --full /path/to/db_export > /dev/null 2>> bootstrap.log &
```
-- The script handles logging internally to `bootstrap.log`, and the execution command will also append stdout/stderr of the script itself to the log file.
+- The script handles logging internally to `bootstrap.log`, and the execution command will also append stderr to the log file.
- `8` refers to the number of CPU cores to use for parallel processing. Adjust this number based on your system's resources.
- `/path/to/db_export` is the directory where you downloaded the database export data.
-- `bootstrap.pid` stores the PID of the running script for later use.
- The script creates several tracking files:

- `bootstrap.pid` stores the process ID used for cleanup of all child processes if interrupted
- `bootstrap_tracking.txt` tracks the progress of each file's import and hash verification
- `bootstrap_discrepancies.log` records any data verification issues
- **Important**: The `SKIP_DB_INIT` flag file is automatically created by the script after a successful database initialization. Do not manually create or delete this file. If you need to force the script to reinitialize the database in future runs, remove the flag file using:
@@ -240,7 +288,7 @@ The `bootstrap.sh` script initializes the database and imports the data. It is d
### 6. Monitoring and Managing the Import Process
-#### **Monitoring the Import Process:**
+#### **6.1. Monitoring the Import Process:**
- **Check the Log File:**
@@ -258,8 +306,22 @@ The `bootstrap.sh` script initializes the database and imports the data. It is d
```
-- This file tracks the status of each file being imported.
+- Each line contains the file name, followed by two status indicators:
Import Status:
- `NOT_STARTED`: File has not begun importing
- `IN_PROGRESS`: File is currently being imported
- `IMPORTED`: File was successfully imported
- `FAILED_TO_IMPORT`: File import failed
Hash Verification Status:
- `HASH_UNVERIFIED`: BLAKE3 hash has not been verified yet
- `HASH_VERIFIED`: BLAKE3 hash verification passed
- `HASH_FAILED`: BLAKE3 hash verification failed
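
As a quick progress check, the two status columns can be tallied with a one-liner (illustrative; assumes the file-name/import-status/hash-status layout shown in the example later in this guide):

```bash
# Count files in each combination of import and hash-verification state
awk '{print $2, $3}' bootstrap_tracking.txt | sort | uniq -c
```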
-#### **Stopping the Script**
+#### **6.2. Stopping the Script**
If you need to stop the script before it completes:
@@ -283,33 +345,34 @@ If you need to stop the script before it completes:

**Note:** Ensure that `bootstrap.sh` is designed to handle termination signals and clean up its child processes appropriately.
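
For example, one way to stop the script together with its children is to signal the whole process group created by `setsid` (a sketch, assuming `bootstrap.pid` holds the session leader's PID; the script's own documented stop procedure takes precedence):

```bash
# The negative PID targets the entire process group, so child imports stop too
kill -TERM -- "-$(cat bootstrap.pid)"
```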

-#### **Resuming the Import Process**
+#### **6.3. Resuming the Import Process**

- **Re-run the Bootstrap Script:**

```bash
-nohup setsid ./bootstrap.sh 8 /path/to/db_export > /dev/null 2>> bootstrap.log &
+setsid ./bootstrap.sh 8 /path/to/db_export > /dev/null 2>> bootstrap.log &
```

- The script will resume where it left off, skipping files that have already been imported successfully.
- Add the `--full` flag if you were using full database mode.

-#### **Start the Mirrornode Importer**
+#### **6.4. Start the Mirrornode Importer**

- Once the bootstrap process completes without errors, you may start the Mirrornode Importer.

---

## Handling Failed Imports

-During the import process, the script generates a file named `bootstrap_tracking.txt`, which logs the status of each file import. Each line in this file contains the path and name of a file, followed by its import status: `NOT_STARTED`, `IN_PROGRESS`, `IMPORTED`, or `FAILED_TO_IMPORT`.
+During the import process, the script generates a file named `bootstrap_tracking.txt`, which logs the status of each file import. Each line in this file contains the path and name of a file, followed by its import and hash verification status (see [Monitoring and Managing the Import Process](#6-monitoring-and-managing-the-import-process) for status descriptions).

**Example of `bootstrap_tracking.txt`:**

```
-/path/to/db_export/record_file.csv.gz IMPORTED
-/path/to/db_export/transaction/transaction_part_1.csv.gz IMPORTED
-/path/to/db_export/transaction/transaction_part_2.csv.gz FAILED_TO_IMPORT
-/path/to/db_export/account.csv.gz NOT_STARTED
+/path/to/db_export/record_file.csv.gz IMPORTED HASH_VERIFIED
+/path/to/db_export/transaction/transaction_part_1.csv.gz IMPORTED HASH_VERIFIED
+/path/to/db_export/transaction/transaction_part_2.csv.gz FAILED_TO_IMPORT HASH_FAILED
+/path/to/db_export/account.csv.gz NOT_STARTED HASH_UNVERIFIED
```
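
To pull the problem files out of this tracking data, a simple filter works:

```bash
# List files that failed to import or failed hash verification
grep -E 'FAILED_TO_IMPORT|HASH_FAILED' bootstrap_tracking.txt
```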

**Notes on Data Consistency:**
Expand Down Expand Up @@ -351,6 +414,7 @@ During the import process, the script generates a file named `bootstrap_tracking
- Review `bootstrap.log` for detailed error messages.
- Check `bootstrap_tracking.txt` to identify which files failed to import.
- Check `bootstrap_discrepancies.log` for any data verification issues (this file is only created if discrepancies are found in file size, row count, or BLAKE3 hash verification).
- Re-run the `bootstrap.sh` script to retry importing failed files.
- **Permission Denied Errors:**
@@ -365,7 +429,7 @@ During the import process, the script generates a file named `bootstrap_tracking
- **Script Does Not Continue After SSH Disconnect:**
-- Ensure you used `nohup` when running the script.
+- Ensure you used `setsid` when running the script.
- Confirm that the script is running by checking the process list:
```bash
47 changes: 12 additions & 35 deletions docs/design/block-streams.md
@@ -113,29 +113,6 @@ public class BlockFileTransformer implements StreamFileTransformer<RecordFile, B
@Override
public RecordFile transform(BlockFile block);

-/**
- * Block Items list needs to include the block items of the following type to calculate the
- * hash for the Root of Output Merkle Tree:
- * - BlockHeader
- * - TransactionOutput
- * - StateChange
- */
-private byte[] calculateOutputTreeRootHash(List<BlockItem> blockItems);
-
-/**
- * Block hashes are not included in the block. They must be calculated from the contents of the block.
- *
- * The `previousBlockHash` is located in the BlockHeader
- * The `inputTreeRootHash` is located in the BlockStreamInfo of the last StateChange of the block
- * The `startOfBlockStateRootHash` is located in the BlockProof
- */
-private byte[] calculateBlockHash(
-    byte[] previousBlockHash,
-    byte[] inputTreeRootHash,
-    byte[] outputTreeRootHash,
-    byte[] startOfBlockStateRootHash
-);

// The transaction hash will not be included in the block stream output so we will need to calculate it
private byte[] calculateTransactionHash(EventTransaction transaction);
}
@@ -177,32 +154,32 @@ public class BlockStreamPoller extends StreamPoller<BlockFile> {
}
```

-#### StreamFileNotifier
-
-Rename `verified` to `notify` as blocks will not be verifiable until each state change has been processed by the BlockStreamVerifier.

#### BlockStreamVerifier

```java
package com.hedera.mirror.importer.downloader.block;

public class BlockStreamVerifier {
-private final StreamFileNotifier streamFileNotifier;
private final StreamFileTransformer<RecordFile, BlockFile> blockFileTransformer;
private final RecordFileParser recordFileParser;
+private final StreamFileNotifier streamFileNotifier;

/**
- * Transforms the block file into a record file, verifies the hash chain and then parses it
+ * Verifies the block file, transforms it into a record file, and then notifies the parser
*/
-public void notify(@Nonnull StreamFile<?> streamFile);
+public void verify(@NotNull BlockFile blockFile);

/**
- * For Block N the hash must be verified to match the previousBlockHash protobuf value provided by Block N+1
+ * Verifies the block number of the block file
+ * - that the block number is one after the previous block number, if one exists
+ * - that the block number from the file name matches the block number in the block
*/
-private void verifyHashChain(String expected, String actual);
+private void verifyBlockNumber(BlockFile blockFile);

-// Verifies that the number of the block file contained in its file name matches the block number within the block file
-private void verifyBlockNumber(String expected, String actual);
+/**
+ * The previous hash from the block must match the hash of the previous block
+ */
+private void verifyHashChain(BlockFile blockFile);
}
```
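
For illustration, the two `verifyBlockNumber` checks could be implemented roughly as follows (a sketch only; `getIndex`, `getName`, `parseBlockNumber`, and `InvalidStreamFileException` are assumed names, not fixed by the design above):

```java
private long previousBlockNumber = -1;

private void verifyBlockNumber(BlockFile blockFile) {
    long blockNumber = blockFile.getIndex(); // assumed accessor for the number stored in the block

    // Check 1: the block number is one after the previous block number, if one exists
    if (previousBlockNumber != -1 && blockNumber != previousBlockNumber + 1) {
        throw new InvalidStreamFileException(String.format(
                "Non-consecutive block number: expected %d, got %d", previousBlockNumber + 1, blockNumber));
    }

    // Check 2: the block number from the file name matches the block number in the block
    long fileNameBlockNumber = parseBlockNumber(blockFile.getName()); // assumed helper
    if (fileNameBlockNumber != blockNumber) {
        throw new InvalidStreamFileException(String.format(
                "File name block number %d does not match block number %d", fileNameBlockNumber, blockNumber));
    }

    previousBlockNumber = blockNumber;
}
```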

@@ -223,7 +200,7 @@ alter table if exists record_file
add column if not exists round_end bigint null;

alter table if exists topic_message
-alter column if exists running_hash_version drop not null;
+alter column running_hash_version drop not null;
```
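
Note that the `if exists` guard is dropped from the column alteration: PostgreSQL accepts `if exists` on `alter table`, but the `alter column ... drop not null` form does not take one, so the original statement would fail to parse.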

### Block to Record File Transformation