
okx/zeknox


ZEKNOX - ZEro KNOwledge Xcelerated

This repository implements primitives used in Zero-Knowledge Proofs (ZKPs), accelerated with CUDA for NVIDIA GPUs. In particular, it is used by the OKX fork of Plonky2.

The following primitives are implemented:

  • Poseidon hashing over Goldilocks elements (in C/C++ and CUDA) - see native/poseidon.
  • Poseidon hashing over BN254 (or BN128) elements (in C/C++ and CUDA) - see native/poseidon.
  • Poseidon2 hashing over Goldilocks elements (in C/C++ and CUDA) - see native/poseidon2.
  • Keccak hashing over Goldilocks elements (in C/C++ and CUDA) - see native/keccak.
  • Monolith hashing over Goldilocks elements (in C/C++ and CUDA) - see native/monolith.
  • Merkle Tree building (compatible with Plonky2) using any of the above hashing methods - see native/merkle.
  • Number Theoretic Transform (NTT), including Low Degree Extension (LDE) and transpose, over the Goldilocks field - see native/ntt.
  • Multi-Scalar Multiplication (MSM) over BN254 - see native/msm.
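Several of the primitives above (Poseidon, Poseidon2, and Monolith over Goldilocks, and the NTT) operate on elements of the Goldilocks field, p = 2^64 - 2^32 + 1. The minimal C++ sketch below illustrates that field arithmetic, assuming a GCC or Clang compiler (for unsigned __int128). The function names are hypothetical, not the zeknox API. The reduction relies on the standard identities 2^64 ≡ 2^32 - 1 and 2^96 ≡ -1 (mod p), so a 128-bit product is reduced with a few 64-bit additions and subtractions instead of a division.

```cpp
#include <cassert>
#include <cstdint>

// Goldilocks prime p = 2^64 - 2^32 + 1.
constexpr uint64_t P = 0xFFFFFFFF00000001ULL;
// 2^64 mod p == 2^32 - 1, called EPSILON here.
constexpr uint64_t EPSILON = 0xFFFFFFFFULL;

// a + b mod p, for canonical inputs (< p).
uint64_t gl_add(uint64_t a, uint64_t b) {
    assert(a < P && b < P);
    uint64_t s = a + b;           // may wrap around 2^64
    if (s < a || s >= P) s -= P;  // wrapped or >= p: subtract p (mod 2^64)
    return s;
}

// a - b mod p, for canonical inputs (< p).
uint64_t gl_sub(uint64_t a, uint64_t b) {
    assert(a < P && b < P);
    return a >= b ? a - b : a + (P - b);
}

// Reduce x = lo + hi_lo * 2^64 + hi_hi * 2^96 to lo + hi_lo * EPSILON - hi_hi (mod p).
uint64_t gl_reduce128(unsigned __int128 x) {
    uint64_t lo = (uint64_t)x;
    uint64_t hi = (uint64_t)(x >> 64);
    uint64_t hi_hi = hi >> 32;       // coefficient of 2^96 (== -1 mod p)
    uint64_t hi_lo = hi & EPSILON;   // coefficient of 2^64 (== EPSILON mod p)

    uint64_t t0 = lo - hi_hi;
    if (lo < hi_hi) t0 -= EPSILON;   // borrow added 2^64; turn it into +p instead

    uint64_t t1 = hi_lo * EPSILON;   // at most (2^32 - 1)^2, no overflow
    uint64_t t2 = t0 + t1;
    if (t2 < t0) t2 += EPSILON;      // carry of 2^64 == EPSILON (mod p); cannot overflow here

    return t2 >= P ? t2 - P : t2;    // one conditional subtraction makes it canonical
}

// a * b mod p.
uint64_t gl_mul(uint64_t a, uint64_t b) {
    return gl_reduce128((unsigned __int128)a * b);
}
```

For example, gl_mul(1ULL << 32, 1ULL << 32) returns 2^64 mod p, which is 2^32 - 1 (EPSILON).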

Building and Testing

Prerequisites

  • git submodules:
$ git submodule update --init
  • gcc/g++, clang, make, cmake, libc++, gtest, and Rust. To install these on Ubuntu, run:
$ sudo apt update
$ sudo apt install -y gcc g++ clang make cmake libc++-dev libgtest-dev
$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  • the CUDA toolkit, with NVCC set to the path of nvcc:
$ export NVCC=/usr/local/cuda/bin/nvcc

For example, to install CUDA 12.6 (with drivers) on Ubuntu 24.04:

$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
$ sudo dpkg -i cuda-keyring_1.1-1_all.deb
$ sudo apt-get update
$ sudo apt-get -y install cuda-toolkit-12-6 cuda-drivers

Then, reboot your system:

$ sudo reboot

Build CUDA library

$ cd native
$ cmake -B build
$ cmake --build build -j

Note 1: the steps above build the library with Goldilocks support, without MSM.

Note 2: by default, the CUDA code is compiled for sm_89. To change the target CUDA architecture, pass -DCUDA_ARCH=XY to cmake (e.g., -DCUDA_ARCH=86).

Build and run CUDA tests

Note: this requires an NVIDIA GPU.

$ cd native
$ cmake -B build -DBUILD_TESTS=ON
$ cmake --build build -j
$ ./tests.exe

Install the library

First, make sure you have also built libblst:

$ cd depends/blst
$ ./build.sh
$ cd ../..

After that, run (in the native folder):

$ sudo cmake --install build

Curve Parameters Generation

The curve/field parameters are generated from a template:

$ cd scripts
$ python3 new_curve_script.py configs/${field}.json

For the Goldilocks field (see details), generate the parameters as:

$ cd scripts
$ python3 gen_field_params.py configs/gl64.json
# or (for compatibility with 0xPolygonZero Plonky2)
$ python3 gen_field_params.py configs/gl64_v2.json

Then re-build the CUDA library as described above.
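For reference, the Goldilocks prime behind the gl64 configurations has the structure below. This is standard background, not the contents of the generated parameter files (which this README does not show): the first two identities allow cheap reduction of 128-bit products, and the power of two in p - 1 is what allows radix-2 NTTs of length up to 2^32.

```latex
\begin{aligned}
p &= 2^{64} - 2^{32} + 1, \\
2^{64} &\equiv 2^{32} - 1 \pmod{p}, \qquad 2^{96} \equiv -1 \pmod{p}, \\
p - 1 &= 2^{32}\,(2^{32} - 1), \quad 2^{32} - 1 \text{ odd}
\;\Rightarrow\; \mathbb{F}_p^{\times} \text{ has a subgroup of order } 2^{k} \text{ for every } k \le 32.
\end{aligned}
```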

FAQ

Please see our FAQ page.

Examples and Benchmarks

Next, we present four examples of integrating this library to speed up ZK primitives and applications.

E1. Plonky2

In Plonky2, we offload Merkle Tree building (including hashing) and Low Degree Extension (LDE) with the Number Theoretic Transform (NTT) to one or more GPUs (more details here). The steps to build Plonky2 with GPU acceleration are:

$ git clone https://github.com/okx/plonky2.git
$ cd plonky2
$ git checkout dev
$ rustup update
$ rustup override set nightly-x86_64-unknown-linux-gnu
$ cargo build --release --features=cuda

Next, we show benchmarking results for Merkle Tree building with Poseidon, Poseidon2, and Poseidon over BN254, comparing CPU-only and CPU+GPU execution. To run these benchmarks:

$ git clone https://github.com/okx/plonky2.git
$ cd plonky2
$ git checkout dev
$ cd plonky2
$ cargo bench --bench=merkle
$ cargo bench --bench=merkle --features=cuda

The following results are from a GCP g2-standard-32 instance with 32 Intel Xeon vCPUs and one NVIDIA L4 GPU.

Hash            Leaves   CPU-only   CPU+GPU   Speedup
Poseidon          8192    26.8 ms   11.5 ms     2.3 X
Poseidon         16384    53.4 ms   20.2 ms     2.6 X
Poseidon         32768   111.1 ms   44.8 ms     2.5 X
Poseidon2         8192    30.9 ms    8.4 ms     3.7 X
Poseidon2        16384    61.4 ms   16.6 ms     3.7 X
Poseidon2        32768   127.0 ms   39.2 ms     3.2 X
Poseidon BN128    8192   404.7 ms   73.5 ms     5.5 X
Poseidon BN128   16384   809.4 ms  124.0 ms     6.5 X
Poseidon BN128   32768  1618.4 ms  239.9 ms     6.7 X
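The Merkle Tree building benchmarked above is a bottom-up reduction: hash every leaf, then repeatedly hash adjacent pairs until a single root remains. Every hash within a level is independent of the others, which is the kind of parallelism a GPU can exploit. The minimal C++ sketch below assumes a power-of-two number of leaves and uses hypothetical names (build_merkle_tree, merkle_proof, verify_merkle_proof). Its compression function toy_two_to_one is a made-up, non-cryptographic stand-in for Poseidon, and it omits Plonky2 details such as 4-element digests and Merkle caps.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr uint64_t P = 0xFFFFFFFF00000001ULL; // Goldilocks prime
using Digest = uint64_t;                      // toy 1-element digest (hypothetical)
using Leaf = std::vector<uint64_t>;

// Hypothetical toy compression function. NOT cryptographic: stands in for Poseidon.
Digest toy_two_to_one(Digest left, Digest right) {
    unsigned __int128 x = (unsigned __int128)left * 0x9E3779B97F4A7C15ULL + right + 1;
    return (Digest)(x % P);
}

// Hash a leaf by folding its elements through the toy compression function.
Digest toy_hash_leaf(const Leaf& leaf) {
    Digest acc = 0;
    for (uint64_t e : leaf) acc = toy_two_to_one(acc, e % P);
    return acc;
}

// levels[0] holds the leaf digests; levels.back() holds only the root.
std::vector<std::vector<Digest>> build_merkle_tree(const std::vector<Leaf>& leaves) {
    assert(!leaves.empty() && (leaves.size() & (leaves.size() - 1)) == 0);
    std::vector<std::vector<Digest>> levels(1);
    for (const Leaf& leaf : leaves) levels[0].push_back(toy_hash_leaf(leaf));
    while (levels.back().size() > 1) {
        std::vector<Digest> next;
        const std::vector<Digest>& prev = levels.back();
        // Pairs within a level are independent: this is the loop a GPU can parallelize.
        for (size_t i = 0; i + 1 < prev.size(); i += 2)
            next.push_back(toy_two_to_one(prev[i], prev[i + 1]));
        levels.push_back(std::move(next));
    }
    return levels;
}

// Sibling digests on the path from leaf `index` up to (but excluding) the root.
std::vector<Digest> merkle_proof(const std::vector<std::vector<Digest>>& levels, size_t index) {
    std::vector<Digest> siblings;
    for (size_t lvl = 0; lvl + 1 < levels.size(); ++lvl, index >>= 1)
        siblings.push_back(levels[lvl][index ^ 1]);
    return siblings;
}

// Recompute the root from a leaf and its siblings; the index bit picks left/right order.
bool verify_merkle_proof(const Leaf& leaf, size_t index, const std::vector<Digest>& siblings, Digest root) {
    Digest h = toy_hash_leaf(leaf);
    for (Digest s : siblings) {
        h = (index & 1) ? toy_two_to_one(s, h) : toy_two_to_one(h, s);
        index >>= 1;
    }
    return h == root;
}
```

A tree over 2^k leaves has k + 1 levels and 2^(k+1) - 1 digests, so the 32768-leaf rows above correspond to 16 levels of hashing.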

Next, we show benchmarking results for LDE followed by Merkle Tree (MT) building with Poseidon, comparing CPU-only and CPU+GPU execution. To run these benchmarks:

$ git clone https://github.com/okx/plonky2.git
$ cd plonky2
$ git checkout dev
$ cd plonky2
$ cargo bench --bench=lde
$ cargo bench --bench=lde --features=cuda

LDE size (log2)   CPU-only   CPU+GPU   Speedup
13                  6.5 ms    3.1 ms     2.1 X
14                 11.6 ms    4.2 ms     2.8 X
15                 22.0 ms    6.0 ms     3.7 X
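The LDE benchmarked above extends a polynomial given by its n values on a size-n subgroup to n * 2^rate_bits values: interpolate to coefficients (inverse NTT), zero-pad, and evaluate on a larger domain, which in Plonky2 is a multiplicative coset. The C++ sketch below shows that pipeline with a naive O(n^2) transform in place of a fast NTT, so the math is easy to follow. The function names and the use of 7 as the base for roots of unity and as the example coset shift are assumptions for illustration, not the zeknox API; 7 is a quadratic non-residue mod p, so 7^((p-1)/n) is a primitive n-th root of unity for every power-of-two n up to 2^32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint64_t P = 0xFFFFFFFF00000001ULL; // Goldilocks prime
constexpr uint64_t QNR = 7;                   // quadratic non-residue mod P (assumed base)

uint64_t gl_mul(uint64_t a, uint64_t b) { return (uint64_t)((unsigned __int128)a * b % P); }
uint64_t gl_add(uint64_t a, uint64_t b) { return (uint64_t)(((unsigned __int128)a + b) % P); }
uint64_t gl_pow(uint64_t b, uint64_t e) {
    uint64_t r = 1;
    for (; e; e >>= 1, b = gl_mul(b, b))
        if (e & 1) r = gl_mul(r, b);
    return r;
}
uint64_t gl_inv(uint64_t a) { return gl_pow(a, P - 2); } // Fermat; a must be nonzero

// Primitive n-th root of unity for power-of-two n <= 2^32.
uint64_t root_of_unity(size_t n) {
    assert(n > 0 && (n & (n - 1)) == 0 && n <= (1ULL << 32));
    return gl_pow(QNR, (P - 1) / n);
}

// Naive forward transform: out[i] = poly(shift * w^i), evaluated with Horner's rule.
std::vector<uint64_t> evaluate_on_coset(const std::vector<uint64_t>& coeffs, uint64_t shift) {
    size_t n = coeffs.size();
    uint64_t w = root_of_unity(n);
    std::vector<uint64_t> out(n);
    uint64_t x = shift; // x = shift * w^i
    for (size_t i = 0; i < n; ++i) {
        uint64_t acc = 0;
        for (size_t j = n; j-- > 0;) acc = gl_add(gl_mul(acc, x), coeffs[j]);
        out[i] = acc;
        x = gl_mul(x, w);
    }
    return out;
}

// Naive inverse transform on the subgroup: a[j] = n^{-1} * sum_i v[i] * w^{-ij}.
std::vector<uint64_t> interpolate(const std::vector<uint64_t>& values) {
    size_t n = values.size();
    uint64_t w_inv = gl_inv(root_of_unity(n));
    uint64_t n_inv = gl_inv(n % P);
    std::vector<uint64_t> coeffs(n);
    for (size_t j = 0; j < n; ++j) {
        uint64_t step = gl_pow(w_inv, j), x = 1, acc = 0; // x = w^{-ij}
        for (size_t i = 0; i < n; ++i) {
            acc = gl_add(acc, gl_mul(values[i], x));
            x = gl_mul(x, step);
        }
        coeffs[j] = gl_mul(acc, n_inv);
    }
    return coeffs;
}

// LDE: n values on the size-n subgroup -> (n << rate_bits) values on shift * (larger subgroup).
std::vector<uint64_t> lde(const std::vector<uint64_t>& values, unsigned rate_bits, uint64_t shift) {
    std::vector<uint64_t> coeffs = interpolate(values); // inverse transform
    coeffs.resize(values.size() << rate_bits, 0);       // zero-pad: the degree is unchanged
    return evaluate_on_coset(coeffs, shift);            // forward transform on the coset
}
```

With shift = 1 and rate_bits = 1, every second output equals an original input, since the square of a primitive 2n-th root of unity is a primitive n-th root. A production NTT replaces both O(n^2) loops with O(n log n) butterfly passes.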

E2. zk_evm (Type 1 ZK EVM from 0xPolygonZero)

$ sudo apt install -y librust-openssl-dev bc
$ git clone https://github.com/okx/zk_evm.git
$ cd zk_evm
$ git checkout dev
$ cd scripts
$ ./prove_stdio.sh ../artifacts/witness_b3_b6.json
$ ./prove_stdio.sh ../artifacts/witness_b19807080.json

The following results are from a GCP g2-standard-32 instance with 32 Intel Xeon vCPUs and one NVIDIA L4 GPU.

Input                      CPU-only   CPU+GPU   Speedup
witness_b3_b6.json         193.7 ms  111.1 ms    1.74 X
witness_b19807080.json     294.6 ms  174.5 ms    1.69 X

E3. Proof-of-Reserves-v2 (OKX)

$ git clone https://github.com/okx/proof-of-reserves-v2.git
$ cd proof-of-reserves-v2
$ git checkout dev-dumi-v0.1.0

Then follow the steps in that repository's README.

The following results are from a GCP g2-standard-32 instance with 32 Intel Xeon vCPUs and one NVIDIA L4 GPU, for proving 1,310,720 accounts.

CPU-only   CPU+GPU   Speedup
  2834 s    1377 s    2.06 X

E4. gnark

In gnark, we offload some of the Groth16 MSM computations to the GPU (see backend/groth16/bn254/zeknox/zeknox.go). We benchmark proving 10 secp256r1 signatures:

$ git clone https://github.com/okx/gnark.git
$ cd gnark
$ git checkout zeknox
$ cd examples
$ go build
$ ./examples

Note: you need to install Go to run the steps above:

$ sudo snap install go --classic

The following results are from a GCP g2-standard-32 instance with 32 Intel Xeon vCPUs and one NVIDIA L4 GPU:

 CPU-only    CPU+GPU   Speedup
5840.96 s  3792.48 s    1.54 X
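The MSM offloaded in the gnark integration computes the sum of s_i * P_i over many BN254 points P_i and scalars s_i. A widely used approach on both CPUs and GPUs is Pippenger's bucket method. This README does not say which algorithm native/msm uses, so treat the following as a generic illustration. To keep it short and checkable, the C++ sketch replaces BN254 points with a hypothetical toy group (integers mod 1000003 under addition) and uses hypothetical names; for real curve points only the group addition padd would change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy group: integers mod TOY_MOD under addition, standing in for BN254 points.
constexpr uint64_t TOY_MOD = 1000003;
using Point = uint64_t;
const Point IDENTITY = 0;
Point padd(Point a, Point b) { return (a + b) % TOY_MOD; }

// Double-and-add scalar multiplication, using only the group addition.
Point scalar_mul(uint64_t s, Point p) {
    Point acc = IDENTITY;
    for (; s; s >>= 1, p = padd(p, p))
        if (s & 1) acc = padd(acc, p);
    return acc;
}

// Reference MSM: one scalar multiplication per term.
Point msm_naive(const std::vector<uint64_t>& scalars, const std::vector<Point>& points) {
    Point acc = IDENTITY;
    for (size_t i = 0; i < scalars.size(); ++i) acc = padd(acc, scalar_mul(scalars[i], points[i]));
    return acc;
}

// Pippenger's bucket method over 64-bit scalars split into c-bit windows.
Point msm_pippenger(const std::vector<uint64_t>& scalars, const std::vector<Point>& points, unsigned c) {
    assert(scalars.size() == points.size() && c >= 1 && c <= 16);
    const unsigned windows = (64 + c - 1) / c;
    Point result = IDENTITY;
    for (unsigned w = windows; w-- > 0;) {          // most significant window first
        for (unsigned k = 0; k < c; ++k) result = padd(result, result); // multiply by 2^c
        std::vector<Point> buckets((size_t)1 << c, IDENTITY);
        for (size_t i = 0; i < scalars.size(); ++i) {
            uint64_t digit = (scalars[i] >> (w * c)) & ((1ULL << c) - 1);
            if (digit) buckets[digit] = padd(buckets[digit], points[i]); // bucket 0 is skipped
        }
        // sum_b b * buckets[b] via running sums, using additions only.
        Point running = IDENTITY, window_sum = IDENTITY;
        for (size_t b = buckets.size() - 1; b >= 1; --b) {
            running = padd(running, buckets[b]);
            window_sum = padd(window_sum, running);
        }
        result = padd(result, window_sum);
    }
    return result;
}
```

The window width c trades fewer windows (and fewer doublings) against more buckets per window; the 64-bit toy scalars stand in for 254-bit BN254 scalars.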

Contributing

You are welcome to report issues via GitHub Issues. However, due to limited time, we may not be able to fix them quickly. You can also propose bug fixes or new features via pull requests. Again, we may not be able to accept every pull request, for reasons such as limited review time or incompatibility of the proposed code with the existing code. For more details, please read the Plonky2 contributing guide (we follow it for this repository as well).

License

Apache License, Version 2.0 (see LICENSE)