Skip to main content

Chia vdf verification (wraps C++)

Project description

Chia VDF

Build PyPI PyPI - Format GitHub

Total alerts Language grade: Python Language grade: C/C++

Building a wheel

Compiling chiavdf requires cmake, boost and GMP.

python3 -m venv venv
source venv/bin/activate

pip install wheel setuptools_scm pybind11
pip wheel .

The primary build process for this repository is to use GitHub Actions to build binary wheels for MacOS, Linux, and Windows and publish them with a source wheel on PyPi. See .github/workflows/build.yml. setup.py adds a dependency on pybind11 using pip install pybind11. Building is then managed by cibuildwheel. Further installation is then available via pip install chiavdf e.g.

Building Timelord and related binaries

In addition to building the required binary and source wheels for Windows, MacOS and Linux, chiavdf can be used to compile vdf_client and vdf_bench. vdf_client is the core VDF process that completes the Proof of Time submitted to it by the Timelord. The repo also includes a benchmarking tool to get a sense of the iterations per second of a given CPU called vdf_bench. Try ./vdf_bench square_asm 250000 for an ips estimate.

To build vdf_client set the environment variable BUILD_VDF_CLIENT to "Y". export BUILD_VDF_CLIENT=Y.

Similarly, to build vdf_bench set the environment variable BUILD_VDF_BENCH to "Y". export BUILD_VDF_BENCH=Y.

This is currently automated via pip in the install-timelord.sh script in the chia-blockchain repository which depends on this repository.

Contributing and workflow

Contributions are welcome and more details are available in chia-blockchain's CONTRIBUTING.md.

The master branch is the currently released latest version on PyPI. Note that at times chiavdf will be ahead of the release version that chia-blockchain requires in it's master/release version in preparation for a new chia-blockchain release. Please branch or fork master and then create a pull request to the master branch. Linear merging is enforced on master and merging requires a completed review. PRs will kick off a ci build and analysis of chiavdf at lgtm.com. Please make sure your build is passing and that it does not increase alerts at lgtm.

Background from prior VDF competitions

Copyright 2018 Ilya Gorodetskov generic@sundersoft.com

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Summary

The NUDUPL algorithm is used. The equations are based on cryptoslava's equations from the previous contest. They were modified slightly to increase the level of parallelism.

The GCD is a custom implementation with scalar integers. There are two base cases: one uses a lookup table with continued fractions and the other uses the euclidean algorithm with a division table. The division table algorithm is slightly faster even though it has about 2x as many iterations.

After the base case, there is a 128 bit GCD that generates 64 bit cofactor matricies with Lehmer's algorithm. This is required to make the long integer multiplications efficient (Flint's implementation doesn't do this).

The GCD also implements Flint's partial xgcd function, but the output is slightly different. This implementation will always return an A value which is > the threshold and a B value which is <= the threshold. For a normal GCD, the threshold is 0, B is 0, and A is the GCD. Also the interfaces are slightly different.

Scalar integers are used for the GCD. I don't expect any speedup for the SIMD integers that were used in the last implementation since the GCD only uses 64x1024 multiplications, which are too small and have too high of a carry overhead for the SIMD version to be faster. In either case, most of the time seems to be spent in the base case so it shouldn't matter too much.

If SIMD integers are used with AVX-512, doubles have to be used because the multiplier sizes for doubles are significantly larger than for integers. There is an AVX-512 extension to support larger integer multiplications but no processor implements it yet. It should be possible to do a 50 bit multiply-add into a 100 bit accumulator with 4 fused multiply-adds if the accumulators have a special nonzero initial value and the inputs are scaled before the multiplication. This would make AVX-512 about 2.5x faster than scalar code for 1024x1024 integer multiplications (assuming the scalar code is unrolled and uses ADOX/ADCX/MULX properly, and the CPU can execute this at 1 cycle per iteration which it probably can't).

The GCD is parallelized by calculating the cofactors in a separate slave thread. The master thread will calculate the cofactor matricies and send them to the slave thread. Other calculations are also parallelized.

The VDF implementation from the first contest is still used as a fallback and is called about once every 5000 iterations. The GCD will encounter large quotients about this often and these are not implemented. This has a negligible effect on performance. Also, the NUDUPL case where A<=L is not implemented; it will fall back to the old implementation in this case (this never happens outside of the first 20 or so iterations).

There is also corruption detection by calculating C with a non-exact division and making sure the remainder is 0. This detected all injected random corruptions that I tested. No corruptions caused by bugs were observed during testing. This cannot correct for the sign of B being wrong.

GCD continued fraction lookup table

The is implemented in gcd_base_continued_fractions.h and asm_gcd_base_continued_fractions.h. The division table implementation is the same as the previous entry and was discussed there. Currently the division table is only used if AVX2 is enabled but it could be ported to SSE or scalar code easily. Both implementations have about the same performance.

The initial quotient sequence of gcd(a,b) is the same as the initial quotient sequence of gcd(a*2^n/b, 2^n) for any n. This is because the GCD quotients are the same as the continued fraction quotients of a/b, and the initial continued fraction quotients only depend on the initial bits of a/b. This makes it feasible to have a lookup table since it now only has one input.

a*2^n/b is calculated by doing a double precision division of a/b, and then truncating the lower bits. Some of the exponent bits are used in the table in addition to the fraction bits; this makes each slot of the table vary in size depending on what the exponent is. If the result is outside the table bounds, then the division result is floored to fall back to the euclidean algorithm (this is very rare).

The table is calculated by iterating all of the possible continued fractions that have a certain initial quotient sequence. Iteration ends when all of these fractions are either outside the table or they don't fully contain at least one slot of the table. Each slot that is fully contained by such a fraction is updated so that its quotient sequence equals the fraction's initial quotient sequence. Once this is complete, the cofactor matricies are calculated from the quotient sequences. Each cofactor matrix is 4 doubles.

The resulting code seems to have too many instructions so it doesn't perform very well. There might be some way to optimize it. It was written for SSE so that it would run on both processors.

This might work better on an FPGA possibly with low latency DRAM or SRAM (compared to the euclidean algorithm with a division table). There is no limit to the size of the table but doubling the latency would require the number of bits in the table to also be doubled to have the same performance.

Other GCD code

The gcd_128 function calculates a 128 bit GCD using Lehmer's algorithm. It is pretty straightforward and uses only unsigned arithmetic. Each cofactor matrix can only have two possible signs: [+ -; - +] or [- +; + -]. The gcd_unsigned function uses unsigned arithmetic and a jump table to apply the 64-bit cofactor matricies to the A and B values. It uses ADOX/ADCX/MULX if they are available and falls back to ADC/MUL otherwise. It will track the last known size of A to speed up the bit shifts required to get the top 128 bits of A.

No attempt was made to try to do the A and B long integer multiplications on a separate thread; I wouldn't expect any performance improvement from this.

Threads

There is a master thread and a slave thread. The slave thread only exists for each batch of 5000 or so squarings and is then destroyed and recreated for the next batch (this has no measurable overhead). If the original VDF is used as a fallback, the batch ends and the slave thread is destroyed.

Each thread has a 64-bit counter that only it can write to. Also, during a squaring iteration, it will not overwrite any value that it has previously written and transmitted to the other thread. Each squaring is split up into phases. Each thread will update its counter at the start of the phase (the counter can only be increased, not decreased). It can then wait on the other thread's counter to reach a certain value as part of a spin loop. If the spin loop takes too long, an error condition is raised and the batch ends; this should prevent any deadlocks from happening.

No CPU fences or atomics are required since each value can only be written to by one thread and since x86 enforces acquire/release ordering on all memory operations. Compiler memory fences are still required to prevent the compiler from caching or reordering memory operations.

The GCD master thread will increment the counter when a new cofactor matrix has been outputted. The slave thread will spin on this counter and then apply the cofactor matrix to the U or V vector to get a new U or V vector.

It was attempted to use modular arithmetic to calculate k directly but this slowed down the program due to GMP's modulo or integer multiply operations not having enough performance. This also makes the integer multiplications bigger.

The speedup isn't very high since most of the time is spent in the GCD base case and these can't be parallelized.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

chiavdf-0.12.13.tar.gz (133.5 kB view details)

Uploaded Source

Built Distributions

chiavdf-0.12.13-cp38-cp38-win_amd64.whl (1.9 MB view details)

Uploaded CPython 3.8 Windows x86-64

chiavdf-0.12.13-cp38-cp38-manylinux2014_x86_64.whl (371.9 kB view details)

Uploaded CPython 3.8

chiavdf-0.12.13-cp38-cp38-macosx_10_14_x86_64.whl (694.7 kB view details)

Uploaded CPython 3.8 macOS 10.14+ x86-64

chiavdf-0.12.13-cp37-cp37m-win_amd64.whl (1.9 MB view details)

Uploaded CPython 3.7m Windows x86-64

chiavdf-0.12.13-cp37-cp37m-manylinux2014_x86_64.whl (373.2 kB view details)

Uploaded CPython 3.7m

chiavdf-0.12.13-cp37-cp37m-macosx_10_14_x86_64.whl (694.3 kB view details)

Uploaded CPython 3.7m macOS 10.14+ x86-64

File details

Details for the file chiavdf-0.12.13.tar.gz.

File metadata

  • Download URL: chiavdf-0.12.13.tar.gz
  • Upload date:
  • Size: 133.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.7

File hashes

Hashes for chiavdf-0.12.13.tar.gz
Algorithm Hash digest
SHA256 debb1619b30f24c992851bf1c40befeb667898d746f1b9548e9aad1de96a98d6
MD5 04027ba4395cacd0b096b41a8980c542
BLAKE2b-256 e8dd96ebd900dcce0331edc7d0cf8f5a801f4a35060afaf858e140b7229e3adb

See more details on using hashes here.

File details

Details for the file chiavdf-0.12.13-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: chiavdf-0.12.13-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 1.9 MB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.7

File hashes

Hashes for chiavdf-0.12.13-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 8c52010b9c7a973b48f8330d3509dc798871740b0e0ac50adea5eb7ba9bae67e
MD5 338a9a78a3172e425228431f413dce81
BLAKE2b-256 a33c0b4641542dff897d640a294ba9cc8cae259cd20ae9e58ab801a86b0c1cf3

See more details on using hashes here.

File details

Details for the file chiavdf-0.12.13-cp38-cp38-manylinux2014_x86_64.whl.

File metadata

  • Download URL: chiavdf-0.12.13-cp38-cp38-manylinux2014_x86_64.whl
  • Upload date:
  • Size: 371.9 kB
  • Tags: CPython 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.7

File hashes

Hashes for chiavdf-0.12.13-cp38-cp38-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 1b8070082475648036aca009c08fee476489258ab3e5d313b18f0a8049fdd605
MD5 27a4714698b5fb31f75dc646bcbf9e01
BLAKE2b-256 4c486a667dbe1fbd98094ebcd01009e8bc499747303f34e7898f585cd1d76296

See more details on using hashes here.

File details

Details for the file chiavdf-0.12.13-cp38-cp38-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: chiavdf-0.12.13-cp38-cp38-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 694.7 kB
  • Tags: CPython 3.8, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.7

File hashes

Hashes for chiavdf-0.12.13-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 22541667503e3ee1b25e08f5ecabc208978d8cd774cfbbf22200279270865101
MD5 10410fe38984b0f59b189d7889e7650e
BLAKE2b-256 3b9dd315b2efc2a7099b32d7e554bd80609bd688b2b49c55bd2a299f14dce4de

See more details on using hashes here.

File details

Details for the file chiavdf-0.12.13-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: chiavdf-0.12.13-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 1.9 MB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.7

File hashes

Hashes for chiavdf-0.12.13-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 227e33528408f74e540924d3453d2a899714f35058bd0bd41cba1d914e7757d7
MD5 b866a236703c3efbded2c2fd85cfa11d
BLAKE2b-256 991cc0141b9c2bd5e9750e17b3c186aad2d2918842d6a1fc8248ae1a0d0fb314

See more details on using hashes here.

File details

Details for the file chiavdf-0.12.13-cp37-cp37m-manylinux2014_x86_64.whl.

File metadata

  • Download URL: chiavdf-0.12.13-cp37-cp37m-manylinux2014_x86_64.whl
  • Upload date:
  • Size: 373.2 kB
  • Tags: CPython 3.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.7

File hashes

Hashes for chiavdf-0.12.13-cp37-cp37m-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 56dbbd7470ae2c565e76271eacd01162e37e3f364ce70e6909aafdd72d74960f
MD5 7ff8d2f05a3bd0a8e5d0b7f8f029a5c4
BLAKE2b-256 98b47b5aa3a60dbdb19298b1af54e53d390d9b5ec3b09c59108e006a8209b139

See more details on using hashes here.

File details

Details for the file chiavdf-0.12.13-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: chiavdf-0.12.13-cp37-cp37m-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 694.3 kB
  • Tags: CPython 3.7m, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.7

File hashes

Hashes for chiavdf-0.12.13-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 eaf2b2af2ccf2c33e5cf6fa9bb97ba847acffdbc3c14be0e631ca6d5b1837080
MD5 ef700f460744d56d901ef3fcc060b4af
BLAKE2b-256 7cbb34b7c23ee29fbc9cd09c3a5aec7bbe7a62c2fddb0b8a0eb06b19775e589f

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page