
Library to read file blocks as fast as possible

Project description

filewarmer - File Warm Up Helper

This library does nothing but try to read file blocks as fast as possible.

It's useful to initialize/pre-warm volumes.

Installation

pip install filewarmer

Usage

from filewarmer import FWUP

fwup = FWUP()
fwup.warmup(
    ["/var/lib/mysql/abc_demo/tab@020Sales@020Invoice.ibd"],
    method="io_uring",
    small_file_size_threshold=1024 * 1024,
    block_size_for_small_files=256 * 1024,
    block_size_for_large_files=256 * 1024,
    small_files_worker_count=1,
    large_files_worker_count=1,
)

Notes -

  • Block sizes are given in bytes.
  • For io_uring, Linux kernel 5.1 or higher is recommended.
  • For io_uring, use a single thread to submit the requests.

Build + Publish

TWINE_PASSWORD=xxxxxx VERSION=0.0.10 ./build.sh

Set TWINE_PASSWORD to the API token of the PyPI account.

Why this library?

Once you create a new EBS volume from a snapshot, you will not get the performance you expect. This is because AWS keeps the snapshot in S3: when you access certain blocks of the volume for the first time, they are downloaded from S3 to the EBS volume on demand. This process is called "lazy loading".

For our database physical-restoration process, we need to pre-warm selected files before validating them and copying them to the target directory. Pre-warming the files beforehand makes those subsequent steps faster.

The AWS documentation suggests using dd or fio to pre-warm the volume. Neither is optimized for this purpose: on a 3,000 IOPS, 600 MB/s disk they reach at most 25~40 MB/s. With this library, we can reach roughly 90% of the disk's throughput.

The library also batches small files. Databases tend to have many small files, and reading them one at a time, waiting for each I/O completion, takes longer. So we read them in batches.

With io_uring, we can submit multiple requests and wait for the completion of all of them.

Assume small files are those <= 1 MB.

With io_uring, we submit 64 requests at a time.

1 MB = 256 KB * 4

So we can submit (64 / 4) = 16 small (1 MB) files' I/O requests at a time.

This reduces the overhead of submitting requests and waiting for I/O completion.
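The batch arithmetic above can be sketched as follows (the constants are taken from the usage example; the variable names are illustrative, not the library's internals):

```python
QUEUE_DEPTH = 64                # io_uring requests submitted per batch
BLOCK_SIZE = 256 * 1024         # block size for small files, in bytes
SMALL_FILE_LIMIT = 1024 * 1024  # files <= 1 MB count as "small"

# A 1 MB file read in 256 KB blocks needs 4 requests,
# so 16 small files fit into one 64-request batch.
requests_per_file = SMALL_FILE_LIMIT // BLOCK_SIZE
files_per_batch = QUEUE_DEPTH // requests_per_file

print(requests_per_file, files_per_batch)  # 4 16
```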

So, in the case of io_uring, we should run only two threads to submit the requests:

  • One for small files
  • Another for large files

Note: psync is also supported, and it is the default for most I/O operations because every Linux kernel supports it. On Linux kernel 5.1 or higher, io_uring should be used instead, as it is faster than psync.
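As a rough illustration of what a psync-style warm-up amounts to (a simplified sketch, not the library's actual implementation), it is just sequential blocking reads of every block of a file:

```python
import os

def warm_file_psync(path: str, block_size: int = 256 * 1024) -> int:
    """Read every block of `path` sequentially with pread (psync-style).

    Returns the number of bytes read. Illustrative sketch only; the
    library also handles batching, workers, and io_uring submission.
    """
    fd = os.open(path, os.O_RDONLY)
    total = 0
    try:
        offset = 0
        while True:
            chunk = os.pread(fd, block_size, offset)
            if not chunk:  # EOF
                break
            total += len(chunk)
            offset += len(chunk)
    finally:
        os.close(fd)
    return total
```

Each pread blocks until its data arrives, which is exactly the per-request waiting that io_uring's batched submission avoids.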

License

Apache 2.0

Download files

Download the file for your platform.

Source Distribution

filewarmer-0.0.17.tar.gz (2.8 MB)

Uploaded Source

File details

Details for the file filewarmer-0.0.17.tar.gz.

File metadata

  • Download URL: filewarmer-0.0.17.tar.gz
  • Upload date:
  • Size: 2.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.10.0 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/1.0.0 urllib3/1.26.20 tqdm/4.64.1 importlib-metadata/4.8.3 keyring/23.4.1 rfc3986/1.5.0 colorama/0.4.5 CPython/3.6.9

File hashes

Hashes for filewarmer-0.0.17.tar.gz
Algorithm Hash digest
SHA256 052de643772a6a571c3dbf675a864a3536848d977d4b3b201d139fed8eea23f4
MD5 0d5387478f4a66cdfe3298f46a5965a3
BLAKE2b-256 dccd345ebf521bd2e6a9515b2cf26969c1fd0cd16b97d7cb3a23f96121659042

