filewarmer - File Warm Up Helper
This library does nothing but read file blocks as fast as possible.
It's useful for initializing/pre-warming volumes.
Installation
pip install filewarmer
Usage
from filewarmer import FWUP
fwup = FWUP()
fwup.warmup(
["/var/lib/mysql/abc_demo/tab@020Sales@020Invoice.ibd", "/var/lib/mysql/abc_demo/tab@020Sales@020Invoice.ibd"]
method="io_uring",
small_file_size_threshold=1024 * 1024,
block_size_for_small_files=256 * 1024,
block_size_for_large_files=256 * 1024,
small_files_worker_count=1,
large_files_worker_count=1,
)
Notes -
- Block sizes are specified in bytes.
- For io_uring, it's recommended to use Linux Kernel 5.1 or higher.
- For io_uring, use a single thread to submit the requests.
Build + Publish
TWINE_PASSWORD=xxxxxx VERSION=0.0.10 ./build.sh
Set TWINE_PASSWORD to the API key of the PyPI account.
Why this library?
Once you create a new EBS volume from a snapshot, you will not get the expected performance. This is because AWS keeps the snapshot in S3, and when you access blocks of the volume for the first time, they are downloaded from S3 to the EBS volume. This process is called "lazy loading".
For our database physical restoration process, selected files need to be validated and copied to a target directory; to make those steps faster, we pre-warm the files beforehand.
The AWS documentation suggests using dd or fio to pre-warm the files, but neither is optimized for this purpose: we get at most 25~40 MB/s on a 3000 IOPS, 600 MB/s disk. With this library, we can reach almost 90% of the disk's rated speed.
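As a rough check of the throughput you actually get, you can time a warm-up run and divide by the total size of the files. A minimal sketch; the file path is a placeholder and the parameters are the same as in the usage example above:

import os
import time
from filewarmer import FWUP

files = ["/var/lib/mysql/abc_demo/tab@020Sales@020Invoice.ibd"]  # placeholder paths
total_mb = sum(os.path.getsize(f) for f in files) / (1024 * 1024)

fwup = FWUP()
start = time.perf_counter()
fwup.warmup(
    files,
    method="io_uring",
    small_file_size_threshold=1024 * 1024,
    block_size_for_small_files=256 * 1024,
    block_size_for_large_files=256 * 1024,
    small_files_worker_count=1,
    large_files_worker_count=1,
)
elapsed = time.perf_counter() - start
print(f"{total_mb:.0f} MB in {elapsed:.1f}s = {total_mb / elapsed:.0f} MB/s")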
The library also batches small files. A database has a lot of small files, and reading them one at a time while waiting for each I/O completion takes longer, so we read them in batches.
With io_uring, we can submit multiple requests and wait for all of them to complete.
Assume small files are those <= 1 MB.
With io_uring we submit 64 requests at a time.
1 MB = 256 KB * 4
So we can submit I/O requests for (64 / 4) = 16 small 1 MB files at a time.
This reduces the overhead of submitting requests and waiting for I/O completion.
So, in the case of io_uring, we should run only two threads to submit requests:
- One for small files
- Another for large files
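Spelled out, the batching arithmetic above looks like this (variable names are just for illustration):

queue_depth = 64                    # io_uring requests submitted at a time
small_file_size = 1024 * 1024       # files <= 1 MB are treated as small
block_size = 256 * 1024             # block size used for small files

blocks_per_small_file = small_file_size // block_size    # 1 MB / 256 KB = 4
files_per_batch = queue_depth // blocks_per_small_file   # 64 / 4 = 16
print(files_per_batch)  # -> 16 small files per submission batch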
Note > We also support psync, which is the default for most I/O operations because it is supported by every Linux kernel. On Linux kernel 5.1 or higher, prefer io_uring, which is faster than psync.
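For older kernels, the same call can be made with psync instead. A minimal sketch, assuming the method string is "psync" and keeping the other parameters from the usage example:

from filewarmer import FWUP

fwup = FWUP()
fwup.warmup(
    ["/var/lib/mysql/abc_demo/tab@020Sales@020Invoice.ibd"],  # placeholder path
    method="psync",  # assumed value; "io_uring" is the value shown in the usage example
    small_file_size_threshold=1024 * 1024,
    block_size_for_small_files=256 * 1024,
    block_size_for_large_files=256 * 1024,
    small_files_worker_count=1,
    large_files_worker_count=1,
)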
License
Apache 2.0
File details
Details for the file filewarmer-0.0.17.tar.gz.
File metadata
- Download URL: filewarmer-0.0.17.tar.gz
- Upload date:
- Size: 2.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.10.0 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/1.0.0 urllib3/1.26.20 tqdm/4.64.1 importlib-metadata/4.8.3 keyring/23.4.1 rfc3986/1.5.0 colorama/0.4.5 CPython/3.6.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 052de643772a6a571c3dbf675a864a3536848d977d4b3b201d139fed8eea23f4
MD5 | 0d5387478f4a66cdfe3298f46a5965a3
BLAKE2b-256 | dccd345ebf521bd2e6a9515b2cf26969c1fd0cd16b97d7cb3a23f96121659042