Treats a file too large to fit in memory as a single string, splits it on a given separator string, and stores the resulting pieces as separate files.
Project description
large_file_splitter
The description originally written in Japanese appears further down.
Overview
- Treats a file too large to fit in memory as a single string, splits it on a given separator string, and stores the resulting pieces as separate files.
- This description is still being written.
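The idea in the overview, splitting a file that cannot be loaded whole, can be sketched with a bounded read buffer. This is a minimal illustration, not the package's actual implementation; `iter_split` and `chunk_size` are hypothetical names introduced here.

```python
import io
from typing import BinaryIO, Iterator


def iter_split(stream: BinaryIO, sep: bytes, chunk_size: int = 1024) -> Iterator[bytes]:
    """Yield the pieces of `stream` separated by `sep`, reading chunk by chunk.

    Hypothetical helper: the result matches bytes.split(sep) on the whole
    content, but only the current unfinished piece is held in memory.
    """
    if not sep:
        raise ValueError("separator must not be empty")
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Every piece before the last separator in the buffer is complete.
        # The tail (possibly holding a partial separator) stays buffered.
        *complete, buffer = buffer.split(sep)
        yield from complete
    # Whatever follows the final separator is the last piece.
    yield buffer


if __name__ == "__main__":
    data = b"headSPLIT_MARK\r\nfirstSPLIT_MARK\r\nsecond"
    for piece in iter_split(io.BytesIO(data), b"SPLIT_MARK\r\n", chunk_size=4):
        print(piece)
```

Because the unfinished tail is kept between reads, a separator that straddles two chunks is still found. Note that a single piece larger than `chunk_size` is buffered in full, so memory use is bounded by the largest piece rather than by the file size.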
Usage
import large_file_splitter
# Split a large file [large_file_splitter].
large_file_splitter.split(
    "dummy_large_file.txt", # File to be split
    split_str = "SPLIT_MARK\r\n", # Split string (For convenience of splitting, it is processed as binary internally, so setting this to a single character is not recommended because it may lead to erroneous splitting of multi-byte characters, etc.)
    div_mode = "start", # mode for handling split strings (delete: split string is not included in output; start: split string is concatenated at the beginning of the next chunk; end: split string is concatenated at the end of the previous chunk)
    output_filename_frame = "./output/div_%d.txt", # Template for output filename (an integer value is automatically inserted for %d)
    cache_size = 10 * 1024 * 1024 # Specify the size of the chunk of data to work with in memory (in bytes; memory capacity must be at least several times this size.)
)
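The three `div_mode` values described in the comments above can be illustrated on a small in-memory byte string. This is a sketch of the documented semantics only; `apply_div_mode` is a hypothetical helper, and how the package treats empty leading or trailing chunks is not stated, so this version simply keeps them.

```python
from typing import List


def apply_div_mode(data: bytes, sep: bytes, div_mode: str) -> List[bytes]:
    """Split `data` on `sep` and attach the separator according to `div_mode`.

    Hypothetical illustration of the modes named in the usage comments.
    """
    parts = data.split(sep)
    if div_mode == "delete":
        # The separator does not appear in any output chunk.
        return parts
    if div_mode == "start":
        # The separator is placed at the beginning of the next chunk.
        return [parts[0]] + [sep + p for p in parts[1:]]
    if div_mode == "end":
        # The separator is placed at the end of the previous chunk.
        return [p + sep for p in parts[:-1]] + [parts[-1]]
    raise ValueError("div_mode must be 'delete', 'start' or 'end'")


if __name__ == "__main__":
    sample = b"headSPLIT_MARK\r\nfirstSPLIT_MARK\r\nsecond"
    for mode in ("delete", "start", "end"):
        print(mode, apply_div_mode(sample, b"SPLIT_MARK\r\n", mode))
```

With `start` and `end`, concatenating the chunks in order reproduces the original bytes; with `delete`, the separators are lost.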
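Overview (from the Japanese description)
- A tool that treats a file too large to fit in memory as a single string, performs a string split on it, and stores the results as separate files.
- The description is still being written.
Usage example (from the Japanese description)
import large_file_splitter
# Split a huge file [large_file_splitter]
large_file_splitter.split(
    "dummy_large_file.txt", # File to split
    split_str = "SPLIT_MARK\r\n", # Split string (the data is processed as binary internally for splitting, so a single character or similar is not recommended here because it may split multi-byte characters incorrectly)
    div_mode = "start", # How the split string is handled (delete: the split string is not included in the output; start: the split string is attached to the beginning of the next chunk; end: the split string is attached to the end of the previous chunk)
    output_filename_frame = "./output/div_%d.txt", # Template for the output filename (an integer is inserted automatically in place of %d)
    cache_size = 10 * 1024 * 1024 # Size of the chunk of data worked on in memory (in bytes; available memory must be at least several times this size)
)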
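The `output_filename_frame` template in the usage example uses a `%d` placeholder, which corresponds to Python's printf-style string formatting. The sketch below shows how such a template expands and how the output directory could be prepared before splitting. Both helpers are hypothetical; whether the package numbers files from 0 or 1, and whether it creates the directory itself, is not stated in the description.

```python
import os
from typing import List


def output_paths(template: str, count: int, first_index: int = 0) -> List[str]:
    """Expand a '%d' filename template into `count` consecutive paths.

    `first_index` is an assumption; the package's numbering is not documented.
    """
    return [template % i for i in range(first_index, first_index + count)]


def ensure_parent_dir(template: str) -> str:
    """Create the directory part of the template if it has one; return it."""
    parent = os.path.dirname(template % 0)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return parent


if __name__ == "__main__":
    print(output_paths("./output/div_%d.txt", 3))
```

Calling `ensure_parent_dir("./output/div_%d.txt")` before the split avoids relying on the package to create `./output` on its own.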
Hashes for large-file-splitter-0.0.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 0a6448a85de6d802d181e6e7a099e418d5a454616861f4527606dbdce856a53a
MD5 | 86ed19055730e569c39b28102bf44ceb
BLAKE2b-256 | 14eef5ed4ea51c89996957f7e6c0895545211e0a4d52bded3cea44be878f9838
Hashes for large_file_splitter-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ba402d7901192846fe1b266645f54543f593519412e74416b162f3d271078be3
MD5 | a0a782df9eaf78f3db1b48ae0755f527
BLAKE2b-256 | 07f2c5c4622a5dc80ef685e1f8cc888231dd41cacfbc6aba3e9cd6b4d7b5b98d
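A downloaded distribution file can be checked against the SHA256 digests listed above. The following is a minimal sketch using the standard `hashlib` module; `sha256_of_file` is a hypothetical helper, and the local file path in the usage note is an assumption about where the download was saved.

```python
import hashlib


def sha256_of_file(path: str, block_size: int = 1024 * 1024) -> str:
    """Return the hex SHA256 digest of a file, reading it block by block."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size blocks so the whole file never sits in memory.
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

For example, if the wheel was saved in the current directory, `sha256_of_file("large_file_splitter-0.0.1-py3-none-any.whl")` should equal the SHA256 value in the table for that file.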