The tool treats a file too large to fit in memory as a single string, splits it on a given split string, and stores the resulting pieces as separate files.
Project description
large_file_splitter
A second description, originally written in Japanese, appears further down.
Overview
- The tool treats a file too large to fit in memory as a single string, splits it on a given split string, and stores the resulting pieces as separate files.
- This description is still being written.
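The idea of splitting a file without loading it whole can be sketched as follows. This is a minimal illustration, not the package's actual implementation: the function name `iter_split` and the parameter `block_size` are hypothetical. The sketch reads fixed-size blocks and keeps the unfinished tail in a buffer, so a split string that straddles two blocks is still found. Note that the buffer grows until the next split string appears, so this simple version only saves memory when split strings occur reasonably often.

```python
import io
from typing import BinaryIO, Iterator


def iter_split(stream: BinaryIO, sep: bytes, block_size: int = 1024 * 1024) -> Iterator[bytes]:
    """Yield the pieces of `stream` separated by `sep`, reading block by block.

    Hypothetical helper for illustration; not part of large_file_splitter.
    """
    if not sep:
        raise ValueError("separator must not be empty")
    buffer = b""
    while True:
        block = stream.read(block_size)
        if not block:
            break
        buffer += block
        # Every piece before the last separator in the buffer is complete.
        # The final element may be cut off mid-piece (or mid-separator),
        # so it is kept as the new buffer.
        *complete, buffer = buffer.split(sep)
        yield from complete
    # Whatever remains after the last separator is the final piece.
    yield buffer


# Small in-memory demonstration with a deliberately tiny block size.
data = b"first\nSPLIT_MARK\r\nsecond\nSPLIT_MARK\r\nthird\n"
pieces = list(iter_split(io.BytesIO(data), b"SPLIT_MARK\r\n", block_size=4))
```

Each yielded piece could then be written to a file whose name is built from a template such as `"./output/div_%d.txt" % index`.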
Usage
import large_file_splitter

# Split a large file [large_file_splitter].
large_file_splitter.split(
    "dummy_large_file.txt",  # File to be split
    # Split string. The file is processed as binary internally, so a single-character
    # value is not recommended: it may split multi-byte characters incorrectly.
    split_str="SPLIT_MARK\r\n",
    # How the split string is handled. "delete": omitted from the output;
    # "start": prepended to the next chunk; "end": appended to the previous chunk.
    div_mode="start",
    # Template for output filenames (an integer is inserted for %d).
    output_filename_frame="./output/div_%d.txt",
    # Size in bytes of the data chunk worked on in memory
    # (available memory must be at least several times this size).
    cache_size=10 * 1024 * 1024,
)
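The three `div_mode` values can be illustrated on a small in-memory byte string. The sketch below is one reading of the comments above, not the package's code: the helper `split_with_mode` is hypothetical, and how the real tool treats edge cases (for example, a file that begins or ends with the split string) is not confirmed by the source.

```python
def split_with_mode(data: bytes, sep: bytes, mode: str) -> list[bytes]:
    """Split `data` on `sep` and place `sep` according to `mode`.

    Hypothetical helper mirroring the documented div_mode values.
    """
    parts = data.split(sep)
    if mode == "delete":
        # The split string does not appear in the output at all.
        return parts
    if mode == "start":
        # The split string is prepended to every chunk after the first.
        return [parts[0]] + [sep + part for part in parts[1:]]
    if mode == "end":
        # The split string is appended to every chunk before the last.
        return [part + sep for part in parts[:-1]] + [parts[-1]]
    raise ValueError(f"unknown mode: {mode!r}")


sep = b"SPLIT_MARK\r\n"
data = b"A\n" + sep + b"B\n" + sep + b"C\n"

deleted = split_with_mode(data, sep, "delete")
# [b"A\n", b"B\n", b"C\n"]
at_start = split_with_mode(data, sep, "start")
# [b"A\n", b"SPLIT_MARK\r\nB\n", b"SPLIT_MARK\r\nC\n"]
at_end = split_with_mode(data, sep, "end")
# [b"A\nSPLIT_MARK\r\n", b"B\nSPLIT_MARK\r\n", b"C\n"]
```

Under this reading, concatenating the chunks produced by `start` or `end` reproduces the original data, while `delete` discards the split strings.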
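Overview (translated from the Japanese)
- A tool that treats a file too large to fit in memory as a single string, performs a string split on it, and stores the results as separate files.
- The description is still being written.
Usage example (translated from the Japanese)
import large_file_splitter

# Split a huge file [large_file_splitter]
large_file_splitter.split(
    "dummy_large_file.txt",  # File to split
    # Split string. For the purposes of splitting, the file is processed as binary internally,
    # so a single character or similar is not recommended: it may mis-split multi-byte characters.
    split_str="SPLIT_MARK\r\n",
    # Mode for handling the split string. "delete": not included in the output;
    # "start": joined to the beginning of the next chunk; "end": joined to the end of the previous chunk.
    div_mode="start",
    # Template for the output filename (an integer is inserted automatically for %d).
    output_filename_frame="./output/div_%d.txt",
    # Size of the data chunk worked on in memory (in bytes; memory must be at least several times this size).
    cache_size=10 * 1024 * 1024,
)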
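The warning about single-character split strings can be made concrete. Because splitting is done on raw bytes, a one-byte split string can match a byte in the middle of a multi-byte character. The sketch below uses Shift_JIS as an example, which is an assumption for illustration (the source does not name an encoding): in Shift_JIS the second byte of "表" is 0x5C, the same byte as an ASCII backslash. The helper `breaks_characters` is hypothetical.

```python
def breaks_characters(text: str, encoding: str, sep: bytes) -> bool:
    """Return True if splitting the encoded text on `sep` yields a piece
    that can no longer be decoded, i.e. a character was cut in two.

    Hypothetical helper for illustration only.
    """
    for piece in text.encode(encoding).split(sep):
        try:
            piece.decode(encoding)
        except UnicodeDecodeError:
            return True
    return False


# A backslash as split string cuts "表" in half under Shift_JIS...
unsafe = breaks_characters("表示", "shift_jis", b"\\")

# ...while a longer, distinctive split string leaves the characters intact.
safe = breaks_characters("表示SPLIT_MARK\r\n表", "shift_jis", b"SPLIT_MARK\r\n")
```

A longer marker such as `"SPLIT_MARK\r\n"` is far less likely to coincide with bytes inside a character, which is consistent with the recommendation in the usage comments.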
Hashes for large-file-splitter-0.0.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 2c3f547e5490f106d0fdad51a5bd72eb73f96d867cdb0d16bd0d7ec4a8606c26
MD5 | 4b9afe83a4549d6672e85f7a9204ac63
BLAKE2b-256 | d5d84c0bcd58819a674029d422dea0678e8516bdead30060395faec0eb5f4967
Hashes for large_file_splitter-0.0.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | f5b4bd645411f19a91a522a1ddad070e64acafae497877322ccde781432adc4d
MD5 | 2170162943aba72cf46f84f8a8cd07fb
BLAKE2b-256 | bff4743c5c763c255992296d099f70fd3c001d063bf10086396a6fee9670a49d