
Simple module for splitting binary files into multiple chunks/parts and vice versa (joining chunks/parts back into the original file)

Project description


I made splitnjoin for 3 reasons:

1. To speed up upload sessions: it is better to upload several small files than one large file, because after a network failure the parts already uploaded stay online.
2. To work around ISP upload limits on file size.
3. To end the laziness of a boring Sunday.

Performance of splitting/joining phases can vary greatly depending on hardware configuration (especially the HDD speed).

For instance, let’s try to split an 8.5+ GB VirtualBox virtual machine image (.vdi):

- A system equipped with an AMD Ryzen 7, 16 GB DDR4 and an SSD.MD can split the VM into 34 chunks of 250 MB each in less than 20 seconds.
- An older notebook (i3, 8 GB DDR3 and a 5400 RPM HDD) requires 4+ minutes to split it with the same parameters.
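
The chunk count in the example above follows from a ceiling division of the file size by the chunk size. A minimal sketch of that arithmetic, assuming sizes are given in whole MB; the helper name `chunk_count` is hypothetical and is not part of splitnjoin:

```python
def chunk_count(file_size_mb: int, chunk_size_mb: int) -> int:
    """Number of chunks needed; the last chunk may be smaller than the others."""
    if chunk_size_mb <= 0:
        raise ValueError("chunk size must be positive")
    # Ceiling division using integers only, so no float rounding is involved
    return -(-file_size_mb // chunk_size_mb)


print(chunk_count(8500, 250))  # 8.5 GB in 250 MB chunks -> 34
```

A file that is even slightly larger than a multiple of the chunk size needs one extra, smaller chunk.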

For benchmarks and performance tests, see the “Performance tests” section below.

Important: don’t use splitnjoin in production environments, of course.

Requirements

A default Python 3 installation. It should work on any Linux distro and any Windows version that runs Python 3.

About hardware requirements: splitting and joining huge files are CPU- and RAM-intensive tasks, and splitnjoin is currently in its early days, so don’t expect big updates on resource optimization soon (I am working on it, that’s for sure).

To put it simply: if you have a system with a fairly capable CPU and 4 to 8 GB of RAM, you shouldn’t have any problem splitting huge files (for example, 8+ GB on a hard disk).

Installation

Install using pip

pip3 install splitnjoin

Examples

Splitting by chunk size example

import os

import splitnjoin as snj

fsplitter = snj.FileProcessor()

# Size of each chunk in MB, for example 25
p_size = 25

# File to split and subdirectory where the chunks are saved
from_file = "myFile.ext"
to_dir = "splitting_dir"

# Absolute paths are used only for the progress message
absfrom, absto = map(os.path.abspath, [from_file, to_dir])
print('Splitting', absfrom, 'to', absto, 'by', p_size, 'MB...')

# Split now
fsplitter.split_file(from_file, p_size, to_dir)
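
To make the idea behind splitting by chunk size concrete, here is a minimal standard-library sketch. It is not splitnjoin's implementation: the function name `split_by_size`, the `part0001`-style chunk names and the choice of 1 MB = 1024 * 1024 bytes are all assumptions made for illustration.

```python
import os


def split_by_size(from_file, chunk_mb, to_dir):
    """Write consecutive chunks of at most chunk_mb MB and return their paths."""
    chunk_bytes = int(chunk_mb * 1024 * 1024)
    if chunk_bytes <= 0:
        raise ValueError("chunk size must be positive")
    os.makedirs(to_dir, exist_ok=True)
    paths = []
    with open(from_file, "rb") as src:
        while True:
            # Each read holds one whole chunk in memory
            data = src.read(chunk_bytes)
            if not data:
                break
            # Zero-padded names keep the chunks in order under a plain sort
            path = os.path.join(to_dir, f"part{len(paths) + 1:04d}")
            with open(path, "wb") as dst:
                dst.write(data)
            paths.append(path)
    return paths
```

Because every read loads a full chunk, memory use in this sketch grows with the chunk size, which is one reason large chunks can be demanding on RAM.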

Splitting by parts example

import os

import splitnjoin as snj

fsplitter = snj.FileProcessor()

# Number of parts you want, for example 10
p_num = 10

# File to split and subdirectory where the parts are saved
from_file = "myFile.ext"
to_dir = "splitting_dir"

# Absolute paths are used only for the progress message
absfrom, absto = map(os.path.abspath, [from_file, to_dir])
print('Splitting', absfrom, 'to', absto, 'in', p_num, 'parts...')

# Split now
fsplitter.split_file_by_parts(from_file, p_num, to_dir)
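
Splitting by parts means deriving a size for each part from the total file size. One possible way to spread the bytes evenly is sketched below; splitnjoin may round differently, and the helper name `part_sizes` is hypothetical.

```python
def part_sizes(file_size, parts):
    """Sizes in bytes of `parts` pieces that add up to file_size."""
    if parts <= 0:
        raise ValueError("number of parts must be positive")
    base, extra = divmod(file_size, parts)
    # The first `extra` parts take one more byte so nothing is left over
    return [base + 1 if i < extra else base for i in range(parts)]


print(part_sizes(10, 3))  # -> [4, 3, 3]
```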

Joining example

import os

import splitnjoin as snj

fjoiner = snj.FileProcessor()

# Size in MB used when reading the chunks, for example 25
readsize = 25

# Directory holding the chunks and name of the destination file
from_dir = "splitting_dir"
to_file = "joined_myFile.ext"

# Absolute paths are used only for the progress message
absfrom, absto = map(os.path.abspath, [from_dir, to_file])
print('Joining', absfrom, 'to', absto, 'by', readsize, 'MB...')

# Join now
fjoiner.join_file(from_dir, readsize, to_file)
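
Joining is the reverse operation: the chunks are read back in order and appended to one output file. A minimal standard-library sketch follows, again not splitnjoin's own code. It assumes the chunk directory contains only chunk files whose names sort in the right order (for example zero-padded), and that the destination file lives outside that directory; `join_chunks` and `read_bytes` are hypothetical names.

```python
import os
import shutil


def join_chunks(from_dir, to_file, read_bytes=1024 * 1024):
    """Concatenate every file in from_dir, in sorted name order, into to_file."""
    names = sorted(os.listdir(from_dir))
    with open(to_file, "wb") as dst:
        for name in names:
            with open(os.path.join(from_dir, name), "rb") as src:
                # Copy in blocks of read_bytes so a chunk is never fully in memory
                shutil.copyfileobj(src, dst, read_bytes)
    return len(names)
```

The read size only bounds how much data is held in memory at once; it does not have to match the size used when splitting.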

Performance tests

I made a simple testing and benchmarking tool that splits a binary file into chunks of 250 MB each, joins them back and checks the result.

Run it like this: python3 -m splitnjoin.snj_benchmark (with -m, give the module name without the .py extension).

On my notebook (Intel i3 dual core, 8 GB RAM, 500 GB 5400 RPM disk, Linux Mint 18.3) this is the output:

[+] Generating fake binary file of 1 GB...
[+] Please, wait...
[+] fake_data.bin written.
[+] Writing time:  13.388530897998862

[+] Splitting /home/sergio/Scrivania/splitnjoin/fake_data.bin to /home/sergio/Scrivania/splitnjoin/test by 250 mb...
[+] Please, wait...
[+] Splitting time:  12.705547745999866

[+] Joining /home/sergio/Scrivania/splitnjoin/test to /home/sergio/Scrivania/splitnjoin/joined_fake_data.bin by 250 mb...
[+] Please, wait...
[+] Joining time:  15.447953824999786

[+] Calculating md5 hash for both files...
[+] Please wait...
[+] md5: 98a1c12f80bc9344846e75dc3b406611 for fake_data.bin
[+] md5: 98a1c12f80bc9344846e75dc3b406611 for joined_fake_data.bin
[+] Hashing time:  7.4639659309996205

[+] Integrity Check OK, the files are identical.

[+] Removing test files...
[+] fake_data.bin  removed.
[+] joined_fake_data.bin  removed.
[+] Removing test dir...
[+] Test directory removed.
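
The integrity check in the output above compares the MD5 digests of the original and the rejoined file. A minimal sketch of the same kind of check with the standard `hashlib` module is shown here; it is not necessarily how the benchmark does it, and the names `md5_of` and `same_content` are hypothetical.

```python
import hashlib


def md5_of(path, block_bytes=1024 * 1024):
    """Hex MD5 digest of a file, read block by block to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_bytes), b""):
            digest.update(block)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files have the same MD5 digest."""
    return md5_of(path_a) == md5_of(path_b)
```

For example, `same_content("fake_data.bin", "joined_fake_data.bin")` would return True for a successful split and join.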

TO-DO:

  • Improve splitting and joining methods to speed up the entire process (moved to the splitnjoiny project)

  • Use multiprocess module to improve performance (if possible; I’m looking at you, I/O interface) (moved to the splitnjoiny project)

  • Use the module to write a basic CLI application and…

  • …cross-compile this CLI application for Linux/macOS/Windows (multiplatform binary)


