External sort algorithm implementation.
Project description
ext-sort is an implementation of the external sort algorithm. External sorting is a class of sorting algorithms that can handle massive amounts of data. It is required when the data being sorted do not fit into the main memory (RAM) of a computer and must instead reside in slower external memory, usually a hard disk drive. Sorting is done in two passes: the first pass sorts chunks of data that each fit in RAM, and the second pass merges the sorted chunks together. For more information see https://en.wikipedia.org/wiki/External_sorting.
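The two passes described above can be sketched in a few lines of standard-library Python. This is a minimal illustration of the general technique, not ext-sort's actual implementation: the function name `external_sort` and its `chunk_size` parameter (a number of lines, not bytes) are hypothetical, and the input is assumed to be an iterable of newline-terminated strings.

```python
import heapq
import itertools
import tempfile


def external_sort(lines, chunk_size):
    """Yield the given newline-terminated strings in sorted order.

    Hypothetical helper for illustration only. At most `chunk_size`
    lines are sorted in memory at a time.
    """
    chunk_files = []
    lines = iter(lines)
    try:
        # Pass 1: read chunks that fit in memory, sort each one,
        # and spill it to its own temporary file.
        while True:
            chunk = list(itertools.islice(lines, chunk_size))
            if not chunk:
                break
            chunk.sort()
            spill = tempfile.TemporaryFile(mode='w+', encoding='utf-8')
            spill.writelines(chunk)
            spill.seek(0)
            chunk_files.append(spill)

        # Pass 2: k-way merge of the sorted chunk files. heapq.merge
        # reads lazily, keeping roughly one line per chunk in memory.
        yield from heapq.merge(*chunk_files)
    finally:
        for spill in chunk_files:
            spill.close()
```

For example, `list(external_sort(open('words.txt'), chunk_size=100_000))` would sort a file of lines while holding at most 100,000 of them in memory during the first pass; the file name here is only a placeholder.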
Compatibility
ext-sort requires Python 3.6+.
Installation
You can install ext-sort with pip:
$ pip install ext-sort
Quick start
import csv
import io
import logging

import ext_sort as es


class CSVSerializer(es.Serializer):

    def __init__(self, writer):
        super().__init__(csv.writer(io.TextIOWrapper(writer, write_through=True)))

    def write(self, item):
        return self._writer.writerow(item)


class CSVDeserializer(es.Deserializer):

    def __init__(self, reader):
        super().__init__(csv.reader(io.TextIOWrapper(reader)))

    def read(self):
        return next(self._reader)


logging.basicConfig(
    level=logging.DEBUG,
    format='[%(levelname)-8s] %(asctime)-15s (%(name)s): %(message)s',
)

with open('/home/user/data.csv', 'rb') as unsorted_file, open('/home/user/data.sorted.csv', 'wb') as sorted_file:
    # save the csv header
    sorted_file.write(unsorted_file.readline())

    es.sort(
        reader=unsorted_file,
        writer=sorted_file,
        chunk_size=10_000_000,
        Serializer=CSVSerializer,
        Deserializer=CSVDeserializer,
        workers_cnt=4,
    )
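After a run like the one above, it can be useful to confirm that the output file really is ordered. The sketch below is a standalone check that does not use ext-sort at all. The helper name `is_sorted_csv` is hypothetical, and it assumes rows are compared as lists of strings (Python's default ordering for `csv.reader` rows); whether ext-sort uses exactly that ordering is not stated here.

```python
import csv


def is_sorted_csv(path, has_header=True):
    """Return True if the data rows of a CSV file are in non-decreasing order.

    Hypothetical helper for illustration. Rows are compared as lists of
    strings, so '10' sorts before '9'.
    """
    with open(path, newline='') as f:
        rows = csv.reader(f)
        if has_header:
            # Skip the header line, which is not part of the sorted data.
            next(rows, None)
        previous = None
        for row in rows:
            if previous is not None and row < previous:
                return False
            previous = row
    return True
```

Calling `is_sorted_csv('/home/user/data.sorted.csv')` reads the file one row at a time, so it also works on files that do not fit in memory.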
Source Distribution
File details
Details for the file ext-sort-0.2.0.tar.gz.
File metadata
- Download URL: ext-sort-0.2.0.tar.gz
- Upload date:
- Size: 5.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | b877656b475c22a23decc2e1cdec073be9f7bf142259021f6df3ec20e5e56738
MD5 | 46987d0763d30171289f58379f962919
BLAKE2b-256 | da148bc09c1b4fba79885af0768510b89398d866866ede8717e03a9f04b230f0