
External sort algorithm implementation.

Project description


External sorting is a class of sorting algorithms that can handle massive amounts of data. It is required when the data being sorted does not fit into the main memory (RAM) of a computer and must instead reside in slower external memory, usually a hard disk drive. Sorting is done in two passes: the first pass sorts chunks of data that each fit in RAM, and the second pass merges the sorted chunks together. For more information, see https://en.wikipedia.org/wiki/External_sorting.
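The two passes can be sketched in plain Python. This is a minimal illustration of the general technique, not ext-sort's own code: it assumes the items are integers, spills each sorted chunk ("run") to a temporary file, and merges the runs with the standard library's heapq.merge. The names external_sort, _write_run and _read_run are hypothetical.

```python
import heapq
import itertools
import os
import tempfile
from typing import Generator, Iterable, Iterator, List


def _write_run(sorted_chunk: List[int]) -> str:
    """Spill one sorted chunk (a "run") to a temporary file, one integer per line."""
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{x}\n" for x in sorted_chunk)
    return path


def _read_run(path: str) -> Generator[int, None, None]:
    """Stream a run back from disk lazily, one integer at a time."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(items: Iterable[int], chunk_size: int) -> Iterator[int]:
    """Yield items in ascending order, sorting at most chunk_size of them in memory at a time."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    it = iter(items)
    run_paths: List[str] = []
    readers: List[Generator[int, None, None]] = []
    try:
        # Pass 1: sort each RAM-sized chunk in memory and spill it to disk.
        while True:
            chunk = list(itertools.islice(it, chunk_size))
            if not chunk:
                break
            chunk.sort()
            run_paths.append(_write_run(chunk))
        # Pass 2: k-way merge; heapq.merge keeps one pending item per run in memory.
        readers = [_read_run(p) for p in run_paths]
        yield from heapq.merge(*readers)
    finally:
        # Close any half-read runs before deleting their files.
        for reader in readers:
            reader.close()
        for path in run_paths:
            os.remove(path)
```

For example, `list(external_sort([5, 3, 9, 1, 7, 2], chunk_size=2))` writes three runs of two items each and returns `[1, 2, 3, 5, 7, 9]`. Because the merge is lazy, memory use during the second pass grows with the number of runs rather than with the total amount of data.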

Compatibility

ext-sort requires Python 3.6+.

Installation

You can install ext-sort with pip:

$ pip install ext-sort

Quick start


import csv
import io
import logging

import ext_sort as es


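class CSVSerializer(es.Serializer):
    """Writes items (CSV rows) to a binary file object as CSV text."""

    def __init__(self, writer):
        # The csv module expects newline='' so it controls line endings itself;
        # write_through=True pushes every row straight to the binary stream.
        super().__init__(csv.writer(io.TextIOWrapper(writer, newline='', write_through=True)))

    def write(self, item):
        return self._writer.writerow(item)


class CSVDeserializer(es.Deserializer):
    """Reads items (CSV rows, as lists of strings) back from a binary file object."""

    def __init__(self, reader):
        super().__init__(csv.reader(io.TextIOWrapper(reader, newline='')))

    def read(self):
        # Raises StopIteration once the underlying file is exhausted.
        return next(self._reader)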


logging.basicConfig(
    level=logging.DEBUG,
    format='[%(levelname)-8s] %(asctime)-15s (%(name)s): %(message)s',
)

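# Guard the entry point: required on platforms that start worker processes
# with "spawn" (Windows, macOS) if the workers run as separate processes.
if __name__ == '__main__':
    with open('/home/user/data.csv', 'rb') as unsorted_file, \
            open('/home/user/data.sorted.csv', 'wb') as sorted_file:
        # Copy the CSV header as-is so it is not sorted along with the data rows.
        sorted_file.write(unsorted_file.readline())

        es.sort(
            reader=unsorted_file,
            writer=sorted_file,
            chunk_size=10_000_000,
            Serializer=CSVSerializer,
            Deserializer=CSVDeserializer,
            workers_cnt=4,
        )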
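A caveat on ordering, under an assumption: the quick start does not say how ext-sort compares items, but if it uses Python's default comparison, each item here is the list of strings returned by csv.reader. Rows then compare column by column as text, so '10' sorts before '9'. The snippet uses hypothetical sample data to show the effect; converting numeric fields inside CSVDeserializer.read before returning them is one possible remedy.

```python
import csv
import io

# Hypothetical sample data; csv.reader yields every row as a list of strings.
rows = list(csv.reader(io.StringIO("9,b\n10,a\n9,a\n")))

# Default ordering compares rows column by column, as text: '10' < '9'.
text_order = sorted(rows)

# Converting the first field to int restores numeric order.
numeric_order = sorted([int(r[0])] + r[1:] for r in rows)

print(text_order)     # [['10', 'a'], ['9', 'a'], ['9', 'b']]
print(numeric_order)  # [[9, 'a'], [9, 'b'], [10, 'a']]
```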



Download files


Source Distribution

ext-sort-0.2.0.tar.gz (5.3 kB)

Uploaded Source

File details

Details for the file ext-sort-0.2.0.tar.gz.

File metadata

  • Download URL: ext-sort-0.2.0.tar.gz
  • Upload date:
  • Size: 5.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.3

File hashes

Hashes for ext-sort-0.2.0.tar.gz
Algorithm    Hash digest
SHA256       b877656b475c22a23decc2e1cdec073be9f7bf142259021f6df3ec20e5e56738
MD5          46987d0763d30171289f58379f962919
BLAKE2b-256  da148bc09c1b4fba79885af0768510b89398d866866ede8717e03a9f04b230f0
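The digests let you check that a downloaded archive is intact. This is a minimal sketch using Python's standard hashlib module; the helper name sha256_of is hypothetical, and it assumes the archive has already been downloaded.

```python
import hashlib

# SHA256 digest published above for ext-sort-0.2.0.tar.gz.
EXPECTED_SHA256 = "b877656b475c22a23decc2e1cdec073be9f7bf142259021f6df3ec20e5e56738"


def sha256_of(path: str, block_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read until it returns b"".
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

For an intact download in the current directory, `sha256_of("ext-sort-0.2.0.tar.gz") == EXPECTED_SHA256` should be `True`. pip can enforce the same check at install time with a requirements line such as `ext-sort==0.2.0 --hash=sha256:<digest>` installed via `pip install --require-hashes -r requirements.txt`.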

