CLI utility to sort large files
Project description
CLI utility to sort large files.
Installation
$ python3 -m pip install fsort
Usage
Create file
fsort.py create-file --filename largefile.txt
Sort file
fsort.py external-sort --filename largefile.txt --size 20
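In external-sort, `--size` counts lines, not bytes: each chunk is one `islice(f, size)` slice of the input. The number of chunk_* files is therefore the input's line count divided by `--size`, rounded up. The helper below is hypothetical (it is not part of fsort) and only makes that arithmetic explicit. For the 100-line file that create-file writes by default, `--size 20` gives five chunks.

```python
def expected_chunk_count(total_lines: int, size: int) -> int:
    """Hypothetical helper: how many chunk_*.txt files external-sort writes.

    Each full or partial slice of `size` lines becomes one chunk; the empty
    slice at end of file stops the loop without writing a file.
    """
    if size <= 0:
        raise ValueError("size must be a positive number of lines")
    # Integer ceiling division: ceil(total_lines / size) without floats.
    return (total_lines + size - 1) // size


print(expected_chunk_count(100, 20))  # 5
```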
The result is a sorted file named output.txt.
You will also find chunk_* files in the working directory; they show how the large file was split into smaller sorted pieces before merging.
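To confirm that output.txt is a sorted rearrangement of the input, you could compare the two files. This is a minimal sketch, not part of fsort: `check_sorted_output` is a hypothetical name, and it reads both files fully into memory. It therefore suits the small test files produced by create-file rather than genuinely large inputs.

```python
from collections import Counter


def check_sorted_output(input_path: str, output_path: str = "output.txt") -> bool:
    """Hypothetical check: True if output_path is sorted and contains
    exactly the same lines (with multiplicity) as input_path."""
    with open(input_path) as src, open(output_path) as out:
        original = src.readlines()
        result = out.readlines()
    # Every adjacent pair must be in non-decreasing order.
    in_order = all(a <= b for a, b in zip(result, result[1:]))
    # Sorting must neither drop, duplicate, nor alter any line.
    same_lines = Counter(original) == Counter(result)
    return in_order and same_lines
```

After running the two commands above, `check_sorted_output("largefile.txt")` should return True.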
Source code
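import string
from contextlib import ExitStack
from heapq import merge
from itertools import count, islice
from random import choice, randint

import click


@click.group()
def cli():
    pass


@cli.command()
@click.option('--filename', required=True, help='File to sort')
@click.option('--size', default=50000, help='Number of lines in each chunk')
def external_sort(filename, size):
    """
    Sort a large file in chunks: each chunk of SIZE lines (50K by default)
    is sorted in memory and written to its own chunk_N.txt file, then the
    chunks are merged into `output.txt`.
    """
    chunk_names = []
    # Split phase: read up to `size` lines at a time, sort them, write a chunk file.
    with open(filename) as f:
        for c in count(1):
            lines = list(islice(f, size))
            if not lines:
                break
            # Make sure the last line ends with a newline so that merging
            # chunks can never glue two lines together.
            if not lines[-1].endswith('\n'):
                lines[-1] += '\n'
            chunk_name = f'chunk_{c}.txt'
            chunk_names.append(chunk_name)
            with open(chunk_name, 'w') as chunk_file:
                chunk_file.writelines(sorted(lines))

    # Merge phase: heapq.merge lazily combines the already-sorted chunk files.
    with ExitStack() as stack, open('output.txt', 'w') as of:
        files = [stack.enter_context(open(chunk)) for chunk in chunk_names]
        of.writelines(merge(*files))


# Characters for generated lines. string.printable is avoided because it
# contains '\n' and other line-breaking whitespace that would split a line.
ALPHABET = string.ascii_letters + string.digits + string.punctuation + ' '


def generate_text(length=None):
    """Return a random string of 8 to `length` (default 45) characters."""
    word_length = randint(8, length or 45)
    return ''.join(choice(ALPHABET) for _ in range(word_length))


@cli.command()
@click.option('--filename', default='large_file.txt', help="File's name")
@click.option('--lines', default=100, help='Rows in a file')
@click.option('--line-length', default=45, help="Max line length")
def create_file(filename, lines, line_length):
    """Create FILENAME with LINES random lines of up to LINE_LENGTH characters."""
    with open(filename, 'w') as f:
        for _ in range(lines):
            f.write(f'{generate_text(line_length)}\n')


if __name__ == '__main__':
    cli()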
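The two phases in external_sort are the classic external merge sort: sort fixed-size runs that fit in memory, write them to disk, then k-way merge the runs with `heapq.merge`, which holds only one pending line per run in memory. The sketch below restates the same technique as a plain function without click, under assumptions of its own: the name `external_sort_file` is hypothetical, chunk files go to a temporary directory that is deleted afterwards (fsort instead leaves chunk_* files in the working directory), and a missing trailing newline on the last input line is added.

```python
import heapq
import os
import tempfile
from contextlib import ExitStack
from itertools import islice


def external_sort_file(input_path: str, output_path: str, chunk_size: int = 50_000) -> int:
    """Hypothetical variant of fsort's external-sort.

    Sorts input_path line by line into output_path while holding at most
    chunk_size lines in memory during the split phase. Returns the number
    of sorted runs (chunks) that were merged.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    with tempfile.TemporaryDirectory() as tmpdir:
        chunk_paths = []
        # Phase 1: split the input into sorted runs stored on disk.
        with open(input_path) as src:
            while True:
                lines = list(islice(src, chunk_size))
                if not lines:
                    break
                # A trailing newline keeps lines separate after merging.
                if not lines[-1].endswith("\n"):
                    lines[-1] += "\n"
                lines.sort()
                path = os.path.join(tmpdir, f"chunk_{len(chunk_paths) + 1}.txt")
                with open(path, "w") as chunk:
                    chunk.writelines(lines)
                chunk_paths.append(path)
        # Phase 2: k-way merge; heapq.merge pulls lines from each run lazily.
        # The runs are closed here, before the temporary directory is removed.
        with ExitStack() as stack, open(output_path, "w") as out:
            runs = [stack.enter_context(open(p)) for p in chunk_paths]
            out.writelines(heapq.merge(*runs))
    return len(chunk_paths)
```

All runs are open at once during the merge, so the number of chunks is bounded by the operating system's open-file limit. A larger chunk size (`--size` in fsort) produces fewer chunks, at the cost of more memory in the split phase.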
Download files
Source Distribution
fsort-0.1.2.tar.gz (1.8 kB)
Built Distribution
fsort-0.1.2-py3-none-any.whl (2.8 kB)
File details
Details for the file fsort-0.1.2.tar.gz.
File metadata
- Download URL: fsort-0.1.2.tar.gz
- Upload date:
- Size: 1.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.24.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5a626287fc976ebcdda542a9eb63dc2f3977eabd178dc5f1b1c5173af6b28465 |
| MD5 | c4defd79c2d3dba5e29d73ca6b0ad33a |
| BLAKE2b-256 | b129acf976778495ef48acd2c1b8158cad974ed7677e8e1cd50ef89bb615d6a6 |
File details
Details for the file fsort-0.1.2-py3-none-any.whl.
File metadata
- Download URL: fsort-0.1.2-py3-none-any.whl
- Upload date:
- Size: 2.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.24.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 888ff5ccbdee437d6bcbe03c2b3fe56393265914d73e6484519fa4af50f4ab81 |
| MD5 | e0a35cb0a707cf02a1cff69369eaa661 |
| BLAKE2b-256 | ae41a55571c5dc09ca69258858a817d671e9294ac6462f265244d513b25bb0f4 |