filesequence

Write to an indexed sequence of files using the standard Python file API

Development Status
- 3 - Alpha
License
- OSI Approved :: MIT License
Programming Language
- Python
- Python :: 2.7
Topic
- System :: Filesystems
- Text Processing :: General

Project description

File sequences

A FileSequence lets you write to multiple files through the standard Python file object write interface. Reading is not supported yet (see TODO).

You specify the file size limit and naming scheme when opening the sequence, but the library handles creating new files as needed.

Each call to the opened FileSequence’s write function may create a new file: if the chunk you are writing would push the current file over the limit, the chunk goes to the next file instead. So if you only want to split files on newlines, call write() once per line. If you want behavior more like BSD’s split command, you can write one byte at a time, though at that rate split is probably the better choice.
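The rollover rule described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the `RollingWriter` class and its attributes are hypothetical names, and it assumes that sizes are counted in bytes and that a chunk is never split across files.

```python
import os
import tempfile


class RollingWriter(object):
    """Hypothetical stand-in for a file sequence: whole chunks only, never split."""

    def __init__(self, filenames, limit):
        self.filenames = iter(filenames)  # paths to use, in order
        self.limit = limit                # maximum bytes per file
        self.current = None
        self.size = 0

    def _advance(self):
        # Close the current file (if any) and open the next name in the sequence.
        if self.current is not None:
            self.current.close()
        self.current = open(next(self.filenames), 'wb')
        self.size = 0

    def write(self, chunk):
        # Roll over when this chunk would push the current file past the limit.
        if self.current is None or self.size + len(chunk) > self.limit:
            self._advance()
        self.current.write(chunk)
        self.size += len(chunk)

    def close(self):
        if self.current is not None:
            self.current.close()
            self.current = None


def demo():
    """Write six 4-byte lines with a 10-byte limit; return the resulting file sizes."""
    directory = tempfile.mkdtemp()
    names = [os.path.join(directory, 'part-%02d.txt' % i) for i in range(10)]
    writer = RollingWriter(names, 10)
    for _ in range(6):
        writer.write(b'abc\n')
    writer.close()
    return [os.path.getsize(name) for name in names if os.path.exists(name)]
```

With a 10-byte limit, only two 4-byte lines fit in each file, so `demo()` produces three files of 8 bytes each. In this sketch a single chunk larger than the limit still lands whole in a file of its own; how the real library treats that case is not stated here.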

Installation

pip install filesequence

API

You can simply use a FileSequence object as if it were a file.

filesequence.open(...) returns a FileSequence object.
my_file_sequence.write(line) takes a line and writes it to the current file, or to the next file in the sequence if the line would push the current file over the limit.

Note that a FileSequence must be used inside a with statement, unlike the Python built-in open(), where with is optional:

import filesequence

filenames = filesequence.interpolator('numbers-%02d.txt', xrange(1000))

with filesequence.open(filenames, 1000000) as out:
    for a in xrange(1000):
        for b in xrange(1000):
            out.write('# %d * %d = %d\n' % (a, b, a * b))

Now you have a huge multiplication table spread across roughly 20 files, each 1 MB or less! So awesome!
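Judging by the CLI help text further down (`pattern % indices.next()`), `filesequence.interpolator` pairs a `%`-style pattern with an iterable of indices. A rough equivalent, written as a hypothetical standalone generator rather than the library's actual code, might look like this:

```python
def interpolate(pattern, indices):
    """Yield pattern % index for each index.

    Hypothetical stand-in for filesequence.interpolator, not its real source.
    """
    for index in indices:
        yield pattern % index


names = interpolate('numbers-%02d.txt', range(1000))
first_three = [next(names) for _ in range(3)]

# '%02d' pads to at least two digits; wider indices simply use more digits.
wide = 'numbers-%02d.txt' % 123
```

Because the names come from a lazy generator, only the filenames that are actually needed get produced, which is why passing 1000 indices is harmless when about 20 files are written.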

Want to keep going?

filenames = filesequence.interpolator('numbers-%02d.txt', xrange(1000))

with filesequence.open(filenames, 1000000, 'a') as out:
    ...

The ‘a’ (append) flag makes the sequence jump to the last existing file and continue writing from there.
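One plausible reading of "jump to the last existing file" is sketched below: walk the filename sequence until a name does not exist on disk, then resume at the name just before it. The helper `last_existing` is hypothetical, and the real library may handle gaps in the sequence, or a last file that is already full, differently.

```python
import os
import tempfile


def last_existing(filenames):
    """Return the last name before the first missing file, or None if the first is missing."""
    previous = None
    for name in filenames:
        if not os.path.exists(name):
            break
        previous = name
    return previous


def demo():
    """Create three files of a five-name sequence, then append to the last one found."""
    directory = tempfile.mkdtemp()
    names = [os.path.join(directory, 'numbers-%02d.txt' % i) for i in range(5)]
    for name in names[:3]:
        with open(name, 'w') as handle:
            handle.write('existing\n')
    resume = last_existing(names)
    with open(resume, 'a') as handle:
        handle.write('appended\n')
    with open(resume) as handle:
        content = handle.read()
    return os.path.basename(resume), content
```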

Bonus

A filesequence script will be installed to your PATH. It reads STDIN line by line and takes the filename pattern and file size limit as command line arguments (see filesequence --help). It writes out a series of files, each no larger than the limit, without breaking any lines.

$ filesequence --help

usage: cli.py [-h] [--limit LIMIT] [--pattern PATTERN] [--version]

Write STDIN into a sequence of files, splitting only at newlines

optional arguments:
  -h, --help         show this help message and exit
  --limit LIMIT      Maximum bytes per file (default: 50000000)
  --pattern PATTERN  Filename string pattern: generate filenames in sequence
                     by interpolating `pattern % indices.next()`
                     (default: split.%02d)
  --version          show program's version number and exit
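The options in the help output map naturally onto `argparse`. The sketch below rebuilds only the `--limit` and `--pattern` options with the defaults shown above; `build_parser` is a hypothetical name, and the real `cli.py` may be organised differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the --limit and --pattern options from the help text."""
    parser = argparse.ArgumentParser(
        description='Write STDIN into a sequence of files, splitting only at newlines')
    parser.add_argument('--limit', type=int, default=50000000,
                        help='Maximum bytes per file')
    parser.add_argument('--pattern', default='split.%02d',
                        help='Filename string pattern')
    return parser


# With no arguments, the defaults from the help text apply.
defaults = build_parser().parse_args([])

# Explicit values, matching the multiplication table example in the API section.
custom = build_parser().parse_args(
    ['--limit', '1000000', '--pattern', 'numbers-%02d.txt'])
```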

TODO

Support reading (flags r and r+).

Development

This package is published to PyPI at pypi.python.org/pypi/filesequence.

Typical publish process:

pandoc README.md -o README.rst
If needed, git commit ...
npm version patch
git push
python setup.py register sdist upload

Testing

Continuous integration:

Or run tests locally (after installing):

nosetests

License


Release history

This version

0.1.12

Sep 14, 2013

0.1.11

Sep 13, 2013

0.1.10

Aug 26, 2013

0.1.9

Aug 16, 2013

0.1.8

Aug 16, 2013

0.1.7

Aug 16, 2013

0.1.6

Aug 16, 2013

0.1.5

Aug 1, 2013

0.1.4

Aug 1, 2013

0.1.3

Aug 1, 2013

0.1.2

Aug 1, 2013

Download files


Source Distribution

filesequence-0.1.12.tar.gz (6.4 kB)

Uploaded Sep 14, 2013 Source

File details

Details for the file filesequence-0.1.12.tar.gz.

File metadata

Download URL: filesequence-0.1.12.tar.gz
Upload date: Sep 14, 2013
Size: 6.4 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for filesequence-0.1.12.tar.gz
Algorithm	Hash digest
SHA256	`03615195414bee5cf225375939589de8e52307d2733bfa1a4a3095cbe67c34d9`
MD5	`8635b7b388002eb2c97745e0ecb79577`
BLAKE2b-256	`1f2eedafc6893b1e9d13d2f342d00df2c98f15338351ae0f83e60409154e1dcd`

