filesequence·PyPI

Write to an indexed sequence of files using the standard Python file API

Project description

## File sequences

A `FileSequence` lets you write to multiple files through the standard Python file `write` interface. Reading is not supported yet (see TODO).

You specify the file size limit and naming scheme when opening the sequence, and the library creates new files as needed.

Each call to an open `FileSequence`'s `write` method may start a new file: if the chunk you are writing would push the current file over the limit, it goes to the next file instead. So if you only want to split files on newlines, call `write()` once for each line. If you want behavior more like BSD's `split` command, you can write one byte at a time, though at that rate `split` is probably the better choice.
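The rollover rule described above can be illustrated with a small, self-contained sketch. `RollingWriter` is a hypothetical stand-in written for this example; it is not the library's implementation, and the real `FileSequence` may handle details differently.

```python
import os
import tempfile


class RollingWriter(object):
    """Hypothetical stand-in for FileSequence; not the library's code."""

    def __init__(self, filenames, limit):
        self.filenames = iter(filenames)  # paths to use, in order
        self.limit = limit                # maximum bytes per file
        self.current = None               # currently open file object
        self.written = 0                  # bytes written to the current file

    def _roll(self):
        # Close the current file (if any) and open the next one in the sequence.
        if self.current is not None:
            self.current.close()
        self.current = open(next(self.filenames), 'wb')
        self.written = 0

    def write(self, chunk):
        # A chunk is never split: if it would not fit, it goes to a new file.
        if self.current is None or self.written + len(chunk) > self.limit:
            self._roll()
        self.current.write(chunk)
        self.written += len(chunk)

    def close(self):
        if self.current is not None:
            self.current.close()
            self.current = None


directory = tempfile.mkdtemp()
paths = [os.path.join(directory, 'part-%02d.txt' % i) for i in range(10)]

writer = RollingWriter(paths, limit=10)
for line in [b'aaaa\n', b'bbbb\n', b'cccc\n']:
    writer.write(line)  # one call per line, so files only split on newlines
writer.close()

# Sizes of the files that were actually created.
sizes = [os.path.getsize(p) for p in paths if os.path.exists(p)]
```

With a 10-byte limit and three 5-byte lines, the first two lines share one file and the third starts a second file, so no line is broken across files.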

### Installation

```sh
pip install filesequence
```

### API

You can simply use a FileSequence object as if it were a file.

* `filesequence.open` returns a FileSequence object.
* `FileSequence.write(line)` takes a line and writes it to the next available file.

Note that, unlike the file object returned by Python's built-in `open()`, a `FileSequence` must be used inside a `with` block:

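```python
import filesequence

# Filenames follow the pattern: numbers-00.txt, numbers-01.txt, ...
filenames = filesequence.interpolator('numbers-%02d.txt', xrange(1000))

# Start a new file whenever a write would exceed 1,000,000 bytes.
with filesequence.open(filenames, 1000000) as out:
    for a in xrange(1000):
        for b in xrange(1000):
            out.write('# %d * %d = %d\n' % (a, b, a * b))
```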

Now you have a huge multiplication table spread across about 20 files of 1 MB or less each! So awesome!

Want to keep going?

```python
filenames = filesequence.interpolator('numbers-%02d.txt', xrange(1000))

with filesequence.open(filenames, 1000000, 'a') as out:
    ...
```

The `'a'` (append) flag makes the sequence jump to the last existing file and continue writing from there.
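One way to picture "jump to the last existing file" is a scan over the generated filenames that stops at the first name missing from disk. The sketch below is only an illustration under that assumption: `last_existing` is a hypothetical helper, not part of `filesequence`, and the library's own lookup may differ.

```python
import os
import tempfile


def last_existing(filenames):
    """Return the last consecutive name that exists on disk, or None.

    Hypothetical helper; the library's own lookup may differ.
    """
    previous = None
    for name in filenames:
        if not os.path.exists(name):
            break
        previous = name
    return previous


directory = tempfile.mkdtemp()
names = [os.path.join(directory, 'numbers-%02d.txt' % i) for i in range(5)]

# Pretend an earlier run already produced the first three files.
for name in names[:3]:
    with open(name, 'w') as handle:
        handle.write('earlier output\n')

# The file an append-style run would resume writing to.
resume_at = last_existing(names)
```

Here `resume_at` is the path ending in `numbers-02.txt`, the last file left behind by the pretend earlier run.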

### Bonus

A `filesequence` script is installed on your `PATH`. It reads STDIN line by line, takes the filename `pattern` and file size limit as command-line arguments (see `filesequence --help`), and writes a series of files, each at most that size, without breaking any lines.

* `filesequence --help`

```
usage: filesequence [-h] [--limit LIMIT] [--pattern PATTERN]

Split STDIN into a sequence of files

optional arguments:
  -h, --help         show this help message and exit
  --limit LIMIT      Maximum bytes per file (default: 50000000)
  --pattern PATTERN  Filename string pattern: generate filenames in sequence
                     by interpolating `pattern % indices.next()` (default: file.%02d)
```
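The `--pattern` help text describes filenames as the result of `pattern % index` for successive indices. A minimal sketch of that idea follows; `interpolate` is a hypothetical re-implementation for illustration, not the library's `interpolator`, which may behave differently in its details.

```python
def interpolate(pattern, indices):
    """Yield pattern % index for each index (hypothetical re-implementation)."""
    for index in indices:
        yield pattern % index


# The default pattern from the help text, applied to the first three indices.
names = interpolate('file.%02d', range(3))
first_three = list(names)  # ['file.00', 'file.01', 'file.02']
```

Because `%02d` zero-pads to two digits, the generated names sort in creation order for the first 100 files.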

## TODO

* Support reading.

## License

Copyright © 2013 Christopher Brown. [MIT Licensed](LICENSE).

Release history

* 0.1.12 (Sep 14, 2013)
* 0.1.11 (Sep 13, 2013)
* 0.1.10 (Aug 26, 2013)
* 0.1.9 (Aug 16, 2013)
* 0.1.8 (Aug 16, 2013)
* 0.1.7 (Aug 16, 2013)
* 0.1.6 (Aug 16, 2013)
* 0.1.5 (Aug 1, 2013)
* 0.1.4 (Aug 1, 2013)
* 0.1.3 (Aug 1, 2013)
* 0.1.2 (Aug 1, 2013), this version