Write to an indexed sequence of files using the standard Python file API

## File sequences

A `FileSequence` lets you write to multiple files using the standard Python file object `write` interface.

You specify the file size limit and naming scheme when opening the sequence, and the library handles creating new files as needed.

Each call to an open FileSequence's `write` method creates a new file if the chunk you are writing would push the current file over the limit. So if you only want to split files on newlines, call `write()` once for each line. If you want behavior more like BSD's `split` command, you can write one byte at a time, though at that rate `split` is probably the better choice.
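The rollover rule described above can be sketched in a few lines. This is only an illustration under assumptions: `RollingWriter` is a hypothetical stand-in, not the library's own class, and the real `FileSequence` may handle edge cases differently.

```python
import os
import tempfile


class RollingWriter(object):
    """Hypothetical stand-in for illustration; not filesequence's own code."""

    def __init__(self, filenames, limit):
        self.filenames = iter(filenames)  # paths to use, in order
        self.limit = limit                # maximum bytes per file
        self.current = None               # currently open file object
        self.written = 0                  # bytes written to the current file

    def write(self, chunk):
        data = chunk.encode('utf-8')
        # Roll over when no file is open yet, or when this chunk would push
        # a non-empty file past the limit. A chunk is never split.
        too_big = self.written > 0 and self.written + len(data) > self.limit
        if self.current is None or too_big:
            self.close()
            self.current = open(next(self.filenames), 'wb')
            self.written = 0
        self.current.write(data)
        self.written += len(data)

    def close(self):
        if self.current is not None:
            self.current.close()
            self.current = None


directory = tempfile.mkdtemp()
paths = [os.path.join(directory, 'part-%02d.txt' % i) for i in range(10)]

writer = RollingWriter(paths, limit=10)
for line in ['aaaa\n', 'bbbb\n', 'cccc\n']:
    writer.write(line)  # one call per line, so files only split on newlines
writer.close()

created = sorted(os.listdir(directory))
```

With a 10-byte limit, the first two 5-byte lines share one file and the third line starts a second one. In this sketch a single chunk larger than the limit still gets a file to itself rather than being split.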

### Installation

```sh
pip install filesequence
```

### API

You can simply use a FileSequence object as if it were a file.

* `filesequence.open` returns a FileSequence object.
* `FileSequence.write(line)` takes a line and writes it to the next available file.

Note that a FileSequence must be used inside a `with` block, unlike the file object returned by Python's built-in `open()`:

```python
import filesequence

filenames = filesequence.interpolator('numbers-%02d.txt', xrange(1000))

with filesequence.open(filenames, 1000000) as out:
    for a in xrange(1000):
        for b in xrange(1000):
            out.write('# %d * %d = %d\n' % (a, b, a * b))
```

Now you have a huge multiplication table split across roughly 20 files of 1 MB or less each! So awesome!

Want to keep going?

```python
filenames = filesequence.interpolator('numbers-%02d.txt', xrange(1000))

with filesequence.open(filenames, 1000000, 'a') as out:
    ...
```

The `'a'` flag makes the sequence jump to the last existing file and continue writing from there.
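One way to picture "jump to the last existing file" is a scan over the candidate filenames that stops at the first one missing from disk. The sketch below is an assumption about the idea, not the library's implementation, and `last_existing_index` is a hypothetical helper.

```python
import os
import tempfile


def last_existing_index(filenames):
    """Return the index of the last consecutive path that exists, or -1.

    Hypothetical helper for illustration; not part of filesequence.
    """
    last = -1
    for index, path in enumerate(filenames):
        if not os.path.exists(path):
            break
        last = index
    return last


directory = tempfile.mkdtemp()
paths = [os.path.join(directory, 'numbers-%02d.txt' % i) for i in range(5)]

# Pretend an earlier run already filled the first three files.
for path in paths[:3]:
    with open(path, 'w') as handle:
        handle.write('earlier output\n')

resume_at = last_existing_index(paths)

# Appending continues in that file instead of overwriting the sequence.
with open(paths[resume_at], 'a') as handle:
    handle.write('new output\n')
```

Here the scan stops at the fourth path, so writing resumes in the third file and the earlier output is preserved.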

### Bonus

A `filesequence` script will be installed on your `PATH`. It reads STDIN line by line and takes the filename `pattern` and file size limit as command line arguments (see `filesequence --help`). It writes out a series of files, each no larger than that limit, without breaking any lines.

* `filesequence --help`

```
usage: filesequence [-h] [--limit LIMIT] [--pattern PATTERN]

Split STDIN into a sequence of files

optional arguments:
  -h, --help         show this help message and exit
  --limit LIMIT      Maximum bytes per file (default: 50000000)
  --pattern PATTERN  Filename string pattern: generate filenames in sequence
                     by interpolating `pattern % indices.next()` (default:
                     file.%02d)
```
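The `--pattern` help text describes filenames produced by interpolating successive indices into a format string. A minimal sketch of that idea follows; `interpolate` is a hypothetical function written for illustration, and the real `filesequence.interpolator` may differ.

```python
import itertools


def interpolate(pattern, indices):
    """Yield pattern % index for each index, in order.

    Hypothetical illustration; the real filesequence.interpolator may differ.
    """
    for index in indices:
        yield pattern % index


# itertools.count() gives an unbounded index stream; range(1000) would cap it.
names = interpolate('file.%02d', itertools.count())
first_three = list(itertools.islice(names, 3))
# first_three == ['file.00', 'file.01', 'file.02']

# %02d pads to at least two digits but does not truncate larger indices.
wide = 'file.%02d' % 100
# wide == 'file.100'
```

Because `%02d` only sets a minimum width, names stop sorting lexicographically once the index passes 99, so a wider pattern such as `%04d` may be preferable for long sequences.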

## TODO

* Support reading.

## License

Copyright © 2013 Christopher Brown. [MIT Licensed](LICENSE).
