Skip to main content

Python binding to linux syscall getdents64.

Project description

Iterate large directories efficiently with python.

About

python-getdents is a simple wrapper around Linux system call getdents64 (see man getdents for details). More details on approach.

TODO

  • Verify that implementation works on platforms other than x86_64.

Install

pip install getdents

For development

python3 -m venv env
. env/bin/activate
pip install -e .[test]

Run tests

ulimit -v 33554432 && py.test tests/

Or

ulimit -v 33554432 && ./setup.py test

Usage

from getdents import getdents

for inode, type, name in getdents('/tmp', 32768):
    print(name)

Advanced

import os
from getdents import *

fd = os.open('/tmp', O_GETDENTS)

for inode, type, name in getdents_raw(fd, 2**20):
    print({
            DT_BLK:     'blockdev',
            DT_CHR:     'chardev ',
            DT_DIR:     'dir     ',
            DT_FIFO:    'pipe    ',
            DT_LNK:     'symlink ',
            DT_REG:     'file    ',
            DT_SOCK:    'socket  ',
            DT_UNKNOWN: 'unknown ',
        }[type], {
            True:  'd',
            False: ' ',
        }[inode == 0],
        name,
    )

os.close(fd)

CLI

Usage

python-getdents [-h] [-b N] [-o NAME] PATH

Options

Option

Description

-b N

Buffer size (in bytes) to allocate when iterating over directory. Default is 32768, the same value used by glibc, you probably want to increase this value. Try starting with 16777216 (16 MiB). Best performance is achieved when buffer size rounds to size of the file system block.

--buffer-size N

-o NAME

Output format:

  • plain (default) Print only names.

  • csv Print as comma-separated values in order: inode, type, name.

  • csv-headers Same as csv, but print headers on the first line also.

  • json output as JSON array.

  • json-stream output each directory entry as single json object separated by newline.

--output-format NAME

Exit codes

  • 3 - Requested buffer is too large

  • 4 - PATH not found.

  • 5 - PATH is not a directory.

  • 6 - Not enough permissions to read contents of the PATH.

Examples

python-getdents /path/to/large/dir
python -m getdents /path/to/large/dir
python-getdents /path/to/large/dir -o csv -b 16777216 > dir.csv

Project details


Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Huawei Huawei PSF Sponsor Microsoft Microsoft PSF Sponsor NVIDIA NVIDIA PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page