
Python binding to the Linux syscall getdents64.

Project description

Iterate large directories efficiently with Python.

About

python-getdents is a simple wrapper around the Linux system call getdents64 (see man getdents for details). More details on the approach.

TODO

  • Verify that the implementation works on platforms other than x86_64.

Install

pip install getdents

For development

python3 -m venv env
. env/bin/activate
pip install -e '.[test]'

Building Wheels

pip install cibuildwheel
cibuildwheel --platform linux --output-dir wheelhouse

Run tests

ulimit -v 33554432 && py.test tests/

Or

ulimit -v 33554432 && ./setup.py test
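ulimit -v takes its value in KiB, so 33554432 caps the test run's virtual address space at 32 GiB (33554432 × 1024 = 2^35 bytes). If you prefer to launch the tests from Python, the standard resource module can apply a similar cap through RLIMIT_AS, which is measured in bytes. This is a sketch: `kib_to_bytes` and `limit_address_space` are hypothetical helpers, and unlike the shell builtin, which normally lowers both the soft and hard limits, this only lowers the soft limit.

```python
import resource

ULIMIT_V_KIB = 33554432  # the value passed to `ulimit -v` (units: KiB)


def kib_to_bytes(kib: int) -> int:
    """Convert a `ulimit -v` value (KiB) to bytes, the unit RLIMIT_AS uses."""
    return kib * 1024


def limit_address_space(kib: int = ULIMIT_V_KIB) -> None:
    """Lower this process's soft RLIMIT_AS to `kib` KiB (Unix only)."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    limit = kib_to_bytes(kib)
    if hard != resource.RLIM_INFINITY:
        limit = min(limit, hard)  # a soft limit may not exceed the hard limit
    resource.setrlimit(resource.RLIMIT_AS, (limit, hard))


print(kib_to_bytes(ULIMIT_V_KIB) == 32 * 2**30)  # True: the cap is 32 GiB
```

Calling limit_address_space() and then pytest.main(['tests/']) in the same process approximates the shell one-liner.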

Usage

from getdents import getdents

# 32768 is the buffer size in bytes, matching the CLI's default -b value.
for inode, d_type, name in getdents('/tmp', 32768):
    print(name)
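Because getdents() yields plain (inode, type, name) tuples one at a time, you can aggregate over a huge directory without holding every name in memory. The sketch below counts entries per type; it relies only on the tuple shape shown above, and `count_by_type` is a hypothetical helper. The demo feeds it a hand-made list so it runs without the package installed; with the package you would pass getdents('/tmp', 32768) instead.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def count_by_type(entries: Iterable[Tuple[int, Hashable, object]]) -> Counter:
    """Count directory entries per type without storing their names."""
    counts: Counter = Counter()
    for _inode, d_type, _name in entries:
        counts[d_type] += 1
    return counts


if __name__ == '__main__':
    # Stand-in for getdents('/tmp', 32768); the values are illustrative only.
    fake_entries = [(101, 4, '.'), (2, 4, '..'), (205, 8, 'a.log'), (206, 8, 'b.log')]
    print(count_by_type(fake_entries))
```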

Advanced

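import os

from getdents import (
    DT_BLK, DT_CHR, DT_DIR, DT_FIFO, DT_LNK, DT_REG, DT_SOCK, DT_UNKNOWN,
    O_GETDENTS, getdents_raw,
)

# Fixed-width labels for each d_type value.
TYPE_LABELS = {
    DT_BLK:     'blockdev',
    DT_CHR:     'chardev ',
    DT_DIR:     'dir     ',
    DT_FIFO:    'pipe    ',
    DT_LNK:     'symlink ',
    DT_REG:     'file    ',
    DT_SOCK:    'socket  ',
    DT_UNKNOWN: 'unknown ',
}

fd = os.open('/tmp', O_GETDENTS)
try:
    # Read the directory with a 1 MiB (2**20 bytes) buffer.
    for inode, d_type, name in getdents_raw(fd, 2**20):
        # Entries whose inode is 0 are flagged with 'd'.
        print(TYPE_LABELS.get(d_type, 'unknown '), 'd' if inode == 0 else ' ', name)
finally:
    # Close the descriptor even if iteration raises.
    os.close(fd)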
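Some file systems do not fill in the type field and report every entry as DT_UNKNOWN; the usual remedy is to call lstat() for just those entries. The sketch below assumes the DT_* constants exported by getdents carry the standard values from Linux <dirent.h>, and redefines them locally so it runs on its own. `resolve_type` is a hypothetical helper; lstat() is used rather than stat() so that symlinks are still reported as DT_LNK.

```python
import os
import stat
from typing import AnyStr

# Standard Linux <dirent.h> values, assumed to match the getdents.DT_* constants.
DT_UNKNOWN, DT_FIFO, DT_CHR, DT_DIR = 0, 1, 2, 4
DT_BLK, DT_REG, DT_LNK, DT_SOCK = 6, 8, 10, 12

# Pairs of (lstat mode test, corresponding DT_* value).
_MODE_TO_DT = (
    (stat.S_ISREG, DT_REG),
    (stat.S_ISDIR, DT_DIR),
    (stat.S_ISLNK, DT_LNK),
    (stat.S_ISFIFO, DT_FIFO),
    (stat.S_ISSOCK, DT_SOCK),
    (stat.S_ISCHR, DT_CHR),
    (stat.S_ISBLK, DT_BLK),
)


def resolve_type(dir_path: AnyStr, d_type: int, name: AnyStr) -> int:
    """Return d_type, falling back to lstat() only when it is DT_UNKNOWN."""
    if d_type != DT_UNKNOWN:
        return d_type
    try:
        mode = os.lstat(os.path.join(dir_path, name)).st_mode
    except FileNotFoundError:
        return DT_UNKNOWN  # the entry vanished between getdents and lstat
    for is_kind, dt in _MODE_TO_DT:
        if is_kind(mode):
            return dt
    return DT_UNKNOWN
```

Only entries that actually report DT_UNKNOWN pay for the extra system call, which preserves most of the speed advantage on file systems that do fill in the type.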

CLI

Usage

python-getdents [-h] [-b N] [-o NAME] PATH

Options

-b N, --buffer-size N

Buffer size (in bytes) to allocate when iterating over the directory. The default is 32768, the same value glibc uses, but you will probably want to increase it; try starting with 16777216 (16 MiB). Best performance is achieved when the buffer size is a multiple of the file system block size.

-o NAME, --output-format NAME

Output format:

  • plain (default): print only names.

  • csv: print comma-separated values in the order inode, type, name.

  • csv-headers: same as csv, but also print a header line first.

  • json: output a JSON array.

  • json-stream: output each directory entry as a single JSON object, one per line.
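Following the buffer-size advice above, a requested size can be rounded up to a whole multiple of the file system block size before it is passed as -b N. This sketch reads the block size from os.statvfs() (f_bsize); treating that field as the block size to align to is an assumption, and `aligned_buffer_size` is a hypothetical helper.

```python
import os


def aligned_buffer_size(path: str, requested: int = 16 * 2**20) -> int:
    """Round `requested` bytes up to a multiple of the file system block size."""
    block = os.statvfs(path).f_bsize or 4096  # fall back if f_bsize is 0
    return -(-requested // block) * block  # ceiling division, then scale back up


if __name__ == '__main__':
    size = aligned_buffer_size('/tmp')
    print(f'python-getdents -b {size} /tmp')
```

With the default 16 MiB request and a typical 4096-byte block, the value is already aligned and comes back unchanged.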

Exit codes

  • 3 - Requested buffer is too large.

  • 4 - PATH not found.

  • 5 - PATH is not a directory.

  • 6 - Not enough permissions to read contents of the PATH.
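When python-getdents is driven from a script, these exit codes can be turned back into readable errors. The sketch below relies only on the codes listed above and the documented usage line; `EXIT_CODES`, `describe_exit`, and `list_dir` are hypothetical names, and any other non-zero status is reported generically.

```python
import subprocess
from typing import List

# Exit codes documented for the python-getdents CLI.
EXIT_CODES = {
    3: 'requested buffer is too large',
    4: 'PATH not found',
    5: 'PATH is not a directory',
    6: 'not enough permissions to read contents of PATH',
}


def describe_exit(code: int) -> str:
    """Translate a python-getdents exit status into a readable message."""
    if code == 0:
        return 'ok'
    return EXIT_CODES.get(code, f'unexpected exit status {code}')


def list_dir(path: str, buffer_size: int = 32768) -> List[str]:
    """Run the CLI with plain output and return the printed names."""
    proc = subprocess.run(
        ['python-getdents', '-b', str(buffer_size), path],
        capture_output=True, text=True,
    )
    if proc.returncode != 0:
        raise OSError(f'{path}: {describe_exit(proc.returncode)}')
    # Plain output is one name per line, so names containing newlines would be split.
    return proc.stdout.splitlines()
```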

Examples

python-getdents /path/to/large/dir
python -m getdents /path/to/large/dir
python-getdents /path/to/large/dir -o csv -b 16777216 > dir.csv
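The dir.csv file written by the last example can be read back with the standard csv module. Only the column order (inode, type, name) is documented, so this sketch converts the inode to int but leaves the type column as text rather than guessing its representation; it also assumes names are quoted in standard CSV style. `read_entries` is a hypothetical helper.

```python
import csv
from typing import Iterator, TextIO, Tuple


def read_entries(stream: TextIO) -> Iterator[Tuple[int, str, str]]:
    """Yield (inode, type, name) rows from `python-getdents -o csv` output."""
    for inode, d_type, name in csv.reader(stream):
        yield int(inode), d_type, name
```

Open the file with open('dir.csv', newline='') so the csv module can handle quoted names that contain newlines. For csv-headers output, skip the first row before converting.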

Download files

Source Distribution

getdents-0.4.0.tar.gz (10.6 kB)

Built Distributions

getdents-0.4.0-pp310-pypy310_pp73-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.6 kB)

PyPy, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.5+ x86-64

getdents-0.4.0-pp39-pypy39_pp73-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.6 kB)

PyPy, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.5+ x86-64

getdents-0.4.0-pp38-pypy38_pp73-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.6 kB)

PyPy, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.5+ x86-64

getdents-0.4.0-cp38-abi3-musllinux_1_1_x86_64.whl (18.7 kB)

CPython 3.8+, musllinux: musl 1.1+ x86-64

getdents-0.4.0-cp38-abi3-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.3 kB)

CPython 3.8+, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.5+ x86-64
