A fast directory scanner.

Project description

scandir-rs

scandir_rs is a directory iteration module like os.walk(), but with more features and higher speed. Depending on the function called, it yields a list of paths, a tuple of lists grouped by entry type, or DirEntry objects that include the file type and stat information along with the name. scandir_rs is about 2 to 17 times faster than os.walk() (depending on the platform, file system and file tree structure) because it parallelizes the iteration in the background.

If you only need directory statistics, use the count submodule.

scandir_rs contains the following submodules:

  • count for determining statistics of a directory.
  • walk for getting names of directory entries.
  • scandir for getting detailed stats of directory entries.

For the API see:

Installation

To build the wheel from source you need the Rust nightly channel and the maturin tool.

Switch to channel nightly:

rustup default nightly

Install maturin:

cargo install maturin

Build wheel (not on Windows):

maturin build --release --strip

Build wheel on Windows:

maturin build --release --strip --no-sdist

maturin will build the wheels for all Python versions installed on your system.

Building and running tests for different Python versions

To make it easier to build wheels for several Python versions, the script build_wheels.sh has been added. It creates wheels for Python 3.6, 3.7, 3.8 and 3.9, and runs pytest after each wheel has been built successfully.

Before running the script, pyenv must be installed together with all of the Python interpreter versions mentioned above.

Instructions for installing pyenv can be found here.

Examples

Get statistics of a directory:

import scandir_rs as scandir

print(scandir.count.count("~/workspace", extended=True))
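To cross-check results, the same kind of statistics that the Benchmarks section lists (directories, files, symlinks, hardlinks, size) can be gathered single-threaded with the standard library. The following is only a sketch and says nothing about how scandir_rs is implemented; the function name count_tree and the dictionary keys are hypothetical and chosen for illustration.

```python
import os
import stat


def count_tree(root):
    """Single-threaded reference counter built on os.scandir (illustrative only)."""
    stats = {"dirs": 0, "files": 0, "slinks": 0, "hlinks": 0,
             "other": 0, "size": 0, "errors": []}
    stack = [os.path.expanduser(root)]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as it:
                entries = list(it)
        except OSError as exc:
            # Unreadable directories are recorded instead of aborting the scan.
            stats["errors"].append(f"{current}: {exc}")
            continue
        for entry in entries:
            try:
                st = entry.stat(follow_symlinks=False)
            except OSError as exc:
                stats["errors"].append(f"{entry.path}: {exc}")
                continue
            if stat.S_ISLNK(st.st_mode):
                stats["slinks"] += 1
            elif stat.S_ISDIR(st.st_mode):
                stats["dirs"] += 1
                stack.append(entry.path)
            elif stat.S_ISREG(st.st_mode):
                stats["files"] += 1
                stats["size"] += st.st_size
                # Every name of a multiply linked file is counted here.
                # On Windows, DirEntry.stat() reports st_nlink as 0.
                if st.st_nlink > 1:
                    stats["hlinks"] += 1
            else:
                # Devices, pipes, sockets and so on.
                stats["other"] += 1
    return stats
```

Calling count_tree("~/workspace") returns a plain dictionary, which can be compared against the output of scandir.count.count for the same tree.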

The same, but asynchronously in background using a class instance:

import scandir_rs as scandir

scanner = scandir.count.Count("~/workspace", extended=True)
scanner.start()  # Start background thread pool
...
value = scanner.statistics  # Can be read at any time
...
scanner.stop()  # If you want to cancel the scanner
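The start / read / stop pattern shown above can be illustrated with a small standard-library class. This is a sketch of the general idea only, using a single worker thread and os.walk; it is not the scandir_rs implementation, and the class name BackgroundCounter is hypothetical.

```python
import os
import threading


class BackgroundCounter:
    """Counts files and directories in a worker thread (illustrative, stdlib only)."""

    def __init__(self, root):
        self.root = os.path.expanduser(root)
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = None
        self._stats = {"dirs": 0, "files": 0}

    def start(self):
        """Start scanning in the background."""
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        """Ask the worker to stop and wait until it has finished."""
        self._stop.set()
        if self._thread is not None:
            self._thread.join()

    def busy(self):
        """Return True while the worker thread is still running."""
        return self._thread is not None and self._thread.is_alive()

    @property
    def statistics(self):
        """Return a snapshot of the counters; safe to read at any time."""
        with self._lock:
            return dict(self._stats)

    def _run(self):
        for _root, dirs, files in os.walk(self.root):
            if self._stop.is_set():
                break
            with self._lock:
                self._stats["dirs"] += len(dirs)
                self._stats["files"] += len(files)

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, *exc_info):
        self.stop()
        return False
```

The lock keeps each snapshot consistent, and the event lets stop() cancel a long scan between directories instead of waiting for the whole tree.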

and with a context manager:

import scandir_rs as scandir

C = scandir.count.Count("~/workspace", extended=True)
with C:
    while C.busy():
        statistics = C.statistics
        # Do something

os.walk() example:

import scandir_rs as scandir

for root, dirs, files in scandir.walk.Walk("~/workspace"):
    pass  # Do something

with extended data:

import scandir_rs as scandir

for root, dirs, files, symlinks, other, errors in scandir.walk.Walk("~/workspace",
        return_type=scandir.RETURN_TYPE_EXT):
    pass  # Do something
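To show what a six-element result grouped by entry type looks like, here is a single-threaded sketch built on os.scandir. It assumes that the tuple elements mean what their names suggest in the loop above; the generator name walk_ext is hypothetical, and the ordering and error format of the real library may differ.

```python
import os


def walk_ext(root):
    """Yield (root, dirs, files, symlinks, other, errors) per directory (stdlib sketch)."""
    stack = [os.path.expanduser(root)]
    while stack:
        current = stack.pop()
        dirs, files, symlinks, other, errors = [], [], [], [], []
        try:
            with os.scandir(current) as it:
                for entry in it:
                    try:
                        if entry.is_symlink():
                            symlinks.append(entry.name)
                        elif entry.is_dir(follow_symlinks=False):
                            dirs.append(entry.name)
                        elif entry.is_file(follow_symlinks=False):
                            files.append(entry.name)
                        else:
                            # Devices, pipes, sockets and so on.
                            other.append(entry.name)
                    except OSError as exc:
                        errors.append(f"{entry.path}: {exc}")
        except OSError as exc:
            errors.append(f"{current}: {exc}")
        yield current, dirs, files, symlinks, other, errors
        # Symlinked directories are listed but not followed.
        stack.extend(os.path.join(current, name) for name in dirs)
```

Unlike os.walk(), this sketch reports symlinks separately from the directories or files they point to, which matches the separate symlink counts in the benchmark listings.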

os.scandir() example:

import scandir_rs as scandir

for path, entry in scandir.scandir.Scandir("~/workspace",
        return_type=scandir.RETURN_TYPE_EXT):
    pass  # entry is a custom DirEntry object

Benchmarks

See examples/benchmark.py

In the tables below, the scandir_rs.walk.Walk rows return results comparable to os.walk.
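The two baseline rows (os.walk and os.walk+os.stat) can be reproduced with a few lines of standard-library code. The sketch below is not examples/benchmark.py; the function names are hypothetical, and it measures a single run without warming or clearing the file system cache.

```python
import os
import time


def time_call(func, *args):
    """Return (elapsed_seconds, result) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return time.perf_counter() - start, result


def walk_only(root):
    """Count directory entries with os.walk, without any stat calls."""
    total = 0
    for _dirpath, dirs, files in os.walk(root):
        total += len(dirs) + len(files)
    return total


def walk_and_stat(root):
    """Count directory entries with os.walk and call os.stat on each one."""
    total = 0
    for dirpath, dirs, files in os.walk(root):
        for name in dirs + files:
            total += 1
            try:
                os.stat(os.path.join(dirpath, name))
            except OSError:
                # Broken symlinks and unreadable entries are still counted.
                pass
    return total


def run_benchmark(root):
    """Time both baselines and return {name: (seconds, entry_count)}."""
    root = os.path.expanduser(root)
    return {
        "os.walk": time_call(walk_only, root),
        "os.walk+os.stat": time_call(walk_and_stat, root),
    }
```

For example, run_benchmark("/usr") returns the elapsed time and entry count for each baseline. A scandir_rs call can be timed the same way by passing it to time_call.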

Linux with Ryzen 5 2400G and SSD

Directory /usr with

  • 83790 directories
  • 671847 files
  • 48480 symlinks
  • 1278 hardlinks
  • 0 devices
  • 0 pipes
  • 30.3GB size and 31.9GB usage on disk

Time [s]  Method
   5.319  os.walk (Python 3.8)
  13.351  os.walk+os.stat (Python 3.8)
   0.918  scandir_rs.count.count
   1.340  scandir_rs.count.count(extended=True)
   0.812  scandir_rs.count.Count
   1.663  scandir_rs.walk.toc
   1.107  scandir_rs.walk.Walk (iter)
   1.775  scandir_rs.walk.Walk (collect)
   2.511  scandir_rs.scandir.entries (RETURN_TYPE_FAST)
   2.561  scandir_rs.scandir.entries (RETURN_TYPE_BASE)
   2.496  scandir_rs.scandir.entries (RETURN_TYPE_EXT)
   2.881  scandir_rs.scandir.entries (RETURN_TYPE_FULL)
   2.437  scandir_rs.scandir.entries (iter, RETURN_TYPE_FULL)

Directory linux-5.5.5 with

  • 4391 directories
  • 66459 files
  • 35 symlinks
  • 13 hardlinks
  • 0 devices
  • 0 pipes
  • 870.7MB size and 1021.5MB usage on disk

Time [s]  Method
   0.343  os.walk (Python 3.8)
   0.966  os.walk+os.stat (Python 3.8)
   0.067  scandir_rs.count.count
   0.116  scandir_rs.count.count(extended=True)
   0.067  scandir_rs.count.Count
   0.155  scandir_rs.walk.toc
   0.081  scandir_rs.walk.Walk (iter)
   0.150  scandir_rs.walk.Walk (collect)
   0.186  scandir_rs.scandir.entries (RETURN_TYPE_FAST)
   0.201  scandir_rs.scandir.entries (RETURN_TYPE_BASE)
   0.202  scandir_rs.scandir.entries (RETURN_TYPE_EXT)
   0.260  scandir_rs.scandir.entries (RETURN_TYPE_FULL)
   0.210  scandir_rs.scandir.entries (iter, RETURN_TYPE_FULL)

Up to ~5 times faster on Linux.
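The factor can be checked against the Linux listings above by dividing the os.walk time by the scandir_rs.walk.Walk (iter) time, since those are the rows described as returning comparable results. A minimal sketch of that arithmetic, with a hypothetical helper named speedup:

```python
def speedup(baseline_s, candidate_s):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


# Times in seconds, taken from the two Linux benchmark listings.
usr = speedup(5.319, 1.107)     # /usr: os.walk vs. Walk (iter)
kernel = speedup(0.343, 0.081)  # linux-5.5.5: os.walk vs. Walk (iter)

print(f"/usr: {usr:.1f}x, linux-5.5.5: {kernel:.1f}x")
# prints "/usr: 4.8x, linux-5.5.5: 4.2x"
```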

Windows 10 with laptop Core i7-4810MQ @ 2.8GHz, MTF SSD

Directory C:\Windows with

  • 130429 directories
  • 426588 files
  • 0 symlinks
  • 53563 hardlinks
  • 0 devices
  • 0 pipes
  • 49.8GB size and 50.9GB usage on disk

Time [s]  Method
  96.544  os.walk (Python 3.8)
 328.965  os.walk+os.stat (Python 3.8)
  17.133  scandir_rs.count.count
  90.272  scandir_rs.count.count(extended=True)
  19.607  scandir_rs.count.Count
  19.654  scandir_rs.walk.toc
  18.203  scandir_rs.walk.Walk (iter)
  19.704  scandir_rs.walk.Walk (collect)
  88.183  scandir_rs.scandir.entries (RETURN_TYPE_FAST)
  90.077  scandir_rs.scandir.entries (RETURN_TYPE_BASE)
  90.704  scandir_rs.scandir.entries (RETURN_TYPE_EXT)
  93.704  scandir_rs.scandir.entries (RETURN_TYPE_FULL)
  90.340  scandir_rs.scandir.entries (iter, RETURN_TYPE_FULL)

Directory linux-5.5.5 with

  • 4391 directories
  • 66459 files
  • 35 symlinks
  • 13 hardlinks
  • 0 devices
  • 0 pipes
  • 870.7MB size and 1021.5MB usage on disk

Time [s]  Method
   0.343  os.walk (Python 3.8)
   0.966  os.walk+os.stat (Python 3.8)
   0.067  scandir_rs.count.count
   0.116  scandir_rs.count.count(extended=True)
   0.067  scandir_rs.count.Count
   0.155  scandir_rs.walk.toc
   0.081  scandir_rs.walk.Walk (iter)
   0.150  scandir_rs.walk.Walk (collect)
   0.186  scandir_rs.scandir.entries (RETURN_TYPE_FAST)
   0.201  scandir_rs.scandir.entries (RETURN_TYPE_BASE)
   0.202  scandir_rs.scandir.entries (RETURN_TYPE_EXT)
   0.260  scandir_rs.scandir.entries (RETURN_TYPE_FULL)
   0.210  scandir_rs.scandir.entries (iter, RETURN_TYPE_FULL)

Up to 6.7 times faster on Windows 10.

Files for scandir-rs, version 0.9.3

Filename                                           Size      File type  Python version
scandir_rs-0.9.3-cp36-cp36m-manylinux1_x86_64.whl  339.3 kB  Wheel      cp36
scandir_rs-0.9.3-cp37-cp37m-manylinux1_x86_64.whl  339.1 kB  Wheel      cp37
scandir_rs-0.9.3-cp37-none-win_amd64.whl           297.0 kB  Wheel      cp37
scandir_rs-0.9.3-cp38-cp38-manylinux1_x86_64.whl   339.1 kB  Wheel      cp38
scandir_rs-0.9.3-cp38-none-win_amd64.whl           297.0 kB  Wheel      cp38
scandir_rs-0.9.3-cp39-cp39-manylinux1_x86_64.whl   339.1 kB  Wheel      cp39
scandir_rs-0.9.3.tar.gz                            17.4 kB   Source     None
