scandir, a better directory iterator and faster os.walk()

scandir() is a directory iteration function like os.listdir(), except that instead of returning a list of bare filenames, it yields DirEntry objects that include file type and stat information along with the name. Using scandir() increases the speed of os.walk() by 2-20 times (depending on the platform and file system) by avoiding unnecessary calls to os.stat() in most cases.

Now included in a Python near you!

scandir has been included in the Python 3.5 standard library as os.scandir(), and the related performance improvements to os.walk() have been included too. So if you’re lucky enough to be using Python 3.5 or later (3.5 was released on September 13, 2015), you get the benefit immediately. Otherwise, download this module from PyPI, install it with pip install scandir, and then do something like this in your code:

# Use the built-in version of scandir/walk if possible, otherwise
# use the scandir module version
try:
    from os import scandir, walk
except ImportError:
    from scandir import scandir, walk

PEP 471, which proposed including scandir in the Python standard library, was accepted in July 2014 by Victor Stinner, the BDFL-delegate for the PEP.

This scandir module is intended to work on Python 2.7+ and Python 3.4+ (and it has been tested on those versions).

Background

Before Python 3.5, the built-in os.walk() was significantly slower than it needed to be: in addition to calling listdir() on each directory, it called stat() on each entry to determine whether the name was a directory. But both FindFirstFile / FindNextFile on Windows and readdir on Linux/OS X already report whether each returned entry is a directory (readdir does so on most file systems), so no further stat system calls are needed. In short, you can reduce the number of system calls from about 2N to N, where N is the total number of files and directories in the tree.
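To make the difference concrete, here is a minimal sketch (assuming Python 3.5+ for os.scandir(); both function names are hypothetical) that classifies one directory’s entries both ways. The listdir() version queries the file system once more for every name, while the scandir() version can usually answer is_dir() from the data the directory listing already returned.

```python
import os

def split_with_listdir(path):
    """Classify entries the old way: one listdir() plus one lookup per name."""
    dirs, files = [], []
    for name in os.listdir(path):
        # os.path.isdir() queries the file system for each full path,
        # i.e. roughly one extra system call per entry.
        if os.path.isdir(os.path.join(path, name)):
            dirs.append(name)
        else:
            files.append(name)
    return sorted(dirs), sorted(files)

def split_with_scandir(path):
    """Classify entries using type info returned with the directory listing."""
    dirs, files = [], []
    for entry in os.scandir(path):
        # is_dir() is usually answered from the cached readdir /
        # FindNextFile data, so no extra system call is needed.
        (dirs if entry.is_dir() else files).append(entry.name)
    return sorted(dirs), sorted(files)
```

Both functions return the same result; only the number of system calls behind them differs.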

In practice, removing all those extra system calls makes os.walk() about 7-50 times as fast on Windows, and about 3-10 times as fast on Linux and Mac OS X. So we’re not talking about micro-optimizations. See more benchmarks in the “Benchmarks” section below.

Somewhat relatedly, many people have also asked for a version of os.listdir() that yields filenames as it iterates instead of returning them as one big list. This improves memory efficiency for iterating very large directories.
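scandir() is such a lazy iterator. The following is a minimal sketch, assuming Python 3.6+ (where os.scandir() supports the with statement) and a hypothetical helper name, that stops at the first matching file instead of building a list of every name in the directory:

```python
import os

def first_with_suffix(path, suffix):
    """Return the name of the first regular file ending in suffix, or None.

    Entries are fetched from the OS on demand, so the loop can stop early
    without materializing the whole directory listing.
    """
    # Leaving the with block closes the iterator and its OS handle,
    # even when we return before reaching the end of the directory.
    with os.scandir(path) as it:
        for entry in it:
            if entry.name.endswith(suffix) and entry.is_file():
                return entry.name
    return None
```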

So as well as a faster walk(), scandir adds a new scandir() function. They’re pretty easy to use, but see “The API” below for the full docs.

Benchmarks

Below are results showing the speedup of scandir.walk() over os.walk() on various systems, measured by running benchmark.py with no arguments:

System version        Python version  Times as fast
Windows 7 64-bit      2.7.7 64-bit    10.4
Windows 7 64-bit SSD  2.7.7 64-bit    10.3
Windows 7 64-bit NFS  2.7.6 64-bit    36.8
Windows 7 64-bit SSD  3.4.1 64-bit     9.9
Windows 7 64-bit SSD  3.5.0 64-bit     9.5
Ubuntu 14.04 64-bit   2.7.6 64-bit     5.8
Mac OS X 10.9.3       2.7.5 64-bit     3.8

All of the above tests were done using the fast C version of scandir (source code in _scandir.c).

Note that the gains are less than the above on smaller directories and greater on larger directories. This is why benchmark.py creates a test directory tree with a standardized size.
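benchmark.py itself is not reproduced here. As an illustration of why a standardized tree matters, this is a hypothetical harness (make_tree() and time_walk() are illustrative names, not part of scandir) that builds a fixed-size tree and times any os.walk()-compatible function over it.

```python
import os
import tempfile
import timeit

def make_tree(root, depth=2, dirs_per_level=3, files_per_dir=5):
    """Populate root with a fixed-size tree (hypothetical layout).

    Every directory gets files_per_dir one-byte files and, until depth
    reaches 0, dirs_per_level subdirectories built the same way.
    """
    for i in range(files_per_dir):
        with open(os.path.join(root, 'file%d.txt' % i), 'w') as f:
            f.write('x')
    if depth > 0:
        for i in range(dirs_per_level):
            sub = os.path.join(root, 'dir%d' % i)
            os.mkdir(sub)
            make_tree(sub, depth - 1, dirs_per_level, files_per_dir)

def time_walk(walk_func, root, number=5):
    """Return total seconds for `number` complete traversals of root."""
    def traverse():
        # Exhaust the generator; only the traversal itself is timed.
        for _dirpath, _dirnames, _filenames in walk_func(root):
            pass
    return timeit.timeit(traverse, number=number)

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as root:
        make_tree(root, depth=3)
        print('os.walk: %.4f s' % time_walk(os.walk, root))
```

On an older Python, passing scandir.walk and the built-in os.walk as walk_func compares them on identical trees; raising depth or files_per_dir varies the tree size, which per the note above affects how large the gain is.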

The API

walk()

The API for scandir.walk() is exactly the same as os.walk(), so just read the Python docs.
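To show why a scandir-based walk() avoids the per-entry stat() calls, here is a simplified top-down walk built on os.scandir(). It is a sketch under stated assumptions (Python 3.6+; simple_walk() is a hypothetical name), not the implementation in this module or in os.py: it omits the topdown=False, onerror and followlinks options, and lets an OSError from os.scandir() propagate instead of skipping the directory.

```python
import os

def simple_walk(top):
    """Simplified, top-down os.walk() work-alike built on os.scandir()."""
    dirnames, filenames, symlinked = [], [], set()
    with os.scandir(top) as it:
        for entry in it:
            # is_dir() is normally answered from the directory listing,
            # so classifying an entry costs no extra stat() call.
            if entry.is_dir():
                dirnames.append(entry.name)
                if entry.is_symlink():
                    symlinked.add(entry.name)
            else:
                filenames.append(entry.name)
    yield top, dirnames, filenames
    # Recurse after yielding so callers can prune dirnames in place, as
    # with os.walk(); like followlinks=False, don't descend into links.
    for name in dirnames:
        if name not in symlinked:
            yield from simple_walk(os.path.join(top, name))
```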

scandir()

The full docs for scandir() and the DirEntry objects it yields are in the Python documentation for os.scandir(). A brief summary follows.

scandir(path='.') -> iterator of DirEntry objects for given path

Like listdir, scandir calls the operating system’s directory iteration system calls to get the names of the files in the given path, but it’s different from listdir in two ways:

  • Instead of returning bare filename strings, it returns lightweight DirEntry objects that hold the filename string and provide simple methods that allow access to the additional data the operating system may have returned.

  • It returns an iterator instead of a list, so entries are produced lazily as you iterate instead of the full list being built up front.

scandir() yields a DirEntry object for each file and sub-directory in path. Just like listdir, the '.' and '..' pseudo-directories are skipped, and the entries are yielded in system-dependent order. Each DirEntry object has the following attributes and methods:

  • name: the entry’s filename, relative to the scandir path argument (corresponds to the return values of os.listdir)

  • path: the entry’s full path name (not necessarily an absolute path) – the equivalent of os.path.join(scandir_path, entry.name)

  • is_dir(*, follow_symlinks=True): similar to pathlib.Path.is_dir(), but the return value is cached on the DirEntry object; doesn’t require a system call in most cases; don’t follow symbolic links if follow_symlinks is False

  • is_file(*, follow_symlinks=True): similar to pathlib.Path.is_file(), but the return value is cached on the DirEntry object; doesn’t require a system call in most cases; don’t follow symbolic links if follow_symlinks is False

  • is_symlink(): similar to pathlib.Path.is_symlink(), but the return value is cached on the DirEntry object; doesn’t require a system call in most cases

  • stat(*, follow_symlinks=True): like os.stat(), but the return value is cached on the DirEntry object; does not require a system call on Windows (except for symlinks); don’t follow symbolic links (like os.lstat()) if follow_symlinks is False

  • inode(): return the inode number of the entry; the return value is cached on the DirEntry object
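The following sketch (assuming Python 3.6+ for the with statement; describe_entries() is a hypothetical helper) collects each attribute and method from the list above for every entry in a directory:

```python
import os

def describe_entries(path):
    """Return one summary dict per entry, sorted by name."""
    rows = []
    with os.scandir(path) as it:
        for entry in it:
            rows.append({
                'name': entry.name,        # bare filename, as from os.listdir()
                'path': entry.path,        # os.path.join(path, entry.name)
                'is_dir': entry.is_dir(),
                'is_file': entry.is_file(),
                'is_symlink': entry.is_symlink(),
                # follow_symlinks=False behaves like os.lstat(); the result
                # is cached on the DirEntry after the first call.
                'size': entry.stat(follow_symlinks=False).st_size,
                'inode': entry.inode(),
            })
    return sorted(rows, key=lambda row: row['name'])
```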

Here’s a very simple example of scandir() showing use of the DirEntry.name attribute and the DirEntry.is_dir() method:

import os

def subdirs(path):
    """Yield directory names not starting with '.' under given path."""
    for entry in os.scandir(path):
        # is_dir() normally needs no extra system call
        if not entry.name.startswith('.') and entry.is_dir():
            yield entry.name

This subdirs() function will be significantly faster with scandir than an equivalent built on os.listdir() and os.path.isdir(), on both Windows and POSIX systems, especially for medium-sized or large directories.
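The same pattern extends to recursive traversal. Below is a minimal sketch (get_tree_size() is an illustrative name; Python 3.6+ assumed) that totals file sizes under a directory, passing follow_symlinks=False to is_dir() and stat() so symbolic links are counted as entries rather than followed.

```python
import os

def get_tree_size(path):
    """Return the total size in bytes of the entries under path, recursively."""
    total = 0
    with os.scandir(path) as it:
        for entry in it:
            # Not following symlinks means a link to a directory can't cause a cycle.
            if entry.is_dir(follow_symlinks=False):
                total += get_tree_size(entry.path)
            else:
                # On Windows this stat() result usually comes from the directory
                # listing itself; on POSIX it costs one lstat() per entry.
                total += entry.stat(follow_symlinks=False).st_size
    return total
```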

Further reading

  • The Python docs for scandir

  • PEP 471, the accepted Python Enhancement Proposal for adding scandir to the standard library; it covers the design in detail, including rejected ideas and previous discussion

Flames, comments, bug reports

Please send flames, comments, and questions about scandir to Ben Hoyt:

http://benhoyt.com/

File bug reports for the version in the Python 3.5 standard library on the Python issue tracker, and bug reports or feature requests for this module at the GitHub project page:

https://github.com/benhoyt/scandir

Download files

Source Distribution

File name                                  Size     Tags
scandir-1.10.0.tar.gz                      33.3 kB  Source

Built Distributions

File name                                  Size     Tags
scandir-1.10.0-cp37-cp37m-win_amd64.whl    22.8 kB  CPython 3.7m, Windows x86-64
scandir-1.10.0-cp37-cp37m-win32.whl        22.1 kB  CPython 3.7m, Windows x86
scandir-1.10.0-cp36-cp36m-win_amd64.whl    22.8 kB  CPython 3.6m, Windows x86-64
scandir-1.10.0-cp36-cp36m-win32.whl        22.1 kB  CPython 3.6m, Windows x86
scandir-1.10.0-cp35-cp35m-win_amd64.whl    22.8 kB  CPython 3.5m, Windows x86-64
scandir-1.10.0-cp35-cp35m-win32.whl        22.1 kB  CPython 3.5m, Windows x86
scandir-1.10.0-cp34-cp34m-win_amd64.whl    20.5 kB  CPython 3.4m, Windows x86-64
scandir-1.10.0-cp34-cp34m-win32.whl        20.2 kB  CPython 3.4m, Windows x86
scandir-1.10.0-cp27-cp27m-win_amd64.whl    20.9 kB  CPython 2.7m, Windows x86-64
scandir-1.10.0-cp27-cp27m-win32.whl        20.5 kB  CPython 2.7m, Windows x86

File details

Details for the file scandir-1.10.0.tar.gz.

File metadata

  • Download URL: scandir-1.10.0.tar.gz
  • Size: 33.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/38.5.1 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/2.7.14

File hashes

Hashes for scandir-1.10.0.tar.gz
Algorithm Hash digest
SHA256 4d4631f6062e658e9007ab3149a9b914f3548cb38bfb021c64f39a025ce578ae
MD5 f8378f4d9f95a6a78e97ab01aa900c1d
BLAKE2b-256 dff59c052db7bd54d0cbf1bc0bb6554362bba1012d03e5888950a4f5c5dadc4e

File details

Details for the file scandir-1.10.0-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: scandir-1.10.0-cp37-cp37m-win_amd64.whl
  • Size: 22.8 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/38.5.1 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/2.7.14

File hashes

Hashes for scandir-1.10.0-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 b24086f2375c4a094a6b51e78b4cf7ca16c721dcee2eddd7aa6494b42d6d519d
MD5 46b00959354c0a833473e5d0c82b38d0
BLAKE2b-256 cb06cee31a831784ae66073fff7df58f823a1f6bc3947dc48a76f877b17eb52b

File details

Details for the file scandir-1.10.0-cp37-cp37m-win32.whl.

File metadata

  • Download URL: scandir-1.10.0-cp37-cp37m-win32.whl
  • Size: 22.1 kB
  • Tags: CPython 3.7m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/38.5.1 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/2.7.14

File hashes

Hashes for scandir-1.10.0-cp37-cp37m-win32.whl
Algorithm Hash digest
SHA256 67f15b6f83e6507fdc6fca22fedf6ef8b334b399ca27c6b568cbfaa82a364173
MD5 5dbab653f3373e4a3b6467c508c7c817
BLAKE2b-256 9b474bf2941582c731b0c3a7aee7a3129063fe7dad5382cd662f3766dac619b2

File details

Details for the file scandir-1.10.0-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: scandir-1.10.0-cp36-cp36m-win_amd64.whl
  • Size: 22.8 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/38.5.1 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/2.7.14

File hashes

Hashes for scandir-1.10.0-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 7d2d7a06a252764061a020407b997dd036f7bd6a175a5ba2b345f0a357f0b3f4
MD5 50303dbc67562351c895ff31b2f6bba1
BLAKE2b-256 a0c38b8244553f4cf8682825e46d264f0bf3b8f7a51c9ba4745c8aa9182da4e5

File details

Details for the file scandir-1.10.0-cp36-cp36m-win32.whl.

File metadata

  • Download URL: scandir-1.10.0-cp36-cp36m-win32.whl
  • Size: 22.1 kB
  • Tags: CPython 3.6m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/38.5.1 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/2.7.14

File hashes

Hashes for scandir-1.10.0-cp36-cp36m-win32.whl
Algorithm Hash digest
SHA256 2ae41f43797ca0c11591c0c35f2f5875fa99f8797cb1a1fd440497ec0ae4b022
MD5 5139928bafc36cb763063091697a0f14
BLAKE2b-256 25c5257e7f38127de5221a57e6afd0eb6ad0a85412c92644bf8265f20085b22a

File details

Details for the file scandir-1.10.0-cp35-cp35m-win_amd64.whl.

File metadata

  • Download URL: scandir-1.10.0-cp35-cp35m-win_amd64.whl
  • Size: 22.8 kB
  • Tags: CPython 3.5m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/38.5.1 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/2.7.14

File hashes

Hashes for scandir-1.10.0-cp35-cp35m-win_amd64.whl
Algorithm Hash digest
SHA256 8c5922863e44ffc00c5c693190648daa6d15e7c1207ed02d6f46a8dcc2869d32
MD5 5468e5503744da4f2064a773667cd9f5
BLAKE2b-256 b75dc0dc3933bd79fdca4780f84fffb2291672f3db8f0093c5d2ce629b7cb656

File details

Details for the file scandir-1.10.0-cp35-cp35m-win32.whl.

File metadata

  • Download URL: scandir-1.10.0-cp35-cp35m-win32.whl
  • Size: 22.1 kB
  • Tags: CPython 3.5m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/38.5.1 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/2.7.14

File hashes

Hashes for scandir-1.10.0-cp35-cp35m-win32.whl
Algorithm Hash digest
SHA256 2b8e3888b11abb2217a32af0766bc06b65cc4a928d8727828ee68af5a967fa6f
MD5 695cc9235f54d915e3b79d470bb35274
BLAKE2b-256 cda1d5c3e22090ebead6d24abbe0f85641f002db44316c0a422d68d2fa258a8c

File details

Details for the file scandir-1.10.0-cp34-cp34m-win_amd64.whl.

File metadata

  • Download URL: scandir-1.10.0-cp34-cp34m-win_amd64.whl
  • Size: 20.5 kB
  • Tags: CPython 3.4m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/38.5.1 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/2.7.14

File hashes

Hashes for scandir-1.10.0-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 2586c94e907d99617887daed6c1d102b5ca28f1085f90446554abf1faf73123e
MD5 b76f7c1c1513a9006b1aef0562712a26
BLAKE2b-256 773a7f6111552b4736a08a72c37d6b8852cf77c02042a3124a8806f67008ad45

File details

Details for the file scandir-1.10.0-cp34-cp34m-win32.whl.

File metadata

  • Download URL: scandir-1.10.0-cp34-cp34m-win32.whl
  • Size: 20.2 kB
  • Tags: CPython 3.4m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/38.5.1 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/2.7.14

File hashes

Hashes for scandir-1.10.0-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 2c712840c2e2ee8dfaf36034080108d30060d759c7b73a01a52251cc8989f11f
MD5 5a5261ebc3309845a08723ce7b605b06
BLAKE2b-256 0486417b435bd94a4e0176c10acea5b866a47a045848a346c6930fc846b2f016

File details

Details for the file scandir-1.10.0-cp27-cp27m-win_amd64.whl.

File metadata

  • Download URL: scandir-1.10.0-cp27-cp27m-win_amd64.whl
  • Size: 20.9 kB
  • Tags: CPython 2.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/38.5.1 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/2.7.14

File hashes

Hashes for scandir-1.10.0-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 cb925555f43060a1745d0a321cca94bcea927c50114b623d73179189a4e100ac
MD5 203f49b490e2ec2b2057f03e76985bb2
BLAKE2b-256 f9d06b7b38eaf9964510f5c32aa5aaf9f419864d2e0ebe34274e6cba5689a0c5

File details

Details for the file scandir-1.10.0-cp27-cp27m-win32.whl.

File metadata

  • Download URL: scandir-1.10.0-cp27-cp27m-win32.whl
  • Size: 20.5 kB
  • Tags: CPython 2.7m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/38.5.1 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/2.7.14

File hashes

Hashes for scandir-1.10.0-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 92c85ac42f41ffdc35b6da57ed991575bdbe69db895507af88b9f499b701c188
MD5 ab1698106fc8a94dee536c93d51be79a
BLAKE2b-256 c68c43cc3799c79c435d1a236783993b2e04a2c750b4f91ef3630ec442490df5
