scandir, a better directory iterator and faster os.walk()

scandir() is a directory iteration function like os.listdir(), except that instead of returning a list of bare filenames, it yields DirEntry objects that include file type and stat information along with the name. Using scandir() increases the speed of os.walk() by 2-20 times (depending on the platform and file system) by avoiding unnecessary calls to os.stat() in most cases.

Now included in a Python near you!

scandir has been included in the Python 3.5 standard library as os.scandir(), and the related performance improvements to os.walk() have also been included. So if you’re using Python 3.5 or later (3.5 was released on September 13, 2015), you get the benefit immediately. Otherwise, install this module from PyPI with pip install scandir, and then do something like this in your code:

# Use the built-in version of scandir/walk if possible, otherwise
# use the scandir module version
try:
    from os import scandir, walk
except ImportError:
    from scandir import scandir, walk

PEP 471, which is the PEP that proposes including scandir in the Python standard library, was accepted in July 2014 by Victor Stinner, the BDFL-delegate for the PEP.

This scandir module is intended to work on Python 2.6+ and Python 3.2+ (and it has been tested on those versions).

Background

Python’s built-in os.walk() is significantly slower than it needs to be, because – in addition to calling listdir() on each directory – it calls stat() on each file to determine whether the filename is a directory or not. But both FindFirstFile / FindNextFile on Windows and readdir on Linux/OS X already tell you whether the files returned are directories or not, so no further stat system calls are needed. In short, you can reduce the number of system calls from about 2N to N, where N is the total number of files and directories in the tree.

In practice, removing all those extra system calls makes os.walk() about 7-50 times as fast on Windows, and about 3-10 times as fast on Linux and Mac OS X. So we’re not talking about micro-optimizations. See more benchmarks in the “Benchmarks” section below.
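The saving described above can be sketched with a stripped-down walk built on scandir(). This is only an illustration, not the module's actual walk() implementation: simple_walk is a hypothetical name, error handling and the topdown/onerror/followlinks options are omitted, and symlinks to directories are listed as files here, which os.walk() does not do.

```python
import os


def simple_walk(top):
    """Yield (dirpath, dirnames, filenames) top-down, like a minimal os.walk()."""
    dirnames, filenames = [], []
    for entry in os.scandir(top):
        # is_dir() normally answers from the type information that the
        # directory listing already returned, so no per-entry stat() call
        # is needed in most cases.
        if entry.is_dir(follow_symlinks=False):
            dirnames.append(entry.name)
        else:
            filenames.append(entry.name)
    yield top, dirnames, filenames
    # Recurse into each sub-directory found above.
    for name in dirnames:
        for result in simple_walk(os.path.join(top, name)):
            yield result
```

On a tree without symlinks this yields the same directories and names as os.walk(), though not necessarily in the same order.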

Somewhat relatedly, many people have also asked for a version of os.listdir() that yields filenames as it iterates instead of returning them as one big list. This improves memory efficiency for iterating very large directories.
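As a rough illustration of the memory point, the sketch below reads only the first few entries of a directory instead of building the full list. The helper name first_names is hypothetical, and the close() call is looked up defensively because it is only assumed to exist on newer versions of the iterator.

```python
import itertools
import os


def first_names(path, limit):
    """Return up to `limit` entry names without listing the whole directory."""
    entries = os.scandir(path)
    try:
        # islice stops pulling entries from the iterator after `limit` items.
        return [entry.name for entry in itertools.islice(entries, limit)]
    finally:
        # Release the directory handle early if the iterator supports it.
        close = getattr(entries, "close", None)
        if close is not None:
            close()
```

With os.listdir() the same function would have to read every name in the directory before slicing the list.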

So as well as a faster walk(), scandir adds a new scandir() function. They’re pretty easy to use, but see “The API” below for the full docs.

Benchmarks

Below are results showing how many times faster scandir.walk() is than os.walk() on various systems, found by running benchmark.py with no arguments:

====================  ==============  =============
System version        Python version  Times as fast
====================  ==============  =============
Windows 7 64-bit      2.7.7 64-bit    10.4
Windows 7 64-bit SSD  2.7.7 64-bit    10.3
Windows 7 64-bit NFS  2.7.6 64-bit    36.8
Windows 7 64-bit SSD  3.4.1 64-bit    9.9
Windows 7 64-bit SSD  3.5.0 64-bit    9.5
CentOS 6.2 64-bit     2.6.6 64-bit    3.9
Ubuntu 14.04 64-bit   2.7.6 64-bit    5.8
Mac OS X 10.9.3       2.7.5 64-bit    3.8
====================  ==============  =============

All of the above tests were done using the fast C version of scandir (source code in _scandir.c).

Note that the gains are less than the above on smaller directories and greater on larger directories. This is why benchmark.py creates a test directory tree with a standardized size.

The API

walk()

The API for scandir.walk() is exactly the same as os.walk(), so just read the Python docs.
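Because the signature is the same, existing os.walk() loops should work unchanged once the import is swapped. A minimal sketch, reusing the import fallback shown earlier; count_files is a hypothetical helper, not part of the module:

```python
# Use the built-in walk if scandir is in the standard library,
# otherwise fall back to the scandir module version.
try:
    from os import scandir, walk
except ImportError:
    from scandir import scandir, walk


def count_files(top):
    """Count the non-directory entries in the tree rooted at `top`."""
    total = 0
    for dirpath, dirnames, filenames in walk(top):
        total += len(filenames)
    return total
```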

scandir()

The full docs for scandir() and the DirEntry objects it yields are available in the Python documentation for os.scandir(). Below is a brief summary.

scandir(path='.') -> iterator of DirEntry objects for given path

Like listdir, scandir calls the operating system’s directory iteration system calls to get the names of the files in the given path, but it’s different from listdir in two ways:

  • Instead of returning bare filename strings, it returns lightweight DirEntry objects that hold the filename string and provide simple methods that allow access to the additional data the operating system may have returned.

  • It returns a generator instead of a list, so that scandir acts as a true iterator instead of returning the full list immediately.

scandir() yields a DirEntry object for each file and sub-directory in path. Just like listdir, the '.' and '..' pseudo-directories are skipped, and the entries are yielded in system-dependent order. Each DirEntry object has the following attributes and methods:

  • name: the entry’s filename, relative to the scandir path argument (corresponds to the return values of os.listdir)

  • path: the entry’s full path name (not necessarily an absolute path) – the equivalent of os.path.join(scandir_path, entry.name)

  • is_dir(*, follow_symlinks=True): similar to pathlib.Path.is_dir(), but the return value is cached on the DirEntry object; doesn’t require a system call in most cases; does not follow symbolic links if follow_symlinks is False

  • is_file(*, follow_symlinks=True): similar to pathlib.Path.is_file(), but the return value is cached on the DirEntry object; doesn’t require a system call in most cases; does not follow symbolic links if follow_symlinks is False

  • is_symlink(): similar to pathlib.Path.is_symlink(), but the return value is cached on the DirEntry object; doesn’t require a system call in most cases

  • stat(*, follow_symlinks=True): like os.stat(), but the return value is cached on the DirEntry object; does not require a system call on Windows (except for symlinks); does not follow symbolic links (like os.lstat()) if follow_symlinks is False

  • inode(): return the inode number of the entry; the return value is cached on the DirEntry object

Here’s a very simple example of scandir() showing use of the DirEntry.name attribute and the DirEntry.is_dir() method:

import os

def subdirs(path):
    """Yield directory names not starting with '.' under given path."""
    for entry in os.scandir(path):
        # is_dir() usually needs no extra system call
        if not entry.name.startswith('.') and entry.is_dir():
            yield entry.name

This subdirs() function will be significantly faster with scandir than os.listdir() and os.path.isdir() on both Windows and POSIX systems, especially on medium-sized or large directories.
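The other DirEntry members combine in the same way. As a further sketch, the hypothetical tree_size() helper below uses path, is_dir(), is_file() and stat() together to total the size of the regular files under a directory. It assumes symbolic links should be skipped and does no error handling.

```python
import os


def tree_size(path):
    """Return the total size in bytes of regular files under `path`."""
    total = 0
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            # entry.path already includes the directory passed to scandir().
            total += tree_size(entry.path)
        elif entry.is_file(follow_symlinks=False):
            # The stat result is cached on the entry after the first call.
            total += entry.stat(follow_symlinks=False).st_size
    return total
```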

Further reading

  • The Python docs for scandir

  • PEP 471, the (now-accepted) Python Enhancement Proposal that proposed adding scandir to the standard library – a lot of details here, including rejected ideas and previous discussion

Flames, comments, bug reports

Please send flames, comments, and questions about scandir to Ben Hoyt:

http://benhoyt.com/

File bug reports for the version in the Python 3.5 standard library through the Python project's own bug tracker, or file bug reports or feature requests for this module at the GitHub project page:

https://github.com/benhoyt/scandir
