
scandir, a better directory iterator and faster os.walk()


scandir() is a directory iteration function like os.listdir(), except that instead of returning a list of bare filenames, it yields DirEntry objects that include file type and stat information along with the name. Using scandir() increases the speed of os.walk() by 2-20 times (depending on the platform and file system) by avoiding unnecessary calls to os.stat() in most cases.

Now included in a Python near you!

scandir has been included in the Python 3.5 standard library as os.scandir(), and the related performance improvements to os.walk() have also been included. So if you’re using Python 3.5 (released September 13, 2015) or later, you get the benefit immediately. Otherwise, install this module from PyPI with pip install scandir, and then do something like this in your code:

# Use the built-in version of scandir/walk if possible, otherwise
# use the scandir module version
try:
    from os import scandir, walk
except ImportError:
    from scandir import scandir, walk

PEP 471, which proposed including scandir in the Python standard library, was accepted in July 2014 by Victor Stinner, the BDFL-delegate for the PEP.

This scandir module is intended to work on Python 2.6+ and Python 3.2+ (and it has been tested on those versions).

Background

Python’s built-in os.walk() is significantly slower than it needs to be, because – in addition to calling listdir() on each directory – it calls stat() on each file to determine whether the filename is a directory or not. But both FindFirstFile / FindNextFile on Windows and readdir on Linux/OS X already tell you whether the files returned are directories or not, so no further stat system calls are needed. In short, you can reduce the number of system calls from about 2N to N, where N is the total number of files and directories in the tree.

In practice, removing all those extra system calls makes os.walk() about 7-50 times as fast on Windows, and about 3-10 times as fast on Linux and Mac OS X. So we’re not talking about micro-optimizations. See more benchmarks in the “Benchmarks” section below.
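To make the saving concrete, here is a minimal sketch of a top-down walk built on scandir(). It is not the module's actual implementation: simple_walk is a hypothetical name, and the sketch ignores error handling, the topdown and followlinks options, and in-place pruning of dirnames. It only shows that the directory listing alone is enough to split entries into directories and files.

```python
import os

try:
    from os import scandir  # Python 3.5+
except ImportError:
    from scandir import scandir  # backport from PyPI

def simple_walk(top):
    """Yield (dirpath, dirnames, filenames) top-down, like a pared-down os.walk()."""
    dirnames, filenames, to_visit = [], [], []
    for entry in scandir(top):
        # is_dir() normally reuses the file type reported by the directory
        # listing, so most entries need no separate stat() call.
        if entry.is_dir():
            dirnames.append(entry.name)
            # Like os.walk() by default, do not descend into symlinked directories.
            if not entry.is_symlink():
                to_visit.append(entry.path)
        else:
            filenames.append(entry.name)
    yield top, dirnames, filenames
    for path in to_visit:
        for item in simple_walk(path):
            yield item
```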

Somewhat relatedly, many people have also asked for a version of os.listdir() that yields filenames as it iterates instead of returning them as one big list. This improves memory efficiency for iterating very large directories.
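As an illustration of the memory point, the sketch below reads only the first few entries of a directory and stops, without building a list of every name. The helper name first_names is hypothetical. The close() call is guarded because the iterator's close() method only exists on Python 3.6 and later.

```python
import itertools
import os

try:
    from os import scandir  # Python 3.5+
except ImportError:
    from scandir import scandir  # backport from PyPI

def first_names(path, limit):
    """Return at most `limit` entry names without listing the whole directory."""
    it = scandir(path)
    try:
        return [entry.name for entry in itertools.islice(it, limit)]
    finally:
        # Release the directory handle early where the iterator supports it.
        close = getattr(it, "close", None)
        if close is not None:
            close()
```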

So as well as a faster walk(), scandir adds a new scandir() function. They’re pretty easy to use, but see “The API” below for the full docs.

Benchmarks

Below are results showing how many times faster scandir.walk() is than os.walk() on various systems, found by running benchmark.py with no arguments:

System version             Python version    Times as fast
Windows 7 64-bit           2.7.7 64-bit      10.4
Windows 7 64-bit SSD       2.7.7 64-bit      10.3
Windows 7 64-bit NFS       2.7.6 64-bit      36.8
Windows 7 64-bit SSD       3.4.1 64-bit      9.9
Windows 7 64-bit SSD       3.5.0 64-bit      9.5
CentOS 6.2 64-bit          2.6.6 64-bit      3.9
Ubuntu 14.04 64-bit        2.7.6 64-bit      5.8
Mac OS X 10.9.3            2.7.5 64-bit      3.8

All of the above tests were done using the fast C version of scandir (source code in _scandir.c).

Note that the gains are less than the above on smaller directories and greater on larger directories. This is why benchmark.py creates a test directory tree with a standardized size.
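For a rough check on your own directory tree, a timing helper along the following lines may be enough. This is only a sketch and is not how benchmark.py works. time_walk is a hypothetical name, and a single run like this is noisy. Note that on Python 3.5 and later os.walk() already uses scandir internally, so a meaningful comparison needs an older Python with the scandir module installed.

```python
import os
from timeit import default_timer

def time_walk(walk_func, root):
    """Return (seconds, entries) for one full traversal of root with walk_func."""
    start = default_timer()
    entries = 0
    for _dirpath, dirnames, filenames in walk_func(root):
        entries += len(dirnames) + len(filenames)
    return default_timer() - start, entries
```

Calling time_walk(os.walk, root) and time_walk(scandir.walk, root) on the same tree, after one warm-up pass to fill the file system cache, gives two durations to compare.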

The API

walk()

The API for scandir.walk() is exactly the same as os.walk(), so just read the Python docs.
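Since the interface is the same, existing os.walk() idioms carry over unchanged. The sketch below uses a hypothetical helper, find_files, to collect files with a given suffix while pruning dot-directories by editing dirnames in place, which is documented os.walk() behaviour.

```python
import os

try:
    # Importing scandir as well makes the fallback trigger on old Pythons,
    # where os.walk exists but os.scandir does not.
    from os import scandir, walk
except ImportError:
    from scandir import scandir, walk

def find_files(root, suffix):
    """Yield paths under root ending in suffix, skipping dot-directories."""
    for dirpath, dirnames, filenames in walk(root):
        # Editing dirnames in place stops walk() from descending into them.
        dirnames[:] = [d for d in dirnames if not d.startswith('.')]
        for name in filenames:
            if name.endswith(suffix):
                yield os.path.join(dirpath, name)
```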

scandir()

The full docs for scandir() and the DirEntry objects it yields are available in the Python documentation for os.scandir(). Below is a brief summary.

scandir(path='.') -> iterator of DirEntry objects for given path

Like listdir, scandir calls the operating system’s directory iteration system calls to get the names of the files in the given path, but it’s different from listdir in two ways:

  • Instead of returning bare filename strings, it returns lightweight DirEntry objects that hold the filename string and provide simple methods that allow access to the additional data the operating system may have returned.

  • It returns a generator instead of a list, so that scandir acts as a true iterator instead of returning the full list immediately.

scandir() yields a DirEntry object for each file and sub-directory in path. Just like listdir, the '.' and '..' pseudo-directories are skipped, and the entries are yielded in system-dependent order. Each DirEntry object has the following attributes and methods:

  • name: the entry’s filename, relative to the scandir path argument (corresponds to the return values of os.listdir)

  • path: the entry’s full path name (not necessarily an absolute path) – the equivalent of os.path.join(scandir_path, entry.name)

  • is_dir(*, follow_symlinks=True): similar to pathlib.Path.is_dir(), but the return value is cached on the DirEntry object; doesn’t require a system call in most cases; does not follow symbolic links if follow_symlinks is False

  • is_file(*, follow_symlinks=True): similar to pathlib.Path.is_file(), but the return value is cached on the DirEntry object; doesn’t require a system call in most cases; does not follow symbolic links if follow_symlinks is False

  • is_symlink(): similar to pathlib.Path.is_symlink(), but the return value is cached on the DirEntry object; doesn’t require a system call in most cases

  • stat(*, follow_symlinks=True): like os.stat(), but the return value is cached on the DirEntry object; does not require a system call on Windows (except for symlinks); does not follow symbolic links (like os.lstat()) if follow_symlinks is False

  • inode(): return the inode number of the entry; the return value is cached on the DirEntry object
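The attributes and methods above can be combined as in the following sketch, which maps each regular file directly under a directory to its size. The function name file_sizes is hypothetical. Passing follow_symlinks=False means symbolic links are neither reported as files nor followed when reading the size.

```python
import os

try:
    from os import scandir  # Python 3.5+
except ImportError:
    from scandir import scandir  # backport from PyPI

def file_sizes(path):
    """Return {name: size_in_bytes} for regular files directly under path."""
    sizes = {}
    for entry in scandir(path):
        if entry.is_file(follow_symlinks=False):
            # stat() results are cached on the entry, so repeated calls
            # on the same DirEntry do not repeat the system call.
            sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
    return sizes
```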

Here’s a very simple example of scandir() showing use of the DirEntry.name attribute and the DirEntry.is_dir() method:

import os

def subdirs(path):
    """Yield directory names not starting with '.' under given path."""
    for entry in os.scandir(path):
        # is_dir() usually needs no extra system call
        if not entry.name.startswith('.') and entry.is_dir():
            yield entry.name

This subdirs() function will be significantly faster with scandir than os.listdir() and os.path.isdir() on both Windows and POSIX systems, especially on medium-sized or large directories.
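For comparison, the os.listdir() and os.path.isdir() version referred to above might look like the sketch below. The name subdirs_listdir is hypothetical. It yields the same names, but every candidate costs an extra os.path.isdir() call, which performs a stat.

```python
import os

def subdirs_listdir(path):
    """Same result as subdirs(), but with one os.path.isdir() call per name."""
    for name in os.listdir(path):
        if not name.startswith('.') and os.path.isdir(os.path.join(path, name)):
            yield name
```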

Further reading

  • The Python docs for scandir

  • PEP 471, the (now-accepted) Python Enhancement Proposal that proposed adding scandir to the standard library – a lot of details here, including rejected ideas and previous discussion

Flames, comments, bug reports

Please send flames, comments, and questions about scandir to Ben Hoyt:

http://benhoyt.com/

File bug reports for the version in the Python 3.5 standard library through the Python project’s own bug-reporting channels, or file bug reports or feature requests for this module at the GitHub project page:

https://github.com/benhoyt/scandir


