Walk directory trees with os.scandir, generating DirEntry objects

scanwalk

scanwalk.walk() walks a directory tree, generating DirEntry objects. It's an alternative to os.walk() modelled on os.scandir().

>>> import scanwalk
>>> for entry in scanwalk.walk('demo'):
...     print('📁' if entry.is_dir() else '📄', entry.path)
...
📁 demo
📁 demo/dir2
📁 demo/dir1
📁 demo/dir1/dir1.1
📄 demo/dir1/dir1.1/file_a
📄 demo/dir1/file_c
📁 demo/dir1/dir1.2
📄 demo/dir1/dir1.2/file_b

A rough equivalent using os.walk() would be:

>>> import os
>>> for parent, dirnames, filenames in os.walk('demo'):
...     print('📁', parent)
...     for name in filenames:
...         print('📄', os.path.join(parent, name))
...
📁 demo
📁 demo/dir2
📁 demo/dir1
📄 demo/dir1/file_c
📁 demo/dir1/dir1.1
📄 demo/dir1/dir1.1/file_a
📁 demo/dir1/dir1.2
📄 demo/dir1/dir1.2/file_b

To skip the contents of a directory, set the DirEntry.skip attribute:

>>> import scanwalk
>>> for entry in scanwalk.walk('demo'):
...     if entry.name == 'dir1.1':
...         entry.skip = True
...     else:
...         print(entry.path)
...
demo
demo/dir2
demo/dir1
demo/dir1/file_c
demo/dir1/dir1.2
demo/dir1/dir1.2/file_b
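
For comparison, the usual way to get the same pruning from os.walk() is to edit dirnames in place. The following is a minimal stdlib-only sketch; the helper name walk_pruned and the throwaway tree are illustrative assumptions, not part of scanwalk.

```python
import os
import tempfile

def walk_pruned(top, skip_name):
    """Yield paths under top without descending into directories named skip_name."""
    for parent, dirnames, filenames in os.walk(top):
        # os.walk() only honours in-place edits to dirnames, and only when
        # walking top-down (the default).
        dirnames[:] = [d for d in dirnames if d != skip_name]
        yield parent
        for name in filenames:
            yield os.path.join(parent, name)

# Build a small throwaway tree that mirrors part of the demo layout.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "dir1", "dir1.1"))
    open(os.path.join(tmp, "dir1", "dir1.1", "file_a"), "w").close()
    open(os.path.join(tmp, "dir1", "file_c"), "w").close()
    pruned = sorted(os.path.relpath(p, tmp) for p in walk_pruned(tmp, "dir1.1"))

print(pruned)
```

As in the skip example, neither the pruned directory nor anything below it appears in the result.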

Comparison

|             | os.walk()                      | scanwalk.walk()                                            |
| Yields      | (dirpath, dirnames, filenames) | DirEntry objects                                           |
| Consumers   | Nested for loops               | Flat for loop, list comprehension, or generator expression |
| Grouping    | Directories & files separated  | Directories & files intermingled                           |
| Traversal   | Depth first or breadth first   | Semi depth first, directories traversed on arrival         |
| Exceptions  | onerror() callback             | try/except block                                           |
| Allocations | Builds intermediate lists      | Direct from os.scandir()                                   |
| Maturity    | Mature                         | Alpha                                                      |
| Tests       | Thorough automated unit tests  | None                                                       |
| Performance | 1.0x                           | 1.1 - 1.2x faster                                          |
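
The Exceptions row can be illustrated with the standard library alone. This sketch contrasts the onerror callback of os.walk() with the ordinary exception raised by os.scandir(); it does not exercise scanwalk itself, and the missing path is a made-up example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A path that cannot exist: a child of a fresh, empty temp directory.
    missing = os.path.join(tmp, "missing")

    # os.walk(): errors are swallowed unless an onerror callback is supplied.
    # The callback receives the OSError instance.
    seen = []
    walked = list(os.walk(missing, onerror=seen.append))

    # os.scandir(): the same failure surfaces as an ordinary exception,
    # so a try/except block is the natural way to handle it.
    try:
        os.scandir(missing)
    except OSError as exc:
        raised = exc
    else:
        raised = None

print(walked, [err.filename for err in seen], type(raised).__name__)
```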

Installation

python -m pip install scanwalk

Requirements

  • Python 3.7+

License

MIT

Questions and Answers

What's wrong with os.walk()?

os.walk() is plenty good enough; its return type is just awkward to use inside a list comprehension, a generator expression, or similar.

Why use scanwalk?

scanwalk.walk() ekes out a little more speed (10-20% in an ad hoc benchmark). It doesn't require nested for loops, so code is a bit easier to read and write. In particular, list comprehensions and generator expressions become simpler.
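
To make the difference concrete, here is a sketch that collects every file path both ways. The function names are illustrative assumptions; the scanwalk version uses only walk(), is_dir(), and path as they appear in the examples above, and the import is optional so the block still runs without scanwalk installed.

```python
import os
import tempfile

try:
    import scanwalk  # third-party; optional for this sketch
except ImportError:
    scanwalk = None

def files_with_os_walk(top):
    # Two nested loops: one over directories, one over the names in each.
    return [os.path.join(parent, name)
            for parent, _dirnames, filenames in os.walk(top)
            for name in filenames]

def files_with_scanwalk(top):
    # One flat loop over entries, filtered the same way as the first example.
    return [entry.path for entry in scanwalk.walk(top) if not entry.is_dir()]

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "dir1"))
    open(os.path.join(tmp, "dir1", "file_c"), "w").close()
    found = [os.path.relpath(p, tmp) for p in files_with_os_walk(tmp)]
    if scanwalk is not None:
        print(sorted(files_with_scanwalk(tmp)))

print(found)
```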

Why not use scanwalk?

scanwalk is still alpha, mostly untested, and almost entirely undocumented. It only supports Python 3.7+, on platforms with a working os.scandir().

scanwalk.walk() behaviour differs from os.walk():

  • Directories and files are intermingled, rather than separated
  • Traversal is always semi depth-first
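
The two differences above can be sketched with a small generator built on os.scandir(). This is an illustration under stated assumptions, not scanwalk's actual implementation: walk_entries is a hypothetical name, it does not yield the top directory itself, and it sorts each directory (building a list) purely so the demo output is reproducible.

```python
import os
import tempfile

def walk_entries(top):
    """Yield os.DirEntry objects below top, descending into each
    directory as soon as it has been yielded."""
    with os.scandir(top) as it:
        # Sorting is for a reproducible demo only; os.scandir() itself
        # returns entries in arbitrary order.
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        yield entry
        if entry.is_dir(follow_symlinks=False):
            yield from walk_entries(entry.path)

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a"))
    os.makedirs(os.path.join(tmp, "c"))
    for rel in ("a/x", "b", "c/y"):
        open(os.path.join(tmp, *rel.split("/")), "w").close()
    order = [os.path.relpath(e.path, tmp).replace(os.sep, "/")
             for e in walk_entries(tmp)]

print(order)  # ['a', 'a/x', 'b', 'c', 'c/y']
```

The file b arrives between the directories a and c, and each directory's contents follow it immediately, whereas os.walk() would report a and c together as dirnames and b as a filename before visiting either directory.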

Related work

  • scandir - backport of os.scandir() for Python 2.7 and 3.4

TODO

  • Implement context manager protocol, similar to os.scandir()
  • Documentation
  • Tests
  • Continuous Integration
  • Coverage
  • Code quality checks (MyPy, flake8, etc.)
  • scanwalk.copytree()?
  • scanwalk.DirEntry.depth?
  • Linux io_uring support?
