
Extensible on-demand file loading with cache management.



title: README
author: Jan-Michael Rye


Synopsis

Extensible on-demand data loader with caching. The library currently provides loaders for the following data sources:

  • text (file or URL)
  • JSON (file or URL)
  • YAML (file or URL)
  • Pandas DataFrame (file or database connection)
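
For reference, the import paths of the file-based loaders used in the examples below are sketched here (the URL and database-connection variants are not covered in this README's examples):

from dalood.loader.json import JSONFileLoader
from dalood.loader.pandas import DataFrameCSVLoader
from dalood.loader.text import TextFileLoader
from dalood.loader.yaml import YAMLFileLoader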

The user can easily add custom loaders for other data sources by subclassing the LoaderBase class or one of its subclasses (e.g. FileLoaderBase, URLLoaderBase); see the User-Defined Loaders section below. Submissions of new loaders for inclusion in the library are welcome.

Data is managed by the Manager class. The user registers patterns that map data sources to corresponding loaders and then requests the data via the manager. These patterns are compiled to Python regular expressions, which are matched against data requests sent to the manager to determine which loader handles each request. The argument is then passed through to that loader, which loads the data upon the first request and keeps it in memory for subsequent requests until the cache is cleared.

The manager provides several methods for managing the cached data: clearing everything, clearing by argument pattern, forcing a reload of all data, reloading only data whose sources report a modification, and so on.

See the Usage section below for details.

Links

GitLab

Other Repositories

Usage

Basic

In the following example, we instantiate a manager and configure it to load JSON and YAML filepaths via the JSONFileLoader and YAMLFileLoader, respectively.

# Import the manager and loaders.
from dalood.manager import Manager
from dalood.loader.json import JSONFileLoader
from dalood.loader.yaml import YAMLFileLoader


# Instantiate the manager.
man = Manager()

# Register the JSON file loader for all arguments ending with the JSON extension
# (`.json`). The first argument is a Python regular expression, followed by a
# loader instance.
man.register_loader("^.*\.json$", JSONFileLoader())

# The previous example was given to demonstrate explicit pattern specification.
# The provided loaders such as JSONFileLoader provide these patterns via their
# `pattern` property. The loader can be registered even more simply using its
# `register_patterns` method.

# JSONFileLoader().register_patterns(man)

# Note that only one of the two previous commands should be used in actual code.

# The regular expression syntax may be unfamiliar to some users. Dalood
# therefore supports other pattern types: glob patterns and literal strings.

# Register the YAML file loader for all arguments ending with the YAML extension
# (`.yaml`). Here we use a simpler glob pattern instead of a regular expression
# pattern.
man.register_loader("*.yaml", YAMLFileLoader(), pattern_type="glob")

# For comparison, we could have registered the JSON loader with the following
# statement.

# man.register_loader("*.json", JSONFileLoader(), pattern_type="glob")

# And just for further illustration, the YAMLFileLoader registration could also
# be done with the following command.

# YAMLFileLoader().register_patterns(man)

# Now that the loaders are registered, we can load JSON and YAML files by simply
# passing their paths to the manager via the `get()` method:

json_data = man.get("/tmp/examples/foo.json")
yaml_data = man.get("/tmp/examples/bar.yaml")

# The data remains in memory within the manager so subsequent requests for the
# same argument via `get()` will not reload the file from the disk. You can
# check which arguments are in memory by iterating over the manager.
for arg in man:
    print("Cached argument:", arg)

# Output:
#   /tmp/examples/foo.json
#   /tmp/examples/bar.yaml

# To force a refresh when requesting the data, pass the `reload` argument
# to `get()`:
json_data = man.get("/tmp/examples/foo.json", reload=True)

# You can also request a reload only if the source file reports a modification
# since the data was loaded by the manager:
json_data = man.get("/tmp/examples/foo.json", refresh=True)

# For applications that load large amounts of data, it may be desirable to
# periodically clear the cache according to different conditions. The
# `clear_cache()` method is provided for this purpose. Without any arguments,
# all cached data is cleared.
man.clear_cache()

for arg in man:
    print("Cached argument:", arg)

# Output: empty

# Re-requesting data after clearing the cache will simply reload the data from
# the source and cache it again.

# Clearing everything is not always desirable so `clear_cache()` provides
# options to clear the cache by a pattern (e.g. all loaded YAML files), by age
# (e.g. everything loaded more than an hour ago), or by last access time (e.g.
# everything that was last accessed more than 20 minutes ago). The following
# would clear all JSON files accessed more than 2 minutes ago:
man.clear_cache("*.json", pattern_type="glob", age={"minutes":2}, by_access_time=True)

Literal Patterns And Customized Loaders

In addition to the regular expression and glob patterns, there are also "literal" patterns that only match the exact string of the pattern. These can be used to load specific arguments with specific loaders. The following example shows how to associate different CSV file loaders with files that use different separator characters (comma or space).

from dalood.manager import Manager
from dalood.loader.pandas import DataFrameCSVLoader

man = Manager()
comma_csv_loader = DataFrameCSVLoader(sep=",")
space_csv_loader = DataFrameCSVLoader(sep=" ")
man.register_loader("/tmp/example/file1.csv", comma_csv_loader, pattern_type="literal")
man.register_loader("/tmp/example/file2.csv", comma_csv_loader, pattern_type="literal")
man.register_loader("/tmp/example/file1.csv", space_csv_loader, pattern_type="literal")

# This would be tedious for many different files that cannot be summarized via a
# pattern. In that case, a custom function could make this easier:
def register_comma_csv_loader_for_path(path):
    man.register_loader(path, comma_csv_loader, pattern_type="literal")
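
# For example, the helper could then be applied to a whole list of paths
# (hypothetical paths, for illustration only):
for path in ["/tmp/example/data1.csv", "/tmp/example/data2.csv"]:
    register_comma_csv_loader_for_path(path)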

Pattern Classes

All of the methods that accept a pattern string and optional pattern_type parameter also accept instances of RegexPattern, GlobPattern and LiteralPattern from dalood.regex.

from dalood.manager import Manager
from dalood.loader.text import TextFileLoader
from dalood.regex import GlobPattern

man = Manager()
pattern = GlobPattern("*.txt")
man.register_loader(pattern, TextFileLoader())
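
# The other pattern classes can be used the same way. A sketch, assuming their
# constructors mirror `GlobPattern` and take the pattern string as the only
# argument:
#
#     from dalood.regex import LiteralPattern, RegexPattern
#     man.register_loader(RegexPattern(r"^.*\.txt$"), TextFileLoader())
#     man.register_loader(LiteralPattern("/tmp/examples/notes.txt"), TextFileLoader())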

User-Loaded Data

Dalood also provides loaders that simply hold references to user-provided data in order to make it accessible via a common API. For example, the user may wish to build a custom object in memory and then access it via the manager using a simple name:

from dalood.manager import Manager
from dalood.loader.memory import MemoryLoader

man = Manager()

# Define two example objects to stand in for the user's custom in-memory data
# (any Python objects will do). We can map them to arbitrary names either by
# passing a dict as the "mapping" parameter when instantiating MemoryLoader
custom_obj1 = {"name": "example object 1"}
custom_obj2 = {"name": "example object 2"}
mem_loader = MemoryLoader(mapping={"obj1": custom_obj1})

# or afterward using the "map" method.
mem_loader.map("obj2", custom_obj2)

# Once mapped in the memory loader, we can register them via the manager:
mem_loader.register_patterns(man)


# Now we can access the objects via the manager's "get()" method:
new_var_for_custom_obj1 = man.get("obj1")

User-Defined Loaders

The user can define custom loaders and then register them with a manager using custom patterns:

import pathlib

from dalood.manager import Manager
from dalood.loader.file import FileLoaderBase

man = Manager()

# Create a custom loader for "foo" files. We'll make it load the first 10 bytes
# from the file.
class FooFileLoader(FileLoaderBase):
    def load(self, src):
        # Open the file in binary mode and return its first 10 bytes.
        path = pathlib.Path(src)
        with path.open("rb") as handle:
            return handle.read(10)

# Register this loader to handle all arguments ending in ".foo".
man.register_loader("*.foo", FooFileLoader(), pattern_type="glob")
