Skip to main content

No project description provided

Project description

nshutils

nshutils is a collection of utility functions and classes that I've found useful in my day-to-day work as an ML researcher. This library includes utilities for typechecking, logging, and saving/loading activations from neural networks.

Installation

To install nshutils, simply run:

pip install nshutils

Configuration

nshutils features a unified configuration system that allows you to control various aspects of the library through environment variables, programmatic settings, or JSON configuration.

Environment Variables

You can configure nshutils using environment variables:

# Enable debug mode
export NSHUTILS_DEBUG=1

# Enable typecheck (or disable with 0)
export NSHUTILS_TYPECHECK=1

# Enable ActSave with default temp directory
export NSHUTILS_ACTSAVE=1

# Enable ActSave with specific directory
export NSHUTILS_ACTSAVE="/path/to/activations"

# Set ActSave filters (comma-separated patterns)
export NSHUTILS_ACTSAVE_FILTERS="layer*,attention*,encoder.*"

# JSON configuration (overrides individual variables)
export NSHUTILS_CONFIG='{"debug": {"enabled": true}, "typecheck": {"enabled": false}, "actsave": {"enabled": true, "save_dir": "/path/to/activations", "filters": ["layer*"]}}'

# Comma-separated configuration
export NSHUTILS_CONFIG="debug=true,typecheck=false,actsave=true"

Programmatic Configuration

You can also configure nshutils programmatically:

from nshutils import config

# Enable debug mode
config.set(True, "debug")

# Configure typecheck with explicit settings
config.set({"enabled": True}, "typecheck")

# Configure ActSave
config.set({
    "enabled": True,
    "save_dir": "/path/to/activations",
    "filters": ["layer*", "attention*"]
}, "actsave")

# Check current settings
print(f"Debug enabled: {config.debug_enabled()}")
print(f"Typecheck enabled: {config.typecheck_enabled()}")
print(f"ActSave enabled: {config.actsave_enabled()}")

# Temporary overrides using context managers
with config.debug_override(False):
    # Debug is temporarily disabled
    pass

with config.actsave_override({"enabled": True, "filters": ["encoder.*"]}):
    # ActSave temporarily uses different filters
    pass

Configuration Hierarchy

The configuration system follows a hierarchical approach:

  1. Debug → Typecheck: When debug is enabled, typecheck is automatically enabled unless explicitly overridden
  2. Environment variables take precedence over programmatic settings
  3. Context managers provide temporary overrides within their scope

Features

Typechecking

nshutils provides a simple way to typecheck your code using the jaxtyping library. The typecheck system is now integrated with the unified configuration system.

Enable typechecking globally:

export NSHUTILS_TYPECHECK=1

Or programmatically:

from nshutils import config
config.set(True, "typecheck")

# Now use typecheck decorators
from nshutils.typecheck import typecheck

@typecheck
def my_function(x: Float[torch.Tensor, "batch seq"]) -> Float[torch.Tensor, "batch seq"]:
    return x * 2

You can also use the tassert function to assert that a value is of a certain type:

import nshutils.typecheck as tc

def my_function(x: tc.Float[torch.Tensor, "bsz seq len"]) -> tc.Float[torch.Tensor, "bsz seq len"]:
    tc.tassert(tc.Float[torch.Tensor, "bsz seq len"], x)
    ...

Logging

nshutils provides a simple way to configure logging for your project. Simply call one of the logging setup functions:

from nshutils.logging import init_python_logging

init_python_logging()

This will configure logging to use pretty formatting for PyTorch tensors and numpy arrays (inspired by and/or utilizing lovely-numpy and lovely-tensors), and will also enable rich logging if the rich library is installed.

Activation Saving/Loading

nshutils provides a simple way to save and load activations from neural networks. ActSave is now integrated with the unified configuration system.

Basic Usage

Enable ActSave via environment variables:

# Enable with default temp directory
export NSHUTILS_ACTSAVE=1

# Enable with specific directory
export NSHUTILS_ACTSAVE="/path/to/activations"

# Set filters (comma-separated patterns)
export NSHUTILS_ACTSAVE_FILTERS="layer*,attention*"

Or programmatically:

from nshutils import config
from nshutils import ActSave

# Enable ActSave with configuration
config.set({
    "enabled": True,
    "save_dir": "/path/to/activations",
    "filters": ["layer*", "attention*"]
}, "actsave")

def my_model_forward(x):
    ...
    # Save activations - automatically filtered
    ActSave({"encoder.activations": x})

    # Equivalent to the above
    with ActSave.context("encoder"):
        ActSave(activations=x)
    ...

x = torch.randn(...)
my_model_forward(x)
# Activations are saved to disk under the configured directory

You can also use the traditional explicit enable/disable pattern:

from nshutils import ActSave

# Explicit enable with filters
ActSave.enable(save_dir="path/to/activations", filters=["layer*", "attention*"])

def my_model_forward(x):
    ActSave({"encoder.activations": x})

x = torch.randn(...)
my_model_forward(x)
ActSave.disable()

Activation Filtering

ActSave supports filtering to selectively save only certain activations based on fnmatch patterns. This is useful for reducing storage space and focusing on specific model components.

Configure filters via environment variables:

# Only save activations matching "layer*" or "attention*" patterns
export NSHUTILS_ACTSAVE_FILTERS="layer*,attention*"
export NSHUTILS_ACTSAVE=1

Or programmatically:

from nshutils import config

# Configure via unified config system
config.set({
    "enabled": True,
    "save_dir": "/path/to/activations",
    "filters": ["layer*", "attention*"]
}, "actsave")

# These will be saved (match filters)
ActSave(
    layer1_output=x1,
    layer2_hidden=x2,
    attention_weights=x3
)

# These will NOT be saved (don't match filters)
ActSave(
    decoder_output=x4,
    embedding_vector=x5
)

Traditional explicit filtering is also supported:

from nshutils import ActSave

# Only save activations matching "layer*" or "attention*" patterns
filters = ["layer*", "attention*"]

with ActSave.enabled(save_dir="path/to/activations", filters=filters):
    # These will be saved (match filters)
    ActSave(
        layer1_output=x1,
        layer2_hidden=x2,
        attention_weights=x3
    )

    # These will NOT be saved (don't match filters)
    ActSave(
        decoder_output=x4,
        embedding_vector=x5
    )

The filtering patterns support standard Unix shell-style wildcards:

  • * matches everything
  • ? matches any single character
  • [seq] matches any character in seq
  • [!seq] matches any character not in seq
Contextual Filtering

Filters work with context prefixes, allowing you to save activations from specific model components:

from nshutils import config, ActSave

# Only save activations from encoder layers
config.set({
    "enabled": True,
    "save_dir": "/path/to/activations",
    "filters": ["encoder.*"]
}, "actsave")

# Decoder context - these won't be saved
with ActSave.context("decoder"):
    ActSave(layer1_output=x1, attention=x2)

# Encoder context - these will be saved
with ActSave.context("encoder"):
    ActSave(layer1_output=x3, attention=x4)  # Saved as "encoder.layer1_output", "encoder.attention"
Dynamic Filter Updates

You can update filters during runtime:

from nshutils import config

# Set initial filters
config.set({"enabled": True, "filters": ["layer*"]}, "actsave")

# Initially only layer outputs saved
ActSave(layer1_output=x1, attention_weights=x2)

# Update filters
config.set({"enabled": True, "filters": ["attention*"]}, "actsave")
ActSave(layer2_output=x3, attention_weights=x4)  # Only attention_weights saved

# Check current filters
current_filters = config.actsave_filters()  # Returns ["attention*"]

# Clear filters (save all)
config.set({"enabled": True, "filters": None}, "actsave")

Traditional explicit filter management is also available:

ActSave.enable(save_dir="path/to/activations")

# Initially no filters - all activations saved
ActSave(layer1_output=x1, attention_weights=x2)

# Update to only save layer outputs
ActSave.set_filters(["layer*"])
ActSave(layer2_output=x3, decoder_output=x4)  # Only layer2_output saved

# Check current filters
current_filters = ActSave.filters  # Returns ["layer*"]

# Clear filters
ActSave.set_filters(None)
Environment Variable Configuration

You can configure ActSave and filtering through environment variables (now part of the unified configuration system):

# Enable ActSave with default temp directory
export NSHUTILS_ACTSAVE=1

# Enable ActSave with specific directory
export NSHUTILS_ACTSAVE="/path/to/activations"

# Set filters (comma-separated patterns)
export NSHUTILS_ACTSAVE_FILTERS="layer*,attention*,encoder.*"

# Combine both
export NSHUTILS_ACTSAVE="/path/to/activations"
export NSHUTILS_ACTSAVE_FILTERS="layer*,attention*"

# Or use the unified config format
export NSHUTILS_CONFIG='{"actsave": {"enabled": true, "save_dir": "/path/to/activations", "filters": ["layer*", "attention*"]}}'

The NSHUTILS_ACTSAVE_FILTERS environment variable supports:

  • Comma-separated patterns: "layer*,attention*,decoder.*"
  • Whitespace handling: Extra spaces around commas are automatically trimmed
  • Empty values: Empty string or only commas/spaces result in no filtering

To load activations, use the ActLoad class:

from nshutils import ActLoad

act_load = ActLoad.from_latest_version("path/to/activations")
encoder_acts = act_load["encoder"]

for act in encoder_acts:
    print(act.shape)

This will load all of the activations saved under the encoder prefix.

Other Utilities

nshutils also provides a few other utility functions/classes:

  • snoop: A simple way to debug your code using the pysnooper library, based on the torchsnooper library.
  • apply_to_collection: Recursively apply a function to all elements of a collection that match a certain type, taken from the pytorch-lightning library.

Contributing

Contributions to nshutils are welcome! Please open an issue or submit a pull request on the GitHub repository.

License

nshutils is released under the MIT License. See the LICENSE file for more details.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nshutils-0.38.4.tar.gz (36.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

nshutils-0.38.4-py3-none-any.whl (44.1 kB view details)

Uploaded Python 3

File details

Details for the file nshutils-0.38.4.tar.gz.

File metadata

  • Download URL: nshutils-0.38.4.tar.gz
  • Upload date:
  • Size: 36.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.12.3 Linux/6.6.87.2-microsoft-standard-WSL2

File hashes

Hashes for nshutils-0.38.4.tar.gz
Algorithm Hash digest
SHA256 398c2fbb82f14b4a9e282821aa6d8c194e916e29691913e4213f13d5cec5cae2
MD5 582d0f74d5c8b39e62f8b23665310fe4
BLAKE2b-256 662a007c7e13cb37a083461dd454de70a6a9f5e399c4691c4921db35ff2d868f

See more details on using hashes here.

File details

Details for the file nshutils-0.38.4-py3-none-any.whl.

File metadata

  • Download URL: nshutils-0.38.4-py3-none-any.whl
  • Upload date:
  • Size: 44.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.12.3 Linux/6.6.87.2-microsoft-standard-WSL2

File hashes

Hashes for nshutils-0.38.4-py3-none-any.whl
Algorithm Hash digest
SHA256 61586de9990a08ebfa994b07788c7df98293362ce5c8294d73b5a9351f9dc196
MD5 37860facc2119392b2767b9cd78784b8
BLAKE2b-256 c899c95316e08cae1eb67f118bb9e5bed51cdb10213843967ff9358d06e5a077

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page