Functions for prototyping, quality of life (QOL), and sanity checking

Grimmerie

A spellbook for Python.

Grimmerie is a collection of high-level utilities (“spells”) designed for rapid prototyping, sanity checking, and reducing friction in experimentation.

Each spell performs a non-trivial amount of work under the hood.
They are intentionally designed to trade fine-grained control for speed, clarity, and momentum.

Use them when you want to move fast.
Understand them before you rely on them.


Installation

pip install grimmerie

The Idea

Instead of wiring together pipelines every time, Grimmerie gives you:

  • One function call
  • Sensible defaults
  • Heavy lifting handled internally

For example:

embeddings = specterize(papers)

Behind this single call:

  • Model loading
  • Tokenization
  • Batching
  • Device handling
  • Adapter loading
  • Output formatting

All handled for you.
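
To make one of these steps concrete, here is a minimal sketch of batching in plain Python. It is not Grimmerie's implementation: `batched`, `encode_all`, and `fake_encode` are hypothetical names, and `fake_encode` only stands in for a real model forward pass.

```python
from typing import Callable, Iterable, Iterator


def batched(items: Iterable[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: list[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch


def encode_all(
    texts: Iterable[str],
    encode_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int = 16,
) -> list[list[float]]:
    """Run `encode_batch` over every batch and concatenate the results."""
    vectors: list[list[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(encode_batch(batch))
    return vectors


def fake_encode(batch: list[str]) -> list[list[float]]:
    """Placeholder encoder: one 2-d vector per text (length, word count)."""
    return [[float(len(t)), float(len(t.split()))] for t in batch]


texts = ["a b c", "hello world", "x"]
vectors = encode_all(texts, fake_encode, batch_size=2)
```

Batching keeps memory use bounded no matter how many inputs are passed, which is one reason a single call can accept both a handful of strings and a large corpus.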


Spells

specterize

Generate SPECTER2 embeddings from text or paper-like inputs.

from grimmerie import specterize

papers = [
    {'abstract': 'We introduce a new language representation model called BERT'},
    {'abstract': 'The dominant sequence transduction models are based on neural networks'},
]

embeddings = specterize(papers, return_type='numpy')

tfidfize

Generate TF-IDF representations from text with optional preprocessing.

from grimmerie import tfidfize

docs = [
    {'abstract': 'We introduce a new language representation model called BERT'},
    {'abstract': 'The dominant sequence transduction models are based on neural networks'},
]

X = tfidfize(docs, return_type='array')

Full signature:

tfidfize(
    input_data,
    lemmatize: bool = False,
    spacy_model: str = 'en_core_web_sm',
    batch_size: int = 2000,
    n_process: int = 1,
    progress_interval: int | None = None,
    min_df: int | float = 1,
    max_df: int | float = 1.0,
    stop_words: str | list[str] | None = 'english',
    ngram_range: tuple[int, int] = (1, 1),
    lowercase: bool = True,
    max_features: int | None = None,
    norm: Literal['l1', 'l2'] | None = 'l2',
    use_idf: bool = True,
    smooth_idf: bool = True,
    sublinear_tf: bool = False,
    return_type: Literal['sparse', 'array', 'list', 'frame'] = 'sparse',
    return_vectorizer: bool = False,
    vectorizer: TfidfVectorizer | None = None,
)

Parameters:

  • lemmatize: Apply lemmatization (default False)
  • spacy_model: spaCy model used for lemmatization (default 'en_core_web_sm')
  • batch_size: Processing batch size (default 2000)
  • n_process: Number of processes (default 1)
  • progress_interval: Progress reporting interval (default None)
  • min_df: Minimum document frequency (default 1)
  • max_df: Maximum document frequency (default 1.0)
  • stop_words: Stop words to filter (default 'english')
  • ngram_range: N-gram range (default (1, 1))
  • lowercase: Convert to lowercase (default True)
  • max_features: Maximum vocabulary size (default None)
  • norm: Normalization method (default 'l2')
  • use_idf: Enable IDF weighting (default True)
  • smooth_idf: Smooth IDF values (default True)
  • sublinear_tf: Apply sublinear TF scaling (default False)
  • return_type: Output format (default 'sparse')
  • return_vectorizer: Return fitted vectorizer (default False)
  • vectorizer: Pre-fitted TfidfVectorizer instance (default None)
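
Several of these parameters share names with options of scikit-learn's `TfidfVectorizer`. Assuming Grimmerie forwards them unchanged (the text above does not confirm this), the weighting they control can be sketched in plain Python using scikit-learn's documented formulas. `tfidf_weights` is a hypothetical helper for illustration only, and it expects pre-tokenized documents.

```python
import math
from collections import Counter
from typing import Optional


def tfidf_weights(
    docs: list[list[str]],
    smooth_idf: bool = True,
    sublinear_tf: bool = False,
    norm: Optional[str] = "l2",
) -> list[dict[str, float]]:
    """Compute per-document TF-IDF weights for pre-tokenized documents."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))

    def idf(term: str) -> float:
        if smooth_idf:
            # Acts as if one extra document contained every term.
            return math.log((1 + n_docs) / (1 + df[term])) + 1.0
        return math.log(n_docs / df[term]) + 1.0

    rows = []
    for doc in docs:
        row = {}
        for term, count in Counter(doc).items():
            tf = 1.0 + math.log(count) if sublinear_tf else float(count)
            row[term] = tf * idf(term)
        if norm == "l2":
            scale = math.sqrt(sum(w * w for w in row.values()))
        elif norm == "l1":
            scale = sum(abs(w) for w in row.values())
        else:
            scale = 1.0
        # Leave empty documents untouched to avoid dividing by zero.
        rows.append({t: w / scale for t, w in row.items()} if scale else row)
    return rows


docs = [["bert", "language", "model"], ["neural", "sequence", "model"]]
weights = tfidf_weights(docs)
```

With these defaults, a term found in every document (here `model`) gets an IDF of exactly 1.0 before normalization, so rarer terms such as `bert` end up with larger weights.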

API

specterize(input_data, return_type='list', max_length=512)
  • input_data: str, dict, list, or iterable
  • return_type: 'list', 'numpy', or 'tensor' (default 'list')
  • max_length: tokenizer truncation length (default 512)
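
Accepting a str, a dict, a list, or any iterable implies a normalization step before tokenization. The sketch below shows one way that step could look. It is an assumption, not Grimmerie's documented behavior: `to_texts` is a hypothetical helper, and the 'title' field and the separator string are illustrative choices.

```python
from collections.abc import Iterable, Mapping


def to_texts(input_data, sep: str = " [SEP] ") -> list[str]:
    """Normalize str / dict / iterable input to a flat list of strings.

    Hypothetical helper: the field names 'title' and 'abstract' and the
    separator are assumptions made for this example.
    """
    if isinstance(input_data, (str, Mapping)):
        input_data = [input_data]  # treat a single item as a batch of one
    if not isinstance(input_data, Iterable):
        raise TypeError(f"unsupported input type: {type(input_data).__name__}")

    texts = []
    for item in input_data:
        if isinstance(item, str):
            texts.append(item)
        elif isinstance(item, Mapping):
            # Join whichever of the two fields are present and non-empty.
            parts = [item.get("title"), item.get("abstract")]
            texts.append(sep.join(p for p in parts if p))
        else:
            raise TypeError(f"unsupported item type: {type(item).__name__}")
    return texts


single = to_texts("A plain string")
mixed = to_texts([{"title": "T", "abstract": "A"}, {"abstract": "Only"}, "raw"])
```

Checking for `str` before the generic iterable case matters, because a string is itself iterable and would otherwise be split into characters.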

Design Principles

1. Abstraction over configuration

You should not need to think about setup for common workflows.

2. Strong defaults

Spells are opinionated. They are built to “just work” for most cases.

3. Hidden complexity

A spell may do significantly more than it appears.

4. Use with awareness

Because complexity is hidden, you should understand what a spell does before using it in critical systems.


When to Use Grimmerie

  • Rapid experimentation
  • Prototyping ML/NLP pipelines
  • Sanity checking ideas
  • Building quick demos

When Not to Use It

  • When you need full control over every step
  • When reproducibility requires explicit pipelines
  • When debugging low-level behavior

Notes

  • First call may be slower due to model downloads
  • Models are cached locally after first use
  • Subsequent calls reuse loaded resources within the same process

Direction

Grimmerie will expand into a broader system of spells for:

  • Vectorization
  • Dimensionality reduction
  • Visualization
  • Data inspection
  • ML prototyping utilities

Each is designed to compress a multi-step workflow into a single, intentional call.
