Functions for Prototyping, QOL and Sanity checking

Grimmerie

A spellbook for Python.

Grimmerie is a collection of high-level utilities (“spells”) designed for rapid prototyping, sanity checking, and reducing friction in experimentation.

Each spell performs a non-trivial amount of work under the hood.
They are intentionally designed to trade fine-grained control for speed, clarity, and momentum.

Use them when you want to move fast.
Understand them before you rely on them.


Installation

pip install grimmerie

The Idea

Instead of wiring together pipelines every time, Grimmerie gives you:

  • One function call
  • Sensible defaults
  • Heavy lifting handled internally

Example philosophy:

embeddings = specterize(papers)

Behind this single call:

  • Model loading
  • Tokenization
  • Batching
  • Device handling
  • Adapter loading
  • Output formatting

All handled for you.
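
As a rough illustration of the batching and output-formatting steps listed above, here is a minimal sketch using only the standard library. It is not Grimmerie's implementation: `batched`, `embed_in_batches`, and `toy_encoder` are hypothetical names, and the encoder is a stand-in for a real model.

```python
from typing import Callable, Iterable, Iterator


def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    texts: Iterable[str],
    encode_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int = 16,
) -> list[list[float]]:
    """Run encode_batch over texts in chunks and collect one vector per text."""
    texts = list(texts)
    vectors: list[list[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(encode_batch(batch))
    return vectors


def toy_encoder(batch: list[str]) -> list[list[float]]:
    # Stand-in for a real model: two made-up features per text
    # (character count and number of spaces).
    return [[float(len(text)), float(text.count(" "))] for text in batch]


vectors = embed_in_batches(["a b", "hello", "x y z"], toy_encoder, batch_size=2)
print(vectors)  # [[3.0, 1.0], [5.0, 0.0], [5.0, 2.0]]
```

A real spell would replace `toy_encoder` with tokenization plus a model forward pass, and would convert the collected vectors to the requested return type.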


Spells

specterize

Generate SPECTER2 embeddings from text or paper-like inputs.

from grimmerie import specterize

papers = [
    {'abstract': 'We introduce a new language representation model called BERT'},
    {'abstract': 'The dominant sequence transduction models are based on neural networks'},
]

embeddings = specterize(papers, return_type='numpy')

tfidfize

Generate TF-IDF representations from text with optional preprocessing.

from grimmerie import tfidfize

docs = [
    {'abstract': 'We introduce a new language representation model called BERT'},
    {'abstract': 'The dominant sequence transduction models are based on neural networks'},
]

X = tfidfize(docs, return_type='array')

Signature:

tfidfize(
    input_data,
    lemmatize: bool = False,
    spacy_model: str = 'en_core_web_sm',
    batch_size: int = 2000,
    n_process: int = 1,
    progress_interval: int | None = None,
    min_df: int | float = 1,
    max_df: int | float = 1.0,
    stop_words: str | list[str] | None = 'english',
    ngram_range: tuple[int, int] = (1, 1),
    lowercase: bool = True,
    max_features: int | None = None,
    norm: Literal['l1', 'l2'] | None = 'l2',
    use_idf: bool = True,
    smooth_idf: bool = True,
    sublinear_tf: bool = False,
    return_type: Literal['sparse', 'array', 'list', 'frame'] = 'sparse',
    return_vectorizer: bool = False,
    vectorizer: TfidfVectorizer | None = None,
)

Parameters:

  • lemmatize: Apply lemmatization (default False)
  • spacy_model: spaCy model used for lemmatization (default 'en_core_web_sm')
  • batch_size: Processing batch size (default 2000)
  • n_process: Number of processes (default 1)
  • progress_interval: Progress reporting interval (default None)
  • min_df: Minimum document frequency (default 1)
  • max_df: Maximum document frequency (default 1.0)
  • stop_words: Stop words to filter (default 'english')
  • ngram_range: N-gram range (default (1, 1))
  • lowercase: Convert to lowercase (default True)
  • max_features: Maximum vocabulary size (default None)
  • norm: Normalization method (default 'l2')
  • use_idf: Enable IDF weighting (default True)
  • smooth_idf: Smooth IDF values (default True)
  • sublinear_tf: Apply sublinear TF scaling (default False)
  • return_type: Output format (default 'sparse')
  • return_vectorizer: Return fitted vectorizer (default False)
  • vectorizer: Pre-fitted TfidfVectorizer instance (default None)
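
To make the weighting parameters concrete, the sketch below computes TF-IDF by hand in pure Python. It assumes that `smooth_idf`, `sublinear_tf`, and `norm` carry the same meaning as in scikit-learn's `TfidfVectorizer`, which the signature references; `tiny_tfidf` is a hypothetical helper for illustration, not part of Grimmerie, and it skips stop words, n-grams, and frequency cutoffs.

```python
import math
from collections import Counter


def tiny_tfidf(docs, smooth_idf=True, sublinear_tf=False, norm="l2"):
    """Return (vocabulary, rows) for lowercased, whitespace-tokenized docs."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))

    def idf(term):
        if smooth_idf:
            # Acts as if one extra document contained every term.
            return math.log((1 + n_docs) / (1 + df[term])) + 1
        return math.log(n_docs / df[term]) + 1

    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = []
        for term in vocab:
            tf = counts[term]
            if sublinear_tf and tf > 0:
                tf = 1 + math.log(tf)  # dampen repeated terms
            row.append(tf * idf(term))
        if norm == "l2":
            scale = math.sqrt(sum(v * v for v in row))
        elif norm == "l1":
            scale = sum(abs(v) for v in row)
        else:
            scale = 1.0  # norm=None leaves the row unscaled
        rows.append([v / scale for v in row] if scale else row)
    return vocab, rows


vocab, rows = tiny_tfidf(["bert model", "neural model"])
print(vocab)  # ['bert', 'model', 'neural']
print([round(v, 3) for v in rows[0]])
```

In the first row, 'bert' outweighs 'model' because 'model' appears in both documents and therefore gets a lower IDF.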

API

specterize(input_data, return_type='list', max_length=512)
  • input_data: str, dict, list, or iterable
  • return_type: "list", "numpy", or "tensor" (default "list")
  • max_length: tokenizer truncation length (default 512)
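
One way the accepted input types could be reduced to a plain list of strings is sketched below. This is an assumption about the general approach, not Grimmerie's actual code: `paper_to_text`, `normalize_inputs`, the `'title'` key, and the `" [SEP] "` separator are all hypothetical (the examples in this document only use `'abstract'`).

```python
from collections.abc import Iterable, Mapping

SEP = " [SEP] "  # assumed separator; a real tokenizer supplies its own


def paper_to_text(paper) -> str:
    """Turn one str or dict input into a single string."""
    if isinstance(paper, str):
        return paper
    if isinstance(paper, Mapping):
        # 'title' is an assumed optional key; 'abstract' appears in the examples.
        parts = [paper.get("title", ""), paper.get("abstract", "")]
        return SEP.join(p for p in parts if p)
    raise TypeError(f"unsupported input type: {type(paper).__name__}")


def normalize_inputs(input_data) -> list[str]:
    """Accept a str, a dict, or an iterable of either; return a list of strings."""
    # Check str and dict first: both are iterable but count as single items.
    if isinstance(input_data, (str, Mapping)):
        return [paper_to_text(input_data)]
    if isinstance(input_data, Iterable):
        return [paper_to_text(item) for item in input_data]
    raise TypeError(f"unsupported input type: {type(input_data).__name__}")


print(normalize_inputs({"abstract": "We introduce BERT"}))
print(normalize_inputs([{"title": "T", "abstract": "A"}, "plain text"]))
```

Checking for `str` and `Mapping` before the generic `Iterable` case matters, since iterating a string would otherwise split it into characters.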

Design Principles

1. Abstraction over configuration

You should not need to think about setup for common workflows.

2. Strong defaults

Spells are opinionated. They are built to “just work” for most cases.

3. Hidden complexity

A spell may do significantly more than it appears.

4. Use with awareness

Because complexity is hidden, you should understand what a spell does before using it in critical systems.


When to Use Grimmerie

  • Rapid experimentation
  • Prototyping ML/NLP pipelines
  • Sanity checking ideas
  • Building quick demos

When Not to Use It

  • When you need full control over every step
  • When reproducibility requires explicit pipelines
  • When debugging low-level behavior

Notes

  • First call may be slower due to model downloads
  • Models are cached locally after first use
  • Subsequent calls reuse loaded resources within the same process

Direction

Grimmerie will expand into a broader system of spells for:

  • Vectorization
  • Dimensionality reduction
  • Visualization
  • Data inspection
  • ML prototyping utilities

Each is designed to compress a multi-step workflow into a single, intentional call.
