The Further Framework
Further: A High-Level Conceptual Overview
NOTE: Further is currently in initial private development. Public release is planned for the second quarter of 2026.
What Is Further?
Further is an open-source Python framework for structured scientific and data analysis. It is designed for researchers working alone, in a lab, or across collaborative multi-institution consortia who need to move naturally between exploratory analysis and rigorous, reproducible pipelines — without rewriting code when the line between the two shifts.
The central promise of Further is: focus on the analysis you are writing today, and trust the framework to manage how it connects to everything else.
One core design challenge Further addresses that most pipeline frameworks sidestep is the non-determinism of external data. Real research pipelines depend on files, databases, APIs, and other resources that change independently of the analysis code. Further treats these as first-class citizens through its resource cell abstraction: a principled mechanism for integrating external data into an otherwise deterministic dependency graph, with automatic cache invalidation, configurable freshness policies, and full traceability of which version of an external resource produced which result.
The Foundational Abstraction: The Cell
The atomic unit in Further is the cell — a named, versioned, self-contained computation. A cell declares:
- What it needs: input parameters (typed, validated Pydantic models), divided into Specs (parameters that affect the result identity) and Opts (operational settings such as logging verbosity or output format that do not change the result).
- What it depends on: other cells, listed explicitly in a `req_cells` manifest. This is the author's complete statement of dependencies.
- What it produces: named contributions — typed outputs such as DataFrames, model weights, summary statistics, or any Python object.
- How it runs: a `Maker` class whose `make()` method contains the actual computation logic.
Cells are grouped into projects, which provide a namespace, shared project-level parameters, and a manifest that the framework uses to discover and version all cells at load time.
This design enforces a discipline of modular, locally-comprehensible analysis units. Authors think about one cell's purpose and immediate dependencies; the framework typically handles the rest.
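The cell anatomy above can be sketched in plain Python. Further's actual API is not yet public, so every name below (`Specs`, `Opts`, `AnalyzerMaker`, the manifest keys) is an illustrative assumption, not the framework's real interface:

```python
from dataclasses import dataclass

# Hypothetical sketch of the cell anatomy described above. All names are
# invented for illustration; the real Further API is not published.

@dataclass(frozen=True)
class Specs:
    """Parameters that affect result identity (part of the cache key)."""
    threshold: float = 0.5

@dataclass(frozen=True)
class Opts:
    """Operational settings that do NOT change the result."""
    verbose: bool = False

class AnalyzerMaker:
    """Holds the actual computation logic for one cell."""
    def make(self, specs: Specs, opts: Opts, dep_results: dict) -> dict:
        data = dep_results["loader"]          # result of a required cell
        kept = [x for x in data if x >= specs.threshold]
        if opts.verbose:
            print(f"kept {len(kept)} of {len(data)} values")
        # "Contributions" are named, typed outputs.
        return {"filtered": kept, "n_kept": len(kept)}

# A cell bundles its declarations: what it needs, depends on, and runs.
ANALYZER_CELL = {
    "name": "analyzer",
    "logic_version": 1,
    "req_cells": ["loader"],   # explicit dependency manifest
    "specs": Specs,
    "opts": Opts,
    "maker": AnalyzerMaker,
}
```

The point of the sketch is the separation of concerns: identity-bearing Specs, identity-neutral Opts, an explicit dependency manifest, and a single `make()` entry point.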
Dependency Graphs and Automatic Pipeline Assembly
When a user invokes a cell through a Session, Further creates the dependency graph. By convention, we think of the graph with the root at the top, terminal nodes or leaves at the bottom, so that 'up' means towards the most dependent cells, and 'down' means into the subgraph of dependencies. The invoked cell is the root of the graph.
Further assembles the dependency graph in three graduated stages:
- Definition Graph (DG): A static, import-time representation of all declared cell dependencies across the loaded projects. No parameters have been resolved yet.
- Abstract Graph (AG): An expanded, path-sensitive structure built when a session call is initiated. Parameters are resolved symbolically; the graph is analyzed for parallelism opportunities, pre-memoization candidates, and potential deadlocks.
- Instance Tree (IT): The concrete execution tree with all parameter values resolved. This is what actually runs.
Researchers author at the DG level (declaring dependencies and parameters in cell definitions, writing the Maker logic that produces the cellular contributions) and operate at the Session level (calling a root cell with initial parameters). The intermediate layers are handled entirely by the framework.
Granular Memoization and Reproducibility
Every cell in Further has a logic key — a hash derived from its logic version, its concrete input parameters (Specs), the versions and logic keys of all its dependencies, and optionally the versions of tracked external libraries. When a cell is executed, its contributions are stored durably and indexed under this key.
On subsequent calls, the framework checks the database before running any computation. If an identical logic key exists and its stored contributions are intact, the cell is skipped entirely — its previous result is returned directly. This is granular memoization: each cell in the graph is independently cached, so a partial change (e.g., altering a parameter for a lower cell) triggers re-execution only of the affected subgraph, not the entire pipeline.
This design has several important consequences:
- Incremental computation is inexpensive. Running the same analysis with one modified parameter costs only the work that is actually new.
- Results are stable across sessions. A result computed months ago with the same logic key is returned instantly in a new session.
- Cache invalidation is explicit and versioned. Authors bump a cell's `logic_version` when computation logic changes; this immediately invalidates all cached results for that cell, ensuring stale results cannot be silently reused.
- Reproducibility is structural, not procedural. The researcher does not need to manually track what ran and when — the framework does it through the dependency graph and logic key system.
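A minimal sketch of how such a logic key could be derived, assuming a hash over the logic version, concrete Specs, dependency keys, and tracked library versions (Further's real derivation is not published):

```python
import hashlib
import json

def logic_key(logic_version, specs, dep_keys, lib_versions=None):
    """Hash together everything that defines a result's identity:
    logic version, concrete Specs, the logic keys of all dependencies,
    and optionally tracked library versions. Illustrative only."""
    payload = {
        "logic_version": logic_version,
        "specs": specs,
        "deps": sorted(dep_keys),      # order-independent dependency set
        "libs": lib_versions or {},
    }
    blob = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

# Changing nothing yields the identical key, so the cached result is reused;
# changing a Spec, a dependency key, or the logic version yields a new key.
k1 = logic_key(1, {"threshold": 0.5}, ["abc123"])
k2 = logic_key(1, {"threshold": 0.5}, ["abc123"])
k3 = logic_key(2, {"threshold": 0.5}, ["abc123"])  # logic bump -> new key
```

Because each cell's key folds in the keys of its dependencies, a change anywhere below propagates upward automatically: every cell above the change gets a new key, and everything outside that subgraph keeps its cached identity.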
External Resources and Non-Determinism
Standard cells in Further are purely functional: given the same input parameters and the same dependency results, they always produce the same output. Memoization is straightforward — the cache key is a hash of the inputs.
External resources break this guarantee. A file on disk, a REST API, a database table, or a cloud storage object can change at any time, independently of the analysis code. If a pipeline reads such a source, the cached result may become stale without any change to the code or declared parameters. Most pipeline frameworks either ignore this problem (silently returning stale results) or solve it crudely (always re-fetching).
Further addresses this through resource cells — a specialized cell type that extends the memoization model to include an explicitly tracked release identifier representing the version of the external data:
```
standard cell:  output = f(inputs)
resource cell:  output = f(inputs, external_state)
```

where `external_state` is tracked as a "release".
The Release Concept
A release is a lightweight string that uniquely identifies the current state of an external
resource. The author defines it by implementing a get_release() classmethod — a fast, stateless
check that returns a version identifier without reading the actual data:
| Resource type | Example release |
|---|---|
| File on disk | Modification timestamp ("2024-01-15T10:30:00Z") |
| REST API | ETag or Last-Modified header |
| External database | High-water mark (MAX(updated_at)) |
| S3 object | Object ETag or version ID |
The release is incorporated into the cell's cache key. If the release is unchanged, the cached
result is returned. If it has changed, make() is called to fetch fresh data — and the new result
is stored under the new release.
Pre-Memoization: Checking Without Fetching
The critical insight is that determining whether the cache is valid does not require reading the
data. The framework calls get_release() before deciding whether to execute make():
```
1. Call get_release()         ← file stat, HTTP HEAD, count query: milliseconds
2. Release matches cache?
   YES → return cached result ← no data transfer
   NO  → call make()          ← full fetch, only when necessary
```
This pre-memoization step is especially valuable for large resources — a gigabyte file, a full database table — where the cost of an unnecessary re-fetch would be substantial.
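The check-before-fetch flow can be sketched with a file's modification time standing in as the release; the function names and cache shape are assumptions for illustration, not Further's API:

```python
import os

# Sketch of pre-memoization: decide cache validity from a cheap release
# check (a stat call) without ever reading the data on a hit.

_cache = {}  # (cell_name, release) -> stored result

def get_release(path):
    """Fast, stateless check: stat the file, do not read it."""
    return str(os.stat(path).st_mtime_ns)

def fetch(path):
    """Only perform the expensive read when the release has changed."""
    release = get_release(path)
    key = ("reader", release)
    if key in _cache:                 # release matches -> no data transfer
        return _cache[key], True      # (result, cache_hit)
    with open(path) as f:             # release changed -> full fetch
        result = f.read()
    _cache[key] = result
    return result, False
```

Note that stale entries remain stored under their old release strings, which is what makes policies like pinning to a historical release possible.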
Freshness Policies
Researchers have different needs regarding how current the external data must be. Further provides
a set of ResourceChoice policies that control how the release is selected:
| Policy | Behavior | Typical use |
|---|---|---|
| `latest` | Always use the most recent release | Live dashboards |
| `current` | Accept any release within a configurable shelf life | Periodic batch analyses |
| `present` | Accept any cached release, no expiration | Historical studies |
| `release` | Pin to a specific named release | Reproducible publications |
| `date_range` | Accept any release within a date window | Quarterly reports |
These policies can be set at the cell level or overridden at the project level, which is particularly powerful: by passing a single project-level instruction at session call time, a researcher can pin every resource cell in an entire pipeline to the same historical snapshot — running a full analysis against the data as it existed on a given date, without modifying any cell code.
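A toy resolver for these policies, under the assumption that stored releases are (release string, fetched-at) pairs; this illustrates the semantics of the table above, not Further's implementation:

```python
from datetime import datetime, timedelta

def resolve(policy, stored, now, latest_release=None, **kw):
    """Pick a release per policy. `stored` is [(release, fetched_at), ...]."""
    stored = sorted(stored, key=lambda r: r[1])      # oldest -> newest
    if policy == "latest":
        return latest_release                        # always the newest upstream
    if policy == "current":
        cutoff = now - kw["shelf_life"]              # configurable shelf life
        fresh = [r for r, t in stored if t >= cutoff]
        return fresh[-1] if fresh else latest_release
    if policy == "present":
        return stored[-1][0] if stored else latest_release
    if policy == "release":
        return kw["pinned"]                          # explicit pin
    if policy == "date_range":
        lo, hi = kw["window"]
        inside = [r for r, t in stored if lo <= t <= hi]
        return inside[-1] if inside else None
    raise ValueError(policy)
```

The key property to notice: the policy only selects a release string; it is the selected string, not the policy, that enters the cache identity (see "The Instruction vs. Identity Distinction" below).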
Traceability
Every stored result carries the release string that produced it. An upper cell or database
query can always answer: which version of the external data was used to compute this result?
This is the resource equivalent of logic_version for code: an explicit, persistent record of
the external state that contributed to a stored outcome.
The Instruction vs. Identity Distinction
An important subtlety: the policy instruction (e.g., "give me whatever is current within 5 minutes") does not participate in the cache identity. Only the resolved release string does. Two callers using different policies that happen to resolve to the same release produce exactly the same stored result. This mirrors Further's handling of incremental computation, where the instruction for how many increments to compute is distinct from the increment identity that defines a stored result.
The Parameter System
Parameters in Further are designed to flow through the dependency graph automatically, so that authors can focus on the parameterization relevant to the cell they are writing.
Parameters are declared with explicit kinds that tell the framework how to handle them:
| Kind | Meaning |
|---|---|
| `CONST` | A fixed value known at definition time |
| `VAR` | A set of values — each value spawns a separate execution branch (Cartesian expansion) |
| `DYN` | A value computed at runtime inside the calling cell's `make()` |
| `ITER` | An iterative parameter whose value cycles until a termination condition is met |
| `RAND` | A random value generated fresh per execution |
| `QUERY` | A value resolved from a SQL query against the Further database at execution time |
Parameters exist at three scopes — session, project, and cell — with well-defined precedence rules. Project-level parameters (shared across all cells in a project) propagate through the dependency graph automatically; cells subscribe to the project parameters they need without having to receive them through every intermediate caller.
When cells from different projects interact, translations allow a calling project to rename and map its own parameters to the target project's expected names, so that shared analytical quantities can be expressed consistently within each project's own vocabulary. Addressed parameters allow highly targeted injection at specific edges deep in the graph, bypassing or complementing project-level translations.
VAR expansion deserves particular note: by declaring a parameter as VAR with a list of values
(at the session level or within a @cell_call decorator), the researcher triggers a Cartesian
product of execution instances — essentially a sweep across a parameter space — with a single call.
Instances can run in parallel and are independently memoized.
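The expansion itself is an ordinary Cartesian product over the VAR axes. A sketch, with the parameter-declaration shape invented for illustration:

```python
from itertools import product

def expand(params):
    """params maps name -> ("CONST", value) or ("VAR", [values]).
    Each VAR contributes one axis to the Cartesian product; each CONST
    contributes a single fixed coordinate."""
    axes = []
    for name, (kind, value) in params.items():
        values = value if kind == "VAR" else [value]
        axes.append([(name, v) for v in values])
    return [dict(combo) for combo in product(*axes)]

instances = expand({
    "threshold": ("VAR", [0.1, 0.5, 0.9]),
    "mode": ("VAR", ["fast", "exact"]),
    "seed": ("CONST", 42),
})
# 3 thresholds x 2 modes = 6 independently memoizable instances
```

Each resulting parameter dict would define its own logic key, which is why the instances can be cached and parallelized independently.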
Configuration Regimes
Research is rarely conducted in a single operating mode. A data processing pipeline might need to run in a "fast exploration" mode during development (coarser thresholds, smaller samples) and a "production" mode for publication (full data, strict thresholds). A consortium pipeline might apply different configurations to the same analytical library depending on which upstream module is calling it. Further addresses this through project initialization configurations — named, reusable configuration bundles called ProjectInits.
Named Configuration Bundles
Each project can define a project_inits.yaml file containing labeled configuration contexts.
A label bundles a complete set of parameter values for that project — cell-level specs, project-wide
specs, resource freshness instructions, and framework opts — under a single named key:
```yaml
# project_inits.yaml
fast:
  purpose: "Exploratory mode — coarse thresholds, fast turnaround"
  cell_specs:
    analyzer:
      threshold: 0.5
      mode: "approximate"
thorough:
  purpose: "Production mode — strict thresholds, full data"
  cell_specs:
    analyzer:
      threshold: 0.01
      mode: "exhaustive"
```
A researcher selects a label at session call time (project_init_label="thorough"), and the
framework applies the corresponding configuration across all subscribing cells. No code changes
required — the same cells, the same project, different behavior.
Composing Orthogonal Concerns
Labels can be merged at call time. When two labels govern independent concerns — for example one label controls computational intensity, another controls output formatting — they can be combined freely:
```python
session.call("project.root", project_init_label=["high_intensity", "detailed_output"])
```
Further's domain system enforces mutual exclusivity where it matters: labels within the same domain (e.g., two competing speed configs) cannot be combined, while labels from different domains (speed and style) compose without conflict.
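Domain exclusivity can be sketched as a merge that rejects two labels from the same domain; the domain assignments and label contents below are invented for illustration:

```python
# Hypothetical label registry: each label belongs to exactly one domain.
LABELS = {
    "high_intensity":  {"domain": "speed", "specs": {"threshold": 0.01}},
    "fast":            {"domain": "speed", "specs": {"threshold": 0.5}},
    "detailed_output": {"domain": "style", "specs": {"report": "full"}},
}

def merge_labels(names):
    """Merge label specs; reject two labels from the same domain."""
    seen_domains = {}
    merged = {}
    for name in names:
        label = LABELS[name]
        domain = label["domain"]
        if domain in seen_domains:   # competing configs for one concern
            raise ValueError(f"{name} conflicts with {seen_domains[domain]}")
        seen_domains[domain] = name
        merged.update(label["specs"])
    return merged
```

The merge is associative across domains precisely because orthogonal concerns never write to the same keys; the domain check is what keeps that invariant honest.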
Routing Configurations Across Projects
In a multi-project pipeline the same called library project might need different configuration
depending on which part of the calling project invokes it. Called assignments handle this:
an assignment routes a specific ProjectInit label to a called project based on which cell is
making the call. A cell_quick caller routes the library to its "fast" config; a cell_deep
caller routes the same library to its "thorough" config — within the same session call, without
ambiguity.
This allows a shared analytical library to serve diverse purposes across a consortium while remaining a single, consistently versioned codebase. The configuration regime is a property of how the library is called, not of the library itself.
Cross-Project Collaboration and Modularity
Further's project concept is designed explicitly for collaborative and multi-institutional research. Different teams can develop and version their own projects independently. When one project calls another, the framework:
- Enforces declared dependency relationships (calling projects declare which other projects they call).
- Isolates parameter namespaces (called project's cells cannot see the calling project's parameters unless explicitly bridged by a translation).
- Manages trust for cross-project type sharing (pickle-based contributions from an external project require explicit session-level trust declarations).
- Supports calls to cells running in separate compute containers, with automatic serialization of parameters and contributions across boundaries.
This makes it practical for a consortium of labs to each maintain a library project of reusable analysis components, while a coordinating project assembles them into a cohesive pipeline — all tracked and memoized across sessions and institutions.
From Exploration to Production-Quality Code
A persistent tension in research computing is the gap between getting results quickly and writing efficient, reproducible, long-lived code. Further acknowledges this tension and provides a deliberate pathway between the two — with the same cell structure throughout.
The Framework Favors What It Can See Early
Further's most powerful features — pre-memoization, static parallelism planning, VAR expansion, and graph-level deduplication — all depend on the framework knowing parameter values before execution begins. The more of a cell's parameter logic that lives in statically-declared classmethods, the more the framework can do on the researcher's behalf without waiting.
The corollary is that parameters resolved only at runtime (inside make()) are opaque to the
static analysis machinery. They require the framework to first execute the calling cell's logic to produce the value,
then use that value to dispatch the child cell — a two-step runtime sequence rather than a
pre-planned dispatch. No pre-memoization, no advance parallelism planning for that branch.
Exploratory: Top-Down Interventions
When a researcher is exploring — before the right parameterization is clear — Further offers mechanisms that minimize authoring friction at the cost of some efficiency:
STUB parameters signal a runtime-resolved value that is expected to be temporary. A STUB cell
call computes its parameter inside make() and passes it dynamically, bypassing static analysis
for that edge. The framework treats it as an explicit marker of intent: "I know this might be
computed earlier eventually — I am not there yet." A STUB carries no memoization penalty for
the cells above it; only the immediate dynamic dispatch is affected.
Transformer cells, injected at session call time without any modification to the cells being transformed, allow a researcher to intercept parameter flow at specific graph edges and reshape it. This is a top-down intervention: rather than wiring transformation logic into the cells themselves, the researcher applies it externally — useful when quickly testing normalization or scaling strategies without committing them to the cell definitions.
Addressed parameters, similarly, allow precise injection of parameter values at specific edges deep in the graph from the outside — from a calling cell's form, or from the session — without modifying the called cell. A researcher can control a deep dependency's behavior during exploration without touching that dependency's code.
Together these mechanisms allow rapid iteration: run an analysis, observe the results, adjust parameters externally, re-run — without a code change cycle on every cell in the graph.
Maturing: Moving Logic Into Classmethods
As an analysis stabilizes, the natural direction is to move parameter logic from runtime into static declarations. Concretely:
- STUB → VAR or CONST. When it becomes clear what values a parameter should take, replace the runtime computation with a classmethod that emits those values explicitly. The framework can now see the values before execution: pre-memoization kicks in, parallel dispatch is planned in advance, and VAR expansion produces a full parameter sweep automatically.
- Addressed params → explicit `@cell_call` declarations. When a parameter injection pattern stabilizes, moving it into the calling cell's explicit `@cell_call` decorator makes the dependency visible in the Definition Graph. Static analysis can validate it, and other researchers reading the code can see it without tracing session-level configurations.
- Transformers → first-class cells. When a transformation proves durable, promoting it from an injected session-level transformer to a proper cell in the graph makes the flow more predictable and efficient, eliminating unexpected top-down redirection.
Each of these moves makes the cell's behavior more legible to the framework — and to collaborators. The cell itself does not need to be rewritten from scratch; the authoring surface changes incrementally.
The Practical Guidance
Further does not force researchers to choose between quick results and good code. It allows starting in exploratory mode — with STUBs, addressed params, and injected transformers — and migrating toward a fully statically-declared graph as understanding accumulates. The memoization system ensures this migration costs nothing computationally: results from the exploratory phase that remain valid are reused exactly, and only genuinely new computations run.
The destination — a cell graph where most parameter values are declared in exposed classmethods, versions are bumped intentionally, and configurations are named and composable — is a graph that the framework can schedule, deduplicate, and pre-memoize with maximum efficiency, and that collaborators can read, reuse, and extend with confidence.
Versioning: Evolving Analyses Without Breaking Reproducibility
A defining challenge in long-running research is that "final" results from one phase become inputs to the next — but the underlying analyses continue to evolve. Further addresses this through a two-axis versioning system on every cell:
- `logic_version`: Tracks what the cell computes. A bump invalidates cached results.
- `api_version`: Tracks the cell's public interface (its parameters and contributions). A bump signals compatibility changes to callers.
Old cell versions can be archived alongside the current version. Both versions coexist in the
same session's dependency graph. Callers can pin to specific version ranges (e.g., ">=1.2,<2.0")
or always use the current stable version. When a later analysis needs the result of an older
analytical approach for comparison, it simply pins to the archived cell — no manual file management
required.
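The pin semantics can be illustrated with a toy constraint checker; the actual constraint syntax Further accepts is an assumption here:

```python
# Toy version-range pinning: select the newest archived version matching
# a constraint like ">=1.2,<2.0". Illustrative syntax only.

def parse(v):
    return tuple(int(p) for p in v.split("."))

def satisfies(version, constraint):
    ops = {">=": lambda a, b: a >= b,
           "==": lambda a, b: a == b,
           "<":  lambda a, b: a < b}
    for clause in constraint.split(","):
        for op in (">=", "==", "<"):   # longest operators first
            if clause.startswith(op):
                if not ops[op](parse(version), parse(clause[len(op):])):
                    return False
                break
    return True

def pick(versions, constraint):
    """Select the newest archived version that satisfies the pin."""
    ok = [v for v in versions if satisfies(v, constraint)]
    return max(ok, key=parse) if ok else None
```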
Projects have analogous versioning for their public interfaces. A project at version 2.0 can coexist with archived version 1.x, allowing gradual migration of dependents.
Execution Modes
Further is designed for heavy computational workloads. It provides six execution modes organized in a two-branch hierarchy:
```
              INLINE
             /      \
         LOCAL      DASK
           |          |
  LOCAL_PROCESSES  DASK_STATIC_DAG
                      |
               PREFECT_STATIC_DAG
```
The session sets the infrastructure ceiling — the most capable mode available for the run. Individual cells and projects can select a mode at or below that ceiling within their branch, but cannot escalate across branches (a LOCAL session cannot use DASK, and vice versa).
Local Branch
- INLINE: Synchronous, single-thread execution. Suitable for development, debugging, and cells that manage their own parallelism internally.
- LOCAL: Parallel execution using a thread pool (`ThreadPoolExecutor`). Independent cells run concurrently without requiring any external infrastructure.
- LOCAL_PROCESSES: Parallel execution using a process pool (`ProcessPoolExecutor`). Useful when cells are CPU-bound and benefit from true multiprocessing, at the cost of serialization overhead across process boundaries.
Distributed Branch
- DASK: Distributed parallel execution using a Dask cluster (local or remote). The dependency graph drives automatic parallelism — independent cells are submitted to Dask workers concurrently without any author intervention.
- DASK_STATIC_DAG: An optimization over standard DASK. When a fully-static subtree — a region of the graph with no runtime-resolved parameters — exceeds a configurable size threshold (`static_dag_node_thresh`), the framework pre-computes all logic keys, performs a single batch cache check, and submits the entire subtree as a Dask-native dependency graph in one operation. This eliminates per-node round trips between the Hub and the scheduler. A single session may produce multiple such "static chapters," each submitted independently as the execution graph unfolds.
- PREFECT_STATIC_DAG: Wraps the Dask static DAG submission in a Prefect `@flow` with `DaskTaskRunner`. Prefect is not a separate compute engine — Dask still performs all computation. The wrapper gives the Prefect UI visibility into the true DAG topology, enabling monitoring, tagging, and retry configuration through Prefect's observability layer. If Prefect is not installed, the framework falls back gracefully to plain Dask static DAG submission with a warning.
Parallel Opt-Out
Any cell can set in_parallel=False to force synchronous (INLINE) execution regardless of the
session's mode. This is useful for cells that create their own thread or process pools internally
and cannot safely be dispatched in parallel by the framework.
Container Dispatch
Individual cells can be assigned to specific compute containers (e.g., GPU-enabled environments) while the rest of the graph runs on standard hardware. The framework manages the serialization and routing of parameters and contributions across container boundaries.
Advanced Computation Patterns
Beyond standard batch pipelines, Further supports several patterns tailored to research workflows:
Iterative cycles (ITER): A cell can call another cell with an ITER parameter, creating a static unrolling of a fixed-depth loop in the dependency graph. Call switches provide termination conditions evaluated on Specs/Opts, enabling conditional short-circuiting at known stopping points.
Incrementing cells ("anytime algorithms"): For computations that produce useful partial results at every step — MCMC chains, iterative optimizers, phylogenetic tree construction — Further supports incrementing cells. Each increment is independently memoized. A caller can request exactly N increments, at least N, the best currently stored, or a fixed amount of new work beyond what exists. The framework resumes from the last stored increment rather than restarting.
Recursive units: Multi-cell cycles with dynamic termination (convergence detection inside
make()), modeled as a composite node in the dependency graph.
Transformer cells: Cells that intercept and reshape parameters flowing through an edge, enabling context-sensitive transformations (e.g., normalization, scaling) to be injected without modifying the analysis cells themselves.
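The incrementing-cell pattern above can be sketched as a store keyed by (cell, increment index), with work resuming from the last stored increment; the store and request shapes are illustrative, not Further's API:

```python
# Sketch of resumable incrementing cells ("anytime algorithms"): every
# increment is memoized separately, and a request computes only the work
# beyond what is already stored.

_increments = {}  # (cell_name, increment_index) -> stored partial result

def step(prev):
    """One unit of work; here, a trivial running count stands in for an
    MCMC step or optimizer iteration."""
    return (prev or 0) + 1

def request(cell, at_least):
    """Ensure at least `at_least` increments exist; return the best stored
    result and how many increments were newly computed."""
    done = max((i for c, i in _increments if c == cell), default=0)
    prev = _increments.get((cell, done))
    new_work = 0
    for i in range(done + 1, at_least + 1):
        prev = step(prev)
        _increments[(cell, i)] = prev      # each increment memoized
        new_work += 1
    return prev, new_work
```

Asking for fewer increments than already exist costs nothing: the stored best is returned and no new work runs.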
Persistence Architecture
Further stores results at two levels:
- PostgreSQL metadata database: Tracks cell definitions, execution history, logic keys, run status, parameters, and the full dependency topology. This is the source of truth for memoization and enables introspection.
- Blob storage: Large contributions (DataFrames, arrays, model weights) are stored in configurable blob storage — local filesystem, S3/MinIO, or PostgreSQL large objects. They are keyed by a content-addressing scheme and loaded on demand.
The separation means that small scalar results (summary statistics, counts, flags) can be stored
directly in the database as recorded contributions, making them immediately queryable via SQL
without any blob retrieval. Larger objects travel through blob storage and are loaded only when
the calling cell's make() method requests them.
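The two-level split can be sketched as follows, with the small-value rule and storage shapes invented for illustration:

```python
import hashlib
import pickle

# Toy two-level persistence: scalars land in the (queryable) metadata
# store; larger objects go to content-addressed blob storage and the
# metadata store keeps only a pointer. Shapes are illustrative.

metadata = {}   # contribution name -> small value or blob pointer
blobs = {}      # content hash -> serialized bytes, loaded on demand

SMALL = (int, float, bool, str)

def store(name, value):
    if isinstance(value, SMALL):
        metadata[name] = value                     # recorded contribution
        return None
    raw = pickle.dumps(value)
    key = hashlib.sha256(raw).hexdigest()          # content-addressed key
    blobs[key] = raw
    metadata[name] = {"blob": key}                 # pointer in metadata
    return key

def load(name):
    entry = metadata[name]
    if isinstance(entry, dict) and "blob" in entry:
        return pickle.loads(blobs[entry["blob"]])  # on-demand blob load
    return entry
```

Content addressing gives deduplication for free: two cells producing byte-identical contributions share one stored blob.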
Introspection and Self-Referential Analysis
The Further database is queryable from within Further itself. Cells can be authored with database makers that issue SQL queries against the execution history — for example, to retrieve all stored results matching a particular parameter sweep, compare increments of a computation, or aggregate contributions across sessions. This allows today's higher-level analysis to be literally constructed from the persistent record of yesterday's lower-level runs.
The QUERY parameter kind takes this further: a parameter value can be resolved at execution time
by a SQL query against the database, so that the set of execution instances in a VAR-like expansion
is drawn dynamically from stored results rather than hard-coded by the author.
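A sketch of QUERY-style resolution, with `sqlite3` standing in for Further's PostgreSQL store and an invented schema:

```python
import sqlite3

# Toy execution-history table; in Further this would be the PostgreSQL
# metadata database. Schema and values are invented for illustration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (cell TEXT, param REAL, score REAL)")
db.executemany("INSERT INTO results VALUES (?, ?, ?)", [
    ("analyzer", 0.1, 0.91),
    ("analyzer", 0.5, 0.72),
    ("analyzer", 0.9, 0.95),
])

def query_param(sql):
    """Resolve a parameter's value set from stored execution history."""
    return [row[0] for row in db.execute(sql)]

# Drive a follow-up sweep only over parameters that scored well previously:
good = query_param(
    "SELECT param FROM results WHERE cell = 'analyzer' AND score > 0.9 "
    "ORDER BY param"
)
```

The resolved list plays the role of a VAR value set, so the next expansion is shaped by yesterday's results rather than hard-coded by the author.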
Design Philosophy Summary
Further is built around five interlocking ideas:
- Local focus, global coherence. Authors think about one cell and its immediate dependencies; the framework maintains the whole graph.
- Reproducibility as infrastructure, not discipline. Memoization, versioning, and result tracking are structural properties, not conventions to be followed.
- Exploration and production on the same track. The same cell can be run exploratorily today and serve as a memoized dependency in a larger pipeline tomorrow, with no code changes.
- Parameter state belongs to the graph. Global parameter consistency across a complex dependency graph is a framework responsibility, not an author responsibility.
- Open-ended time horizons. Results are persisted across sessions, versions coexist, and incremental computations resume — because research rarely ends.