RegiStream Autolabel: accessor for register data labeling.
registream-autolabel
Apply variable and value labels from the RegiStream catalog to pandas DataFrames. Native schema v2; depth-agnostic scope; Jupyter-friendly return types; matplotlib + seaborn plot-label integration.
Full documentation: https://registream.org/docs/autolabel/python.
Install
pip install registream-autolabel
Pulls in registream-core as a dependency. To get the full ecosystem
(core + autolabel + future modules), install the meta-package instead:
pip install registream
Requires Python 3.11 or later. Pandas is the only hard runtime
dependency besides registream-core. Seaborn is optional; install it
separately to light up the label-aware plot wrappers.
Quick start
import pandas as pd
import registream.autolabel # side effect: installs autolabel methods on pd.DataFrame
df = pd.read_stata("lisa_2020.dta")
# Apply variable and value labels from SCB metadata (English); scope auto-inferred.
df.autolabel(domain="scb", lang="eng")
# Display-time labeled view without mutating df.
df.lab.head()
Labels land on df.attrs['registream']; the column data itself is
never mutated.
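To make the idea of labels living beside the data concrete, here is a minimal pure-Python sketch of display-time labeling. It is a conceptual model, not the package's implementation: the `labeled_view` helper, the layout of the label dictionaries, and the numeric codes 1 and 2 are assumptions made for illustration.

```python
# Conceptual sketch only: `labeled_view` and the dictionary layout are
# hypothetical and do not mirror registream's internals.

def labeled_view(columns, value_labels):
    """Return a labeled copy of `columns`; the input is left untouched."""
    view = {}
    for name, values in columns.items():
        mapping = value_labels.get(name, {})
        # Codes without a label fall through unchanged.
        view[name] = [mapping.get(v, v) for v in values]
    return view


data = {"kon": [1, 2, 1], "alder": [34, 51, 27]}  # raw codes (assumed values)
labels = {"kon": {1: "Man", 2: "Woman"}}          # value labels kept beside the data

view = labeled_view(data, labels)
print(view["kon"])  # labeled values, for display
print(data["kon"])  # raw codes, still unchanged
```

The point of the design is that analysis code keeps working on the raw codes, while anything meant for human eyes goes through the labeled view.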
What you get as DataFrame methods
Importing the package adds a small set of methods directly onto
pd.DataFrame, so the API reads like a native pandas method:
| Method | What it does |
|---|---|
| df.autolabel(domain, lang, scope, release, …) | Apply variable + value labels |
| df.lookup(variables, detail=…) | Metadata for one or more variables (returns a LookupResult) |
| df.lab | A LabeledView for display-time labeling (a property) |
| df.variable_labels() | Dict of variable labels |
| df.value_labels() | Dict of value labels |
| df.get_variable_labels(columns) / df.get_value_labels(columns) | Column-aware getters |
| df.set_variable_labels({...}) / df.set_value_labels(col, {...}) | In-place edits |
| df.copy_labels(source, target) | Copy a label bundle between columns |
| df.meta_search(pattern) | Filter label metadata by regex |
This matches the Stata surface (autolabel variables, domain(scb) lang(eng)) and the R surface (df |> autolabel(domain = "scb", lang = "eng")) verb-for-verb: the data is always the subject; the command is
always the verb.
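As an illustration of what filtering label metadata by regex can look like, the sketch below searches a plain dict of variable labels. The `search_labels` helper is hypothetical. Whether df.meta_search matches names, labels, or both, and whether it is case-sensitive, is not stated here, so those choices are assumptions.

```python
import re


def search_labels(variable_labels, pattern):
    """Return entries whose name or label matches `pattern` (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return {
        name: label
        for name, label in variable_labels.items()
        if regex.search(name) or regex.search(label)
    }


# Hypothetical label dict, reusing the column names from the plotting example.
labels = {"kon": "Sex", "alder": "Age (years)"}
print(search_labels(labels, r"^age"))
```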
Module-level functions
For operations that don't have a single DataFrame as their subject:
from registream.autolabel import suggest, scope, info, cite, update_datasets
suggest(df) # preview coverage; returns SuggestResult
scope(domain="scb", lang="eng") # catalog browser, no df
update_datasets("scb", "eng") # refresh the on-disk metadata bundle
info() # dict: config + cache + versions
cite() # versioned APA citation
Full signatures, arguments, labeling rules, and worked examples are on the Python reference page.
Command-line
python -m registream.autolabel version # installed autolabel version
python -m registream.autolabel info # config + cache + versions
python -m registream.autolabel cite # APA citation
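If you want to call these subcommands from a script, one option is to go through `subprocess`. The sketch below is an assumption-laden example rather than a documented interface: `build_command` and `run_cli` are hypothetical helpers, and it assumes the CLI writes its result to standard output.

```python
import subprocess
import sys

# Subcommands listed in this README; anything else is rejected up front.
KNOWN_SUBCOMMANDS = ("version", "info", "cite")


def build_command(subcommand):
    """Build the argv list for `python -m registream.autolabel <subcommand>`."""
    if subcommand not in KNOWN_SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    # sys.executable keeps the call inside the current interpreter/virtualenv.
    return [sys.executable, "-m", "registream.autolabel", subcommand]


def run_cli(subcommand):
    """Run the CLI and return its stdout (requires the package to be installed)."""
    result = subprocess.run(
        build_command(subcommand), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


print(build_command("version")[1:])
```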
Plotting integration
When seaborn is installed, autolabel wraps 16 plotting functions on import so value labels show on categorical axes and variable labels flow into axis titles + legend. Zero extra setup:
import seaborn as sns
import registream.autolabel # wraps seaborn on import
df.autolabel(domain="scb", lang="eng")
sns.barplot(data=df, x="kon", y="alder")
# x-axis ticks: "Man", "Woman"
# x label: "Sex"
# y label: "Age (years)"
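For readers curious how a plotting function can be wrapped like this, the sketch below shows the generic decorator technique with a stand-in function. It is not registream's code: `with_axis_labels`, `fake_barplot`, and the module-level label dict are hypothetical, and the stand-in returns a dict instead of drawing anything.

```python
import functools

# Hypothetical label store; in registream the labels live on the DataFrame.
VARIABLE_LABELS = {"kon": "Sex", "alder": "Age (years)"}


def with_axis_labels(plot_func):
    """Wrap `plot_func` so its result carries human-readable axis labels."""
    @functools.wraps(plot_func)
    def wrapper(*, x, y, **kwargs):
        result = plot_func(x=x, y=y, **kwargs)
        # Fall back to the raw column name when no label is known.
        result["xlabel"] = VARIABLE_LABELS.get(x, x)
        result["ylabel"] = VARIABLE_LABELS.get(y, y)
        return result
    return wrapper


def fake_barplot(*, x, y):
    """Stand-in for a plotting function; returns a dict instead of drawing."""
    return {"kind": "bar", "xlabel": x, "ylabel": y}


fake_barplot = with_axis_labels(fake_barplot)  # the "patch on import" step
axes = fake_barplot(x="kon", y="alder")
print(axes["xlabel"], "/", axes["ylabel"])
```

Because the wrapper keeps the original name and call signature, existing plotting code does not need to change.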
Opt out with REGISTREAM_NO_PLOT_PATCH=1. The pandas column patches
(labels follow through df["new"] = df["old"] and df.rename(columns=...))
can be disabled with REGISTREAM_NO_PANDAS_PATCH=1. These two opt-outs,
and REGISTREAM_NO_SHORTCUTS described in the next section, are all read
from the environment at import time.
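Because the flags are read at import time, they have to be in the environment before the first import of registream.autolabel. A minimal sketch of doing that from Python follows. The `disable_patches` helper is hypothetical, and the guard against a late call is an added safety check, not something the package is known to provide.

```python
import os
import sys


def disable_patches(plots=True, pandas_columns=True):
    """Set the opt-out flags; must run before registream.autolabel is imported."""
    if "registream.autolabel" in sys.modules:
        # The flags are read at import time, so setting them now would be too late.
        raise RuntimeError("registream.autolabel is already imported")
    if plots:
        os.environ["REGISTREAM_NO_PLOT_PATCH"] = "1"
    if pandas_columns:
        os.environ["REGISTREAM_NO_PANDAS_PATCH"] = "1"


disable_patches(plots=True, pandas_columns=False)
# import registream.autolabel  # import only after the flags are set
print(os.environ["REGISTREAM_NO_PLOT_PATCH"])
```

Setting the variables in the shell or in a process manager before Python starts achieves the same thing without any code.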
Library-author opt-out
If you're writing a library that imports registream.autolabel as a
transitive dependency and don't want to add methods to your users'
DataFrames, set REGISTREAM_NO_SHORTCUTS=1 before import. The accessor
(df.rs.*) and the module-level functions stay fully available:
from registream.autolabel import autolabel, lookup
autolabel(df, domain="scb", lang="eng")
End-user documentation uses the method form throughout; this opt-out exists so library code doesn't surprise end users.
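One way a library might apply this opt-out is sketched below. The `label_frame` wrapper is a hypothetical name. Using `setdefault` (so a value already chosen by the end user is respected) and deferring the import into the function are design assumptions, not requirements stated by the package.

```python
import os

# Must run before the first import of registream.autolabel in the process.
# setdefault keeps any value the end user has already set.
os.environ.setdefault("REGISTREAM_NO_SHORTCUTS", "1")


def label_frame(df, domain="scb", lang="eng"):
    """Label `df` via the function form, leaving pd.DataFrame unpatched."""
    try:
        # Deferred import: the environment flag above is already in place.
        from registream.autolabel import autolabel
    except ImportError as exc:
        raise RuntimeError("registream-autolabel is not installed") from exc
    return autolabel(df, domain=domain, lang=lang)
```

The flag only helps if nothing else in the process imports registream.autolabel first, so it belongs as early as possible in the library's top-level module.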
Catalog coverage
Metadata bundles ship for Statistics Sweden (scb), Statistics Denmark
(dst), Statistics Norway (ssb), Statistics Iceland (hagstofa),
Försäkringskassan (fk), and Socialstyrelsen (sos). Institutions can
create their own domains; see the
schema v2 reference and
the institutional setup guide.
Citation
Clark, J. & Wen, J. (2024–). RegiStream: Infrastructure for Register Data Research. https://registream.org
registream.autolabel.cite() returns the versioned APA form.
Authors
- Jeffrey Clark — jeffrey@registream.org
- Jie Wen — jie@registream.org
License
BSD 3-Clause. See LICENSE.