RegiStream Autolabel: accessor for register data labeling.

registream-autolabel

Apply variable and value labels from the RegiStream catalog to pandas DataFrames. Native schema v2; depth-agnostic scope; Jupyter-friendly return types; matplotlib + seaborn plot-label integration.

Full documentation: https://registream.org/docs/autolabel/python.

Install

pip install registream-autolabel

Pulls in registream-core as a dependency. For the full ecosystem (core + autolabel + future modules) you can instead pip install registream (meta-package).

Requires Python 3.11 or later. Pandas is the only hard runtime dependency besides registream-core. Seaborn is optional; install it separately to light up the label-aware plot wrappers.
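
The requirements above can be checked before importing anything. The following is a minimal sketch using only the standard library; the helper name `preflight` is hypothetical (not part of the package), and it assumes the import names `pandas` and `seaborn`.

```python
import sys
from importlib.util import find_spec


def preflight() -> dict[str, bool]:
    """Report whether the documented requirements appear to be met."""
    return {
        # The README states Python 3.11 or later.
        "python_3_11_plus": sys.version_info >= (3, 11),
        # Hard runtime dependency.
        "pandas": find_spec("pandas") is not None,
        # Optional; only needed for the label-aware plot wrappers.
        "seaborn": find_spec("seaborn") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

`find_spec` only locates a module without importing it, so the check has no side effects.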

Quick start

import pandas as pd
import registream.autolabel  # side effect: installs autolabel methods on pd.DataFrame

df = pd.read_stata("lisa_2020.dta")

# Apply variable and value labels from SCB metadata (English); scope auto-inferred.
df.autolabel(domain="scb", lang="eng")

# Display-time labeled view without mutating df.
df.lab.head()

Labels land on df.attrs['registream']; the column data itself is never mutated.
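
To illustrate the idea of keeping labels beside the data rather than in it, here is a minimal pure-Python sketch. It is not the package's implementation: the layout of the `metadata` dict, the codes 1 and 2 for `kon`, and the helper `labeled_view` are all assumptions made for illustration.

```python
# Stored data: raw codes as they would sit in a column (values are made up).
kon = [1, 2, 2, 1]

# Metadata kept separately from the data, loosely analogous to df.attrs["registream"].
# The keys and the code-to-label mapping are assumptions, not the real schema.
metadata = {
    "variable_labels": {"kon": "Sex"},
    "value_labels": {"kon": {1: "Man", 2: "Woman"}},
}


def labeled_view(values, value_labels):
    """Return a new list with codes replaced by labels; unknown codes pass through."""
    return [value_labels.get(v, v) for v in values]


view = labeled_view(kon, metadata["value_labels"]["kon"])
print(view)  # ['Man', 'Woman', 'Woman', 'Man']
print(kon)   # [1, 2, 2, 1]  (the stored codes are unchanged)
```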

What you get as DataFrame methods

Importing the package adds a small set of methods directly onto pd.DataFrame, so the API reads like a native pandas method:

| Method | What it does |
| --- | --- |
| df.autolabel(domain, lang, scope, release, …) | Apply variable + value labels |
| df.lookup(variables, detail=…) | Metadata for one or more variables (returns a LookupResult) |
| df.lab | A LabeledView for display-time labeling (a property) |
| df.variable_labels() | Dict of variable labels |
| df.value_labels() | Dict of value labels |
| df.get_variable_labels(columns) / df.get_value_labels(columns) | Column-aware getters |
| df.set_variable_labels({...}) / df.set_value_labels(col, {...}) | In-place edits |
| df.copy_labels(source, target) | Copy a label bundle between columns |
| df.meta_search(pattern) | Filter label metadata by regex |
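
As a rough picture of what filtering label metadata by regex can look like, here is a standalone sketch over a plain dict of variable labels. The function `search_labels` is hypothetical; whether `df.meta_search` matches names as well as labels, or ignores case, is an assumption here and should be checked against the Python reference page.

```python
import re

# Variable labels as a plain dict; the two entries mirror the plotting example.
variable_labels = {"kon": "Sex", "alder": "Age (years)"}


def search_labels(labels, pattern):
    """Keep entries whose variable name or label matches the regex (case-insensitive)."""
    rx = re.compile(pattern, flags=re.IGNORECASE)
    return {
        name: label
        for name, label in labels.items()
        if rx.search(name) or rx.search(label)
    }


print(search_labels(variable_labels, r"age"))  # {'alder': 'Age (years)'}
print(search_labels(variable_labels, r"^k"))   # {'kon': 'Sex'}
```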

This matches the Stata surface (autolabel variables, domain(scb) lang(eng)) and the R surface (df |> autolabel(domain = "scb", lang = "eng")) verb-for-verb: the data is always the subject; the command is always the verb.

Module-level functions

For operations that don't have a single DataFrame as their subject:

from registream.autolabel import suggest, scope, info, cite, update_datasets

suggest(df)                                   # preview coverage; returns SuggestResult
scope(domain="scb", lang="eng")               # catalog browser, no df
update_datasets("scb", "eng")                 # refresh the on-disk metadata bundle

info()                                        # dict: config + cache + versions
cite()                                        # versioned APA citation

Full signatures, arguments, labeling rules, and worked examples are on the Python reference page.

Command-line

python -m registream.autolabel version     # installed autolabel version
python -m registream.autolabel info        # config + cache + versions
python -m registream.autolabel cite        # APA citation

Plotting integration

When seaborn is installed, autolabel wraps 16 plotting functions on import, so value labels appear on categorical axes and variable labels flow into axis titles and the legend. No extra setup is needed:

import seaborn as sns
import registream.autolabel  # wraps seaborn on import

df.autolabel(domain="scb", lang="eng")

sns.barplot(data=df, x="kon", y="alder")
# x-axis ticks: "Man", "Woman"
# x label:     "Sex"
# y label:     "Age (years)"
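
The general wrap-a-function pattern behind this kind of integration can be sketched without seaborn. What follows is only an illustration of the idea, not the package's code: `fake_barplot`, `with_variable_labels`, and the module-level label dict are hypothetical stand-ins.

```python
import functools

# Hypothetical label store; the real package keeps labels on the DataFrame.
VARIABLE_LABELS = {"kon": "Sex", "alder": "Age (years)"}


def fake_barplot(*, x, y):
    """Stand-in for a plotting function: reports the axis labels it would draw."""
    return {"xlabel": x, "ylabel": y}


def with_variable_labels(plot_func, labels):
    """Wrap plot_func so its axis labels use variable labels when one is known."""

    @functools.wraps(plot_func)
    def wrapper(*args, **kwargs):
        result = plot_func(*args, **kwargs)
        # Fall back to the raw column name when no label exists.
        return {axis: labels.get(name, name) for axis, name in result.items()}

    return wrapper


barplot = with_variable_labels(fake_barplot, VARIABLE_LABELS)
print(barplot(x="kon", y="alder"))  # {'xlabel': 'Sex', 'ylabel': 'Age (years)'}
```

Because the wrapper keeps the original call signature, existing plotting code does not need to change.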

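Opt out of the plot wrappers with REGISTREAM_NO_PLOT_PATCH=1. The pandas column patches, which carry labels through df["new"] = df["old"] and df.rename(columns=...), are disabled with REGISTREAM_NO_PANDAS_PATCH=1. All three opt-outs (these two and REGISTREAM_NO_SHORTCUTS, covered under Library-author opt-out) read the environment at import time, so set them before importing registream.autolabel.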
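
Reading a flag at import time means the decision is taken once and later changes to the environment have no effect. Here is a small simulation of that behavior; the helper `opted_out` and the rule that only the exact value "1" counts are assumptions, since the package's parsing is not documented here.

```python
import os


def opted_out(name, environ=None):
    """Hypothetical check: a flag counts as set when its value is exactly "1"."""
    env = os.environ if environ is None else environ
    return env.get(name) == "1"


# Simulate "import time": the flag is read once and the result is stored.
env = {"REGISTREAM_NO_PLOT_PATCH": "1"}
PLOT_PATCH_ENABLED = not opted_out("REGISTREAM_NO_PLOT_PATCH", env)

# Changing the environment afterwards does not alter the stored decision.
env["REGISTREAM_NO_PLOT_PATCH"] = "0"
print(PLOT_PATCH_ENABLED)  # False
```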

Library-author opt-out

If you're writing a library that imports registream.autolabel as a transitive dependency and don't want to add methods to your users' DataFrames, set REGISTREAM_NO_SHORTCUTS=1 before import. The accessor (df.rs.*) and the module-level functions stay fully available:

from registream.autolabel import autolabel, lookup
autolabel(df, domain="scb", lang="eng")

End-user documentation uses the method form throughout; this opt-out exists so library code doesn't surprise end users.
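
In library code, the ordering might look like the sketch below. It assumes your library is the first to import registream.autolabel in the process; the try/except fallback is illustrative only.

```python
import os

# Keep any value the user already set; otherwise opt out of DataFrame methods.
os.environ.setdefault("REGISTREAM_NO_SHORTCUTS", "1")

try:
    # Must come after the environment variable is in place.
    from registream.autolabel import autolabel, lookup
except ImportError:
    # registream-autolabel is not installed; a real library would raise a
    # clear error or disable the dependent feature here.
    autolabel = lookup = None
```

Using `setdefault` rather than plain assignment leaves an explicit user setting untouched. If the user's own code imported registream.autolabel earlier, the flag has already been read and this has no effect.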

Catalog coverage

Metadata bundles ship for Statistics Sweden (scb), Statistics Denmark (dst), Statistics Norway (ssb), Statistics Iceland (hagstofa), Försäkringskassan (fk), and Socialstyrelsen (sos). Institutions can create their own domains; see the schema v2 reference and the institutional setup guide.

Citation

Clark, J. & Wen, J. (2024–). RegiStream: Infrastructure for Register Data Research. https://registream.org

registream.autolabel.cite() returns the versioned APA form.

License

BSD 3-Clause. See LICENSE.

