Shared functions for SSB in Python

SSB Fag-fellesfunksjoner i Python

A place for "loose, small functionality" produced at Statistics Norway in Python. Functionality may start here if it is to be used widely within the organization, and later be moved to a bigger package if it "grows out of proportion".

Team: ssb-pythonistas

We are a team of statisticians who hope to curate and generalize functionality that arises from specific needs in specific production environments. We try to take responsibility for generalizing this functionality and making it available to all of Statistics Norway through this package.

Pypi-account Github-team

Contributing

Please contact one of our team members to see if you can join, or to learn how to submit a PR for approval into the package.

Installing

poetry add ssb-fagfunksjoner

Usage

Environment / Pathing

Check if you are on Dapla or in prodsone.

from fagfunksjoner import check_env


check_env()

Navigate to the root of your project and back again. Do stuff while in root, like importing local functions.

from fagfunksjoner import ProjectRoot


with ProjectRoot():
    ... # Do your local imports here...
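The idea of working from the project root for a while and then returning can be sketched with the standard library. This is only a minimal illustration, not the package's implementation: the helper names `find_root` and `in_project_root`, and the use of `pyproject.toml` or `.git` as root markers, are assumptions.

```python
import os
from contextlib import contextmanager
from pathlib import Path

# Hypothetical markers: files or folders assumed to exist only in the project root.
ROOT_MARKERS = ("pyproject.toml", ".git")


def find_root(start: Path) -> Path:
    """Walk upwards from `start` until a folder containing a marker is found."""
    for folder in (start, *start.parents):
        if any((folder / marker).exists() for marker in ROOT_MARKERS):
            return folder
    raise FileNotFoundError(f"No project root found above {start}")


@contextmanager
def in_project_root():
    """Change to the project root, then return to the original folder on exit."""
    original = Path.cwd()
    os.chdir(find_root(original))
    try:
        yield Path.cwd()
    finally:
        os.chdir(original)  # always restore, even if the body raises
```

Restoring the working directory in a `finally` clause means a failing import inside the `with` block does not leave the session stranded in the root folder.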

Sasfiles

Setting up password with saspy

from fagfunksjoner.prodsone import saspy_ssb


saspy_ssb.set_password() # Follow the instructions to set the password
saspy_ssb.saspy_df_from_path("path")

Logger that follows SSB standards

import logging

from fagfunksjoner import StatLogger


# Creating a StatLogger "hijacks" the regular root logger.
# Assigning it to a variable should also keep Python from cleaning it up.
root_logger = StatLogger(log_file="custom_log_file.log")

logger = logging.getLogger(__name__)
logger.info("This is an info message")
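Why an ordinary `logging.getLogger(__name__)` picks up the configuration can be shown with the standard library alone: module loggers pass their records up to the root logger, so handlers attached there see everything. The sketch below is not how `StatLogger` is implemented; `configure_root_logger` and the message format are assumptions made for illustration.

```python
import logging


def configure_root_logger(log_file: str) -> logging.Logger:
    """Attach a file handler to the root logger (hypothetical helper)."""
    root = logging.getLogger()  # calling getLogger without a name returns the root logger
    root.setLevel(logging.INFO)
    handler = logging.FileHandler(log_file, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    root.addHandler(handler)
    return root
```

After calling `configure_root_logger("custom_log_file.log")`, any logger created with `logging.getLogger(__name__)` writes to that file without further setup.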

Export XMLs that can be imported into the KLASS UI

from fagfunksjoner import make_klass_xml_codelist


make_klass_xml_codelist(path="kjoenn.xml",
    codes=["1", "2"],
    names_bokmaal=["Mann", "Kvinne"])

Round data UP

import pandas as pd

from fagfunksjoner import round_up


print(round(2.5, 0), round_up(2.5, 0))

round_up(pd.Series([1.5, 2.5, 3.5]), 0)  # The dtype becomes Int64 when rounding to 0 decimals
round_up(pd.Series([1.15, 2.15, 3.15]), 1)  # The dtype becomes Float64 when rounding to more than 0 decimals

df = pd.DataFrame(
    {"col1": [1.5, 2.5, 1.2345, 1.2355],
    "col2": [3.5, 4.5, 5.6789, 6.7891]}
    ).astype({"col1": "Float64", "col2": "Float64"})
rounded = round_up(df, decimal_places=0, col_names="col1")  # Rounds only col1, and changes its dtype to Int64

rounded2 = round_up(df, col_names={"col1": 1, "col2": 2})  # Rounds col1 to 1 decimal and col2 to 2 decimals
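Python's built-in `round` rounds ties to the nearest even number, which is why `round(2.5, 0)` gives 2.0. A minimal sketch of half-up rounding with the standard `decimal` module follows. `round_half_up` is a hypothetical helper for illustration, not the package's `round_up`, and it assumes that ties should be rounded away from zero.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Union


def round_half_up(value: float, decimal_places: int = 0) -> Union[int, float]:
    """Round so that ties (..5) go away from zero instead of to the nearest even digit."""
    quantum = Decimal(1).scaleb(-decimal_places)  # 1, 0.1, 0.01, ...
    # Going through str() uses the short repr of the float, so 1.15 is treated as exactly 1.15.
    rounded = Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)
    return int(rounded) if decimal_places == 0 else float(rounded)


print(round(2.5), round_half_up(2.5))  # 2 3
print(round(1.15, 1), round_half_up(1.15, 1))  # 1.1 1.2
```

Converting through `str` is a design choice: the binary float closest to 1.15 is slightly below 1.15, so rounding the raw float would give 1.1.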

Aggregation / Categories

Aggregate on all exclusive combinations of codes in certain columns, for example before sending data to Statbank (similar to PROC MEANS in SAS).

from fagfunksjoner import all_combos_agg


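# Codes used for the totals ("i alt") in each grouping column.
ialt_koder = {
    "skolefylk": "01-99",
    "almyrk": "00",
    "kjoenn_t": "0",
    "sluttkomp": "00",
}
kolonner = list(ialt_koder.keys())
# vgogjen is assumed to be an existing DataFrame with these columns and a value column "antall".
tab = all_combos_agg(vgogjen,
                     groupcols=kolonner,
                     aggargs={'antall': sum},
                     fillna_dict=ialt_koder)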
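What "all combinations" means can be sketched in plain Python: the data is summed once for every subset of the grouping columns, and the columns left out of a subset are filled with their total code. This is an illustration of the concept under simplified assumptions (rows as dictionaries, a single summed value column), not the package's implementation; `all_combos_sum` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations


def all_combos_sum(rows, groupcols, valuecol, total_codes):
    """Sum `valuecol` for every subset of `groupcols`, filling left-out columns with total codes."""
    result = defaultdict(int)
    for size in range(len(groupcols) + 1):
        for kept in combinations(groupcols, size):
            for row in rows:
                # Columns not in this subset are replaced by their total code.
                key = tuple(row[c] if c in kept else total_codes[c] for c in groupcols)
                result[key] += row[valuecol]
    return dict(result)


rows = [
    {"kjoenn": "1", "fylke": "03", "antall": 2},
    {"kjoenn": "2", "fylke": "03", "antall": 3},
    {"kjoenn": "1", "fylke": "46", "antall": 5},
]
totals = all_combos_sum(rows, ["kjoenn", "fylke"], "antall", {"kjoenn": "0", "fylke": "99"})
print(totals[("0", "99")])  # 10, the grand total
print(totals[("1", "99")])  # 7, kjoenn 1 across all fylke
```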

To aggregate on NON-EXCLUSIVE combinations of codes in certain columns, use the slightly less efficient all_combos_agg_inclusive:

from fagfunksjoner import all_combos_agg_inclusive


category_mappings = {
    "Alder": {
        "15-24": range(15, 25),
        "25-34": range(25, 35),
        "35-44": range(35, 45),
        "45-54": range(45, 55),
        "55-66": range(55, 67),
        "15-21": range(15, 22),
        "22-30": range(22, 31),
        "31-40": range(31, 41),
        "41-50": range(41, 51),
        "51-66": range(51, 67),
        "15-30": range(15, 31),
        "31-45": range(31, 46),
        "46-66": range(46, 67),
    },
    "syss_student": {
        "01": ["01", "02"],
        "02": ["03", "04"],
        "03": ["02"],
        "04": ["04"],
    },
    "Kjonn": {
        "Menn": ["1"],
        "Kvinner": ["2"],
    }
}

totalcodes = {
    "Alder": "Total",
    "syss_student": "Total",
    "Kjonn": "Begge"
}
all_combos_agg_inclusive(
    synthetic_data,
    groupcols = [],
    category_mappings=category_mappings,
    totalcodes=totalcodes,
    valuecols = ["n"],
    aggargs={"n": "sum"},
    grand_total=True)
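The difference from the exclusive case is that one observation can count towards several categories at once. A minimal sketch of that idea for a single column follows; `count_inclusive` is a hypothetical helper and not part of the package, and it only counts rows instead of aggregating value columns.

```python
from collections import Counter


def count_inclusive(values, mapping, total_code="Total"):
    """Count each value in every category whose members include it, plus a total."""
    counts = Counter()
    for value in values:
        for label, members in mapping.items():
            if value in members:  # a value may match several overlapping categories
                counts[label] += 1
        counts[total_code] += 1
    return counts


age_groups = {
    "15-24": range(15, 25),
    "15-21": range(15, 22),
    "22-30": range(22, 31),
    "15-30": range(15, 31),
}
counts = count_inclusive([16, 23, 30], age_groups)
print(counts["15-30"])  # 3, every age falls in this wide group
print(counts["15-21"])  # 1, only age 16
```

Because the categories overlap, the category counts add up to more than the total, which is why an ordinary group-by on a single recoded column cannot produce this table.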

"Formats" like in SAS

Perform mapping using SsbFormat, which behaves like a dictionary. It supports mapping ranges and an 'other' category, and detects different types of NaN values. It does not handle non-exclusive (overlapping) categories, so only use it for exclusive categories.

from fagfunksjoner import SsbFormat


age_frmt = {
    'low-18': '-18',
    '19-25': '19-25',
    '26-35': '26-35',
    '36-45': '36-45',
    '46-55': '46-55',
    '56-high': '56+',
    'other': 'missing'
}

# convert dictionary to SsbFormat
ssb_age_frmt = SsbFormat(age_frmt)

# perform mapping of age using ranges in format.
df['age_group'] = df['age'].map(ssb_age_frmt)

print(df['age_group'].value_counts())

# save format
from fagfunksjoner.formats import store_format


store_format(path+'format_name_p2025-02.json')

# or
# NB! After range mapping with SsbFormat, the dictionary will be long. You should save a short version, so inspect the dictionary before storing it.
ssb_age_frmt.store(path + 'format_name_p2025-02.json', force=True)

# read format/import format (dictionary saved as .json) as SsbFormat
from fagfunksjoner.formats import get_format


some_frmt = get_format(path+'format_name.json')
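How a dictionary can map ranges and fall back to an 'other' category can be sketched with a `dict` subclass that implements `__missing__`. This is an illustration of the technique only. `RangeFormat` is a hypothetical class, and the key syntax it parses ('low-18', '19-25', '56-high', 'other') is taken from the example above, without any claim that SsbFormat works the same way internally.

```python
import math


class RangeFormat(dict):
    """Dict that resolves unknown numeric keys through 'start-end' range keys."""

    @staticmethod
    def _parse(range_key):
        """Return (low, high) for keys like '19-25', 'low-18', '56-high', else None."""
        low, sep, high = str(range_key).partition("-")
        if not sep:
            return None
        try:
            lo = -math.inf if low == "low" else float(low)
            hi = math.inf if high == "high" else float(high)
        except ValueError:
            return None
        return lo, hi

    def __missing__(self, key):
        # Called by dict lookup only when `key` is not stored as-is.
        if isinstance(key, (int, float)) and not math.isnan(key):
            for range_key, label in self.items():
                bounds = self._parse(range_key)
                if bounds and bounds[0] <= key <= bounds[1]:
                    return label
        return self.get("other")  # None, NaN and unmatched values end up here


fmt = RangeFormat({"low-18": "-18", "19-25": "19-25", "56-high": "56+", "other": "missing"})
print(fmt[17], fmt[22], fmt[90], fmt[None])  # -18 19-25 56+ missing
```

Values that fall between two ranges, such as 18.5 here, also end up in 'other', so the ranges have to cover the data without gaps.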

Opening archive-files based on Datadok-api in prodsone

We have "flat files", which are not comma-separated and need metadata to be opened correctly. In SAS we do this with a "lastescript" (load script). There is an API to the old Datadok in prodsone, so these functions let you specify just a path and attempt to open the flat file directly into pandas, with the metadata also available.

from fagfunksjoner import open_path_datadok


archive_object = open_path_datadok("$TBF/project/arkiv/filename/g2022g2023")
# The object now has several important attributes
archive_object.df  # The Dataframe of the archived data
archive_object.metadata_df  # Dataframe representing metadata
archive_object.codelist_df  # Dataframe representing codelists
archive_object.codelist_dict  # Dict of codelists
archive_object.names  # Column names in the archived data
archive_object.datatypes  # The datatypes the archive data ended up with
archive_object.widths  # Width of each column in the flat file
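Why the metadata is needed can be seen in a small sketch: without delimiters, a line can only be split when the column names and widths are known. The example assumes a fixed-width layout and uses made-up names, widths, and data; `parse_fixed_width` is a hypothetical helper, not a function in the package.

```python
def parse_fixed_width(lines, names, widths):
    """Slice each line into named fields using the column widths."""
    records = []
    for line in lines:
        record, start = {}, 0
        for name, width in zip(names, widths):
            record[name] = line[start:start + width].strip()
            start += width
        records.append(record)
    return records


# Made-up example data: 4 characters for a code, 7 for a name, 2 for a count.
lines = ["0301Oslo   12", "4601Bergen  7"]
records = parse_fixed_width(lines, names=["komm", "navn", "antall"], widths=[4, 7, 2])
print(records[1])  # {'komm': '4601', 'navn': 'Bergen', 'antall': '7'}
```

All fields come out as strings, which is why datatypes are a separate piece of metadata.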

Operations on an Oracle database

Remember that credentials for the database should never be stored in the code. Consider using the python-dotenv package to make this easier.

Example of a normal select query where we do not expect too many records:

import os

import pandas as pd
from dotenv import load_dotenv

from fagfunksjoner.prodsone import Oracle


load_dotenv()

query = "select vare, pris from my_db_table"

ora = Oracle(pw=os.getenv("my-secret-password"),
             db=os.getenv("database-name"))

df = pd.DataFrame(ora.select(sql=query))

ora.close()

Example of a select query that may return many records:

import os

import pandas as pd
from dotenv import load_dotenv

from fagfunksjoner.prodsone import Oracle


load_dotenv()

query = "select vare, pris from my_db_table"

ora = Oracle(pw=os.getenv("my-secret-password"),
             db=os.getenv("database-name"))

df = pd.DataFrame(ora.selectmany(sql=query, batchsize=10000))

ora.close()
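The batching idea behind fetching many records can be tried out without an Oracle database. The sketch below uses the standard library's `sqlite3` as a stand-in, since it offers the same DB-API `fetchmany` call; the table contents and the helper `fetch_in_batches` are made up for illustration, and nothing here shows how `selectmany` is implemented.

```python
import sqlite3
from collections.abc import Iterator


def fetch_in_batches(cursor: sqlite3.Cursor, batchsize: int) -> Iterator[list[tuple]]:
    """Yield lists of at most `batchsize` rows until the cursor is exhausted."""
    while True:
        rows = cursor.fetchmany(batchsize)
        if not rows:
            break
        yield rows


conn = sqlite3.connect(":memory:")  # stand-in for the Oracle connection
conn.execute("create table my_db_table (vare text, pris integer)")
conn.executemany(
    "insert into my_db_table values (?, ?)",
    [(f"vare{i}", i) for i in range(25)],
)

cursor = conn.execute("select vare, pris from my_db_table")
batch_sizes = [len(batch) for batch in fetch_in_batches(cursor, batchsize=10)]
conn.close()
print(batch_sizes)  # [10, 10, 5]
```

Only one batch is held in memory at a time, which is the point of batching when the result set is large.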

Example of inserting new records into the database (note that the order of the columns in the SQL statement and in the data must match):

import os

import pandas as pd
from dotenv import load_dotenv

from fagfunksjoner.prodsone import Oracle


load_dotenv()

df = pd.DataFrame(
    {
        "vare": ["banan", "eple"],
        "pris": [11, 10]
    }
)

data = list(df.itertuples(index=False, name=None))

query = "insert into my_db_table(vare, pris) values(:vare, :pris)"

ora = Oracle(pw=os.getenv("my-secret-password"),
             db=os.getenv("database-name"))

ora.insert_or_update(sql=query, update=data)

ora.close()

Example of updating records in the database (note that the order of the columns in the SQL statement and in the data must match). It is also important that the query does not update records other than those intended, so having some kind of ID on the records is very useful:

import os
import pandas as pd
from dotenv import load_dotenv
from fagfunksjoner.prodsone import Oracle
load_dotenv()

df = pd.DataFrame(
    {
        "id": ["12345", "54321"],
        "vare": ["banan", "eple"],
        "pris": [11, 10]
    }
)

data = list(df[["vare", "pris", "id"]].itertuples(index=False, name=None))

query = "update my_db_table set vare = :vare, pris = :pris where id = :id"

ora = Oracle(pw=os.getenv("my-secret-password"),
             db=os.getenv("database-name"))

ora.insert_or_update(sql=query, update=data)

ora.close()
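The warning about column order can be made concrete with a small sketch. It uses the standard library's `sqlite3` as a stand-in for Oracle, with positional `?` placeholders in place of the named bind variables above; the table and records are made up. The point is only that each data tuple is built in the same order as the placeholders appear in the statement.

```python
import sqlite3

records = [
    {"id": "12345", "vare": "banan", "pris": 11},
    {"id": "54321", "vare": "eple", "pris": 10},
]

conn = sqlite3.connect(":memory:")  # stand-in for the Oracle connection
conn.execute("create table my_db_table (id text primary key, vare text, pris integer)")
conn.executemany(
    "insert into my_db_table (id, vare, pris) values (?, ?, ?)",
    [(r["id"], "ukjent", 0) for r in records],  # placeholder values to be updated
)

# The placeholders appear in the order vare, pris, id, so each tuple must too.
bind_order = ["vare", "pris", "id"]
data = [tuple(record[col] for col in bind_order) for record in records]
conn.executemany("update my_db_table set vare = ?, pris = ? where id = ?", data)

updated = conn.execute("select id, vare, pris from my_db_table order by id").fetchall()
conn.close()
print(updated)  # [('12345', 'banan', 11), ('54321', 'eple', 10)]
```

Keeping the order in one list (`bind_order`) next to the statement makes a mismatch easier to spot than reordering columns in several places.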

The Oracle class also supports use as a context manager. This is handy when working with large data that must be processed lazily, or when you want to run several operations against multiple tables without closing the connection. A simple case is reading a large amount of data from the database and writing it to a parquet file in batches:

import os

import pyarrow as pa
import pyarrow.parquet as pq
from dotenv import load_dotenv

from fagfunksjoner.prodsone import Oracle, OraError


load_dotenv()

select_query = "select vare, pris from my_db_table"
parquet_write_path = "write/to/path/datafile.parquet"
# ParquetWriter requires a schema. The column types below are assumptions, adjust them to your table.
schema = pa.schema([("vare", pa.string()), ("pris", pa.int64())])

with pq.ParquetWriter(parquet_write_path, schema) as pqwriter:
    try:
        # the context manager goes straight to a cursor
        with Oracle(pw=os.getenv("my-secret-password"),
                    db=os.getenv("database-name")) as cur:
            cur.execute(select_query)
            cols = [c[0].lower() for c in cur.description]
            while True:
                rows = cur.fetchmany(10_000)  # 10,000 rows per batch
                if not rows:
                    break
                data = [dict(zip(cols, row)) for row in rows]
                tab = pa.Table.from_pylist(data, schema=schema)
                # each batch is written as its own row group
                pqwriter.write_table(tab)
    except OraError as error:
        raise error

Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Distributed under the terms of the MIT license, SSB Fagfunksjoner is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

This project was generated from Statistics Norway's SSB PyPI Template.
