Tools for the microdata.no platform
microdata-tools
Installation
microdata-tools can be installed from PyPI using pip:
pip install microdata-tools
Usage
Once you have your metadata and data files ready to go, they should be named and stored like this:
my-input-directory/
    MY_DATASET_NAME/
        MY_DATASET_NAME.csv
        MY_DATASET_NAME.json
The CSV file is optional in some cases.
Package dataset
The package_dataset() function encrypts and packages your dataset as a tar archive. The process is as follows:
- Generate the symmetric key for a dataset.
- Encrypt the dataset data (CSV) using the symmetric key and store the encrypted file as <DATASET_NAME>.csv.encr
- Encrypt the symmetric key using the asymmetric RSA public key microdata_public_key.pem and store the encrypted file as <DATASET_NAME>.symkey.encr
- Gather the encrypted CSV, encrypted symmetric key and metadata (JSON) file in one tar file.
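The final gathering step can be illustrated with the standard library alone. This is a minimal sketch, not the package's actual implementation: the helper name gather_into_tar is hypothetical, the archive is assumed to be an uncompressed tar, and placeholder bytes stand in for the real encrypted files.

```python
import tarfile
import tempfile
from pathlib import Path


def gather_into_tar(dataset_dir: Path, output_dir: Path) -> Path:
    """Collect the packaged files for one dataset into <DATASET_NAME>.tar.

    Hypothetical helper for illustration; not part of microdata-tools.
    """
    name = dataset_dir.name
    tar_path = output_dir / f"{name}.tar"
    # The metadata file is expected; the encrypted files are added if present.
    candidates = [f"{name}.json", f"{name}.csv.encr", f"{name}.symkey.encr"]
    with tarfile.open(tar_path, "w") as tar:
        for filename in candidates:
            file_path = dataset_dir / filename
            if file_path.exists():
                tar.add(file_path, arcname=filename)
    return tar_path


# Demo: placeholder bytes stand in for real encrypted content.
with tempfile.TemporaryDirectory() as tmp:
    dataset_dir = Path(tmp) / "MY_DATASET_NAME"
    dataset_dir.mkdir()
    (dataset_dir / "MY_DATASET_NAME.json").write_text("{}")
    (dataset_dir / "MY_DATASET_NAME.csv.encr").write_bytes(b"placeholder")
    (dataset_dir / "MY_DATASET_NAME.symkey.encr").write_bytes(b"placeholder")
    tar_path = gather_into_tar(dataset_dir, Path(tmp))
    with tarfile.open(tar_path) as tar:
        packaged_names = sorted(tar.getnames())
    print(packaged_names)
```

Storing each member under its bare filename (arcname) keeps the archive flat, which matches the contents listed for the packaged file.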
Unpackage dataset
The unpackage_dataset() function will untar and decrypt your dataset using the microdata_private_key.pem RSA private key.
The packaged file must have the <DATASET_NAME>.tar extension. Its contents should be as follows:
- <DATASET_NAME>.json: Required metadata file.
- <DATASET_NAME>.csv.encr: Optional encrypted dataset file.
- <DATASET_NAME>.symkey.encr: Optional encrypted file containing the symmetric key used to decrypt the dataset file. Required if the .csv.encr file is present.
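The content rules above can be expressed as a small check. This is a sketch under stated assumptions: check_package_contents is a hypothetical helper, not a function provided by microdata-tools, and it only inspects member names, not file contents.

```python
def check_package_contents(dataset_name: str, member_names: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the layout looks right.

    Hypothetical helper for illustration; not part of microdata-tools.
    """
    members = set(member_names)
    problems = []
    # The metadata file is always required.
    if f"{dataset_name}.json" not in members:
        problems.append("missing required metadata file")
    # The symmetric key is required only when encrypted data is present.
    has_data = f"{dataset_name}.csv.encr" in members
    has_key = f"{dataset_name}.symkey.encr" in members
    if has_data and not has_key:
        problems.append("encrypted data file present without its symmetric key")
    return problems


print(check_package_contents("MY_DATASET_NAME", ["MY_DATASET_NAME.json"]))
print(check_package_contents("MY_DATASET_NAME", ["MY_DATASET_NAME.csv.encr"]))
```

The member names could be obtained from a real archive with tarfile.open(path).getnames().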
Decryption uses the RSA private key located at RSA_KEY_DIR.
The packaged file is then stored in output_dir/archive/unpackaged after a successful run or output_dir/archive/failed after an unsuccessful run.
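The archive layout can be pictured with a short sketch. It assumes the packaged file is moved (rather than copied) into the archive directory, which the text above does not state; archive_packaged_file is a hypothetical helper, not part of microdata-tools.

```python
import shutil
import tempfile
from pathlib import Path


def archive_packaged_file(packaged_file: Path, output_dir: Path, succeeded: bool) -> Path:
    """Place the packaged file under archive/unpackaged or archive/failed.

    Hypothetical helper; moving the file is an assumption of this sketch.
    """
    subdir = "unpackaged" if succeeded else "failed"
    target_dir = output_dir / "archive" / subdir
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / packaged_file.name
    shutil.move(str(packaged_file), str(target))
    return target


# Demo with an empty placeholder file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    output_dir = Path(tmp) / "output"
    packaged = Path(tmp) / "MY_DATASET_NAME.tar"
    packaged.write_bytes(b"")
    archived = archive_packaged_file(packaged, output_dir, succeeded=True)
    relative_location = archived.relative_to(output_dir).as_posix()
    archived_exists = archived.exists()
    print(relative_location)
```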
Example
Python script that uses an RSA public key named microdata_public_key.pem and packages a dataset:
from pathlib import Path

from microdata_tools import package_dataset

RSA_KEYS_DIRECTORY = Path("tests/resources/rsa_keys")
DATASET_DIRECTORY = Path("tests/resources/input_package/DATASET_1")
OUTPUT_DIRECTORY = Path("tests/resources/output")

package_dataset(
    rsa_keys_dir=RSA_KEYS_DIRECTORY,
    dataset_dir=DATASET_DIRECTORY,
    output_dir=OUTPUT_DIRECTORY,
)
Validation
Once you have your metadata and data files ready to go, they should be named and stored like this:
my-input-directory/
    MY_DATASET_NAME/
        MY_DATASET_NAME.csv
        MY_DATASET_NAME.json
Note that filenames may contain only uppercase letters A-Z, digits 0-9 and underscores.
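The naming rule can be checked before running validation. This is a minimal sketch, assuming the rule applies to the dataset name without its file extension; is_valid_dataset_name is a hypothetical helper and not part of microdata-tools, whose own checks may be stricter.

```python
import re

# Uppercase A-Z, digits 0-9 and underscores only, at least one character.
DATASET_NAME_PATTERN = re.compile(r"[A-Z0-9_]+")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if the whole name matches the allowed character set."""
    return DATASET_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["MY_DATASET_NAME", "my_dataset_name", "MY-DATASET"]:
    print(candidate, is_valid_dataset_name(candidate))
```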
Import microdata-tools in your script and validate your files:
from microdata_tools import validate_dataset

validation_errors = validate_dataset(
    "MY_DATASET_NAME",
    input_directory="path/to/my-input-directory"
)

if not validation_errors:
    print("My dataset is valid")
else:
    print("Dataset is invalid :(")
    # You can print your errors like this:
    for error in validation_errors:
        print(error)
For a more in-depth explanation of usage, visit the usage documentation.
Data format description
A dataset as defined in microdata consists of one data file and one metadata file.
The data file is a CSV file with semicolon-separated values. A valid example would be:
000000000000001;123;2020-01-01;2020-12-31;
000000000000002;123;2020-01-01;2020-12-31;
000000000000003;123;2020-01-01;2020-12-31;
000000000000004;123;2020-01-01;2020-12-31;
Read more about the data format and columns in the documentation.
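To see how the sample rows split into fields, they can be read with Python's csv module. This is only an illustration of the semicolon delimiter. The meaning of each column is described in the documentation and is not assumed here. Note that the trailing semicolon yields an empty final field.

```python
import csv
import io

# The example rows from above, copied verbatim.
SAMPLE = """\
000000000000001;123;2020-01-01;2020-12-31;
000000000000002;123;2020-01-01;2020-12-31;
000000000000003;123;2020-01-01;2020-12-31;
000000000000004;123;2020-01-01;2020-12-31;
"""

# Parse with a semicolon delimiter; each row becomes a list of strings.
rows = list(csv.reader(io.StringIO(SAMPLE), delimiter=";"))
for row in rows:
    print(row)
```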
The metadata files should be in JSON format. The requirements for the metadata are best described through the Pydantic model, the examples, and the metadata model.