A set of static assets used (mainly) for ARCHE data preprocessing
Arche Assets
Set of static assets used (mainly) for ARCHE data preprocessing or ARCHE information pages:
- URI normalization rules used within the ACDH-CH (stored in AcdhArcheAssets/uriNormRules.json).
- Description of input data formats accepted by ARCHE (stored in AcdhArcheAssets/formats.json).
The repository also provides Python 3 and PHP bindings for accessing these assets.
Installation & usage
Python
- Install using pip3:
pip3 install acdh-arche-assets
- Use with
    from AcdhArcheAssets.uri_norm_rules import get_rules, get_normalized_uri, get_norm_id

    print(f"{get_rules()}")

    wrong_id = "http://sws.geonames.org/1232324343/linz.html"
    good_id = get_normalized_uri(wrong_id)
    print(good_id)
    # "https://sws.geonames.org/1232324343/"

    # extract ID from URL
    norm_id = get_norm_id("http://sws.geonames.org/1232324343/linz.html")
    print(norm_id)
    # "1232324343"

    from AcdhArcheAssets.file_formats import get_formats, get_by_mtype, get_by_extension

    formats = get_formats()
    matching_mapping = get_by_mtype('image/png')
    matching_mapping = get_by_extension('png')
PHP
- Install using composer:
composer require acdh-oeaw/arche-assets
- Usage with
    require_once 'vendor/autoload.php';

    // URI normalization rules
    print_r(acdhOeaw\UriNormRules::getRules());
    print_r(acdhOeaw\UriNormRules::getRules(['viaf', 'gnd']));

    // ARCHE file formats
    print_r(acdhOeaw\ArcheFileFormats::getAll());
    print_r(acdhOeaw\ArcheFileFormats::getByMime('application/json'));
    print_r(acdhOeaw\ArcheFileFormats::getByExtension('json'));
Description of assets
URI normalization rules
Each rule consists of five properties:
- name: a rule name
- match: a regular expression matching a given URI namespace
- replace: a regular expression replacement expression normalizing a URI in a given namespace
- resolve: a regular expression replacement expression transforming a URI in a given namespace into a URL that fetches RDF data
- format: an RDF serialization format to be requested while resolving the URL produced using the resolve field
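To show how the match, replace, resolve and format fields could work together, here is a minimal Python sketch that applies a single rule with the standard re module. The rule is hypothetical: its name, its patterns and the about.rdf resolve target are illustrative and not copied from uriNormRules.json. The replacement strings use Python's \1 backreference syntax (the real file may use a different syntax), and the format value is assumed to be a MIME type that can be sent in an HTTP Accept header.

```python
import re
import urllib.request
from typing import Optional

# Hypothetical rule with the five fields described above; the values are
# illustrative and NOT taken from uriNormRules.json.
RULE = {
    "name": "geonames-example",
    "match": r"^https?://(?:www\.)?sws\.geonames\.org/([0-9]+)(?:/.*)?$",
    "replace": r"https://sws.geonames.org/\1/",
    "resolve": r"https://sws.geonames.org/\1/about.rdf",
    # Assumption: the format is expressed as a MIME type.
    "format": "application/rdf+xml",
}


def normalize(uri: str, rule: dict) -> Optional[str]:
    """Rewrite uri with the rule's replace expression; None if it does not match."""
    if re.match(rule["match"], uri) is None:
        return None
    return re.sub(rule["match"], rule["replace"], uri)


def resolve_request(uri: str, rule: dict) -> Optional[urllib.request.Request]:
    """Build (but do not send) a request for the RDF data behind uri."""
    if re.match(rule["match"], uri) is None:
        return None
    url = re.sub(rule["match"], rule["resolve"], uri)
    # Ask the server for the serialization named in the format field.
    return urllib.request.Request(url, headers={"Accept": rule["format"]})


if __name__ == "__main__":
    uri = "http://sws.geonames.org/1232324343/linz.html"
    print(normalize(uri, RULE))
    req = resolve_request(uri, RULE)
    print(req.full_url, req.get_header("Accept"))
```

Because the match pattern is checked first, URIs from other namespaces are left alone, so a caller can try each rule in turn until one matches.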
Formats
A curated and growing list of file extensions. For each file extension, mappings to the respective ARCHE Resource Type Category (stored in acdh:hasCategory) and Media Type (MIME type) (stored in acdh:hasFormat) are given. The indicated Media Type should only be used as a fallback; it is best practice to rely on automated Media Type detection based on file signatures.
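Following that best practice, a lookup could try file signatures first and consult the extension mapping only when no signature matches. The sketch below hard-codes a hypothetical three-entry mapping and two well-known signatures (PNG and PDF); a real pipeline would read the mapping from formats.json (or through the bindings) and use a dedicated detection tool rather than this tiny table.

```python
from typing import Optional

# Hypothetical excerpt of an extension -> Media Type mapping; the real
# formats.json entries carry more fields and may be shaped differently.
EXTENSION_FALLBACK = {
    "png": "image/png",
    "pdf": "application/pdf",
    "json": "application/json",
}

# A few well-known file signatures (magic numbers) and their Media Types.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
]


def detect_media_type(filename: str, head: bytes) -> Optional[str]:
    """Prefer signature-based detection; fall back to the extension mapping."""
    for magic, mime in SIGNATURES:
        if head.startswith(magic):
            return mime
    # No signature matched: use the file extension as a fallback only.
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return EXTENSION_FALLBACK.get(ext)
```

For example, a file named scan.pdf whose first bytes are a PNG signature is reported as image/png, since the signature takes precedence over the misleading extension.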
Further information is provided as well. Each entry has the following fields:
- fileExtension: File extension to be mapped.
- name: Name(s) by which the format is known.
- archeCategory: The corresponding URI of the ARCHE Resource Type Category Vocabulary
- dataType: A broad category to group formats in; mainly intended for visualisation purposes.
- pronomID: ID(s) assigned by PRONOM
- mimeType: Official Media Type(s) (formerly known as MIME types) registered at IANA.
- informalMimeType: Other MIME types known for the format.
- magicNumber: A constant numerical or text value used to identify a file format (see e.g. Wikipedia's list of file signatures).
- ianaTemplate: Link to template at IANA
- reference: Link(s) to format specifications referenced by IANA and others
- longTerm: Indicates if a format is suitable for long-term preservation.
  Possible values and their meaning:
  - yes - long-term format
  - no - not suitable, another format should be used
  - restricted - can be used for long-term preservation in some cases (see comment)
  - unsure - status remains to be evaluated
- archeDocs: Link to a place with more information for the format.
- comment: Any other noteworthy information not stated elsewhere.
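As an illustration of how these fields might be consumed, the sketch below models an entry as a Python dataclass and groups file extensions by their longTerm status, rejecting any value outside the four listed above. The field types (for example name and mimeType as lists) and the sample entries, including their longTerm statuses, are assumptions made for the example, not data taken from formats.json.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

LONG_TERM_VALUES = {"yes", "no", "restricted", "unsure"}


@dataclass
class FormatEntry:
    # Field names follow the list above; the types are assumptions.
    fileExtension: str
    name: List[str]
    archeCategory: str
    mimeType: List[str] = field(default_factory=list)
    longTerm: str = "unsure"  # one of LONG_TERM_VALUES
    comment: Optional[str] = None


def group_by_long_term(entries: List[FormatEntry]) -> Dict[str, List[str]]:
    """Map each longTerm value to the extensions that carry it."""
    groups: Dict[str, List[str]] = {value: [] for value in LONG_TERM_VALUES}
    for entry in entries:
        if entry.longTerm not in groups:
            raise ValueError(
                f"unexpected longTerm value {entry.longTerm!r} for .{entry.fileExtension}"
            )
        groups[entry.longTerm].append(entry.fileExtension)
    return groups


# Illustrative sample; category URIs are placeholders and the longTerm
# statuses are made up (check formats.json for the real classifications).
SAMPLE = [
    FormatEntry("csv", ["Comma-separated values"],
                "https://example.org/category/dataset", ["text/csv"], "yes"),
    FormatEntry("xlsx", ["Office Open XML Workbook"],
                "https://example.org/category/dataset",
                ["application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"],
                "restricted", "illustrative comment"),
    FormatEntry("psd", ["Adobe Photoshop Document"],
                "https://example.org/category/image",
                ["image/vnd.adobe.photoshop"], "no"),
]

if __name__ == "__main__":
    print(group_by_long_term(SAMPLE))
```

Entries marked restricted would still need a look at their comment field before deciding, which is why the sketch keeps them in a separate group instead of merging them with yes.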
Development (Python)
- Install the needed development packages:
pip install -r requirements_dev.txt
Linting, tests and test coverage
- to run the tests:
tox
- check coverage and create a report:
coverage run setup.py test
and then
coverage html
- check linting
flake8
Download files
File details
Details for the file acdh_arche_assets-3.21.0.tar.gz.
File metadata
- Download URL: acdh_arche_assets-3.21.0.tar.gz
- Upload date:
- Size: 10.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1e950f7063df508bfbc9b4edd7e7d235ace31f9bae0fc67ec2414eb970c43a48
MD5 | b0b3ef72c98935ebd6c687a1acffdf88
BLAKE2b-256 | 4468d853be9fc5efc9e897e6a10d2d197ea7b55fe2581ff480c3c9698903a666
File details
Details for the file acdh_arche_assets-3.21.0-py3-none-any.whl.
File metadata
- Download URL: acdh_arche_assets-3.21.0-py3-none-any.whl
- Upload date:
- Size: 12.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8b48cc76a45bc0c4997156c5c6c528b82c82b42bf7ea50cf57fdc421d8d87f7a
MD5 | 265bfda9d82f71d3bf685ca4037283ee
BLAKE2b-256 | b302499123b45f244ed86dbd0b0fa538af7ebee1f452bdf33bb223da6ecfdd61