
A set of static assets used (mainly) for ARCHE data preprocessing

Project description

Arche Assets


A set of static assets used (mainly) for ARCHE data preprocessing and for ARCHE information pages:

  • URI normalization rules used by the ACDH.
    (stored in AcdhArcheAssets/uriNormRules.json)
  • Description of input data formats accepted by ARCHE.
    (stored in AcdhArcheAssets/formats.json)

The repository also provides Python 3 and PHP bindings for accessing these assets.

Installation & usage

Python

  • Install using pip3:
    pip3 install acdh-arche-assets
    
  • Use with
    from AcdhArcheAssets.uri_norm_rules import get_rules, get_normalized_uri, get_norm_id
    print(get_rules())
    
    wrong_id = "http://sws.geonames.org/1232324343/linz.html"
    
    good_id = get_normalized_uri(wrong_id)
    print(good_id)
    # "https://sws.geonames.org/1232324343/"
    
    # extract ID from URL
    norm_id = get_norm_id("http://sws.geonames.org/1232324343/linz.html")
    print(norm_id)
    # "1232324343"
    
    
    from AcdhArcheAssets.file_formats import get_formats, get_by_mtype, get_by_extension
    
    formats = get_formats()
    matching_mapping = get_by_mtype('image/png')
    matching_mapping = get_by_extension('png')
    

PHP

  • Install using Composer:
    composer require acdh-oeaw/arche-assets
    
  • Use with
    require_once 'vendor/autoload.php';
    
    print_r(acdhOeaw\UriNormRules::getRules());
    print_r(acdhOeaw\UriNormRules::getRules(['viaf', 'gnd']));
    
    print_r(acdhOeaw\ArcheFileFormats::getAll());
    print_r(acdhOeaw\ArcheFileFormats::getByMime('application/json'));
    print_r(acdhOeaw\ArcheFileFormats::getByExtension('json'));
    

Description of assets

URI normalization rules

Each rule consists of five properties:

  • name: a rule name
  • match: a regular expression matching the rule's URI namespace
  • replace: a regex-replace expression transforming a URI in the rule's namespace into its ACDH-canonical form
  • resolve: a regex-replace expression transforming a URI in the rule's namespace into a URL from which RDF data can be fetched
  • format: the RDF serialization format to request when resolving the URL produced by the resolve expression
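To illustrate how the match, replace and resolve properties work together, here is a minimal Python sketch using the standard `re` module. The rule below is hypothetical: its name, patterns and resolve URL are illustrative and not copied from AcdhArcheAssets/uriNormRules.json, which may use a different regex dialect (for example `$1` instead of Python's `\1` for backreferences). The helpers `apply_rule` and `normalize` are also hypothetical and are not part of the package API.

```python
import re
from typing import Optional

# Hypothetical rule shaped after the five properties described above.
# Patterns and URLs are illustrative only, not taken from uriNormRules.json.
EXAMPLE_RULE = {
    "name": "example-geonames",
    "match": r"^https?://(?:www\.)?sws\.geonames\.org/([0-9]+)(?:/.*)?$",
    "replace": r"https://sws.geonames.org/\1/",
    "resolve": r"https://sws.geonames.org/\1/about.rdf",
    "format": "application/rdf+xml",
}


def apply_rule(uri: str, rule: dict, field: str = "replace") -> Optional[str]:
    """Return the URI transformed by rule[field], or None if the rule's match does not apply."""
    pattern = re.compile(rule["match"])
    if not pattern.search(uri):
        return None
    # The match pattern is anchored, so one substitution rewrites the whole URI.
    return pattern.sub(rule[field], uri, count=1)


def normalize(uri: str, rules: list) -> str:
    """Apply the first matching rule's 'replace' expression; unmatched URIs are returned unchanged."""
    for rule in rules:
        result = apply_rule(uri, rule)
        if result is not None:
            return result
    return uri


print(normalize("http://sws.geonames.org/1232324343/linz.html", [EXAMPLE_RULE]))
print(apply_rule("http://sws.geonames.org/1232324343/linz.html", EXAMPLE_RULE, "resolve"))
```

With this made-up rule, the first call prints `https://sws.geonames.org/1232324343/` and the second prints the resolve URL, which a client would then fetch while requesting the serialization named in `format`. Taking the first matching rule is a simplifying assumption of this sketch.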

Formats

A curated and growing list of file extensions. For each file extension, mappings to the corresponding ARCHE Resource Type Category (stored in :hasCategory) and Media Type (MIME type, stored in :hasFormat) are given. The indicated Media Type should only be used as a fallback; best practice is to rely on automated Media Type detection based on file signatures.

Each entry also provides further information:

  • fileExtension: File extension to be mapped.
  • name: Name(s) by which the format is known
  • archeCategory: The corresponding URI of the ARCHE Resource Type Category Vocabulary
  • dataType: A broad category to group formats in; mainly intended for visualisation purposes.
  • pronomID: ID(s) assigned by PRONOM
  • mimeType: Official Media Type(s) (formerly known as MIME types) registered at IANA.
  • informalMimeType: Other MIME types known for the format
  • magicNumber: A constant numerical or text value used to identify a file format (see e.g. Wikipedia's list of file signatures)
  • ianaTemplate: Link to template at IANA
  • reference: Link(s) to format specifications referenced by IANA and others
  • longTerm: Indicates if a format is suitable for long-term preservation.
    Possible values and their meaning
    • yes - long-term format
    • no - not suitable, another format should be used
    • restricted - can be used for long-term preservation in some cases (see comment)
    • unsure - status remains to be evaluated
  • archeDocs: Link to a place with more information for the format.
  • comment: Any other noteworthy information not stated elsewhere.
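The fallback rule stated above (prefer a Media Type detected from the file signature, use the extension mapping only when detection yields nothing) can be sketched in Python as follows. The entries and the helpers `find_by_extension` and `choose_media_type` are hypothetical: they use only a subset of the fields listed above, their values are illustrative rather than copied from AcdhArcheAssets/formats.json, and the real file may store fields such as fileExtension or mimeType differently.

```python
from typing import Optional

# Hypothetical entries using a subset of the fields described above.
# Values are illustrative, not copied from formats.json.
EXAMPLE_FORMATS = [
    {"fileExtension": "png", "name": ["Portable Network Graphics"], "mimeType": ["image/png"]},
    {"fileExtension": "csv", "name": ["Comma-Separated Values"], "mimeType": ["text/csv"]},
]


def find_by_extension(formats: list, ext: str) -> Optional[dict]:
    """Case-insensitive lookup by extension; a leading dot is ignored."""
    ext = ext.lower().lstrip(".")
    for entry in formats:
        if entry["fileExtension"].lower() == ext:
            return entry
    return None


def choose_media_type(filename: str, detected: Optional[str], formats: list) -> Optional[str]:
    """Prefer a signature-based detection result; fall back to the extension mapping."""
    if detected:
        return detected
    _, dot, ext = filename.rpartition(".")
    entry = find_by_extension(formats, ext) if dot else None
    if entry and entry.get("mimeType"):
        return entry["mimeType"][0]
    return None


print(choose_media_type("scan.PNG", None, EXAMPLE_FORMATS))
print(choose_media_type("scan.png", "image/jpeg", EXAMPLE_FORMATS))
```

Here `detected` stands in for the output of whatever signature-based detection tool is used; when it is present the mapping is ignored, which is why the second call keeps `image/jpeg` even though the extension suggests PNG.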

Development (Python)

Install the required development packages: pip install -r requirements_dev.txt

Linting, tests and test coverage

  • run the tests: tox
  • check coverage and create a report: coverage run setup.py test, then coverage html
  • check linting: flake8

Download files


Source Distribution

acdh_arche_assets-3.30.1.tar.gz (11.2 kB)

Uploaded Source

Built Distribution


acdh_arche_assets-3.30.1-py3-none-any.whl (12.7 kB)

Uploaded Python 3

File details

Details for the file acdh_arche_assets-3.30.1.tar.gz.

File metadata

  • Download URL: acdh_arche_assets-3.30.1.tar.gz
  • Upload date:
  • Size: 11.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for acdh_arche_assets-3.30.1.tar.gz
Algorithm Hash digest
SHA256 5f0663d944286534103bb54c446f6a023be970f17e2cb7f97e4f278bbd5b6e53
MD5 b6f1e6b27ea4ac833561170a00845c6b
BLAKE2b-256 0e08153496d4efe6d1f7d000cdb7d71f3bde3e3bd340114bd5b48599e60f0835


File details

Details for the file acdh_arche_assets-3.30.1-py3-none-any.whl.

File metadata

File hashes

Hashes for acdh_arche_assets-3.30.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1451af84327a4e4eb75be180babcb6dfb70c5f8bdc90541bc16239e0b6d27bdc
MD5 5ff908f1be66314aaf7e9a984beab7bf
BLAKE2b-256 9b0049eb88e52c1226cea2e7b69d1e3a41b3bc81ae3f1d28c7e26e16c6cbe119

