arcp (Archive and Package) URI parser and generator

Create/parse arcp (Archive and Package) URIs.

Introduction

arcp provides functions for creating arcp URIs, which can be used for identifying or parsing hypermedia files packaged in an archive or package, like a ZIP file.

arcp URIs can be used to consume or reference hypermedia resources bundled inside a file archive or an application package, as well as to resolve URIs for archive resources within a programmatic framework.

This URI scheme provides mechanisms to generate a unique base URI to represent the root of the archive, so that relative URI references in a bundled resource can be resolved within the archive without having to extract the archive content on the local file system.

An arcp URI can be used for purposes of isolation (e.g. when consuming multiple archives), security constraints (avoiding “climb out” from the archive), or for externally identifying sub-resources referenced by hypermedia formats.

Examples:
  • arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/doc.html

  • arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/pics/

  • arcp://ni,sha-256;F-34D4TUeOfG0selz7REKRDo4XePkewPeQYtjL3vQs0/

  • arcp://name,gallery.example.org/

Which form of URI authority to use in an arcp URI depends on the uniqueness constraints that apply when addressing an archive. See the arcp specification (draft-soilandreyes-arcp) for details.
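As a rough illustration of how the authority carries the prefix, the following minimal sketch uses only the Python standard library to split the example URIs above into a prefix and a name. The helper name split_authority is hypothetical and not part of this library; it only shows the shape of the authority and does no validation of the name.

```python
from urllib.parse import urlsplit


def split_authority(uri):
    """Hypothetical helper: split an arcp authority into (prefix, name)."""
    parts = urlsplit(uri)
    if parts.scheme != "arcp":
        raise ValueError("not an arcp URI: %r" % uri)
    # The authority has the form "prefix,name"; split on the first comma only.
    prefix, _, name = parts.netloc.partition(",")
    return prefix, name


examples = [
    "arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/doc.html",
    "arcp://ni,sha-256;F-34D4TUeOfG0selz7REKRDo4XePkewPeQYtjL3vQs0/",
    "arcp://name,gallery.example.org/",
]
for uri in examples:
    print(split_authority(uri))
```

The prefix (uuid, ni or name) indicates which kind of identifier the rest of the authority holds.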

Note that this library only provides mechanisms to generate and parse arcp URIs, and does not integrate with any particular archive or URL handling modules like zipfile or urllib.request.

License

© 2018-2020 Stian Soiland-Reyes <https://orcid.org/0000-0001-9842-9718>, The University of Manchester, UK

Licensed under the Apache License, version 2.0 <https://www.apache.org/licenses/LICENSE-2.0>, see the file LICENSE.txt for details.

Contribute

Source code: <https://github.com/stain/arcp-py>

Feel free to raise a pull request at <https://github.com/stain/arcp-py/pulls> or an issue at <https://github.com/stain/arcp-py/issues>.

Submitted contributions are assumed to be covered by section 5 of the Apache License 2.0.

Installing

You will need Python 2.7, or Python 3.4 or later (recommended: 3.6).

If you have pip, then the easiest option is normally to install from <https://pypi.org/project/arcp/> using:

pip install arcp

If you want to install manually from this code base, then try:

python setup.py install

Usage

For full documentation, see <https://arcp.readthedocs.io/> or use help(arcp).

This module provides functions for creating arcp URIs, which can be used for identifying or parsing hypermedia files packaged in an archive or package, like a ZIP file:

>>> from arcp import *
>>> arcp_random()
'arcp://uuid,dcd6b1e8-b3a2-43c9-930b-0119cf0dc538/'
>>> arcp_random("/foaf.ttl", fragment="me")
'arcp://uuid,dcd6b1e8-b3a2-43c9-930b-0119cf0dc538/foaf.ttl#me'
>>> arcp_hash(b"Hello World!", "/folder/")
'arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/folder/'
>>> arcp_location("http://example.com/data.zip", "/file.txt")
'arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/file.txt'
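To make the hash-based and location-based forms more concrete, here is a minimal standard-library sketch of how such identifiers could be derived. It is not the library's implementation. The function names sketch_arcp_hash and sketch_arcp_location are hypothetical, and the use of a version 5 UUID in the URL namespace for locations is an assumption, suggested by the version 5 UUID in the examples on this page.

```python
import base64
import hashlib
import uuid


def sketch_arcp_hash(data, path="/"):
    """Hypothetical: build an ni-style arcp URI from the SHA-256 of data."""
    digest = hashlib.sha256(data).digest()
    # URL-safe base64 with the trailing "=" padding removed.
    b64 = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return "arcp://ni,sha-256;%s%s" % (b64, path)


def sketch_arcp_location(location, path="/"):
    """Hypothetical: build a uuid-style arcp URI from an archive location."""
    # Assumption: a name-based (version 5) UUID in the URL namespace.
    archive_id = uuid.uuid5(uuid.NAMESPACE_URL, location)
    return "arcp://uuid,%s%s" % (archive_id, path)


print(sketch_arcp_hash(b"Hello World!", "/folder/"))
print(sketch_arcp_location("http://example.com/data.zip", "/file.txt"))
```

Both forms are deterministic: the same bytes or the same location always give the same base URI, whereas arcp_random() gives a fresh UUID on each call.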

arcp URIs can be used with urllib.parse, for instance using urljoin to resolve relative references:

>>> import urllib.parse
>>> import arcp
>>> css = arcp.arcp_name("app.example.com", "css/style.css")
>>> urllib.parse.urljoin(css, "../fonts/foo.woff")
'arcp://name,app.example.com/fonts/foo.woff'
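For readers curious why urljoin needs to know about the scheme, the sketch below uses only the standard library (Python 3.5 or later). It registers arcp in the module-level lists that urllib.parse consults when resolving relative references. Whether this library does exactly this on import is an assumption, and the registration changes global state for the whole process.

```python
import urllib.parse

# Assumption: without this registration, urljoin treats "arcp" as an unknown
# scheme and returns the relative reference unchanged.
for registry in (urllib.parse.uses_relative, urllib.parse.uses_netloc):
    if "arcp" not in registry:
        registry.append("arcp")

base = "arcp://name,app.example.com/css/style.css"

# A normal relative reference resolves against the archive root.
resolved = urllib.parse.urljoin(base, "../fonts/foo.woff")

# Excess ".." segments cannot escape the authority of the base URI.
clamped = urllib.parse.urljoin(base, "../../../etc/passwd")

print(resolved)
print(clamped)
```

In the second call the resolved URI still begins with arcp://name,app.example.com/, which is the isolation property described in the introduction.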

In addition, this module provides functions that can be used to parse arcp URIs into their constituent fields:

>>> is_arcp_uri("arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/file.txt")
True
>>> is_arcp_uri("http://example.com/t")
False
>>> u = parse_arcp("arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/file.txt")
>>> u
ARCPSplitResult(scheme='arcp',prefix='uuid',name='b7749d0b-0e47-5fc4-999d-f154abe68065',
  uuid='b7749d0b-0e47-5fc4-999d-f154abe68065',path='/file.txt',query='',fragment='')
>>> u.path
'/file.txt'
>>> u.prefix
'uuid'
>>> u.uuid
UUID('b7749d0b-0e47-5fc4-999d-f154abe68065')
>>> u.uuid.version
5
>>> parse_arcp("arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/folder/").hash
('sha-256', '7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069')

The object returned from parse_arcp is similar to ParseResult from urlparse, but contains additional properties prefix, uuid, ni, hash and name, some of which will be None depending on the arcp prefix.
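The relation between the ni and hash values shown above can be sketched with the standard library alone. The helper name sketch_ni_to_hash is hypothetical and this is not the library's code; it assumes the ni value has the form "algorithm;base64url-digest" with the base64 padding stripped, as in the examples on this page.

```python
import base64
import binascii


def sketch_ni_to_hash(ni):
    """Hypothetical: turn 'alg;base64url' into an (alg, hex digest) tuple."""
    algorithm, _, b64 = ni.partition(";")
    # Restore the "=" padding that was stripped from the base64url text.
    padded = b64 + "=" * (-len(b64) % 4)
    digest = base64.urlsafe_b64decode(padded)
    return algorithm, binascii.hexlify(digest).decode("ascii")


print(sketch_ni_to_hash("sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk"))
```

The hexadecimal form is convenient for comparing against the output of tools such as sha256sum.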

The function arcp.parse.urlparse can be imported as an alternative to urllib.parse.urlparse. If the scheme is arcp then the extra arcp fields like prefix, uuid, hash and name are available as from parse_arcp, otherwise the output is the same as from regular urlparse:

>>> from arcp.parse import urlparse
>>> urlparse("arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/folder/soup;sads")
ARCPParseResult(scheme='arcp',prefix='ni',
   name='sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk',
   ni='sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk',
   hash=('sha-256', '7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069'),
   path='/folder/soup;sads',query='',fragment='')
>>> urlparse("http://example.com/help?q=a")
ParseResult(scheme='http', netloc='example.com', path='/help', params='',
  query='q=a', fragment='')
