Frictionless Darwin Core

A tool that converts a Darwin Core Archive into a Frictionless Data Package.

Features

  • datapackage.json: ensures your Darwin Core Archive complies with the Frictionless specifications
  • README.md: adds human-readable metadata extracted from the EML
  • Supports all standard Darwin Core terms
  • Supports default values in the Darwin Core schema
  • Field constraints: enable further data validation with goodtables
  • URL: accepts a Darwin Core Archive from a local path or a URL
  • Command-line interface

Getting Started

Installing

pip install FrictionlessDarwinCore

Running on CLI

fdwca --help
Usage: fdwca [OPTIONS] DWCA OUTPATH

Options:
  -f, --format [json|md|csv]  Output format
  --help                  Show this message and exit.

# convert a local DwC archive
fdwca myDwC.zip myDP.zip

# convert from a URL (archive accessible on the internet);
# the URL is quoted so the shell does not interpret the "?"
fdwca 'https://ipt.biodiversity.be/archive.do?r=rbins_saproxilyc_beetles' S1dp.zip

# only generate the JSON descriptor (datapackage.json)
fdwca -f json 'https://ipt.biodiversity.be/archive.do?r=rbins_saproxilyc_beetles' datapackage.json

# only generate the human-readable Markdown metadata (readme.md)
fdwca -f md 'https://ipt.biodiversity.be/archive.do?r=rbins_saproxilyc_beetles' readme.md

# only convert the data into zipped CSV files
fdwca -f csv 'https://ipt.biodiversity.be/archive.do?r=rbins_saproxilyc_beetles' beetles.zip

Python use

Alternatively, you can use the DwCArchive Python object like this:

from FrictionlessDarwinCore import DwCArchive

# load a Darwin Core archive from a URL
da = DwCArchive('https://ipt.biodiversity.be/archive.do?r=rbins_saproxilyc_beetles')
# infer the Data Package structure from the Darwin Core files
da.infer()
if da.valid:
    # save it locally as a Data Package
    da.save('BeetlesDP.zip')
    # ... or generate a separate JSON descriptor
    da.to_json('datapackage.json')
    # ... or generate separate human-readable Markdown metadata
    da.to_markdown('readme.md')
    # ... or generate a zip with the data files only
    da.to_csv('data.zip')
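
If you need the same choice of outputs as the CLI from your own script, a small wrapper can help. The sketch below is not part of the package: `export` is a hypothetical helper, and the mapping from format names to methods is an assumption based only on the method names shown above. It accepts any object that exposes `infer()`, `valid`, `save()`, `to_json()`, `to_markdown()` and `to_csv()`.

```python
def export(archive, outpath, fmt=None):
    """Infer the Data Package, then write one output chosen by `fmt`.

    `archive` is expected to behave like the DwCArchive object shown above.
    `fmt` is None (full Data Package), "json", "md" or "csv"; this mapping
    is an assumption made for this sketch, not documented library behaviour.
    Returns True if something was written, False if the archive is not valid.
    """
    archive.infer()
    if not archive.valid:
        return False
    writers = {
        None: archive.save,
        "json": archive.to_json,
        "md": archive.to_markdown,
        "csv": archive.to_csv,
    }
    if fmt not in writers:
        raise ValueError(f"unknown format: {fmt!r}")
    writers[fmt](outpath)
    return True
```

For instance, `export(DwCArchive('myDwC.zip'), 'datapackage.json', fmt='json')` would write only the descriptor, provided the archive is valid.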

Documentation

Rationale

The Darwin Core standard, created and maintained by Biodiversity Information Standards (also known as TDWG), is used to publish life sciences data about observations, collection specimens, species checklists and sampling events. The Darwin Core Archive (DwCA), a bundle of biodiversity data and metadata files, is a well-established mechanism for publishing or using data in the Global Biodiversity Information Facility and other life sciences networks.

Frictionless Data Package is an emerging, domain-agnostic data standard that offers a variety of cross-technology tools.

Bridging these two data ecosystems is our vision. This project is supported by the Open Knowledge Foundation and funded under the Frictionless Data Tool Fund.

What does it do?

DarwinCore archives consist of:

  • a core data file
  • optionally, one or more extension data files
  • eml.xml: metadata written in Ecological Metadata Language
  • meta.xml: the structure of the DarwinCore data files
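
To make this layout concrete, the following sketch lists the core and extension data files declared in an archive's meta.xml. It uses only the Python standard library and is independent of this package; `list_data_files` is a hypothetical name, and the sketch assumes the archive is a zip file with meta.xml at its root, using the namespace of the Darwin Core text guidelines.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace of meta.xml as defined by the Darwin Core text guidelines.
DWC_TEXT_NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}


def list_data_files(archive):
    """Return (role, rowType, location) tuples for the core and extension files.

    `archive` is a path or a binary file-like object holding a zipped archive.
    """
    with zipfile.ZipFile(archive) as zf:
        root = ET.fromstring(zf.read("meta.xml"))
    entries = []
    for role in ("core", "extension"):
        for node in root.findall(f"dwc:{role}", DWC_TEXT_NS):
            location = node.findtext(
                "dwc:files/dwc:location", namespaces=DWC_TEXT_NS
            )
            entries.append((role, node.get("rowType"), location))
    return entries
```

Calling `list_data_files('myDwC.zip')` returns the core file first, followed by any extensions in document order.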

In essence, the conversion tool adds two files to the archive (see the diagram below):

  • datapackage.json: the Data Package descriptor of the data files
  • readme.md: human-readable metadata in Markdown
┌─────────────────────────────────────────────────────────────────┐
│   ┌──────────────────────────────────────────────────────────┐  │
│   │DarwinCore Archive                                        │  │
│   │                                                          │  │
│   │                                ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─       │  │
│   │                           ┌ ─ ─    Extension 1    │      │  │
│   │                                └ ─ ─ ─ ─ ─ ─ ─ ─ ─       │  │
│   │    ┌──────────────────┐   │    ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─       │  │
│   │    │    Core file     │─ ─ ─ ─     Extension 2    │      │  │
│   │    └──────────────────┘   │    └ ─ ─ ─ ─ ─ ─ ─ ─ ─       │  │
│   │                                                          │  │
│   │                           │                              │  │
│   │                                ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─       │  │
│   │                           └ ─ ─    Extension n    │      │  │
│   │                                └ ─ ─ ─ ─ ─ ─ ─ ─ ─       │  │
│   │                                                          │  │
│   │   ┌──────────────────┐         ┌──────────────────┐      │  │
│   │   │     meta.xml     │         │     eml.xml      │      │  │
│   │   └──────────────────┘         └──────────────────┘      │  │
│   │             │                            │               │  │
│   └─────────────┼────────────────────────────┼───────────────┘  │
│                 ▼                            ▼                  │
│       ┌──────────────────┐         ┌──────────────────┐         │
│       │ datapackage.json │         │    readme.md     │         │
│       └──────────────────┘         └──────────────────┘         │
│                                                                 │
│                                           FrictionlessDarwinCore│
│                                                  (=Data Package)│
└─────────────────────────────────────────────────────────────────┘

The tool can also generate these two files as separate outputs without touching the archive.
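
As an illustration of how meta.xml relates to datapackage.json, here is a minimal sketch that derives a descriptor skeleton from the field definitions in meta.xml. It is not the package's actual implementation: the function names are hypothetical, every field is typed as `string` for simplicity, and fields that only carry a default value (no `index` attribute) are skipped.

```python
import xml.etree.ElementTree as ET

DWC_TEXT_NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}


def short_name(term_uri):
    """Use the last URI segment as the field name (.../terms/eventDate -> eventDate)."""
    return term_uri.rstrip("/").rsplit("/", 1)[-1]


def resource_from_node(node, id_tag):
    """Turn a <core> or <extension> element into a Data Package resource dict."""
    location = node.findtext("dwc:files/dwc:location", namespaces=DWC_TEXT_NS)
    columns = {}
    id_node = node.find(f"dwc:{id_tag}", DWC_TEXT_NS)
    if id_node is not None:
        columns[int(id_node.get("index"))] = id_tag
    for field in node.findall("dwc:field", DWC_TEXT_NS):
        if field.get("index") is None:
            continue  # no column in the file: the field only declares a default
        # A term mapped to the same column as <id>/<coreid> replaces its name.
        columns[int(field.get("index"))] = short_name(field.get("term"))
    fields = [{"name": columns[i], "type": "string"} for i in sorted(columns)]
    return {
        "name": location.rsplit(".", 1)[0].lower(),
        "path": location,
        "schema": {"fields": fields},
    }


def build_descriptor(meta_xml, name):
    """Build a minimal descriptor dict from the text of meta.xml."""
    root = ET.fromstring(meta_xml)
    resources = [
        resource_from_node(n, "id") for n in root.findall("dwc:core", DWC_TEXT_NS)
    ]
    resources += [
        resource_from_node(n, "coreid")
        for n in root.findall("dwc:extension", DWC_TEXT_NS)
    ]
    return {"name": name, "resources": resources}
```

Serialising the result with `json.dumps(descriptor, indent=2)` gives a file shaped like datapackage.json; the real tool additionally assigns types, formats and constraints per term.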

The tool also converts the core and extension files when needed.
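
One case where such a conversion could be needed is a tab-delimited data file without a header row, since Data Package tooling generally expects comma-separated files with a header. The sketch below shows that case only; it is an assumption about what "when needed" covers, not a description of the tool's internals, and `to_csv_text` is a hypothetical helper. Its parameters loosely correspond to the `fieldsTerminatedBy`, `fieldsEnclosedBy` and `ignoreHeaderLines` attributes of meta.xml.

```python
import csv
import io


def to_csv_text(raw_text, header, delimiter="\t", quotechar=None, skip_lines=0):
    """Re-encode delimited text as comma-separated CSV with an explicit header.

    `quotechar=None` treats quote characters in the input as literal text.
    `skip_lines` drops that many leading lines (for example an old header).
    """
    if quotechar is None:
        reader = csv.reader(
            io.StringIO(raw_text), delimiter=delimiter, quoting=csv.QUOTE_NONE
        )
    else:
        reader = csv.reader(
            io.StringIO(raw_text), delimiter=delimiter, quotechar=quotechar
        )
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for i, row in enumerate(reader):
        if i < skip_lines or not row:
            continue  # skip ignored header lines and blank lines
        writer.writerow(row)
    return out.getvalue()
```

Values that contain a comma are quoted by the writer, so they survive the change of delimiter.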

DarwinCore terms

Darwin Core is a very permissive standard: it makes some recommendations but imposes almost no constraining rules. This table assigns a Frictionless Data Package type, format and constraints to every Darwin Core term. Values that do not comply with these Frictionless Darwin Core rules automatically raise warnings.
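
The idea of attaching a type and constraints to a term can be sketched as follows. The three rules shown are illustrative examples chosen for this sketch, not an excerpt from the project's table, and `check_value` is a hypothetical helper; the `minimum` and `maximum` keys follow the Frictionless Table Schema constraint names.

```python
import warnings

# Illustrative rules only; the project's own table is the reference.
TERM_RULES = {
    "decimalLatitude": {"type": "number", "constraints": {"minimum": -90, "maximum": 90}},
    "decimalLongitude": {"type": "number", "constraints": {"minimum": -180, "maximum": 180}},
    "individualCount": {"type": "integer", "constraints": {"minimum": 0}},
}

CASTS = {"number": float, "integer": int, "string": str}


def check_value(term, raw):
    """Return True if `raw` satisfies the rule for `term`; warn and return False otherwise.

    Terms without a rule are treated as unconstrained strings.
    """
    rule = TERM_RULES.get(term, {"type": "string", "constraints": {}})
    try:
        value = CASTS[rule["type"]](raw)
    except ValueError:
        warnings.warn(f"{term}: {raw!r} is not a valid {rule['type']}")
        return False
    limits = rule["constraints"]
    if "minimum" in limits and value < limits["minimum"]:
        warnings.warn(f"{term}: {raw!r} is below the minimum {limits['minimum']}")
        return False
    if "maximum" in limits and value > limits["maximum"]:
        warnings.warn(f"{term}: {raw!r} is above the maximum {limits['maximum']}")
        return False
    return True
```

With these example rules, a latitude of `"95"` produces a warning rather than an error, which matches the warn-instead-of-fail behaviour described above.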

Test case suite

The initial test case suite covers a wide variety of Darwin Core usages. It should give enough confidence that basic incompatibilities are identified, reported and solved, but it does not guarantee that every possible DwC Archive will automatically translate into a valid Data Package.

Contributing

You are encouraged to contribute by identifying and reporting issues or incompatibilities and by helping to solve them.

Not familiar with Darwin Core?

Have a look at these online documents:
