ckanext-attribution

A CKAN extension that adds support for complex attribution.

Overview

This extension standardises author/contributor attribution for datasets, enabling enhanced metadata and greater linkage between datasets. It currently integrates with the ORCID and ROR APIs; contributors ('agents') can be added directly from these databases, or manually.

Contributors can be added and edited via actions or via a Vue app that can be inserted into the package_metadata_fields.html template snippet.

[Screenshot: the form for adding contributors when editing a package. A preview of the citation in APA format appears at the top, followed by three example agents with their affiliations and contribution activities.]

Schema

The schema is (partially) based on the RDA/TDWG recommendations. Three new models are added: Agent (contributors), ContributionActivity, and Affiliation (plus small linking models between these and Package records).

Agent

Defines one agent.

| Field | Type | Values | Notes |
| --- | --- | --- | --- |
| agent_type | string | 'person', 'org', 'other' | |
| family_name | string | | only used for 'person' records |
| given_names | string | | only used for 'person' records |
| given_names_first | bool | True, False | only used for 'person' records; whether the given names should be displayed first according to the person's culture/language (default True) |
| name | string | | used for non-'person' records |
| location | string | | used for non-'person' records, optional; a location to display for the organisation to help differentiate between similar names (e.g. 'Natural History Museum (London)' and 'Natural History Museum (Dublin)') |
| external_id | string | | an identifier from an external service like ORCID or ROR |
| external_id_scheme | string | 'orcid', 'ror', other | the scheme for the external_id; currently only 'orcid' and 'ror' are fully supported, though basic support for others can be implemented by adding to the attribution_controlled_lists action |
| user_id | string | User.id foreign key | link to a user account on the CKAN instance |
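
As a rough illustration of how the type-specific fields combine, the sketch below builds a display string from an agent record. `AgentRecord` and `display_name` are hypothetical names used only for this example; they are not the extension's model class or API, and the real display logic may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentRecord:
    """Illustrative stand-in for an Agent row (hypothetical, not the real model)."""
    agent_type: str  # 'person', 'org', or 'other'
    family_name: Optional[str] = None
    given_names: Optional[str] = None
    given_names_first: bool = True
    name: Optional[str] = None
    location: Optional[str] = None


def display_name(agent: AgentRecord) -> str:
    """Build a display string from the fields that apply to the agent's type."""
    if agent.agent_type == 'person':
        parts = [agent.given_names, agent.family_name]
        if not agent.given_names_first:
            parts.reverse()
        return ' '.join(p for p in parts if p)
    # Non-person records use `name`, plus the optional disambiguating location.
    if agent.location:
        return f'{agent.name} ({agent.location})'
    return agent.name or ''


print(display_name(AgentRecord('person', family_name='Lovelace', given_names='Ada')))
print(display_name(AgentRecord('org', name='Natural History Museum', location='London')))
```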

ContributionActivity

Defines one activity performed by one agent on one specific dataset.

| Field | Type | Values | Notes |
| --- | --- | --- | --- |
| activity | string | [controlled vocabulary] | the activity/role the agent is associated with, e.g. 'Editor', 'Methodology'; roles are defined in the attribution_controlled_lists action, which currently lists the DataCite and CRediT role taxonomies (but can be expanded) |
| scheme | string | [controlled vocabulary] | name of the defined scheme from attribution_controlled_lists |
| level | string | 'Lead', 'Equal', 'Supporting' | optional degree of contribution (from CRediT) |
| time | datetime | | optional date/time of the activity |
| order | integer | | order of the agent within all who are associated with the same activity, e.g. 1st Editor, 3rd DataCollector (optional) |

A specialised ContributionActivity entry with a '[citation]' activity is used to define the order in which contributors should be cited (and/or if they should be cited at all).
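
A minimal sketch of how the '[citation]' entries could drive citation order is shown below. It works on plain dicts; the `agent_id` key and the `cited_agents_in_order` helper are assumptions made for illustration, since the linking between activities and agents is not spelled out here.

```python
from typing import Dict, List

CITATION = '[citation]'


def cited_agents_in_order(activities: List[Dict]) -> List[str]:
    """Return the IDs of agents that have a '[citation]' activity, sorted by its order."""
    citations = [a for a in activities if a['activity'] == CITATION]
    citations.sort(key=lambda a: a['order'])
    return [a['agent_id'] for a in citations]


# Hypothetical activity records; 'agent-c' has no '[citation]' entry, so is not cited.
sample_activities = [
    {'agent_id': 'agent-a', 'activity': CITATION, 'order': 2},
    {'agent_id': 'agent-b', 'activity': CITATION, 'order': 1},
    {'agent_id': 'agent-c', 'activity': 'Editor', 'order': 1},
]

print(cited_agents_in_order(sample_activities))
```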

Affiliation

Defines a relationship between two agents, either as a 'universal' (persistent) affiliation or for a single package (e.g. a project affiliation).

| Field | Type | Values | Notes |
| --- | --- | --- | --- |
| agent_a_id | string | Agent.id foreign key | one of the two agents (a/b order does not matter) |
| agent_b_id | string | Agent.id foreign key | one of the two agents (a/b order does not matter) |
| affiliation_type | string | | very short description (1 or 2 words) of affiliation, e.g. 'employment' (optional) |
| description | string | | longer description of affiliation (optional) |
| start_date | date | | date at which the relationship began, e.g. employment start date (optional) |
| end_date | date | | date at which the relationship ended (optional) |
| package_id | string | Package.id foreign key | links affiliation to a specific package/dataset (optional) |

Installation

Path variables used below:

  • $INSTALL_FOLDER (i.e. where CKAN is installed), e.g. /usr/lib/ckan/default
  • $CONFIG_FILE, e.g. /etc/ckan/default/development.ini

  1. Clone the repository into the src folder:
       cd $INSTALL_FOLDER/src
       git clone https://github.com/NaturalHistoryMuseum/ckanext-attribution.git
  2. Activate the virtual env:
       . $INSTALL_FOLDER/bin/activate
  3. Install the requirements from requirements.txt:
       cd $INSTALL_FOLDER/src/ckanext-attribution
       pip install -r requirements.txt
  4. Run setup.py:
       cd $INSTALL_FOLDER/src/ckanext-attribution
       python setup.py develop
  5. Add 'attribution' to the list of plugins in your $CONFIG_FILE:
       ckan.plugins = ... attribution
  6. Add this block to package_metadata_fields.html to show the Vue app:
       {% block package_custom_fields_agent %}
           {{ super() }}
       {% endblock %}

Additional steps

SOLR Faceting

You will need to change the author field in your schema.xml for faceting to work:

<schema>
    <fields>
        <...>
        <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
        <...>
    </fields>
    <...>
    <copyField source="author" dest="text"/>
</schema>

After making the changes, restart SOLR and reindex (ckan -c $CONFIG_FILE search-index rebuild-fast). You will also have to enable the config option to see this in the UI (see below).

Configuration

These are the options that can be specified in your .ini config file. NB: setting ckanext.attribution.debug to True means that the API accesses sandbox.orcid.org instead of orcid.org. Although both are run by the ORCID organisation, these are different websites and you will need a separate account/set of credentials for each. Note also that the sandbox does not give you access to the full set of authors.

API credentials [REQUIRED]

| Name | Description | Options |
| --- | --- | --- |
| ckanext.attribution.orcid_key | Your ORCID API client ID/key | |
| ckanext.attribution.orcid_secret | Your ORCID API client secret | |

Optional

| Name | Description | Options | Default |
| --- | --- | --- | --- |
| ckanext.attribution.debug | If true, use sandbox.orcid.org (for testing) | True/False | True |
| ckanext.attribution.enable_faceting | Enable filtering by contributor name (requires change to SOLR schema) | True/False | False |
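
The sketch below shows how these options might be interpreted, using a plain mapping in place of CKAN's config object so that it runs on its own. `orcid_host`, `missing_credentials`, and the set of accepted truthy strings are assumptions for illustration; the extension's own parsing may differ.

```python
from typing import List, Mapping

REQUIRED_KEYS = [
    'ckanext.attribution.orcid_key',
    'ckanext.attribution.orcid_secret',
]
# Assumed set of strings treated as "true"; not taken from the extension.
TRUTHY = {'true', 'yes', 'on', '1'}


def orcid_host(config: Mapping[str, str]) -> str:
    """Pick the ORCID host; debug defaults to True, which selects the sandbox."""
    raw = config.get('ckanext.attribution.debug', 'True')
    debug = str(raw).strip().lower() in TRUTHY
    return 'sandbox.orcid.org' if debug else 'orcid.org'


def missing_credentials(config: Mapping[str, str]) -> List[str]:
    """List the required credential options that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


example_config = {
    'ckanext.attribution.orcid_key': 'EXAMPLE-KEY',
    'ckanext.attribution.debug': 'false',
}
print(orcid_host(example_config))
print(missing_credentials(example_config))
```

Because debug defaults to True, a production instance needs ckanext.attribution.debug explicitly set to False along with credentials registered on orcid.org rather than the sandbox.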

Usage

Actions

This extension adds numerous new actions. Most are CRUD actions for managing the models, with inline documentation and predictable interactions, so only the more unusual actions are described here.

agent_list

Search for agents by name or external ID, or just list all agents.

data_dict = {
    'q': 'QUERY',  # optional; searches in name, family_name, given_names, and external_id
}

toolkit.get_action('agent_list')({}, data_dict)

package_contributions_show

Show all contribution records for a package, grouped by agent. Optionally provide a limit and offset for pagination.

data_dict = {
    'id': 'PACKAGE_ID',
    'limit': 'PAGE_SIZE',
    'offset': 'OFFSET'
}

toolkit.get_action('package_contributions_show')({}, data_dict)

Returns a dict:

{
    'contributions': [
        {
            'agent': {
                # Agent.as_dict()
            },
            'activities': [
                # list of Activity.as_dict()
            ],
            'affiliations': [
                {
                    'affiliation': {
                        # Affiliation.as_dict()
                    },
                    'other_agent': {
                        # Agent.as_dict()
                    }
                },
                # ...
            ]
        },
        # ...
    ],
    'total': total,
    'offset': offset,
    'page_size': limit or total
}
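
Paging through a package with many contributors might look like the sketch below. In a CKAN process the `show` argument would be `toolkit.get_action('package_contributions_show')`; here a stand-in (`fake_show`) is used so the example runs without CKAN. `iter_contributions` is a hypothetical helper, and it is an assumption that `total` counts the agent-grouped entries.

```python
from typing import Callable, Dict, Iterator


def iter_contributions(show: Callable[[Dict, Dict], Dict], package_id: str,
                       page_size: int = 50) -> Iterator[Dict]:
    """Yield every contribution entry for a package, one page at a time."""
    offset = 0
    while True:
        page = show({}, {'id': package_id, 'limit': page_size, 'offset': offset})
        contributions = page['contributions']
        yield from contributions
        offset += len(contributions)
        # Stop on an empty page as well, so a bad 'total' cannot loop forever.
        if not contributions or offset >= page['total']:
            break


def fake_show(context: Dict, data_dict: Dict) -> Dict:
    """Stand-in for the real action, returning five invented entries."""
    records = [{'agent': {'id': f'agent-{i}'}, 'activities': [], 'affiliations': []}
               for i in range(5)]
    start, size = data_dict['offset'], data_dict['limit']
    return {'contributions': records[start:start + size], 'total': len(records),
            'offset': start, 'page_size': size}


for entry in iter_contributions(fake_show, 'PACKAGE_ID', page_size=2):
    print(entry['agent']['id'])
```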

agent_affiliations

Show all affiliations for a given agent, optionally limited to a specific dataset/package (plus 'global' affiliations).

data_dict = {
    'agent_id': 'AGENT_ID',
    'package_id': 'PACKAGE_ID'  # optional
}

toolkit.get_action('agent_affiliations')({}, data_dict)

Returns a list of records formatted as such:

{
    'affiliation': {
        # Affiliation.as_dict()
    },
    'other_agent': {
        # Agent.as_dict()
    }
}
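
One way to work with the returned list is to separate universal affiliations from those tied to the package, as in the sketch below. It assumes the `as_dict()` output uses the same field names as the schema tables (`package_id`, `name`); `split_affiliations` and the sample records are hypothetical.

```python
from typing import Dict, List, Tuple


def split_affiliations(records: List[Dict], package_id: str) -> Tuple[List[str], List[str]]:
    """Return (universal, package-specific) names of the affiliated agents."""
    universal, for_package = [], []
    for record in records:
        linked_package = record['affiliation'].get('package_id')
        other_name = record['other_agent'].get('name')
        if linked_package is None:
            universal.append(other_name)
        elif linked_package == package_id:
            for_package.append(other_name)
    return universal, for_package


# Invented records in the shape documented above.
sample_records = [
    {'affiliation': {'affiliation_type': 'employment', 'package_id': None},
     'other_agent': {'agent_type': 'org', 'name': 'Natural History Museum'}},
    {'affiliation': {'affiliation_type': 'project', 'package_id': 'pkg-1'},
     'other_agent': {'agent_type': 'other', 'name': 'Example Project'}},
]

print(split_affiliations(sample_records, 'pkg-1'))
```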

attribution_controlled_lists

Returns collections of defined values (which can be modified by using @toolkit.chained_action).

data_dict = {
    'lists': ['NAME1', 'NAME2']  # optional; only return these lists
}

toolkit.get_action('attribution_controlled_lists')({}, data_dict)

There are four collections:

  1. agent_types describes valid types for agents and adds additional detail;
  2. contribution_activity_types contains role/activity taxonomies (i.e. Datacite and CRediT) and lists the available activity values;
  3. contribution_activity_levels is a list of contribution levels (i.e. 'lead', 'equal', and 'supporting', from CRediT);
  4. agent_external_id_schemes describes valid schemes for external IDs (currently, ORCID and ROR).

These collections are useful for validation and frontend connectivity/standardisation. They are contained within an action to (a) enable frontend access via AJAX requests, and (b) allow users to override values as needed.
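
Overriding a collection could look roughly like the sketch below, written as a plain function so it runs without CKAN. In a plugin it would be decorated with `@toolkit.chained_action` and registered under the same action name. The shape of the returned data (a dict of collections, each keyed by value name with a `label`) and the 'isni' entry are assumptions for illustration; check the action's actual output before relying on them.

```python
from typing import Callable, Dict


def attribution_controlled_lists(next_action: Callable[[Dict, Dict], Dict],
                                 context: Dict, data_dict: Dict) -> Dict:
    """Call the original action, then add one extra external ID scheme."""
    lists = next_action(context, data_dict)
    if 'agent_external_id_schemes' not in lists:
        # The caller asked for other collections only; nothing to extend.
        return lists
    # Copy before editing so the original action's data is left untouched.
    schemes = dict(lists['agent_external_id_schemes'])
    schemes['isni'] = {'label': 'ISNI'}  # assumed entry structure
    return {**lists, 'agent_external_id_schemes': schemes}


def fake_original(context: Dict, data_dict: Dict) -> Dict:
    """Stand-in for the extension's action, with an assumed return shape."""
    return {'agent_external_id_schemes': {'orcid': {'label': 'ORCID'},
                                          'ror': {'label': 'ROR'}}}


result = attribution_controlled_lists(fake_original, {}, {})
print(sorted(result['agent_external_id_schemes']))
```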

agent_external_search

Search external sources (ORCID and ROR) for agent data. Ignores records that already exist in the database.

data_dict = {
    'q': 'QUERY_STRING',
    'sources': ['SOURCE1', 'SOURCE2']  # optional; only search these sources
}

toolkit.get_action('agent_external_search')({}, data_dict)

Results are returned formatted as such:

{
    'SCHEME_NAME': {
        'records': [
            # list of agent dicts
        ],
        'remaining': 10000  # number of other records found
    }
}
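
A small sketch of consuming this structure follows; it counts the records returned per scheme alongside the number still remaining. `summarise_external_results` and the sample data are hypothetical, with agent dicts reduced to fields from the Agent schema.

```python
from typing import Dict


def summarise_external_results(results: Dict[str, Dict]) -> Dict[str, Dict[str, int]]:
    """Report how many records each scheme returned and how many remain."""
    return {
        scheme: {'found': len(result['records']), 'remaining': result['remaining']}
        for scheme, result in results.items()
    }


# Invented results in the documented shape.
sample_results = {
    'orcid': {
        'records': [
            {'agent_type': 'person', 'family_name': 'Lovelace', 'given_names': 'Ada',
             'external_id': 'EXTERNAL_ID', 'external_id_scheme': 'orcid'},
        ],
        'remaining': 12,
    },
    'ror': {'records': [], 'remaining': 0},
}

print(summarise_external_results(sample_results))
```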

agent_external_read

Read data from an external source like ORCID or ROR, either from an existing record or a new external ID.

# EITHER
data_dict_existing = {
    'id': 'AGENT_ID',
    # optional; only show values that differ from the record's current values (default False)
    'diff': False
}

# OR
data_dict_new = {
    'external_id': 'EXTERNAL_ID',
    'external_id_scheme': 'orcid'  # or 'ror', etc.
}

toolkit.get_action('agent_external_read')({}, data_dict_existing)  # or data_dict_new

Commands

initdb

ckan -c $CONFIG_FILE attribution initdb

Initialise database tables.

sync

ckan -c $CONFIG_FILE attribution sync $OPTIONAL_ID $ANOTHER_OPTIONAL_ID

Retrieve up-to-date information from external APIs for contributors with an external ID set.

refresh-packages

ckan -c $CONFIG_FILE attribution refresh-packages $OPTIONAL_ID $ANOTHER_OPTIONAL_ID

Update the author string for all (or the specified) packages.

agent-external-search

ckan -c $CONFIG_FILE attribution agent-external-search --limit 10 $OPTIONAL_ID $ANOTHER_OPTIONAL_ID

Search external APIs for contributors without an external ID set. Run refresh-packages and rebuild the search index after this command.

merge-agents

ckan -c $CONFIG_FILE attribution merge-agents --q $SEARCH_QUERY --match-threshold 75

Find agents with similar names (optionally matching the search query) and merge them. Run refresh-packages and rebuild the search index after this command.

migratedb

ckan -c $CONFIG_FILE attribution migratedb --limit 10 --dry-run --no-search-api

Attempt to extract names of contributors from author fields and convert them to the new format.

  • --limit will only convert a certain number of packages at a time.
  • --dry-run prevents saving to the database.
  • --no-search-api just extracts the names, without searching external APIs for contributors after.

It is recommended to run merge-agents, refresh-packages, and rebuild the search index after running this command.
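
The recommended follow-up sequence can be scripted. The sketch below only builds the command lines, so it runs without CKAN installed; `post_migration_commands` is a hypothetical helper, merge-agents is shown without its optional flags, and `search-index rebuild` is CKAN's standard reindex command.

```python
import shlex
from typing import List


def post_migration_commands(config_file: str) -> List[List[str]]:
    """Return the follow-up commands, in the order recommended above."""
    base = ['ckan', '-c', config_file]
    return [
        base + ['attribution', 'merge-agents'],
        base + ['attribution', 'refresh-packages'],
        base + ['search-index', 'rebuild'],
    ]


for argv in post_migration_commands('/etc/ckan/default/development.ini'):
    print(shlex.join(argv))
```

To execute the commands rather than print them, each list could be passed to `subprocess.run(argv, check=True)` so that a failure stops the sequence.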

Testing

There is a Docker compose configuration available in this repository to make it easy to run tests.

To run the tests against CKAN 2.9.x on Python 3:

  1. Build the required images:
       docker-compose build
  2. Then run the tests. The root of the repository is mounted into the ckan container as a volume by the Docker compose configuration, so you should only need to rebuild the ckan image if you change the extension's dependencies:
       docker-compose run ckan

The ckan image uses the Dockerfile in the docker/ folder which is based on openknowledge/ckan-dev:2.9-py2.
