oarepo-taxonomies
Wrapper that connects Flask-Taxonomies with Invenio.
Installation
The package is installed from PyPI in the usual way:
pip install oarepo-taxonomies
The database is initialized using the management task:
invenio taxonomies init
Config
Search serializer
Taxonomic facets can only be used if your search serializer in RECORDS_REST_ENDPOINTS is wrapped with
taxonomy_enabled_search. The wrapper takes two positional arguments: the first is the search serializer and the
second names the enabled taxonomies. After that you can use taxonomy_term_facet.
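RECORDS_REST_ENDPOINTS = {
    ...
    'search_serializers': {
        # fallback_language is passed by keyword here (parameter name assumed),
        # because Python rejects a positional argument after a keyword argument.
        'application/json': taxonomy_enabled_search(json_search, taxonomy_aggs=["degreeGrantor"],
                                                    fallback_language=fallback_language),
    },
    ...
}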
Usage
All functionality is provided by flask-taxonomies. For more details see: flask-taxonomies.
In addition, this package adds the ability to import and export taxonomies using Excel files (*.xlsx) and can dereference a reference to a taxonomy in an Invenio record.
Import from Excel
Importing from Excel is handled by the management task:
invenio taxonomies import [OPTIONS] TAXONOMY_FILE
Options:
--int TEXT
--str TEXT
--bool TEXT
--drop / --no-drop
--help
where:
- TAXONOMY_FILE is the path to the xlsx file (the older xls format is not supported).
- --int, --str, --bool are repeatable options that determine the data type.
- --drop/--no-drop specifies whether the old taxonomy should be removed from the database when a taxonomy with the same taxonomy code is imported.
Structure of Excel file
Blocks
The Excel file must contain two blocks. The first block contains the taxonomy information and must contain one mandatory column, code (the taxonomy identifier). It may also contain other user data (e.g. title or description).
The second block must be separated from the first by a blank line and must contain two mandatory columns, level and slug, in exactly that order. The other columns are optional.
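The two-block layout can be illustrated with a minimal sketch that splits already-loaded sheet rows at the first blank row. The names `is_blank` and `split_blocks` and the row representation (lists of cell values, with `None` or empty strings for empty cells) are assumptions made for this example; they are not part of oarepo-taxonomies.

```python
def is_blank(row):
    """A row is blank when every cell is None or an empty/whitespace string."""
    return all(cell is None or str(cell).strip() == "" for cell in row)


def split_blocks(rows):
    """Split sheet rows into (taxonomy_block, term_block) at the first blank row."""
    for index, row in enumerate(rows):
        if is_blank(row):
            # Everything before the blank row describes the taxonomy,
            # everything after it (minus further blank rows) describes terms.
            rest = [r for r in rows[index + 1:] if not is_blank(r)]
            return rows[:index], rest
    raise ValueError("expected a blank row between the taxonomy and term blocks")


# Hypothetical sheet content, mirroring the layout described above.
rows = [
    ["code", "title_cs", "title_en", None],
    ["cities", "Města", "Cities", None],
    [None, None, None, None],
    ["level", "slug", "title_cs", "title_en"],
    [1, "eu", "Evropa", "Europe"],
]
taxonomy_block, term_block = split_blocks(rows)
```

Here `taxonomy_block` holds the header and the single taxonomy row, and `term_block` starts with the `level` and `slug` header.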
Nested JSON
Taxonomies are internally represented as JSON, which can be nested. An Excel spreadsheet is inherently linear and cannot
store nested data. However, oarepo-taxonomies supports nested JSON. Each value in a nested JSON has its own unique
address. Each JSON level is separated by an underscore, so any branched JSON can be transformed to a linear one as follows.
Nested:
{
    "a": 1,
    "b": 2,
    "c": [{"d": [2, 3, 4], "e": [{"f": 1, "g": 2}]}]
}
Linear:
{
    "a": 1,
    "b": 2,
    "c_0_d_0": 2,
    "c_0_d_1": 3,
    "c_0_d_2": 4,
    "c_0_e_0_f": 1,
    "c_0_e_0_g": 2
}
According to the same pattern, headings can be created in Excel and the data is transformed into a nested form.
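The addressing scheme can be sketched in a few lines of Python. This is an illustration of the underscore convention described above, not the code used by oarepo-taxonomies; the function names `flatten` and `unflatten` are assumptions, and the sketch treats purely numeric key parts as list indices.

```python
def flatten(value, prefix=""):
    """Turn nested dicts/lists into one dict with underscore-joined keys."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        path = f"{prefix}_{key}" if prefix else str(key)
        flat.update(flatten(child, path))
    return flat


def _to_lists(node):
    """Convert dicts whose keys are all numeric into lists, recursively."""
    if not isinstance(node, dict):
        return node
    converted = {key: _to_lists(child) for key, child in node.items()}
    if converted and all(key.isdigit() for key in converted):
        return [converted[key] for key in sorted(converted, key=int)]
    return converted


def unflatten(flat):
    """Rebuild the nested structure from underscore-joined keys."""
    root = {}
    for key, value in flat.items():
        parts = key.split("_")
        node = root
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return _to_lists(root)


nested = {"a": 1, "b": 2, "c": [{"d": [2, 3, 4], "e": [{"f": 1, "g": 2}]}]}
linear = flatten(nested)
restored = unflatten(linear)
```

With this convention a heading such as `title_cs` ends up as `{"title": {"cs": ...}}`, which also means that a key containing an underscore cannot be represented unambiguously in this sketch.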
Level order
Taxonomies are trees, which are not linear either and cannot be stored in a spreadsheet directly. The rows are therefore sorted depth-first, from the root down to the lowest child: the root (taxonomy), then the first level 1 child followed by all of its descendants down to the last level, then the second level 1 child with its descendants, and so on.
Excel example
code | title_cs | title_en | |
---|---|---|---|
cities | Města | Cities | |
level | slug | title_cs | title_en |
1 | eu | Evropa | Europe |
2 | cz | Česko | Czechia |
3 | prg | Praha | Prague |
3 | brn | Brno | Brno |
2 | de | Německo | Germany |
3 | ber | Berlín | Berlin |
3 | mun | Mnichov | Munich |
2 | gb | Velká Británie | United Kingdom |
3 | lon | Londýn | London |
3 | man | Manchester | Manchester |
The resulting JSON for the taxonomy will take the following form:
{
    "code": "cities",
    "title": {
        "cs": "Města",
        "en": "Cities"
    }
}
and for an individual taxonomy term:
{
    "code": "Praha",
    "title": {
        "cs": "Praha",
        "en": "Prague"
    }
}
and tree structure:
cities
└-eu
  |--cz
  |  |--prg
  |  └--brn
  |--de
  |  |--ber
  |  └--mun
  └--gb
     |--lon
     └--man
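A minimal sketch of how the level column can be turned back into a tree, assuming the rows are already in the depth-first order described under "Level order". The names `build_tree` and `paths` and the node layout (`slug` plus `children`) are hypothetical and only serve the illustration.

```python
def build_tree(rows):
    """Rebuild nesting from (level, slug) pairs listed in depth-first order."""
    root = {"slug": None, "children": []}
    stack = [root]  # stack[n] is the most recent node seen at level n
    for level, slug in rows:
        if not 1 <= level <= len(stack):
            raise ValueError(f"row {slug!r} skips a level")
        node = {"slug": slug, "children": []}
        del stack[level:]  # drop nodes at this level or deeper
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


def paths(nodes, prefix=""):
    """Yield the slash-joined slug path of every node, parents first."""
    for node in nodes:
        path = f"{prefix}/{node['slug']}" if prefix else node["slug"]
        yield path
        yield from paths(node["children"], path)


# A shortened version of the cities example.
rows = [(1, "eu"), (2, "cz"), (3, "prg"), (3, "brn"), (2, "de"), (3, "ber")]
tree = build_tree(rows)
term_paths = list(paths(tree))
```

A row whose level is more than one deeper than the previous row has no parent to attach to, so the sketch rejects it instead of guessing.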
Export to Excel
Excel export is done with the management task invenio taxonomies export TAXONOMY_CODE.
An xlsx file and a csv file are created in the folder where the task was run.
Marshmallow
The Marshmallow module serializes a taxonomy term and dereferences the reference given in links/self.
The module provides the Marshmallow field TaxonomyField and the schema TaxonomySchema, which can be freely used in a user schema.
TaxonomyField/Schema accepts any user data and checks whether the data is a JSON object/dict, a string or a list.
The output format of a serialized taxonomy is a taxonomic list, which contains the ancestors in addition to the term itself. The list is ordered from the parent term down to the final term. The serialization is opinionated. An example of a serialized taxonomy follows:
[{
    'is_ancestor': True,
    'links': {'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a'},
    'test': 'extra_data'
},
{
    'created_at': '2014-08-11T05:26:03.869245',
    'email': 'ken@yahoo.com',
    'is_ancestor': False,
    'links': {
        'parent': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a',
        'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a/b'
    },
    'name': 'Ken',
    'test': 'extra_data'
}]
The taxonomy representation can be changed in the config file (e.g. invenio.cfg). For more details please see Flask-Taxonomies.
This library uses a predefined config located in config.py:
FLASK_TAXONOMIES_REPRESENTATION = {
    "taxonomy": {
        'include': [INCLUDE_DATA, INCLUDE_ANCESTORS, INCLUDE_URL, INCLUDE_SELF,
                    INCLUDE_ANCESTOR_LIST, INCLUDE_ANCESTOR_TAG, INCLUDE_PARENT],
        'exclude': [],
        'select': None,
        'options': {}
    }
}
There are two ways to use TaxonomyField.
- The input format is a dictionary or a text string containing a link to the taxonomy.
  - dictionary: The dictionary must contain a nested dictionary named links, which contains self.
  - string: Any text that contains a URL to the taxonomy.
- The input format is a list of ancestors, where the last item is the referenced taxonomy.
- dictionary
from marshmallow import Schema
from oarepo_taxonomies.marshmallow import TaxonomyField

# custom schema
class TestSchema(Schema):
    field = TaxonomyField()

# taxonomy dict
random_user_taxonomy = {
    "created_at": "2014-08-11T05:26:03.869245",
    "email": "ken@yahoo.com",
    "name": "Ken",
    "links": {
        "self": "http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a/b"
    }
}

# record dict
data = {
    "field": random_user_taxonomy
}

schema = TestSchema()
result = schema.load(data)
assert result == {
    'field': [{
        'is_ancestor': True,
        'links': {'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a'},
        'test': 'extra_data'
    },
    {
        'created_at': '2014-08-11T05:26:03.869245',
        'email': 'ken@yahoo.com',
        'is_ancestor': False,
        'links': {
            'parent': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a',
            'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a/b'
        },
        'name': 'Ken',
        'test': 'extra_data'
    }]
}
- string
from marshmallow import Schema
from oarepo_taxonomies.marshmallow import TaxonomyField

# custom schema
class TestSchema(Schema):
    field = TaxonomyField()

# taxonomy reference as any string with url
random_user_taxonomy = "bla bla http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a/b"

# record dict
data = {
    "field": random_user_taxonomy
}

schema = TestSchema()
result = schema.load(data)
assert result == {
    'field': [{
        'is_ancestor': True,
        'links': {'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a'},
        'test': 'extra_data'
    },
    {
        'is_ancestor': False,
        'links': {
            'parent': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a',
            'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a/b'
        },
        'test': 'extra_data'
    }]
}
- list
from marshmallow import Schema
from oarepo_taxonomies.marshmallow import TaxonomyField

# custom schema
class TestSchema(Schema):
    field = TaxonomyField()

# taxonomy list with ancestors (root ancestor in the first place)
random_user_taxonomy = [
    {
        'links': {'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a'},
    },
    {
        'links': {
            'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a/b'
        },
        'test': 'extra_data',
        'next': 'bla',
        'another': 'something'
    }
]

# record dict
data = {
    "field": random_user_taxonomy
}

schema = TestSchema()
result = schema.load(data)
assert result == {
    'field': [{
        'is_ancestor': True,
        'links': {'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a'},
        'test': 'extra_data'
    },
    {
        'another': 'something',
        'is_ancestor': False,
        'links': {
            'parent': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a',
            'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a/b'
        },
        'next': 'bla',
        'test': 'extra_data'
    }]
}
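The three accepted input shapes (dictionary, string, list) all boil down to one link to the referenced term. The following sketch shows one way such a link could be picked out of each shape. It is not the implementation used by TaxonomyField; the function `extract_self_link` and the URL pattern are assumptions made for illustration.

```python
import re

# Hypothetical pattern: anything that looks like an http(s) URL.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_self_link(value):
    """Return the taxonomy term link from a dict, string or list input."""
    if isinstance(value, list):
        if not value:
            raise ValueError("empty ancestor list")
        # The last item of the ancestor list is the referenced term.
        return extract_self_link(value[-1])
    if isinstance(value, dict):
        try:
            return value["links"]["self"]
        except (KeyError, TypeError):
            raise ValueError("dictionary input needs links.self") from None
    if isinstance(value, str):
        match = URL_PATTERN.search(value)
        if match:
            return match.group(0)
        raise ValueError("no URL found in string input")
    raise TypeError(f"unsupported input type: {type(value).__name__}")


term_url = "http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a/b"
from_dict = extract_self_link({"name": "Ken", "links": {"self": term_url}})
from_string = extract_self_link("bla bla " + term_url)
from_list = extract_self_link([
    {"links": {"self": "http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a"}},
    {"links": {"self": term_url}, "test": "extra_data"},
])
```

All three calls resolve to the same term link, which is why the three examples above produce the same ancestor list for `a/b`.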
TaxonomyField vs. TaxonomySchema
TaxonomySchema is a marshmallow schema that can be subclassed and used, for example, inside Nested.
TaxonomyField is a marshmallow Field that is used as is. The field also allows extending the
taxonomy metadata model with extra properties.
The signature of the factory is TaxonomyField(*args, extra=None, name=None, many=False, mixins: list = None, **kwargs)
- args: arbitrary arguments passed to marshmallow.schema
- extra: a dictionary of extra marshmallow fields (key: field name, value: instance of Field)
- name: optional name of the field (used as the name of the dynamically created class in the background)
- mixins: list of mixins to add (classes defining extra marshmallow fields)
- kwargs: arbitrary named arguments passed to the generated marshmallow schemas
from invenio_records_rest.schemas.fields import SanitizedUnicode
from marshmallow import Schema
from oarepo_taxonomies.marshmallow import TaxonomyField

# mixin adding two extra fields to the taxonomy term
class InstitutionMixin:
    name = SanitizedUnicode()
    address = SanitizedUnicode()

class TestSchema(Schema):
    field = TaxonomyField(many=True, mixins=[InstitutionMixin])

random_user_taxonomy = [
    {
        'links': {'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a'},
    },
    {
        'links': {
            'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a/b'
        },
        'test': 'extra_data',
        'next': 'bla',
        'another': 'something',
        'name': 'Hogwarts',
        'address': 'Platform nine and three-quarters'
    }
]

data = {
    "field": random_user_taxonomy
}

schema = TestSchema()
result = schema.load(data)
assert result == {
    'field': [{
        'is_ancestor': True,
        'links': {'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a'},
        'test': 'extra_data'
    },
    {
        'address': 'Platform nine and three-quarters',
        'another': 'something',
        'is_ancestor': False,
        'links': {
            'parent': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a',
            'self': 'http://127.0.0.1:5000/2.0/taxonomies/test_taxonomy/a/b'
        },
        'name': 'Hogwarts',
        'next': 'bla',
        'test': 'extra_data'
    }]
}
JSONSchemas
The library offers a predefined JSON schema for taxonomies.
The predefined schema is referenced with "$ref": "../taxonomy-v2.0.0.json#/definitions/TaxonomyTerm"
and is listed in Invenio by current_jsonschemas.list_schemas().
Custom schema example:
{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "id": "https://example.com/schemas/example_json-v1.0.0.json",
    "additionalProperties": false,
    "title": "My site v1.0.0",
    "type": "object",
    "properties": {
        "$schema": {
            "type": "string"
        },
        "custom_taxonomy": {
            "$ref": "../taxonomy-v2.0.0.json#/definitions/TaxonomyTerm"
        }
    }
}
Elasticsearch mapping
Predefined mappings can be used for indexing into Elasticsearch. To use this mapping you must use the
library OAREPO mapping includes. A reference to the
taxonomy mapping is then inserted into the custom mapping as either
"type": "taxonomy-v2.0.0.json#/TaxonomyTerm" or "type": "taxonomy-term".
Custom mapping example:
{
    "mappings": {
        "date_detection": false,
        "numeric_detection": false,
        "dynamic": false,
        "properties": {
            "$schema": {
                "type": "keyword",
                "index": true
            },
            "custom_taxonomy": {
                "type": "taxonomy-v2.0.0.json#/TaxonomyTerm"
            }
        }
    }
}
Signals
This module registers the following handlers on the Flask-Taxonomies signals. They keep the taxonomy references in records consistent whenever a Taxonomy or TaxonomyTerm changes:
Flask-Taxonomies signal | Registered signal handler | Description |
---|---|---|
before_taxonomy_deleted | taxonomy_delete | Checks whether the taxonomy being deleted is referenced by any record. If so, an exception is raised. |
before_taxonomy_term_deleted | taxonomy_term_delete | Checks whether the TaxonomyTerm being deleted is referenced by any record. If so, an exception is raised. |
after_taxonomy_term_updated | taxonomy_term_update | Replaces the contents of the changed TaxonomyTerm in the records that reference it. |
after_taxonomy_term_moved | taxonomy_term_moved | Replaces the link to the moved TaxonomyTerm in the records that reference it. |