Metadata Editor client for Python
pyMetadataEditor
A tool connected to Metadata Editor for creating, editing and managing metadata for microdata, indicators, geospatial data, documents, scripts, images and videos.
How to use pyMetadataEditor
from pymetadataeditor import MetadataEditor
import os
your_api_key = os.getenv("API_KEY")
api_url = os.getenv("API_URL")
me = MetadataEditor(api_url=api_url, api_key=your_api_key, verify_ssl=False)
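Note that os.getenv returns None when a variable is unset, so a missing key may only show up later as a failed request. A minimal sketch of an early check, assuming the same API_KEY and API_URL variable names as above; require_env is a hypothetical helper and not part of pyMetadataEditor:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        # Covers both an unset variable and one set to an empty string.
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value
```

It could then be used in place of the plain lookups, for example api_url = require_env("API_URL").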
Listing your projects
me.list_projects(limit=8)
| id | type | idno | study_idno | title | abbreviation | nation | year_start | year_end |
|---|---|---|---|---|---|---|---|---|
| 1003 | document | 12345 | DOC_001 | Sample Document 1 | SD1 | Example Nation | 2020 | 2025 |
| 1002 | survey | 67890 | SURVEY_002 | Sample Survey 2 | SS2 | Example Nation | 2019 | 2024 |
| 1001 | timeseries | 54321 | TS_003 | Time Series 3 | TS3 | Another Nation | 2021 | 2026 |
Creating a new indicator project
demo_name = "GB20241030_demo"
series_description = {
"idno": demo_name,
"doi": "V1",
"name": "Version 1",
"display_name": "Version 1"
}
indicator_id = me.create_project_log({"idno": demo_name, "series_description": series_description}, "indicator")
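In the example above the same idno appears both at the top level and inside series_description. A small sketch of one way to keep the two in sync when creating several projects; build_indicator_payload is a hypothetical helper, and the default values simply mirror the example rather than anything the library requires:

```python
def build_indicator_payload(idno: str, version_label: str = "Version 1", doi: str = "V1") -> dict:
    """Build the dictionary passed to create_project_log for an indicator project."""
    series_description = {
        "idno": idno,  # reused below so the two identifiers cannot drift apart
        "doi": doi,
        "name": version_label,
        "display_name": version_label,
    }
    return {"idno": idno, "series_description": series_description}


payload = build_indicator_payload("GB20241030_demo")
```

The resulting payload has the same shape as the dictionary written out by hand above and could be passed as the first argument to me.create_project_log.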
Starting with outlines
The metadata can be both large and hierarchical. Starting with a skeleton outline makes things easier.
Outlines are available in three modes: as a dictionary, as a pydantic model, or as an Excel file.
Dictionaries
Dictionaries are created like so:
indicator_dict = me.make_metadata_outline('indicator', output_mode='dict')
indicator_dict
{'metadata_information': {'title': None,
'idno': None,
'producers': [{'name': '', 'abbr': None, 'affiliation': None, 'role': None}],
'prod_date': None,
...
'email': None,
'telephone': None,
'uri': None}]},
'tags': [{'tag': None, 'tag_group': None}]}
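Because the outline is deeply nested, it can be hard to see which fields are still empty after editing. The following is a sketch of a plain-Python walk over such a dictionary; unfilled_paths is a hypothetical helper, and the small outline used here is a hand-written fragment shaped like the output above, not the full schema:

```python
from typing import Any, Iterator


def unfilled_paths(node: Any, prefix: str = "") -> Iterator[str]:
    """Yield dotted paths of every leaf that is still None or an empty string."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from unfilled_paths(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from unfilled_paths(item, f"{prefix}[{index}]")
    elif node is None or node == "":
        yield prefix


# Hand-written fragment for illustration only.
outline = {
    "metadata_information": {
        "title": "Demo indicator",
        "producers": [{"name": "", "abbr": None}],
    },
    "tags": [{"tag": None, "tag_group": None}],
}

missing = list(unfilled_paths(outline))
print(missing)
# ['metadata_information.producers[0].name', 'metadata_information.producers[0].abbr',
#  'tags[0].tag', 'tags[0].tag_group']
```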
Pydantic
Pydantic is a nice Python library for defining and validating data schemas. An outline for the indicator schema can be created like so:
indicator_pydantic = me.make_metadata_outline('indicator', 'pydantic')
indicator_pydantic
giving
IHSN_INDICATOR_1-0_Template_v01_EN(metadata_information=metadata_information(title=None, idno=None, producers=[Producer(name='', abbr=None, affiliation=None, role=None)], prod_date=None, ...
It can be updated using dot notation, for example:
indicator_pydantic.metadata_information.producers[0].name = "example_producer"
indicator_pydantic
giving
IHSN_INDICATOR_1-0_Template_v01_EN(metadata_information=metadata_information(title=None, idno=None, producers=[Producer(name='example_producer', abbr=None, affiliation=None, role=None)], prod_date=None, ...
Excel
Finally, a nicely formatted Excel file can be created into which the metadata can be written. If no filename is explicitly given, the name of the metadata type or of the default template is used as the filename.
outline_filename = me.make_metadata_outline('indicator', 'excel')
And then read back in from Excel like so:
indicator_excel = me.read_metadata_from_excel(outline_filename)
Retrieving existing metadata
Likewise, the metadata of existing projects can be downloaded as dictionaries, pydantic models or Excel spreadsheets.
Asking for the metadata as a pydantic object
demo_pydantic = me.get_project_metadata_by_id(indicator_id, 'pydantic')
demo_pydantic
which gives:
IHSN_INDICATOR_1-0_Template_v01_EN(metadata_information=metadata_information(title=None, idno=None, producers=[Producer(name='', abbr=None, affiliation=None, role=None)], prod_date=None, ...
Automatic Metadata Creation and Augmentation from Sources
We can use a Large Language Model to make a first draft of metadata from a source document or documents.
We can create metadata from source files such as:
- pdfs
- word
- excel
- powerpoint
- text files
- csv
- XML
- ZIP files
- Images
# Assumption: the OpenAI key is stored in an environment variable named OPENAI_API_KEY
openai_key = os.getenv("OPENAI_API_KEY")
docs = [
    "survey_records/cambodia/cambodia_lsms_basic_information_document.pdf",
    "survey_records/cambodia/cambodia_living_standards_measurement_study_plus_manual_english.pdf",
]
example = me.draft_metadata_from_files(
    openai_api_key=openai_key,
    files=docs,
    metadata_type_or_template_uid='microdata',
    output_mode='pydantic',
    metadata_producer_organization="The World Bank Group, DEC - Development Data Group",
)
The files are read in and sent to the LLM for processing.
Read in survey_records/cambodia/cambodia_lsms_basic_information_document.pdf, running token count is 6373
Read in survey_records/cambodia/cambodia_living_standards_measurement_study_plus_manual_english.pdf, running token count is 24901
Sending to OpenAI, this may take a few minutes...
We can then view the new metadata
example.pretty_print()
which gives
IHSN_DDI_2-5_Template_v01_EN( doc_desc=doc_desc( producers=[ Producer( name='The World Bank Group, DEC - Development Data Group', abbr='WBG', affiliation='World Bank', role='Metadata producer' ) ], prod_date='2025-01-28', idno='CAMBODIA_LSMS_PLUS_2019_2020_v01_EN', version_statement=version_statement( version='1.0', version_date='2025-01-28', version_resp='', version_notes='First draft of the metadata for the Cambodia Living Standards Measurement Study - Plus (LSMS+) 2019-20.' ) ), ... )
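Before sending a batch of files for drafting, it may help to set aside any that fall outside the source types listed earlier in this section. The sketch below is an illustration only: the extension set is an assumed mapping from those file kinds to common extensions (the extensions the library actually accepts may differ), and split_by_support is a hypothetical helper:

```python
from pathlib import Path

# Assumed extensions for the listed source types; not taken from the library.
ASSUMED_EXTENSIONS = {
    ".pdf", ".docx", ".xlsx", ".pptx", ".txt", ".csv", ".xml", ".zip",
    ".png", ".jpg", ".jpeg",
}


def split_by_support(paths):
    """Split paths into (supported, unsupported) by file extension, case-insensitively."""
    supported, unsupported = [], []
    for path in paths:
        suffix = Path(path).suffix.lower()
        target = supported if suffix in ASSUMED_EXTENSIONS else unsupported
        target.append(path)
    return supported, unsupported


ok, skipped = split_by_support(["manual.PDF", "notes.txt", "archive.tar.gz", "data.sav"])
print(ok)       # ['manual.PDF', 'notes.txt']
print(skipped)  # ['archive.tar.gz', 'data.sav']
```

Only the last suffix is checked, so a name such as archive.tar.gz is treated as .gz and set aside.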
Contributing
Setting up the python environment
This library uses Poetry for dependency management (https://python-poetry.org/docs/basic-usage/).
In your Python environment run pip install poetry, then navigate to the pymetadataeditor folder and run poetry install. If that doesn't work, try python -m poetry install.
Development python environment
If you want to make changes to this repo, you also need to install the tools that are used only for development, for example pytest.
Run:
poetry install --with dev
poetry run pre-commit install
Poetry troubleshooting
If you are running on Windows and see errors while numpy is being installed, the cause may be Windows file path length. With default settings, file paths longer than about 260 characters (the legacy MAX_PATH limit) can cause installation problems. To overcome this you can either
- enable long path support in Windows (https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=powershell#enable-long-paths-in-windows-10-version-1607-and-later)
- install Python libraries in a folder in the current directory by running poetry config virtualenvs.in-project true and then running poetry install
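To check whether path length is a plausible cause before changing any settings, a short script can list files whose absolute path reaches the legacy limit. This is a rough diagnostic sketch, not part of the library; paths_over_limit is a hypothetical helper, and the folder to scan (for example the virtual environment directory) is up to you:

```python
import os

MAX_PATH = 260  # legacy Windows limit, which includes the terminating null character


def paths_over_limit(root: str, limit: int = MAX_PATH):
    """Return absolute file paths under root whose length is at or above the limit."""
    too_long = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.abspath(os.path.join(dirpath, filename))
            if len(full) >= limit:
                too_long.append(full)
    return too_long
```

Calling paths_over_limit(".") from the project folder returns an empty list when no file path is long enough to hit the limit.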
Markdown API Documentation
Create documentation for the pyMetadataEditor class by running the following command:
python make_docs.py
Notes
In keeping with World Bank Group practice, it should be noted that parts of this code base were written with the assistance of ChatGPT.