Metadata Editor client for Python

These details have not been verified by PyPI

Project description

pyMetadataEditor

A tool connected to Metadata Editor for creating, editing and managing metadata for microdata, indicators, geospatial data, documents, scripts, images and videos.

How to use pyMetadataEditor

from pymetadataeditor import MetadataEditor
import os

your_api_key = os.getenv("API_KEY")
api_url = os.getenv("API_URL")
me = MetadataEditor(api_url=api_url, api_key=your_api_key, verify_ssl=False)
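Because os.getenv returns None when a variable is unset, a missing key or URL would likely only surface later, as an authentication or connection error. A minimal sketch of failing early instead; the helper name require_env is hypothetical and not part of pyMetadataEditor:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        # Covers both an unset variable (None) and an empty string
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

require_env("API_KEY") could then stand in for os.getenv("API_KEY") in the snippet above.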

Listing your projects

me.list_projects(limit=8)

      type        idno   study_idno  title              abbreviation  nation          year_start  year_end
id
1003  document    12345  DOC_001     Sample Document 1  SD1           Example Nation  2020        2025
1002  survey      67890  SURVEY_002  Sample Survey 2    SS2           Example Nation  2019        2024
1001  timeseries  54321  TS_003      Time Series 3      TS3           Another Nation  2021        2026

Creating a new indicator project

demo_name = "GB20241030_demo"

series_description = {
                        "idno": demo_name,
                        "doi": "V1",
                        "name": "Version 1",
                        "display_name": "Version 1"
                     }

indicator_id = me.create_project_log({"idno": demo_name, "series_description": series_description}, "indicator")
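The project identifier appears twice in the call above: once at the top level and once inside series_description. A small helper can keep the two in step. This is only a sketch; build_indicator_payload is a hypothetical name and not part of the library:

```python
def build_indicator_payload(idno: str, doi: str, name: str) -> dict:
    """Build the metadata dictionary for a new indicator project."""
    return {
        "idno": idno,
        "series_description": {
            "idno": idno,  # same identifier at both levels, as in the example above
            "doi": doi,
            "name": name,
            "display_name": name,
        },
    }


payload = build_indicator_payload("GB20241030_demo", "V1", "Version 1")
```

The resulting payload has the same shape as the dictionary passed to me.create_project_log in the example above.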

Starting with outlines

The metadata can be both large and hierarchical. Starting with a skeleton outline makes things easier.

Outlines are available in three modes: as a dictionary, as a pydantic model, or as an Excel file.

Dictionaries

Dictionaries are created like so:

indicator_dict = me.make_metadata_outline('indicator', output_mode='dict')
indicator_dict

{'metadata_information': {'title': None,
  'idno': None,
  'producers': [{'name': '', 'abbr': None, 'affiliation': None, 'role': None}],
  'prod_date': None,
  ...
    'email': None,
    'telephone': None,
    'uri': None}]},
 'tags': [{'tag': None, 'tag_group': None}]}
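The outline is filled with None and empty-string placeholders. Once the fields you need are filled in, it can be handy to strip the remaining placeholders before inspecting the dictionary. A minimal sketch using a hypothetical helper, prune_empty, which is not part of pyMetadataEditor. Whether the server requires or ignores empty fields is not stated here, so treat this as a convenience for local inspection only:

```python
def prune_empty(value):
    """Recursively drop None, empty strings, and empty containers."""
    empty = (None, "", [], {})
    if isinstance(value, dict):
        cleaned = {k: prune_empty(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if v not in empty}
    if isinstance(value, list):
        cleaned = [prune_empty(v) for v in value]
        return [v for v in cleaned if v not in empty]
    return value


# A cut-down outline with one field filled in
outline = {
    "metadata_information": {
        "title": "Demo indicator",
        "idno": None,
        "producers": [{"name": "", "abbr": None, "affiliation": None, "role": None}],
    },
    "tags": [{"tag": None, "tag_group": None}],
}
print(prune_empty(outline))
# {'metadata_information': {'title': 'Demo indicator'}}
```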

Pydantic

Pydantic is a Python library for defining and validating data schemas. An outline for the indicator schema can be created like so:

indicator_pydantic = me.make_metadata_outline('indicator', 'pydantic')
indicator_pydantic

giving

IHSN_INDICATOR_1-0_Template_v01_EN(metadata_information=metadata_information(title=None, idno=None, producers=[Producer(name='', abbr=None, affiliation=None, role=None)], prod_date=None, ...

It can be updated using dot notation, for example:

indicator_pydantic.metadata_information.producers[0].name = "example_producer"
indicator_pydantic

giving

IHSN_INDICATOR_1-0_Template_v01_EN(metadata_information=metadata_information(title=None, idno=None, producers=[Producer(name='example_producer', abbr=None, affiliation=None, role=None)], prod_date=None, ...

Excel

Finally, a formatted Excel file can be created into which the metadata can be written. If no filename is given explicitly, the name of the metadata type or of the default template is used as the filename.

outline_filename = me.make_metadata_outline('indicator', 'excel')

And then read back in from Excel like so:

indicator_excel = me.read_metadata_from_excel(outline_filename)

Retrieving existing metadata

Likewise, existing projects can be downloaded as dictionaries, pydantic models or Excel spreadsheets.

Asking for the metadata as a pydantic object:

demo_pydantic = me.get_project_metadata_by_id(indicator_id, 'pydantic')
demo_pydantic

which gives:

IHSN_INDICATOR_1-0_Template_v01_EN(metadata_information=metadata_information(title=None, idno=None, producers=[Producer(name='', abbr=None, affiliation=None, role=None)], prod_date=None, ...

Automatic Metadata Creation and Augmentation from Sources

We can use a Large Language Model to make a first draft of metadata from a source document or documents.

We can create metadata from source files such as:

PDF
Word
Excel
PowerPoint
Text files
CSV
XML
ZIP files
Images

# Assumption: the OpenAI key is read from an environment variable; the variable name is illustrative
openai_key = os.getenv("OPENAI_API_KEY")

docs = [
    "survey_records/cambodia/cambodia_lsms_basic_information_document.pdf",
    "survey_records/cambodia/cambodia_living_standards_measurement_study_plus_manual_english.pdf",
]

example = me.draft_metadata_from_files(openai_api_key=openai_key,
                                       files=docs,
                                       metadata_type_or_template_uid='microdata',
                                       output_mode='pydantic',
                                       metadata_producer_organization="The World Bank Group, DEC - Development Data Group"
                                       )

The files are read in and sent to the LLM for processing.

Read in survey_records/cambodia/cambodia_lsms_basic_information_document.pdf, running token count is 6373
Read in survey_records/cambodia/cambodia_living_standards_measurement_study_plus_manual_english.pdf, running token count is 24901
Sending to OpenAI, this may take a few minutes...
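Since each file is read and counted before anything is sent, it may be worth checking the list of paths up front against the source file types listed earlier. The sketch below is an assumption-laden illustration: the extension set is a guess at how those file types map to extensions, and split_by_extension is a hypothetical helper, not a pyMetadataEditor function.

```python
from pathlib import Path

# Assumed mapping from the listed file types to common extensions;
# the library may accept more or fewer than these.
ASSUMED_EXTENSIONS = {
    ".pdf", ".docx", ".xlsx", ".pptx", ".txt", ".csv", ".xml", ".zip",
    ".png", ".jpg", ".jpeg",
}


def split_by_extension(paths):
    """Split paths into (accepted, rejected) by file extension, ignoring case."""
    accepted, rejected = [], []
    for p in paths:
        if Path(p).suffix.lower() in ASSUMED_EXTENSIONS:
            accepted.append(p)
        else:
            rejected.append(p)
    return accepted, rejected


accepted, rejected = split_by_extension(["report.PDF", "notes.txt", "model.bin"])
print(accepted)  # ['report.PDF', 'notes.txt']
print(rejected)  # ['model.bin']
```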

We can then view the new metadata:

example.pretty_print()

which gives

IHSN_DDI_2-5_Template_v01_EN(
    doc_desc=doc_desc(
        producers=[
            Producer(
                name='The World Bank Group, DEC - Development Data Group',
                abbr='WBG',
                affiliation='World Bank',
                role='Metadata producer'
            )
        ],
        prod_date='2025-01-28',
        idno='CAMBODIA_LSMS_PLUS_2019_2020_v01_EN',
        version_statement=version_statement(
            version='1.0',
            version_date='2025-01-28',
            version_resp='',
            version_notes='First draft of the metadata for the Cambodia Living Standards Measurement Study - Plus (LSMS+) 2019-20.'
        )
    ),
    ...

)

Contributing

Setting up the Python environment

This library uses Poetry for dependency management (https://python-poetry.org/docs/basic-usage/).

In your Python environment, run pip install poetry, then navigate to the pymetadataeditor folder and run poetry install. If that doesn't work, try python -m poetry install.

Development Python environment

If you want to make changes to this repo, you also need to install the tools that are used only for development, for example pytest.

Run:

poetry install --with dev
poetry run pre-commit install

Poetry troubleshooting

If you are running on Windows and see errors while installing numpy, the cause may be the Windows limit on file path length. With default settings, file paths longer than about 260 characters can cause installation problems. To overcome this you can either:

enable long path support in Windows (https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=powershell#enable-long-paths-in-windows-10-version-1607-and-later)
install python libraries in a folder in the current directory by running poetry config virtualenvs.in-project true and then running poetry install
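To check whether path length is really the problem, you could scan the environment folder for paths at or beyond the classic Windows MAX_PATH limit of 260 characters. A minimal sketch; find_long_paths is a hypothetical helper and the folder you pass in (for example an in-project .venv) is up to you:

```python
import os

WINDOWS_MAX_PATH = 260  # classic Windows MAX_PATH limit, in characters


def find_long_paths(root: str, limit: int = WINDOWS_MAX_PATH) -> list[str]:
    """Return absolute file paths under root whose length reaches the limit."""
    long_paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.abspath(os.path.join(dirpath, filename))
            if len(full_path) >= limit:
                long_paths.append(full_path)
    return long_paths
```

An empty result suggests the installation error has a different cause.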

Markdown API Documentation

Create documentation for the pyMetadataEditor class by running the following command:

python make_docs.py

Notes

In keeping with World Bank Group practice, it should be noted that parts of this code base were written with the assistance of ChatGPT.

Project details

Release history

0.3.2 (this version): Dec 12, 2025
0.3.1: May 23, 2025
0.3.0: Apr 4, 2025
0.2.0: Feb 6, 2025
0.1.1: Jan 29, 2025
0.1.0: Jan 29, 2025

Download files

Source Distribution

pymetadataeditor-0.3.2.tar.gz (39.6 kB)

Uploaded Dec 12, 2025 Source

Built Distribution

pymetadataeditor-0.3.2-py3-none-any.whl (39.3 kB)

Uploaded Dec 12, 2025 Python 3

File details

Details for the file pymetadataeditor-0.3.2.tar.gz.

File metadata

Download URL: pymetadataeditor-0.3.2.tar.gz
Upload date: Dec 12, 2025
Size: 39.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.3 CPython/3.11.9 Darwin/24.6.0

File hashes

Hashes for pymetadataeditor-0.3.2.tar.gz
Algorithm	Hash digest
SHA256	`f93609df63bd7cdc4a31f72fde60db7b2ae17bb7a5ba7ad701827551cb2375ba`
MD5	`b3c66f601cc1da2e64096cd8e2b1c40a`
BLAKE2b-256	`55e8da7a68bafea2af93400bc846f0893689de6dca85164395a1e7c4d12a627c`
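A downloaded file can be checked against the SHA256 digest listed above using Python's standard hashlib module. This is a sketch: the local file path is an assumption about where you saved the download, so the final comparison is left commented out.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest published for pymetadataeditor-0.3.2.tar.gz
EXPECTED_SHA256 = "f93609df63bd7cdc4a31f72fde60db7b2ae17bb7a5ba7ad701827551cb2375ba"

# After downloading the source distribution to the current directory:
# assert sha256_of_file("pymetadataeditor-0.3.2.tar.gz") == EXPECTED_SHA256
```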

File details

Details for the file pymetadataeditor-0.3.2-py3-none-any.whl.

File metadata

Download URL: pymetadataeditor-0.3.2-py3-none-any.whl
Upload date: Dec 12, 2025
Size: 39.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.3 CPython/3.11.9 Darwin/24.6.0

File hashes

Hashes for pymetadataeditor-0.3.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6ba0e9204fde1d8bb84167cbe21008fd6f675e4e854992392b779c7a3062e665`
MD5	`4f91c7577a98ab985cf6cc8d46858f95`
BLAKE2b-256	`56b9da3e53fb31e5db637765e734ed3ac0eb360a350810ae2d5b024e242ee8cd`