Harmony Tool for Retrospective Data Harmonisation
Harmony Python library
Who to contact?
You can contact the Harmony team at https://harmonydata.org/, or Thomas Wood at http://fastdatascience.com/.
Looking to try Harmony in the browser?
Visit: https://app.harmonydata.org/
You can also visit our blog at https://harmonydata.org/
You need Tika if you want to extract instruments from PDFs
Download and install Java if you don't have it already. Then download Apache Tika from https://tika.apache.org/download.html and run it on your computer:
java -jar tika-server-standard-2.3.0.jar
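Before loading PDFs, it can help to confirm that the server actually started. The sketch below only checks whether something accepts TCP connections on a given port. It assumes the Tika server is on its default port, 9998, on localhost; the helper name `is_server_listening` is hypothetical and is not part of Harmony or Tika.

```python
import socket


def is_server_listening(host: str = "localhost", port: int = 9998, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections at host:port.

    Assumption: 9998 is the port the Tika server was started on (its default).
    This does not verify that the listener is really Tika.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unresolvable hosts.
        return False


if __name__ == "__main__":
    if is_server_listening():
        print("A server is listening on localhost:9998 (assumed to be Tika).")
    else:
        print("Nothing is listening on localhost:9998; start the Tika server first.")
```

If you started Tika on a different port, pass that port explicitly, for example `is_server_listening(port=9999)`.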
Installing Harmony Python package
You can install from PyPI.
pip install harmonydata
Loading instruments from PDFs
If you have a local file, you can load it into a list of Instrument instances:
from harmony import load_instruments_from_local_file
instruments = load_instruments_from_local_file("gad-7.pdf")
Matching instruments
Once you have some instruments, you can match them with each other with a call to match_instruments:
from harmony import match_instruments
all_questions, similarity, query_similarity = match_instruments(instruments)
- all_questions is a list of the questions passed to Harmony, in order.
- similarity is the similarity matrix returned by Harmony.
- query_similarity is the degree of similarity of each item to an optional query passed as an argument to match_instruments.
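As an illustration of how the returned values can be read together, here is a minimal sketch that ranks question pairs by similarity. It uses hypothetical stand-in data (plain strings and a nested list), not real Harmony output, and it assumes that the similarity matrix is square and that row i and column j correspond to the i-th and j-th entries of all_questions. The helper `rank_pairs` is not part of Harmony.

```python
from typing import List, Sequence, Tuple


def rank_pairs(
    questions: Sequence[str],
    similarity: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> List[Tuple[float, str, str]]:
    """Return (score, question_a, question_b) for pairs at or above threshold, best first."""
    pairs = []
    for i in range(len(questions)):
        # Only look above the diagonal so each pair appears once
        # and a question is never paired with itself.
        for j in range(i + 1, len(questions)):
            score = similarity[i][j]
            if score >= threshold:
                pairs.append((score, questions[i], questions[j]))
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


# Hypothetical stand-in data, not real Harmony output.
example_questions = [
    "Feeling nervous, anxious or on edge",
    "Not being able to stop or control worrying",
    "Trouble falling or staying asleep",
]
example_similarity = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]

for score, first, second in rank_pairs(example_questions, example_similarity, threshold=0.25):
    print(f"{score:.2f}  {first}  <->  {second}")
```

With real output you would pass the question texts and the matrix returned by match_instruments instead of the stand-in values; the threshold of 0.25 here is arbitrary.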
Contributing to Harmony
If you'd like to contribute to this project, you can contact us at https://harmonydata.org/ or make a pull request on our GitHub repository. You can also raise an issue.
Developing Harmony
Automated tests
Test code is in the tests/ folder and uses unittest.
The testing tool tox is used in the automation with GitHub Actions CI/CD.
Use tox locally
Install tox and run it:
pip install tox
tox
In our configuration, tox runs a check of the source distribution using check-manifest (which requires your repo to be at least git-initialized with git init and its files added with git add .), setuptools's check, and unit tests using pytest. You don't need to install check-manifest and pytest yourself, though: tox will install them in a separate environment.
The automated tests are run against several Python versions, but on your machine you might have only one. If that is Python 3.9, for example, run:
tox -e py39
Thanks to GitHub Actions' automated process, you don't need to generate distribution files locally. If you do want to build them yourself, see "Re-releasing the package manually" below.
Continuous integration/deployment to PyPI
This package is based on the template https://pypi.org/project/example-pypi-package/
This package:
- uses GitHub Actions for both testing and publishing
- is tested when pushing to the master or main branch, and is published when a release is created
- includes test files in the source distribution
- uses setup.cfg for version single-sourcing (setuptools 46.4.0+)
Re-releasing the package manually
The code to re-release Harmony on PyPI is as follows:
source activate py311
pip install twine
rm -rf dist
python setup.py sdist
twine upload dist/*
License
MIT License. Copyright (c) 2023 Ulster University (https://www.ulster.ac.uk)
Hashes for harmonydata-0.2.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | d26871dba2075bd94a1e66e3d3e6d61bc5d926a3938af5cbb24cf7cca7e04784
MD5 | 63449c317d920a0c70c882aa35ca4922
BLAKE2b-256 | 1bada48da22563fede4e12d9567b5c1e4e4d404cbf09880053b9e0ef5cf4f480
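If you download the wheel by hand, you can compare its digest with the SHA256 value listed above. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the file path you pass in is assumed to be wherever you saved the download.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest("harmonydata-0.2.0-py3-none-any.whl", "d26871dba2075bd94a1e66e3d3e6d61bc5d926a3938af5cbb24cf7cca7e04784")` should return True for an unmodified copy of that wheel saved in the current directory.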