Automated subject indexing and classification tool
Project description
Annif is an automated subject indexing toolkit. It was originally created as a statistical automated indexing tool that used metadata from the Finna.fi discovery interface as a training corpus.
This repo contains a rewritten production version of Annif based on the prototype. It is a work in progress, but already functional for many common tasks.
Basic install
You will need Python 3.6+ to install Annif.
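Before creating the virtual environment, it can help to confirm that the interpreter meets this requirement. The following is a minimal sketch, not part of Annif: the function name check_python_version is hypothetical, and it assumes the interpreter is available as python3 on your PATH.

```shell
# Hypothetical helper (not part of Annif): returns 0 only if the given
# interpreter (default: python3) is Python 3.6 or newer.
check_python_version() {
    "${1:-python3}" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 6) else 1)'
}

if check_python_version python3; then
    echo "Python version is sufficient for Annif"
else
    echo "Python 3.6 or newer is required" >&2
fi
```

If your interpreter has a different name, pass it as the first argument, for example check_python_version python3.8.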
The recommended way is to install Annif from PyPI into a virtual environment.
python3 -m venv annif-venv
source annif-venv/bin/activate
pip install annif
You will also need NLTK data files:
python -m nltk.downloader punkt
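To check whether the punkt data is already available, one option is to ask NLTK to locate it. This is only a sketch under assumptions: the function name check_punkt is hypothetical, and it relies on NLTK being importable by the python on your PATH (for example, inside the activated virtual environment).

```shell
# Hypothetical helper (not part of Annif): returns 0 if NLTK can locate the
# punkt tokenizer data, non-zero if the data (or NLTK itself) is missing.
check_punkt() {
    python -c 'import nltk; nltk.data.find("tokenizers/punkt")' 2>/dev/null
}

# Usage inside the activated virtual environment:
#   check_punkt && echo "punkt found" || python -m nltk.downloader punkt
```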
Start up the application:
annif
See Getting Started in the wiki for more details.
Docker install
You can use Annif as a pre-built Docker container. Please see the wiki documentation for details.
Development install
A development version of Annif can be installed by cloning the GitHub repository.
Installation and setup
Clone the repository.
Switch into the repository directory.
Create and activate a virtual environment (optional, but highly recommended):
python3 -m venv venv
. venv/bin/activate
Install dependencies (including development) and make the installation editable:
pip install .[dev]
pip install -e .
You will also need NLTK data files:
python -m nltk.downloader punkt
Start up the application:
annif
Unit tests
Run . venv/bin/activate to enter the virtual environment and then run pytest.
To have the test suite watch for changes in code and run automatically, use pytest-watch by running ptw.
Getting help
Many resources are available:
- Usage documentation in the wiki
- Annif tutorial for learning to use Annif
- annif-users discussion forum
- Internal API documentation on ReadTheDocs
- annif.org project web site
Publications / How to cite
An article about Annif has been published in the peer-reviewed Open Access journal LIBER Quarterly. The software itself is also archived on Zenodo and has a citable DOI.
Annif article
Suominen, O., 2019. Annif: DIY automated subject indexing using multiple algorithms. LIBER Quarterly, 29(1), pp.1–25. DOI: https://doi.org/10.18352/lq.10285
@article{suominen2019annif,
title={Annif: DIY automated subject indexing using multiple algorithms},
author={Suominen, Osma},
journal={{LIBER} Quarterly},
volume={29},
number={1},
pages={1--25},
year={2019},
doi = {10.18352/lq.10285},
url = {https://doi.org/10.18352/lq.10285}
}
Citing the software itself
Zenodo DOI: https://doi.org/10.5281/zenodo.2578948
@misc{https://doi.org/10.5281/zenodo.2578948,
doi = {10.5281/ZENODO.2578948},
url = {https://doi.org/10.5281/zenodo.2578948},
title = {NatLibFi/Annif},
year = {2019}
}
License
The code in this repository is licensed under Apache License 2.0, except for the dependencies included under annif/static/css and annif/static/js, which have their own licenses. See the file headers for details.