Library for text and image analysis by the Digital Humanities lab (DH-lab)
DHLAB
dhlab is a Python library for accessing reduced representations of text and pictures at the National Library of Norway (NLN), Nasjonalbiblioteket (NB) in Norwegian.
It is developed and maintained by The Digital Humanities lab group.
The Python package includes wrapper functions for the API (Application Programming Interface) used to query the texts in NB Digital, the NLN's digital collection of books and newspapers.
The API supports qualitative and quantitative analysis of the digital texts by generating, for example, word frequency lists, concordances, collocations, and n-grams, and by extracting names and narrative graphs.
Analyses can be performed on a single document or on a larger corpus. It is also possible to build your own corpora based on bibliographic metadata.
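To make the terms above concrete, here is a minimal sketch of what a word frequency list, a concordance, and an n-gram count look like on a toy sentence. It uses only the Python standard library and does not call dhlab or the API; the function names (`tokenize`, `word_frequencies`, `concordance`, `ngrams`) are hypothetical helpers written for this illustration, and the real library may compute these differently.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def concordance(tokens, keyword, window=2):
    """Return (left context, keyword, right context) for each hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits


def ngrams(tokens, n=2):
    """Count every run of n consecutive tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Toy input, not taken from the NB Digital collection.
text = "The library lends books. The library also scans books and newspapers."
tokens = tokenize(text)

print(word_frequencies(tokens).most_common(3))
print(concordance(tokens, "library"))
print(ngrams(tokens, 2).most_common(1))
```

A collocation analysis builds on the same idea as the concordance: it counts which words appear inside the context window of a keyword across many documents.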
The Jupyter Notebooks in the digital_tekstanalyse repo show examples of how to use the library, and can be run directly in your browser without prior programming experience.
Installation
Install dhlab in your terminal with pip:
pip install dhlab
Example use
You could start by building your own corpus, e.g. of books published between 1814 and 1905:
from dhlab.text import Corpus
book_corpus = Corpus(doctype="digibok", from_year=1814, to_year=1905)
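Conceptually, building a corpus from bibliographic metadata means selecting the documents whose metadata match a set of criteria. The sketch below illustrates that idea on a small in-memory list, without calling dhlab or the API. The `Record` class and `select` function are hypothetical, the parameter names only mirror those in the example above, the `"newspaper"` doctype is a placeholder value, and treating the year bounds as inclusive is an assumption rather than documented dhlab behaviour.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """A minimal stand-in for one document's bibliographic metadata."""
    title: str
    doctype: str
    year: int


def select(records, doctype=None, from_year=None, to_year=None):
    """Keep records matching every criterion that is not None.

    Year bounds are treated as inclusive (an assumption for this sketch).
    """
    selected = []
    for rec in records:
        if doctype is not None and rec.doctype != doctype:
            continue
        if from_year is not None and rec.year < from_year:
            continue
        if to_year is not None and rec.year > to_year:
            continue
        selected.append(rec)
    return selected


# Toy metadata for illustration only.
records = [
    Record("Peer Gynt", "digibok", 1867),
    Record("Sult", "digibok", 1890),
    Record("Kristin Lavransdatter", "digibok", 1920),
    Record("A newspaper issue", "newspaper", 1880),
]

books = select(records, doctype="digibok", from_year=1814, to_year=1905)
print([rec.title for rec in books])
```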
Contact
If you have any questions, or run into any problems with the code, please log them in our issue tracker in the DHLAB repo.