Text and image analysis of NB's digital collection

Project description

DHLAB

dhlab is a Python library for accessing reduced representations of text and pictures at the National Library of Norway (NLN), known in Norwegian as Nasjonalbiblioteket (NB). It is developed and maintained by the Digital Humanities lab group.

The Python package includes wrapper functions for the API (Application Programming Interface) that can be used to query the texts in NB Digital, the NLN's digital collection of books and newspapers.

The API supports qualitative and quantitative analysis of the digital texts by generating, for example, word frequency lists, concordances, collocations, and n-grams, and by extracting names and narrative graphs.
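
To make these terms concrete, here is a minimal sketch of a word frequency list, n-grams, and a concordance, written with the Python standard library only. It does not use dhlab or its API; the function names (tokenize, word_frequencies, ngrams, concordance) and the sample sentence are made up for illustration, and the tokenizer is deliberately naive.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def concordance(tokens, keyword, window=2):
    """Return each hit of keyword with up to `window` tokens of context per side."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((left, tok, right))
    return hits


# Hypothetical sample text, not taken from the NB collection.
sample = "The library lends books. The library also digitizes books and newspapers."
tokens = tokenize(sample)
freqs = word_frequencies(tokens)
bigrams = ngrams(tokens, 2)
hits = concordance(tokens, "library", window=2)

print(freqs.most_common(3))
for left, keyword, right in hits:
    print(" ".join(left), f"[{keyword}]", " ".join(right))
```

The library computes such results on the server side for whole documents and corpora; the sketch only shows what kind of data each analysis produces.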

Analyses can be performed on a single document or on a larger corpus. It is also possible to build your own corpora based on bibliographic metadata.

The Jupyter Notebooks in the digital_tekstanalyse repo show examples of how to use the library and can be run directly in your browser without prior programming experience.

Installation

Install dhlab in your terminal with pip:

pip install dhlab
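
After installing, you may want to confirm that the package is visible to the Python interpreter you are using. The following is a small sketch that relies only on the standard library module importlib.metadata (Python 3.8 or later); the helper name installed_version is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


dhlab_version = installed_version("dhlab")
if dhlab_version is None:
    print("dhlab is not installed; run: pip install dhlab")
else:
    print(f"dhlab {dhlab_version} is installed")
```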

Example use

You could start by building your own corpus, e.g. of books published between 1814 and 1905:

from dhlab.text import Corpus

book_corpus = Corpus(doctype="digibok", from_year=1814, to_year=1905)
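
If you want to compare sub-periods rather than the whole range, one option is to split the years into windows and build one corpus per window. The sketch below only computes the year pairs in plain Python; the helper year_windows is hypothetical and not part of dhlab. It assumes that from_year and to_year are both inclusive, which you should check against the library documentation before relying on non-overlapping periods.

```python
def year_windows(first_year, last_year, step):
    """Split [first_year, last_year] into consecutive (from_year, to_year) pairs.

    Each pair covers at most `step` years; the final pair is clipped to last_year.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    if first_year > last_year:
        raise ValueError("first_year must not be after last_year")
    windows = []
    start = first_year
    while start <= last_year:
        end = min(start + step - 1, last_year)
        windows.append((start, end))
        start = end + 1
    return windows


periods = year_windows(1814, 1905, 25)
print(periods)

# With dhlab installed and network access, each window could be passed to the
# constructor shown above (assumption: both bounds are inclusive):
# from dhlab.text import Corpus
# corpora = [Corpus(doctype="digibok", from_year=a, to_year=b) for a, b in periods]
```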

Contact

If you have any questions, or run into any problems with the code, please log them in our issue tracker in the DHLAB repo.

Changelog

v2.1.0 (2022-05-13)

Feat

  • geodata

v2.0.25 (2022-04-27)

Fix

  • setup.cfg: make package dhlab importable

v2.0.24 (2022-04-19)

Fix

  • add missing newline (#50)

v2.0.23 (2022-04-01)

Fix

  • github-workflows: change github access token (#47)
  • github-workflows: change github access token (#46)

Refactor

  • expose dhlab v1 modules

v2.0.22 (2022-03-22)

Fix

  • import all legacy modules in __init__.py

Refactor

  • move dhlab_v1 code into its own subpackage
  • docs/package_summary.rst: add reference table for legacy code

v2.0.21 (2022-03-21)

Refactor

  • constants: add global variables for URLs in constants.py
  • Reformat code with pep8 tools
  • turn relative imports into absolute imports
  • simplify and reduce expressions
  • rename classes with CamelCase

Docs

  • README: add "Example use"
  • add docstrings in subpackages
  • add docs/CHANGELOG.md
  • docs: add *.rst documentation files
  • add autosummary of whole dhlab package
  • logo: update logo image
  • add jupyter integration and toggle feature
  • add copybutton to code blocks
  • add docstrings and make functions private

v2.0.20geo (2022-03-02)

Feat

  • dhlab.api.dhlab_api: add function get_places
  • text.geo_data: add class GeoData

Fix

  • text.dispersion: pass **kwargs to plot()

v.2.0.18dispersion (2022-02-21)

Feat

  • text.dispersion: add class Dispersion
  • api.dhlab_api: add get_dispersion

Fix

  • requirements: remove wordcloud

v2.0.17params (2022-02-08)

Refactor

  • text.corpus: add parameter fulltext
  • api.dhlab_api.document_corpus: add parameter fulltext
  • text.conc_coll.Concordance: add parameters window and limit
  • text.conc_coll.Collocations: add parameter samplesize

Fix

  • text.corpus.urnlist: fix urnlist assignment

v2.0.12.chunk (2022-01-29)

Refactor

  • text.chunking: add attribute self.chunks

Fix

  • imports

v2.0.10chunks (2022-01-29)

Feat

  • text.conc_coll: add class Counts
  • text.corpus: add class Corpus_from_identifiers
  • text.chunking: add class Chunks
  • text.chunking: add functions get_chunks, get_chunks_para

Fix

  • imports
  • dhlab_api.get_chunks: return dict not dataframe
  • apply autopep8

v2.0.5 (2022-01-19)

Refactor

  • nbtokenizer: edit tokens for mail and web addresses

Feat

  • add Tokens class

Fix

  • imports

v2.0.2a (2022-01-18)

Fix

  • typecheck of corpus objects

v2.0.1.alpha6 (2022-01-18)

  • changed wordcloud import
  • fixed corpus transfer in conc_coll

v2.0.0.beta (2022-01-18)

Feat

  • add get_file_from_github, download_from_github in utils

Refactor

  • New package structure

Docs

  • include installation instructions in README

v1.0.0 (2022-01-06)

  • Set up Github Actions to run automatic linting and testing
  • Set up documentation pages
  • Include documentation of the code in docstrings

Fix

  • address linting issues from flake8
  • reformat code

Feat

  • add documentation summaries for all modules
  • add documentation for the repo
  • add docstrings from README.md to nbtext.py
  • add pylint config file

Refactor

  • reduce code duplication
  • update workflow file reference
  • change str.format to f-strings
  • optimize imports
  • rename workflow that packages and publishes dhlab to pypi
  • use default publish workflow
  • reduce compatible python versions
  • update publishing workflow
  • type out scope for linting explicitly
  • move pylint.yml

v0.75 (2019-09-09)

  • Initial release to PyPI

Download files

Source Distribution

dhlab-2.1.0.tar.gz (188.7 kB)

Uploaded Source

Built Distribution

dhlab-2.1.0-py3-none-any.whl (168.5 kB)

Uploaded Python 3

File details

Details for the file dhlab-2.1.0.tar.gz.

File metadata

  • Download URL: dhlab-2.1.0.tar.gz
  • Size: 188.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.13

File hashes

Hashes for dhlab-2.1.0.tar.gz
Algorithm Hash digest
SHA256 1facfed70d1fccd11a9e9836c747df33d7984697d64f78b67ef826fb2ce82e93
MD5 729255274f38cc60194695b13c06117b
BLAKE2b-256 d368a708239297a004e989cf51b04c0999af76d7872afb02019ef5b81f0167c6

See more details on using hashes here.
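
A published digest can be checked against a downloaded file with the standard library module hashlib. The following is a minimal sketch; the helper names sha256_of_file and matches_published_hash are made up for this example, and the local file path in the usage comment assumes the archive was saved in the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (assumes the source distribution was downloaded to the current directory):
# ok = matches_published_hash(
#     "dhlab-2.1.0.tar.gz",
#     "1facfed70d1fccd11a9e9836c747df33d7984697d64f78b67ef826fb2ce82e93",
# )
```

Reading in chunks keeps memory use constant regardless of file size. A mismatch means the file differs from the one that was uploaded, so it should not be installed.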

File details

Details for the file dhlab-2.1.0-py3-none-any.whl.

File metadata

  • Download URL: dhlab-2.1.0-py3-none-any.whl
  • Size: 168.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.13

File hashes

Hashes for dhlab-2.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 40eaa5ce548ab759c3ba3f4d789fd7317915157243cfe3f433c4f9441b8e8c82
MD5 f3ce7256f8a1b704d09a319fdda465f6
BLAKE2b-256 7407bbf0cb7d5d1a96359ea2286340939868ee3a06617029e4b650a780a00a93
