Text and image analysis of NB's digital collection

Project description

DHLAB

dhlab is a Python library for qualitative and quantitative analysis of the digital texts in nettbiblioteket (English: "the online library") at the National Library of Norway (NLN). Nettbiblioteket is the NLN's digital collection of media publications.

Check out our documentation for more info.

Install dhlab from the GitHub repo

Open your terminal in the directory where you will work with DHLAB.

git clone https://github.com/NationalLibraryOfNorway/DHLAB.git
cd DHLAB
pip install -U -e .
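
After running the commands above, it can be useful to confirm that the package is visible to the Python interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of dhlab, and the distribution name `dhlab` is assumed from the published file names.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "dhlab") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent.

    Hypothetical helper for illustration; it only reads package metadata
    and does not import the package itself.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("dhlab is not installed in this environment")
    else:
        print(f"dhlab {found} is installed")
```

If the script reports that dhlab is not installed, the likely cause is that `pip` and `python` point at different environments.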

Contact

The code here is developed and maintained by The Digital Humanities lab group.

If you have any questions, or run into any problems with the code, please log them in our issue tracker.

Changelog

v2.26.1 (2023-04-12)

Fix

  • counts: bug on from_df method

v2.26.0 (2023-03-16)

Feat

  • future: ngram + fix (#161)

v2.25.0 (2023-03-03)

v2.24.0 (2023-03-03)

Feat

  • word-based concordances have been added

v2.23.3 (2023-03-01)

Fix

  • counts: (#158)

v2.23.2 (2023-02-24)

Fix

  • Counts: (#157)

v2.23.1 (2023-02-24)

Fix

  • future: func names (#156)

v2.23.0 (2023-02-24)

Feat

  • future: (#154)

v2.22.2 (2023-02-21)

Fix

  • natbib, Counts: add functools.wraps (#150)

v2.22.1 (2023-02-16)

v2.22.0 (2023-02-16)

v2.21.2 (2023-02-16)

Fix

  • freq: add freq Counts method to corpus

v2.21.1 (2023-02-15)

Fix

  • Counts: fix counts

v2.21.0 (2023-02-13)

v2.20.0 (2023-02-09)

Feat

  • metadata: update metadata service tools

v2.19.0 (2023-01-25)

Feat

  • added urncount - removed wordcloud message

v2.18.1 (2023-01-23)

v2.18.0 (2023-01-18)

Feat

  • text: (#133)

v2.17.1 (2023-01-16)

Fix

  • parameter passing wildcard

v2.17.0 (2023-01-16)

Feat

  • added wildcard search for words

v2.16.0 (2023-01-16)

Feat

  • heatmap: add heatmap wrapper to viz package

v2.15.1 (2023-01-16)

Fix

  • EmptyCorpus (#129)

v2.15.0 (2023-01-16)

Feat

  • corpus: (#128)

Fix

  • bump-version: disable triggering new workflow [skip-ci] (#127)

v2.14.0 (2023-01-13)

Feat

  • visualize, ngram, wordbank: (#126)

v2.13.3 (2023-01-09)

Fix

  • handle API errors

v2.13.2 (2023-01-06)

Fix

  • dhlab_api: construct df from response.json (#124)

v2.13.1 (2022-12-08)

Fix

  • Pos0 (#123)

v2.13.0 (2022-12-07)

Feat

  • api: added start page and to page for NER and POS (#122)

v2.12.0 (2022-12-07)

Feat

  • ngram: make ngram subclass of dhlabobj (#121)

v2.11.2 (2022-11-24)

Fix

  • ngram: lang param is not supported for avis
  • requirements: add scipy to requirements

v2.11.1 (2022-11-14)

Feat

  • totals: Add totals() to top level
  • DhlabObj: add DhlabObj
  • Corpus: allow empty Corpus and construct df

Fix

  • conc_coll: fix repr functions

Refactor

  • corpus: change add()

v2.10.0 (2022-10-26)

Feat

  • added image search in bokhylla books to api

v2.9.5 (2022-10-26)

Fix

  • nbtokenizer now handles numbers combined with words, e.g. '1800-tallet'

v2.9.4 (2022-10-04)

Fix

  • relative and absolute

v2.9.3 (2022-10-04)

Fix

  • ngram smooth param

v2.9.2 (2022-10-03)

Fix

  • lang added to ngram

v2.9.1 (2022-09-12)

Fix

  • revert ngram_conv()

v2.9.0 (2022-09-05)

Feat

  • added reference for words

v2.8.2 (2022-09-01)

Fix

  • corpus: extend_from_identifiers (#88)

    needs to be able to work with a dataframe of URNs

v2.8.1 (2022-08-30)

Fix

  • delete whitespace

v2.8.0 (2022-08-30)

Feat

  • added spacy pos parse (pos, lemma, dependency)

Refactor

  • corpus: ignore index in corpus.add (#84)

v2.7.0 (2022-08-24)

  • automatic bump with features from 2.6.x versions

v2.6.3 (2022-08-23)

Feat

  • add Empty Corpus to dhlab.init

Fix

  • dewey search in corpus builder

v2.6.2 (2022-08-17)

Feat

  • add corpus default limit
  • add docstring to concordance

Fix

  • revert gitignore
  • revert Collocation

v2.6.1 (2022-08-15)

Fix

  • error in pandas corpus (#75)

v2.6.0 (2022-08-08)

Feat

  • code for word evaluations

v2.5.0 (2022-08-05)

Feat

  • geolocation

v2.4.0 (2022-07-12)

Feat

  • ner with spaCy

v2.3.6 (2022-07-12)

Fix

  • nb_ngram to point to new endpoint

v2.3.0 (2022-06-02)

Feat

  • added access to Norsk Ordbank, wordbank

v2.2.0 (2022-05-13)

Feat

  • ngram, geodata

v2.1.0 (2022-05-13)

Feat

  • geodata

v2.0.25 (2022-04-27)

Fix

  • setup.cfg: make package dhlab importable

v2.0.24 (2022-04-19)

Refactor

  • expose dhlab v1 modules

v2.0.22 (2022-03-22)

Fix

  • import all legacy modules in __init__.py

Refactor

  • move dhlab_v1 code into its own subpackage
  • docs/package_summary.rst: add reference table for legacy code

v2.0.21 (2022-03-21)

Refactor

  • constants: add global variables for URLs in constants.py
  • Reformat code with pep8 tools
  • turn relative imports into absolute imports
  • simplify and reduce expressions
  • rename classes with CamelCase

Docs

  • README: add "Example use"
  • add docstrings in subpackages
  • add docs/CHANGELOG.md
  • docs: add *.rst documentation files
  • add autosummary of whole dhlab package
  • logo: update logo image
  • add jupyter integration and toggle feature
  • add copybutton to code blocks
  • add docstrings and make functions private

v2.0.20geo (2022-03-02)

Feat

  • dhlab.api.dhlab_api: add function get_places
  • text.geo_data: add class GeoData

Fix

  • text.dispersion: pass **kwargs to plot()

v.2.0.18dispersion (2022-02-21)

Feat

  • text.dispersion: add class Dispersion
  • api.dhlab_api: add get_dispersion

Fix

  • requirements: remove wordcloud

v2.0.17params (2022-02-08)

Refactor

  • text.corpus: add parameter fulltext
  • api.dhlab_api.document_corpus: add parameter fulltext
  • text.conc_coll.Concordance: add parameters window and limit
  • text.conc_coll.Collocations: add parameter samplesize

Fix

  • text.corpus.urnlist: fix urnlist assignment

v2.0.12.chunk (2022-01-29)

Refactor

  • text.chunking: add attribute self.chunks

v2.0.10chunks (2022-01-29)

Feat

  • text.conc_coll: add class Counts
  • text.corpus: add class Corpus_from_identifiers
  • text.chunking: add class Chunks
  • text.chunking: add functions get_chunks, get_chunks_para

Fix

  • imports
  • dhlab_api.get_chunks: return dict not dataframe
  • apply autopep8

v2.0.5 (2022-01-19)

Refactor

  • nbtokenizer: edit tokens for mail and web addresses

Feat

  • add Tokens class

Fix

  • imports

v2.0.2a (2022-01-18)

Fix

  • typecheck of corpus objects

v2.0.1.alpha6 (2022-01-18)

  • changed wordcloud import
  • fixed corpus transfer in conc_coll

v2.0.0.beta (2022-01-18)

Feat

  • add get_file_from_github, download_from_github in utils

Refactor

  • New package structure

Docs

  • include installation instructions in README

v1.0.0 (2022-01-06)

  • Set up GitHub Actions to run automatic linting and testing
  • Set up documentation pages
  • Include documentation of the code in docstrings

Fix

  • address linting issues from flake8
  • reformat code

Feat

  • add documentation summaries for all modules
  • add documentation for the repo
  • add docstrings from README.md to nbtext.py
  • add pylint config file

Refactor

  • reduce code duplication
  • update workflow file reference
  • change str.format to f-strings
  • optimize imports
  • rename workflow that packages and publishes dhlab to pypi
  • use default publish workflow
  • reduce compatible python versions
  • update publishing workflow
  • type out scope for linting explicitly
  • move pylint.yml

v0.75 (2019-09-09)

  • Initial release to PyPI
