Code to tidy archivist catalogues into human- and computer-readable formats.
Project description
The Tidy Archive Catalogues project
The aim of the Tidy Archive Catalogues project is to make archivists' catalogues both human- and computer-readable. We want to create a Python package which will:
- enable researchers to analyse and visualise a whole catalogue, or any subset of it that they find interesting.
- make checking for mistakes in archiving (typos, or non-compliance with archiving standards) easy for archivists, who work quickly and make many subtle decisions.
Road-map
The project currently aims to have the following functionality:
1. Enable archivists to search their archives for potential typos and mistakes. Specifically, the package will:
   A. Return specific places in the archive to check for mistakes (e.g. specific reference numbers/codes, and columns), particularly for:
      i. Dates
      ii. Named entities (places, people, and businesses)
   B. Flag inconsistencies in the archive according to a chosen set of archiving guidelines (for example, if 'c.', 'c', and 'circa' are all used to mean "approximately", offer one option):
      i. According to national archives guidelines
      ii. According to Theatre Collection guidelines
2. Enable digital humanities researchers to visualise and analyse the collection digitally, by creating machine-readable and human-readable (for labels, etc.) versions of the following:
   A. Dates, date ranges, and date uncertainty
   B. Named entities (places, people, and businesses)
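As an illustration of the consistency check described in the road-map, here is a minimal sketch that finds the different ways "approximately" is written in free-text date fields and flags the records that do not use a preferred form. This is not the package's API: the function names, the `records` mapping, the reference codes, and the regular expression are all assumptions made for this example.

```python
import re

# Assumed pattern: 'circa', 'ca', 'ca.', 'c' or 'c.' directly before a four-digit year.
CIRCA_PATTERN = re.compile(r"\b(circa|ca\.?|c\.?)\s*(?=\d{4}\b)", re.IGNORECASE)


def find_circa_variants(records):
    """Return (reference, variant) pairs for each approximate-date marker.

    `records` is a hypothetical mapping of reference code -> date text.
    """
    hits = []
    for reference, date_text in records.items():
        for match in CIRCA_PATTERN.finditer(date_text):
            hits.append((reference, match.group(1).lower()))
    return hits


def flag_inconsistencies(records, preferred="c."):
    """List the references whose marker differs from the preferred form."""
    return [ref for ref, variant in find_circa_variants(records) if variant != preferred]


# Made-up example records, not taken from any real catalogue.
records = {
    "TC/1": "c.1920",
    "TC/2": "circa 1921",
    "TC/3": "c 1850-1860",
    "TC/4": "1922",
}
flagged = flag_inconsistencies(records)
```

Here `flagged` holds the reference codes an archivist would be asked to review. The preferred form is a parameter so that different guidelines could be plugged in.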
The output of (2) should give the data in a format that works with existing Python/R libraries.
For example, dates should be in a datetime format, and places should be either points in longitude/latitude or polygonal areas (e.g. in GeoJSON), or easy to convert to these. There should be tutorial notebooks for creating simple visualisations of:
A. Social networks
B. Geographical areas
C. Timelines
Both (1) and (2) should be tested on large archives (e.g. British Library and The National Collection) and smaller ones (e.g. The Theatre Collection and the Harry Ransom Center).
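To make the target formats for dates and places concrete, the sketch below turns a free-text year or year range into a pair of `datetime.date` values plus an uncertainty flag, and wraps a place in a GeoJSON point feature. The function names, the accepted date formats, and the example coordinates are assumptions for illustration only; a real catalogue would need many more date patterns.

```python
import json
import re
from datetime import date


def parse_year_range(text):
    """Parse '1920', 'c.1920' or '1850-1860' into (start, end, uncertain).

    Returns None when no four-digit year is found.
    """
    # A leading 'c', 'c.', 'ca', 'ca.' or 'circa' marks the date as approximate.
    uncertain = bool(re.match(r"\s*(circa|ca\.?|c\.?)\s*(?=\d)", text, re.IGNORECASE))
    years = [int(y) for y in re.findall(r"(?<!\d)\d{4}(?!\d)", text)]
    if not years:
        return None
    return date(min(years), 1, 1), date(max(years), 12, 31), uncertain


def to_geojson_point(name, longitude, latitude):
    """Build a GeoJSON Feature; GeoJSON orders coordinates as [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [longitude, latitude]},
        "properties": {"name": name},
    }


approximate = parse_year_range("c.1920")
decade = parse_year_range("1850-1860")
# Approximate coordinates, used only as an example.
feature = to_geojson_point("Bristol", -2.5879, 51.4545)
feature_json = json.dumps(feature)
```

Keeping the uncertainty flag separate from the start and end dates means the machine-readable range can be plotted on a timeline while the human-readable label can still say "c. 1920".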
Contributors
Contributors are recognised using all-contributors guidelines.
- Natalie Thurlby
- Julian Warren
- Jo Elseworth
- Elaine McGirr
- Emma Howgill
Download files
Source Distribution
Built Distribution
File details
Details for the file tidy-archive-catalogues-0.0.1.tar.gz.
File metadata
- Download URL: tidy-archive-catalogues-0.0.1.tar.gz
- Upload date:
- Size: 5.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/2.7.15
File hashes
Algorithm | Hash digest
---|---
SHA256 | b5a904b41e031443ba18156d497a8aa9ad705be05635183c972f0555cd179721
MD5 | 583b86fc33805371ea74db35a1a544e9
BLAKE2b-256 | e1e5d87f744aacc6930a6df63437284726e9656ea27bbd6c9a75fc8a39e12120
File details
Details for the file tidy_archive_catalogues-0.0.1-py3-none-any.whl.
File metadata
- Download URL: tidy_archive_catalogues-0.0.1-py3-none-any.whl
- Upload date:
- Size: 5.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/2.7.15
File hashes
Algorithm | Hash digest
---|---
SHA256 | cfcea243dba014e1196bfcf1fa36e2c64c5e50c4d7ee70b2353404d3cf1d012a
MD5 | f5d9a6ddc0894b969b7dc3337654112a
BLAKE2b-256 | 5f0b3ca508dbe34dddbba36b3b1f15e420ef4be1757bde656a73578742922448
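The digests listed for each file can be used to check a downloaded copy. The following is a minimal sketch using only Python's standard `hashlib` module; the helper name `sha256_of` and the local file path in the usage note are assumptions, not part of this package.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, assuming the source distribution has been saved in the current directory, `sha256_of("tidy-archive-catalogues-0.0.1.tar.gz")` should equal the SHA256 value listed for that file.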