Library to import newspaper data from a variety of OCR formats into Impresso's JSON format.
Impresso Text Preparation
The Impresso TextImporter is a library and a collection of scripts to import newspaper data from a variety of formats (e.g. Olive XML and various flavors of METS/ALTO XML) into Impresso's JSON format.
Please refer to the documentation for further information on this library.
Installation
With pip:
pip install impresso-text-preparation
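After installing, one way to confirm which version pip picked up is to ask the standard library for the installed distribution's metadata. This is a minimal sketch: `installed_version` is a hypothetical helper that relies only on `importlib.metadata` (Python 3.8+), not on any API of impresso-text-preparation itself.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "impresso-text-preparation") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed.

    Hypothetical helper for illustration; it reads the distribution metadata
    that pip writes at install time and does not import the package itself.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(v if v is not None else "impresso-text-preparation is not installed")
```

From the command line, `pip show impresso-text-preparation` reports the same version field.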
Usage
TODO: document usage here
About Impresso
Impresso project
Impresso - Media Monitoring of the Past is an interdisciplinary research project that aims to develop and consolidate tools for processing and exploring large collections of media archives across modalities, time, languages and national borders. The first project (2017-2021) was funded by the Swiss National Science Foundation (SNSF) under grant No. CRSII5_173719; the second project (2023-2027) is funded by the SNSF under grant No. CRSII5_213585 and by the Luxembourg National Research Fund under grant No. 17498891.
Copyright
Copyright (C) 2024 The Impresso team.
License
This program is provided as open source under the GNU Affero General Public License v3 or later.
File details
Details for the file impresso_text_preparation-3.1.0.tar.gz.
File metadata
- Download URL: impresso_text_preparation-3.1.0.tar.gz
- Upload date:
- Size: 238.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1a0d48964717ebeac3232ada9a2090f066766bf0531ded835de7f83ddab772da |
| MD5 | 083705c197520151e2f15d44e0a7fe38 |
| BLAKE2b-256 | 0e061cab3465b506ec8c7962f15941f2c72cc34c7596ecbc701f629e15981b21 |
File details
Details for the file impresso_text_preparation-3.1.0-py3-none-any.whl.
File metadata
- Download URL: impresso_text_preparation-3.1.0-py3-none-any.whl
- Upload date:
- Size: 285.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 10eeeb2cd54cb511122fe513e89c8d2933e828a3a9a92da91a658abaf8c1a34b |
| MD5 | c6ca35df6311dced9528f746d4d2eff1 |
| BLAKE2b-256 | 08097d3d1aa2043ab412f2bcb1fcc5208226601326b6fb137bd4bfd5fc925dc4 |
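The SHA256 digests listed for the two release files can be used to check a download before installing it. The sketch below uses only the Python standard library; the dictionary `PUBLISHED_SHA256` and the helpers `sha256_of` and `verify` are illustrative names, not part of impresso-text-preparation.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables for release 3.1.0.
PUBLISHED_SHA256 = {
    "impresso_text_preparation-3.1.0.tar.gz":
        "1a0d48964717ebeac3232ada9a2090f066766bf0531ded835de7f83ddab772da",
    "impresso_text_preparation-3.1.0-py3-none-any.whl":
        "10eeeb2cd54cb511122fe513e89c8d2933e828a3a9a92da91a658abaf8c1a34b",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks so large archives are never fully loaded."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Return True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())


if __name__ == "__main__":
    # Check any release files present in the current directory.
    for name, expected in PUBLISHED_SHA256.items():
        p = Path(name)
        if p.exists():
            print(name, "OK" if verify(p, expected) else "MISMATCH")
```

Alternatively, pip can enforce the digest itself in hash-checking mode: pin the package in a requirements file with `--hash=sha256:<digest>` and install with `pip install --require-hashes -r requirements.txt`.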