Library to import newspaper data from a variety of OCR formats into Impresso's JSON format.

Impresso Text Importer

The Impresso TextImporter is a library and a collection of scripts for importing newspaper data from a variety of OCR formats (e.g. Olive XML and several flavors of METS/ALTO XML) into Impresso’s JSON format.

Please refer to the documentation for further information on this library.

Installation

With pip:

pip install impresso-text-importer
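
To confirm that the installation succeeded, the installed version can be queried from Python. The following is a minimal sketch that relies only on the standard library and on the distribution name used in the pip command above; the helper name installed_version is hypothetical and is not part of the library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="impresso-text-importer"):
    """Return the installed version string of a distribution, or None.

    `installed_version` is a hypothetical helper for illustration only;
    the default name is the one passed to pip above.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None
```

Calling installed_version() returns a version string once the package is installed, and None otherwise.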

License

The second project 'impresso - Media Monitoring of the Past II. Beyond Borders: Connecting Historical Newspapers and Radio' is funded by the Swiss National Science Foundation (SNSF) under grant number CRSII5_213585 and the Luxembourg National Research Fund under grant No. 17498891.

Aiming to develop and consolidate tools to process and explore large-scale collections of historical newspapers and radio archives, and to study the impact of this tooling on historical research practices, Impresso II builds upon the first project – 'impresso - Media Monitoring of the Past' (grant number CRSII5_173719, Sinergia program). More information at https://impresso-project.ch.

Copyright (C) 2024 The impresso team (contributors to this program: Matteo Romanello, Maud Ehrmann, Alex Flückinger, Edoardo Tarek Hölzl, Pauline Conti).

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU Affero General Public License for more details.

Download files

Source Distribution

impresso_text_importer-1.2.0.tar.gz (172.6 kB)

Uploaded Source

Built Distribution

impresso_text_importer-1.2.0-py3-none-any.whl (199.1 kB)

Uploaded Python 3

File details

Details for the file impresso_text_importer-1.2.0.tar.gz.

File metadata

  • Download URL: impresso_text_importer-1.2.0.tar.gz
  • Size: 172.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for impresso_text_importer-1.2.0.tar.gz
Algorithm Hash digest
SHA256 940369bf2c744b56e5b8762a4d52ffba60065e770091333d035f0d7a939cba82
MD5 53ae93e78be4c9a65f0c1d27a2641a03
BLAKE2b-256 46a3bb41ca12b1c99ed92068831302649c25e873fd94ea7c60bcd3408d73a598

File details

Details for the file impresso_text_importer-1.2.0-py3-none-any.whl.

File hashes

Hashes for impresso_text_importer-1.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4e63c6ddbcd956ed2855563db6f44da4073091a4f2a0a762591b8ec8a2d05968
MD5 4dc72003e020072e3914ba0c7d96308e
BLAKE2b-256 4f32ff317bd1fa6faeb351853692561726f4db8704157864cc5a1837bbad2dc7
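
The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's standard hashlib module; the SHA256 values are copied from the listings above, while the helper names sha256_of and matches_published are hypothetical and chosen only for illustration.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed above for release 1.2.0.
PUBLISHED_SHA256 = {
    "impresso_text_importer-1.2.0.tar.gz":
        "940369bf2c744b56e5b8762a4d52ffba60065e770091333d035f0d7a939cba82",
    "impresso_text_importer-1.2.0-py3-none-any.whl":
        "4e63c6ddbcd956ed2855563db6f44da4073091a4f2a0a762591b8ec8a2d05968",
}


def sha256_of(path, chunk_size=1 << 20):
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path):
    """Return True if the file's digest equals the one listed for its name.

    Hypothetical helper: the lookup is by file name, so the downloaded
    file must keep the name shown in the listings above.
    """
    name = Path(path).name
    if name not in PUBLISHED_SHA256:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == PUBLISHED_SHA256[name]
```

For example, matches_published("impresso_text_importer-1.2.0.tar.gz") should return True for an intact download placed in the current directory, and False if the file was corrupted or altered.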
