Library to import newspaper data from a variety of OCR formats into Impresso's JSON format.

Impresso Text Preparation

The Impresso TextImporter is a library and a collection of scripts for importing newspaper data from a variety of OCR formats (e.g. Olive XML and various flavors of METS/ALTO XML) into Impresso’s JSON format.

Please refer to the documentation for further information on this library.

Installation

With pip:

pip install impresso-text-preparation
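
To confirm that the install succeeded, the distribution metadata can be queried from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of this package, and the sketch makes no assumption about the package's importable module name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is an illustrative helper, not part of the library.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution name as given in the pip command above.
    found = installed_version("impresso-text-preparation")
    print(found if found is not None else "impresso-text-preparation is not installed")
```

If the call returns None, pip most likely installed into a different environment than the interpreter running the check.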

Usage

TODO: document usage here

About Impresso

Impresso project

Impresso - Media Monitoring of the Past is an interdisciplinary research project that aims to develop and consolidate tools for processing and exploring large collections of media archives across modalities, time, languages and national borders. The first project (2017-2021) was funded by the Swiss National Science Foundation under grant No. CRSII5_173719 and the second project (2023-2027) by the SNSF under grant No. CRSII5_213585 and the Luxembourg National Research Fund under grant No. 17498891.

Copyright

Copyright (C) 2024 The Impresso team.

License

This program is provided as open source under the GNU Affero General Public License v3 or later.


Download files

Source Distribution

impresso_text_preparation-2.3.0.tar.gz (212.8 kB)

Uploaded Source

Built Distribution

impresso_text_preparation-2.3.0-py3-none-any.whl (257.2 kB)

Uploaded Python 3

File details

Details for the file impresso_text_preparation-2.3.0.tar.gz.

File hashes

Hashes for impresso_text_preparation-2.3.0.tar.gz
Algorithm Hash digest
SHA256 0e752cc3bfa1c794e805e08d2cbb1cc0abd5718b9e1200067e01ab8e810d8bfb
MD5 08999c2a0fe2886e716faa22f9406468
BLAKE2b-256 9677d663ae8e7a3522d2e20d2bd128b97ebf62f8cbb38eb7921fcb37e08b92b3

File details

Details for the file impresso_text_preparation-2.3.0-py3-none-any.whl.

File hashes

Hashes for impresso_text_preparation-2.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3d0c9374ee528d067c258c2a202b3f20bc9b276ea8bdb66e295aa0fb3eec3cfe
MD5 217a7dcd63bac4667d5e0574c961358d
BLAKE2b-256 958131d744f61ecb1fde1357cd5d9899152b7462b3e04b47eb2ea4621dff7adc
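
A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the file is assumed to sit in the current working directory.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # File name and digest copied from the listing above; the local path is an assumption.
    wheel = Path("impresso_text_preparation-2.3.0-py3-none-any.whl")
    published = "3d0c9374ee528d067c258c2a202b3f20bc9b276ea8bdb66e295aa0fb3eec3cfe"
    if wheel.exists():
        print("hash OK" if matches_published_hash(wheel, published) else "hash MISMATCH")
    else:
        print(f"{wheel} not found in the current directory")
```

pip can perform the same check at install time when a requirements file pins digests with `--hash` and the install is run with `--require-hashes`.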
