
This package allows the pdftextsplitter engine to communicate with a Django database.


djangotextsplitter

This package is meant as a django-extension for the pdftextsplitter package. As such, the pdftextsplitter package should be installed before this package.

This django-extension provides an out-of-the-box django-app with database models for the python-classes in pdftextsplitter. As such, it becomes possible to store the results of the pdftextsplitter package in the django database.

The django-application in this package does not contain any views, urls, templates, static files or other functionality; it provides only database models (including admin-registration) and load/write functions. These models and load/write functions can then be used in other django applications, together with the pdftextsplitter engine.

Installation works like: pip install djangotextsplitter
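
Installing the package does not yet make the models available: the app also has to be registered in the django project. A minimal sketch of such a registration is given below. The app label "djangotextsplitter" is an assumption taken from the import path used further down in this description, and the other entries are simply the django defaults.

```python
# settings.py (fragment) of the django project that will use the models.
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    # Assumed app label, taken from the import path djangotextsplitter.models:
    "djangotextsplitter",
]
```

After this, running the usual python manage.py migrate should create the database tables, assuming the package ships its own migrations.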

List of database models

The database models in this application are:

  • textpart (corresponds to the textpart-class from the pdftextsplitter-package)
  • fontregion (corresponds to the fontregion-class from the pdftextsplitter-package)
  • lineregion (corresponds to the lineregion-class from the pdftextsplitter-package)
  • readingline (needed to store certain information, but does not have an equivalent in the pdftextsplitter-package)
  • readinghistogram (needed to store certain information, but does not have an equivalent in the pdftextsplitter-package)
  • title (corresponds to the title-class from the pdftextsplitter-package)
  • body (corresponds to the body-class from the pdftextsplitter-package)
  • footer (corresponds to the footer-class from the pdftextsplitter-package)
  • headlines (corresponds to the headlines-class from the pdftextsplitter-package)
  • headlines_hierarchy (needed to store certain information, but does not have an equivalent in the pdftextsplitter-package)
  • enumeration (corresponds to the enumeration-class from the pdftextsplitter-package)
  • enumeration_hierarchy (needed to store certain information, but does not have an equivalent in the pdftextsplitter-package)
  • textsplitter (corresponds to the textsplitter-class from the pdftextsplitter-package)
  • native_toc_element (corresponds to the native_toc_element-class from the pdftextsplitter-package)
  • breakdown_decision (needed to store certain information, but does not have an equivalent in the pdftextsplitter-package)
  • textalinea (corresponds to the textalinea-class from the pdftextsplitter-package)

Getting started

Within a django-environment (provided djangotextsplitter is installed in the virtual environment and registered in the django project), one can simply access a model by calling
from djangotextsplitter.models import textsplitter as db_textsplitter
We recommend using the 'as db_' to distinguish django database models from base classes in the pdftextsplitter-package.
Loading/writing operations can be accessed with:
from djangotextsplitter.loads import load_textsplitter
Each model that has an associated class in pdftextsplitter has a load-function, a newwrite-function, an overwrite-function and a delete-function.
They can be called as:
from pdftextsplitter import textsplitter
from djangotextsplitter.models import textsplitter as db_textsplitter
from djangotextsplitter.loads import load_textsplitter
from djangotextsplitter.newwrites import newwrite_textsplitter
from djangotextsplitter.overwrites import overwrite_textsplitter
from djangotextsplitter.deletes import delete_textsplitter
mysplitter = load_textsplitter(31)  # 31 is the database primary key (the pk in django)
db_splitter = newwrite_textsplitter(mysplitter)  # No key needed here: a new entry is appended
db_splitter = overwrite_textsplitter(31, mysplitter)  # Overwrites the entry with primary key 31
delete_textsplitter(31)  # Deletes the entry with primary key 31
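
The calls above can be combined into a load, modify, write-back cycle. The helper below is only a sketch of that pattern and is not part of djangotextsplitter: the name update_stored_textsplitter and its parameters are hypothetical, and the load and overwrite functions are passed in as arguments so the helper can be tried without a database.

```python
def update_stored_textsplitter(pk, modify, load, overwrite):
    """Load the object stored under pk, change it in place, and write it back.

    pk        -- database primary key of the stored textsplitter
    modify    -- callable that changes the loaded object in place
    load      -- e.g. load_textsplitter from djangotextsplitter.loads
    overwrite -- e.g. overwrite_textsplitter from djangotextsplitter.overwrites
    """
    splitter = load(pk)             # database entry -> python object
    modify(splitter)                # apply the caller's change
    return overwrite(pk, splitter)  # python object -> same database entry


# Intended use inside a django project (hypothetical, not executed here):
#   from djangotextsplitter.loads import load_textsplitter
#   from djangotextsplitter.overwrites import overwrite_textsplitter
#   update_stored_textsplitter(31, my_change, load_textsplitter, overwrite_textsplitter)
```

Keeping the primary key in one place avoids accidentally loading one entry and overwriting another.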

For further details, we refer the user to the documentation of pdftextsplitter, or to the detailed documentation in the docs-folder of this package.
djangotextsplitter is not very complicated. It just provides the database models and the load/newwrite/overwrite/delete functions for the pdftextsplitter package, so that pdftextsplitter can be used efficiently from within a django web application.

Permissions

The admin registration of the models is done in such a way that only superusers have access to the models in the admin interface, even if other users have admin-access and the permissions to view/add/change/delete them. This is done to force users to change the models only through the load/newwrite/overwrite/delete functions. If someone were to manually change the structure of the models somewhere in the hierarchy, this could cause major disruptions.
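
How such a superuser-only restriction can be expressed in django is sketched below. This is an illustration of the general technique, not the code used in this package: the class name SuperuserOnlyAdminMixin is hypothetical. The method names and signatures are the standard permission hooks of django's ModelAdmin, so the mixin would be combined with it as class MyAdmin(SuperuserOnlyAdminMixin, admin.ModelAdmin).

```python
from types import SimpleNamespace


class SuperuserOnlyAdminMixin:
    """Hypothetical mixin: grants every admin permission to superusers only."""

    @staticmethod
    def _is_superuser(request):
        user = request.user
        return bool(user.is_active and user.is_superuser)

    # Each hook below overrides the ModelAdmin method of the same name.
    def has_module_permission(self, request):
        return self._is_superuser(request)

    def has_view_permission(self, request, obj=None):
        return self._is_superuser(request)

    def has_add_permission(self, request):
        return self._is_superuser(request)

    def has_change_permission(self, request, obj=None):
        return self._is_superuser(request)

    def has_delete_permission(self, request, obj=None):
        return self._is_superuser(request)


# Stand-in request objects, so the check runs without django installed.
staff = SimpleNamespace(user=SimpleNamespace(is_active=True, is_superuser=False))
root = SimpleNamespace(user=SimpleNamespace(is_active=True, is_superuser=True))
guard = SuperuserOnlyAdminMixin()
print(guard.has_view_permission(staff), guard.has_view_permission(root))  # False True
```

Because the hooks ignore the regular permission system, a staff user with explicit view/add/change/delete permissions is still refused, which matches the behaviour described above.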

Download files

Source Distribution: djangotextsplitter-1.2.1.tar.gz (560.8 kB)

Built Distribution: djangotextsplitter-1.2.1-py3-none-any.whl (61.8 kB, Python 3)
