a Deep Learning Framework for Text

Maintainers

kermitt2 lfoppiano

Project description

DeLFT

DeLFT (Deep Learning Framework for Text) is a Keras and TensorFlow framework for text processing, focusing on sequence labeling (e.g. named entity tagging, information extraction) and text classification (e.g. comment classification). This library re-implements standard state-of-the-art Deep Learning architectures relevant to text processing tasks.

DeLFT has three main purposes:

Covering text and rich texts: most existing Deep Learning work in NLP only considers simple text as input. In addition to simple text, we also target rich text where tokens are associated with layout information (font, style, etc.), positions in structured documents, and possibly other lexical or symbolic contextual information. Text usually comes from large documents like PDF or HTML, and not just from segments like sentences or paragraphs, and contextual features appear very useful. Rich text is the most common textual content used by humans to communicate and work.
Reproducibility and benchmarking: by implementing several references/state-of-the-art models for both sequence labeling and text classification tasks, we want to offer the capacity to easily validate reported results and to benchmark several methods under the same conditions and criteria.
Production level: by offering optimized performance, robustness and integration possibilities, we aim at supporting better engineering decisions/trade-offs and successful production-level applications.

Some contributions include:

A variety of modern NLP architectures and tasks to be used following the same API and input formats, including RNN and transformers.
Reduction of the size of RNN models, in particular by removing word embeddings from them. For instance, the model for the toxic comment classifier went down from a size of 230 MB with embeddings to 1.8 MB. In practice the size of all the models of DeLFT is less than 2 MB, except for Ontonotes 5.0 NER model which is 4.7 MB.
Implementation of a generic support of categorical features, available in various architectures.
Use of a dynamic data generator so that the training data does not need to fit entirely in memory.
Efficient loading and management of an unlimited volume of static pre-trained embeddings.
A comprehensive evaluation framework with the standard metrics for sequence labeling and classification tasks, including n-fold cross validation.
Integration of HuggingFace transformers as Keras layers.
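
The dynamic data generator mentioned in the list above can be illustrated with a minimal sketch. This is not DeLFT's actual generator: the names `iter_batches` and `iter_lines` and the one-example-per-line file layout are assumptions made for illustration. The sketch only shows the principle, namely that examples are read lazily and grouped into batches, so no more than one batch needs to be held in memory at a time.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_batches(examples: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield lists of at most `batch_size` examples, consuming the input lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(examples)
    while True:
        # islice pulls only the next `batch_size` items from the source
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def iter_lines(path: str) -> Iterator[str]:
    """Read a text file one line at a time (hypothetical one-example-per-line layout)."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")
```

A training loop could then iterate with `for batch in iter_batches(iter_lines("train.txt"), 32): ...`, where `train.txt` is a placeholder file name.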

A native Java integration of the library has been realized in GROBID via JEP.

The latest DeLFT release 0.4.6 has been tested successfully with Python 3.10/3.11 and TensorFlow 2.17. As always, GPU(s) are required for decent training time. For example, a GeForce GTX 1050 Ti (4GB) is working very well for running RNN models and BERT or RoBERTa base models. Using BERT large model is no problem with a GeForce GTX 1080 Ti (11GB), including training with modest batch size. Using multiple GPUs (training and inference) is supported.
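
Before installing, it can help to check the interpreter and packages against the versions stated above. The following is a small sketch rather than a DeLFT utility: the function name `check_environment` is hypothetical, and the import names `tensorflow` and `tf_keras` are assumed to match the package names used in this description.

```python
import sys
from importlib.util import find_spec

MIN_PYTHON = (3, 10)  # the text above reports testing with Python 3.10/3.11
REQUIRED_MODULES = ("tensorflow", "tf_keras")  # assumed import names


def check_environment(version_info=sys.version_info, modules=REQUIRED_MODULES):
    """Return a dict saying whether the interpreter and each module look usable."""
    report = {"python_ok": tuple(version_info[:2]) >= MIN_PYTHON}
    for name in modules:
        # find_spec locates a top-level module without importing it
        report[name] = find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'missing or unsupported'}")
```

The check only confirms that the modules can be found. It does not verify their versions or that a GPU is visible to TensorFlow.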

Changes in 0.4.1

Breaking changes

TensorFlow 2.17 / tf_keras 2.17: DeLFT now requires TensorFlow 2.17.1 and the standalone tf_keras 2.17.0 package. All Keras imports have been updated from tensorflow.keras to tf_keras. Pre-trained model weights from 0.3.4 are not directly compatible, but can be converted without retraining:
```
python -m delft.utilities.convert_model --input <old-model-dir> --output <new-model-dir> --verify
```
The converter rebuilds the model architecture from the saved config.json, remaps weights from the old HDF5 file into the fresh model, and saves a new weights file. Additional flags: --redownload-tokenizer (when the saved tokenizer is incompatible with the current transformers version), --force-partial (allow partial conversion when some weights cannot be matched), --dry-run (inspect without writing). Use --help for full options.
Python 3.10+ required: Python 3.8 and 3.9 are no longer supported.
CUDA 12.1 required for GPU: TensorFlow 2.17 requires CUDA 12.1. On Linux, torch is no longer included in the base pip install delft to avoid CUDA version conflicts between torch (CUDA 12.4) and TensorFlow (CUDA 12.1). Use pip install "delft[gpu]" with the PyTorch cu121 index instead (see installation instructions below).
LMDB embedding format changed: Embeddings are now stored as raw float32 bytes instead of pickle-serialized objects. This enables Java interoperability (used by GROBID) and improves performance. Existing LMDB caches must be converted using the provided utility:
```
python -m delft.utilities.convert_lmdb_embeddings --input <old-lmdb-path> --output <new-lmdb-path>
```
ELMo support removed: ELMo embeddings are no longer supported. The use_ELMo parameter has been removed from all application scripts and configurations. Use transformer-based models (BERT, SciBERT, etc.) or static embeddings (GloVe, fastText) instead.
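
To make the LMDB format change above more concrete, here is a minimal sketch of storing a vector as raw float32 bytes. It is not the code of the conversion utility. The helper names are hypothetical, and the little-endian byte order is an assumption, since the description only says "raw float32 bytes".

```python
import struct
from typing import List, Sequence


def encode_vector(vector: Sequence[float]) -> bytes:
    """Pack a vector as consecutive float32 values, 4 bytes each.

    The "<" prefix selects little-endian order, which is an assumption here.
    """
    return struct.pack(f"<{len(vector)}f", *vector)


def decode_vector(raw: bytes) -> List[float]:
    """Unpack raw float32 bytes back into a list of Python floats."""
    if len(raw) % 4 != 0:
        raise ValueError("byte length is not a multiple of 4")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


vector = [0.5, -1.25, 3.0]  # values exactly representable in float32
raw = encode_vector(vector)
print(len(raw))                      # 12 bytes: 3 values x 4 bytes
print(decode_vector(raw) == vector)  # True
```

A fixed-width layout like this can be read from any language that can interpret a byte buffer as 32-bit floats, which is consistent with the Java interoperability goal stated above.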

Other changes

Weights & Biases integration for experiment tracking (--wandb flag)
Distributed training support via SLURM scripts
Additional checks for avoiding empty embeddings
Updated default word2vec embedding URL
Updated dependency versions (transformers 4.48, torch 2.5.1, numpy 1.26.4, scikit-learn 1.6.1, pandas 2.2.3)

DeLFT Documentation

Visit the DeLFT documentation for detailed information on installation, usage and models.

Using DeLFT

PyPI packages are available for stable versions. The latest stable version is 0.4.6:

# macOS
pip install delft==0.4.6

# Linux with CUDA 12.1 (GPU)
pip install "delft[gpu]==0.4.6" --extra-index-url https://download.pytorch.org/whl/cu121

DeLFT Installation

To install DeLFT and use the current master version, clone the GitHub repository:

git clone https://github.com/kermitt2/delft
cd delft

It is advised to first set up a virtual environment to avoid falling into one of these gloomy Python dependency marshlands:

uv venv --python 3.11
source .venv/bin/activate
uv pip install pip

Install the project in editable state:

# macOS (torch is included automatically)
uv pip install -e .

# Linux with CUDA 12.1 (recommended for GPU)
uv pip install -e ".[gpu]" --extra-index-url https://download.pytorch.org/whl/cu121

# Linux with CUDA 12.1 (alternative using requirements file)
uv pip install -e . -r requirements-cuda.txt

See the DeLFT documentation for usage.

Send data to Weights & Biases

Create a file .env in the root of the project with the following content:

WANDB_API_KEY=your_api_key
WANDB_PROJECT=your_project_name
WANDB_ENTITY=your_entity_name

Then use the parameter --wandb when running the scripts, e.g.:

python -m delft.applications.grobidTagger date train --architecture BidLSTM --wandb
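
As a quick sanity check before launching a run, the .env file described above can be inspected with a few lines of standard-library Python. This is only a sketch with hypothetical helper names. It is not how DeLFT itself loads the file, and it handles only plain KEY=value lines.

```python
from pathlib import Path
from typing import Dict, List

# The three variables listed in the .env example above
REQUIRED_KEYS = ("WANDB_API_KEY", "WANDB_PROJECT", "WANDB_ENTITY")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    print("missing:", missing_keys(parse_env_text(text)))
```

An empty list means all three variables are present and non-empty.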

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines on how to get started, code style, running tests, and the pull request process.

License and contact

Distributed under Apache 2.0 license. The dependencies used in the project are either themselves also distributed under Apache 2.0 license or distributed under a compatible license.

If you contribute to DeLFT, you agree to share your contribution following these licenses.

Contact: Patrice Lopez (patrice.lopez@science-miner.com) and Luca Foppiano (@lfoppiano).

How to cite

If you want to cite this work, please refer to the present GitHub project, together with the Software Heritage project-level permanent identifier. For example, with BibTeX:

@misc{DeLFT,
    title = {DeLFT},
    howpublished = {\url{https://github.com/kermitt2/delft}},
    publisher = {GitHub},
    year = {2018--2026},
    archivePrefix = {swh},
    eprint = {1:dir:54eb292e1c0af764e27dd179596f64679e44d06e}
}

Release history

Version	Release date
0.4.6 (this version)	Apr 12, 2026
0.4.5	Mar 23, 2026
0.4.4	Mar 20, 2026
0.4.3	Mar 19, 2026
0.4.2	Mar 8, 2026
0.4.1	Mar 4, 2026
0.4.0	Mar 4, 2026
0.3.5	Mar 4, 2026
0.3.4	Nov 29, 2023
0.3.3	Feb 12, 2023
0.3.2	Jul 24, 2022
0.3.1	Apr 16, 2022
0.3.0	Mar 29, 2022
0.2.8	Jun 25, 2021
0.2.7	Apr 17, 2021
0.2.6	Dec 26, 2020
0.2.5	Dec 21, 2020
0.2.4	Sep 12, 2020
0.2.3	Jun 10, 2019
0.2.2	May 8, 2019
0.2.1	May 8, 2019
0.2.0	Apr 4, 2019
0.1.6	Apr 2, 2019
0.1.5	Apr 2, 2019
0.1.4	Apr 2, 2019
0.1.3	Mar 16, 2019
0.1.2	Mar 16, 2019
0.1.1	Feb 25, 2019
0.1.0	Feb 7, 2019

Download files

Download the file for your platform.

Source Distribution

delft-0.4.6.tar.gz (131.1 kB)

Uploaded Apr 12, 2026 Source

Built Distribution

delft-0.4.6-py3-none-any.whl (157.4 kB)

Uploaded Apr 12, 2026 Python 3

File details

Details for the file delft-0.4.6.tar.gz.

File metadata

Download URL: delft-0.4.6.tar.gz
Upload date: Apr 12, 2026
Size: 131.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for delft-0.4.6.tar.gz
Algorithm	Hash digest
SHA256	`68ce000da8c1ff6e5f4ff4bda01482b0572f83e99c4164aeb0a4f957f20138c8`
MD5	`fe0b63785e919d2bc2fc581479fed4a1`
BLAKE2b-256	`b4b2c6521a22b15584033010adac911e98d4056a35cfbe2569b2f7fd563b0a38`
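
A downloaded file can be checked against the SHA256 digest listed above using Python's standard `hashlib` module. The sketch below assumes the archive has already been downloaded locally. The function names are illustrative and not part of DeLFT or PyPI tooling.

```python
import hashlib

# SHA256 digest listed above for delft-0.4.6.tar.gz
EXPECTED_SHA256 = "68ce000da8c1ff6e5f4ff4bda01482b0572f83e99c4164aeb0a4f957f20138c8"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the computed digest with the published one, ignoring case."""
    return sha256_of_file(path) == expected.strip().lower()
```

For example, `matches_expected("delft-0.4.6.tar.gz")` returns `True` only if the local file has exactly the published digest.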

Provenance

The following attestation bundles were made for delft-0.4.6.tar.gz:

Publisher: ci-release.yml on kermitt2/delft

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: delft-0.4.6.tar.gz
- Subject digest: 68ce000da8c1ff6e5f4ff4bda01482b0572f83e99c4164aeb0a4f957f20138c8
- Sigstore transparency entry: 1280669925
- Sigstore integration time: Apr 12, 2026
Source repository:
- Permalink: kermitt2/delft@9e3856b1fafc9609c8eb7bb5df87bc42bd84098b
- Branch / Tag: refs/tags/v0.4.6
- Owner: https://github.com/kermitt2
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci-release.yml@9e3856b1fafc9609c8eb7bb5df87bc42bd84098b
- Trigger Event: push

File details

Details for the file delft-0.4.6-py3-none-any.whl.

File metadata

Download URL: delft-0.4.6-py3-none-any.whl
Upload date: Apr 12, 2026
Size: 157.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for delft-0.4.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`07ebd07689dfb451158875884e32b4eca7f7312dc9c97026080741d3f14a8675`
MD5	`103da8e1a8eb7f390c0b352f877723a5`
BLAKE2b-256	`fca710dec6f140a186acebd9971ed7fa04ac1147d573fdb151bda480cb0f2be8`

Provenance

The following attestation bundles were made for delft-0.4.6-py3-none-any.whl:

Publisher: ci-release.yml on kermitt2/delft

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: delft-0.4.6-py3-none-any.whl
- Subject digest: 07ebd07689dfb451158875884e32b4eca7f7312dc9c97026080741d3f14a8675
- Sigstore transparency entry: 1280669931
- Sigstore integration time: Apr 12, 2026
Source repository:
- Permalink: kermitt2/delft@9e3856b1fafc9609c8eb7bb5df87bc42bd84098b
- Branch / Tag: refs/tags/v0.4.6
- Owner: https://github.com/kermitt2
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci-release.yml@9e3856b1fafc9609c8eb7bb5df87bc42bd84098b
- Trigger Event: push
