
Inspire Classifier

About

INSPIRE package that automatically classifies new papers added to INSPIRE, for example whether they are core or not.

The current implementation uses the ULMFiT approach. Universal Language Model Fine-tuning (ULMFiT) is a method for training text classifiers by first pre-training a language model on a large corpus to learn general language features (in this case, a pre-trained model built on the WikiText-103 dataset is used). The language model is then fine-tuned on the titles and abstracts of the INSPIRE dataset before the classifier is trained on top.
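The order of the three stages can be illustrated with a toy stand-in (a simple word-frequency "language model"). This sketch only shows the sequence of steps; the real package uses fastai's AWD-LSTM, and none of the names below are part of its API:

```python
from collections import Counter

# Stage 1: pre-train a "language model" on a general corpus
# (stands in for the WikiText-103 pre-training).
general_corpus = "the model of the data and the results".split()
lm_counts = Counter(general_corpus)

# Stage 2: fine-tune the language model on INSPIRE titles/abstracts.
inspire_corpus = "search for new physics at high energy".split()
lm_counts.update(inspire_corpus)

# Stage 3: train a classifier head on top of the fine-tuned model.
# Here the "classifier" just thresholds how familiar a text looks.
def predict_coreness_sketch(text):
    words = text.split()
    familiarity = sum(lm_counts[w] for w in words) / len(words)
    return "core" if familiarity >= 1 else "rejected"

print(predict_coreness_sketch("search for new physics"))  # core
```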

Package Usage

from inspire_classifier import Classifier

classifier = Classifier(model_path="PATH/TO/MODEL.h5")

title = "Search for new physics in high-energy particle collisions"
abstract = "We present results from a search for beyond..."

result = classifier.predict_coreness(title, abstract)
print(result)
# {'prediction': 'core', 'scores': {'rejected': 0.1, 'non_core': 0.3, 'core': 0.6}}
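In the example output the prediction matches the highest-scoring label. Assuming the prediction is simply the argmax of the scores (a reasonable reading of the output above, not verified against the implementation), it can be recovered from the scores dict alone:

```python
# Scores as returned in the example above
scores = {"rejected": 0.1, "non_core": 0.3, "core": 0.6}

# Pick the label with the highest score
prediction = max(scores, key=scores.get)
print(prediction)  # core
```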

Installation for local usage and training:

  • Install and activate a Python 3.11 environment (for example using pyenv)
  • Install poetry: pip install poetry==1.8.3
  • Install the dependencies: poetry install

Train new classifier model

1. Gather training data

Set the environment variables for the inspire-prod Elasticsearch database and run the create_dataset.py script, passing the range of years. This creates an inspire_classifier_dataset.pkl file containing the label (core, non-core, rejected) as well as the title and abstract of each fetched record. This data is used in the next step to train the model. Make sure the generated file is called inspire_classifier_dataset.pkl!

export ES_USERNAME=XXXX
export ES_PASSWORD=XXXX

poetry run python scripts/create_dataset.py --year-from $YEAR_FROM --month-from $MONTH_FROM --year-to $YEAR_TO --month-to $MONTH_TO

($MONTH_FROM and $MONTH_TO are optional parameters)
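The shape of the generated file can be illustrated with a small stand-in. The column names below are assumptions based on the description above (label plus title and abstract per record), not verified against the script:

```python
import pandas as pd

# Stand-in for inspire_classifier_dataset.pkl: one row per fetched record,
# with its curation label plus title and abstract (column names assumed).
df = pd.DataFrame(
    {
        "labels": ["core", "non-core", "rejected"],
        "title": ["Higgs boson measurements", "Detector calibration note", "Off-topic paper"],
        "abstract": ["We measure...", "We describe...", "Unrelated content..."],
    }
)
df.to_pickle("inspire_classifier_dataset.pkl")

# Reading it back yields the same three labelled records
print(pd.read_pickle("inspire_classifier_dataset.pkl").shape)  # (3, 3)
```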

2. Run training and validate model

The train_classifier.py script runs the commands to train and validate a new model. Configuration changes, such as the number of training epochs and the train-test split, can be adjusted here. In short, the script first splits the pkl file from the first step into a training and a test dataset inside the classifier/data folder. The training set is then used to train the model, while the test set is used to evaluate it after training has finished. The model is saved to classifier/models/language_model/finetuned_language_model_encoder.h5

poetry run python scripts/train_classifier.py
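The train-test split the script performs can be sketched as follows. The split fraction and column names here are illustrative assumptions, not values taken from train_classifier.py:

```python
import pandas as pd

# Stand-in for the dataset produced in step 1
df = pd.DataFrame(
    {
        "labels": ["core", "non-core", "rejected", "core"],
        "text": ["t1", "t2", "t3", "t4"],
    }
)

test_fraction = 0.25  # assumed split ratio; configurable in the script
df = df.sample(frac=1, random_state=42)  # shuffle before splitting
n_test = int(len(df) * test_fraction)
test_df, train_df = df.iloc[:n_test], df.iloc[n_test:]

print(len(train_df), len(test_df))  # 3 1
```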

3. Upload the model to CERN S3

To use the new model in production, upload it to CERN S3 and follow this writeup.
