sequifier

Train a transformer model with the command line

What is sequifier?

Sequifier is a library that makes prototyping autoregressive transformer models for sequence modelling easy, reliable and comparable.

Motivation

Researchers, data scientists and ML scientists can take their sequential data sets, transform them into a standardized format, and from there use sequifier and configuration files to develop a model for these sequential data. These models can be applied to a test set and used to extrapolate the sequences through autoregression for an arbitrary number of steps. This should enable much faster development and evaluation cycles of generative transformer models across domains.

Importantly, sequifier works for an arbitrary number of categorical and real-valued input and output columns, and can therefore represent a large set of possible mappings from inputs to outputs. The input and output columns do not have to be identical.

The standardized implementation of a decoder-only autoregressive transformer saves the work of implementing this model and the workflows around it repeatedly, across different domains and data sets, thereby reducing duplicate work and the probability of bugs and compromised results.

The standardized configuration enables easier experimentation and experiment tracking, and, if results are shared, an ever-improving basis for decision making on the initial configuration when applying the transformer architecture to a new problem.

Overall, it should be possible, even for non-experts in machine learning, to develop an initial prototype for a transformer model for a new domain. If the results are promising, it might become necessary to implement architecture variants that fall outside the scope of sequifier, but with a much cheaper (in terms of time and effort) initial exploration, many more potential application domains can be investigated.

Sequifier can also be used to train and infer forward-looking embedding models. These models output the activations of the last shared layer of the transformer, which encapsulate the information contained by the sequence so far that is useful in predicting the next time step.

Data Formats

The basic data format that is used as input to the library takes the following form:

sequenceId	itemPosition	column1	column2	...
0	0	"high"	12.3	...
0	1	"high"	10.2	...
...	...	...	...	...
1	0	"medium"	20.6	...
...	...	...	...	...

The two columns "sequenceId" and "itemPosition" have to be present, along with at least one feature column. There can be many feature columns, and these can be categorical or real-valued.
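
As a minimal sketch of this layout, the following stdlib-only Python flattens a few sequences into rows of the input format and writes them as CSV. The column names `column1` and `column2` are taken from the table above; the helper name `to_input_rows`, the values and the in-memory buffer are illustrative assumptions and not part of sequifier.

```python
import csv
import io

def to_input_rows(sequences):
    """Flatten {sequenceId: [feature dicts]} into input-format rows.

    itemPosition is the 0-based position of an item within its sequence.
    """
    rows = []
    for sequence_id, items in sequences.items():
        for item_position, features in enumerate(items):
            rows.append({"sequenceId": sequence_id,
                         "itemPosition": item_position,
                         **features})
    return rows

# Illustrative data: one categorical and one real-valued feature column.
sequences = {
    0: [{"column1": "high", "column2": 12.3},
        {"column1": "high", "column2": 10.2}],
    1: [{"column1": "medium", "column2": 20.6}],
}
rows = to_input_rows(sequences)

# Write the rows as CSV text; a real project would write to a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Every row carries its own "sequenceId" and "itemPosition", so sequences of different lengths can share one file.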

Data of this input format can be transformed into the format that is used for model training and inference, which takes this form:

sequenceId	subsequenceId	startItemPosition	columnName	[Subsequence Length]	[Subsequence Length - 1]	...	0
0	0	0	column1	"high"	"high"	...	"low"
0	0	0	column2	12.3	10.2	...	14.9
...	...	...	...	...	...	...	...
1	0	15	column1	"medium"	"high"	...	"medium"
1	0	15	column2	20.6	18.5	...	21.6
...	...	...	...	...	...	...	...
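
To make the relationship between the two formats concrete, here is a rough sketch of how input-format rows could be windowed into this wide layout. It is not sequifier's own preprocessing code: the function name `to_subsequences`, the sliding stride of one item and the choice to drop sequences that are too short are all assumptions made for illustration.

```python
from collections import defaultdict

def to_subsequences(rows, feature_columns, seq_length):
    """Turn input-format rows into wide subsequence rows.

    Each window holds seq_length + 1 consecutive items, stored under the
    position labels seq_length, seq_length - 1, ..., 0 (oldest item first).
    Windows slide by one item; that stride is an assumption of this sketch.
    """
    by_sequence = defaultdict(list)
    for row in rows:
        by_sequence[row["sequenceId"]].append(row)

    labels = [str(i) for i in range(seq_length, -1, -1)]
    out = []
    for sequence_id, items in sorted(by_sequence.items()):
        items.sort(key=lambda r: r["itemPosition"])
        n_windows = len(items) - seq_length  # number of full windows
        for subsequence_id in range(max(n_windows, 0)):
            window = items[subsequence_id:subsequence_id + seq_length + 1]
            for column in feature_columns:
                wide = {
                    "sequenceId": sequence_id,
                    "subsequenceId": subsequence_id,
                    "startItemPosition": window[0]["itemPosition"],
                    "columnName": column,
                }
                wide.update(zip(labels, (item[column] for item in window)))
                out.append(wide)
    return out

# Illustrative sequence with four items and two feature columns.
rows = [
    {"sequenceId": 0, "itemPosition": p, "column1": c, "column2": v}
    for p, (c, v) in enumerate([("high", 12.3), ("high", 10.2),
                                ("low", 14.9), ("medium", 9.1)])
]
subsequences = to_subsequences(rows, ["column1", "column2"], seq_length=2)
for row in subsequences:
    print(row)
```

Each subsequence produces one row per feature column, which is why "columnName" appears as a column of its own in the table above.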

At inference time, the output is returned in the library input format introduced first:

sequenceId	itemPosition	column1	column2	...
0	963	"medium"	8.9	...
0	964	"low"	6.3	...
...	...	...	...	...
1	732	"medium"	14.4	...
...	...	...	...	...
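
The autoregressive extrapolation behind such output can be sketched as a simple loop in which each prediction is appended to the context before the next step. The `predict_next` callable stands in for a trained model, and `toy_predictor` is a purely hypothetical rule; neither reflects sequifier's API, and the sketch assumes the history starts at itemPosition 0.

```python
def extrapolate(history, predict_next, n_steps, sequence_id=0):
    """Roll a sequence forward n_steps by feeding predictions back as input.

    history: list of feature dicts, ordered by itemPosition (starting at 0).
    predict_next: callable mapping the items seen so far to the next item
        (a stand-in for a trained model in this sketch).
    Returns the newly generated rows in the input format.
    """
    items = list(history)
    outputs = []
    for _ in range(n_steps):
        next_item = predict_next(items)
        outputs.append({"sequenceId": sequence_id,
                        "itemPosition": len(items),
                        **next_item})
        items.append(next_item)  # the prediction becomes part of the context
    return outputs

def toy_predictor(items):
    """Hypothetical model: repeat the last category, average the last two values."""
    recent = [item["column2"] for item in items[-2:]]
    return {"column1": items[-1]["column1"],
            "column2": sum(recent) / len(recent)}

history = [{"column1": "high", "column2": 12.0},
           {"column1": "low", "column2": 8.0}]
predictions = extrapolate(history, toy_predictor, n_steps=2)
for row in predictions:
    print(row)
```

Because every step consumes the previous output, the number of steps is limited only by how far the predictions remain useful.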

There are four standalone commands within sequifier: make, preprocess, train and infer. make sets up a new sequifier project in a new folder, preprocess preprocesses the data from the input format into subsequences of a fixed length, train trains a model on the preprocessed data, and infer generates outputs from data in the preprocessed format and outputs it in the initial input format.

The input data can be a single csv or parquet file, or a folder of csv or parquet files. The preprocessing output can be a csv or parquet file per split, or a folder of multiple torch tensor (pt) files per split. The training step does not output any data files (it outputs model files and logs). The inference output can be a single csv or parquet file, or a folder of csv or parquet files. In general, it is recommended to store every step as a single file if the initial input is a single file, and as a folder of files if the initial input is a folder of files. For the folder "flow", the preprocessing step write format has to be "pt".
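
The single-file versus folder distinction can be sketched with a small helper that lists the data files behind a path and suggests a write format in line with the recommendation above. Both function names are hypothetical, and keeping the input format for single-file inputs is an assumption of this sketch rather than a documented rule.

```python
from pathlib import Path
import tempfile

SUPPORTED_SUFFIXES = {".csv", ".parquet"}  # input formats named in the text

def collect_input_files(data_path):
    """Return the data files behind a path that is a single file or a folder."""
    path = Path(data_path)
    if path.is_file():
        if path.suffix not in SUPPORTED_SUFFIXES:
            raise ValueError(f"unsupported input format: {path.suffix}")
        return [path]
    if path.is_dir():
        return sorted(p for p in path.iterdir()
                      if p.is_file() and p.suffix in SUPPORTED_SUFFIXES)
    raise FileNotFoundError(data_path)

def suggested_write_format(data_path):
    """Folder input -> 'pt'; single-file input -> same format as the input."""
    path = Path(data_path)
    return "pt" if path.is_dir() else path.suffix.lstrip(".")

# Demonstrate on a temporary folder with two csv parts and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "part-0.csv").write_text("sequenceId,itemPosition,column1\n")
    (folder / "part-1.csv").write_text("sequenceId,itemPosition,column1\n")
    (folder / "notes.txt").write_text("ignored\n")
    folder_files = [p.name for p in collect_input_files(folder)]
    folder_format = suggested_write_format(folder)
    single_format = suggested_write_format(folder / "part-0.csv")

print(folder_files, folder_format, single_format)
```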

Other materials

To get more details on the specific configuration options, go to the docs page.

If you want to first get a more specific understanding of the transformer architecture, have a look at the Wikipedia article.

If you want to see a benchmark on a small synthetic dataset with 10k cases, against a random forest, an xgboost model and a logistic regression, check out this notebook.

Complete example of how to build and apply a transformer sequence classifier with sequifier

Create a conda environment with Python >=3.9, activate it, and run

pip install sequifier

To create the project folder with the config templates in the configs subfolder, run

sequifier make YOUR_PROJECT_NAME

cd into the YOUR_PROJECT_NAME folder, create a data folder and add your data to it. Then adapt the config file preprocess.yaml in the configs folder to point to the path of the data and run

sequifier preprocess

The preprocessing step outputs a "data driven config" at configs/ddconfigs/[FILE NAME]. It contains the number of classes found in the data, a map of classes to indices and the paths to the train, validation and test splits of the data. Adapt the dd_config parameter in train.yaml and infer.yaml to the path configs/ddconfigs/[FILE NAME].
Adapt the config file train.yaml to specify the transformer hyperparameters you want and run

sequifier train

Adapt data_path in infer.yaml to one of the files output in the preprocessing step and run

sequifier infer

Find your predictions at [PROJECT PATH]/outputs/predictions/sequifier-default-best-predictions.csv
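
The "data driven config" mentioned in the steps above holds class counts, class-to-index maps and split paths. The sketch below shows how such information could be derived from input-format rows. The key names (`n_classes`, `id_maps`, `split_paths`), the index assignment and the file paths are all hypothetical and may not match the file that sequifier actually writes.

```python
import json

def build_data_driven_config(rows, categorical_columns, split_paths):
    """Collect class counts and class-to-index maps for categorical columns."""
    id_maps = {}
    for column in categorical_columns:
        classes = sorted({row[column] for row in rows})
        # Index assignment (sorted order, starting at 0) is an assumption.
        id_maps[column] = {cls: index for index, cls in enumerate(classes)}
    return {
        "n_classes": {col: len(mapping) for col, mapping in id_maps.items()},
        "id_maps": id_maps,
        "split_paths": split_paths,
    }

rows = [
    {"sequenceId": 0, "itemPosition": 0, "column1": "high"},
    {"sequenceId": 0, "itemPosition": 1, "column1": "low"},
    {"sequenceId": 1, "itemPosition": 0, "column1": "medium"},
    {"sequenceId": 1, "itemPosition": 1, "column1": "high"},
]
# Hypothetical split file names, used only to fill the example.
split_paths = ["data/example-split0.csv",
               "data/example-split1.csv",
               "data/example-split2.csv"]
dd_config = build_data_driven_config(rows, ["column1"], split_paths)
print(json.dumps(dd_config, indent=2))
```

Deriving these values from the data rather than typing them by hand keeps the training and inference configs consistent with what preprocessing actually saw.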

More detailed explanations of the three steps

Preprocessing of data into sequences for training

sequifier preprocess --config_path=[CONFIG PATH]

The config path specifies the path to the preprocessing config, and the project path specifies the (preferably empty) folder to which the output files of the different steps are written.

The default config can be found on this path:

configs/preprocess.yaml

Configuring and training the sequence classification model

The training step is executed with the command:

sequifier train --config_path=[CONFIG PATH]

If the data on which the model is trained DOES NOT come from the preprocessing step, the flag

--on-unprocessed

should be added.

If the training data does not come from the preprocessing step, both train and validation data have to take the form of a csv file with the columns "sequenceId", "subsequenceId", "inputCol", [SEQ LENGTH], [SEQ LENGTH - 1],...,"1", "0". You can find an example of the preprocessing input data at documentation/example_inputs/training_input.csv
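
Before training on data that did not pass through the preprocessing step, it can help to check the file header against the column list described above. This is a minimal sketch: the helper names are hypothetical, and it assumes the position columns run from the sequence length down to 0 as plain strings.

```python
import csv
import io

def expected_training_columns(seq_length):
    """Column names for unprocessed training data with the given sequence length."""
    positions = [str(i) for i in range(seq_length, -1, -1)]
    return ["sequenceId", "subsequenceId", "inputCol"] + positions

def missing_training_columns(csv_text, seq_length):
    """Return the expected columns that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [c for c in expected_training_columns(seq_length) if c not in header]

good_csv = ("sequenceId,subsequenceId,inputCol,3,2,1,0\n"
            "0,0,column1,high,high,low,medium\n")
bad_csv = "sequenceId,inputCol,3,2,1\n"

print(missing_training_columns(good_csv, seq_length=3))
print(missing_training_columns(bad_csv, seq_length=3))
```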

The training step is configured using a config file. The default config can be found here:

configs/train.yaml

Its settings should be adapted depending on whether the preprocessing step was executed.

Inferring on test data using the trained model

Inference is done using the command:

sequifier infer --config_path=[CONFIG PATH]

and configured using a config file. The default version can be found here:

configs/infer.yaml
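
The three commands can also be driven from a script. The sketch below only assembles the command lines shown in this section, using the default config locations, and prints them unless `dry_run` is disabled and the `sequifier` executable is found. The helper names are hypothetical, and the manual step of adapting dd_config between preprocessing and training is not automated here.

```python
import shutil
import subprocess

STEPS = ("preprocess", "train", "infer")

def build_command(step, config_path):
    """Assemble one sequifier command line as an argument list."""
    if step not in STEPS:
        raise ValueError(f"unknown step: {step}")
    return ["sequifier", step, f"--config_path={config_path}"]

def run_pipeline(config_dir="configs", dry_run=True):
    """Build the three commands in order and run them unless dry_run is set."""
    commands = [build_command(step, f"{config_dir}/{step}.yaml") for step in STEPS]
    for command in commands:
        if dry_run or shutil.which("sequifier") is None:
            print("would run:", " ".join(command))
        else:
            # check=True stops the pipeline as soon as one step fails.
            subprocess.run(command, check=True)
    return commands

commands = run_pipeline()
```

Calling `run_pipeline(dry_run=False)` inside a project folder would execute the steps in sequence.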

Distributed Training

Sequifier supports distributed training using torch DistributedDataParallel. To make use of multi-GPU support, the write format of the preprocessing step must be set to 'pt'.

Citation

Please cite with:

@software{sequifier_2025,
  author = {Luithlen, Leon},
  title = {sequifier - autoregressive transformer models for multivariate sequence modelling},
  year = {2025},
  publisher = {GitHub},
  version = {0.6.2.8},
  url = {https://github.com/0xideas/sequifier}
}
