Text utilities, models, transforms, and datasets for PyTorch.

torchtext

CAUTION: As of September 2023 we have paused active development of TorchText because our focus has shifted away from building out this library offering. We will continue to release new versions but do not anticipate any new feature development as we figure out future investments in this space.

This repository consists of:

Installation

We recommend Anaconda as a Python package management system. Please refer to pytorch.org for the details of PyTorch installation. The following are the corresponding torchtext versions and supported Python versions.

Version Compatibility

PyTorch version    torchtext version    Supported Python version
nightly build      main                 >=3.8, <=3.11
2.2.0              0.17.0               >=3.8, <=3.11
2.1.0              0.16.0               >=3.8, <=3.11
2.0.0              0.15.0               >=3.8, <=3.11
1.13.0             0.14.0               >=3.7, <=3.10
1.12.0             0.13.0               >=3.7, <=3.10
1.11.0             0.12.0               >=3.6, <=3.9
1.10.0             0.11.0               >=3.6, <=3.9
1.9.1              0.10.1               >=3.6, <=3.9
1.9                0.10                 >=3.6, <=3.9
1.8.1              0.9.1                >=3.6, <=3.9
1.8                0.9                  >=3.6, <=3.9
1.7.1              0.8.1                >=3.6, <=3.9
1.7                0.8                  >=3.6, <=3.8
1.6                0.7                  >=3.6, <=3.8
1.5                0.6                  >=3.5, <=3.8
1.4                0.5                  2.7, >=3.5, <=3.8
0.4 and below      0.2.3                2.7, >=3.5, <=3.8

Using conda:

conda install -c pytorch torchtext

Using pip:

pip install torchtext
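
After installing, it can help to confirm that the installed torch and torchtext releases pair up as listed under Version Compatibility. The following is a minimal sketch, not part of torchtext: the names TORCH_TO_TORCHTEXT, release_series, expected_torchtext, and installed_pair_matches are hypothetical helpers, and the mapping simply restates the numbered releases from that table by major.minor series.

```python
from importlib import metadata

# torch "major.minor" -> torchtext "major.minor", restated from the
# Version Compatibility table (nightly and "0.4 and below" rows omitted).
TORCH_TO_TORCHTEXT = {
    "2.2": "0.17", "2.1": "0.16", "2.0": "0.15",
    "1.13": "0.14", "1.12": "0.13", "1.11": "0.12", "1.10": "0.11",
    "1.9": "0.10", "1.8": "0.9", "1.7": "0.8",
    "1.6": "0.7", "1.5": "0.6", "1.4": "0.5",
}


def release_series(version):
    """Reduce a version such as '2.2.0+cu121' to its 'major.minor' series."""
    public = version.split("+", 1)[0]  # drop a local suffix like '+cu121'
    return ".".join(public.split(".")[:2])


def expected_torchtext(torch_version):
    """Return the torchtext series paired with a torch version, or None."""
    return TORCH_TO_TORCHTEXT.get(release_series(torch_version))


def installed_pair_matches():
    """Return True or False, or None if a package is missing or unlisted."""
    try:
        torch_version = metadata.version("torch")
        text_version = metadata.version("torchtext")
    except metadata.PackageNotFoundError:
        return None
    expected = expected_torchtext(torch_version)
    if expected is None:
        return None
    return release_series(text_version) == expected
```

Calling installed_pair_matches() in the target environment reports whether the two installed packages agree with the table; a None result means the check could not be made.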

Optional requirements

To use the English tokenizer from spaCy, install spaCy and download its English model:

pip install spacy
python -m spacy download en_core_web_sm

Alternatively, you might want to use the Moses tokenizer port in SacreMoses (split from NLTK). You have to install SacreMoses:

pip install sacremoses
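
Once one of these optional packages is installed, the tokenizer can be requested through torchtext.data.utils.get_tokenizer. The sketch below is one possible way to pick whichever backend is available; load_tokenizer is a hypothetical helper name, and the plain whitespace fallback is an assumption added for illustration rather than something torchtext provides.

```python
def load_tokenizer(preferred=("spacy", "moses")):
    """Return the first available tokenizer callable from `preferred`.

    Falls back to whitespace splitting (an illustrative assumption)
    when torchtext or the optional backend cannot be loaded.
    """
    for name in preferred:
        try:
            # Imported lazily so this module loads without torchtext.
            from torchtext.data.utils import get_tokenizer

            if name == "spacy":
                # Uses the model fetched by `spacy download en_core_web_sm`.
                return get_tokenizer("spacy", language="en_core_web_sm")
            return get_tokenizer(name)  # e.g. "moses" needs sacremoses
        except (ImportError, OSError):
            continue  # backend missing; try the next one
    return str.split  # whitespace fallback: str.split("a b") -> ["a", "b"]
```

The returned object is a callable that maps a string to a list of tokens, so tokenizer = load_tokenizer() followed by tokenizer("some text") works the same way whichever backend was found.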

For torchtext 0.5 and below, also install sentencepiece:

conda install -c powerai sentencepiece

Building from source

To build torchtext from source, you need git, CMake, and a C++11 compiler such as g++:

git clone https://github.com/pytorch/text torchtext
cd torchtext
git submodule update --init --recursive

# Linux
python setup.py clean install

# OSX
CC=clang CXX=clang++ python setup.py clean install

# or ``python setup.py develop`` if you are making modifications.

Note

When building from source, make sure that you use the same C++ compiler as the one used to build PyTorch. A simple way is to build PyTorch from source and use the same environment to build torchtext. If you are using the nightly build of PyTorch, check out the environment it was built with: conda (here) and pip (here).

Additionally, datasets in torchtext are implemented using the torchdata library. Please take a look at the installation instructions to download the latest nightlies or install from source.

Documentation

Find the documentation here.

Datasets

The datasets module currently contains:

  • Language modeling: WikiText2, WikiText103, PennTreebank, EnWik9

  • Machine translation: IWSLT2016, IWSLT2017, Multi30k

  • Sequence tagging (e.g. POS/NER): UDPOS, CoNLL2000Chunking

  • Question answering: SQuAD1, SQuAD2

  • Text classification: SST2, AG_NEWS, SogouNews, DBpedia, YelpReviewPolarity, YelpReviewFull, YahooAnswers, AmazonReviewPolarity, AmazonReviewFull, IMDB

  • Model pre-training: CC-100
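
As a rough illustration of how one of the listed datasets might be read, the sketch below loads the AG_NEWS training split and looks at the first few samples. The helper names peek and load_ag_news_train are hypothetical; the import is deferred so the module loads without torchtext, and the first call is expected to download the data, which requires both torchtext and torchdata to be installed.

```python
import itertools


def peek(samples, n=3):
    """Return the first n items from any iterable of samples."""
    return list(itertools.islice(samples, n))


def load_ag_news_train(root=".data"):
    """Return an iterable over the AG_NEWS training split.

    Each sample is a (label, text) pair. Requires torchtext and
    torchdata; the data is downloaded into `root` on first use.
    """
    from torchtext.datasets import AG_NEWS  # deferred, optional dependency

    return AG_NEWS(root=root, split="train")


def describe(samples, width=60):
    """Format (label, text) pairs as short one-line summaries."""
    return [f"{label}: {text[:width]}" for label, text in samples]
```

With the dependencies in place, describe(peek(load_ag_news_train())) gives a quick look at three training samples without iterating over the whole split.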

Models

The library currently consists of the following pre-trained models:

Tokenizers

The transforms module currently supports the following scriptable tokenizers:

Tutorials

To get started with torchtext, users may refer to the following tutorial available on the PyTorch website.

Disclaimer on Datasets

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset’s license.

If you’re a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!
