Command-line interface with state-of-the-art neural network language models
Language Model Zoo
The Language Model Zoo is an open-source repository of state-of-the-art
language models, designed to support black-box access to model predictions and
representations. It provides the command line tool lm-zoo, a standard
interface for interacting with language models.
You can use lm-zoo to
- compute language model predictions at the word level,
- extract token-level surprisal data (commonly used in psycholinguistic experiments), and
- preprocess corpora according to a language model's particular tokenization standards.
Getting started
Running language models from this repository requires Docker.
You can install lm-zoo via pip:
$ pip install lm-zoo
List available language models:
$ lm-zoo list
gpt2
Image URI: docker.io/cpllab/language-models:gpt2
Full name: None
Reference URL: https://openai.com/blog/better-language-models/
Maintainer: None
Last updated: None
RNNG
Image URI: docker.io/cpllab/language-models:rnng
Full name: None
Reference URL: TODO
Maintainer: None
Last updated: None
ordered-neurons
Image URI: docker.io/cpllab/language-models:ordered-neurons
Full name: None
Reference URL: https://github.com/yikangshen/Ordered-Neurons
Maintainer: None
Last updated: None
...
Tokenize some text according to a language model's standard:
$ wget https://cpllab.github.io/lm-zoo/metamorphosis.txt -O metamorphosis.txt
$ lm-zoo tokenize gpt2 metamorphosis.txt
Pulling latest Docker image for cpllab/language-models:gpt2.
One Ġmorning , Ġwhen ĠGreg or ĠSam sa Ġwoke Ġfrom Ġtroubled Ġdreams , Ġhe Ġfound Ġhimself Ġtransformed Ġin Ġhis Ġbed Ġinto Ġa Ġhorrible Ġver min .
He Ġlay Ġon Ġhis Ġarmour - like Ġback , Ġand Ġif Ġhe Ġlifted Ġhis Ġhead Ġa Ġlittle Ġhe Ġcould Ġsee Ġhis Ġbrown Ġbelly , Ġslightly Ġdom ed Ġand Ġdivided Ġby Ġar ches Ġinto Ġstiff Ġsections .
The Ġbed ding Ġwas Ġhardly Ġable Ġto Ġcover Ġit Ġand Ġseemed Ġready Ġto Ġslide Ġoff Ġany Ġmoment .
...
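In GPT-2's byte-level BPE vocabulary, the `Ġ` character marks a token that begins with a space, which is why word-initial pieces carry it in the output above. The following is a minimal Python sketch for turning one tokenized line back into readable text, assuming tokens are separated by single spaces as shown. `detokenize_gpt2` is a hypothetical helper written for illustration and is not part of lm-zoo; it handles only the space marker.

```python
def detokenize_gpt2(line: str) -> str:
    """Rejoin a space-separated line of GPT-2 BPE tokens into plain text.

    Assumption: tokens are separated by single spaces, and a leading
    'Ġ' on a token stands for a space before that token.
    """
    tokens = line.strip().split(" ")
    # Concatenate the pieces, then turn each 'Ġ' marker back into a space.
    return "".join(tokens).replace("Ġ", " ")


if __name__ == "__main__":
    sample = "One Ġmorning , Ġwhen ĠGreg or ĠSam sa Ġwoke"
    print(detokenize_gpt2(sample))  # prints: One morning, when Gregor Samsa woke
```

Sub-word pieces such as `ĠGreg` and `or` are glued back together because only the `Ġ` marker, not the separator between tokens, is turned into a space.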
Get token-level surprisals for text data:
$ lm-zoo get-surprisals ngram metamorphosis.txt
sentence_id token_id token surprisal
1 1 one 7.76847
1 2 morning 9.40638
1 3 , 1.05009
1 4 when 7.08489
1 5 gregor 18.8963
1 6 <unk> 4.27466
1 7 woke 19.0607
1 8 from 10.3404
1 9 troubled 17.478
1 10 dreams 10.671
1 11 , 3.39374
1 12 he 5.99193
1 13 found 8.07358
1 14 himself 2.92718
1 15 transformed 16.7328
1 16 in 5.32057
1 17 his 7.26454
1 18 bed 9.78166
1 19 into 8.90954
1 20 a 3.72355
1 21 horrible 14.2477
1 22 <unk> 3.56907
1 23 . 3.90242
1 24 </s> 22.8395
2 1 he 4.43708
2 2 lay 14.1721
...
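Token-level surprisals are often aggregated per sentence before analysis. The following is a minimal Python sketch that sums the `surprisal` column by `sentence_id`, assuming the output has been saved as text with the four whitespace-separated columns shown above. `sentence_totals` is a hypothetical helper for illustration and is not part of lm-zoo.

```python
from collections import defaultdict
from typing import Dict


def sentence_totals(table_text: str) -> Dict[int, float]:
    """Sum token surprisals per sentence.

    Assumption: the first line is the header
    'sentence_id token_id token surprisal', and every following row has
    exactly four whitespace-separated fields (tokens contain no spaces).
    """
    totals: Dict[int, float] = defaultdict(float)
    lines = table_text.strip().splitlines()
    for row in lines[1:]:  # skip the header line
        fields = row.split()
        if len(fields) != 4:
            continue  # ignore elided or malformed rows such as '...'
        sentence_id, _token_id, _token, surprisal = fields
        totals[int(sentence_id)] += float(surprisal)
    return dict(totals)


if __name__ == "__main__":
    sample = """sentence_id token_id token surprisal
1 1 one 7.76847
1 2 morning 9.40638
1 3 , 1.05009
2 1 he 4.43708
2 2 lay 14.1721
"""
    for sid, total in sentence_totals(sample).items():
        print(sid, round(total, 5))
```

Rows for `<unk>` and `</s>` are included in the sums here; whether to exclude them depends on the analysis.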
For more information, see our Quickstart tutorial.