Language Model Decomposition

lmd

Code for the paper "Language Model Decomposition: Quantifying the Dependency and Correlation of Language Models" (accepted at EMNLP 2022). An arXiv version of the paper is also available.

Install

Create a virtual environment if needed

python3 -m venv .venv
source .venv/bin/activate

Install from pip

pip install nlp.lmd

Install from source

git clone git@github.com:haozhg/lmd.git
cd lmd
pip install -e .
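
After either install route, it can be handy to confirm that the package is visible to the active interpreter. The snippet below is a minimal sketch using only the Python standard library; the helper name installed_version is made up for illustration and is not part of lmd. It assumes the distribution is registered under the name nlp.lmd, as in the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "nlp.lmd" is the distribution name used in the pip install command;
# installed_version is a hypothetical helper, not an lmd function.
version = installed_version("nlp.lmd")
print(f"nlp.lmd version: {version}" if version else "nlp.lmd is not installed")
```

If this reports that the package is not installed, check that the virtual environment is activated before running pip.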

To use the lmd CLI, run lmd --help or python -m lmd.cli --help:

$ lmd --help
usage: Language Model Decomposition [-h] [--target TARGET] [--basis BASIS]
                                    [--tokenizer-name TOKENIZER_NAME]
                                    [--max-seq-length MAX_SEQ_LENGTH]
                                    [--batch-size BATCH_SIZE]
                                    [--dataset-name DATASET_NAME]
                                    [--dataset-config-name DATASET_CONFIG_NAME]
                                    [--val-split-percentage VAL_SPLIT_PERCENTAGE]
                                    [--test-split-percentage TEST_SPLIT_PERCENTAGE]
                                    [--max-train-samples MAX_TRAIN_SAMPLES]
                                    [--max-val-samples MAX_VAL_SAMPLES]
                                    [--max-test-samples MAX_TEST_SAMPLES]
                                    [--preprocessing-num-workers PREPROCESSING_NUM_WORKERS]
                                    [--overwrite_cache OVERWRITE_CACHE]
                                    [--preprocess-dir PREPROCESS_DIR]
                                    [--embedding-dir EMBEDDING_DIR]
                                    [--results-dir RESULTS_DIR]
                                    [--models-dir MODELS_DIR] [--alpha ALPHA]
                                    [--log-level LOG_LEVEL]
                                    [--try-models TRY_MODELS]
                                    [--pre-select-multiplier PRE_SELECT_MULTIPLIER]
                                    [--seed SEED]
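
For scripted experiments, the flags listed above can be assembled programmatically. The following is a rough sketch rather than a documented workflow: build_lmd_command is a hypothetical helper, and every value shown (the model name passed to --target, the sequence length, the batch size, the output directory) is an illustrative assumption, not a project default. Only flag names that appear in the help output are used, and the command is printed rather than executed.

```python
import shlex
from typing import Dict, List


def build_lmd_command(options: Dict[str, object]) -> List[str]:
    """Turn {"max-seq-length": 128} into ["lmd", "--max-seq-length", "128"]."""
    argv = ["lmd"]
    for flag, value in options.items():
        argv += [f"--{flag}", str(value)]
    return argv


# All values are illustrative assumptions, not defaults taken from the project.
options = {
    "target": "bert-base-uncased",  # hypothetical model name
    "max-seq-length": 128,
    "batch-size": 32,
    "seed": 0,
    "results-dir": "results/demo",  # hypothetical output directory
}
command = build_lmd_command(options)
print(shlex.join(command))
```

Once the values are adjusted to a real setup, the resulting list can be passed to subprocess.run(command, check=True). Consult lmd --help for the accepted format of each option, in particular --basis, which is left out here because its expected format is not shown above.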

Results

To reproduce the results in Appendix B of the paper, run bash scripts/run.sh. The results are also stored in results/128k.
