Parcae: Scaling Laws For Stable Looped Language Models
Hayden Prairie, Zachary Novack, Taylor Berg-Kirkpatrick, Daniel Y. Fu
Paper: https://arxiv.org/abs/2604.12946

About

Parcae is a new looped architecture that uses a handful of techniques to drastically stabilize training. Parcae enables stable, hassle-free training of looped models, which we use to derive the first scaling laws for looping, finding that compute-optimal training scales looping and data in tandem.
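To make "looped" concrete, here is a minimal sketch of a looped forward pass (hypothetical shorthand, not the actual Parcae implementation): a prelude maps the input into latent space, a single shared core block is applied `recurrence` times with the same weights each iteration, and a coda maps back out.

```python
# Hypothetical sketch of a looped-model forward pass.
def looped_forward(x, prelude, core, coda, recurrence=8):
    h = prelude(x)
    for _ in range(recurrence):
        h = core(h)  # the same core (same weights) on every loop iteration
    return coda(h)

# Toy scalar "layers" to make the weight reuse concrete.
out = looped_forward(3, prelude=lambda v: v + 1, core=lambda v: v * 2, coda=lambda v: v - 1)
# -> 1023  (prelude: 4; core doubles it 8 times: 1024; coda: 1023)
```

Parameters live only in the prelude, core, and coda, but compute scales with the number of loop iterations.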

Installation

Just wanna use models off the shelf? We make things easy with a PyPI package for accessing models (coming soon). Install the package with the following:

pip install parcae-lm

If you are training models, please clone the GitHub repository

git clone https://github.com/SandyResearch/parcae.git
cd parcae

and then follow one of the setup options below:

Docker (recommended)

Our launch scripts handle everything automatically. Set PROJECT_DIR and DOCKER_IMAGE at the top of launch_job.slurm or launch_interactive.sh, then:

# Interactive development shell
bash launch_interactive.sh

# Submit a training job
CONFIG=launch_configs/parcae-small-140m.yaml sbatch launch_job.slurm

The Docker image is hosted publicly at ghcr.io/sandyresearch/parcae and will be pulled automatically.

Local

Requires Python 3.11+ and PyTorch 2.4+. Install PyTorch first following pytorch.org, then:

pip install -e .

Usage

Models

We provide three ways to instantiate models: load pretrained weights with from_pretrained, build from a built-in config with create_model, or customize a config before building with create_config.

import parcae_lm

# Load a pretrained model from HuggingFace
model = parcae_lm.from_pretrained("SandyResearch/parcae-140m")

# Create a model from a built-in config
model = parcae_lm.create_model("parcae-small-140m")

# Or get the config, customize it, then build
config = parcae_lm.create_config("parcae-small-140m")
config.mean_recurrence = 16
model = config.construct_model()

Training

Downloading Data

python scripts/download_data.py fineweb-100bt   # FineWeb-Edu 100B tokens
python scripts/download_data.py fineweb-350bt   # FineWeb-Edu 350B tokens
python scripts/download_data.py huginn          # Huginn dataset

Training a Tokenizer

Train a GPT-4 style BPE tokenizer on your data:

python scripts/tok_train.py --data-dir fineweb --output-dir tokenizer/ --vocab-size 32768

Evaluate compression ratios against GPT-2 and GPT-4 tokenizers:

python scripts/tok_eval.py --tokenizer tokenizer/parcae_tokenizer --data-dir fineweb
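The compression ratio that a tokenizer eval like this reports is typically raw UTF-8 bytes per token (higher means better compression). A hypothetical sketch of that metric; the real script's interface may differ, and `tokenize` here is any callable returning a token list:

```python
# Hypothetical sketch of a bytes-per-token compression metric.
def compression_ratio(text, tokenize):
    n_bytes = len(text.encode("utf-8"))
    n_tokens = len(tokenize(text))
    return n_bytes / n_tokens

# Toy "tokenizer" that splits on whitespace: 19 bytes / 4 tokens.
ratio = compression_ratio("the quick brown fox", str.split)  # -> 4.75
```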

Launching Training

Training is configured via YAML files in launch_configs/. Available configs:

Config                    Architecture  Parameters
parcae-small-140m.yaml    Parcae        140M
parcae-medium-370m.yaml   Parcae        370M
parcae-large-770m.yaml    Parcae        770M
parcae-xlarge-1_3b.yaml   Parcae        1.3B
gpt-small-140m.yaml       GPT           140M
gpt-medium-370m.yaml      GPT           370M
gpt-large-770m.yaml       GPT           770M
gpt-xlarge-1_3b.yaml      GPT           1.3B

Single node:

bash runs/run_training.sh launch_configs/parcae-small-140m.yaml parcae-small 8

Multi-node (Slurm):

CONFIG=launch_configs/parcae-large-770m.yaml sbatch launch_job.slurm

Eval

Evaluate models using scripts/eval.py. Supports loading from HuggingFace or local checkpoints.

# Evaluate a pretrained model from HuggingFace
python scripts/eval.py --hf_repo SandyResearch/parcae-140m --eval_tasks core

# Evaluate a local checkpoint
bash runs/run_eval.sh outputs/parcae-small-140m eval_configs/eval-core.yaml 8

# Evaluate validation loss
python scripts/eval.py --hf_repo SandyResearch/parcae-140m --eval_tasks bpb \
    --tasks.bpb.val_data_dir /path/to/val/data

Available eval configs in eval_configs/:

  • eval-core.yaml — Core benchmark suite
  • eval-core-extended.yaml — Extended core benchmarks
  • eval-val-loss.yaml — Validation loss / bits-per-byte
  • eval-lambada.yaml — LAMBADA evaluation
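Bits-per-byte normalizes cross-entropy loss by the raw text size, making it comparable across tokenizers with different vocabularies. A hypothetical sketch of the conversion (the actual eval script may compute it differently): loss in nats per token is converted to bits, then divided by the number of UTF-8 bytes.

```python
import math

# Hypothetical sketch of the bits-per-byte (bpb) metric:
# nats/token -> total bits -> bits per raw byte of text.
def bits_per_byte(nll_nats_per_token, n_tokens, n_bytes):
    total_bits = nll_nats_per_token * n_tokens / math.log(2)
    return total_bits / n_bytes

# e.g. loss of 2.0 nats/token over 100 tokens spanning 400 bytes
bpb = bits_per_byte(2.0, 100, 400)
```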

Pretrained Models

Pretrained models are uploaded to Hugging Face: parcae-140m, parcae-370m, parcae-770m, parcae-1.3b, trained on the FineWeb-Edu dataset. Models will be auto-downloaded when using from_pretrained.

The model dimensions are:

Model        Parameters  Prelude  Core  Coda  Model dim.  Recurrence
Parcae-140M  140M        2        2     2     768         8
Parcae-370M  370M        4        4     4     1024        8
Parcae-770M  770M        6        6     6     1280        8
Parcae-1.3B  1.3B        8        8     8     1536        8

Note: these are base models without any form of downstream modification (instruction tuning, etc.).
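One way to read the table: a looped model's effective forward depth is prelude + recurrence × core + coda layers, since the core block's weights are reused on every loop iteration. A quick sketch of this arithmetic (our shorthand, not a function from the codebase):

```python
# Effective forward depth of a looped model (hypothetical shorthand).
def effective_depth(prelude, core, coda, recurrence):
    # weights exist for prelude + core + coda layers, but the core
    # runs `recurrence` times, so the forward pass is much deeper
    return prelude + recurrence * core + coda

depth = effective_depth(2, 2, 2, 8)  # Parcae-140M row above -> 20 layers deep
```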

Replicating Scaling Laws

The sweep scripts in runs/ reproduce the scaling law experiments from the paper. See runs/sweep_recurrence.sh for recurrence scaling and runs/sweep_flops.sh for compute-optimal scaling.
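Scaling-law fits of this kind are commonly done as linear regression in log-log space, since a power law loss(C) = a · C^b becomes a line there. A hypothetical sketch of that fitting step (not the paper's actual analysis code):

```python
import math

# Fit loss(C) = a * C**b from (compute, loss) pairs via
# ordinary least squares on (log C, log loss).
def fit_power_law(compute, loss):
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = math.exp(my - slope * mx)
    return a, slope

# Toy data: loss drops by a factor of 0.8 per decade of compute.
a, b = fit_power_law([1e18, 1e19, 1e20], [4.0, 3.2, 2.56])
```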

Citations

@misc{prairie2026parcaescalinglawsstable,
      title={Parcae: Scaling Laws For Stable Looped Language Models}, 
      author={Hayden Prairie and Zachary Novack and Taylor Berg-Kirkpatrick and Daniel Y. Fu},
      year={2026},
      eprint={2604.12946},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2604.12946}, 
}

References

This codebase was built on karpathy/nanochat, seal-rg/recurrent-pretraining, and Lightning-AI/litgpt. While most of the code has been thoroughly adapted, we greatly appreciate the work that went into developing each of these training libraries.
