Fast Triton-based implementations for RWKV

RWKV-FLA

This repo provides Triton kernels for RWKV models. RWKV is a network architecture that combines the advantages of transformers and RNNs and can be used for a variety of natural language processing tasks. It is also a state-of-the-art RNN model.

This project implements multi-level state-chain differentiation for RWKV6 and efficient differentiation with respect to all input parameters, while maintaining high computational precision (in both bf16 and fp32). It does not currently consider pure fp16 variants such as RWKV x060c.

Some benchmarks (chunk_rwkv6(fla) vs CUDA kernel)

Since the project is under active development, the calculated times may differ.

fused_recurrent_rwkv6 will be much slower!

Test Case	Implementation	Forward Time	Backward Time
Test Case 1: B=8, T=4096, C=4096, HEAD_SIZE=64	CUDA BF16	9.69 ms	46.41 ms
	FLA BF16	13.06 ms	40.79 ms
Test Case 2: B=32, T=4096, C=4096, HEAD_SIZE=64	CUDA BF16	32.80 ms	148.05 ms
	FLA BF16	50.17 ms	162.42 ms
Test Case 3: B=8, T=4096, C=4096, HEAD_SIZE=128	CUDA BF16	12.01 ms	65.68 ms
	FLA BF16	14.18 ms	51.36 ms
Test Case 4: B=8, T=4096, C=4096, HEAD_SIZE=256	CUDA BF16	40.82 ms	225.59 ms
	FLA BF16	19.34 ms	72.03 ms
Test Case 5: B=16, T=4096, C=4096, HEAD_SIZE=128	CUDA BF16	20.56 ms	109.76 ms
	FLA BF16	27.72 ms	102.35 ms
Test Case 6: B=16, T=4096, C=4096, HEAD_SIZE=256	CUDA BF16	61.54 ms	344.85 ms
	FLA BF16	38.24 ms	144.12 ms
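
To read the table as relative speed, the following sketch (not part of the library) divides each CUDA time by the matching FLA time, so values above 1 mean the FLA chunk kernel is faster. The numbers are copied from the table above and will drift as the project evolves; the names `BENCH` and `speedups` are made up for this illustration.

```python
# Timings in milliseconds, copied from the benchmark table above.
# Each entry: (test case, (CUDA fwd, CUDA bwd), (FLA fwd, FLA bwd)).
BENCH = [
    ("B=8, HEAD_SIZE=64", (9.69, 46.41), (13.06, 40.79)),
    ("B=32, HEAD_SIZE=64", (32.80, 148.05), (50.17, 162.42)),
    ("B=8, HEAD_SIZE=128", (12.01, 65.68), (14.18, 51.36)),
    ("B=8, HEAD_SIZE=256", (40.82, 225.59), (19.34, 72.03)),
    ("B=16, HEAD_SIZE=128", (20.56, 109.76), (27.72, 102.35)),
    ("B=16, HEAD_SIZE=256", (61.54, 344.85), (38.24, 144.12)),
]


def speedups(bench):
    """Return {case: (forward speedup, backward speedup)} of FLA over CUDA."""
    return {
        case: (cuda[0] / fla[0], cuda[1] / fla[1])
        for case, cuda, fla in bench
    }


if __name__ == "__main__":
    for case, (fwd, bwd) in speedups(BENCH).items():
        print(f"{case:22s} forward x{fwd:.2f}  backward x{bwd:.2f}")
```

In these numbers the FLA kernel is ahead in both passes for HEAD_SIZE=256 and behind in the forward pass for HEAD_SIZE=64 and 128.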

import torch
from fla.ops.rwkv6 import chunk_rwkv6, fused_recurrent_rwkv6, native_recurrent_rwkv6

# Note: torch.compile can introduce numerical precision errors (observed with torch 2.4).
@torch.compile(fullgraph=True)
def RUN_FLA_CHUNK(B, T, C, H, r, k, v, w, u, h, scale=1.0, chunk_size=32):
    # Reshape the (B, T, C) inputs to (B, H, T, head_size).
    r = r.view(B, T, H, -1).transpose(1, 2)
    k = k.view(B, T, H, -1).transpose(1, 2)
    v = v.view(B, T, H, -1).transpose(1, 2)
    # u can be 3D (B, H, -1) or 2D (H, -1); the 2D form saves VRAM.
    w = -torch.exp(w.view(B, T, H, -1).transpose(1, 2))
    # Change to scale=-1.0 when using fp16; this applies the scale to r and k.
    o, final_state = chunk_rwkv6(r, k, v, w, u=u, scale=scale, initial_state=h,
                                 output_final_state=True, chunk_size=chunk_size)
    # Restore the (B, T, C) layout for the output.
    return o.transpose(1, 2).reshape(B, T, C), final_state
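
For intuition about what the chunked kernel computes, here is a pure-Python reference for a single head. It is a sketch that assumes the RWKV6 recurrence as usually stated (the output reads the previous state plus a `u`-weighted bonus for the current token, then the state decays by `exp(w)` and absorbs `k v^T`). It is not the kernel's implementation, and the function name `naive_recurrent_rwkv6_head` is hypothetical.

```python
import math


def naive_recurrent_rwkv6_head(r, k, v, w, u, state=None, scale=1.0):
    """Reference recurrence for ONE head, written for clarity rather than speed.

    r, k, w: lists of T vectors of size K; v: list of T vectors of size V.
    w holds log-decays (already -exp(raw_w), hence negative).
    u: bonus vector of size K. state: K x V matrix (zeros if omitted).
    """
    K, V = len(k[0]), len(v[0])
    S = [row[:] for row in state] if state is not None else [[0.0] * V for _ in range(K)]
    outputs = []
    for r_t, k_t, v_t, w_t in zip(r, k, v, w):
        # Outer product k_t v_t^T: the contribution of the current token.
        kv = [[k_t[i] * v_t[j] for j in range(V)] for i in range(K)]
        # The output reads the previous state plus a u-weighted bonus for the current token.
        o_t = [
            sum(scale * r_t[i] * (S[i][j] + u[i] * kv[i][j]) for i in range(K))
            for j in range(V)
        ]
        outputs.append(o_t)
        # Decay each key channel by exp(w), then add the new outer product.
        S = [[math.exp(w_t[i]) * S[i][j] + kv[i][j] for j in range(V)] for i in range(K)]
    return outputs, S
```

A loop like this is sequential in T, which is why a chunked formulation that processes `chunk_size` tokens at a time is attractive on GPUs.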

This repo aims at providing a collection of efficient Triton-based implementations for state-of-the-art linear attention models. Any pull requests are welcome!

News
Models
Installation
Usage
Evaluations
Benchmarks
Citation

News

[2024-12]: 📢 fla now officially supports kernels with variable-length inputs.
[2024-11]: The inputs are now switched from head-first to seq-first format.
[2024-11]: 🚀 fla now provides a flexible way for training hybrid models.
[2024-10]: 🔥 Announcing flame, a minimal and scalable framework for training fla models. Check out the details here.
[2024-09]: fla now includes a fused linear and cross-entropy layer, significantly reducing memory usage during training.
[2024-09]: 🎉 Add GSA implementation to fla (paper).
[2024-05]: 🎉 Add DeltaNet implementation to fla (paper).
[2024-05]: 🚀 fla v0.1: a variety of subquadratic kernels/layers/models integrated (RetNet/GLA/Mamba/HGRN/HGRN2/RWKV6, etc., see Models).
[2023-12]: 🎉 Launched fla, offering a collection of implementations for state-of-the-art linear attention models.
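
The head-first to seq-first switch in the 2024-11 entry can be pictured with flat memory offsets. This sketch assumes "head-first" means a contiguous (B, H, T, D) tensor and "seq-first" means (B, T, H, D); the helper names are hypothetical and nothing here calls fla.

```python
def offset_head_first(b, h, t, d, H, T, D):
    """Flat offset of element (b, h, t, d) in a contiguous (B, H, T, D) tensor."""
    return ((b * H + h) * T + t) * D + d


def offset_seq_first(b, t, h, d, H, T, D):
    """Flat offset of element (b, t, h, d) in a contiguous (B, T, H, D) tensor."""
    return ((b * T + t) * H + h) * D + d


def head_first_to_seq_first(flat, B, H, T, D):
    """Reorder a flat head-first buffer into seq-first order."""
    out = [None] * len(flat)
    for b in range(B):
        for h in range(H):
            for t in range(T):
                for d in range(D):
                    src = offset_head_first(b, h, t, d, H, T, D)
                    dst = offset_seq_first(b, t, h, d, H, T, D)
                    out[dst] = flat[src]
    return out
```

In PyTorch the same reordering would typically be written as `x.transpose(1, 2)`, followed by `.contiguous()` if a dense copy is needed.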

Models

Roughly sorted by the order in which support was added to fla

Date	Model	Title	Paper	Code	`fla` impl
2023-07	RetNet	Retentive network: a successor to transformer for large language models	arxiv	official	code
2023-12	GLA	Gated Linear Attention Transformers with Hardware-Efficient Training	arxiv	official	code
2023-12	Based	An Educational and Effective Sequence Mixer	blog	official	code
2024-01	Rebased	Linear Transformers with Learnable Kernel Functions are Better In-Context Models	arxiv	official	code
2021-02	Delta Net	Linear Transformers Are Secretly Fast Weight Programmers	arxiv	official	code
2021-10	ABC	Attention with Bounded-memory Control	arxiv		code
2023-09	HGRN	Hierarchically Gated Recurrent Neural Network for Sequence Modeling	openreview	official	code
2024-04	HGRN2	HGRN2: Gated Linear RNNs with State Expansion	arxiv	official	code
2024-04	RWKV6	Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence	arxiv	official	code
2024-06	Samba	Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling	arxiv	official	code
2024-05	Mamba2	Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality	arxiv	official	code
2024-09	GSA	Gated Slot Attention for Efficient Linear-Time Sequence Modeling	arxiv	official	code

Installation

The following requirements should be satisfied:

PyTorch >= 2.0 (>= 2.4 recommended)
Triton >= 2.2 (3.0 recommended)
einops
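
A quick way to check an environment against these minimums is to query the installed package metadata. The sketch below uses only the standard library; the distribution names `torch`, `triton` and `einops` are assumptions (some installs ship Triton under a different distribution name), and the helper names are hypothetical.

```python
import re
from importlib import metadata

# Minimum (major, minor) versions taken from the requirements list above.
MINIMUMS = {"torch": (2, 0), "triton": (2, 2), "einops": (0, 0)}


def parse_major_minor(version):
    """Turn a version string such as '2.4.0+cu121' into (2, 4)."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if not match:
        return (0, 0)
    return (int(match.group(1)), int(match.group(2) or 0))


def check_requirements(minimums=MINIMUMS):
    """Return {package: (installed version or None, meets minimum?)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, parse_major_minor(installed) >= minimum)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements().items():
        print(f"{name}: {installed or 'not installed'} ({'ok' if ok else 'check'})")
```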

As fla is under active development, you should always check for the latest version: pip install --upgrade rwkv-fla triton

Alternatively, you can install it with a backend-specific extra: pip install rwkv-fla[cuda], pip install rwkv-fla[xpu], or pip install rwkv-fla[rocm]

If you need to use fla ops/modules and plan further exploration, you can alternatively install the package from source:

pip install -U git+https://github.com/TorchRWKV/flash-linear-attention

pip install -U git+https://gitee.com/uniartisan2018/flash-linear-attention

or manage fla with submodules

git submodule add https://github.com/TorchRWKV/flash-linear-attention.git 3rdparty/rwkv-fla
ln -s 3rdparty/rwkv-fla/fla fla

[!CAUTION] If you're not working with Triton v2.2 or its nightly release, it's important to be aware of potential issues with the FusedChunk implementation, detailed in this issue. You can run the test python tests/test_fused_chunk.py to check if your version is affected by similar compiler problems. While we offer some fixes for Triton<=2.1, be aware that these may result in reduced performance.

With Triton 2.2 as well as earlier versions (up to 2.1), you can reliably use the Chunk version (with hidden states materialized into HBM). After careful optimization, this version generally delivers high performance in most scenarios.

Acknowledgments

The rwkv-fla project is a fork of the fla project. We extend our sincere gratitude to the original maintainers for their tremendous efforts and contributions. This project builds upon the work described in:

@software{yang2024fla,
  title  = {FLA: A Triton-Based Library for Hardware-Efficient Implementations of Linear Attention Mechanism},
  author = {Yang, Songlin and Zhang, Yu},
  url    = {https://github.com/sustcsonglin/flash-linear-attention},
  month  = jan,
  year   = {2024}
}

Their innovative work and expertise laid the foundation for the development of rwkv-fla.

Models

Date	Model	Title	Paper	Code	FLA impl
2023-07	RetNet (@MSRA@THU)	Retentive network: a successor to transformer for large language models	arxiv	official RetNet	code
2023-12	GLA (@MIT@IBM)	Gated Linear Attention Transformers with Hardware-Efficient Training	arxiv	official	code
2023-12	Based (@Stanford@Hazyresearch)	An Educational and Effective Sequence Mixer	blog	official	code
2024-01	Rebased	Linear Transformers with Learnable Kernel Functions are Better In-Context Models	arxiv	official	code
2021-02	Delta Net	Linear Transformers Are Secretly Fast Weight Programmers	arxiv	official	code
2023-09	Hedgehog (@HazyResearch)	The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry	openreview		code
2023-10	PolySketchFormer (@CMU@Google)	Fast Transformers via Sketching Polynomial Kernels	arxiv		TODO
2023-07	TransnormerLLM (@Shanghai AI Lab)	A Faster and Better Large Language Model with Improved TransNormer	openreview arxiv	official Lightning2	TODO
2023-05	RWKV-v4 (@BlinkDL)	Reinventing RNNs for the Transformer Era	arxiv	official	TODO
2023-10	GateLoop	Fully Data-Controlled Linear Recurrence for Sequence Modeling	openreview arxiv	official jax	TODO
2021-10	ABC (@UW)	Attention with Bounded-memory Control	arxiv		code
2023-09	VQ-transformer	Linear-Time Transformers via Vector Quantization	arxiv	official	TODO
2023-09	HGRN	Hierarchically Gated Recurrent Neural Network for Sequence Modeling	openreview	official	code
2024-04	HGRN2	HGRN2: Gated Linear RNNs with State Expansion	arxiv	official	code
2024-04	RWKV6	Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence	arxiv	official	code
2024-06	Samba	Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling	arxiv	official	code
2024-05	Mamba2	Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality	arxiv	official	code

Usage

Token Mixing

We provide "token mixing" linear attention layers in fla.layers for you to use. You can replace the standard multihead attention layer in your model with other linear attention layers. Example usage is as follows:

>>> import torch
>>> from fla.layers import MultiScaleRetention
>>> batch_size, num_heads, seq_len, hidden_size = 32, 4, 2048, 1024
>>> device, dtype = 'cuda:0', torch.bfloat16
>>> retnet = MultiScaleRetention(hidden_size=hidden_size, num_heads=num_heads).to(device=device, dtype=dtype)
>>> retnet
MultiScaleRetention(
  (q_proj): Linear(in_features=1024, out_features=1024, bias=False)
  (k_proj): Linear(in_features=1024, out_features=1024, bias=False)
  (v_proj): Linear(in_features=1024, out_features=2048, bias=False)
  (g_proj): Linear(in_features=1024, out_features=2048, bias=False)
  (o_proj): Linear(in_features=2048, out_features=1024, bias=False)
  (g_norm_swish_gate): FusedRMSNormSwishGate(512, eps=1e-05)
  (rotary): RotaryEmbedding()
)
>>> x = torch.randn(batch_size, seq_len, hidden_size).to(device=device, dtype=dtype)
>>> y, *_ = retnet(x)
>>> y.shape
torch.Size([32, 2048, 1024])

We provide the implementations of models that are compatible with 🤗 Transformers library. Here's an example of how to initialize a GLA model from the default configs in fla:

>>> from fla.models import GLAConfig
>>> from transformers import AutoModelForCausalLM
>>> config = GLAConfig()
>>> config
GLAConfig {
  "attn": null,
  "attn_mode": "chunk",
  "bos_token_id": 1,
  "clamp_min": null,
  "conv_size": 4,
  "elementwise_affine": true,
  "eos_token_id": 2,
  "expand_k": 0.5,
  "expand_v": 1,
  "feature_map": null,
  "fuse_cross_entropy": true,
  "fuse_norm": true,
  "hidden_act": "swish",
  "hidden_ratio": 4,
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": null,
  "max_position_embeddings": 2048,
  "model_type": "gla",
  "norm_eps": 1e-06,
  "num_heads": 4,
  "num_hidden_layers": 24,
  "num_kv_heads": null,
  "tie_word_embeddings": false,
  "transformers_version": "4.45.0",
  "use_cache": true,
  "use_gk": true,
  "use_gv": false,
  "use_output_gate": true,
  "use_short_conv": false,
  "vocab_size": 32000
}

>>> AutoModelForCausalLM.from_config(config)
GLAForCausalLM(
  (model): GLAModel(
    (embeddings): Embedding(32000, 2048)
    (layers): ModuleList(
      (0-23): 24 x GLABlock(
        (attn_norm): RMSNorm(2048, eps=1e-06)
        (attn): GatedLinearAttention(
          (q_proj): Linear(in_features=2048, out_features=1024, bias=False)
          (k_proj): Linear(in_features=2048, out_features=1024, bias=False)
          (v_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (g_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (gk_proj): Sequential(
            (0): Linear(in_features=2048, out_features=16, bias=False)
            (1): Linear(in_features=16, out_features=1024, bias=True)
          )
          (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (g_norm_swish_gate): FusedRMSNormSwishGate(512, eps=1e-06)
        )
        (mlp_norm): RMSNorm(2048, eps=1e-06)
        (mlp): GLAMLP(
          (gate_proj): Linear(in_features=2048, out_features=11264, bias=False)
          (down_proj): Linear(in_features=5632, out_features=2048, bias=False)
          (act_fn): SiLU()
        )
      )
    )
    (norm): RMSNorm(2048, eps=1e-06)
  )
  (lm_head): Linear(in_features=2048, out_features=32000, bias=False)
)
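
The layer sizes printed above follow from the config values. The sketch below reproduces them with plain arithmetic; the SwiGLU-style rounding rule for the MLP width (two thirds of `hidden_ratio * hidden_size`, rounded up to a multiple of 256) is an assumption chosen because it matches the printed shapes, so check the library source before relying on it. The function name `gla_shapes` is hypothetical.

```python
def gla_shapes(hidden_size=2048, num_heads=4, expand_k=0.5, expand_v=1,
               hidden_ratio=4, multiple_of=256):
    """Derive the projection sizes shown in the GLA model printout."""
    key_dim = int(hidden_size * expand_k)      # q_proj / k_proj out_features
    value_dim = int(hidden_size * expand_v)    # v_proj / g_proj out_features
    # Assumed MLP sizing: 2/3 * hidden_ratio * hidden_size, rounded up.
    intermediate = int(hidden_size * hidden_ratio * 2 / 3)
    intermediate = multiple_of * ((intermediate + multiple_of - 1) // multiple_of)
    return {
        "key_dim": key_dim,
        "value_dim": value_dim,
        "head_k_dim": key_dim // num_heads,
        "head_v_dim": value_dim // num_heads,   # size seen by the gated norm
        "intermediate_size": intermediate,      # down_proj in_features
        "gate_proj_out": 2 * intermediate,      # gate and up projections fused
    }


if __name__ == "__main__":
    for name, value in gla_shapes().items():
        print(f"{name}: {value}")
```

With the default config this gives `key_dim=1024`, `value_dim=2048`, `head_v_dim=512`, `intermediate_size=5632` and `gate_proj_out=11264`, matching the module printout.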

Fused Modules

We offer a collection of fused modules in fla.modules to facilitate faster training:

Rotary Embedding: rotary positional embeddings as adopted by the Llama architecture, a.k.a., Transformer++.
Norm Layers:
- RMSNorm, LayerNorm and GroupNorm
- RMSNormLinear, LayerNormLinear and GroupNormLinear, which reduce the memory usage of intermediate tensors.
Norm Layers with Gating: combine norm layers with element-wise gating, as used by RetNet/GLA.
Cross Entropy: faster Triton implementation of cross entropy loss.
Linear Cross Entropy: fused linear layer and cross entropy loss to avoid the materialization of large logits tensors. Also refer to implementations by mgmalek and Liger-Kernel.
Linear KL Divergence: fused linear layer and KL divergence loss in a similar vein as CE loss.
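
To illustrate why fusing the linear layer with the loss saves memory, here is a pure-Python sketch of the forward pass only. It computes the cross entropy one chunk of rows at a time, so only `chunk_size x V` logits exist at once instead of the full `N x V` matrix. This is an illustration of the idea, not the Triton kernel (which also handles gradients); all names are hypothetical.

```python
import math


def _logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def linear_cross_entropy_chunked(hidden, weight, targets, chunk_size=2):
    """Mean cross entropy of logits = hidden @ weight^T, chunk by chunk.

    hidden: N rows of size D; weight: V rows of size D (output projection);
    targets: N class ids in [0, V).
    """
    total = 0.0
    for start in range(0, len(hidden), chunk_size):
        rows = hidden[start:start + chunk_size]
        labels = targets[start:start + chunk_size]
        # Logits for this chunk only; they are discarded after the loss is added.
        chunk_logits = [[_dot(h, w_row) for w_row in weight] for h in rows]
        total += sum(_logsumexp(z) - z[y] for z, y in zip(chunk_logits, labels))
    return total / len(hidden)
```

The result does not depend on `chunk_size`; only the peak size of the temporary logits changes.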

Generation

Once a model has been pretrained, it can generate text through the 🤗 text generation APIs. A generation example follows:

>>> import fla
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> name = 'fla-hub/gla-1.3B-100B'
>>> tokenizer = AutoTokenizer.from_pretrained(name)
>>> model = AutoModelForCausalLM.from_pretrained(name).cuda()
>>> input_prompt = "Power goes with permanence. Impermanence is impotence. And rotation is castration."
>>> input_ids = tokenizer(input_prompt, return_tensors="pt").input_ids.cuda()
>>> outputs = model.generate(input_ids, max_length=64)
>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

We also provide a simple script here for benchmarking the generation speed. Simply run it by:

$ python -m benchmarks.benchmark_generation \
  --path 'fla-hub/gla-1.3B-100B' \
  --repetition_penalty 2. \
  --prompt="Hello everyone, I'm Songlin Yang"

Prompt:
Hello everyone, I'm Songlin Yang
Generated:
Hello everyone, I'm Songlin Yang.
I am a 20 year old girl from China who is currently studying in the United States of America for my Master degree and also working as an English teacher at school here on campus since last summer (1st semester). My main goal to be able do well with this course so that we can have

Prompt length: 10, generation length: 64
Total prompt processing + decoding time: 4593ms

All of the pretrained models currently available can be found in fla-hub.

>>> from huggingface_hub import list_models
>>> for model in list_models(author='fla-hub'): print(model.id)

Hybrid Models

fla provides a flexible method to incorporate standard attention layers into existing linear attention models. This is easily achieved by specifying the attn argument in the model configuration.

For example, to create a 2-layer Samba model with interleaved Mamba and local attention layers, using a sliding window size of 2048:

>>> from fla.models import SambaConfig
>>> from transformers import AutoModelForCausalLM
>>> config = SambaConfig(num_hidden_layers=2)
>>> config.attn = { 
  'layers': [1], 
  'num_heads': 18, 
  'num_kv_heads': 18,
  'window_size': 2048
}
>>> config
SambaConfig {
  "attn": {
    "layers": [
      1
    ],
    "num_heads": 18,
    "num_kv_heads": 18,
    "window_size": 2048
  },
  "bos_token_id": 1,
  "conv_kernel": 4,
  "eos_token_id": 2,
  "expand": 2,
  "fuse_cross_entropy": true,
  "fuse_norm": true,
  "hidden_act": "silu",
  "hidden_ratio": 4,
  "hidden_size": 2304,
  "initializer_range": 0.02,
  "intermediate_size": 4608,
  "max_position_embeddings": 2048,
  "model_type": "samba",
  "norm_eps": 1e-05,
  "num_hidden_layers": 2,
  "pad_token_id": 0,
  "rescale_prenorm_residual": false,
  "residual_in_fp32": false,
  "state_size": 16,
  "tie_word_embeddings": false,
  "time_step_floor": 0.0001,
  "time_step_init_scheme": "random",
  "time_step_max": 0.1,
  "time_step_min": 0.001,
  "time_step_rank": 144,
  "time_step_scale": 1.0,
  "transformers_version": "4.45.0",
  "use_bias": false,
  "use_cache": true,
  "use_conv_bias": true,
  "vocab_size": 32000
}

>>> AutoModelForCausalLM.from_config(config)
SambaForCausalLM(
  (backbone): SambaModel(
    (embeddings): Embedding(32000, 2304)
    (layers): ModuleList(
      (0): SambaBlock(
        (mixer_norm): RMSNorm(2304, eps=1e-05)
        (mixer): MambaMixer(
          (conv1d): Conv1d(4608, 4608, kernel_size=(4,), stride=(1,), padding=(3,), groups=4608)
          (act): SiLU()
          (in_proj): Linear(in_features=2304, out_features=9216, bias=False)
          (x_proj): Linear(in_features=4608, out_features=176, bias=False)
          (dt_proj): Linear(in_features=144, out_features=4608, bias=True)
          (out_proj): Linear(in_features=4608, out_features=2304, bias=False)
        )
        (mlp_norm): RMSNorm(2304, eps=1e-05)
        (mlp): SambaMLP(
          (gate_proj): Linear(in_features=2304, out_features=12288, bias=False)
          (down_proj): Linear(in_features=6144, out_features=2304, bias=False)
          (act_fn): SiLU()
        )
      )
      (1): SambaBlock(
        (mixer_norm): RMSNorm(2304, eps=1e-05)
        (mixer): Attention(
          (q_proj): Linear(in_features=2304, out_features=2304, bias=False)
          (k_proj): Linear(in_features=2304, out_features=2304, bias=False)
          (v_proj): Linear(in_features=2304, out_features=2304, bias=False)
          (o_proj): Linear(in_features=2304, out_features=2304, bias=False)
          (rotary): RotaryEmbedding()
        )
        (mlp_norm): RMSNorm(2304, eps=1e-05)
        (mlp): SambaMLP(
          (gate_proj): Linear(in_features=2304, out_features=12288, bias=False)
          (down_proj): Linear(in_features=6144, out_features=2304, bias=False)
          (act_fn): SiLU()
        )
      )
    )
    (norm_f): RMSNorm(2304, eps=1e-05)
  )
  (lm_head): Linear(in_features=2304, out_features=32000, bias=False)
)
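
The effect of the `attn` setting above can be sketched in plain Python: layers whose index appears in `attn['layers']` become attention blocks, and `window_size` limits how far back each query may look. The exact window convention (for example, whether the current token counts toward the window) is an assumption here, and both helper names are hypothetical.

```python
def layer_types(num_hidden_layers, attn):
    """Label each layer 'attention' if its index is in attn['layers'], else 'mamba'."""
    attn_layers = set(attn["layers"]) if attn else set()
    return ["attention" if i in attn_layers else "mamba"
            for i in range(num_hidden_layers)]


def sliding_window_mask(seq_len, window_size):
    """mask[i][j] is True when query i may attend to key j.

    Causal, and limited to the window_size most recent positions
    (assumed to include the current one).
    """
    return [[0 <= i - j < window_size for j in range(seq_len)]
            for i in range(seq_len)]


if __name__ == "__main__":
    attn = {"layers": [1], "num_heads": 18, "num_kv_heads": 18, "window_size": 2048}
    print(layer_types(2, attn))
    for row in sliding_window_mask(4, 2):
        print("".join("x" if allowed else "." for allowed in row))
```

For the 2-layer config above this labels layer 0 as Mamba and layer 1 as attention, in line with the printed model.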

During inference, you DO NOT need to revise anything for generation! The model will produce output as-is, without any need for additional configurations or modifications.

Evaluations

The lm-evaluation-harness library allows you to easily perform (zero-shot) model evaluations. Follow the steps below to use this library:

Install lm_eval following their instructions.
Run evaluation with:

$ MODEL='fla-hub/gla-1.3B-100B'
$ python -m evals.harness --model hf \
    --model_args pretrained=$MODEL,dtype=bfloat16 \
    --tasks wikitext,lambada_openai,piqa,hellaswag,winogrande,arc_easy,arc_challenge,boolq,sciq,copa,openbookqa \
    --batch_size 64 \
    --num_fewshot 0 \
    --device cuda \
    --show_config

We've made fla compatible with hf-style evaluations, so you can call evals.harness to run them. Running the command above will provide the task results reported in the GLA paper.

[!Tip] If you are using lm-evaluation-harness as an external library and can't find (almost) any tasks available, before calling lm_eval.evaluate() or lm_eval.simple_evaluate(), simply run the following to load the library's stock tasks!

>>> from lm_eval.tasks import TaskManager; TaskManager().initialize_tasks()

Benchmarks

We compared our Triton-based RetNet implementation with CUDA-based FlashAttention2, using a batch size of 8, 32 heads, and a head dimension of 128, across different sequence lengths. These tests were conducted on a single A100 80GB GPU; the measured times are listed below.

# you might have to first install `fla` to enable its import via `pip install -e .`
$ python benchmark_retention.py
Performance:
   seq_len  fused_chunk_fwd  chunk_fwd  parallel_fwd  fused_chunk_fwdbwd  chunk_fwdbwd  parallel_fwdbwd  flash_fwd  flash_fwdbwd
0    128.0         0.093184   0.185344      0.067584            1.009664      1.591296         1.044480   0.041984      0.282624
1    256.0         0.165888   0.219136      0.126976            1.024000      1.596928         1.073152   0.074752      0.413696
2    512.0         0.308224   0.397312      0.265216            1.550336      1.603584         1.301504   0.156672      0.883712
3   1024.0         0.603136   0.747520      0.706560            3.044864      3.089408         3.529728   0.467968      2.342912
4   2048.0         1.191424   1.403904      2.141184            6.010880      6.059008        11.009024   1.612800      7.135232
5   4096.0         2.377728   2.755072      7.392256           11.932672     11.938816        37.792770   5.997568     24.435200
6   8192.0         4.750336   5.491712     26.402817           23.759359     23.952385       141.014023  22.682114     90.619904
7  16384.0         9.591296  10.870784    101.262337           47.666176     48.745472       539.853821  91.346947    346.318848
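
One way to read this table is through empirical scaling: when the sequence length doubles, a linear-time kernel should take about twice as long and a quadratic one about four times as long. The sketch below estimates that exponent from the last two rows and finds the shortest sequence length at which the fused chunk kernel is faster in the forward pass. The values are copied from the output above, and the helper names are hypothetical.

```python
import math

# Forward times copied from the table above (same units as printed there).
SEQ_LEN = [128, 256, 512, 1024, 2048, 4096, 8192, 16384]
FUSED_CHUNK_FWD = [0.093184, 0.165888, 0.308224, 0.603136,
                   1.191424, 2.377728, 4.750336, 9.591296]
FLASH_FWD = [0.041984, 0.074752, 0.156672, 0.467968,
             1.612800, 5.997568, 22.682114, 91.346947]


def scaling_exponent(times):
    """log2 of the time ratio between the two longest (doubling) sequence lengths."""
    return math.log2(times[-1] / times[-2])


def first_faster(seq_len, ours, baseline):
    """Shortest sequence length at which `ours` beats `baseline`, or None."""
    for n, a, b in zip(seq_len, ours, baseline):
        if a < b:
            return n
    return None


if __name__ == "__main__":
    print(f"fused_chunk_fwd exponent: {scaling_exponent(FUSED_CHUNK_FWD):.2f}")
    print(f"flash_fwd exponent:       {scaling_exponent(FLASH_FWD):.2f}")
    print("fused_chunk_fwd faster from seq_len =",
          first_faster(SEQ_LEN, FUSED_CHUNK_FWD, FLASH_FWD))
```

On these measurements the fused chunk forward pass scales roughly linearly, the FlashAttention2 forward pass roughly quadratically, and the two cross over around a sequence length of 2048.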

Citation

If you find this repo useful, please consider citing our works:

@inproceedings{yang2024gla,
  title     = {Gated Linear Attention Transformers with Hardware-Efficient Training},
  author    = {Yang, Songlin and Wang, Bailin and Shen, Yikang and Panda, Rameswar and Kim, Yoon},
  booktitle = {Proceedings of ICML},
  year      = {2024}
}

@software{yang2024fla,
  title  = {FLA: A Triton-Based Library for Hardware-Efficient Implementations of Linear Attention Mechanism},
  author = {Yang, Songlin and Zhang, Yu},
  url    = {https://github.com/sustcsonglin/flash-linear-attention},
  month  = jan,
  year   = {2024}
}

@inproceedings{yang2024parallelizing,
  title     = {Parallelizing Linear Transformers with the Delta Rule over Sequence Length},
  author    = {Yang, Songlin and Wang, Bailin and Zhang, Yu and Shen, Yikang and Kim, Yoon},
  booktitle = {Proceedings of NeurIPS},
  year      = {2024}
}

@inproceedings{zhang2024gsa,
  title     = {Gated Slot Attention for Efficient Linear-Time Sequence Modeling},
  author    = {Zhang, Yu and Yang, Songlin and Zhu, Ruijie and Zhang, Yue and Cui, Leyang and Wang, Yiqiao and Wang, Bolun and Shi, Freda and Wang, Bailin and Bi, Wei and Zhou, Peng and Fu, Guohong},
  booktitle = {Proceedings of NeurIPS},
  year      = {2024}
}
