
Megatron LM 11B on Huggingface Transformers

Project description

Megatron 11B

  • A port of the Megatron LM 11B model published by Facebook to Huggingface Transformers.
  • This repo contains the model code, checkpoints, and parallelization examples.

Installation

pip install megatron-11b

Usage

1. Tokenizer

  • The tokenizer is used in the same way as the existing Huggingface tokenizers.
  • BOS and EOS tokens are attached automatically, so if you want to use the encoded result as a prompt, exclude the EOS token (using [:-1]).
from megatron_11b import MegatronTokenizer

tokenizer = MegatronTokenizer.from_pretrained("hyunwoongko/megatron-11B")
tokens = tokenizer.encode("Kevin is")
# [0, 21910, 16, 2] ---> includes EOS
tokens = tokenizer.encode("Kevin is")[:-1]
# [0, 21910, 16] ---> excludes EOS
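
  • A minimal sketch for checking the attached special tokens programmatically (this assumes the tokenizer exposes the standard Huggingface bos_token_id / eos_token_id attributes):

tokenizer = MegatronTokenizer.from_pretrained("hyunwoongko/megatron-11B")

# Standard Huggingface tokenizers expose their special token ids, so the
# trailing EOS can be stripped without hard-coding the id 2.
tokens = tokenizer.encode("Kevin is")
print(tokenizer.bos_token_id, tokenizer.eos_token_id)  # expected: 0 2
prompt = tokens[:-1] if tokens[-1] == tokenizer.eos_token_id else tokens
print(prompt)  # prompt ids without the trailing EOS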

2. Model

  • We currently support the CausalLM model and the SequenceClassification model.
  • The models are also used in the same way as the existing Huggingface models.
from megatron_11b import MegatronForCausalLM, MegatronForSequenceClassification

model_clm = MegatronForCausalLM.from_pretrained("hyunwoongko/megatron-11B")
model_clf = MegatronForSequenceClassification.from_pretrained("hyunwoongko/megatron-11B")
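
  • A minimal usage sketch for the classification head (assuming the model follows the standard Huggingface SequenceClassification API and returns per-class logits; the example sentence and the argmax-to-label step are purely illustrative):

import torch
from megatron_11b import MegatronTokenizer, MegatronForSequenceClassification

tokenizer = MegatronTokenizer.from_pretrained("hyunwoongko/megatron-11B")
model_clf = MegatronForSequenceClassification.from_pretrained("hyunwoongko/megatron-11B")

# Encode a sentence and run a forward pass; a SequenceClassification head
# returns one logit per class, which can be argmax-ed into a predicted label id.
inputs = tokenizer.encode("Kevin is a great guy.", return_tensors="pt")
with torch.no_grad():
    logits = model_clf(inputs).logits
print(logits.argmax(dim=-1).item())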

3. Generation

from megatron_11b import MegatronForCausalLM, MegatronTokenizer

tokenizer = MegatronTokenizer.from_pretrained("hyunwoongko/megatron-11B")
model = MegatronForCausalLM.from_pretrained("hyunwoongko/megatron-11B").half().cuda()

inputs = "Kevin is"
inputs = tokenizer.encode(inputs, return_tensors="pt").cuda()[:, :-1]  # exclude EOS

output = model.generate(inputs, num_beams=5, no_repeat_ngram_size=4, repetition_penalty=1.2)
print(tokenizer.batch_decode(output))
  • Output of the generation:
<s>Kevin is a great guy.</s>
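
  • Sampling-based decoding also works through the standard generate interface; the parameters below (top_p, temperature, max_length) are illustrative values, not recommendations from the original author:

# Reuses the model, tokenizer and inputs from the generation snippet above.
output = model.generate(
    inputs,
    do_sample=True,
    top_p=0.9,
    temperature=0.8,
    max_length=32,
    no_repeat_ngram_size=4,
)
print(tokenizer.batch_decode(output, skip_special_tokens=True))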

4. Model Parallelism

  • Currently, I'm preparing an open-source library called Parallelformers that can parallelize all Huggingface Transformers models.
  • I plan to support model parallelization through this library (it may be released next month).
  • The relevant code can be found in the MegatronPolicy object below; a sketch of the planned usage follows the class.
from parallelformers.policies.base import Policy, Layer
from parallelformers.utils.dist_utils import AllReduceLinear
from megatron_11b.modeling_megatron import MegatronDecoderLayer


class MegatronPolicy(Policy):

    @staticmethod
    def replace_arguments(config, world_size):
        return {
            # 1. reduce hidden size
            "self_attn.embed_dim": config.d_model // world_size,

            # 2. reduce number of heads
            "self_attn.num_heads": config.encoder_attention_heads // world_size,
        }

    @staticmethod
    def attn_qkv():
        return [
            Layer(
                weight="self_attn.q_proj.weight",
                bias="self_attn.q_proj.bias",
            ),
            Layer(
                weight="self_attn.k_proj.weight",
                bias="self_attn.k_proj.bias",
            ),
            Layer(
                weight="self_attn.v_proj.weight",
                bias="self_attn.v_proj.bias",
            ),
        ]

    @staticmethod
    def attn_out():
        return [
            Layer(
                weight="self_attn.out_proj.weight",
                bias="self_attn.out_proj.bias",
                replace=AllReduceLinear,
            ),
        ]

    @staticmethod
    def mlp_in():
        return [
            Layer(
                weight="fc1.weight",
                bias="fc1.bias",
            ),
        ]

    @staticmethod
    def mlp_out():
        return [
            Layer(
                weight="fc2.weight",
                bias="fc2.bias",
                replace=AllReduceLinear,
            ),
        ]

    @staticmethod
    def original_layer_class():
        return MegatronDecoderLayer
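
  • Once Parallelformers is released, using it with this model is expected to look roughly like the sketch below; the parallelize entry point and its arguments are assumptions about the planned API, not a released interface:

from megatron_11b import MegatronForCausalLM, MegatronTokenizer
# NOTE: Parallelformers is not released yet; this import and the parallelize()
# signature are assumptions about the planned API, shown only as a sketch.
from parallelformers import parallelize

tokenizer = MegatronTokenizer.from_pretrained("hyunwoongko/megatron-11B")
model = MegatronForCausalLM.from_pretrained("hyunwoongko/megatron-11B")

# Hypothetical call: shard the 11B parameters across 4 GPUs with fp16 weights.
parallelize(model, num_gpus=4, fp16=True)

inputs = tokenizer.encode("Kevin is", return_tensors="pt")[:, :-1]  # exclude EOS
output = model.generate(inputs.cuda(), num_beams=5, no_repeat_ngram_size=4)  # device handling depends on the final API
print(tokenizer.batch_decode(output))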



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distribution

megatron_11b-1.1-py3-none-any.whl (23.3 kB)

Uploaded Python 3

File details

Details for the file megatron_11b-1.1-py3-none-any.whl.

File metadata

  • Download URL: megatron_11b-1.1-py3-none-any.whl
  • Upload date:
  • Size: 23.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.7.3

File hashes

Hashes for megatron_11b-1.1-py3-none-any.whl

  • SHA256: 7f9f2f9ac01394be0fdf1b1a86128fcfd096fd6595ddbe6f198207b73c3da5f8
  • MD5: fe2b0601919c2488bbce3d58f1a4e25e
  • BLAKE2b-256: e9d597e0af735f9887265b714c7244b3efde661ff4c2fb5870b55c1cfc343efd

See more details on using hashes here.
