paddlenlp

An easy-to-use and powerful NLP library with an awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including end-to-end systems for Neural Search, Question Answering, Information Extraction, and Sentiment Analysis.

These details have not been verified by PyPI

Project links

Homepage

Project description

Simplified Chinese🀄 | English🌎

Features | Installation | Quick Start | API Reference | Community
PaddleNLP is an NLP library that is both easy to use and powerful. It aggregates high-quality pretrained models from across the industry and provides a plug-and-play development experience, with a model library that covers a variety of NLP scenarios. With practical examples drawn from industry practice, PaddleNLP can meet the needs of developers who require flexible customization.

News 📢

2024.01.04 PaddleNLP v2.7: The LLM experience is fully upgraded and the entry point to the LLM toolchain is unified. The implementation code for pre-training, fine-tuning, compression, inference, and deployment is consolidated in the PaddleNLP/llm directory. The new LLM Toolchain Documentation provides one-stop guidance, from getting started with LLMs to business deployment and launch. Unified Checkpoint, a full checkpoint storage mechanism, greatly improves the versatility of LLM storage. Efficient fine-tuning is upgraded to support combining efficient fine-tuning with LoRA, and to support QLoRA and other algorithms.
2023.08.15 PaddleNLP v2.6: Released the full-process LLM toolchain, which covers pre-training, fine-tuning, compression, inference, and deployment, and provides users with end-to-end LLM solutions and a one-stop development experience. It includes a built-in 4D parallel distributed Trainer, the efficient fine-tuning algorithms LoRA and Prefix Tuning, and self-developed INT8/INT4 quantization algorithms. It fully supports LLaMA 1/2, BLOOM, ChatGLM 1/2, GLM, OPT, and other mainstream LLMs.

Installation

Prerequisites

python >= 3.7
paddlepaddle >= 2.6.0

For more information about installing PaddlePaddle, please refer to PaddlePaddle's website.
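
The prerequisites above can be checked before installing. The following is a minimal sketch using only the standard library; the helper names (`parse_version`, `meets_minimum`, `check_prerequisites`) are hypothetical, and it assumes the installed distribution is named `paddlepaddle` (a GPU build may be published under a different name). Pre-release suffixes such as `rc0` are ignored, so this is a rough check rather than a full version-specifier comparison.

```python
import sys

try:
    from importlib import metadata  # available on Python 3.8+
except ImportError:  # Python 3.7 has no importlib.metadata
    metadata = None


def parse_version(text):
    """Turn '2.6.0' into (2, 6, 0), dropping any non-numeric suffix such as 'rc0'."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so that '2.6' compares equal to '2.6.0'
    b += (0,) * (width - len(b))
    return a >= b


def check_prerequisites():
    """Report whether Python >= 3.7 and paddlepaddle >= 2.6.0 are satisfied."""
    python_ok = sys.version_info >= (3, 7)
    paddle_version = None
    if metadata is not None:
        try:
            paddle_version = metadata.version("paddlepaddle")  # assumed distribution name
        except metadata.PackageNotFoundError:
            paddle_version = None
    paddle_ok = paddle_version is not None and meets_minimum(paddle_version, "2.6.0")
    return {"python": python_ok, "paddlepaddle": paddle_ok}


if __name__ == "__main__":
    print(check_prerequisites())
```

If `paddlepaddle` is reported as unsatisfied, install or upgrade it first, then install PaddleNLP.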

Python pip Installation

pip install --upgrade paddlenlp

Alternatively, you can install the latest develop branch build with the following command:

pip install --pre --upgrade paddlenlp -f https://www.paddlepaddle.org.cn/whl/paddlenlp.html

Features

📦 Out-of-Box NLP Toolset

🤗 Awesome Chinese Model Zoo

🎛️ Industrial End-to-end System

🚀 High Performance Distributed Training and Inference

Out-of-Box NLP Toolset

Taskflow provides off-the-shelf, pre-built NLP tasks covering NLU and NLG techniques, with very fast inference suited to industrial scenarios.

For more usage, please refer to the Taskflow Docs.

Awesome Chinese Model Zoo

🀄 Comprehensive Chinese Transformer Models

We provide 45+ network architectures and 500+ pretrained models. These include not only the SOTA models released by Baidu, such as ERNIE, PLATO, and SKEP, but also most of the high-quality Chinese pretrained models developed by other organizations. Use the AutoModel API to ⚡SUPER FAST⚡ download pretrained models of different architectures. We welcome all developers to contribute their Transformer models to PaddleNLP!

from paddlenlp.transformers import AutoModel, AutoModelForPretraining

ernie = AutoModel.from_pretrained('ernie-3.0-medium-zh')
bert = AutoModel.from_pretrained('bert-wwm-chinese')
albert = AutoModel.from_pretrained('albert-chinese-tiny')
roberta = AutoModel.from_pretrained('roberta-wwm-ext')
electra = AutoModel.from_pretrained('chinese-electra-small')
gpt = AutoModelForPretraining.from_pretrained('gpt-cpm-large-cn')

When computing resources are limited, you can use the lightweight ERNIE-Tiny models to speed up the deployment of pretrained models.

# 6L768H
ernie = AutoModel.from_pretrained('ernie-3.0-medium-zh')
# 6L384H
ernie = AutoModel.from_pretrained('ernie-3.0-mini-zh')
# 4L384H
ernie = AutoModel.from_pretrained('ernie-3.0-micro-zh')
# 4L312H
ernie = AutoModel.from_pretrained('ernie-3.0-nano-zh')
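
The size comments in the listing above use a compact notation. The sketch below reads `6L768H` as "6 layers, hidden size 768", which is an assumption about the notation rather than something stated here; `parse_size` and `smallest_model` are hypothetical helpers, not part of PaddleNLP.

```python
import re

# Model names and size tags copied from the listing above.
ERNIE_TINY_SIZES = {
    "ernie-3.0-medium-zh": "6L768H",
    "ernie-3.0-mini-zh": "6L384H",
    "ernie-3.0-micro-zh": "4L384H",
    "ernie-3.0-nano-zh": "4L312H",
}


def parse_size(tag):
    """Split a tag such as '6L768H' into (layers, hidden_size); assumed meaning of the notation."""
    match = re.fullmatch(r"(\d+)L(\d+)H", tag)
    if match is None:
        raise ValueError(f"unrecognized size tag: {tag!r}")
    return int(match.group(1)), int(match.group(2))


def smallest_model(sizes):
    """Pick the model with the fewest layers, breaking ties by hidden size."""
    return min(sizes, key=lambda name: parse_size(sizes[name]))


if __name__ == "__main__":
    for name, tag in ERNIE_TINY_SIZES.items():
        layers, hidden = parse_size(tag)
        print(f"{name}: {layers} layers, hidden size {hidden}")
    print("smallest:", smallest_model(ERNIE_TINY_SIZES))
```

The selected name can then be passed to `AutoModel.from_pretrained` as in the listing above.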

PaddleNLP offers a unified API experience for NLP tasks such as semantic representation, text classification, sentence matching, sequence labeling, and question answering.

import paddle
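from paddlenlp.transformers import (
    AutoModel,
    AutoModelForQuestionAnswering,
    AutoModelForSequenceClassification,
    AutoModelForTokenClassification,
    AutoTokenizer,
)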

tokenizer = AutoTokenizer.from_pretrained('ernie-3.0-medium-zh')
text = tokenizer('natural language processing')

# Semantic Representation
model = AutoModel.from_pretrained('ernie-3.0-medium-zh')
sequence_output, pooled_output = model(input_ids=paddle.to_tensor([text['input_ids']]))
# Text Classification and Matching
model = AutoModelForSequenceClassification.from_pretrained('ernie-3.0-medium-zh')
# Sequence Labeling
model = AutoModelForTokenClassification.from_pretrained('ernie-3.0-medium-zh')
# Question Answering
model = AutoModelForQuestionAnswering.from_pretrained('ernie-3.0-medium-zh')

Wide-range NLP Task Support

PaddleNLP provides rich examples covering mainstream NLP tasks to help developers solve problems faster. You can find our powerful transformer Model Zoo, along with a wide range of NLP application examples with detailed instructions.

You can also run our interactive Notebook tutorials on AI Studio, a powerful platform with FREE computing resources.

PaddleNLP Transformer model summary

Model	Sequence Classification	Token Classification	Question Answering	Text Generation	Multiple Choice
ALBERT	✅	✅	✅	❌	✅
BART	✅	✅	✅	✅	❌
BERT	✅	✅	✅	❌	✅
BigBird	✅	✅	✅	❌	✅
BlenderBot	❌	❌	❌	✅	❌
ChineseBERT	✅	✅	✅	❌	❌
ConvBERT	✅	✅	✅	❌	✅
CTRL	✅	❌	❌	❌	❌
DistilBERT	✅	✅	✅	❌	❌
ELECTRA	✅	✅	✅	❌	✅
ERNIE	✅	✅	✅	❌	✅
ERNIE-CTM	❌	✅	❌	❌	❌
ERNIE-Doc	✅	✅	✅	❌	❌
ERNIE-GEN	❌	❌	❌	✅	❌
ERNIE-Gram	✅	✅	✅	❌	❌
ERNIE-M	✅	✅	✅	❌	❌
FNet	✅	✅	✅	❌	✅
Funnel-Transformer	✅	✅	✅	❌	❌
GPT	✅	✅	❌	✅	❌
LayoutLM	✅	✅	❌	❌	❌
LayoutLMv2	❌	✅	❌	❌	❌
LayoutXLM	❌	✅	❌	❌	❌
LUKE	❌	✅	✅	❌	❌
mBART	✅	❌	✅	❌	✅
MegatronBERT	✅	✅	✅	❌	✅
MobileBERT	✅	❌	✅	❌	❌
MPNet	✅	✅	✅	❌	✅
NEZHA	✅	✅	✅	❌	✅
PP-MiniLM	✅	❌	❌	❌	❌
ProphetNet	❌	❌	❌	✅	❌
Reformer	✅	❌	✅	❌	❌
RemBERT	✅	✅	✅	❌	✅
RoBERTa	✅	✅	✅	❌	✅
RoFormer	✅	✅	✅	❌	❌
SKEP	✅	✅	❌	❌	❌
SqueezeBERT	✅	✅	✅	❌	❌
T5	❌	❌	❌	✅	❌
TinyBERT	✅	❌	❌	❌	❌
UnifiedTransformer	❌	❌	❌	✅	❌
XLNet	✅	✅	✅	❌	✅
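
The table above can be read as a lookup from model family to supported task. As a small illustration, the sketch below copies a handful of rows (not the full table) into a dictionary and lists the families that support a given task; `models_supporting` is a hypothetical helper and not a PaddleNLP API.

```python
# Column order matches the table header above.
TASKS = (
    "Sequence Classification",
    "Token Classification",
    "Question Answering",
    "Text Generation",
    "Multiple Choice",
)

# A few rows copied from the table; True corresponds to a check mark.
SUPPORT = {
    "BERT": (True, True, True, False, True),
    "ERNIE": (True, True, True, False, True),
    "GPT": (True, True, False, True, False),
    "SKEP": (True, True, False, False, False),
    "T5": (False, False, False, True, False),
}


def models_supporting(task):
    """Return the model families (from the rows above) that support the given task."""
    column = TASKS.index(task)
    return sorted(name for name, row in SUPPORT.items() if row[column])


if __name__ == "__main__":
    for task in TASKS:
        print(f"{task}: {models_supporting(task)}")
```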

For more on using pretrained models, please refer to the Transformer API Docs.

Industrial End-to-end System

We provide end-to-end systems for high-value scenarios, including information extraction, semantic retrieval, and question answering.

For more details on industrial cases, please refer to Applications.

🔍 Neural Search System

For more details please refer to Neural Search.

❓ Question Answering System

We provide question answering pipelines that support FAQ systems and document-level visual question answering systems, based on 🚀RocketQA.

For more details please refer to Question Answering and Document VQA.

💌 Opinion Extraction and Sentiment Analysis

We built an opinion extraction system for product reviews and fine-grained sentiment analysis based on the SKEP model.

For more details please refer to Sentiment Analysis.

🎙️ Speech Command Analysis

By integrating an ASR model with information extraction, we provide a speech command analysis pipeline that shows how to use PaddleNLP and PaddleSpeech to solve real-world Speech + NLP scenarios.

For more details please refer to Speech Command Analysis.

High Performance Distributed Training and Inference

⚡ FastTokenizer: High Performance Text Preprocessing Library

from paddlenlp.transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("ernie-3.0-medium-zh", use_fast=True)

Set use_fast=True to use the C++ tokenizer kernel and make text preprocessing 100x faster. For more usage, please refer to FastTokenizer.

⚡ FastGeneration: High Performance Generation Library

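from paddlenlp.transformers import GPTLMHeadModel

model = GPTLMHeadModel.from_pretrained('gpt-cpm-large-cn')
...
# inputs_ids is expected to come from the elided step above;
# use_fast=True requests the FastGeneration decoding path.
outputs, _ = model.generate(
    input_ids=inputs_ids, max_length=10, decode_strategy='greedy_search',
    use_fast=True)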

Set use_fast=True to achieve a 5x speedup for text generation with Transformer, GPT, BART, PLATO, and UniLM. For more usage, please refer to FastGeneration.
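
To make the `decode_strategy='greedy_search'` and `max_length` arguments above more concrete, here is a toy sketch of greedy decoding in plain Python. It is not PaddleNLP's or FastGeneration's implementation: the bigram score table, `toy_scores`, and `greedy_search` are all made up for illustration, and treating `max_length` as a cap on newly generated tokens is an assumption of this sketch.

```python
def greedy_search(next_token_scores, start_tokens, max_length, eos_token=None):
    """At each step, append the single highest-scoring next token (no sampling, no beams)."""
    tokens = list(start_tokens)
    generated = []
    for _ in range(max_length):  # assumed: max_length counts new tokens only
        scores = next_token_scores(tokens)
        best = max(scores, key=scores.get)
        generated.append(best)
        tokens.append(best)
        if best == eos_token:
            break
    return generated


# Hypothetical bigram "model": scores for the next token given the previous one.
BIGRAM_SCORES = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "sat": {"</s>": 0.9, "the": 0.1},
}


def toy_scores(tokens):
    """Look up next-token scores from the last token only."""
    return BIGRAM_SCORES[tokens[-1]]


if __name__ == "__main__":
    print(greedy_search(toy_scores, ["<s>"], max_length=10, eos_token="</s>"))
```

Greedy search is deterministic and cheap per step, which is one reason it is a common default when decoding speed matters.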

🚀 Fleet: 4D Hybrid Distributed Training

For more details on pre-training super-large-scale models, please refer to GPT-3.

Quick Start

Taskflow provides off-the-shelf, pre-built NLP tasks covering NLU and NLG scenarios, with very fast inference suited to industrial applications.

from paddlenlp import Taskflow

# Chinese Word Segmentation
seg = Taskflow("word_segmentation")
seg("第十四届全运会在西安举办")
# Output: ['第十四届', '全运会', '在', '西安', '举办']

# POS Tagging
tag = Taskflow("pos_tagging")
tag("第十四届全运会在西安举办")
# Output: [('第十四届', 'm'), ('全运会', 'nz'), ('在', 'p'), ('西安', 'LOC'), ('举办', 'v')]

# Named Entity Recognition
ner = Taskflow("ner")
ner("《孤女》是2010年九州出版社出版的小说，作者是余兼羽")
# Output: [('《', 'w'), ('孤女', '作品类_实体'), ('》', 'w'), ('是', '肯定词'), ('2010年', '时间类'), ('九州出版社', '组织机构类'), ('出版', '场景事件'), ('的', '助词'), ('小说', '作品类_概念'), ('，', 'w'), ('作者', '人物类_概念'), ('是', '肯定词'), ('余兼羽', '人物类_实体')]

# Dependency Parsing
ddp = Taskflow("dependency_parsing")
ddp("9月9日上午纳达尔在亚瑟·阿什球场击败俄罗斯球员梅德韦杰夫")
# Output: [{'word': ['9月9日', '上午', '纳达尔', '在', '亚瑟·阿什球场', '击败', '俄罗斯', '球员', '梅德韦杰夫'], 'head': [2, 6, 6, 5, 6, 0, 8, 9, 6], 'deprel': ['ATT', 'ADV', 'SBV', 'MT', 'ADV', 'HED', 'ATT', 'ATT', 'VOB']}]

# Sentiment Analysis
senta = Taskflow("sentiment_analysis")
senta("这个产品用起来真的很流畅，我非常喜欢")
# Output: [{'text': '这个产品用起来真的很流畅，我非常喜欢', 'label': 'positive', 'score': 0.9938690066337585}]
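
The dependency parsing result above stores words, head indices, and relation labels as parallel lists. The sketch below turns that sample output into (word, relation, head word) triples. It assumes, based only on the sample, that `head` is 1-indexed and that 0 marks the root; `to_triples` is a hypothetical helper, and the data is pasted from the example rather than produced by running Taskflow.

```python
# Sample result copied from the dependency parsing example above.
result = [{
    'word': ['9月9日', '上午', '纳达尔', '在', '亚瑟·阿什球场', '击败', '俄罗斯', '球员', '梅德韦杰夫'],
    'head': [2, 6, 6, 5, 6, 0, 8, 9, 6],
    'deprel': ['ATT', 'ADV', 'SBV', 'MT', 'ADV', 'HED', 'ATT', 'ATT', 'VOB'],
}]


def to_triples(parse):
    """Pair each word with its relation label and the word it depends on."""
    words = parse['word']
    triples = []
    for word, head, rel in zip(words, parse['head'], parse['deprel']):
        # Assumption: head indices start at 1, and 0 means the word is the root.
        head_word = 'ROOT' if head == 0 else words[head - 1]
        triples.append((word, rel, head_word))
    return triples


triples = to_triples(result[0])

if __name__ == "__main__":
    for word, rel, head_word in triples:
        print(f"{word} --{rel}--> {head_word}")
```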

API Reference

Supports loading LUGE datasets and is compatible with Hugging Face Datasets. For more details, please refer to the Dataset API.
Uses a Hugging Face style API to load 500+ selected transformer models, with fast downloads. For more information, please refer to the Transformers API.
Loads pre-trained word embeddings with one line of code. For more usage, please refer to the Embedding API.

The complete PaddleNLP API reference is available on our readthedocs.

Community

Slack

To connect with other users and contributors, you are welcome to join our Slack channel.

WeChat

Scan the QR code below with WeChat ⬇️ to join the official technical exchange group. We look forward to your participation.

Citation

If you find PaddleNLP useful in your research, please consider citing it:

@misc{paddlenlp,
    title={PaddleNLP: An Easy-to-use and High Performance NLP Library},
    author={PaddleNLP Contributors},
    howpublished = {\url{https://github.com/PaddlePaddle/PaddleNLP}},
    year={2021}
}

Acknowledgements

We have borrowed the excellent design of Hugging Face's Transformers🤗 for pretrained model usage, and we would like to express our gratitude to the authors of Hugging Face and its open source community.

License

PaddleNLP is provided under the Apache-2.0 License.

Release history

3.0.0b4 pre-release

Mar 12, 2025

3.0.0b3 pre-release

Dec 13, 2024

3.0.0b2 pre-release

Oct 8, 2024

3.0.0b1 pre-release

Aug 22, 2024

3.0.0b0 pre-release

Jun 28, 2024

This version

2.8.1

Jun 20, 2024

2.8.0

Apr 24, 2024

2.7.2

Jan 30, 2024

2.7.1

Jan 4, 2024

2.7.0

Jan 3, 2024

2.6.1

Sep 14, 2023

2.6.0

Aug 15, 2023

2.6.0rc0 pre-release

Jun 12, 2023

2.5.2

Mar 7, 2023

2.5.1

Feb 17, 2023

2.5.0

Jan 12, 2023

2.4.9

Dec 30, 2022

2.4.8

Dec 26, 2022

2.4.7

Dec 23, 2022

2.4.6

Dec 23, 2022

2.4.5

Dec 9, 2022

2.4.4

Nov 28, 2022

2.4.3

Nov 17, 2022

2.4.2

Oct 27, 2022

2.4.1

Oct 14, 2022

2.4.1.dev0 pre-release

Dec 23, 2022

2.4.0

Sep 6, 2022

2.3.7

Aug 24, 2022

2.3.5

Aug 1, 2022

2.3.4

Jun 28, 2022

2.3.3

Jun 7, 2022

2.3.2

Jun 2, 2022

2.3.1

May 19, 2022

2.3.0

May 16, 2022

2.3.0rc1 pre-release

May 13, 2022

2.3.0rc0 pre-release

Apr 29, 2022

2.2.6

Apr 15, 2022

2.2.5

Mar 21, 2022

2.2.4

Jan 26, 2022

2.2.3

Dec 31, 2021

2.2.2

Dec 28, 2021

2.2.1

Dec 17, 2021

2.2.0

Dec 10, 2021

2.1.1

Oct 20, 2021

2.1.0

Oct 11, 2021

2.0.8

Aug 22, 2021

2.0.7

Jul 28, 2021

2.0.6

Jul 19, 2021

2.0.5

Jun 27, 2021

2.0.4

Jun 25, 2021

2.0.3

Jun 17, 2021

2.0.2

Jun 4, 2021

2.0.1

May 21, 2021

2.0.0

May 20, 2021

2.0.0rc25 pre-release

May 19, 2021

2.0.0rc24 pre-release

May 19, 2021

2.0.0rc23 pre-release

May 19, 2021

2.0.0rc22 pre-release

May 13, 2021

2.0.0rc21 pre-release

May 7, 2021

2.0.0rc20 pre-release

Apr 30, 2021

2.0.0rc19 pre-release

Apr 29, 2021

2.0.0rc18 pre-release

Apr 16, 2021

2.0.0rc17 pre-release

Apr 13, 2021

2.0.0rc16 pre-release

Apr 3, 2021

2.0.0rc15 pre-release

Apr 1, 2021

2.0.0rc14 pre-release

Mar 16, 2021

2.0.0rc13 pre-release

Mar 15, 2021

2.0.0rc12 pre-release

Mar 11, 2021

2.0.0rc11 pre-release

Mar 10, 2021

2.0.0rc10 pre-release

Mar 9, 2021

2.0.0rc9 pre-release

Mar 8, 2021

2.0.0rc8 pre-release

Mar 6, 2021

2.0.0rc7 pre-release

Mar 4, 2021

2.0.0rc6 pre-release

Mar 3, 2021

2.0.0rc5 pre-release

Mar 1, 2021

2.0.0rc4 pre-release

Feb 26, 2021

2.0.0rc3 pre-release

Feb 25, 2021

2.0.0rc2 pre-release

Feb 22, 2021

2.0.0rc1 pre-release

Feb 2, 2021

2.0.0rc0 pre-release

Feb 2, 2021

2.0.0b4 pre-release

Jan 28, 2021

2.0.0b3 pre-release

Jan 13, 2021

2.0.0b2 pre-release

Dec 30, 2020

2.0.0b1 pre-release

Dec 23, 2020

2.0.0b0 pre-release

Dec 17, 2020

2.0.0a9 pre-release

Dec 16, 2020

2.0.0a8 pre-release

Dec 15, 2020

2.0.0a7 pre-release

Dec 15, 2020

2.0.0a6 pre-release

Dec 14, 2020

2.0.0a5 pre-release

Dec 14, 2020

2.0.0a4 pre-release

Dec 12, 2020

2.0.0a3 pre-release

Dec 11, 2020

2.0.0a2 pre-release

Dec 10, 2020

2.0.0a1 pre-release

Dec 7, 2020

2.0.0a0 pre-release

Dec 4, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

paddlenlp-2.8.1.tar.gz (2.3 MB)

Uploaded Jun 20, 2024 Source

Built Distribution

paddlenlp-2.8.1-py3-none-any.whl (2.9 MB)

Uploaded Jun 20, 2024 Python 3

File details

Details for the file paddlenlp-2.8.1.tar.gz.

File metadata

Download URL: paddlenlp-2.8.1.tar.gz
Upload date: Jun 20, 2024
Size: 2.3 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.0 CPython/3.8.18

File hashes

Hashes for paddlenlp-2.8.1.tar.gz
Algorithm	Hash digest
SHA256	`dda5f07255c6e6658e5adabcfc8fcf4730e2770dab166a6516173689fccc686d`
MD5	`5164c7a365bda1e1830c02fea72ab9c7`
BLAKE2b-256	`3e508b5fc5a50cfdbc72cd98b8612436294b9eb435fd74e28e6289cf9996fa7e`

See more details on using hashes here.
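
As a sketch of how the SHA256 digest listed above could be checked after downloading the source distribution, the following uses only `hashlib` from the standard library. The helper names `sha256_of_file` and `verify_download` are hypothetical, and the local file path is an assumption about where the download was saved.

```python
import hashlib

# SHA256 digest for paddlenlp-2.8.1.tar.gz, copied from the table above.
EXPECTED_SHA256 = "dda5f07255c6e6658e5adabcfc8fcf4730e2770dab166a6516173689fccc686d"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Return True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `verify_download("paddlenlp-2.8.1.tar.gz", EXPECTED_SHA256)` should return True for an intact download saved under that name in the current directory.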

File details

Details for the file paddlenlp-2.8.1-py3-none-any.whl.

File metadata

Download URL: paddlenlp-2.8.1-py3-none-any.whl
Upload date: Jun 20, 2024
Size: 2.9 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.0 CPython/3.8.18

File hashes

Hashes for paddlenlp-2.8.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8cb5324ee5c39d29264ec5049ea5a6beeebb04a625b5a5c519b92475ea7d067f`
MD5	`b295b2b6c5ef4c73e1f5c2dd56da54fe`
BLAKE2b-256	`446298dd0ca2f6600ca1dfc9c59ba1b40628df5f7948abc85ba16c3367c49cf4`

See more details on using hashes here.
