John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML. It provides simple, performant, and accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment.

These details have not been verified by PyPI

Project links

Homepage

Project description

Spark NLP: State of the Art Natural Language Processing

Spark NLP is a Natural Language Processing library built on top of Apache Spark ML. It provides simple, performant, and accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment. Spark NLP comes with 1100+ pretrained pipelines and models in more than 192 languages. It supports state-of-the-art transformers such as BERT, XLNet, ELMO, ALBERT, and Universal Sentence Encoder that can be used seamlessly in a cluster. It also offers Tokenization, Word Segmentation, Part-of-Speech Tagging, Named Entity Recognition, Dependency Parsing, Spell Checking, Multi-class Text Classification, Multi-class Sentiment Analysis, Machine Translation (180+ languages), Summarization and Question Answering (Google T5), and many more NLP tasks.

Project's website

Take a look at our official Spark NLP page, http://nlp.johnsnowlabs.com/, for user documentation and examples.

Community support

Slack: For live discussion with the Spark NLP community and the team
GitHub: Bug reports, feature requests, and contributions
Discussions: Engage with other community members, share ideas, and show off how you use Spark NLP!
Medium: Spark NLP articles
YouTube: Spark NLP video tutorials

Features

Tokenization
Trainable Word Segmentation
Stop Words Removal
Token Normalizer
Document Normalizer
Stemmer
Lemmatizer
NGrams
Regex Matching
Text Matching
Chunking
Date Matcher
Sentence Detector
Deep Sentence Detector (Deep learning)
Dependency parsing (Labeled/unlabeled)
Part-of-speech tagging
Sentiment Detection (ML models)
Spell Checker (ML and DL models)
Word Embeddings (GloVe and Word2Vec)
BERT Embeddings (TF Hub models)
ELMO Embeddings (TF Hub models)
ALBERT Embeddings (TF Hub models)
XLNet Embeddings
Universal Sentence Encoder (TF Hub models)
BERT Sentence Embeddings (42 TF Hub models)
Sentence Embeddings
Chunk Embeddings
Unsupervised keywords extraction
Language Detection & Identification (up to 375 languages)
Multi-class Sentiment analysis (Deep learning)
Multi-label Sentiment analysis (Deep learning)
Multi-class Text Classification (Deep learning)
Neural Machine Translation
Text-To-Text Transfer Transformer (Google T5)
Named entity recognition (Deep learning)
Easy TensorFlow integration
GPU Support
Full integration with Spark ML functions
+710 pre-trained models in +192 languages!
+450 pre-trained pipelines in +192 languages!
Multi-lingual NER models: Arabic, Chinese, Danish, Dutch, English, Finnish, French, German, Hebrew, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, and Urdu.

Quick Start

This is a quick example of how to use a Spark NLP pre-trained pipeline in Python and PySpark:

$ java -version
# should be Java 8 (Oracle or OpenJDK)
$ conda create -n sparknlp python=3.6 -y
$ conda activate sparknlp
$ pip install spark-nlp pyspark==2.4.7

In a Python console or a Jupyter Python 3 kernel:

# Import Spark NLP
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp.pretrained import PretrainedPipeline
import sparknlp

# Start Spark Session with Spark NLP
# The start() function has two parameters: gpu and spark23
# sparknlp.start(gpu=True) will start the session with GPU support
# sparknlp.start(spark23=True) is when you have Apache Spark 2.3.x installed
spark = sparknlp.start()

# Download a pre-trained pipeline
pipeline = PretrainedPipeline('explain_document_dl', lang='en')

# Your testing dataset
text = """
The Mona Lisa is a 16th century oil painting created by Leonardo.
It's held at the Louvre in Paris.
"""

# Annotate your testing dataset
result = pipeline.annotate(text)

# What's in the pipeline
list(result.keys())
# Output: ['entities', 'stem', 'checked', 'lemma', 'document',
#          'pos', 'token', 'ner', 'embeddings', 'sentence']

# Check the results
result['entities']
# Output: ['Mona Lisa', 'Leonardo', 'Louvre', 'Paris']
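
The dictionary returned by annotate() maps each output name to a list of strings, as the outputs above show. The following is a minimal sketch of how token-level outputs might be lined up with their tokens. It assumes the 'token' list and the chosen label list (for example 'pos' or 'ner') are aligned one-to-one; pair_tokens is a hypothetical helper, not part of Spark NLP, and the sample dictionary is an illustrative stand-in rather than real model output.

```python
def pair_tokens(result, label_key):
    """Pair each token with the label at the same position in result[label_key].

    Assumes result["token"] and result[label_key] are aligned one-to-one;
    raises ValueError if their lengths differ.
    """
    tokens = result["token"]
    labels = result[label_key]
    if len(tokens) != len(labels):
        raise ValueError(
            f"'token' has {len(tokens)} items but '{label_key}' has {len(labels)}"
        )
    return list(zip(tokens, labels))


# Hypothetical stand-in for the dictionary returned by pipeline.annotate(text).
# The tags are illustrative only, not actual output of explain_document_dl.
sample = {
    "token": ["Leonardo", "painted", "it"],
    "pos": ["NNP", "VBD", "PRP"],
}

for token, tag in pair_tokens(sample, "pos"):
    print(token, tag)
```

With a real session, the same helper could be called as pair_tokens(result, 'pos') on the result from the quick start, provided the alignment assumption holds for the pipeline in use.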

For more examples, visit our dedicated repository, which showcases all Spark NLP use cases!

Release history

5.5.0rc1 pre-release

Sep 6, 2024

5.4.2

Aug 28, 2024

5.4.1

Jul 14, 2024

5.4.0

Jun 29, 2024

5.4.0rc2 pre-release

Jun 12, 2024

5.4.0rc1 pre-release

Jun 5, 2024

5.3.3

Apr 5, 2024

5.3.2

Mar 20, 2024

5.3.1

Mar 4, 2024

5.3.0

Feb 27, 2024

5.2.3

Jan 18, 2024

5.2.2

Jan 1, 2024

5.2.1

Dec 27, 2023

5.2.0

Dec 8, 2023

5.1.4

Oct 26, 2023

5.1.3

Oct 10, 2023

5.1.2

Sep 25, 2023

5.1.1

Sep 11, 2023

5.1.0

Aug 26, 2023

5.0.2

Aug 2, 2023

5.0.1

Jul 18, 2023

5.0.0

Jul 3, 2023

4.4.4

Jun 8, 2023

4.4.3

May 25, 2023

4.4.2

May 10, 2023

4.4.1

Apr 25, 2023

4.4.0

Apr 10, 2023

4.3.2

Mar 14, 2023

4.3.1

Feb 24, 2023

4.3.0

Feb 9, 2023

4.2.8

Jan 24, 2023

4.2.7

Jan 12, 2023

4.2.6

Dec 21, 2022

4.2.5

Dec 15, 2022

4.2.4

Nov 28, 2022

4.2.3

Nov 10, 2022

4.2.3rc1 pre-release

Nov 4, 2022

4.2.2

Oct 27, 2022

4.2.1

Oct 12, 2022

4.2.0

Sep 27, 2022

4.1.0

Aug 24, 2022

4.0.2

Jul 19, 2022

4.0.1

Jul 1, 2022

4.0.0

Jun 15, 2022

3.4.4

May 5, 2022

3.4.3

Apr 12, 2022

3.4.2

Mar 9, 2022

3.4.1

Feb 8, 2022

3.4.0

Jan 5, 2022

3.4.0rc1 pre-release

Jan 1, 2022

3.3.4

Nov 25, 2021

3.3.3

Nov 22, 2021

3.3.2

Nov 3, 2021

3.3.1

Oct 18, 2021

3.3.0

Sep 29, 2021

3.2.3

Sep 15, 2021

3.2.2

Sep 1, 2021

3.2.1

Aug 11, 2021

3.2.0

Aug 10, 2021

3.1.3

Jul 20, 2021

3.1.2

Jul 7, 2021

3.1.1

Jun 23, 2021

3.1.0

Jun 7, 2021

3.0.3

May 6, 2021

3.0.2

Apr 21, 2021

3.0.1

Apr 2, 2021

3.0.0

Mar 22, 2021

3.0.0rc11 pre-release

Mar 18, 2021

3.0.0rc10 pre-release (this version)

Mar 18, 2021

3.0.0rc8 pre-release

Mar 15, 2021

3.0.0rc7 pre-release

Mar 15, 2021

3.0.0rc3 pre-release

Mar 12, 2021

3.0.0rc2 pre-release

Mar 12, 2021

3.0.0rc1 pre-release

Mar 12, 2021

2.7.5

Mar 5, 2021

2.7.4

Feb 19, 2021

2.7.3

Feb 5, 2021

2.7.2

Jan 25, 2021

2.7.1

Jan 8, 2021

2.7.0

Jan 4, 2021

2.7.0rc1 pre-release

Dec 25, 2020

2.6.5

Dec 15, 2020

2.6.4

Nov 24, 2020

2.6.3

Oct 28, 2020

2.6.3rc3 pre-release

Oct 15, 2020

2.6.3rc2 pre-release

Oct 13, 2020

2.6.3rc1 pre-release

Oct 7, 2020

2.6.2

Oct 1, 2020

2.6.1

Sep 11, 2020

2.6.0

Sep 2, 2020

2.6.0rc3 pre-release

Aug 29, 2020

2.6.0rc2 pre-release

Aug 28, 2020

2.6.0rc1 pre-release

Aug 28, 2020

2.5.5

Aug 4, 2020

2.5.4

Jul 20, 2020

2.5.3

Jul 3, 2020

2.5.2

Jun 11, 2020

2.5.1

May 26, 2020

2.5.0

May 10, 2020

2.5.0rc2 pre-release

May 5, 2020

2.5.0rc1 pre-release

Apr 29, 2020

2.4.5

Apr 2, 2020

2.4.4

Mar 16, 2020

2.4.4rc1 pre-release

Mar 15, 2020

2.4.3

Mar 9, 2020

2.4.2

Mar 4, 2020

2.4.1

Feb 17, 2020

2.4.0

Feb 3, 2020

2.4.0rc1 pre-release

Jan 22, 2020

2.3.6

Jan 11, 2020

2.3.5

Dec 21, 2019

2.3.4

Nov 27, 2019

2.3.3

Nov 21, 2019

2.3.2

Nov 8, 2019

2.3.1

Oct 30, 2019

2.3.0

Oct 26, 2019

2.2.2

Sep 26, 2019

2.2.1

Aug 28, 2019

2.2.0

Aug 23, 2019

2.2.0rc3 pre-release

Aug 21, 2019

2.2.0rc2 pre-release

Aug 18, 2019

2.2.0rc1 pre-release

Aug 16, 2019

2.1.1

Aug 18, 2019

2.1.0

Jul 14, 2019

2.1.0rc2 pre-release

Jun 29, 2019

2.1.0rc1 pre-release

Jun 28, 2019

2.0.9

Jul 2, 2019

2.0.8

Jun 5, 2019

2.0.7

Jun 2, 2019

2.0.6

May 30, 2019

2.0.5

May 29, 2019

2.0.4

May 16, 2019

2.0.3

Apr 30, 2019

2.0.2

Apr 29, 2019

2.0.1

Mar 24, 2019

2.0.0

Mar 20, 2019

1.8.4

Mar 31, 2019

1.8.3

Feb 24, 2019

1.8.2

Feb 8, 2019

1.8.1

Jan 26, 2019

1.8.0

Dec 26, 2018

1.7.3

Nov 12, 2018

1.7.2

Oct 22, 2018

1.7.1

Oct 20, 2018

1.7.0

Oct 16, 2018

1.6.3

Sep 17, 2018

1.6.2

Sep 7, 2018

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

spark-nlp-3.0.0rc10.tar.gz (31.1 kB)

Uploaded Mar 18, 2021 Source

Built Distribution

spark_nlp-3.0.0rc10-py2.py3-none-any.whl (140.1 kB)

Uploaded Mar 18, 2021 Python 2 Python 3

Hashes for spark-nlp-3.0.0rc10.tar.gz
Algorithm	Hash digest
SHA256	`eaf1d1beea1bcaa8c98ec8f7b770f9be0493e24640ed22c4a567e5b38d4febc1`
MD5	`72791e58ec884a8b3c49ac800ae64b07`
BLAKE2b-256	`fda64ce4fd568f010a8349d43af3beb8de45a396fb626e9f4377607f7221d362`

Hashes for spark_nlp-3.0.0rc10-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`0e6530d7d5fc677336308800ba8d501ec855da376f3b28bce2996fd52b5695e5`
MD5	`5b41f7d0f16177c7592628dc2191f757`
BLAKE2b-256	`9dfcd4fdc782b87ecf867836be2020bc5658a4507b92a4f5d5227f9523863bc5`
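
The digests above can be used to check a downloaded file before installing it. The following is a minimal sketch using only Python's standard hashlib module. The helper names sha256_of and matches_digest are hypothetical, and the local file path is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for spark-nlp-3.0.0rc10.tar.gz
EXPECTED_SHA256 = "eaf1d1beea1bcaa8c98ec8f7b770f9be0493e24640ed22c4a567e5b38d4febc1"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Return True if the file's SHA256 equals `expected_hex` (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed location of the downloaded source distribution.
    archive = Path("spark-nlp-3.0.0rc10.tar.gz")
    if archive.exists():
        print("OK" if matches_digest(archive, EXPECTED_SHA256) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The same check applies to the wheel by swapping in its file name and SHA256 digest. Reading in chunks keeps memory use flat regardless of file size.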