
spaCy-PyThaiNLP

This package wraps the PyThaiNLP library to add Thai language support to spaCy.

Supported features

  • Word segmentation
  • Part-of-speech tagging
  • Named entity recognition
  • Sentence segmentation
  • Dependency parsing
  • Word vectors

Install

pip install spacy-pythainlp

How to use

Example

import spacy
import spacy_pythainlp.core

nlp = spacy.blank("th")
# Add the PyThaiNLP component; it segments the Doc into sentences by default
nlp.add_pipe("pythainlp")

data = nlp("ผมเป็นคนไทย   แต่มะลิอยากไปโรงเรียนส่วนผมจะไปไหน  ผมอยากไปเที่ยว")
print(list(data.sents))
# output: [ผมเป็นคนไทย   แต่มะลิอยากไปโรงเรียนส่วนผมจะไปไหน  , ผมอยากไปเที่ยว]
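
Since part-of-speech tagging and named entity recognition are enabled by default (see the config options below), the usual spaCy attributes should also be filled in. A minimal sketch, assuming the default pipeline settings:

import spacy
import spacy_pythainlp.core

nlp = spacy.blank("th")
nlp.add_pipe("pythainlp")

doc = nlp("ผมเป็นคนไทย")

# Part-of-speech tags from PyThaiNLP (pos is True by default)
print([(token.text, token.pos_) for token in doc])

# Named entities from the default thainer engine (ner is True by default)
print([(ent.text, ent.label_) for ent in doc.ents])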

You can configure the settings through the config argument of nlp.add_pipe:

nlp.add_pipe(
    "pythainlp", 
    config={
        "pos_engine": "perceptron",
        "pos": True,
        "pos_corpus": "orchid_ud",
        "sent_engine": "crfcut",
        "sent": True,
        "ner_engine": "thainer",
        "ner": True,
        "tokenize_engine": "newmm",
        "tokenize": False,
        "dependency_parsing": False,
        "dependency_parsing_engine": "esupar",
        "dependency_parsing_model": None,
        "word_vector": True,
        "word_vector_model": "thai2fit_wv"
    }
)
  • tokenize: Bool (True or False) to switch word tokenization to the PyThaiNLP engine below. (The default spaCy Thai tokenizer already uses PyThaiNLP's newmm.)
  • tokenize_engine: The word tokenizer engine. You can read more: Options for engine
  • sent: Bool (True or False) to turn on sentence segmentation.
  • sent_engine: The sentence tokenizer engine. You can read more: Options for engine
  • pos: Bool (True or False) to turn on part-of-speech tagging.
  • pos_engine: The part-of-speech tagging engine. You can read more: Options for engine
  • ner: Bool (True or False) to turn on NER.
  • ner_engine: The NER engine. You can read more: Options for engine
  • dependency_parsing: Bool (True or False) to turn on dependency parsing.
  • dependency_parsing_engine: The dependency parsing engine. You can read more: Options for engine
  • dependency_parsing_model: The dependency parsing model. You can read more: Options for model
  • word_vector: Bool (True or False) to turn on word vectors (see the sketch after this list).
  • word_vector_model: The word vector model. You can read more: Options for model
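
Because word vectors are on by default (word_vector with the thai2fit_wv model), spaCy's standard vector attributes should be available. A small sketch, assuming the default settings expose the vectors through token.vector:

import spacy
import spacy_pythainlp.core

nlp = spacy.blank("th")
nlp.add_pipe("pythainlp")

doc = nlp("ผมเป็นคนไทย")

# Each token's vector comes from the thai2fit_wv model (word_vector is True by default)
for token in doc:
    print(token.text, token.vector[:5])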

Note: If you turn on dependency parsing, the word segmentation and sentence segmentation settings above are turned off; word and sentence segmentation then come from the dependency parser.
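
For example, to turn on dependency parsing (which, as noted above, then also takes over word and sentence segmentation), you might configure the pipe like this. A minimal sketch, assuming the default esupar engine and its default model:

import spacy
import spacy_pythainlp.core

nlp = spacy.blank("th")
nlp.add_pipe(
    "pythainlp",
    config={
        "dependency_parsing": True,
    },
)

doc = nlp("ผมเป็นคนไทย")

# Dependency labels and heads come from the parser
print([(token.text, token.dep_, token.head.text) for token in doc])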

License

   Copyright 2016-2023 PyThaiNLP Project

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
