
The Bareun Python library using gRPC

Reason this release was yanked:

broken dependency

Project description

What is this?

bareunpy is the Python 3 library for Bareun.

Bareun is a Korean NLP engine that provides tokenization and POS tagging for Korean.

How to install

pip3 install bareunpy
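
To confirm the install, a quick import check; a minimal sketch, and whether the package exposes a version attribute is an assumption, so getattr guards against it missing:

# A minimal install check; the version attribute is an assumption about the package layout.
import bareunpy
print(getattr(bareunpy, 'version', 'bareunpy imported'))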

How to get Bareun

  • Go to https://bareun.ai/.
    • When you register for the first time, you get a free license for 3 months.
    • If you are a student or a researcher, you can get a free license for 1 year, which can be renewed after 1 year.
  • Or use the Docker image (see the run sketch below).
docker pull bareunai/bareun:latest
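
If you run the image yourself, publish the gRPC port so the client can reach it. A minimal sketch; the container port (5656) and the absence of extra license settings are assumptions, so check the image documentation:

# Run Bareun in the background, exposing the assumed gRPC port 5656.
docker run -d --name bareun -p 5656:5656 bareunai/bareun:latest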

How to use the tagger

import sys
import google.protobuf.text_format as tf
from bareunpy import Tagger

# If you have your own bareun running on localhost:
my_tagger = Tagger('localhost')
# or if you have your own bareun running on 10.8.3.211:15656:
my_tagger = Tagger('10.8.3.211', 15656)
# or use the free public cloud instance; it is smaller, so it may be slow:
tagger = Tagger()

# analyze sentences.
res = tagger.tags(["안녕하세요.", "반가워요!"])

# get protobuf message.
m = res.msg()
tf.PrintMessage(m, out=sys.stdout, as_utf8=True)
print(tf.MessageToString(m, as_utf8=True))
print(f'length of sentences is {len(m.sentences)}')
## output : 2
print(f'length of tokens in sentences[0] is {len(m.sentences[0].tokens)}')
print(f'length of morphemes of first token in sentences[0] is {len(m.sentences[0].tokens[0].morphemes)}')
print(f'lemma of first token in sentences[0] is {m.sentences[0].tokens[0].lemma}')
print(f'first morph of first token in sentences[0] is {m.sentences[0].tokens[0].morphemes[0]}')
print(f'tag of first morph of first token in sentences[0] is {m.sentences[0].tokens[0].morphemes[0].tag}')

## Advanced usage.
for sent in m.sentences:
    for token in sent.tokens:
        for morph in token.morphemes:
            print(f'{morph.text.content}/{morph.tag}:{morph.probability}:{morph.out_of_vocab}')

# get json object
jo = res.as_json()
print(jo)

# get POS tagging results as tuples.
pa = res.pos()
print(pa)
# other methods
ma = res.morphs()
print(ma)
na = res.nouns()
print(na)
va = res.verbs()
print(va)

# custom dictionary
cust_dic = tagger.custom_dict("my")
cust_dic.copy_np_set({'내고유명사', '우리집고유명사'})
cust_dic.copy_cp_set({'코로나19'})
cust_dic.copy_cp_caret_set({'코로나^백신', '독감^백신'})
cust_dic.update()

# load the previously saved custom dict
cust_dict2 = tagger.custom_dict("my")
cust_dict2.load()

tagger.set_domain('my')
tagger.pos('코로나19는 언제 끝날까요?')
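
For reference, a minimal sketch of consuming the tagger output; that pos() yields (morpheme, tag) pairs is an assumption based on the tuple output described above, so verify it against your installed version:

from bareunpy import Tagger

tagger = Tagger()
# Assumed shape: pos() yields (morpheme, tag) pairs.
for morpheme, tag in tagger.pos('코로나19는 언제 끝날까요?'):
    print(f'{morpheme}\t{tag}')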

How to use the tokenizer

import sys
import google.protobuf.text_format as tf
from bareunpy import Tokenizer

# If you have your own bareun running on localhost:
my_tokenizer = Tokenizer('localhost')
# or if you have your own bareun running on 10.8.3.211:15656:
my_tokenizer = Tokenizer('10.8.3.211', 15656)
# or use the free public cloud instance; it is smaller, so it may be slow:
tokenizer = Tokenizer()

# tokenize sentences.
tokenized = tokenizer.tokenize_list(["안녕하세요.", "반가워요!"])

# get protobuf message.
m = tokenized.msg()
tf.PrintMessage(m, out=sys.stdout, as_utf8=True)
print(tf.MessageToString(m, as_utf8=True))
print(f'length of sentences is {len(m.sentences)}')
## output : 2
print(f'length of tokens in sentences[0] is {len(m.sentences[0].tokens)}')
print(f'length of segments of first token in sentences[0] is {len(m.sentences[0].tokens[0].segments)}')
print(f'tagged of first token in sentences[0] is {m.sentences[0].tokens[0].tagged}')
print(f'first segment of first token in sentences[0] is {m.sentences[0].tokens[0].segments[0]}')
print(f'hint of first morph of first token in sentences[0] is {m.sentences[0].tokens[0].segments[0].hint}')

## Advanced usage.
for sent in m.sentences:
    for token in sent.tokens:
        for seg in token.segments:
            print(f'{seg.text.content}/{seg.hint}')

# get json object
jo = tokenized.as_json()
print(jo)

# get tuple of segments
ss = tokenized.segments()
print(ss)
ns = tokenized.nouns()
print(ns)
vs = tokenized.verbs()
print(vs)
# postpositions: 조사
ps = tokenized.postpositions()
print(ps)
# adverbs: 부사
advs = tokenized.adverbs()
print(advs)
# symbols: 기호
syms = tokenized.symbols()
print(syms)
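
Similarly, a minimal sketch of consuming the tokenizer output; that segments() and nouns() return sequences of plain strings is an assumption, so verify it against your installed version:

from bareunpy import Tokenizer

tokenizer = Tokenizer()
t = tokenizer.tokenize_list(['코로나19는 언제 끝날까요?'])
# Assumed shapes: segments() and nouns() return sequences of strings.
print(' '.join(t.segments()))
print(t.nouns())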

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bareunpy-1.4.0.tar.gz (12.1 kB)

Uploaded Source

Built Distribution

bareunpy-1.4.0-py3-none-any.whl (14.6 kB)

Uploaded Python 3

File details

Details for the file bareunpy-1.4.0.tar.gz.

File metadata

  • Download URL: bareunpy-1.4.0.tar.gz
  • Upload date:
  • Size: 12.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.1 CPython/3.9.6 Darwin/21.6.0

File hashes

Hashes for bareunpy-1.4.0.tar.gz

  • SHA256: 5e6273f70f0c69925c4697ea3f8cae41281b8691fa46487e90c8b97d0713a64b
  • MD5: a5e4d73ae5c34ad599b772252033545f
  • BLAKE2b-256: 230557b0f1a455e3d10d73a353746533cfddb5a6bd67735b0bfd1dc7399f0bb5

See more details on using hashes here.
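
To verify a downloaded file against the digests above, a minimal sketch using Python's standard hashlib (the file is assumed to be in the current working directory):

import hashlib

# Expected SHA256 for bareunpy-1.4.0.tar.gz, taken from the table above.
expected = '5e6273f70f0c69925c4697ea3f8cae41281b8691fa46487e90c8b97d0713a64b'

with open('bareunpy-1.4.0.tar.gz', 'rb') as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print('OK' if digest == expected else 'MISMATCH')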

File details

Details for the file bareunpy-1.4.0-py3-none-any.whl.

File metadata

  • Download URL: bareunpy-1.4.0-py3-none-any.whl
  • Upload date:
  • Size: 14.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.1 CPython/3.9.6 Darwin/21.6.0

File hashes

Hashes for bareunpy-1.4.0-py3-none-any.whl

  • SHA256: d1259a46a7893163cf6cf02a65f4ea84c403fee8b86879e33e3c97c593c656ff
  • MD5: af1a05d2d60dd8d0685ee5a4f6c6d66f
  • BLAKE2b-256: 3cf3fef14c460a60b34d1c5b362fb59590f6038d513ce78744a71ff80c1ff89e

See more details on using hashes here.
