Tool for NLP - handle file and text
🔨 nlp2 🔧
========
Tools for NLP using Python
This repository handles file I/O and string cleaning/parsing.
Usage
-----
Install:
``pip install nlp2``
Before using : ``from nlp2 import *``
Features
========
File Handling
~~~~~~~~~~~~~
get_folders_form_dir(path)
--------------------------
| Arguments - ``path(String)`` : directory whose folders are listed
| Returns - ``path(String)(generator)`` : paths of the folders under the given path
get_files_from_dir(path)
------------------------
| Arguments - ``path(String)`` : directory whose files are listed
| Returns - ``path(String)(generator)`` : paths of the files under the given path
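The two directory helpers above both return generators of paths. A minimal sketch of that behaviour follows, using only the standard library. The names ``iter_folders`` and ``iter_files`` are hypothetical, and the recursive traversal is an assumption; this is not nlp2's own implementation.

```python
import os


def iter_folders(path):
    """Yield the path of every folder found under `path` (recursion is an assumption)."""
    for root, dirs, _files in os.walk(path):
        for name in dirs:
            yield os.path.join(root, name)


def iter_files(path):
    """Yield the path of every file found under `path` (recursion is an assumption)."""
    for root, _dirs, files in os.walk(path):
        for name in files:
            yield os.path.join(root, name)
```

Because both functions are generators, nothing is read from disk until the caller iterates, for example with ``for p in iter_files("corpus"): ...``.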
read_dir_files_into_lines(path)
-------------------------------
| Arguments - ``path(String)`` : directory whose files are read line by line
| Returns - ``line(String)(generator)`` : lines of the files under the given path
read_files_into_lines(path)
---------------------------
| Arguments - ``path(String)`` : path of the input file to read
| Returns - ``line(String)(generator)`` : lines of the file at the given path
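The line readers above can be pictured as lazy generators over file contents. The sketch below is an illustration under stated assumptions, not the library's code: the names ``iter_file_lines`` and ``iter_dir_lines`` are hypothetical, files are assumed to be UTF-8, and trailing newlines are assumed to be stripped.

```python
import os


def iter_file_lines(file_path, encoding="utf-8"):
    """Yield each line of one file without its trailing newline."""
    with open(file_path, encoding=encoding) as handle:
        for line in handle:
            yield line.rstrip("\n")


def iter_dir_lines(dir_path, encoding="utf-8"):
    """Yield the lines of every file under `dir_path`, one file after another."""
    for root, _dirs, files in os.walk(dir_path):
        for name in sorted(files):  # sorted only to make the order predictable
            yield from iter_file_lines(os.path.join(root, name), encoding)
```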
String cleaning/parsing
~~~~~~~~~~~~~~~~~~~~~~~
lines_into_sentence(lines)
--------------------------
| Arguments - ``lines(Array(String))`` : array of lines
| Returns - ``sentence(String)(generator)`` : sentences obtained by splitting every line on punctuation
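Splitting lines on punctuation can be sketched with a regular expression. The set of punctuation marks below is an assumption, since the description does not list which marks the library splits on, and ``iter_sentences`` is a hypothetical name.

```python
import re

# Assumed sentence-ending marks (Chinese and ASCII); the real set may differ.
_SENTENCE_END = re.compile(r"[。！？!?；;.]+")


def iter_sentences(lines):
    """Yield the non-empty pieces of each line after splitting on punctuation."""
    for line in lines:
        for piece in _SENTENCE_END.split(line):
            piece = piece.strip()
            if piece:
                yield piece
```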
split_sentence_to_ngram(text)
-----------------------------
| Arguments - ``text(String)`` : sentence to split into n-grams
| Returns - ``ngrams(Array)`` : array of n-grams
| Examples
| ``split_sentence_to_ngram("加州旅館")`` returns ``['加','加州',"加州旅","加州旅館","州","州旅","州旅館","旅","旅館","館"]``
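The example output above is every contiguous substring of the input, ordered by start position and then by length. A minimal sketch that reproduces the same list is shown here; ``all_ngrams`` is a hypothetical name, not the library's implementation.

```python
def all_ngrams(text):
    """Return every contiguous substring, ordered by start index, then length."""
    return [
        text[start:end]
        for start in range(len(text))
        for end in range(start + 1, len(text) + 1)
    ]
```

A string of length n has n(n+1)/2 such substrings, which gives the ten entries listed for the four-character example.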
split_sentence_to_ngram_inpart(text)
------------------------------------
| Arguments - ``text(String)`` : sentence to split into n-grams
| Returns - ``ngrams(Array)(generator)`` : one array of n-grams per start character
| Examples
| ``split_sentence_to_ngram_inpart("加州旅館")`` returns ``[['加','加州',"加州旅","加州旅館"],["州","州旅","州旅館"],["旅","旅館"],["館"]]``
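Grouping the n-grams by their start character, as in the example above, can be sketched as a generator that yields one list per start position. The name ``ngrams_by_start`` is hypothetical and the code is only an illustration of the documented output.

```python
def ngrams_by_start(text):
    """Yield, for each start index, the list of substrings beginning there."""
    for start in range(len(text)):
        yield [text[start:end] for end in range(start + 1, len(text) + 1)]
```

Wrapping the call in ``list(...)`` materialises the nested array shown in the example.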
spilt_text_to_combine_ways(text)
--------------------------------
| Arguments - ``text(String)`` : input text
| Returns - ``ways(String)(generator)`` : all the ways of combining the characters of the text into segments
| Examples
| ``spilt_text_to_combine_ways("加州旅館")`` returns ``['加 州 旅 館', '加 州 旅館', '加 州旅 館', '加 州旅館', '加州 旅館', '加州旅 館', '加州旅館']``
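One way to enumerate such segmentations is to choose the length of the first segment and recurse on the remainder. The sketch below takes that approach; ``segmentations`` is a hypothetical name and this is not necessarily how the library does it. A full enumeration of a four-character string has 2^3 = 8 entries, so this sketch also yields ``'加州 旅 館'``, which does not appear in the example listing above.

```python
def segmentations(text):
    """Yield every way of splitting `text` into space-separated contiguous segments."""
    if not text:
        yield ""
        return
    # Try each possible first segment, then segment whatever remains.
    for cut in range(1, len(text)):
        head = text[:cut]
        for rest in segmentations(text[cut:]):
            yield head + " " + rest
    # Finally, the text kept as one unsplit segment.
    yield text
```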
spilt_sentence_to_array(sentence)
---------------------------------
| Arguments - ``sentence(String)`` : input text
| Returns - ``sentencearray(Array)`` : the sentence split into an array
is_all_english(text)
--------------------
| Arguments - ``text(String)`` : input text
| Returns - ``result(Boolean)`` : whether the text is entirely English
is_contain_number(text)
-----------------------
| Arguments - ``text(String)`` : input text
| Returns - ``result(Boolean)`` : whether the text contains a number
is_contain_english(text)
------------------------
| Arguments - ``text(String)`` : input text
| Returns - ``result(Boolean)`` : whether the text contains English
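The three checks above can be approximated with regular expressions. The sketch below assumes that "English" means ASCII letters ``A-Z``/``a-z`` and that "number" means ASCII digits ``0-9``; the library's exact rules (for instance, how spaces or punctuation are treated) are not stated here, and all function names are hypothetical.

```python
import re

_LETTER = re.compile(r"[A-Za-z]")
_DIGIT = re.compile(r"[0-9]")
_ONLY_LETTERS = re.compile(r"[A-Za-z]+")


def only_english_letters(text):
    """True when every character is an ASCII letter (spaces count as non-English here)."""
    return _ONLY_LETTERS.fullmatch(text) is not None


def has_digit(text):
    """True when at least one ASCII digit occurs in the text."""
    return _DIGIT.search(text) is not None


def has_english_letter(text):
    """True when at least one ASCII letter occurs in the text."""
    return _LETTER.search(text) is not None
```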
full2half(text)
---------------
| Arguments - ``text(String)`` : input string to convert to half-width
| Returns - ``(String)`` : the half-width string
half2full(text)
---------------
| Arguments - ``text(String)`` : input string to convert to full-width
| Returns - ``(String)`` : the full-width string
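Full-width and half-width conversion usually relies on the fixed offset between the Unicode full-width forms (U+FF01 to U+FF5E) and printable ASCII (U+0021 to U+007E), with the ideographic space U+3000 handled separately. The sketch below shows that mapping; ``to_half_width`` and ``to_full_width`` are hypothetical names, and whether nlp2 covers exactly this character range is an assumption.

```python
_OFFSET = 0xFEE0  # distance between a full-width form and its ASCII counterpart


def to_half_width(text):
    """Map full-width ASCII variants and the ideographic space to half-width."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:  # ideographic space
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:
            out.append(chr(code - _OFFSET))
        else:
            out.append(ch)  # everything else is left untouched
    return "".join(out)


def to_full_width(text):
    """Map printable ASCII and the space character to their full-width forms."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x20:
            out.append("\u3000")
        elif 0x21 <= code <= 0x7E:
            out.append(chr(code + _OFFSET))
        else:
            out.append(ch)
    return "".join(out)
```

Characters outside these ranges, such as CJK ideographs, pass through unchanged in both directions.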