nlp2 · PyPI

Tool for NLP - handle file and text

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
Topic
- Software Development :: Build Tools

Project description

# 🔨 nlp2 🔧

Tools for NLP using Python

This repository handles file I/O and string cleaning/parsing.

## Usage

Install:

```
pip install nlp2
```

Before using:
```
from nlp2 import *
```

# Features
* [File Handling](#file)
* [Text cleaning/parsing](#text)
* [Random Utility](#random)

<h2 id="file">File Handling</h2>

### get_folders_form_dir(path)
Arguments
- `path(String)` : directory whose folders are listed

Returns
- `path(String)(generator)` : yields the path of each folder under `path`

Examples
```
for i in get_folders_form_dir('./corpus/'):
    print(i)

# ./corpus/kdd
# ./corpus/nycd
```

### get_files_from_dir(path)
Arguments
- `path(String)` : directory whose files are listed

Returns
- `path(String)(generator)` : yields the path of each file under `path`

Examples
```
for i in get_files_from_dir('./data/'):
    print(i)

# ./data/kdd.txt
# ./data/nycd.txt
```
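For readers curious about what these two listing helpers do, here is a minimal standard-library sketch, not nlp2's actual code. It assumes both functions list only the immediate children of `path` (no recursion) and return them in sorted order; the `*_sketch` names are hypothetical.

```python
import os
import tempfile
from typing import Iterator


def get_folders_sketch(path: str) -> Iterator[str]:
    """Yield the path of every folder directly under `path`, in sorted order."""
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isdir(full):
            yield full


def get_files_sketch(path: str) -> Iterator[str]:
    """Yield the path of every regular file directly under `path`, in sorted order."""
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isfile(full):
            yield full


if __name__ == "__main__":
    # Build a throwaway tree so the demo runs anywhere.
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "kdd"))
    with open(os.path.join(root, "kdd.txt"), "w", encoding="utf-8") as f:
        f.write("sent1\n")
    print(list(get_folders_sketch(root)))
    print(list(get_files_sketch(root)))
```

Because both are generators, a caller can stop early without scanning the rest of a large directory.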

### read_dir_files_yield_lines(path)
Arguments
- `path(String)` : directory whose files are read line by line

Returns
- `line(String)(generator)` : yields each line of every file under `path`

Examples
```
for i in read_dir_files_yield_lines('./data/'):
    print(i)

# file1 sent1
# file1 sent2
# ...
# file2 sent1
# ...
```

### read_dir_files_into_lines(path)
Arguments
- `path(String)` : directory whose files are read line by line

Returns
- `lines(String Array)` : every line of every file under `path`, as a list

Examples
```
i = read_dir_files_into_lines('./data/')
print(i)

# ['file1 sent1', 'file1 sent2', ..., 'file2 sent1', ...]
```

### read_files_yield_lines(path)
Arguments
- `path(String)` : path of the file to read

Returns
- `line(String)(generator)` : yields each line of the file

Examples
```
for i in read_files_yield_lines('./data/kdd.txt'):
    print(i)

# sent1
# sent2
# ...
```

### read_files_into_lines(path)
Arguments
- `path(String)` : path of the file to read

Returns
- `lines(String Array)` : every line of the file, as a list

Examples
```
i = read_files_into_lines('./data/kdd.txt')
print(i)

# ['sent1', 'sent2', ...]
```
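The four line-reading helpers above share one pattern: the `*_yield_lines` variants are generators, and the `*_into_lines` variants collect the same lines into a list. Here is a minimal sketch of that pattern, assuming UTF-8 files, trailing newlines stripped, and directory files read in sorted name order. All function names here are hypothetical, not nlp2's API.

```python
import os
import tempfile
from typing import Iterator, List


def read_file_yield_lines_sketch(path: str) -> Iterator[str]:
    """Lazily yield each line of one file, without its trailing newline."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


def read_file_into_lines_sketch(path: str) -> List[str]:
    """Eagerly read one file into a list of lines."""
    return list(read_file_yield_lines_sketch(path))


def read_dir_yield_lines_sketch(path: str) -> Iterator[str]:
    """Lazily yield every line of every regular file directly under `path`."""
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isfile(full):
            yield from read_file_yield_lines_sketch(full)


def read_dir_into_lines_sketch(path: str) -> List[str]:
    """Eagerly collect all lines under `path` into one list."""
    return list(read_dir_yield_lines_sketch(path))


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    for name, text in [("file1.txt", "file1 sent1\nfile1 sent2\n"), ("file2.txt", "file2 sent1\n")]:
        with open(os.path.join(root, name), "w", encoding="utf-8") as f:
            f.write(text)
    print(read_dir_into_lines_sketch(root))
```

The generator versions keep memory use flat on large corpora, while the list versions are handier when the data fits in memory.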

### create_new_dir_always(dirPath)
Creates a directory at `dirPath`. If the directory already exists, it is replaced by a new one, so anything inside it is lost.

Arguments
- `dirPath(String)` : directory location

Examples
```
create_new_dir_always('./data/')
```

### get_dir_with_notexist_create(dirPath)
Creates the directory if it does not exist, then returns its path.

Arguments
- `dirPath(String)` : directory location that must exist

Returns
- `path(String)` : the same directory location, now guaranteed to exist

Examples
```
i = get_dir_with_notexist_create('./data/kdd')
print(i)

# ./data/kdd
```
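Here is a minimal sketch of the two directory helpers using `os` and `shutil`. It assumes "replace" means deleting the existing directory and all of its contents before recreating it, so the first helper should be used with care. The `*_sketch` names are hypothetical.

```python
import os
import shutil
import tempfile


def create_new_dir_always_sketch(dir_path: str) -> None:
    """Delete `dir_path` (with all contents) if it exists, then create it empty."""
    if os.path.isdir(dir_path):
        shutil.rmtree(dir_path)
    os.makedirs(dir_path)


def get_dir_with_notexist_create_sketch(dir_path: str) -> str:
    """Create `dir_path` and any missing parents if needed, then return it."""
    os.makedirs(dir_path, exist_ok=True)
    return dir_path


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    target = os.path.join(root, "data", "kdd")
    print(get_dir_with_notexist_create_sketch(target))
    create_new_dir_always_sketch(target)
    print(os.listdir(target))
```

`exist_ok=True` makes the second helper idempotent: calling it twice on the same path is harmless.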

### write_json_to_file(json_str, loc)
Arguments
- `json_str(String)` : JSON content as a string
- `loc(String)` : output file path

Returns
- `path(String)` : output file path

Examples
```
i = write_json_to_file('{"sent": "hi"}', './data/kdd.json')
print(i)

# ./data/kdd.json
```
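Here is a minimal sketch of `write_json_to_file`, assuming it should reject malformed JSON, create missing parent directories, and keep non-ASCII text readable in the output file. These behaviours are assumptions, and `write_json_to_file_sketch` is a hypothetical name.

```python
import json
import os
import tempfile


def write_json_to_file_sketch(json_str: str, loc: str) -> str:
    """Validate `json_str` as JSON, write it to `loc` as UTF-8, and return `loc`."""
    data = json.loads(json_str)  # raises json.JSONDecodeError on malformed input
    parent = os.path.dirname(loc)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(loc, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII text (e.g. Chinese) human-readable.
        json.dump(data, f, ensure_ascii=False, indent=4)
    return loc


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "data", "kdd.json")
    print(write_json_to_file_sketch('{"sent": "hi"}', out))
```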

### is_file_exist(path)
Arguments
- `path(String)` : file location

Returns
- `result(Boolean)` : `True` if the file exists, otherwise `False`

Examples
```
i = is_file_exist('./data/kdd.txt')
print(i)

# True
```

### is_dir_exist(file_dir)
Arguments
- `file_dir(String)` : directory location

Returns
- `result(Boolean)` : `True` if the directory exists, otherwise `False`

Examples
```
i = is_dir_exist('./data/kdd')
print(i)

# False
```

<h2 id="text">Text cleaning/parsing</h2>

### passage_into_sentences(lines)
Splits an array of lines into an array of sentences, breaking each line at every punctuation mark.

Arguments
- `lines(String Array)` : lines array

Returns
- `sentences(String Array)` : the pieces of every line after splitting on punctuation

Examples
```
y = passage_into_sentences(["你好啊.hello，me"])
print(y)

# ['你好啊', 'hello', 'me']
```
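Here is a minimal sketch of punctuation-based splitting. It assumes "any punctuation" means any character in a Unicode punctuation category (`P*`), which covers both ASCII `.` and full-width `，`, and that whitespace around each piece is trimmed. The function name is hypothetical.

```python
import unicodedata
from typing import List


def _flush(buffer: List[str], out: List[str]) -> None:
    """Move the buffered characters into `out` as one trimmed piece, if non-empty."""
    piece = "".join(buffer).strip()
    if piece:
        out.append(piece)
    buffer.clear()


def passage_into_sentences_sketch(lines: List[str]) -> List[str]:
    """Split every line at each Unicode punctuation character."""
    sentences: List[str] = []
    for line in lines:
        buffer: List[str] = []
        for ch in line:
            # General categories Pc, Pd, Ps, Pe, Pi, Pf and Po are all punctuation;
            # this covers ASCII "." as well as full-width "，" and "。".
            if unicodedata.category(ch).startswith("P"):
                _flush(buffer, sentences)
            else:
                buffer.append(ch)
        _flush(buffer, sentences)
    return sentences


if __name__ == "__main__":
    print(passage_into_sentences_sketch(["你好啊.hello，me"]))
```

Because every punctuation mark splits, words such as `don't` are broken apart too; a narrower character set would avoid that.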

### split_sentence_to_ngram(sentence)
Splits a sentence into every possible n-gram, that is, every contiguous substring.
##### Be careful with sentence length: a sentence of n characters has n(n+1)/2 n-grams, so long sentences are slow.
Arguments
- `sentence(String)` : a string with no punctuation

Returns
- `ngrams(String Array)` : ngrams array

Examples
```
split_sentence_to_ngram("加州旅館")

# ['加', '加州', '加州旅', '加州旅館', '州', '州旅', '州旅館', '旅', '旅館', '館']
```

### split_sentence_to_ngram_in_part(sentence)
Splits a sentence into n-grams grouped by start position: sub-array i holds every n-gram that begins at character i.
##### Be careful with sentence length: long sentences produce many n-grams and run slowly.
Arguments
- `sentence(String)` : a string with no punctuation

Returns
- `ngrams(Array)` : 2D array with one sub-array of n-grams per start position

Examples
```
split_sentence_to_ngram_in_part("加州旅館")

# [['加', '加州', '加州旅', '加州旅館'], ['州', '州旅', '州旅館'], ['旅', '旅館'], ['館']]
```
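Both n-gram helpers enumerate every contiguous substring, which is why long sentences are slow: a sentence of n characters yields n(n+1)/2 n-grams. Here is a minimal sketch of both with hypothetical names, ordered to match the two examples above.

```python
from typing import List


def split_sentence_to_ngram_sketch(sentence: str) -> List[str]:
    """Every contiguous substring, ordered by start index and then by length."""
    n = len(sentence)
    return [sentence[i:j] for i in range(n) for j in range(i + 1, n + 1)]


def split_sentence_to_ngram_in_part_sketch(sentence: str) -> List[List[str]]:
    """The same n-grams, grouped into one sub-list per start index."""
    n = len(sentence)
    return [[sentence[i:j] for j in range(i + 1, n + 1)] for i in range(n)]


if __name__ == "__main__":
    print(split_sentence_to_ngram_sketch("加州旅館"))
    print(split_sentence_to_ngram_in_part_sketch("加州旅館"))
```

Flattening the grouped result of the second function gives exactly the list from the first.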

### spilt_text_in_all_ways(sentence)
Enumerates every possible way to split the sentence into contiguous segments.

Arguments
- `sentence(String)` : input sentence

Returns
- `seg list(String Array)` : all segments in a array

Examples
```
spilt_text_in_all_ways("加州旅館")

['加州旅館', '加州旅館', '加州旅館', '加州旅館', '加州旅館', '加州旅館', '加州旅館']
```
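Here is a minimal sketch of exhaustive segmentation. Each of the n-1 gaps between characters is either a cut or not, so a sentence of n characters has 2^(n-1) segmentations, and the count doubles with every extra character. This sketch returns each segmentation as a list of pieces; the hypothetical name and this output format are assumptions, not nlp2's API.

```python
from itertools import product
from typing import List


def split_text_in_all_ways_sketch(sentence: str) -> List[List[str]]:
    """Enumerate all 2**(n-1) ways to cut `sentence` into contiguous pieces."""
    if not sentence:
        return []
    results: List[List[str]] = []
    # Each of the n-1 gaps between characters is either a cut (True) or not.
    for cuts in product([False, True], repeat=len(sentence) - 1):
        pieces: List[str] = []
        start = 0
        for gap, cut in enumerate(cuts, start=1):
            if cut:
                pieces.append(sentence[start:gap])
                start = gap
        pieces.append(sentence[start:])
        results.append(pieces)
    return results


if __name__ == "__main__":
    for seg in split_text_in_all_ways_sketch("加州旅館"):
        print(seg)
```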

### spilt_sentence_to_array(sentence)
Splits a sentence that may mix languages (for example Chinese and English) into an array of words.

Arguments
- `sentence(String)` : input sentence

Returns
- `segment array(String Array)` : word array

Examples
```
spilt_sentence_to_array('你好 are u 可以')

['你好', 'are', 'u', '可以']
```

### join_words_array_to_sentence(words_array)
Arguments
- `words_array(String Array)` : input array

Returns
- `sentence(String)` : output sentence

Examples
```
join_words_array_to_sentence(['你好', 'are', "可以"])

你好are可以
```

### passage_into_chunk(passage, chunk_size)
Splits a passage into chunks of about `chunk_size` characters. A sentence is never cut in half: if part of a sentence would exceed `chunk_size`, the whole sentence still goes into the current chunk.

Arguments
- `passage(String)` : input passage
- `chunk_size(int)` : number of characters in one chunk

Returns
- `chunk array(String Array)` : the passage split into chunks

Examples
```
passage_into_chunk("xxxxxxxx\noo\nyyzz\ngggggg\nkkkk\n",10)

['xxxxxxxx\noo\n', 'yyzz\ngggggg\n']
```
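Here is a minimal sketch of the chunking rule described above. It assumes sentences are the newline-terminated lines seen in the example and that a chunk is closed as soon as it reaches `chunk_size` characters; the function name is hypothetical. Unlike the example output above, which stops at `'yyzz\ngggggg\n'`, this sketch also returns the shorter trailing chunk `'kkkk\n'` instead of dropping it.

```python
from typing import List


def passage_into_chunk_sketch(passage: str, chunk_size: int) -> List[str]:
    """Greedily pack whole lines into chunks, closing a chunk once it reaches `chunk_size`."""
    chunks: List[str] = []
    current = ""
    for line in passage.splitlines(keepends=True):
        current += line  # a line is never cut in half
        if len(current) >= chunk_size:
            chunks.append(current)
            current = ""
    if current:
        chunks.append(current)  # keep the final, shorter chunk
    return chunks


if __name__ == "__main__":
    print(passage_into_chunk_sketch("xxxxxxxx\noo\nyyzz\ngggggg\nkkkk\n", 10))
```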

### is_all_english(text)
Arguments
- `text(String)` : input text

Returns
- `result(Boolean)` : `True` if the text is all English, otherwise `False`

Examples
```
is_all_english("1SGD")  # True
is_all_english("1SG哦")  # False
```

### is_contain_number(text)
Arguments
- `text(String)` : input text

Returns
- `result(Boolean)` : `True` if the text contains a digit, otherwise `False`

Examples
```
is_contain_number("1SGD")  # True
is_contain_number("SG哦")  # False
```

### is_contain_english(text)
Arguments
- `text(String)` : input text

Returns
- `result(Boolean)` : `True` if the text contains English letters, otherwise `False`

Examples
```
is_contain_english("1SGD")  # True
is_contain_english("123哦")  # False
```
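The three character checks can be sketched with the `string` module's ASCII character sets. This assumes "English" means ASCII, which is consistent with `is_all_english("1SGD")` returning `True`, and that digits are ASCII digits only. The real functions may define these classes differently, and the names below are hypothetical.

```python
import string


def is_all_english_sketch(text: str) -> bool:
    """True if every character is printable ASCII (letters, digits, punctuation, whitespace)."""
    return all(ch in string.printable for ch in text)


def is_contain_number_sketch(text: str) -> bool:
    """True if at least one character is an ASCII digit 0-9."""
    return any(ch in string.digits for ch in text)


def is_contain_english_sketch(text: str) -> bool:
    """True if at least one character is an ASCII letter a-z or A-Z."""
    return any(ch in string.ascii_letters for ch in text)


if __name__ == "__main__":
    print(is_all_english_sketch("1SGD"), is_all_english_sketch("1SG哦"))
    print(is_contain_number_sketch("1SGD"), is_contain_number_sketch("SG哦"))
    print(is_contain_english_sketch("1SGD"), is_contain_english_sketch("123哦"))
```

With these definitions an empty string counts as all English, because `all()` over an empty sequence is `True`.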

### full2half(text)
Arguments
- `text(String)` : string to convert from full-width to half-width characters

Returns
- `(String)` : the converted half-width string

Examples
```
full2half("，,")

,,
```

### half2full(text)
Arguments
- `text(String)` : string to convert from half-width to full-width characters

Returns
- `(String)` : the converted full-width string

Examples
```
half2full("，,")

，，
```
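Full-width and half-width forms of ASCII characters sit at a fixed Unicode offset: U+FF01 to U+FF5E mirror U+0021 to U+007E shifted by 0xFEE0, and the ideographic space U+3000 pairs with the ordinary space U+0020. Here is a minimal sketch using that offset; the names are hypothetical, and nlp2 may cover a different set of characters.

```python
def full2half_sketch(text: str) -> str:
    """Map full-width ASCII variants and the ideographic space to half-width."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:  # ideographic (full-width) space
            code = 0x20
        elif 0xFF01 <= code <= 0xFF5E:  # full-width forms of "!" through "~"
            code -= 0xFEE0
        out.append(chr(code))
    return "".join(out)


def half2full_sketch(text: str) -> str:
    """Map printable ASCII and the space to their full-width forms."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x20:  # ordinary space
            code = 0x3000
        elif 0x21 <= code <= 0x7E:  # "!" through "~"
            code += 0xFEE0
        out.append(chr(code))
    return "".join(out)


if __name__ == "__main__":
    print(full2half_sketch("，,"))
    print(half2full_sketch("，,"))
```

`unicodedata.normalize("NFKC", text)` also folds full-width forms to ASCII, but it rewrites many other characters as well, so the explicit range keeps the conversion narrow.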

<h2 id="random">Random Utility</h2>

### random_string(length)
Arguments
- `length(int)` : length of the random string

Returns
- `randstr(String)` : a string of `length` characters drawn from "0123456789ABCDEF"

Examples
```
random_string(10)

D6857CE0F4
```

### random_string_with_timestamp(length)
Arguments
- `length(int)` : length of the random part

Returns
- `randstr(String)` : a 10-digit timestamp followed by `length` random characters, `length + 10` characters in total

Examples
```
random_string_with_timestamp(1)

1435474326D
```
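Here is a minimal sketch of both random-string helpers. It assumes the timestamp is the current Unix time in whole seconds, which has 10 digits for dates between 2001 and 2286 and matches the example output; the names are hypothetical.

```python
import random
import time

HEX_CHARS = "0123456789ABCDEF"


def random_string_sketch(length: int) -> str:
    """`length` characters drawn uniformly, with replacement, from HEX_CHARS."""
    return "".join(random.choices(HEX_CHARS, k=length))


def random_string_with_timestamp_sketch(length: int) -> str:
    """Current Unix time in whole seconds, followed by `length` random characters."""
    return str(int(time.time())) + random_string_sketch(length)


if __name__ == "__main__":
    print(random_string_sketch(10))
    print(random_string_with_timestamp_sketch(1))
```

`random` is not suitable for security tokens; `secrets.choice` is the standard-library alternative when the string must be unguessable.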

### random_value_in_array_form(array)
Returns a random value within a range given in array form:
- int, float: `[min, max]`
- string: `[candidate1, candidate2, ...]`

Arguments
- `array(Array)` : range in array form

Returns
- `random result(depends on input)` : a random value satisfying the input condition

Examples
```
# for string
y = random_value_in_array_form(["SGD", "ADAM", "XDA"])
print(y)

# e.g. ADAM

# for int
y = random_value_in_array_form([1, 12])
print(y)

# e.g. 4

# for float
y = random_value_in_array_form([0.01, 1.00])
print(y)

# e.g. 0.34
```
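Here is a minimal sketch of the array-form convention. It assumes an all-string array means "pick one candidate", a pair of ints means an integer range with both ends included, and any other numeric pair means a float range. The inclusive upper bound and the hypothetical function name are assumptions, not confirmed nlp2 behaviour.

```python
import random
from typing import List, Union

Value = Union[int, float, str]


def random_value_in_array_form_sketch(array: List[Value]) -> Value:
    """Pick a random value according to the README's array conventions."""
    if not array:
        raise ValueError("array must not be empty")
    if all(isinstance(v, str) for v in array):
        return random.choice(array)  # string candidates: pick one
    if len(array) != 2:
        raise ValueError("numeric ranges must be [min, max]")
    low, high = array
    if isinstance(low, int) and isinstance(high, int):
        return random.randint(low, high)  # both ends inclusive
    return random.uniform(low, high)  # float range


if __name__ == "__main__":
    print(random_value_in_array_form_sketch(["SGD", "ADAM", "XDA"]))
    print(random_value_in_array_form_sketch([1, 12]))
    print(random_value_in_array_form_sketch([0.01, 1.00]))
```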

Release history

1.8.53

Apr 12, 2024

1.8.52

Jun 5, 2023

1.8.51

May 25, 2023

1.8.50

May 25, 2023

1.8.49

May 1, 2023

1.8.48

Aug 23, 2022

1.8.47

Jun 27, 2022

1.8.46

Jun 19, 2022

1.8.45

Jun 19, 2022

1.8.44

May 30, 2022

1.8.43

Mar 12, 2022

1.8.42

Mar 9, 2022

1.8.41

Feb 9, 2022

1.8.40

Jan 11, 2022

1.8.39

Dec 27, 2021

1.8.38

Sep 24, 2021

1.8.36

Jun 15, 2021

1.8.35

Jun 13, 2021

1.8.34

May 20, 2021

1.8.33

May 20, 2021

1.8.32

May 8, 2021

1.8.31

Apr 10, 2021

1.8.30

Apr 8, 2021

1.8.29

Nov 1, 2020

1.8.28

Oct 30, 2020

1.8.27

Oct 17, 2020

1.8.26

Oct 16, 2020

1.8.25

Oct 16, 2020

1.8.25.dev0 pre-release

Oct 14, 2020

1.8.24

Oct 14, 2020

1.8.23

Oct 3, 2020

1.8.22

Oct 1, 2020

1.8.21

Oct 1, 2020

1.8.20

Sep 18, 2020

1.8.19

Sep 3, 2020

1.8.18

Sep 2, 2020

1.8.17

Aug 26, 2020

1.8.16

Aug 26, 2020

1.8.15

Aug 24, 2020

1.8.14

Aug 24, 2020

1.8.13

Aug 7, 2020

1.8.13.dev6 pre-release

Aug 7, 2020

1.8.13.dev5 pre-release

Aug 7, 2020

1.8.13.dev4 pre-release

Aug 7, 2020

1.8.13.dev3 pre-release

Aug 7, 2020

1.8.13.dev2 pre-release

Aug 7, 2020

1.8.13.dev1 pre-release

Aug 7, 2020

1.8.12

Jul 20, 2020

1.8.11

Jul 20, 2020

1.8.10

Jul 19, 2020

1.8.9

Jul 18, 2020

1.8.8

Jul 15, 2020

1.8.7

Jul 15, 2020

1.8.6

Jul 15, 2020

1.8.5

Jul 12, 2020

1.8.4

Jul 8, 2020

1.8.3

Jul 8, 2020

1.8.2

Jul 6, 2020

1.8.1

Jul 6, 2020

1.8.0

Jul 6, 2020

1.7.10

Jul 6, 2020

1.7.9

Jul 6, 2020

1.7.8

Jul 6, 2020

1.7.7

Jul 6, 2020

1.7.6

Jul 6, 2020

1.7.5

Jul 6, 2020

1.7.4

Jul 6, 2020

1.7.3

Jul 6, 2020

1.7.2

Jul 6, 2020

1.7.1

Jul 5, 2020

1.7.0

Jul 5, 2020

1.6.9

Jul 5, 2020

1.6.8

Jul 4, 2020

1.6.7

Jul 4, 2020

1.6.6

Jul 3, 2020

1.6.5

Jul 3, 2020

1.6.2

Jun 23, 2020

1.6.1

Jun 18, 2020

1.6.0

Jun 18, 2020

1.5.9

Nov 17, 2019

1.5.8

Nov 9, 2019

1.5.7

Nov 9, 2019

1.5.6

Oct 11, 2019

1.5.5

Sep 30, 2019

1.5.0

Mar 5, 2019

1.4.5

Sep 30, 2019

1.4.0

Feb 26, 2019

1.3.9

Feb 26, 2019

1.3.8

Feb 2, 2019

1.3.5

Feb 2, 2019

1.3.0

Jan 22, 2019

1.2.5

Jan 17, 2019

1.2.0

Dec 30, 2018

1.1.3

Dec 15, 2018

1.1.2

Nov 29, 2018

1.1.1

Nov 18, 2018

1.1.0

Oct 3, 2018

1.0.9

Oct 3, 2018

1.0.8

Oct 3, 2018

1.0.7

Oct 3, 2018

1.0.6 (this version)

Oct 3, 2018

1.0.5

Oct 3, 2018

1.0.4

Oct 3, 2018

1.0.3

Oct 3, 2018

1.0.2

Jun 7, 2018

1.0.1

Jun 7, 2018

1.0.0

Mar 14, 2018

Download files

Source Distribution

nlp2-1.0.6.tar.gz (6.6 kB)

Uploaded Oct 3, 2018 Source

Built Distribution

nlp2-1.0.6-py3-none-any.whl (16.2 kB)

Uploaded Oct 3, 2018 Python 3

Hashes for nlp2-1.0.6.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `a5b4ca4d21c239ddb61ce12d562cd0e78b68509c9fe0a8447589c20efcf6f732` |
| MD5 | `9129174ebe5ed94913fa019e909fc4b4` |
| BLAKE2b-256 | `a51ccb390926d0715cd826744c91cbc0589a2441383dc15b91cb495f9fae7964` |

Hashes for nlp2-1.0.6-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `0a5db356a313c906ff38294bdde8a8b45f81b8ee9d8ffb9c0f404eaec622451b` |
| MD5 | `9f2c18d019595dd893074b180f1c0c57` |
| BLAKE2b-256 | `a35c4d4ed42350fe57e2d0fd6502014732881fd418f029a92eca9bd547783ab3` |