ANYKS Language Model (ALM)
Requirements
Install PyBind11
$ python3 -m pip install pybind11
Description of Methods
Methods:
- idw - Word ID retrieval method
- idt - Token ID retrieval method
- ids - Sequence ID retrieval method
Example:
>>> import alm
>>> alm.idw("hello")
1794085167
>>> alm.idw("<s>")
1
>>> alm.idw("</s>")
19
>>> alm.idw("<unk>")
3
>>> alm.idt("1424")
2
>>> alm.idt("hello")
0
>>> alm.idw("Living")
48384019276
>>> alm.idw("in")
2833
>>> alm.idw("the")
175734
>>> alm.idw("USA")
147770
>>> alm.ids([48384019276, 2833, 175734, 147770])
2514129976
Description
Name |
Description |
〈s〉 |
Sentence beginning token |
〈/s〉 |
Sentence end token |
〈url〉 |
URL-address token |
〈num〉 |
Number (arabic or roman) token |
〈unk〉 |
Unknown word token |
〈time〉 |
Time token (15:44:56) |
〈score〉 |
Score count token (4:3 ¦ 01:04) |
〈fract〉 |
Fraction token (5/20 ¦ 192/864) |
〈date〉 |
Date token (18.07.2004 ¦ 07/18/2004) |
〈abbr〉 |
Abbreviation token (1-й ¦ 2-е ¦ 20-я ¦ p.s ¦ p.s.) |
〈dimen〉 |
Dimensions token (200x300 ¦ 1920x1080) |
〈range〉 |
Range of numbers token (1-2 ¦ 100-200 ¦ 300-400) |
〈aprox〉 |
Approximate number token (~93 ¦ 95.86 ¦ 1020) |
〈anum〉 |
Pseudo-number token (combination of numbers and other symbols) (T34 ¦ 895-M-86 ¦ 39km) |
〈pcards〉 |
Symbols of the play cards (♠ ¦ ♣ ¦ ♥ ¦ ♦ ) |
〈punct〉 |
Punctuation token (. ¦ , ¦ ? ¦ ! ¦ : ¦ ; ¦ … ¦ ¡ ¦ ¿) |
〈route〉 |
Direction symbols (arrows) (← ¦ ↑ ¦ ↓ ¦ ↔ ¦ ↵ ¦ ⇐ ¦ ⇑ ¦ ⇒ ¦ ⇓ ¦ ⇔ ¦ ◄ ¦ ▲ ¦ ► ¦ ▼) |
〈greek〉 |
Symbols of the Greek alphabet (Α ¦ Β ¦ Γ ¦ Δ ¦ Ε ¦ Ζ ¦ Η ¦ Θ ¦ Ι ¦ Κ ¦ Λ ¦ Μ ¦ Ν ¦ Ξ ¦ Ο ¦ Π ¦ Ρ ¦ Σ ¦ Τ ¦ Υ ¦ Φ ¦ Χ ¦ Ψ ¦ Ω) |
〈isolat〉 |
Isolation/quotation token (( ¦ ) ¦ [ ¦ ] ¦ { ¦ } ¦ " ¦ « ¦ » ¦ „ ¦ “ ¦ ` ¦ ⌈ ¦ ⌉ ¦ ⌊ ¦ ⌋ ¦ ‹ ¦ › ¦ ‚ ¦ ’ ¦ ′ ¦ ‛ ¦ ″ ¦ ‘ ¦ ” ¦ ‟ ¦ ' ¦〈 ¦ 〉) |
〈specl〉 |
Special character token (_ ¦ @ ¦ # ¦ № ¦ © ¦ ® ¦ & ¦ § ¦ æ ¦ ø ¦ Þ ¦ – ¦ ‾ ¦ ‑ ¦ — ¦ ¯ ¦ ¶ ¦ ˆ ¦ ˜ ¦ † ¦ ‡ ¦ • ¦ ‰ ¦ ⁄ ¦ ℑ ¦ ℘ ¦ ℜ ¦ ℵ ¦ ◊ ¦ \ ) |
〈currency〉 |
Symbols of world currencies ($ ¦ € ¦ ₽ ¦ ¢ ¦ £ ¦ ₤ ¦ ¤ ¦ ¥ ¦ ℳ ¦ ₣ ¦ ₴ ¦ ₸ ¦ ₹ ¦ ₩ ¦ ₦ ¦ ₭ ¦ ₪ ¦ ৳ ¦ ƒ ¦ ₨ ¦ ฿ ¦ ₫ ¦ ៛ ¦ ₮ ¦ ₱ ¦ ﷼ ¦ ₡ ¦ ₲ ¦ ؋ ¦ ₵ ¦ ₺ ¦ ₼ ¦ ₾ ¦ ₠ ¦ ₧ ¦ ₯ ¦ ₢ ¦ ₳ ¦ ₥ ¦ ₰ ¦ ₿ ¦ ұ) |
〈math〉 |
Mathematical operation token (+ ¦ - ¦ = ¦ / ¦ * ¦ ^ ¦ × ¦ ÷ ¦ − ¦ ∕ ¦ ∖ ¦ ∗ ¦ √ ¦ ∝ ¦ ∞ ¦ ∠ ¦ ± ¦ ¹ ¦ ² ¦ ³ ¦ ½ ¦ ⅓ ¦ ¼ ¦ ¾ ¦ % ¦ ~ ¦ · ¦ ⋅ ¦ ° ¦ º ¦ ¬ ¦ ƒ ¦ ∀ ¦ ∂ ¦ ∃ ¦ ∅ ¦ ∇ ¦ ∈ ¦ ∉ ¦ ∋ ¦ ∏ ¦ ∑ ¦ ∧ ¦ ∨ ¦ ∩ ¦ ∪ ¦ ∫ ¦ ∴ ¦ ∼ ¦ ≅ ¦ ≈ ¦ ≠ ¦ ≡ ¦ ≤ ¦ ≥ ¦ ª ¦ ⊂ ¦ ⊃ ¦ ⊄ ¦ ⊆ ¦ ⊇ ¦ ⊕ ¦ ⊗ ¦ ⊥ ¦ ¨) |
Methods:
- setZone - User zone set method
Example:
>>> import alm
>>> alm.setZone("com")
Methods:
- clear - Method clear all data
- setAlphabet - Method set alphabet
- getAlphabet - Method get alphabet
Example:
>>> import alm
>>> alm.getAlphabet()
'abcdefghijklmnopqrstuvwxyz'
>>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
>>> alm.getAlphabet()
'abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя'
>>> alm.clear()
>>> alm.getAlphabet()
'abcdefghijklmnopqrstuvwxyz'
Methods:
- setUnknown - Method set unknown word
Example:
>>> import alm
>>> alm.setUnknown("word")
Methods:
- getUnknown - Method extraction unknown word
Example:
>>> import alm
>>> alm.setUnknown("word")
>>> alm.getUnknown()
'word'
Methods:
- sentences - Sentences generation method
- readLM - Method for reading data from arpa file
- sentencesToFile - Method for assembling a specified number of sentences and writing to a file
Example:
>>> import alm
>>> def sentencesFn(text):
... print("Sentences:", text)
... return True
...
>>> alm.setOption(alm.options_t.confidence)
>>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
>>> alm.readLM('./lm.arpa')
>>> alm.sentences(sentencesFn)
Sentences: <s> В общем </s>
Sentences: <s> С лязгом выкатился и остановился возле мальчика </s>
Sentences: <s> У меня нет </s>
Sentences: <s> Я вообще не хочу </s>
Sentences: <s> Да и в общем </s>
Sentences: <s> Не могу </s>
Sentences: <s> Ну в общем </s>
Sentences: <s> Так что я вообще не хочу </s>
Sentences: <s> Потому что я вообще не хочу </s>
Sentences: <s> Продолжение следует </s>
Sentences: <s> Неожиданно из подворотни в олега ударил яркий прожектор патрульный трактор </s>
>>> alm.sentencesToFile(5, "./result.txt")
Read binary file of language model
Example:
>>> import alm
>>> import json
>>> def sentencesFn(text):
... print("Sentences:", text)
... return True
...
>>> meta = {
... "aes": 128,
... "name": "Test",
... "author": "Test",
... "lictype": "MIT"
... }
>>> alm.setOption(alm.options_t.confidence)
>>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
>>> alm.readLM('./lm.alm', json.dumps(meta))
>>> alm.sentences(sentencesFn)
Sentences: <s> В общем </s>
Sentences: <s> С лязгом выкатился и остановился возле мальчика </s>
Sentences: <s> У меня нет </s>
Sentences: <s> Я вообще не хочу </s>
Sentences: <s> Да и в общем </s>
Sentences: <s> Не могу </s>
Sentences: <s> Ну в общем </s>
Sentences: <s> Так что я вообще не хочу </s>
Sentences: <s> Потому что я вообще не хочу </s>
Sentences: <s> Продолжение следует </s>
Sentences: <s> Неожиданно из подворотни в олега ударил яркий прожектор патрульный трактор </s>
>>> alm.sentencesToFile(5, "./result.txt")
Binary container metadata
{
"aes": 128,
"name": "Name dictionary",
"author": "Name author",
"lictype": "License type",
"lictext": "License text",
"contacts": "Contacts data",
"password": "Password if needed",
"copyright": "Copyright author"
}
Description:
- aes - AES Encryption Size (128, 192, 256) bits
- name - The dictionary name
- author - The author of the dictionary
- lictype - The license type
- lictext - The license text
- contacts - The author contact info
- password - An encryption password (if required), encryption is performed only when setting a password
- copyright - Copyright of the dictionary owner
Methods:
- findNgram - N-gram search method in text
Example:
>>> import alm
>>> def callbackFn(text):
... print(text)
...
>>> alm.setOption(alm.options_t.confidence)
>>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
>>> alm.readLM('./lm.arpa')
>>> alm.findNgram("Особое место занимает чудотворная икона Лобзание Христа Иудою", callbackFn)
<s> Особое
Особое место
место занимает
занимает чудотворная
чудотворная икона
икона Лобзание
Лобзание Христа
Христа Иудою
Иудою </s>
>>>
Methods:
- setOption - Method for set module options
Example:
>>> import alm
>>> alm.setOption(alm.options_t.debug)
>>> alm.setOption(alm.options_t.mixdicts)
>>> alm.setOption(alm.options_t.onlyGood)
>>> alm.setOption(alm.options_t.confidence)
Methods:
- unsetOption - Disable module option method
Example:
>>> import alm
>>> alm.unsetOption(alm.options_t.debug)
>>> alm.unsetOption(alm.options_t.mixdicts)
>>> alm.unsetOption(alm.options_t.onlyGood)
>>> alm.unsetOption(alm.options_t.confidence)
Description
Name |
Description |
debug |
Flag debug mode |
mixdicts |
Flag allowing the use of words consisting of mixed dictionaries |
onlyGood |
Flag allowing to consider words from the white list only |
confidence |
Flag arpa file loading without pre-processing the words |
Methods:
- size - Method of obtaining the size of the N-gram
Example:
>>> import alm
>>> alm.setOption(alm.options_t.confidence)
>>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
>>> alm.readLM('./lm.arpa')
>>> alm.size()
3
Methods:
- textToJson - Method to convert text to JSON
- isAllowApostrophe - Apostrophe permission check method
- switchAllowApostrophe - Method for permitting or denying an apostrophe as part of a word
Example:
>>> import alm
>>> def callbackFn(text):
... print(text)
...
>>> alm.isAllowApostrophe()
False
>>> alm.switchAllowApostrophe()
>>> alm.isAllowApostrophe()
True
>>> alm.textToJson("«On nous dit qu'aujourd'hui c'est le cas, encore faudra-t-il l'évaluer» l'astronomie", callbackFn)
[["«","On","nous","dit","qu'aujourd'hui","c'est","le","cas",",","encore","faudra-t-il","l'évaluer","»","l'astronomie"]]
Methods:
- jsonToText - Method to convert JSON to text
Example:
>>> import alm
>>> def callbackFn(text):
... print(text)
...
>>> alm.jsonToText('[["«","On","nous","dit","qu\'aujourd\'hui","c\'est","le","cas",",","encore","faudra-t-il","l\'évaluer","»","l\'astronomie"]]', callbackFn)
«On nous dit qu'aujourd'hui c'est le cas, encore faudra-t-il l'évaluer» l'astronomie
Methods:
- restore - Method for restore text from context
Example:
>>> import alm
>>> alm.restore(["«","On","nous","dit","qu\'aujourd\'hui","c\'est","le","cas",",","encore","faudra-t-il","l\'évaluer","»","l\'astronomie"])
"«On nous dit qu'aujourd'hui c'est le cas, encore faudra-t-il l'évaluer» l'astronomie"
Methods:
- allowStress - Method for allow using stress in words
- disallowStress - Method for disallow using stress in words
Example:
>>> import alm
>>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
>>> def callbackFn(text):
... print(text)
...
>>> alm.textToJson('«Бе́лая стрела́» — согласно распространённой в 1990-е годы в России городской легенде, якобы специально организованная и подготовленная законспирированная правительственная спецслужба, сотрудники которой — бывшие и действовавшие милиционеры и спецназовцы, имеющие право на физическую ликвидацию особо опасных уголовных авторитетов и лидеров орудовавших в России ОПГ, относительно которых не представляется возможным привлечения их к уголовной ответственности законными методами[1][2][3]. Несмотря на отсутствие официальных доказательств существования организации и многочисленные опровержения со стороны силовых структур и служб безопасности[4], в российском обществе легенду считают основанной на подлинных фактах громких убийств криминальных авторитетов, совершённых в 1990-е годы, и не исключают существование реальной спецслужбы[5].', callbackFn)
[["«","Белая","стрела","»","—","согласно","распространённой","в","1990-е","годы","в","России","городской","легенде",",","якобы","специально","организованная","и","подготовленная","законспирированная","правительственная","спецслужба",",","сотрудники","которой","—","бывшие","и","действовавшие","милиционеры","и","спецназовцы",",","имеющие","право","на","физическую","ликвидацию","особо","опасных","уголовных","авторитетов","и","лидеров","орудовавших","в","России","ОПГ",",","относительно","которых","не","представляется","возможным","привлечения","их","к","уголовной","ответственности","законными","методами","[","1","]","[","2","]","[","3","]","."],["Несмотря","на","отсутствие","официальных","доказательств","существования","организации","и","многочисленные","опровержения","со","стороны","силовых","структур","и","служб","безопасности","[","4","]",",","в","российском","обществе","легенду","считают","основанной","на","подлинных","фактах","громких","убийств","криминальных","авторитетов",",","совершённых","в","1990-е","годы",",","и","не","исключают","существование","реальной","спецслужбы","[","5","]","."]]
>>> alm.jsonToText('[["«","Белая","стрела","»","—","согласно","распространённой","в","1990-е","годы","в","России","городской","легенде",",","якобы","специально","организованная","и","подготовленная","законспирированная","правительственная","спецслужба",",","сотрудники","которой","—","бывшие","и","действовавшие","милиционеры","и","спецназовцы",",","имеющие","право","на","физическую","ликвидацию","особо","опасных","уголовных","авторитетов","и","лидеров","орудовавших","в","России","ОПГ",",","относительно","которых","не","представляется","возможным","привлечения","их","к","уголовной","ответственности","законными","методами","[","1","]","[","2","]","[","3","]","."],["Несмотря","на","отсутствие","официальных","доказательств","существования","организации","и","многочисленные","опровержения","со","стороны","силовых","структур","и","служб","безопасности","[","4","]",",","в","российском","обществе","легенду","считают","основанной","на","подлинных","фактах","громких","убийств","криминальных","авторитетов",",","совершённых","в","1990-е","годы",",","и","не","исключают","существование","реальной","спецслужбы","[","5","]","."]]', callbackFn)
«Белая стрела» — согласно распространённой в 1990-е годы в России городской легенде, якобы специально организованная и подготовленная законспирированная правительственная спецслужба, сотрудники которой — бывшие и действовавшие милиционеры и спецназовцы, имеющие право на физическую ликвидацию особо опасных уголовных авторитетов и лидеров орудовавших в России ОПГ, относительно которых не представляется возможным привлечения их к уголовной ответственности законными методами [1] [2] [3].
Несмотря на отсутствие официальных доказательств существования организации и многочисленные опровержения со стороны силовых структур и служб безопасности [4], в российском обществе легенду считают основанной на подлинных фактах громких убийств криминальных авторитетов, совершённых в 1990-е годы, и не исключают существование реальной спецслужбы [5].
>>> alm.allowStress()
>>> alm.textToJson('«Бе́лая стрела́» — согласно распространённой в 1990-е годы в России городской легенде, якобы специально организованная и подготовленная законспирированная правительственная спецслужба, сотрудники которой — бывшие и действовавшие милиционеры и спецназовцы, имеющие право на физическую ликвидацию особо опасных уголовных авторитетов и лидеров орудовавших в России ОПГ, относительно которых не представляется возможным привлечения их к уголовной ответственности законными методами[1][2][3]. Несмотря на отсутствие официальных доказательств существования организации и многочисленные опровержения со стороны силовых структур и служб безопасности[4], в российском обществе легенду считают основанной на подлинных фактах громких убийств криминальных авторитетов, совершённых в 1990-е годы, и не исключают существование реальной спецслужбы[5].', callbackFn)
[["«","Бе́лая","стрела́","»","—","согласно","распространённой","в","1990-е","годы","в","России","городской","легенде",",","якобы","специально","организованная","и","подготовленная","законспирированная","правительственная","спецслужба",",","сотрудники","которой","—","бывшие","и","действовавшие","милиционеры","и","спецназовцы",",","имеющие","право","на","физическую","ликвидацию","особо","опасных","уголовных","авторитетов","и","лидеров","орудовавших","в","России","ОПГ",",","относительно","которых","не","представляется","возможным","привлечения","их","к","уголовной","ответственности","законными","методами","[","1","]","[","2","]","[","3","]","."],["Несмотря","на","отсутствие","официальных","доказательств","существования","организации","и","многочисленные","опровержения","со","стороны","силовых","структур","и","служб","безопасности","[","4","]",",","в","российском","обществе","легенду","считают","основанной","на","подлинных","фактах","громких","убийств","криминальных","авторитетов",",","совершённых","в","1990-е","годы",",","и","не","исключают","существование","реальной","спецслужбы","[","5","]","."]]
>>> alm.jsonToText('[["«","Бе́лая","стрела́","»","—","согласно","распространённой","в","1990-е","годы","в","России","городской","легенде",",","якобы","специально","организованная","и","подготовленная","законспирированная","правительственная","спецслужба",",","сотрудники","которой","—","бывшие","и","действовавшие","милиционеры","и","спецназовцы",",","имеющие","право","на","физическую","ликвидацию","особо","опасных","уголовных","авторитетов","и","лидеров","орудовавших","в","России","ОПГ",",","относительно","которых","не","представляется","возможным","привлечения","их","к","уголовной","ответственности","законными","методами","[","1","]","[","2","]","[","3","]","."],["Несмотря","на","отсутствие","официальных","доказательств","существования","организации","и","многочисленные","опровержения","со","стороны","силовых","структур","и","служб","безопасности","[","4","]",",","в","российском","обществе","легенду","считают","основанной","на","подлинных","фактах","громких","убийств","криминальных","авторитетов",",","совершённых","в","1990-е","годы",",","и","не","исключают","существование","реальной","спецслужбы","[","5","]","."]]', callbackFn)
«Бе́лая стрела́» — согласно распространённой в 1990-е годы в России городской легенде, якобы специально организованная и подготовленная законспирированная правительственная спецслужба, сотрудники которой — бывшие и действовавшие милиционеры и спецназовцы, имеющие право на физическую ликвидацию особо опасных уголовных авторитетов и лидеров орудовавших в России ОПГ, относительно которых не представляется возможным привлечения их к уголовной ответственности законными методами [1] [2] [3].
Несмотря на отсутствие официальных доказательств существования организации и многочисленные опровержения со стороны силовых структур и служб безопасности [4], в российском обществе легенду считают основанной на подлинных фактах громких убийств криминальных авторитетов, совершённых в 1990-е годы, и не исключают существование реальной спецслужбы [5].
>>> alm.disallowStress()
>>> alm.textToJson('«Бе́лая стрела́» — согласно распространённой в 1990-е годы в России городской легенде, якобы специально организованная и подготовленная законспирированная правительственная спецслужба, сотрудники которой — бывшие и действовавшие милиционеры и спецназовцы, имеющие право на физическую ликвидацию особо опасных уголовных авторитетов и лидеров орудовавших в России ОПГ, относительно которых не представляется возможным привлечения их к уголовной ответственности законными методами[1][2][3]. Несмотря на отсутствие официальных доказательств существования организации и многочисленные опровержения со стороны силовых структур и служб безопасности[4], в российском обществе легенду считают основанной на подлинных фактах громких убийств криминальных авторитетов, совершённых в 1990-е годы, и не исключают существование реальной спецслужбы[5].', callbackFn)
[["«","Белая","стрела","»","—","согласно","распространённой","в","1990-е","годы","в","России","городской","легенде",",","якобы","специально","организованная","и","подготовленная","законспирированная","правительственная","спецслужба",",","сотрудники","которой","—","бывшие","и","действовавшие","милиционеры","и","спецназовцы",",","имеющие","право","на","физическую","ликвидацию","особо","опасных","уголовных","авторитетов","и","лидеров","орудовавших","в","России","ОПГ",",","относительно","которых","не","представляется","возможным","привлечения","их","к","уголовной","ответственности","законными","методами","[","1","]","[","2","]","[","3","]","."],["Несмотря","на","отсутствие","официальных","доказательств","существования","организации","и","многочисленные","опровержения","со","стороны","силовых","структур","и","служб","безопасности","[","4","]",",","в","российском","обществе","легенду","считают","основанной","на","подлинных","фактах","громких","убийств","криминальных","авторитетов",",","совершённых","в","1990-е","годы",",","и","не","исключают","существование","реальной","спецслужбы","[","5","]","."]]
>>> alm.jsonToText('[["«","Белая","стрела","»","—","согласно","распространённой","в","1990-е","годы","в","России","городской","легенде",",","якобы","специально","организованная","и","подготовленная","законспирированная","правительственная","спецслужба",",","сотрудники","которой","—","бывшие","и","действовавшие","милиционеры","и","спецназовцы",",","имеющие","право","на","физическую","ликвидацию","особо","опасных","уголовных","авторитетов","и","лидеров","орудовавших","в","России","ОПГ",",","относительно","которых","не","представляется","возможным","привлечения","их","к","уголовной","ответственности","законными","методами","[","1","]","[","2","]","[","3","]","."],["Несмотря","на","отсутствие","официальных","доказательств","существования","организации","и","многочисленные","опровержения","со","стороны","силовых","структур","и","служб","безопасности","[","4","]",",","в","российском","обществе","легенду","считают","основанной","на","подлинных","фактах","громких","убийств","криминальных","авторитетов",",","совершённых","в","1990-е","годы",",","и","не","исключают","существование","реальной","спецслужбы","[","5","]","."]]', callbackFn)
«Белая стрела» — согласно распространённой в 1990-е годы в России городской легенде, якобы специально организованная и подготовленная законспирированная правительственная спецслужба, сотрудники которой — бывшие и действовавшие милиционеры и спецназовцы, имеющие право на физическую ликвидацию особо опасных уголовных авторитетов и лидеров орудовавших в России ОПГ, относительно которых не представляется возможным привлечения их к уголовной ответственности законными методами [1] [2] [3].
Несмотря на отсутствие официальных доказательств существования организации и многочисленные опровержения со стороны силовых структур и служб безопасности [4], в российском обществе легенду считают основанной на подлинных фактах громких убийств криминальных авторитетов, совершённых в 1990-е годы, и не исключают существование реальной спецслужбы [5].
Methods:
- addBadword - Method add bad word
- setBadwords - Method set words to blacklist
- getBadwords - Method get words in blacklist
Example:
>>> import alm
>>> alm.setBadwords(["hello", "world", "test"])
>>> alm.getBadwords()
{24227504, 1219922507, 1794085167}
>>> alm.addBadword("test2")
>>> alm.getBadwords()
{24227504, 5035487504, 1219922507, 1794085167}
Example:
>>> import alm
>>> alm.setBadwords({24227504, 1219922507, 1794085167})
>>> alm.getBadwords()
{24227504, 1219922507, 1794085167}
Methods:
- addGoodword - Method add good word
- setGoodwords - Method set words to whitelist
- getGoodwords - Method get words in whitelist
Example:
>>> import alm
>>> alm.setGoodwords(["hello", "world", "test"])
>>> alm.getGoodwords()
{24227504, 1219922507, 1794085167}
>>> alm.addGoodword("test2")
>>> alm.getGoodwords()
{24227504, 5035487504, 1219922507, 1794085167}
Example:
>>> import alm
>>> alm.setGoodwords({24227504, 1219922507, 1794085167})
>>> alm.getGoodwords()
{24227504, 1219922507, 1794085167}
Methods:
- setUserToken - Method for adding user token
- getUserTokens - User token list retrieval method
- getUserTokenId - Method for obtaining user token identifier
- getUserTokenWord - Method for obtaining a custom token by its identifier
Example:
>>> import alm
>>> alm.setUserToken("usa")
>>> alm.setUserToken("russia")
>>> alm.getUserTokenId("usa")
4188610529
>>> alm.getUserTokenId("russia")
47207634939
>>> alm.getUserTokens()
['usa', 'russia']
>>> alm.getUserTokenWord(4188610529)
'usa'
>>> alm.getUserTokenWord(47207634939)
'russia'
Methods:
- setUserTokenMethod - Method for set a custom token processing function
Example:
>>> import alm
>>> def fn(token, word):
... if token and (token == "<usa>"):
... if word and (word.lower() == "usa"):
... return True
... elif token and (token == "<russia>"):
... if word and (word.lower() == "russia"):
... return True
... return False
...
>>> alm.setUserToken("usa")
>>> alm.setUserToken("russia")
>>> alm.setUserTokenMethod("usa", fn)
>>> alm.setUserTokenMethod("russia", fn)
>>> alm.idw("usa")
346562990
>>> alm.idw("russia")
3602214519
>>> alm.getUserTokenWord(346562990)
'usa'
>>> alm.getUserTokenWord(3602214519)
'russia'
Methods:
- setWordPreprocessingMethod - Method for set the word preprocessing function
Example:
>>> import alm
>>> def run(word, context):
... if word == "возле": word = "около"
... return word
...
>>> alm.setOption(alm.options_t.debug)
>>> alm.setOption(alm.options_t.confidence)
>>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
>>> alm.readLM('./lm.arpa')
>>> alm.setWordPreprocessingMethod(run)
>>> a = alm.perplexity("неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор???с лязгом выкатился и остановился возле мальчика....")
info: <s> Неожиданно из подворотни в олега ударил яркий прожектор патрульный трактор <punct> <punct> <punct> </s>
info: p( неожиданно | <s> ) = [2gram] 0.00250617 [ -2.60098900 ] / 0.99999999
info: p( из | неожиданно ...) = [3gram] 0.84584931 [ -0.07270700 ] / 1.00000081
info: p( подворотни | из ...) = [3gram] 0.73518561 [ -0.13360300 ] / 0.99999924
info: p( в | подворотни ...) = [3gram] 0.93193581 [ -0.03061400 ] / 0.99999960
info: p( олега | в ...) = [3gram] 0.72047846 [ -0.14237900 ] / 1.00000026
info: p( ударил | олега ...) = [3gram] 0.89971301 [ -0.04589600 ] / 1.00000043
info: p( яркий | ударил ...) = [3gram] 0.92987592 [ -0.03157500 ] / 0.99999918
info: p( прожектор | яркий ...) = [3gram] 0.92987592 [ -0.03157500 ] / 0.99999918
info: p( патрульный | прожектор ...) = [3gram] 0.92987592 [ -0.03157500 ] / 0.99999918
info: p( трактор | патрульный ...) = [3gram] 0.92987592 [ -0.03157500 ] / 0.99999918
info: p( <punct> | трактор ...) = [OOV] 0.00000000 [ -inf ] / 0.99999999
info: p( <punct> | <punct> ...) = [OOV] 0.00000000 [ -inf ] / 1.00000011
info: p( <punct> | <punct> ...) = [OOV] 0.00000000 [ -inf ] / 1.00000011
info: p( </s> | <punct> ...) = [1gram] 0.07816800 [ -1.10697100 ] / 1.00000011
info: 1 sentences, 13 words, 0 OOVs
info: 3 zeroprobs, logprob= -4.25945900 ppl= 2.01487019 ppl1= 2.12642805
info: <s> С лязгом выкатился и остановился около мальчика <punct> <punct> <punct> <punct> </s>
info: p( с | <s> ) = [2gram] 0.01301973 [ -1.88539800 ] / 0.99999999
info: p( лязгом | с ...) = [3gram] 0.21850984 [ -0.66052900 ] / 1.00000061
info: p( выкатился | лязгом ...) = [3gram] 0.92987592 [ -0.03157500 ] / 0.99999918
info: p( и | выкатился ...) = [3gram] 0.93211608 [ -0.03053000 ] / 0.99999926
info: p( остановился | и ...) = [3gram] 0.72065433 [ -0.14227300 ] / 0.99999975
info: p( около | остановился ...) = [1gram] 0.00003415 [ -4.46662200 ] / 1.00000027
info: p( мальчика | около ...) = [1gram] 0.00023364 [ -3.63146100 ] / 0.99999938
info: p( <punct> | мальчика ...) = [OOV] 0.00000000 [ -inf ] / 0.99999965
info: p( <punct> | <punct> ...) = [OOV] 0.00000000 [ -inf ] / 1.00000011
info: p( <punct> | <punct> ...) = [OOV] 0.00000000 [ -inf ] / 1.00000011
info: p( <punct> | <punct> ...) = [OOV] 0.00000000 [ -inf ] / 1.00000011
info: p( </s> | <punct> ...) = [1gram] 0.07816800 [ -1.10697100 ] / 1.00000011
info: 1 sentences, 11 words, 0 OOVs
info: 4 zeroprobs, logprob= -11.95535900 ppl= 9.91470774 ppl1= 12.21380039
>>> print(a.logprob)
-16.214818
Methods:
- setLogfile - Method of set the file for log output
- setOOvFile - Method set file for saving OOVs words
Example:
>>> import alm
>>> alm.setLogfile("./log.txt")
>>> alm.setOOvFile("./oov.txt")
Methods:
- perplexity - Perplexity calculation
- pplConcatenate - Method of combining perplexia
- pplByFiles - Method for reading perplexity calculation by file or group of files
Example:
>>> import alm
>>> alm.setOption(alm.options_t.confidence)
>>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
>>> alm.readLM('./lm.arpa')
>>> a = alm.perplexity("неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор???с лязгом выкатился и остановился возле мальчика....")
>>> print(a.logprob)
-8.238353
>>> print(a.oovs)
0
>>> print(a.words)
24
>>> print(a.sentences)
2
>>> print(a.zeroprobs)
7
>>> print(a.ppl)
2.135669866658319
>>> print(a.ppl1)
2.204269585673276
>>> b = alm.pplByFiles("./text.txt")
>>> c = alm.pplConcatenate(a, b)
>>> print(c.ppl)
7.384123548831112
Description
Name |
Description |
ppl |
The meaning of perplexity without considering the beginning of the sentence |
ppl1 |
The meaning of perplexion taking into account the beginning of the sentence |
oovs |
Count of oov words |
words |
Count of words in sentence |
logprob |
Word sequence frequency |
sentences |
Count of sequences |
zeroprobs |
Count of zero probs |
Methods:
- tokenization - Method for breaking text into tokens
Example:
>>> import alm
>>> def tokensFn(word, context, reset, stop):
... print(word, " => ", context)
... return True
...
>>> alm.switchAllowApostrophe()
>>> alm.tokenization("«On nous dit qu'aujourd'hui c'est le cas, encore faudra-t-il l'évaluer» l'astronomie", tokensFn)
« => []
On => ['«']
nous => ['«', 'On']
dit => ['«', 'On', 'nous']
qu'aujourd'hui => ['«', 'On', 'nous', 'dit']
c'est => ['«', 'On', 'nous', 'dit', "qu'aujourd'hui"]
le => ['«', 'On', 'nous', 'dit', "qu'aujourd'hui", "c'est"]
cas => ['«', 'On', 'nous', 'dit', "qu'aujourd'hui", "c'est", 'le']
, => ['«', 'On', 'nous', 'dit', "qu'aujourd'hui", "c'est", 'le', 'cas']
encore => ['«', 'On', 'nous', 'dit', "qu'aujourd'hui", "c'est", 'le', 'cas', ',']
faudra-t-il => ['«', 'On', 'nous', 'dit', "qu'aujourd'hui", "c'est", 'le', 'cas', ',', 'encore']
l'évaluer => ['«', 'On', 'nous', 'dit', "qu'aujourd'hui", "c'est", 'le', 'cas', ',', 'encore', 'faudra-t-il']
» => ['«', 'On', 'nous', 'dit', "qu'aujourd'hui", "c'est", 'le', 'cas', ',', 'encore', 'faudra-t-il', "l'évaluer"]
l'astronomie => ['«', 'On', 'nous', 'dit', "qu'aujourd'hui", "c'est", 'le', 'cas', ',', 'encore', 'faudra-t-il', "l'évaluer", '»']
Methods:
- setTokenizerFn - Method for set the function of an external tokenizer
Example:
>>> import alm
>>> def tokenizerFn(text, callback):
... word = ""
... context = []
... for letter in text:
... if letter == " " and len(word) > 0:
... if not callback(word, context, False, False): return
... context.append(word)
... word = ""
... elif letter == "." or letter == "!" or letter == "?":
... if not callback(word, context, True, False): return
... word = ""
... context = []
... else:
... word += letter
... if len(word) > 0:
... if not callback(word, context, False, True): return
...
>>> def tokensFn(word, context, reset, stop):
... print(word, " => ", context)
... return True
...
>>> alm.setTokenizerFn(tokenizerFn)
>>> alm.tokenization("Hello World today!", tokensFn)
Hello => []
World => ['Hello']
today => ['Hello', 'World']
Methods:
- fixUppers - Method for correcting registers in the text
- fixUppersByFiles - Method for correcting text registers in a text file
Example:
>>> import alm
>>> alm.setOption(alm.options_t.confidence)
>>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
>>> alm.readLM('./lm.arpa')
>>> alm.fixUppers("неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор???с лязгом выкатился и остановился возле мальчика....")
'Неожиданно из подворотни в олега ударил яркий прожектор патрульный трактор??? С лязгом выкатился и остановился возле мальчика....'
>>> alm.fixUppersByFiles("./corpus", "./result.txt", "txt")
Methods:
- checkHypLat - Hyphen and latin character search method
Example:
>>> import alm
>>> alm.checkHypLat("Hello-World")
(True, True)
>>> alm.checkHypLat("Hello")
(False, True)
>>> alm.checkHypLat("Привет")
(False, False)
>>> alm.checkHypLat("так-как")
(True, False)
Methods:
- getUppers - Method for extracting registers for each word
Example:
>>> import alm
>>> alm.setOption(alm.options_t.confidence)
>>> alm.readLM('./lm.arpa')
>>> alm.idw("Living")
48384019276
>>> alm.idw("in")
2833
>>> alm.idw("the")
175734
>>> alm.idw("USA")
147770
>>> alm.getUppers([48384019276, 2833, 175734, 147770])
[1, 0, 0, 7]
Methods:
- urls - Method for extracting URL address coordinates in a string
Example:
>>> import alm
>>> alm.urls("This website: example.com was designed with ...")
{14: 25}
>>> alm.urls("This website: https://a.b.c.example.net?id=52#test-1 was designed with ...")
{14: 52}
>>> alm.urls("This website: https://a.b.c.example.net?id=52#test-1 and 127.0.0.1 was designed with ...")
{14: 52, 57: 66}
Methods:
- roman2Arabic - Method for translating Roman numerals to Arabic
Example:
>>> import alm
>>> alm.roman2Arabic("XVI")
16
Methods:
- rest - Method for correction and detection of words with mixed alphabets
- setSubstitutes - Method for set letters to correct words from mixed alphabets
- getSubstitutes - Method of extracting letters to correct words from mixed alphabets
Example:
>>> import alm
>>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
>>> alm.setSubstitutes({'p':'р','c':'с','o':'о','t':'т','k':'к','e':'е','a':'а','h':'н','x':'х','b':'в','m':'м'})
>>> alm.getSubstitutes()
{'a': 'а', 'b': 'в', 'c': 'с', 'e': 'е', 'h': 'н', 'k': 'к', 'm': 'м', 'o': 'о', 'p': 'р', 't': 'т', 'x': 'х'}
>>> str = "ПPИBETИК"
>>> str.lower()
'пpиbetик'
>>> alm.rest(str)
'приветик'
Methods:
- setTokensDisable - Method for set the list of forbidden tokens
- setTokensUnknown - Method for set the list of tokens cast to 〈unk〉
- setTokenDisable - Method for set the list of unidentifiable tokens
- setTokenUnknown - Method of set the list of tokens that need to be identified as 〈unk〉
- getTokensDisable - Method for retrieving the list of forbidden tokens
- getTokensUnknown - Method for extracting a list of tokens reducible to 〈unk〉
- setAllTokenDisable - Method for set all tokens as unidentifiable
- setAllTokenUnknown - The method of set all tokens identified as 〈unk〉
Example:
>>> import alm
>>> alm.idw("<date>")
6
>>> alm.idw("<time>")
7
>>> alm.idw("<abbr>")
5
>>> alm.idw("<math>")
9
>>> alm.setTokenDisable("date|time|abbr|math")
>>> alm.getTokensDisable()
{9, 5, 6, 7}
>>> alm.setTokensDisable({6, 7, 5, 9})
>>> alm.setTokenUnknown("date|time|abbr|math")
>>> alm.getTokensUnknown()
{9, 5, 6, 7}
>>> alm.setTokensUnknown({6, 7, 5, 9})
>>> alm.setAllTokenDisable()
>>> alm.getTokensDisable()
{2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23}
>>> alm.setAllTokenUnknown()
>>> alm.getTokensUnknown()
{2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23}
Methods:
- countAlphabet - Method of obtaining the number of letters in the dictionary
Example:
>>> import alm
>>> alm.getAlphabet()
'abcdefghijklmnopqrstuvwxyz'
>>> alm.countAlphabet()
26
>>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
>>> alm.countAlphabet()
59
Methods:
- countBigrams - Method get count bigrams
- countTrigrams - Method get count trigrams
- countGrams - Method get count N-gram by lm size
Example:
>>> import alm
>>> alm.setOption(alm.options_t.confidence)
>>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
>>> alm.readLM('./lm.arpa')
>>> alm.countBigrams("неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор???с лязгом выкатился и остановился возле мальчика....")
12
>>> alm.countTrigrams("неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор???с лязгом выкатился и остановился возле мальчика....")
10
>>> alm.size()
3
>>> alm.countGrams("неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор???с лязгом выкатился и остановился возле мальчика....")
10
>>> alm.idw("неожиданно")
30444893210
>>> alm.idw("из")
4645
>>> alm.idw("подворотни")
7494072262
>>> alm.idw("в")
48
>>> alm.idw("Олега")
2431694341
>>> alm.idw("ударил")
54100711961
>>> alm.countBigrams([30444893210, 4645, 7494072262, 48, 2431694341, 54100711961])
5
>>> alm.countTrigrams([30444893210, 4645, 7494072262, 48, 2431694341, 54100711961])
4
>>> alm.countGrams([30444893210, 4645, 7494072262, 48, 2431694341, 54100711961])
4
Methods:
- arabic2Roman - Convert arabic number to roman number
Example:
>>> import alm
>>> alm.arabic2Roman(23)
'XXIII'
>>> alm.arabic2Roman("33")
'XXXIII'
Methods:
- setLocale - Method set locale (Default: en_US.UTF-8)
Example:
>>> import alm
>>> alm.setLocale("ru_RU.UTF-8")
Methods:
- setThreads - Method for set the number of threads (0 - all threads)
Example:
>>> import alm
>>> alm.setOption(alm.options_t.confidence)
>>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
>>> alm.readLM('./lm.arpa')
>>> alm.setThreads(3)
>>> a = alm.pplByFiles("./text.txt")
>>> print(a.logprob)
-48201.29481399994
Methods:
- fti - Method for removing the fractional part of a number
Example:
>>> import alm
>>> alm.fti(5892.4892)
5892489200000
>>> alm.fti(5892.4892, 4)
58924892
Methods:
- context - Method for assembling text context from a sequence
Example:
>>> import alm
>>> alm.setOption(alm.options_t.confidence)
>>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
>>> alm.readLM('./lm.arpa')
>>> alm.idw("неожиданно")
30444893210
>>> alm.idw("из")
4645
>>> alm.idw("подворотни")
7494072262
>>> alm.idw("в")
48
>>> alm.idw("Олега")
2431694341
>>> alm.idw("ударил")
54100711961
>>> alm.context([30444893210, 4645, 7494072262, 48, 2431694341, 54100711961])
'Неожиданно из подворотни в олега ударил'
Methods:
- findByFiles - Method search N-grams in a text file
Example:
>>> import alm
>>> alm.setOption(alm.options_t.debug)
>>> alm.setOption(alm.options_t.confidence)
>>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
>>> alm.readLM('./lm.arpa')
>>> alm.findByFiles("./text.txt", "./result.txt")
info: <s> Кукай
сари кукай
сари японские
японские каллиграфы
каллиграфы я
я постоянно
постоянно навещал
навещал их
их тайно
тайно от
от людей
людей </s>
info: <s> Неожиданно из
Неожиданно из подворотни
из подворотни в
подворотни в Олега
в Олега ударил
Олега ударил яркий
ударил яркий прожектор
яркий прожектор патрульный
прожектор патрульный трактор
патрульный трактор
<s> С лязгом
С лязгом выкатился
лязгом выкатился и
выкатился и остановился
и остановился возле
остановился возле мальчика
возле мальчика
Methods:
- checkSequence - Sequence Existence Method
- checkByFiles - Method for checking if a sequence exists in a text file
Example:
>>> import alm
>>> alm.setOption(alm.options_t.debug)
>>> alm.setOption(alm.options_t.confidence)
>>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
>>> alm.readLM('./lm.arpa')
>>> alm.checkSequence("Неожиданно из подворотни в олега ударил")
(True, 0)
>>> alm.checkSequence("<s> Сегодня сыграл и в Олега ударил яркий прожектор патрульный трактор с корпоративным сектором </s>")
(True, 0)
>>> alm.checkSequence("<s> Сегодня сыграл и в Олега ударил яркий прожектор патрульный трактор с корпоративным сектором </s>", True)
(False, 0)
>>> alm.checkSequence("<s> в Олега ударил яркий </s>")
(True, 0)
>>> alm.checkSequence("<s> в Олега ударил яркий </s>", True)
(True, 0)
>>> alm.checkSequence("от госсекретаря США")
(True, 7)
>>> alm.checkSequence("от госсекретаря США", True)
(False, 0)
>>> alm.idw("от")
5586
>>> alm.idw("госсекретаря")
10074609004
>>> alm.idw("США")
338449
>>> alm.checkSequence([5586, 10074609004, 338449])
(True, 7)
>>> alm.checkSequence([5586, 10074609004, 338449], True)
(False, 0)
>>> alm.checkSequence(["от", "госсекретаря", "США"])
(True, 7)
>>> alm.checkSequence(["от", "госсекретаря", "США"], True)
(False, 0)
>>> alm.checkByFiles("./text.txt", "./result.txt")
info: 1999 | YES | Какой-то период времени мы вообще не общались
info: 2000 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор.С лязгом выкатился и остановился возле мальчика.
info: 2001 | YES | Так как эти яйца жалко есть а хочется все больше любоваться их можно покрыть лаком даже прозрачным лаком для ногтей
info: 2002 | NO | кукай <unk> <unk> сари кукай <unk> <unk> сари японские каллиграфы я постоянно навещал их тайно от людей
info: 2003 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор???С лязгом выкатился и остановился возле мальчика....
info: 2004 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор?С лязгом выкатился и остановился возле мальчика.
info: 2005 | YES | Сегодня яичницей никто не завтракал как впрочем и вчера на ближайшем к нам рынке мы ели фруктовый салат со свежевыжатым соком как в старые добрые времена в Бразилии
info: 2006 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор!С лязгом выкатился и остановился возле мальчика.
info: 2007 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор.с лязгом выкатился и остановился возле мальчика.
All texts: 2007
Exists texts: 1359
Not exists texts: 648
>>> alm.checkByFiles("./corpus", "./result.txt", False, "txt")
info: 1999 | YES | Какой-то период времени мы вообще не общались
info: 2000 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор.С лязгом выкатился и остановился возле мальчика.
info: 2001 | YES | Так как эти яйца жалко есть а хочется все больше любоваться их можно покрыть лаком даже прозрачным лаком для ногтей
info: 2002 | NO | кукай <unk> <unk> сари кукай <unk> <unk> сари японские каллиграфы я постоянно навещал их тайно от людей
info: 2003 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор???С лязгом выкатился и остановился возле мальчика....
info: 2004 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор?С лязгом выкатился и остановился возле мальчика.
info: 2005 | YES | Сегодня яичницей никто не завтракал как впрочем и вчера на ближайшем к нам рынке мы ели фруктовый салат со свежевыжатым соком как в старые добрые времена в Бразилии
info: 2006 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор!С лязгом выкатился и остановился возле мальчика.
info: 2007 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор.с лязгом выкатился и остановился возле мальчика.
All texts: 2007
Exists texts: 1359
Not exists texts: 648
>>> alm.checkByFiles("./corpus", "./result.txt", True, "txt")
info: 2000 | NO | Так как эти яйца жалко есть а хочется все больше любоваться их можно покрыть лаком даже прозрачным лаком для ногтей
info: 2001 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор.С лязгом выкатился и остановился возле мальчика.
info: 2002 | NO | Сегодня яичницей никто не завтракал как впрочем и вчера на ближайшем к нам рынке мы ели фруктовый салат со свежевыжатым соком как в старые добрые времена в Бразилии
info: 2003 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор!С лязгом выкатился и остановился возле мальчика.
info: 2004 | NO | кукай <unk> <unk> сари кукай <unk> <unk> сари японские каллиграфы я постоянно навещал их тайно от людей
info: 2005 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор?С лязгом выкатился и остановился возле мальчика.
info: 2006 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор???С лязгом выкатился и остановился возле мальчика....
info: 2007 | NO | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор.с лязгом выкатился и остановился возле мальчика.
All texts: 2007
Exists texts: 0
Not exists texts: 2007
Methods:
- check - String Check Method
- match - String Matching Method
- setAbbr - Method set abbreviation
Example:
>>> import alm
>>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
>>> alm.setSubstitutes({'p':'р','c':'с','o':'о','t':'т','k':'к','e':'е','a':'а','h':'н','x':'х','b':'в','m':'м'})
>>> alm.check("Дом-2", alm.check_t.home2)
True
>>> alm.check("Дом2", alm.check_t.home2)
False
>>> alm.check("Дом-2", alm.check_t.latian)
False
>>> alm.check("Hello", alm.check_t.latian)
True
>>> alm.check("прiвет", alm.check_t.latian)
True
>>> alm.check("Дом-2", alm.check_t.hyphen)
True
>>> alm.check("Дом2", alm.check_t.hyphen)
False
>>> alm.check("Д", alm.check_t.letter)
True
>>> alm.check("$", alm.check_t.letter)
False
>>> alm.check("-", alm.check_t.letter)
False
>>> alm.check("просtоквaшино", alm.check_t.similars)
True
>>> alm.match("my site http://example.ru, it's true", alm.match_t.url)
True
>>> alm.match("по вашему ip адресу 46.40.123.12 проводится проверка", alm.match_t.url)
True
>>> alm.match("мой адрес в формате IPv6: http://[2001:0db8:11a3:09d7:1f34:8a2e:07a0:765d]/", alm.match_t.url)
True
>>> alm.match("13-я", alm.match_t.abbr)
True
alm.match("13-я-й", alm.match_t.abbr)
False
alm.match("т.д", alm.match_t.abbr)
True
alm.match("т.п.", alm.match_t.abbr)
True
>>> alm.match("С.Ш.А.", alm.match_t.abbr)
True
>>> alm.setAbbr("сша")
>>> alm.match("США", alm.match_t.abbr)
True
>>> alm.match("Hello", alm.match_t.latian)
True
>>> alm.match("прiвет", alm.match_t.latian)
False
>>> alm.match("23424", alm.match_t.number)
True
>>> alm.match("hello", alm.match_t.number)
False
>>> alm.match("23424.55", alm.match_t.number)
False
>>> alm.match("23424", alm.match_t.decimal)
False
>>> alm.match("23424.55", alm.match_t.decimal)
True
>>> alm.match("23424,55", alm.match_t.decimal)
True
>>> alm.match("-23424.55", alm.match_t.decimal)
True
>>> alm.match("+23424.55", alm.match_t.decimal)
True
>>> alm.match("+23424.55", alm.match_t.anumber)
True
>>> alm.match("15T-34", alm.match_t.anumber)
True
>>> alm.match("hello", alm.match_t.anumber)
False
>>> alm.match("hello", alm.match_t.allowed)
True
>>> alm.match("évaluer", alm.match_t.allowed)
False
>>> alm.match("13", alm.match_t.allowed)
True
>>> alm.match("Hello-World", alm.match_t.allowed)
True
>>> alm.match("Hello", alm.match_t.math)
False
>>> alm.match("+", alm.match_t.math)
True
>>> alm.match("=", alm.match_t.math)
True
>>> alm.match("Hello", alm.match_t.upper)
True
>>> alm.match("hello", alm.match_t.upper)
False
>>> alm.match("hellO", alm.match_t.upper)
False
>>> alm.match("a", alm.match_t.punct)
False
>>> alm.match(",", alm.match_t.punct)
True
>>> alm.match(" ", alm.match_t.space)
True
>>> alm.match("a", alm.match_t.space)
False
>>> alm.match("a", alm.match_t.special)
False
>>> alm.match("±", alm.match_t.special)
True
>>> alm.match("[", alm.match_t.isolation)
True
>>> alm.match("a", alm.match_t.isolation)
False
Methods:
- delInText - Method for delete letter in text
Example:
>>> import alm
>>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
>>> alm.delInText("неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор??? с лязгом выкатился и остановился возле мальчика....", alm.wdel_t.punct)
'неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор с лязгом выкатился и остановился возле мальчика'
>>> alm.delInText("hello-world-hello-world", alm.wdel_t.hyphen)
'helloworldhelloworld'
>>> alm.delInText("неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор??? с лязгом выкатился и остановился возле мальчика....", alm.wdel_t.broken)
'неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор с лязгом выкатился и остановился возле мальчика'
>>> alm.delInText("«On nous dit qu'aujourd'hui c'est le cas, encore faudra-t-il l'évaluer» l'astronomie", alm.wdel_t.broken)
'On nous dit quaujourdhui cest le cas encore faudra-t-il lvaluer lastronomie'
Methods:
- countsByFiles - Method for counting the number of n-grams in a text file
Example:
>>> import alm
>>> alm.setOption(alm.options_t.debug)
>>> alm.setOption(alm.options_t.confidence)
>>> alm.setAlphabet("abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюя")
>>> alm.readLM('./lm.arpa')
>>> alm.countsByFiles("./text.txt", "./result.txt", 3)
info: 0 | Сегодня яичницей никто не завтракал как впрочем и вчера на ближайшем к нам рынке мы ели фруктовый салат со свежевыжатым соком как в старые добрые времена в Бразилии
info: 10 | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор?С лязгом выкатился и остановился возле мальчика.
info: 10 | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор!С лязгом выкатился и остановился возле мальчика.
info: 0 | Так как эти яйца жалко есть а хочется все больше любоваться их можно покрыть лаком даже прозрачным лаком для ногтей
info: 10 | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор???С лязгом выкатился и остановился возле мальчика....
Counts 3grams: 471
>>> alm.countsByFiles("./corpus", "./result.txt", 2, "txt")
info: 19 | Так как эти яйца жалко есть а хочется все больше любоваться их можно покрыть лаком даже прозрачным лаком для ногтей
info: 12 | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор.с лязгом выкатился и остановился возле мальчика.
info: 12 | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор!С лязгом выкатился и остановился возле мальчика.
info: 10 | кукай <unk> <unk> сари кукай <unk> <unk> сари японские каллиграфы я постоянно навещал их тайно от людей
info: 12 | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор???С лязгом выкатился и остановился возле мальчика....
info: 12 | Неожиданно из подворотни в Олега ударил яркий прожектор патрульный трактор?С лязгом выкатился и остановился возле мальчика.
info: 27 | Сегодня яичницей никто не завтракал как впрочем и вчера на ближайшем к нам рынке мы ели фруктовый салат со свежевыжатым соком как в старые добрые времена в Бразилии
Counts 2grams: 20270
Description
N-gram size |
Description |
1 |
language model size |
2 |
bigram |
3 |
trigram |