Analyzing Russian text to identify and extract borrowed words from other languages. The package uses the pymorphy2 library to normalize words, and it includes a dictionary of borrowed words for comparison.
Project description
russborrow
This is a test version of the russborrow module.
Overview
russborrow is a Python module for analyzing Russian text to identify and extract borrowed words from other languages. The module uses the Pymorphy2 package to normalize words and a dictionary of borrowed words.
Usage
The russborrow module provides various options for analyzing text and extracting borrowed words.
Make sure to import it in your script or Python environment:
import russborrow
Note: If a variable is initialized (e.g., borrowed in the examples), it will store the returned object. If no variable is used, the module will work with provided input and output files. If neither a variable is initialized nor an output file is provided, the code won’t crash, but no output or return will be given.
Option 1: Analyze Text String
Example 1:
borrowed = russborrow.extract("""
Значимость этих проблем настолько очевидна, что синтетическое тестирование
требует анализа экспериментов, поражающих по своей масштабности и грандиозности.
""")
Example 2:
text_to_analyze = """
Значимость этих проблем настолько очевидна, что синтетическое тестирование
требует анализа экспериментов, поражающих по своей масштабности и грандиозности.
"""
borrowed = russborrow.extract(text_to_analyze)
Option 2: Analyze Text File
Example 1, Standard Usage with ~:
file_path = '~/Desktop/text.txt'
borrowed = russborrow.extract(file_path)
Example 2, Windows Path:
file_path = r"C:\Users\Username\Documents\file.txt"
borrowed = russborrow.extract(file_path)
Example 3 (File in the same directory as the code):
borrowed = russborrow.extract('text.txt')
Option 3: Provide Output File Path
Specify an output file path to create a file containing the analysis results. The provided file path must end with either .txt or .csv. If the specified file already exists, it will be overwritten.
Example 3:
output_path = '~/Desktop/newoutput.txt'
borrowed = russborrow.extract(string, output_path)
Returned Object Attributes
The russborrow.extract function returns an object of the Borrowed class with the following attributes:
borrowed.len: Total number of words in the text.
borrowed.bor: Number of borrowed words in the provided text.
borrowed.percent: Percentage of borrowed words in the provided text.
borrowed.dict: A dictionary containing normalized versions of borrowed words as keys with the following values:
value[‘Repeats’]: Count of the word (normalized version) in the text.
value[‘Value’]: Description of the word.
value[‘Origin’]: Language of origin of the borrowed word.
value[‘Instances’]: List of all borrowed words before normalization found in the original text that have the normalized version as the key.
Note: Object attributes have no setters.
Used Resources
Pymorphy2
Pymorphy2 is a Python package for morphological analysis and inflection. It is used in the russborrow module to normalize words for comparrison with dictionary.
Borrowed Words Dictionary
The dictionary used for identifying borrowed words is sourced from Wiktionary. It is stored in the borrowed_dictionary.csv file within the russborrow module. The dictionary format includes the following columns:
Key: Borrowed word
Value: Description of the word
Origin: Language of origin
Example entry in the dictionary:
гламур, — glamer, от gramarye «магия, заклинание», Из гэльского (шотландского)
Exclusion Note: The word “они”” — 鬼 «демон» (демоны-людоеды, умеющие обращаться в людей) has been intentionally excluded from the dictionary for the following reasons
Pymorphy and its resources recognize the word “они” solely as a pronoun (местоимение) without a noun form (существительное).
Retaining “они” in the dictionary leads the program to classify the highly common pronoun “они” as a borrowed word.
According to Wiktionary, the usage of “они” as a borrowed word is infrequent. If your text focuses on Japanese folklore or demons, it is advisable to manually verify the output for accuracy.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file russborrow-0.0.9a1.tar.gz.
File metadata
- Download URL: russborrow-0.0.9a1.tar.gz
- Upload date:
- Size: 153.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.1
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
dbff845526cfc6cfe2cad389f414992beda1c7cf376cacf0b0566bef782d6a2d
|
|
| MD5 |
6086eeb70a5b9937b200221c86031a4a
|
|
| BLAKE2b-256 |
16be77916b239cbbec8abc802c4b53ab00386b4596b2875f5f11e5a2c858c333
|
File details
Details for the file russborrow-0.0.9a1-py3-none-any.whl.
File metadata
- Download URL: russborrow-0.0.9a1-py3-none-any.whl
- Upload date:
- Size: 155.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.1
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c58a631d65508374c33b6ebc7238f41b7133dbb258ccc2fd9a99d3e0f041e556
|
|
| MD5 |
c443b738f514105523fcd6c58f2d8d72
|
|
| BLAKE2b-256 |
a93d24ad86c2495b8fb15c93dd454de21f7f0ec5433fb3c713d57423abf2e4a1
|