Corpus library
Corpus
============
Video Lectures
============
[<img src="https://github.com/StarlangSoftware/Corpus/blob/master/video.jpg" width="50%">](https://youtu.be/xTrdKY5uI08)
For Developers
============
You can also see the [Python](https://github.com/starlangsoftware/Corpus-Py), [C](https://github.com/starlangsoftware/Corpus-C), [Java](https://github.com/starlangsoftware/Corpus), [C++](https://github.com/starlangsoftware/Corpus-CPP), [Swift](https://github.com/starlangsoftware/Corpus-Swift), [Js](https://github.com/starlangsoftware/Corpus-Js), or [C#](https://github.com/starlangsoftware/Corpus-CS) repositories.
## Requirements
* [Python 3.7 or higher](#python)
* [Git](#git)
### Python
To check if you have a compatible version of Python installed, use the following command:

    python -V

You can find the latest version of Python [here](https://www.python.org/downloads/).
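The same check can be made from inside Python, for example at the top of a setup script. The sketch below is a minimal illustration and not part of the library; the names `MIN_VERSION` and `is_supported` are made up for this example.

```python
import sys

# Minimum version taken from the Requirements list above.
MIN_VERSION = (3, 7)


def is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if is_supported():
        print("Python version is compatible.")
    else:
        print("Python 3.7 or higher is required.")
```

Comparing only the major and minor numbers keeps the check independent of patch releases.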
### Git
Install the [latest version of Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git).
## Pip Install
    pip3 install NlpToolkit-Corpus-Cy
## Download Code
To work on the code, create a fork from the GitHub page.
Use Git to clone the code to your local machine, or run the line below on Ubuntu:

    git clone <your-fork-git-link>

A directory called Corpus-Cy will be created. Alternatively, you can use the link below to explore the code:

    git clone https://github.com/olcaytaner/Corpus-Cy.git

## Open project with PyCharm IDE
Steps for opening the cloned project:
* Start the IDE
* Select **File | Open** from the main menu
* Choose the `Corpus-Cy` folder
* Select the open as project option
* After a couple of seconds, the dependencies will be downloaded.
Detailed Description
============
+ [Corpus](#corpus)
+ [TurkishSplitter](#turkishsplitter)
## Corpus
To store a corpus in memory:

    a = Corpus("derlem.txt")

If the corpus text is delimited by periods but not yet split into sentences, pass a splitter through the `splitterOrChecker` argument of the constructor:

    Corpus(self, fileName=None, splitterOrChecker=None)

To get the number of sentences in the corpus:

    sentenceCount(self) -> int

To get the ith sentence in the corpus:

    getSentence(self, index: int) -> Sentence

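The two methods above are enough to walk through a corpus. The following is a minimal sketch, not code from the library: `iter_sentences` is a hypothetical helper, it assumes that `getSentence` uses zero-based indices, and the import path in the usage comment is an assumption.

```python
def iter_sentences(corpus):
    """Yield every sentence of a corpus using sentenceCount() and getSentence().

    `corpus` can be any object that offers these two methods.
    Zero-based indexing is assumed.
    """
    for index in range(corpus.sentenceCount()):
        yield corpus.getSentence(index)


# Usage with the library (assumes the package is installed, that
# "derlem.txt" exists, and that this import path is correct):
#
#     from Corpus.Corpus import Corpus
#     corpus = Corpus("derlem.txt")
#     for sentence in iter_sentences(corpus):
#         print(sentence)
```

Writing the loop as a generator avoids building a second list of sentences when the corpus is large.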
## TurkishSplitter
The TurkishSplitter class splits text into sentences in accordance with the rules for the period (.) in Turkish.

    split(self, line: str) -> list

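Because `split` takes a single line and returns a list, splitting a multi-line text means calling it once per line and joining the results. The following is a minimal sketch that relies only on the `split` signature shown above; `split_lines` is a hypothetical helper, and the import path in the usage comment is an assumption.

```python
def split_lines(splitter, lines):
    """Apply splitter.split() to each non-blank line and flatten the results.

    `splitter` can be any object with a split(line) method that returns a list.
    """
    sentences = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            sentences.extend(splitter.split(line))
    return sentences


# Usage with the library (assumes the package is installed, that
# "derlem.txt" exists, and that this import path is correct):
#
#     from Corpus.TurkishSplitter import TurkishSplitter
#     with open("derlem.txt", encoding="utf-8") as f:
#         sentences = split_lines(TurkishSplitter(), f)
```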
Download Files
============
Source distribution: nlptoolkit_corpus_cy-1.0.24.tar.gz (798.2 kB)
## File details
Details for the file `nlptoolkit_corpus_cy-1.0.24.tar.gz`.
### File metadata
- Download URL: nlptoolkit_corpus_cy-1.0.24.tar.gz
- Upload date:
- Size: 798.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.0
### File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `0978d315acb5b59fda3edeee73db906f503302e453a27423742a41ef945cac7d` |
| MD5 | `4dafe46d2f28b65d2a627d9ae00a3055` |
| BLAKE2b-256 | `39ac921f37bfc4aac8c80e59aa242ae883d5db875ef5f1aee99aecbb835a8473` |