Code Switch
CodeSwitch is an NLP tool for language identification, POS tagging, named entity recognition, and sentiment analysis of code-mixed data.
Supported Code-Mixed Languages
We trained a multilingual BERT model on the LinCE dataset using Hugging Face transformers. LinCE covers four code-mixed language pairs, and we used three of them: spanish-english, hindi-english, and nepali-english. We hope to train and add other languages and tasks too.
- Spanish-English (spa-eng)
- Hindi-English (hin-eng)
- Nepali-English (nep-eng)
Language Code
- spa-eng for spanish-english
- hin-eng for hindi-english
- nep-eng for nepali-english
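The codes above can be kept in a small lookup table so that a mistyped code fails early, before any model is loaded. This is only a sketch: `LANGUAGE_CODES` and `describe_language` are hypothetical helper names and are not part of the codeswitch package.

```python
# Hypothetical helper, not part of codeswitch: maps each language code
# listed above to the language pair it stands for.
LANGUAGE_CODES = {
    "spa-eng": "spanish-english",
    "hin-eng": "hindi-english",
    "nep-eng": "nepali-english",
}


def describe_language(code: str) -> str:
    """Return the language pair for a code, or raise ValueError if unknown."""
    try:
        return LANGUAGE_CODES[code]
    except KeyError:
        known = ", ".join(sorted(LANGUAGE_CODES))
        raise ValueError(
            f"unknown language code {code!r}; expected one of: {known}"
        ) from None


print(describe_language("hin-eng"))  # hindi-english
```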
Installation
pip install codeswitch
Dependency
- pytorch >=1.6.0
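If you want to confirm the PyTorch requirement before running anything, a check along these lines may help. It is a minimal sketch using only the standard library; `version_tuple` and `torch_is_new_enough` are hypothetical names, and the parsing deliberately ignores build and pre-release suffixes.

```python
import re
from importlib import metadata

MINIMUM_TORCH = (1, 6, 0)  # from the dependency list above


def version_tuple(version: str) -> tuple:
    """Turn '1.6.0+cu101' into (1, 6, 0), ignoring build/pre-release suffixes."""
    core = version.split("+")[0]
    parts = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def torch_is_new_enough(installed=None) -> bool:
    """Check a version string (or the installed torch) against the minimum."""
    if installed is None:
        try:
            installed = metadata.version("torch")
        except metadata.PackageNotFoundError:
            return False  # torch is not installed at all
    return version_tuple(installed) >= MINIMUM_TORCH


print(torch_is_new_enough("1.5.1"))        # False
print(torch_is_new_enough("1.6.0+cu101"))  # True
```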
Training Details
- All three sequence tagging models (lid, ner, pos) were trained with Hugging Face token classification
- The sentiment analysis model was trained with Hugging Face text classification
- You can find every model and evaluation results here
Features & Supported Languages
- Language Identification
  - spanish-english
  - hindi-english
  - nepali-english
- POS
  - spanish-english
  - hindi-english
- NER
  - spanish-english
  - hindi-english
- Sentiment Analysis
  - spanish-english
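The feature list can also be written as a lookup table, which makes it easy to check whether a task is available for a language pair before constructing a model. This is a sketch under assumptions: `SUPPORTED`, `is_supported`, and `tasks_for` are hypothetical names, and the task key `sentiment` is chosen here for illustration; none of them come from the codeswitch package.

```python
# Hypothetical helper, not part of codeswitch: the feature list above,
# written as task -> set of supported language codes.
SUPPORTED = {
    "lid": {"spa-eng", "hin-eng", "nep-eng"},
    "pos": {"spa-eng", "hin-eng"},
    "ner": {"spa-eng", "hin-eng"},
    "sentiment": {"spa-eng"},
}


def is_supported(task: str, code: str) -> bool:
    """True if the given task is listed for the given language code."""
    return code in SUPPORTED.get(task, set())


def tasks_for(code: str) -> list:
    """All tasks listed for a language code, in alphabetical order."""
    return sorted(task for task, codes in SUPPORTED.items() if code in codes)


print(tasks_for("nep-eng"))                  # ['lid']
print(is_supported("sentiment", "hin-eng"))  # False
```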
Language Identification
from codeswitch.codeswitch import LanguageIdentification
lid = LanguageIdentification('spa-eng')
# for hindi-english use 'hin-eng',
# for nepali-english use 'nep-eng'
text = "" # your code-mixed sentence
result = lid.identify(text)
print(result)
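BERT-style models split rare words into sub-word pieces, so tagger output may contain pieces such as `##ina` rather than whole words. The sketch below joins such pieces back together. It assumes the result is a list of dicts with `word` and `entity` keys, as produced by a Hugging Face token classification pipeline; that shape is an assumption, not something stated here, and `merge_subwords` and the sample data are hypothetical.

```python
def merge_subwords(tokens):
    """Join WordPiece continuation pieces ('##...') onto the previous word.

    Each merged word keeps the label of its first piece. A leading '##'
    piece with nothing before it is kept unchanged.
    """
    words = []
    for token in tokens:
        piece, label = token["word"], token["entity"]
        if piece.startswith("##") and words:
            prev_word, prev_label = words[-1]
            words[-1] = (prev_word + piece[2:], prev_label)
        else:
            words.append((piece, label))
    return words


# Made-up sample in the assumed output shape; not real model output.
sample = [
    {"word": "put", "entity": "lang1"},
    {"word": "it", "entity": "lang1"},
    {"word": "en", "entity": "lang2"},
    {"word": "la", "entity": "lang2"},
    {"word": "coc", "entity": "lang2"},
    {"word": "##ina", "entity": "lang2"},
]
print(merge_subwords(sample))
```

The same helper would apply to POS and NER results if they share this shape.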
POS Tagging
from codeswitch.codeswitch import POS
pos = POS('spa-eng')
# for hindi-english use 'hin-eng'
text = "" # your mixed sentence
result = pos.tag(text)
print(result)
NER Tagging
from codeswitch.codeswitch import NER
ner = NER('spa-eng')
# for hindi-english use 'hin-eng'
text = "" # your mixed sentence
result = ner.tag(text)
print(result)
Sentiment Analysis
from codeswitch.codeswitch import SentimentAnalysis
sa = SentimentAnalysis('spa-eng')
sentence = "El perro le ladraba a La Gatita .. .. lol #teamlagatita en las playas de Key Biscayne este Memorial day"
result = sa.analyze(sentence)
print(result)
# [{'label': 'LABEL_1', 'score': 0.9587041735649109}]
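The result is a list of label/score dicts with generic label names such as `LABEL_1`. A small helper can pick the highest-scoring entry and optionally translate the label. This is a sketch: `top_prediction` is a hypothetical name, and what each `LABEL_n` means is not stated here, so any mapping you pass in should come from the model's own documentation.

```python
def top_prediction(result, label_names=None):
    """Return (label, score) for the highest-scoring entry in the result.

    label_names is an optional dict that renames generic labels; labels
    missing from it are returned unchanged.
    """
    best = max(result, key=lambda item: item["score"])
    label = best["label"]
    if label_names:
        label = label_names.get(label, label)
    return label, best["score"]


# The output shown in the example above.
result = [{"label": "LABEL_1", "score": 0.9587041735649109}]
print(top_prediction(result))  # ('LABEL_1', 0.9587041735649109)
```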
Acknowledgement