CodeSwitch is an NLP tool for language identification, POS tagging, named entity recognition, and sentiment analysis of code-mixed data.
Code Switch
CodeSwitch is an NLP tool for language identification, POS tagging, named entity recognition, and sentiment analysis of code-mixed data.
Supported Code-Mixed Language
We trained a multilingual BERT model on the LinCE dataset using Hugging Face transformers. LinCE has data for four code-mixed language pairs, and we used three of them: spanish-english, hindi-english, and nepali-english. We hope to train models for other languages and tasks too.
- Spanish-English(spa-eng)
- Hindi-English(hin-eng)
- Nepali-English(nep-eng)
Language Code
- spa-eng for spanish-english
- hin-eng for hindi-english
- nep-eng for nepali-english
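The codes above can be checked before they are passed to any of the classes shown later. The following is a minimal sketch; `LANGUAGE_CODES` and `resolve_language_code` are hypothetical names introduced here for illustration and are not part of the codeswitch package.

```python
# Hypothetical helper, not part of codeswitch: maps the documented
# language codes to their language-pair names.
LANGUAGE_CODES = {
    "spa-eng": "spanish-english",
    "hin-eng": "hindi-english",
    "nep-eng": "nepali-english",
}


def resolve_language_code(code: str) -> str:
    """Return the language-pair name for a code, or raise ValueError."""
    normalized = code.strip().lower()
    if normalized not in LANGUAGE_CODES:
        supported = ", ".join(sorted(LANGUAGE_CODES))
        raise ValueError(
            f"Unsupported language code {code!r}; expected one of: {supported}"
        )
    return LANGUAGE_CODES[normalized]


print(resolve_language_code("spa-eng"))  # spanish-english
```

Validating the code up front gives a clear error message instead of a failure deeper inside model loading.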
Installation
pip install codeswitch
Dependency
- pytorch >=1.6.0
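To confirm that an installed PyTorch satisfies this requirement, a version string can be compared against the minimum. This is a rough sketch; `version_tuple` and `meets_minimum` are hypothetical helpers, and the parsing only looks at the leading numeric components of the version string.

```python
import re


def version_tuple(version: str) -> tuple:
    """Extract the leading numeric parts, e.g. '1.7.1+cu110' -> (1, 7, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Cannot parse version string {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def meets_minimum(version: str, minimum: tuple = (1, 6, 0)) -> bool:
    """Return True if the version is at least the required minimum."""
    return version_tuple(version) >= minimum


print(meets_minimum("1.7.1+cu110"))  # True
print(meets_minimum("1.5.1"))        # False
```

With PyTorch installed, the check would be called as `meets_minimum(torch.__version__)`.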
Training Details
- All three sequence tagging models (lid, ner, pos) were trained with huggingface token classification
- The sentiment analysis model was trained with huggingface text classification
- You can find every model and its evaluation results here
Features & Supported Language
- Language Identification
  - spanish-english
  - hindi-english
  - nepali-english
- POS
  - spanish-english
  - hindi-english
- NER
  - spanish-english
  - hindi-english
- Sentiment Analysis
  - spanish-english
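The feature list above can also be written as a lookup table, which makes it easy to check whether a task is available for a language pair. This is a sketch only; the `SUPPORTED` table and both functions are hypothetical, and the task keys (`lid`, `pos`, `ner`, `sentiment`) are labels chosen here rather than identifiers from the package.

```python
# Hypothetical support matrix transcribed from the feature list above.
SUPPORTED = {
    "lid": {"spa-eng", "hin-eng", "nep-eng"},
    "pos": {"spa-eng", "hin-eng"},
    "ner": {"spa-eng", "hin-eng"},
    "sentiment": {"spa-eng"},
}


def is_supported(task: str, language_code: str) -> bool:
    """Return True if the task is listed for the given language code."""
    return language_code in SUPPORTED.get(task, set())


def tasks_for(language_code: str) -> list:
    """Return the sorted task keys listed for a language code."""
    return sorted(
        task for task, codes in SUPPORTED.items() if language_code in codes
    )


print(is_supported("pos", "nep-eng"))  # False
print(tasks_for("hin-eng"))            # ['lid', 'ner', 'pos']
```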
Language Identification
from codeswitch.codeswitch import LanguageIdentification
lid = LanguageIdentification('spa-eng')
# for hindi-english use 'hin-eng'
# for nepali-english use 'nep-eng'
text = ""  # your code-mixed sentence
result = lid.identify(text)
print(result)
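The shape of `result` is not documented here. Assuming it follows the usual Hugging Face token-classification output, a list of dicts with `word` and `entity` keys in which BERT word pieces are prefixed with `##`, the pieces can be merged back into whole words as sketched below. The function name `merge_wordpieces`, the sample tokens, and the labels `lang1` and `lang2` are illustrative placeholders, not actual model output.

```python
def merge_wordpieces(tokens):
    """Join '##' continuation pieces onto the preceding word.

    Each merged word keeps the label of its first piece.
    Assumes each token is a dict with 'word' and 'entity' keys.
    """
    words = []
    for token in tokens:
        piece, label = token["word"], token["entity"]
        if piece.startswith("##") and words:
            # Continuation piece: append it to the previous word.
            prev_word, prev_label = words[-1]
            words[-1] = (prev_word + piece[2:], prev_label)
        else:
            # New word (or a stray continuation piece with nothing before it).
            words.append((piece, label))
    return words


# Made-up input in the assumed format, for illustration only.
sample = [
    {"word": "put", "entity": "lang1"},
    {"word": "##s", "entity": "lang1"},
    {"word": "gracias", "entity": "lang2"},
]
print(merge_wordpieces(sample))
# [('puts', 'lang1'), ('gracias', 'lang2')]
```

If the assumption about the output format holds, the same helper would apply to the results of the POS and NER taggers.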
POS Tagging
from codeswitch.codeswitch import POS
pos = POS('spa-eng')
# for hindi-english use 'hin-eng'
text = "" # your mixed sentence
result = pos.tag(text)
print(result)
NER Tagging
from codeswitch.codeswitch import NER
ner = NER('spa-eng')
# for hindi-english use 'hin-eng'
text = "" # your mixed sentence
result = ner.tag(text)
print(result)
Sentiment Analysis
from codeswitch.codeswitch import SentimentAnalysis
sa = SentimentAnalysis('spa-eng')
sentence = "El perro le ladraba a La Gatita .. .. lol #teamlagatita en las playas de Key Biscayne este Memorial day"
result = sa.analyze(sentence)
print(result)
# [{'label': 'LABEL_1', 'score': 0.9587041735649109}]
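The output above uses a generic label such as `LABEL_1`. The sketch below picks the highest-scoring entry and optionally renames its label. `top_label` is a hypothetical helper, and because the mapping from `LABEL_n` to sentiment names is not stated here, the caller has to supply it; the name used in the example is a placeholder.

```python
def top_label(result, label_names=None):
    """Return (label, score) for the highest-scoring entry.

    label_names is an optional dict that renames labels; labels missing
    from it are returned unchanged.
    """
    best = max(result, key=lambda item: item["score"])
    label = best["label"]
    if label_names is not None:
        label = label_names.get(label, label)
    return label, best["score"]


# Output copied from the example above.
result = [{'label': 'LABEL_1', 'score': 0.9587041735649109}]
print(top_label(result))
# ('LABEL_1', 0.9587041735649109)

# Placeholder mapping: replace the value with the real class name
# once it is known from the model's evaluation details.
print(top_label(result, {"LABEL_1": "name-for-label-1"}))
# ('name-for-label-1', 0.9587041735649109)
```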
Acknowledgement