A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.
Reason this release was yanked:
Installation issues fixed in v1.0.1.
Introduction
CAMeL Tools is a suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.
Please use GitHub Issues to report bugs or to ask for help using CAMeL Tools.
Installation
You will need Python 3.6 or later (64-bit).
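Before installing, it can help to confirm that the interpreter meets this requirement. The following is a minimal sketch using only the standard library; the function name `check_interpreter` is hypothetical and not part of CAMeL Tools.

```python
import struct
import sys

MIN_VERSION = (3, 6)  # minimum Python version stated in this README


def check_interpreter(version=None, pointer_bits=None):
    """Return a list of problems; an empty list means the interpreter qualifies.

    Both arguments default to the running interpreter and exist only so the
    check can be exercised with other values.
    """
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)[:2]
    # struct.calcsize("P") is the size of a C pointer in bytes (8 on 64-bit builds).
    bits = struct.calcsize("P") * 8 if pointer_bits is None else pointer_bits
    problems = []
    if version < MIN_VERSION:
        problems.append("Python %d.%d found; 3.6 or later is required" % version)
    if bits != 64:
        problems.append("%d-bit interpreter found; a 64-bit build is required" % bits)
    return problems


if __name__ == "__main__":
    for problem in check_interpreter():
        print(problem)
```

Running the script prints nothing when the current interpreter satisfies both conditions.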
Linux/macOS
Install using pip
pip install camel-tools
# or run the following if you already have camel_tools installed
pip install --upgrade --force-reinstall camel-tools
Install from source
# Clone the repo
git clone https://github.com/CAMeL-Lab/camel_tools.git
cd camel_tools
# Install from source
pip install .
# or run the following if you already have camel_tools installed
pip install --upgrade --force-reinstall .
Installing data
First, download either the Full data zip or the Light data zip (see Datasets for a comparison).
Unzip the file and then move and rename the unzipped directory to ~/.camel_tools. If installed correctly, there should be a direct path to ~/.camel_tools/data.
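The unzip, move, and rename step can also be scripted. The sketch below is only an illustration: it assumes the archive unpacks to a single top-level directory that contains a data subdirectory (the archive layout is not described here), and the function name `install_data` is hypothetical.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_data(zip_path, target=None):
    """Extract a data archive and move its top-level directory to target.

    Assumption: the archive contains exactly one top-level directory, which
    in turn holds a data subdirectory. target defaults to ~/.camel_tools.
    """
    target = Path(target) if target is not None else Path.home() / ".camel_tools"
    if target.exists():
        raise FileExistsError("%s already exists; remove it first" % target)
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(str(zip_path)) as archive:
            archive.extractall(tmp)
        # Ignore the __MACOSX metadata directory that some zip tools add.
        top_level = [p for p in Path(tmp).iterdir()
                     if p.is_dir() and p.name != "__MACOSX"]
        if len(top_level) != 1:
            raise ValueError("expected one top-level directory in the archive")
        target.parent.mkdir(parents=True, exist_ok=True)
        # Moving the directory to the target path also renames it.
        shutil.move(str(top_level[0]), str(target))
    if not (target / "data").is_dir():
        raise FileNotFoundError("%s has no data subdirectory" % target)
    return target
```

Calling `install_data("/path/to/downloaded.zip")` with the path of the archive you downloaded would place the data under ~/.camel_tools; the function refuses to overwrite an existing directory.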
Alternatively, if you would like to install the data in a different location, you need to set the CAMELTOOLS_DATA environment variable to the desired path.
Add the following to your .bashrc, .zshrc, .profile, etc.:
export CAMELTOOLS_DATA=/path/to/camel_tools_data
Again, data should be a subdirectory of the path set in CAMELTOOLS_DATA.
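To check which location is in effect, the lookup described above can be sketched in a few lines of Python. This is an illustration only: it assumes CAMELTOOLS_DATA takes precedence over the default ~/.camel_tools whenever it is set, and the helper names `resolve_data_root` and `data_is_installed` are hypothetical rather than part of the CAMeL Tools API.

```python
import os
from pathlib import Path


def resolve_data_root(environ=None, home=None):
    """Return the directory expected to contain the data subdirectory.

    Assumption: a non-empty CAMELTOOLS_DATA overrides the default location.
    """
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)
    custom = environ.get("CAMELTOOLS_DATA")
    if custom:
        return Path(custom).expanduser()
    return home / ".camel_tools"


def data_is_installed(root):
    """True if root has a data subdirectory, as the instructions require."""
    return (Path(root) / "data").is_dir()


if __name__ == "__main__":
    root = resolve_data_root()
    print(root, "ok" if data_is_installed(root) else "missing data subdirectory")
```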
Windows
Note: CAMeL Tools has been tested on Windows 10. The Dialect Identification component is not available on Windows at this time.
Install using pip
pip install camel-tools -f https://download.pytorch.org/whl/torch_stable.html
# or run the following if you already have camel_tools installed
pip install --upgrade --force-reinstall -f https://download.pytorch.org/whl/torch_stable.html camel-tools
Install from source
# Clone the repo
git clone https://github.com/CAMeL-Lab/camel_tools.git
cd camel_tools
# Install from source
pip install -f https://download.pytorch.org/whl/torch_stable.html .
# or run the following if you already have camel_tools installed
pip install --upgrade --force-reinstall -f https://download.pytorch.org/whl/torch_stable.html .
Installing data
First, download either the Full data zip or the Light data zip (see Datasets for a comparison).
Unzip the file and then move and rename the unzipped directory to C:\Users\your_user_name\AppData\Roaming\camel_tools.
If installed correctly, there should be a direct path to C:\Users\your_user_name\AppData\Roaming\camel_tools\data.
Alternatively, if you would like to install the data in a different location, you need to set the CAMELTOOLS_DATA environment variable to the desired path. Below are the instructions to do so (on Windows 10):
1. Press the Windows button and type env.
2. Click on Edit the system environment variables (Control panel).
3. Click on the Environment Variables… button.
4. Click on the New… button under the User variables panel.
5. Type CAMELTOOLS_DATA in the Variable name input box and the desired data path in Variable value. Alternatively, you can browse for the data directory by clicking on the Browse Directory… button.
6. Click OK on all the opened windows.
Again, data should be a subdirectory of the path set in CAMELTOOLS_DATA.
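On a typical Windows setup, C:\Users\your_user_name\AppData\Roaming is the value of the APPDATA environment variable, so the expected data path can be computed rather than typed out. The sketch below relies on that assumption, and on the assumption that CAMELTOOLS_DATA takes precedence when set; `windows_data_root` is a hypothetical helper, not a CAMeL Tools function. It uses PureWindowsPath, so it only builds paths and can be run on any operating system.

```python
from pathlib import PureWindowsPath


def windows_data_root(environ):
    """Return the directory expected to contain the data subdirectory.

    environ is a mapping such as os.environ. Assumptions: CAMELTOOLS_DATA
    wins when set, and APPDATA points at ...\\AppData\\Roaming.
    """
    custom = environ.get("CAMELTOOLS_DATA")
    if custom:
        return PureWindowsPath(custom)
    return PureWindowsPath(environ["APPDATA"]) / "camel_tools"


if __name__ == "__main__":
    example = {"APPDATA": r"C:\Users\your_user_name\AppData\Roaming"}
    print(windows_data_root(example) / "data")
```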
Datasets
We provide two data distributions for use with CAMeL Tools: Full and Light.
While the Full archive provides data for all components in CAMeL Tools, the Light archive contains data only for the morphological analyzer, the MLE Disambiguator, and any other components that depend solely on them.
The table below compares the features included in each distribution.
Feature | Full | Light |
---|---|---|
Size | 1.8 GB | 19 MB |
Morphology | ✓ | ✓ |
Disambiguation | ✓ | ✓ |
Taggers | ✓ | ✓ |
Tokenization | ✓ | ✓ |
Dialect Identification | ✓ | |
Sentiment Analysis | ✓ | |
Named Entity Recognition | ✓ | |
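As a quick way to decide which archive to download, the comparison above can be encoded as two sets. This is a small illustrative sketch; the feature names are copied from the table, while the function name `smallest_distribution` is hypothetical.

```python
# Features covered by each distribution, as listed in the comparison table.
LIGHT = {"Morphology", "Disambiguation", "Taggers", "Tokenization"}
FULL = LIGHT | {
    "Dialect Identification",
    "Sentiment Analysis",
    "Named Entity Recognition",
}


def smallest_distribution(needed):
    """Return "Light" if the Light archive covers every needed feature, else "Full"."""
    needed = set(needed)
    unknown = needed - FULL
    if unknown:
        raise ValueError("unknown features: %s" % ", ".join(sorted(unknown)))
    return "Light" if needed <= LIGHT else "Full"


if __name__ == "__main__":
    print(smallest_distribution(["Morphology", "Taggers"]))
    print(smallest_distribution(["Tokenization", "Sentiment Analysis"]))
```

The 19 MB Light archive is sufficient unless Dialect Identification, Sentiment Analysis, or Named Entity Recognition is needed.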
Documentation
The full online documentation covers both the command-line tools and the Python API.
Alternatively, you can build your own local copy of the documentation as follows:
# Install dependencies
pip install sphinx recommonmark sphinx-rtd-theme
# Go to docs subdirectory
cd docs
# Build HTML docs
make html
This should compile all the HTML documentation into docs/build/html.
Citation
If you find CAMeL Tools useful in your research, please cite our paper:
@inproceedings{obeid-etal-2020-camel,
title = "{CAM}e{L} Tools: An Open Source Python Toolkit for {A}rabic Natural Language Processing",
author = "Obeid, Ossama and
Zalmout, Nasser and
Khalifa, Salam and
Taji, Dima and
Oudah, Mai and
Alhafni, Bashar and
Inoue, Go and
Eryani, Fadhl and
Erdmann, Alexander and
Habash, Nizar",
booktitle = "Proceedings of the 12th Language Resources and Evaluation Conference",
month = may,
year = "2020",
address = "Marseille, France",
publisher = "European Language Resources Association",
url = "https://www.aclweb.org/anthology/2020.lrec-1.868",
pages = "7022--7032",
abstract = "We present CAMeL Tools, a collection of open-source tools for Arabic natural language processing in Python. CAMeL Tools currently provides utilities for pre-processing, morphological modeling, Dialect Identification, Named Entity Recognition and Sentiment Analysis. In this paper, we describe the design of CAMeL Tools and the functionalities it provides.",
language = "English",
ISBN = "979-10-95546-34-4",
}
License
CAMeL Tools is available under the MIT license. See the LICENSE file for more info.
Contribute
If you would like to contribute to CAMeL Tools, please read the CONTRIBUTE.rst file.
Contributors