Evaluates the linguistic and structural quality of scientific texts.
# Confopy
Assesses the linguistic and structural quality of scientific texts. Written in Python.
Name origin: Confopy := Conform + Python
# Installation
    sudo apt-get install python-pdfminer
    # lxml==2.3.2
    sudo pip install -U lxml
    sudo pip install -U numpy
    sudo pip install -U pyyaml nltk
    sudo pip install -U pyenchant # spell checking
    sudo pip install -U pattern
    #sudo pip install -U pyparsing # for nltk_contrib
    # Install nltk_contrib:
    #cd confopy/contrib/nltk_contrib
    #python setup.py build
    #sudo python setup.py install
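
After installation, it can be useful to confirm that every dependency is actually importable. The following is a minimal sketch and not part of Confopy: the helper `missing_modules` and the list `CONFOPY_DEPENDENCIES` are hypothetical names, and the import names (`yaml` for pyyaml, `enchant` for pyenchant, `pdfminer` for python-pdfminer) are assumed to be the usual module names of those packages.

```python
import importlib

# Assumed import names for the packages installed above.
CONFOPY_DEPENDENCIES = [
    "pdfminer", "lxml", "numpy", "yaml", "nltk", "enchant", "pattern",
]


def missing_modules(names):
    """Return the entries of `names` that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:
            # Any failure during import (not only ImportError) counts
            # as "missing", e.g. a package that is broken on this
            # interpreter.
            missing.append(name)
    return missing
```

Calling `missing_modules(CONFOPY_DEPENDENCIES)` should return an empty list once all of the packages above are installed.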
# Getting a corpus
Confopy needs a corpus (collection of language data) to run.
For German (TIGER treebank):
1. Go to the license page: http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/license/htmlicense.html
2. Accept the license and download TIGER-XML Release 2.2: http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/download/tigercorpus-2.2.xml.tar.gz
3. Unpack the archive into confopy/localization/de/corpus\_de/
4. Run the patch script tiger\_release\_aug07.corrected.16012013\_patch.py in the same folder
5. Verify that the name of the generated file exactly matches the one given in confopy/config.py
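
Steps 3 and 5 above can be sketched in Python. This is only an illustration under stated assumptions, not a script shipped with Confopy: `unpack_corpus` and `corpus_file_present` are hypothetical helpers, and the expected file name must be looked up in confopy/config.py by hand. The download (which requires accepting the license) and the patch step are still done manually.

```python
import os
import tarfile


def unpack_corpus(archive_path, target_dir):
    """Extract a .tar.gz archive into target_dir, creating it if needed."""
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    with tarfile.open(archive_path, "r:gz") as archive:
        # Only use this on the archive downloaded from the official site.
        archive.extractall(target_dir)


def corpus_file_present(target_dir, expected_name):
    """Check that a file with exactly the expected name exists.

    expected_name is a placeholder for the name set in confopy/config.py.
    """
    return os.path.isfile(os.path.join(target_dir, expected_name))
```

For example, `unpack_corpus("tigercorpus-2.2.xml.tar.gz", "confopy/localization/de/corpus_de")` would cover step 3, and after running the patch, `corpus_file_present` can confirm step 5.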
# Python 3
* The python-pdfminer package requires Python 2.4 or newer, but it does not work with Python 3.
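
The version constraint above can be expressed as a small guard. This is a sketch only; `pdfminer_supported` is a hypothetical helper and is not part of Confopy or pdfminer.

```python
import sys


def pdfminer_supported(version_info=None):
    """Return True for Python 2.4 up to, but not including, Python 3."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor) against the supported range.
    return (2, 4) <= tuple(version_info[:2]) < (3, 0)
```

A launcher script could call `pdfminer_supported()` at startup and exit with a clear message when it returns `False`.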
# Unicode errors
* Configure your terminal to use Unicode.
* For Python developers: http://docs.python.org/2/howto/unicode.html#the-unicode-type
* Convert the TIGER Treebank Version 1 file "tiger_release_july03.penn" to UTF-8 encoding before using Confopy.
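
The conversion to UTF-8 can be done with a few lines of Python. The sketch below is an illustration, not part of Confopy: `convert_to_utf8` is a hypothetical helper, and the default source encoding of ISO-8859-1 is an assumption that should be checked against the actual file.

```python
import io


def convert_to_utf8(src_path, dst_path, src_encoding="iso-8859-1"):
    """Re-encode a text file as UTF-8.

    src_encoding is an assumption; pass the real encoding of the file.
    """
    # newline="" keeps the original line endings untouched.
    with io.open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

For example, `convert_to_utf8("tiger_release_july03.penn", "tiger_release_july03.utf8.penn")` writes a UTF-8 copy next to the original; the output name here is only a placeholder.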
# Download files
* File: Confopy-0.1.4.tar.gz
* Type: Source distribution
* Size: 35.8 kB
* Uploaded using Trusted Publishing: No

| Algorithm | Hash digest |
|---|---|
| SHA256 | cc6faeb2243d3a0ce7117dfead9ea8664bb8ade456b077b21efc7d139dfa6742 |
| MD5 | 8787582c5ad944e4f100948c6353313c |
| BLAKE2b-256 | befebed0e70fc289bb4df9a171199170db547f7706ebba9d3141706067a7a503 |