Evaluates the linguistic and structural quality of scientific texts.
Project description
Confopy
=======
Asserts the linguistic and structural quality of scientific texts.
Confopy is a command-line tool that accepts one or multiple PDF documents and prints textual reports.
Currently it only works for German papers.
Name origin: Confopy := Conform + Python
Installation
============
Installation using pypi (preferred)
-----------------------------------
sudo pip install -U Confopy
Launch Confopy with
confopy --help
confopy -r document your_paper.pdf
Manual installation
-------------------
Dependencies:
sudo apt-get install python-pdfminer
sudo pip install -U lxml
sudo pip install numpy==1.6.2
sudo pip install pyyaml nltk==3.0.0
sudo pip install pyenchant==1.6.5
sudo pip install pattern==2.6
Launch Confopy with
python confopy/ --help
python confopy/ -r document your_paper.pdf
Usage
=====
$ confopy -h
usage: confopy [-h] [-l LANGUAGE] [-lx] [-ml] [-o OUTFILE] [-r REPORT] [-rl]
[-ul] [-vl] [-x]
[file [file ...]]
Language and structure checker for scientific documents.
positional arguments:
file Document file to analyze (PDF).
optional arguments:
-h, --help show this help message and exit
-l LANGUAGE, --language LANGUAGE
Language to use for PDF extraction and document
analysis. Default: de
-lx, --latex Tell the specified report to format output as LaTeX
(if supported by the report).
-ml, --metriclist Lists all available metrics by language and exits.
-o OUTFILE, --outfile OUTFILE
File to write the output too. Default: terminal
(stdout).
-r REPORT, --report REPORT
Analyses the given document according to the specified
report.
-rl, --reportlist Lists all available reports by language and exits.
-ul, --rulelist Lists all rules and exits.
-vl, --validate Validates a given XML against the XSD for the Confopy
data model.
-x, --xml Converts the PDF file(s) to Confopy XML (structure
orientated).
Getting a corpus
================
Confopy needs a corpus (collection of language data) to run.
For German (TIGER treebank):
Automated download:
1. Go to
<your python package directory>/confopy/localization/de/corpus\_de/
2. Execute the script
tiger_dl_patch.py
within that folder
Manual download:
1. Go to:
http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/license/htmlicense.html
2. Accept the license and download TIGER-XML Release 2.2:
http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/download/tigercorpus-2.2.xml.tar.gz
3. Unpack the archive into confopy/localization/de/corpus\_de/
4. Run the patch tiger\_release\_aug07.corrected.16012013\_patch.py in the same folder
5. Verify that the generated file is named exactly like in confopy/config.py
Python 3
========
* The package python-pdfminer only works with python 2.4 or newer, but not with python 3
Known Issues and Workarounds
===============================
enchant.errors.DictNotFoundError: Dictionary for language 'de_DE' could not be found
------------------------------------------------------------------------------------
Install the German aspell package. E.g. on Ubuntu 16.04:
```
sudo apt install aspell-de
```
Unicode errors
--------------
* Configure terminal to use unicode!
* For Python devs:
http://docs.python.org/2/howto/unicode.html#the-unicode-type
* Convert the TIGER Treebank file
"tiger_release_aug07.corrected.16012013.xml"
to utf-8 encoding before using Confopy!
=======
Asserts the linguistic and structural quality of scientific texts.
Confopy is a command-line tool that accepts one or multiple PDF documents and prints textual reports.
Currently it only works for German papers.
Name origin: Confopy := Conform + Python
Installation
============
Installation using pypi (preferred)
-----------------------------------
sudo pip install -U Confopy
Launch Confopy with
confopy --help
confopy -r document your_paper.pdf
Manual installation
-------------------
Dependencies:
sudo apt-get install python-pdfminer
sudo pip install -U lxml
sudo pip install numpy==1.6.2
sudo pip install pyyaml nltk==3.0.0
sudo pip install pyenchant==1.6.5
sudo pip install pattern==2.6
Launch Confopy with
python confopy/ --help
python confopy/ -r document your_paper.pdf
Usage
=====
$ confopy -h
usage: confopy [-h] [-l LANGUAGE] [-lx] [-ml] [-o OUTFILE] [-r REPORT] [-rl]
[-ul] [-vl] [-x]
[file [file ...]]
Language and structure checker for scientific documents.
positional arguments:
file Document file to analyze (PDF).
optional arguments:
-h, --help show this help message and exit
-l LANGUAGE, --language LANGUAGE
Language to use for PDF extraction and document
analysis. Default: de
-lx, --latex Tell the specified report to format output as LaTeX
(if supported by the report).
-ml, --metriclist Lists all available metrics by language and exits.
-o OUTFILE, --outfile OUTFILE
File to write the output too. Default: terminal
(stdout).
-r REPORT, --report REPORT
Analyses the given document according to the specified
report.
-rl, --reportlist Lists all available reports by language and exits.
-ul, --rulelist Lists all rules and exits.
-vl, --validate Validates a given XML against the XSD for the Confopy
data model.
-x, --xml Converts the PDF file(s) to Confopy XML (structure
orientated).
Getting a corpus
================
Confopy needs a corpus (collection of language data) to run.
For German (TIGER treebank):
Automated download:
1. Go to
<your python package directory>/confopy/localization/de/corpus\_de/
2. Execute the script
tiger_dl_patch.py
within that folder
Manual download:
1. Go to:
http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/license/htmlicense.html
2. Accept the license and download TIGER-XML Release 2.2:
http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/download/tigercorpus-2.2.xml.tar.gz
3. Unpack the archive into confopy/localization/de/corpus\_de/
4. Run the patch tiger\_release\_aug07.corrected.16012013\_patch.py in the same folder
5. Verify that the generated file is named exactly like in confopy/config.py
Python 3
========
* The package python-pdfminer only works with python 2.4 or newer, but not with python 3
Known Issues and Workarounds
===============================
enchant.errors.DictNotFoundError: Dictionary for language 'de_DE' could not be found
------------------------------------------------------------------------------------
Install the German aspell package. E.g. on Ubuntu 16.04:
```
sudo apt install aspell-de
```
Unicode errors
--------------
* Configure terminal to use unicode!
* For Python devs:
http://docs.python.org/2/howto/unicode.html#the-unicode-type
* Convert the TIGER Treebank file
"tiger_release_aug07.corrected.16012013.xml"
to utf-8 encoding before using Confopy!
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Confopy-0.4.11.tar.gz
(48.3 kB
view details)
File details
Details for the file Confopy-0.4.11.tar.gz
.
File metadata
- Download URL: Confopy-0.4.11.tar.gz
- Upload date:
- Size: 48.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 5409185b0a5bd89837ed75f2c4378b9d889439878a67d4884644b9e7e2c82072 |
|
MD5 | ead4cbb9facd4ed9ff24ffbf41b6f440 |
|
BLAKE2b-256 | c3036d429315bd3382c89ce4c78b2bf460cf7db98b18b5225532a6107f1b7628 |