Reason this release was yanked:
incomplete
Project description
PyAutoCorpus
A Python interface to the excellent AutoCorpus library.
Right now, it only supports the wiki markup textify function, which strips out markup. In my benchmarks, this is roughly 40x faster than stripping markup with other libraries:
Library | Time (sec/doc)
---|---
mwparserfromhell | 0.208
wikitextparser | 0.215
pyautocorpus | 0.005
where:
- mwparserfromhell is mwparserfromhell.parse(x).strip_code()
- wikitextparser is wikitextparser.parse(x).plain_text()
- pyautocorpus is pyautocorpus.Textifier().textify(x)
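The comparison above could be reproduced with a small timing harness along these lines. This is only a sketch and not the script used for the numbers above: the helper names `seconds_per_doc` and `available_strippers`, the best-of-three repeat policy, and the sample document are all assumptions made for illustration. Only the three library calls come from the list above.

```python
import time


def seconds_per_doc(strip_markup, docs, repeats=3):
    """Return the best-of-`repeats` average time (seconds) to process one document.

    `strip_markup` is any callable taking a wiki-markup string (hypothetical helper).
    """
    if not docs:
        raise ValueError("docs must be non-empty")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for doc in docs:
            strip_markup(doc)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(docs))
    return best


def available_strippers():
    """Collect the three calls compared above, skipping libraries that are not installed."""
    strippers = {}
    try:
        import mwparserfromhell
        strippers["mwparserfromhell"] = lambda x: mwparserfromhell.parse(x).strip_code()
    except ImportError:
        pass
    try:
        import wikitextparser
        strippers["wikitextparser"] = lambda x: wikitextparser.parse(x).plain_text()
    except ImportError:
        pass
    try:
        import pyautocorpus
        strippers["pyautocorpus"] = lambda x: pyautocorpus.Textifier().textify(x)
    except ImportError:
        pass
    return strippers


if __name__ == "__main__":
    # Assumed sample input; real benchmarks should use full wiki articles.
    sample_docs = ["==Wiki Marked up text==\n [[Some Page|link text]] example."] * 100
    for name, fn in available_strippers().items():
        print(f"{name} {seconds_per_doc(fn, sample_docs):.6f} sec/doc")
```

Timings depend heavily on document length and hardware, so results from a short sample like this one will not match the figures in the table.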
Installing
From PyPI:
pip install pyautocorpus
From source:
Be sure to clone recursively:
git clone --recursive https://github.com/seanmacavaney/pyautocorpus.git
You will first need the pcre library installed.
python setup.py install
Usage
Example:
>>> import pyautocorpus
>>> textifier = pyautocorpus.Textifier()
>>> textifier.textify("==Wiki Marked up text==\n [[Some Page|link text]] example.")
'Wiki Marked up text\n\n\n link text example.'
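For more than one document, a thin wrapper like the one below may be convenient. It is a sketch under two assumptions: the helper name `textify_all` is hypothetical, and it assumes a single Textifier can safely be reused across documents, which the text above does not confirm.

```python
def textify_all(textifier, docs):
    """Yield plain text for each wiki-markup string in `docs`.

    `textifier` is any object with a textify(str) method, such as
    pyautocorpus.Textifier(). Reusing one instance is an assumption.
    """
    for doc in docs:
        yield textifier.textify(doc)


if __name__ == "__main__":
    try:
        import pyautocorpus
    except ImportError:
        print("pyautocorpus is not installed")
    else:
        pages = [
            "==Wiki Marked up text==\n [[Some Page|link text]] example.",
            "[[Another Page]] with '''bold''' text.",
        ]
        for text in textify_all(pyautocorpus.Textifier(), pages):
            print(repr(text))
```

Because the function is a generator, documents are processed one at a time, so a large dump can be streamed without holding every result in memory.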
Known issues
- Windows is not yet supported
Credits
Contributors to this repository:
- Sean MacAvaney (University of Glasgow)
- Thomas Jänich (University of Glasgow)
Download files
Source Distribution
pyautocorpus-0.1.4.tar.gz (11.3 kB)
Built Distributions
Hashes for pyautocorpus-0.1.4-cp39-cp39-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | fec77c713bec5b3db38f8e59a61725c7b67b59859b4f4ed32cd73b548767f705
MD5 | 19e63fa36a786003fce570799805f75b
BLAKE2b-256 | e24540d5cac7ce405d1b9b6c6aa6eee4c7076f240c6c0b137ce1a9b51b04b022
Hashes for pyautocorpus-0.1.4-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 15448f5cb89daf4abcaaab9cf058e9209a87c30d23353cf3285417f0eb93207a
MD5 | d87b62cde1574328151a838240410a97
BLAKE2b-256 | aa21fef8700d3f089523739b56aad69e5eba20c62204272f43a9e1768ddaf2c4
Hashes for pyautocorpus-0.1.4-cp38-cp38-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | e3ff22460aef502664068575603cc6636061d9ac998fc4d6adaf8ce60c6f527e
MD5 | b1dc7f0129649d1e25a3cc452330922f
BLAKE2b-256 | a214dbb26487958e8807be9e253cbf1d44efdcb7649b775d619439d17255638e
Hashes for pyautocorpus-0.1.4-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 7904f62ad42df781c7e070b3c57b49d970373afd6059b72b953cda4e1130f6ad
MD5 | 460635d56b58d09fec8612cecd818c9e
BLAKE2b-256 | 560a407b78289392f4a4c580e5c09df8c7d3675faa4d3e18bbf3eb4adb04aa2c
Hashes for pyautocorpus-0.1.4-cp37-cp37m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | bd3317c1bcd6fc6246e531f5ee5c7aea62a45eb95314ab2e8d0b1cf855318c6b
MD5 | 800bf37645ca5267f0539282e4a808f4
BLAKE2b-256 | 527e8464e19aea769a0f79d30cb1aac283c8f53303d7143adcd3363744944723
Hashes for pyautocorpus-0.1.4-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 76000d3db1fb8c04ef2f73014442f4505b44cf452d85dfcc1d97f91303b170b6
MD5 | a4b131fab9f996292d3b1ffecace1cfd
BLAKE2b-256 | e8fd16cd8fc386008095a22641f1827236604d189a8f51c8b9b051e6770bea9d
Hashes for pyautocorpus-0.1.4-cp36-cp36m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | b3f0f82ec48de0254cbddd054b5fa3e6e35c314d5d99c5b84f9e5589bd9ab2ba
MD5 | 4180ad409513e1f9c85075b6343ed186
BLAKE2b-256 | b9b72fd836f79e6f209e67f1ad3b97094701bc29bd623b737840532c285cf43d
Hashes for pyautocorpus-0.1.4-cp36-cp36m-macosx_10_14_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | eabdbd29a0aace4bbb0784d1fc9aa6429bd90054370c7bb85097e2c9f4a02b45
MD5 | dc84e5e01ebbf8670469c19fceb866a4
BLAKE2b-256 | 01d63865754de16e4d86e030e4fbceb90dac37ee694c0236bada545c472c0e56