PyAutoCorpus
A Python interface to the excellent AutoCorpus library.
Right now, it only supports the wiki markup textify function, which strips out markup. In my benchmarks, this is roughly 40x faster than stripping markup with other libraries:
- mwparserfromhell: 0.208 sec/doc
- wikitextparser: 0.215 sec/doc
- pyautocorpus: 0.005 sec/doc
where:
- mwparserfromhell is mwparserfromhell.parse(x).strip_code()
- wikitextparser is wikitextparser.parse(x).plain_text()
- pyautocorpus is pyautocorpus.Textifier().textify(x)
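To get a rough sec/doc figure on your own data, a small timing harness along these lines may help. This is only a sketch, not the script used for the numbers above: the helper names `seconds_per_doc` and `available_strippers` are made up for illustration, and the only library calls used are the three listed above.

```python
import time


def seconds_per_doc(strip_markup, docs):
    """Average wall-clock seconds that strip_markup spends per document."""
    if not docs:
        raise ValueError("docs must contain at least one document")
    start = time.perf_counter()
    for doc in docs:
        strip_markup(doc)
    return (time.perf_counter() - start) / len(docs)


def available_strippers():
    """Map library name -> callable, skipping libraries that are not installed."""
    strippers = {}
    try:
        import mwparserfromhell
        strippers["mwparserfromhell"] = lambda x: mwparserfromhell.parse(x).strip_code()
    except ImportError:
        pass
    try:
        import wikitextparser
        strippers["wikitextparser"] = lambda x: wikitextparser.parse(x).plain_text()
    except ImportError:
        pass
    try:
        import pyautocorpus
        # Create the Textifier once so setup cost is not counted per document.
        textifier = pyautocorpus.Textifier()
        strippers["pyautocorpus"] = textifier.textify
    except ImportError:
        pass
    return strippers


def run_benchmark(docs):
    """Time every installed library on the same list of wiki-markup strings."""
    return {name: seconds_per_doc(fn, docs) for name, fn in available_strippers().items()}
```

Call `run_benchmark(docs)` with a list of wiki-markup strings. Results will vary with document length and hardware, so treat them as relative rather than absolute.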
Installing
From pypi:
pip install pyautocorpus
From source:
Be sure to clone recursively:
git clone --recursive https://github.com/seanmacavaney/pyautocorpus.git
You will first need the pcre library installed. Then build and install:
python setup.py install
Usage
Example:
>>> import pyautocorpus
>>> textifier = pyautocorpus.Textifier()
>>> textifier.textify("==Wiki Marked up text==\n [[Some Page|link text]] example.")
'Wiki Marked up text\n\n\n link text example.'
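When processing many pages, it is probably simplest to create one Textifier and reuse it. The sketch below shows one way to do that. `textify_all` is a hypothetical helper, not part of pyautocorpus, and it assumes only the `textify` method shown in the example above.

```python
def textify_all(textifier, docs):
    """Lazily yield the plain text of each document, reusing one textifier.

    `textifier` can be any object with a textify(str) method,
    such as pyautocorpus.Textifier().
    """
    for doc in docs:
        yield textifier.textify(doc)


# Usage, assuming pyautocorpus is installed and `pages` is an
# iterable of wiki-markup strings:
#
#   import pyautocorpus
#   for text in textify_all(pyautocorpus.Textifier(), pages):
#       print(text)
```

Because the helper is a generator, it works on large dumps without holding every result in memory at once.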
Known issues
- Windows is not yet supported
Credits
Contributors to this repository:
- Sean MacAvaney (University of Glasgow)
- Thomas Jänich (University of Glasgow)