PyAutoCorpus
A Python interface to the excellent AutoCorpus library.
Right now, it only supports the wiki markup textify function, which strips out markup. In my benchmarks, this is roughly 40x faster than stripping markup with other libraries:
mwparserfromhell 0.208 sec/doc
wikitextparser 0.215 sec/doc
pyautocorpus 0.005 sec/doc
where:
- mwparserfromhell is mwparserfromhell.parse(x).strip_code()
- wikitextparser is wikitextparser.parse(x).plain_text()
- pyautocorpus is pyautocorpus.textify(x)
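The per-document timings above could be reproduced with a small harness along the following lines. This is only a sketch, not the script used for the numbers above: the helper name seconds_per_doc and its repeats parameter are hypothetical, and it accepts any callable that takes a wikitext string, so each of the three expressions listed above can be passed in as a function.

```python
import time
from typing import Callable, Iterable


def seconds_per_doc(
    strip_fn: Callable[[str], object],
    docs: Iterable[str],
    repeats: int = 3,
) -> float:
    """Return the best average wall-clock seconds per document.

    Hypothetical helper for illustration; not part of pyautocorpus.
    """
    docs = list(docs)
    if not docs:
        raise ValueError("docs must contain at least one document")

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for doc in docs:
            strip_fn(doc)
        elapsed = time.perf_counter() - start
        # Keep the fastest pass to reduce noise from other processes.
        best = min(best, elapsed / len(docs))
    return best


# Example (requires the libraries to be installed):
#   import mwparserfromhell, pyautocorpus
#   docs = [...]  # a list of wikitext strings
#   seconds_per_doc(lambda x: mwparserfromhell.parse(x).strip_code(), docs)
#   seconds_per_doc(pyautocorpus.textify, docs)
```

Taking the fastest of several passes is one common choice for micro-benchmarks; absolute numbers will vary with the machine and the documents used.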
Installing
From pypi:
Coming soon
From source:
You will first need the pcre library installed.
python setup.py install
Usage
Example:
>>> import pyautocorpus
>>> pyautocorpus.textify("==Wiki Marked up text==\n [[Some Page|link text]] example.")
'Wiki Marked up text\n\n\n link text example.'
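As the example shows, the returned text can keep blank lines and leading spaces from the original markup. If that matters downstream, a small post-processing step such as the following may help. This is a sketch only: tidy_plain_text is a hypothetical helper, not part of pyautocorpus, and it works on any string.

```python
def tidy_plain_text(text: str) -> str:
    """Strip each line and drop lines that are empty after stripping.

    Hypothetical helper for illustration; not part of pyautocorpus.
    """
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


# Applied to the output string from the usage example above:
print(tidy_plain_text('Wiki Marked up text\n\n\n link text example.'))
```

Whether to collapse blank lines depends on the use case; paragraph boundaries are lost with this approach.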
Known issues
- Windows is not yet supported
Credits
Contributors to this repository:
- Sean MacAvaney (University of Glasgow)
- Thomas Jänich (University of Glasgow)