PyAutoCorpus
A Python interface to the excellent AutoCorpus library.
Right now, it supports only the wiki markup textify function, which strips markup from text. In my benchmarks, this is about 40x faster than stripping markup with other libraries:
Library | Call benchmarked | Time per document
---|---|---
mwparserfromhell | mwparserfromhell.parse(x).strip_code() | 0.208 s
wikitextparser | wikitextparser.parse(x).plain_text() | 0.215 s
pyautocorpus | pyautocorpus.Textifier().textify(x) | 0.005 s
(x is the wiki markup of one document.)
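The figures above come from the author's own benchmarks; the corpus and hardware are not stated. A minimal harness for re-running the same three calls on your own documents might look like the sketch below. The helper names seconds_per_doc and candidate_strippers are hypothetical, and the sample document is a placeholder: short inputs like it will run far faster per document than real articles.

```python
import time
from typing import Callable, Dict, List


def seconds_per_doc(strip: Callable[[str], str], docs: List[str], repeats: int = 3) -> float:
    """Return the best (lowest) mean wall-clock seconds per document over `repeats` runs."""
    if not docs:
        raise ValueError("need at least one document to time")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for doc in docs:
            strip(doc)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(docs))
    return best


def candidate_strippers() -> Dict[str, Callable[[str], str]]:
    """Build the three calls from the table above, skipping libraries that are not installed."""
    strippers: Dict[str, Callable[[str], str]] = {}
    try:
        import mwparserfromhell
        strippers["mwparserfromhell"] = lambda x: mwparserfromhell.parse(x).strip_code()
    except ImportError:
        pass
    try:
        import wikitextparser
        strippers["wikitextparser"] = lambda x: wikitextparser.parse(x).plain_text()
    except ImportError:
        pass
    try:
        import pyautocorpus
        # Mirrors the benchmarked expression, which builds a Textifier per call.
        strippers["pyautocorpus"] = lambda x: pyautocorpus.Textifier().textify(x)
    except ImportError:
        pass
    return strippers


if __name__ == "__main__":
    # Placeholder corpus; substitute real wiki markup (e.g. articles from a dump).
    docs = ["==Heading==\n [[Some Page|link text]] example."] * 200
    for name, strip in candidate_strippers().items():
        print(f"{name}: {seconds_per_doc(strip, docs):.6f} sec/doc")
```

Taking the best of several runs reduces noise from warm-up and other processes; the ratio between libraries is more meaningful than the absolute numbers.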
Installing
From PyPI:
pip install pyautocorpus
From source:
You will first need the PCRE library installed. Then clone the repository recursively (so that its submodules are fetched) and run the setup script:
git clone --recursive https://github.com/seanmacavaney/pyautocorpus.git
cd pyautocorpus
python setup.py install
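Before building from source, you may want to confirm that a PCRE library is visible on your system. The sketch below assumes the classic libpcre (not PCRE2) is the library meant by "pcre"; pcre_runtime_found is a hypothetical helper. It only detects the shared library through ctypes.util.find_library, so it cannot tell whether the development headers needed for compilation are installed.

```python
import ctypes.util


def pcre_runtime_found() -> bool:
    """Return True if the dynamic loader can locate a shared libpcre."""
    return ctypes.util.find_library("pcre") is not None


if __name__ == "__main__":
    if pcre_runtime_found():
        print("libpcre found (the matching headers may still need a separate -dev package)")
    else:
        print("libpcre not found; install PCRE with your package manager before building")
```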
Usage
Example:
>>> import pyautocorpus
>>> textifier = pyautocorpus.Textifier()
>>> textifier.textify("==Wiki Marked up text==\n [[Some Page|link text]] example.")
'Wiki Marked up text\n\n\n link text example.'
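When processing many documents, one option is to create a single Textifier and reuse it for every document, yielding results lazily so a large corpus never has to sit in memory at once. This is a sketch that assumes a Textifier can safely be reused across calls; textify_all is a hypothetical helper, and its optional textifier argument exists only so the function can be exercised with a stand-in object.

```python
from typing import Iterable, Iterator, Optional


def textify_all(docs: Iterable[str], textifier: Optional[object] = None) -> Iterator[str]:
    """Yield the plain text of each wiki-markup document, one at a time."""
    if textifier is None:
        import pyautocorpus  # imported lazily so a stand-in textifier can be injected
        textifier = pyautocorpus.Textifier()  # assumed safe to reuse across documents
    for doc in docs:
        yield textifier.textify(doc)
```

For example, `for text in textify_all(open_dump_articles()):` would stream plain text for each article, where open_dump_articles stands for whatever hypothetical function yields the raw markup in your pipeline.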
Known issues
- Windows is not yet supported
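Code that must also run on Windows could prefer pyautocorpus where it is supported and fall back to a slower library elsewhere. The sketch below assumes mwparserfromhell as the fallback (installed separately); make_stripper is a hypothetical helper. Note that strip_code() and textify format their output differently, so results will not be identical across platforms.

```python
import sys
from typing import Callable


def make_stripper() -> Callable[[str], str]:
    """Return a markup-stripping function, preferring pyautocorpus off Windows."""
    if not sys.platform.startswith("win"):  # Windows is listed as not yet supported
        try:
            import pyautocorpus
            textifier = pyautocorpus.Textifier()
            return textifier.textify
        except ImportError:
            pass
    # Fallback: mwparserfromhell, roughly 40x slower per the benchmark above.
    import mwparserfromhell
    return lambda text: mwparserfromhell.parse(text).strip_code()
```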
Credits
Contributors to this repository:
- Sean MacAvaney (University of Glasgow)
- Thomas Jänich (University of Glasgow)
Download files
Source distribution:
- pyautocorpus-0.1.6.tar.gz (11.3 kB)
Built distributions:
File | SHA256
---|---
pyautocorpus-0.1.6-pp37-pypy37_pp73-manylinux_2_12_x86_64.manylinux2010_x86_64.whl | 1478b7966a694bcb0427ee23c7ef8948adea03b68feaa073164a800570b3f89c
pyautocorpus-0.1.6-pp37-pypy37_pp73-manylinux_2_12_i686.manylinux2010_i686.whl | 94845606fb666fbd6744cd9ab0721fadffb428bbc0c279f82ddc97587b45ead1
pyautocorpus-0.1.6-cp310-cp310-win_amd64.whl | 7f6be9dbdf05c0edf45dae2368b8defb3e524be09843646547a16d8a5949d7ec
pyautocorpus-0.1.6-cp310-cp310-macosx_10_14_x86_64.whl | 0f32b1efb3fd6db911adb1ac97bc2414c82a46c401b9271d45973b777b78a968
pyautocorpus-0.1.6-cp39-cp39-win_amd64.whl | 73f248bcd0b3b3a728cccb8852b6f77956006b4123bd3d077c94964c217ed98b
pyautocorpus-0.1.6-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl | b2ac75e3d06b118aa536fa4e839ae984788a2bcf642063a2e681cdbd0a165448
pyautocorpus-0.1.6-cp39-cp39-manylinux_2_12_i686.manylinux2010_i686.whl | 1c66996e1b9fb58929cef995617061d2e1275b6da33e99b607919e8a2900a646
pyautocorpus-0.1.6-cp39-cp39-macosx_10_14_x86_64.whl | 7b78b49eb0a40555627a43e7c43a5ce7768d64275fad287ef01ae80aab822193
pyautocorpus-0.1.6-cp38-cp38-win_amd64.whl | d976a54eb7c1f9015d200ff73f09215b03ad5cdb59fee8930ce11567944ee452
pyautocorpus-0.1.6-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl | 09e4b51701ff2f15feb9f7c1321ba44299964d9cea7892638d97d3893309be6b
pyautocorpus-0.1.6-cp38-cp38-manylinux_2_12_i686.manylinux2010_i686.whl | bdd9aa51fc158f534b800e2635f820fabb880ea54d668026abf0e2f8250f65ac
pyautocorpus-0.1.6-cp38-cp38-macosx_10_14_x86_64.whl | 9cfd823e4734ca6e8ea4add84d9ded496bdbedfa72032ffe3931d0e63d717beb
pyautocorpus-0.1.6-cp37-cp37m-win_amd64.whl | cf1796c62910e69015aa14723a5b9a7be582994e3db4381476df343ff6b39a36
pyautocorpus-0.1.6-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl | 01f18e1223491258605d2371c50c6fa6827fce4a3c14379e01d401d5d07647e6
pyautocorpus-0.1.6-cp37-cp37m-manylinux_2_12_i686.manylinux2010_i686.whl | a9dbba5254733c22177b3035279c8bc2554eadb8880a0fc6ed3991c00fadf222
pyautocorpus-0.1.6-cp37-cp37m-macosx_10_14_x86_64.whl | c075825b1b0af72f00462b3c73bc8a8b6e6689f65750052fcc395e9cb14a9eaa
pyautocorpus-0.1.6-cp36-cp36m-win_amd64.whl | 21228bb1ecdf156faf9f8d5058f8e91ebd27dfa1fdc884f408d14341336b947b
pyautocorpus-0.1.6-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl | d37b18281f6124ef3780757b532a86879be9156a2d072bce60ee3a22d28aae84
pyautocorpus-0.1.6-cp36-cp36m-manylinux_2_12_i686.manylinux2010_i686.whl | f5ff2ce8171136784ab0664be9f4be895529e0df1a15018db24c1c59c63dfc57
pyautocorpus-0.1.6-cp36-cp36m-macosx_10_14_x86_64.whl | 9c3a6cba1fb409e62e1d4f9dbb92063730e3f3e776aad81c01fc2193fe1e8c3a
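To check a downloaded file against the SHA256 digests listed above, a small sketch using the standard hashlib module is enough; sha256_of and verify are hypothetical helper names. The file is read in chunks so large archives are not loaded into memory at once.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_sha256: str) -> bool:
    """Return True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, verify("pyautocorpus-0.1.6-cp39-cp39-win_amd64.whl", "73f248bcd0b3b3a728cccb8852b6f77956006b4123bd3d077c94964c217ed98b") should return True for an intact download. pip can also enforce digests itself through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).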