PyAutoCorpus
A Python interface to the excellent AutoCorpus library.
Right now, it only supports the wiki markup textify function, which strips out markup. In my benchmarks, this is roughly 40x faster than stripping markup with other libraries (about 42x and 43x, respectively, based on the per-document times below):
Library | Time (sec/doc)
---|---
mwparserfromhell | 0.208
wikitextparser | 0.215
pyautocorpus | 0.005
where each library was called as:
- mwparserfromhell: mwparserfromhell.parse(x).strip_code()
- wikitextparser: wikitextparser.parse(x).plain_text()
- pyautocorpus: pyautocorpus.Textifier().textify(x)
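The benchmark corpus and measurement setup behind these timings are not described. A minimal sketch of how a similar comparison could be reproduced, assuming the three libraries are installed, wraps each call listed above in a function and averages wall-clock time per document with time.perf_counter. The sample document and repetition count are placeholders, not the original benchmark data, so absolute numbers will differ.

```python
import time
from typing import Callable, Dict, List


def load_strippers() -> Dict[str, Callable[[str], str]]:
    """Build the markup strippers compared above, skipping any library that is not installed."""
    strippers: Dict[str, Callable[[str], str]] = {}
    try:
        import mwparserfromhell
        strippers["mwparserfromhell"] = lambda x: mwparserfromhell.parse(x).strip_code()
    except ImportError:
        pass
    try:
        import wikitextparser
        strippers["wikitextparser"] = lambda x: wikitextparser.parse(x).plain_text()
    except ImportError:
        pass
    try:
        import pyautocorpus
        # Same expression as in the list above: a new Textifier per document.
        strippers["pyautocorpus"] = lambda x: pyautocorpus.Textifier().textify(x)
    except ImportError:
        pass
    return strippers


def seconds_per_doc(strip: Callable[[str], str], docs: List[str]) -> float:
    """Average wall-clock seconds spent stripping markup from each document."""
    start = time.perf_counter()
    for doc in docs:
        strip(doc)
    return (time.perf_counter() - start) / len(docs)


if __name__ == "__main__":
    # Placeholder input (hypothetical); substitute real wikitext, e.g. pages from a dump.
    docs = ["==Heading==\n'''Bold''' text with a [[Some Page|link]]."] * 100
    for name, strip in load_strippers().items():
        print(f"{name}: {seconds_per_doc(strip, docs):.6f} sec/doc")
```

Libraries that are not installed are skipped, so the script runs with whichever subset is available.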
Installing
From PyPI:
pip install pyautocorpus
From source:
Be sure to clone recursively:
git clone --recursive https://github.com/seanmacavaney/pyautocorpus.git
You will first need the PCRE library installed. Then, from the cloned pyautocorpus directory, run:
python setup.py install
Usage
Example:
>>> import pyautocorpus
>>> textifier = pyautocorpus.Textifier()
>>> textifier.textify("==Wiki Marked up text==\n [[Some Page|link text]] example.")
'Wiki Marked up text\n\n\n link text example.'
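The textify output above keeps the blank lines and the leading space left behind where markup was removed. If you need tighter plain text, a small post-processing step can normalize that whitespace. The tidy helper below is hypothetical and not part of pyautocorpus; it is a minimal sketch that strips each line and keeps at most one blank line between blocks.

```python
import re


def tidy(text: str) -> str:
    """Hypothetical helper (not part of pyautocorpus): normalize whitespace in textify() output."""
    # Strip leading/trailing spaces on every line, e.g. the space left before "link text".
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Collapse three or more consecutive newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Drop blank lines at the very start and end.
    return text.strip()


if __name__ == "__main__":
    # The output string from the example above.
    raw = "Wiki Marked up text\n\n\n link text example."
    print(repr(tidy(raw)))  # 'Wiki Marked up text\n\nlink text example.'
```

With pyautocorpus installed, it would be applied as tidy(textifier.textify(markup)). Collapsing whitespace discards some layout (for example, how many lines separated a heading from the following text), so skip it if you need that information.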
Known issues
- Windows is not yet supported
Credits
Contributors to this repository:
- Sean MacAvaney (University of Glasgow)
- Thomas Jänich (University of Glasgow)