PyAutoCorpus
A Python interface to the excellent AutoCorpus library.
Right now, it only supports the wiki markup textify function, which strips out
markup. In my benchmarks, this is roughly 40x faster than stripping markup with
other libraries:
mwparserfromhell 0.208 sec/doc
wikitextparser 0.215 sec/doc
pyautocorpus 0.005 sec/doc
where:
- mwparserfromhell is mwparserfromhell.parse(x).strip_code()
- wikitextparser is wikitextparser.parse(x).plain_text()
- pyautocorpus is pyautocorpus.Textifier().textify(x)
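A comparison like the one above can be reproduced with a small timing harness. The following is only a sketch: the helper name `seconds_per_doc` and the sample documents are hypothetical, and it is not the script that produced the numbers above. It times any callable that maps a markup string to plain text, so the same function works for all three libraries.

```python
import time
from typing import Callable, Iterable


def seconds_per_doc(strip_markup: Callable[[str], str], docs: Iterable[str]) -> float:
    """Return the mean wall-clock seconds spent per document."""
    docs = list(docs)
    if not docs:
        raise ValueError("need at least one document to time")
    start = time.perf_counter()
    for doc in docs:
        strip_markup(doc)
    return (time.perf_counter() - start) / len(docs)


if __name__ == "__main__":
    # Hypothetical sample input; a real benchmark would use full wiki articles.
    sample_docs = ["==Heading==\n [[Some Page|link text]] example."] * 100
    try:
        import pyautocorpus
    except ImportError:
        print("pyautocorpus is not installed; skipping")
    else:
        textifier = pyautocorpus.Textifier()
        rate = seconds_per_doc(textifier.textify, sample_docs)
        print(f"pyautocorpus {rate:.6f} sec/doc")
```

To time another library, pass its equivalent callable, for example `lambda x: mwparserfromhell.parse(x).strip_code()`. Results will vary with document length and hardware.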
Installing
From PyPI:
pip install pyautocorpus
From source:
Be sure to clone recursively:
git clone --recursive https://github.com/seanmacavaney/pyautocorpus.git
You will first need the PCRE library installed. Then run:
python setup.py install
Usage
Example:
>>> import pyautocorpus
>>> textifier = pyautocorpus.Textifier()
>>> textifier.textify("==Wiki Marked up text==\n [[Some Page|link text]] example.")
'Wiki Marked up text\n\n\n link text example.'
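When stripping many pages, the same call can be wrapped in a generator. This is a minimal sketch, not part of the library: `textify_all` is a hypothetical helper, and it assumes a single `Textifier` can be reused across calls, as the example above suggests but does not state.

```python
from typing import Callable, Iterable, Iterator


def textify_all(textify: Callable[[str], str], pages: Iterable[str]) -> Iterator[str]:
    """Lazily yield the plain text of each page, skipping whitespace-only results."""
    for page in pages:
        text = textify(page)
        if text.strip():
            yield text


if __name__ == "__main__":
    try:
        import pyautocorpus
    except ImportError:
        print("pyautocorpus is not installed; skipping")
    else:
        # Assumption: one Textifier instance is reused for every page.
        textifier = pyautocorpus.Textifier()
        pages = [
            "==Wiki Marked up text==\n [[Some Page|link text]] example.",
            "==Another page==\n More [[Other Page|linked]] text.",
        ]
        for text in textify_all(textifier.textify, pages):
            print(repr(text))
```

Because the helper is a generator, pages are processed one at a time and a large dump does not need to be held in memory.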
Known issues
- Windows is not yet supported
Credits
Contributors to this repository:
- Sean MacAvaney (University of Glasgow)
- Thomas Jänich (University of Glasgow)