PyAutoCorpus

A Python interface to the excellent AutoCorpus library.

Currently, it only supports the wiki markup textify function, which strips markup and returns plain text. In my benchmarks, this is roughly 40x faster than stripping markup with other libraries:

mwparserfromhell 0.208 sec/doc
wikitextparser 0.215 sec/doc
pyautocorpus 0.005 sec/doc

where:

  • mwparserfromhell is mwparserfromhell.parse(x).strip_code()
  • wikitextparser is wikitextparser.parse(x).plain_text()
  • pyautocorpus is pyautocorpus.Textifier().textify(x)
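A comparison along these lines could be reproduced with a small timing harness. This is only a sketch, not the benchmark script behind the numbers above: the helper names `seconds_per_doc` and `available_strippers` are made up for illustration, the sample document is a placeholder, and any library that is not installed is simply skipped.

```python
import time


def seconds_per_doc(strip_fn, docs, repeats=3):
    """Return the best average wall-clock seconds per document over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for doc in docs:
            strip_fn(doc)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(docs))
    return best


def available_strippers():
    """Map a library name to the markup-stripping call listed above, if importable."""
    strippers = {}
    try:
        import mwparserfromhell
        strippers["mwparserfromhell"] = lambda x: mwparserfromhell.parse(x).strip_code()
    except ImportError:
        pass
    try:
        import wikitextparser
        strippers["wikitextparser"] = lambda x: wikitextparser.parse(x).plain_text()
    except ImportError:
        pass
    try:
        import pyautocorpus
        strippers["pyautocorpus"] = lambda x: pyautocorpus.Textifier().textify(x)
    except ImportError:
        pass
    return strippers


if __name__ == "__main__":
    # Placeholder corpus; real timings depend heavily on document size and content.
    docs = ["==Wiki Marked up text==\n [[Some Page|link text]] example."] * 100
    for name, strip_fn in available_strippers().items():
        print(f"{name} {seconds_per_doc(strip_fn, docs):.6f} sec/doc")
```

Taking the best of several runs reduces noise from other processes. Absolute numbers will differ from the table above unless the same documents and hardware are used.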

Installing

From PyPI:

pip install pyautocorpus

From source:

Be sure to clone recursively:

git clone --recursive https://github.com/seanmacavaney/pyautocorpus.git

You will need the pcre library installed before running the install:

python setup.py install

Usage

Example:

>>> import pyautocorpus
>>> textifier = pyautocorpus.Textifier()
>>> textifier.textify("==Wiki Marked up text==\n [[Some Page|link text]] example.")
'Wiki Marked up text\n\n\n link text example.'
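For more than one document, the same call can be applied in a loop. The sketch below assumes only the `Textifier().textify(x)` call shown above. `textify_all` is a hypothetical helper, not part of pyautocorpus, and reusing one `Textifier` across documents is an assumption rather than documented behaviour.

```python
def textify_all(textifier, documents):
    """Yield plain text for each wiki-markup string, using one shared textifier.

    `textifier` is any object with a `textify(str) -> str` method, such as
    `pyautocorpus.Textifier()`.
    """
    for markup in documents:
        yield textifier.textify(markup)


if __name__ == "__main__":
    import pyautocorpus  # imported here so the helper itself has no hard dependency

    pages = [
        "==Wiki Marked up text==\n [[Some Page|link text]] example.",
        "Another [[Some Page|link text]] example.",
    ]
    for text in textify_all(pyautocorpus.Textifier(), pages):
        print(repr(text))
```

Writing the helper as a generator keeps memory use flat when the documents come from a large dump.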

Known issues

  • Windows is not yet supported

Credits

AutoCorpus

Contributors to this repository:

  • Sean MacAvaney (University of Glasgow)
  • Thomas Jänich (University of Glasgow)

Download files

Source Distribution

  • pyautocorpus-0.1.13.tar.gz (10.9 kB): Source

Built Distributions

  • pyautocorpus-0.1.13-cp314-cp314-win_amd64.whl (6.9 kB): CPython 3.14, Windows x86-64
  • pyautocorpus-0.1.13-cp314-cp314-macosx_10_15_universal2.whl (45.2 kB): CPython 3.14, macOS 10.15+ universal2 (ARM64, x86-64)
  • pyautocorpus-0.1.13-cp313-cp313-win_amd64.whl (6.7 kB): CPython 3.13, Windows x86-64
  • pyautocorpus-0.1.13-cp313-cp313-macosx_10_13_universal2.whl (45.0 kB): CPython 3.13, macOS 10.13+ universal2 (ARM64, x86-64)
  • pyautocorpus-0.1.13-cp312-cp312-win_amd64.whl (6.7 kB): CPython 3.12, Windows x86-64
  • pyautocorpus-0.1.13-cp312-cp312-macosx_10_13_universal2.whl (45.0 kB): CPython 3.12, macOS 10.13+ universal2 (ARM64, x86-64)
  • pyautocorpus-0.1.13-cp311-cp311-win_amd64.whl (6.6 kB): CPython 3.11, Windows x86-64
  • pyautocorpus-0.1.13-cp311-cp311-macosx_10_9_universal2.whl (44.9 kB): CPython 3.11, macOS 10.9+ universal2 (ARM64, x86-64)
  • pyautocorpus-0.1.13-cp310-cp310-win_amd64.whl (6.6 kB): CPython 3.10, Windows x86-64
  • pyautocorpus-0.1.13-cp310-cp310-macosx_10_9_universal2.whl (44.9 kB): CPython 3.10, macOS 10.9+ universal2 (ARM64, x86-64)
  • pyautocorpus-0.1.13-cp39-cp39-win_amd64.whl (6.7 kB): CPython 3.9, Windows x86-64
  • pyautocorpus-0.1.13-cp39-cp39-macosx_10_9_universal2.whl (44.9 kB): CPython 3.9, macOS 10.9+ universal2 (ARM64, x86-64)

File details

Details for the file pyautocorpus-0.1.13.tar.gz.

File metadata

  • Download URL: pyautocorpus-0.1.13.tar.gz
  • Upload date:
  • Size: 10.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for pyautocorpus-0.1.13.tar.gz
Algorithm Hash digest
SHA256 b2ff0ebbc38ca3ed869a23ea219b0bc8ede3cf938394828e9401058bb0657213
MD5 5db02ed6ed436fcfcf7405153b4f5043
BLAKE2b-256 d6904e3b3c09e57db8e0aecddb2f2f4b4d2b4b094a82950107dfcebd5d1d0ffb

See more details on using hashes here.
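One way to use the published digests is to hash a downloaded file locally and compare the result. The following is a minimal sketch using only the standard library. The helper names and the local file path are assumptions for illustration; the expected value is the SHA256 listed above for the source distribution.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare a file's SHA256 with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the archive was saved.
    archive = Path("pyautocorpus-0.1.13.tar.gz")
    expected = "b2ff0ebbc38ca3ed869a23ea219b0bc8ede3cf938394828e9401058bb0657213"
    if archive.exists():
        print("OK" if matches_sha256(archive, expected) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

The same check works for any of the wheels by swapping in the file name and its SHA256 from the corresponding section.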

File details

Details for the file pyautocorpus-0.1.13-cp314-cp314-win_amd64.whl.

File hashes

Hashes for pyautocorpus-0.1.13-cp314-cp314-win_amd64.whl
Algorithm Hash digest
SHA256 d0c262a8dab565bb893c7c898c00ef628ee544a5381dc787ed0f2b90ee2edf59
MD5 0cba83c13ca55065be2580fd1c681d5e
BLAKE2b-256 68f531458c037905762640ec2a8ef0faf69731c0eb7ce760f4501990689495e0

File details

Details for the file pyautocorpus-0.1.13-cp314-cp314-macosx_10_15_universal2.whl.

File hashes

Hashes for pyautocorpus-0.1.13-cp314-cp314-macosx_10_15_universal2.whl
Algorithm Hash digest
SHA256 719b71c58e5f5ef06ea7fd8d3c5b97e59d7269ec139bd633b330a09f3b9dcb3a
MD5 30985e1cf25ece65e8629ee3c6c07b1c
BLAKE2b-256 aabe5a04c48d9d4d372171f2a01aa532d2d16fd2ca38274f417c0656561b003a

File details

Details for the file pyautocorpus-0.1.13-cp313-cp313-win_amd64.whl.

File hashes

Hashes for pyautocorpus-0.1.13-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 25f1e54de2f83d966d81b54cb204581317bbc4f336fd3d2720b13cfce3232f5d
MD5 8a1f029d0110089b93360eced4fb9ccc
BLAKE2b-256 9b9c952dca075ecc6a920c51c6a00dc3e3d34a38d52b09cf664f63e069f8c4a2

File details

Details for the file pyautocorpus-0.1.13-cp313-cp313-macosx_10_13_universal2.whl.

File hashes

Hashes for pyautocorpus-0.1.13-cp313-cp313-macosx_10_13_universal2.whl
Algorithm Hash digest
SHA256 5150102a062af1126b4723a9b5755ff4c94f7aae3bda4139eb888c36b2bb9124
MD5 1c3c306897347d6a0b8cb618a2179c7f
BLAKE2b-256 8f9d023d01cba63f0dc10ee62ebe109cacbee231b23fec0d60be8be24c0d9afe

File details

Details for the file pyautocorpus-0.1.13-cp312-cp312-win_amd64.whl.

File hashes

Hashes for pyautocorpus-0.1.13-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 f6d9014044097ccba5b1bac53d0699fdff329b57911b359a6ad84072abefda53
MD5 6aa31b4a13fb6033ea406c06fcbbe109
BLAKE2b-256 2d4a86f4392f4606ca01c7ea42276c55df0383ae572c82efaaac3bab8a1076f0

File details

Details for the file pyautocorpus-0.1.13-cp312-cp312-macosx_10_13_universal2.whl.

File hashes

Hashes for pyautocorpus-0.1.13-cp312-cp312-macosx_10_13_universal2.whl
Algorithm Hash digest
SHA256 f554f148c310e2f43bc56aac1466e1e23ae16e11cb5708e37b67e02c4dc97401
MD5 d23f32940c8677e1eee507b663f075ff
BLAKE2b-256 179d4988a662ae8a9fa7f2219e9eefea283a223cc730635ccfe24210dccbf8c0

File details

Details for the file pyautocorpus-0.1.13-cp311-cp311-win_amd64.whl.

File hashes

Hashes for pyautocorpus-0.1.13-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 5bb6ca578ff357c1dc382e0dc3626635a72d404b169bc17550df76338181c53e
MD5 98e7afe51426814a63c51087ae36def1
BLAKE2b-256 7d20c60d4baa077f7bb6438def087782b190b46b2f4b61d08c51f78979299434

File details

Details for the file pyautocorpus-0.1.13-cp311-cp311-macosx_10_9_universal2.whl.

File hashes

Hashes for pyautocorpus-0.1.13-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 8c04ccc52acf751bd5e574a397590c85ce851594d5d5fda5acc66f775b20b8fb
MD5 47b93bb216674e8632c127bdc1c6d6c3
BLAKE2b-256 ed7ce118a151cee31c30d2bd424e16d19074be5bfd8742608fec1e86360a797d

File details

Details for the file pyautocorpus-0.1.13-cp310-cp310-win_amd64.whl.

File hashes

Hashes for pyautocorpus-0.1.13-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 d52b317997d9d8cc42b0f47741f901f74eb25e8e363259501bb397d611870327
MD5 06671dc3e360e213554252c0d3feae00
BLAKE2b-256 6ff4dcd5a0ce62e858f0634d40f69ce1177820510cdd7bb2035502ba05bfac91

File details

Details for the file pyautocorpus-0.1.13-cp310-cp310-macosx_10_9_universal2.whl.

File hashes

Hashes for pyautocorpus-0.1.13-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 8e0f773fc355d21197dbca16549218b0848b6d3b52c472de4396988dbc2ae777
MD5 f305c390a9a0fd3f4d25acc07cca5142
BLAKE2b-256 42a094f9ce93ccb9bf877f33c1d5c7df8876f8d1ee7dbfa38003eda5d07575a8

File details

Details for the file pyautocorpus-0.1.13-cp39-cp39-win_amd64.whl.

File hashes

Hashes for pyautocorpus-0.1.13-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 fda4e9f6316011663dbe5c7fe8a4cf017c25c749e27d4f764a56a555c3e171d4
MD5 c45efbcb8c1c18515a2287078cbc9c5c
BLAKE2b-256 a7fdeca6eb99c42b8bed2d2d07045c22c09863ec3cdc7094f272c8f52b441a37

File details

Details for the file pyautocorpus-0.1.13-cp39-cp39-macosx_10_9_universal2.whl.

File hashes

Hashes for pyautocorpus-0.1.13-cp39-cp39-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 f90ef5ef895845e08feccf15ce16eb74114b1b042d2183ad38ba9c8ef79126db
MD5 feccfe03978f73e093d70361e844fee6
BLAKE2b-256 043376f9f508845950b0a9f66d97db961cad480bfc675b77da546244213f65c4
