PyAutoCorpus

A Python interface to the excellent AutoCorpus library.

Right now, it supports only the wiki markup textify function, which strips markup and returns plain text. In my benchmarks, this is roughly 40x faster than stripping markup with other libraries:

mwparserfromhell   0.208 sec/doc
wikitextparser     0.215 sec/doc
pyautocorpus       0.005 sec/doc

where:

  • mwparserfromhell is mwparserfromhell.parse(x).strip_code()
  • wikitextparser is wikitextparser.parse(x).plain_text()
  • pyautocorpus is pyautocorpus.Textifier().textify(x)
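A comparison like the one above could be reproduced with a small timing harness. The following is only a sketch: the helper name seconds_per_doc, the sample document, and the repeat count are assumptions made for illustration and are not part of pyautocorpus or of the original benchmark setup. The three calls being timed are the ones listed above, and any library that is not installed is skipped.

```python
import time
from typing import Callable, Dict, Iterable


def seconds_per_doc(strip_markup: Callable[[str], str],
                    docs: Iterable[str],
                    repeats: int = 3) -> float:
    """Return the best average wall-clock time per document over `repeats` passes.

    Hypothetical helper for illustration; not part of pyautocorpus.
    """
    docs = list(docs)
    if not docs:
        raise ValueError("need at least one document to time")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for doc in docs:
            strip_markup(doc)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(docs))
    return best


if __name__ == "__main__":
    # Assumed sample input; real benchmarks should use full wiki pages.
    sample_docs = ["==Wiki Marked up text==\n [[Some Page|link text]] example."] * 100

    candidates: Dict[str, Callable[[str], str]] = {}
    try:
        import mwparserfromhell
        candidates["mwparserfromhell"] = lambda x: mwparserfromhell.parse(x).strip_code()
    except ImportError:
        pass
    try:
        import wikitextparser
        candidates["wikitextparser"] = lambda x: wikitextparser.parse(x).plain_text()
    except ImportError:
        pass
    try:
        import pyautocorpus
        candidates["pyautocorpus"] = pyautocorpus.Textifier().textify
    except ImportError:
        pass

    for name, fn in candidates.items():
        print(f"{name} {seconds_per_doc(fn, sample_docs):.6f} sec/doc")
```

Timings depend heavily on document length and hardware, so the absolute numbers will differ from those reported above; the ratio between libraries is the more meaningful figure.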

Installing

From pypi:

pip install pyautocorpus

From source:

Be sure to clone recursively:

git clone --recursive https://github.com/seanmacavaney/pyautocorpus.git

The build requires the pcre library, so make sure it is installed before running:

python setup.py install

Usage

Example:

>>> import pyautocorpus
>>> textifier = pyautocorpus.Textifier()
>>> textifier.textify("==Wiki Marked up text==\n [[Some Page|link text]] example.")
'Wiki Marked up text\n\n\n link text example.'
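When processing many pages, the same call can be applied in a loop. The sketch below assumes that a single Textifier instance can be reused across documents, which the example above does not confirm. The helper textify_all and the sample pages are hypothetical names introduced here for illustration.

```python
from typing import Callable, Iterable, Iterator


def textify_all(pages: Iterable[str],
                textify: Callable[[str], str]) -> Iterator[str]:
    """Lazily apply one textify callable to each page of wiki markup.

    Hypothetical helper for illustration; not part of pyautocorpus.
    """
    for page in pages:
        yield textify(page)


if __name__ == "__main__":
    try:
        import pyautocorpus
    except ImportError:
        print("pyautocorpus is not installed")
    else:
        # Assumption: one Textifier instance is safe to reuse for every page.
        textifier = pyautocorpus.Textifier()
        pages = [
            "==Wiki Marked up text==\n [[Some Page|link text]] example.",
            "==Another heading==\n More [[Other Page|linked]] text.",
        ]
        for text in textify_all(pages, textifier.textify):
            print(repr(text))
```

Because textify_all is a generator, pages are converted one at a time, which keeps memory use flat when streaming a large dump.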

Known issues

  • Windows is not yet supported

Credits

AutoCorpus

Contributors to this repository:

  • Sean MacAvaney (University of Glasgow)
  • Thomas Jänich (University of Glasgow)
