
Web Data Extraction Library Written in Python

Project description


Wextracto is a toolkit for command-line web data extraction.

Installation

$ pip install wextracto

Kicking the Tyres

$ printf '[wex]\nsitemaps=wex.sitemaps:urls_from_sitemaps\n' > entry_points.txt
$ wex "http://www.ebay.com/robots.txt"
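The `entry_points.txt` line above appears to use the setuptools entry-point syntax: a `[wex]` group whose entries map a name (`sitemaps`) to a `module:attribute` target. The sketch below shows how a file in that format can be read and resolved with the standard library. It illustrates the format under that assumption only; it is not wex's own loader, and `load_entry_points` is a hypothetical name.

```python
import configparser
import importlib
from typing import Any, Dict


def load_entry_points(text: str, group: str = "wex") -> Dict[str, Any]:
    """Map each 'name=module:attribute' entry in `group` to the object it names.

    Hypothetical helper: mimics the setuptools entry-point format, not wex internals.
    """
    # Only '=' separates name from target, so the ':' in 'module:attr' is kept;
    # interpolation is off so a '%' in a value is not treated specially.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    # Keep entry names case-sensitive (ConfigParser lower-cases them by default).
    parser.optionxform = str
    parser.read_string(text)

    loaded: Dict[str, Any] = {}
    for name, target in parser.items(group):
        module_name, _, attr_path = target.strip().partition(":")
        obj: Any = importlib.import_module(module_name)
        # Walk dotted attribute paths such as 'SomeClass.method'.
        if attr_path:
            for attr in attr_path.split("."):
                obj = getattr(obj, attr)
        loaded[name] = obj
    return loaded
```

With wextracto installed, `load_entry_points(open("entry_points.txt").read())["sitemaps"]` would import `wex.sitemaps` and return `urls_from_sitemaps`; without it, the import fails, which is why the checks below use standard-library targets instead.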

Documentation

The documentation can be found here:

http://wextracto.readthedocs.org/en/latest/index.html

Release History

0.8.3 (2015-09-23)

  • Fix bug in HTTP decode caused by magic bytes handling.

0.8.2 (2015-09-21)

  • Add magic_bytes to Response for more reliable wex.http:decode behaviour.

0.7.9 (2015-08-18)

  • Re-worked encoding for HTML to pre-parse

0.7 (2015-06-04)

  • Better proxy support

0.4 (2015-02-12)

  • Now we flatten labels and values.

  • href and src become href_url and src_url.

0.3 (2014-12-29)

  • Some API changes and a switch to “tab-separated JSON”.

0.2.2 (2014-10-24)

  • Uploaded sdist to PyPI for “pip install wextracto” simplicity.

0.1 (2014-10-16)

  • Initial release as open source
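Release 0.3 switched to “tab-separated JSON”. A minimal sketch of reading and writing such lines, assuming each line is a run of tab-separated fields, each field a JSON value, with the labels first and the extracted value last. The function names are hypothetical and not part of wex.

```python
import json
from typing import Any, List, Sequence, Tuple


def format_tsj_line(labels: Sequence[Any], value: Any) -> str:
    """Encode labels and a value as one tab-separated JSON line (no newline)."""
    return "\t".join(json.dumps(field) for field in [*labels, value])


def parse_tsj_line(line: str) -> Tuple[List[Any], Any]:
    """Split one tab-separated JSON line into (labels, value).

    Assumption: every field is JSON and the last field is the value.
    A blank line raises json.JSONDecodeError (a ValueError subclass).
    """
    # json.dumps escapes a tab inside a string as "\t", so a raw tab
    # character can only ever be a field separator.
    fields = [json.loads(field) for field in line.rstrip("\n").split("\t")]
    *labels, value = fields
    return labels, value
```

Because JSON never emits a raw tab, splitting on the tab character is safe even when a value itself contains tabs.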

Download files

Source Distribution

Wextracto-0.8.3.tar.gz (40.2 kB)

Built Distribution

Wextracto-0.8.3-py2.py3-none-any.whl (45.5 kB), for Python 2 and Python 3

File details

Details for the file Wextracto-0.8.3.tar.gz.

File metadata

  • Download URL: Wextracto-0.8.3.tar.gz
  • Size: 40.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for Wextracto-0.8.3.tar.gz

  • SHA256: 7acbd1e401f2297b103bb80a803e38dd5909a88acfb704245549469ec19052bd
  • MD5: 33bbaa2ebcb6e9585ad07e5e2b8d6702
  • BLAKE2b-256: ccbdd20f9ffc0c9d755ddbf10c7587df0a23fdcbbbb99f5a65108aebc5269843

File details

Details for the file Wextracto-0.8.3-py2.py3-none-any.whl.

File hashes

Hashes for Wextracto-0.8.3-py2.py3-none-any.whl

  • SHA256: f2892c2d1fb566f1bea0eb17436fb0ed2177d5098aa2c721d87f5a5c92c2ba01
  • MD5: 572e30da32dbb21a7c3d1b7e7e779bf7
  • BLAKE2b-256: 3f35bedcf04ebdf29b86d9f809394a54c1d65d3549a68e1d50e92e89d313193e
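To check a downloaded file against the SHA256 digests listed for it, a short standard-library sketch follows. The helper names are hypothetical; the file name in the commented usage assumes the sdist was saved to the current directory.

```python
import hashlib
from typing import Iterable


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hash a stream of byte chunks incrementally and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        return sha256_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def digest_matches(actual: str, published: str) -> bool:
    """Compare hex digests, ignoring case and surrounding whitespace."""
    return actual.strip().lower() == published.strip().lower()


# Hypothetical usage, after downloading the sdist into the current directory:
# expected = "7acbd1e401f2297b103bb80a803e38dd5909a88acfb704245549469ec19052bd"
# print(digest_matches(file_sha256("Wextracto-0.8.3.tar.gz"), expected))
```

pip can also enforce this check itself through its hash-checking mode, using `--hash=sha256:...` entries in a requirements file together with `--require-hashes`.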
