Extract the information represented in any HTML table

Tablextract

This Python 3 library extracts the information represented in any HTML table. It was developed in the context of the paper "TOMATE: On extracting information from HTML tables".

How to install

You can install this library via pip:

pip install tablextract

Usage

>>> from tablextract import tables
>>> tables('http://example.com/tables')
[]
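
The call above returns a list, which is empty when no tables are extracted from the page. As a minimal sketch of how that could be used over several pages, the helper below runs an extractor on each URL and keeps only the pages that yielded something. The name `extract_from_urls` and its `extractor` parameter are hypothetical and not part of tablextract; the only behaviour assumed from the library is that `tables(url)` returns a list.

```python
from typing import Any, Callable, Dict, Iterable, List


def extract_from_urls(
    urls: Iterable[str],
    extractor: Callable[[str], List[Any]],
) -> Dict[str, List[Any]]:
    """Run `extractor` on each URL and keep only the pages that yielded tables.

    `extractor` is any callable that takes a URL and returns a list,
    which is the shape of `tablextract.tables` shown in the usage example.
    """
    found: Dict[str, List[Any]] = {}
    for url in urls:
        result = extractor(url)
        # An empty list means nothing was extracted from that page.
        if result:
            found[url] = result
    return found


# Intended use, assuming tablextract is installed (hypothetical helper):
#     from tablextract import tables
#     found = extract_from_urls(['http://example.com/tables'], tables)
```

Passing the extractor in as an argument keeps the helper independent of the browser that tablextract drives, so it can be tried with a stand-in function first.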

Further information will be written soon.

Changes

v1.0.0

Released on Jan 24, 2019.

  • Before using Selenium, geckodriver is automatically downloaded for Linux, Windows and Mac OS.
  • The Firefox process is closed automatically when the process ends.
  • Geckodriver quit is called instead of close.
  • Side projects have been moved from this core project to tablextract-server and datamart.

v0.0.1

Released on Jan 22, 2019.

  • Initial package upload.
