Python library to extract data from HTML Tables with rowspan

py-html-table Package

This is a simple package that uses BeautifulSoup to extract data from HTML tables, including tables whose cells use rowspan.
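To illustrate what handling rowspan involves, here is a minimal sketch that expands row-spanning cells so every output row has a value in every column. It is not the package's implementation: it uses the standard library's html.parser instead of BeautifulSoup, the names _CellCollector and expand_rowspans are hypothetical, and it ignores colspan and nested tables.

```python
from html.parser import HTMLParser


class _CellCollector(HTMLParser):
    """Collect [text, rowspan] pairs for every cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of [text, rowspan] per <tr>
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
            self._in_cell = False
        elif tag in ("td", "th") and self.rows:
            rowspan = int(dict(attrs).get("rowspan") or 1)
            self.rows[-1].append(["", rowspan])
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1][0] += data


def expand_rowspans(html):
    """Return the table as a list of rows, repeating cells that span rows."""
    parser = _CellCollector()
    parser.feed(html)
    parser.close()

    grid = []
    pending = {}  # column index -> [text, number of rows still to fill]
    for cells in parser.rows:
        row, queue, col = [], list(cells), 0
        while queue or any(c >= col for c in pending):
            if col in pending:
                # A cell from an earlier row still covers this column.
                text, remaining = pending[col]
                row.append(text)
                if remaining == 1:
                    del pending[col]
                else:
                    pending[col][1] = remaining - 1
            elif queue:
                text, rowspan = queue.pop(0)
                text = text.strip()
                row.append(text)
                if rowspan > 1:
                    pending[col] = [text, rowspan - 1]
            else:
                row.append("")  # gap before a later spanned column
            col += 1
        grid.append(row)
    return grid


sample = (
    "<table>"
    "<tr><th>Name</th><th>Year</th></tr>"
    "<tr><td rowspan='2'>Ada</td><td>1842</td></tr>"
    "<tr><td>1843</td></tr>"
    "</table>"
)
print(expand_rowspans(sample))
# [['Name', 'Year'], ['Ada', '1842'], ['Ada', '1843']]
```

The third row of the sample contains only one cell in the markup, but the expanded result repeats "Ada" so that the row lines up with the header.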

Installation

pip install py-html-table

Declare

import py_html_table.py_html_table as pyht

Parameters

Parameter   Meaning                                                      Sample Values
table       Python variable containing the HTML code of the table        any variable name
begin       Row at which to begin scraping (starts from 0)               2
col         Total number of columns in the table (starts from 1)         5
output      Type of output that you need                                 list, dataframe, or csv
raw         'Y' for the exact content of each cell, 'N' for text only    'Y' or 'N'

NOTE: All of the parameters above must be provided as input to the package.
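Because every parameter is required and several accept only a few values, it can help to check them before calling the package. The helper below is a hypothetical sketch based only on the sample values listed above. check_extract_args is not part of py-html-table, and the package itself may accept or reject values differently.

```python
VALID_OUTPUTS = {"list", "dataframe", "csv"}
VALID_RAW = {"Y", "N"}


def check_extract_args(begin, col, output, raw):
    """Raise ValueError if an argument falls outside the documented values."""
    if not isinstance(begin, int) or begin < 0:
        raise ValueError("begin must be an integer row index, starting from 0")
    if not isinstance(col, int) or col < 1:
        raise ValueError("col must be the total column count, at least 1")
    if output not in VALID_OUTPUTS:
        raise ValueError(f"output must be one of {sorted(VALID_OUTPUTS)}")
    if raw not in VALID_RAW:
        raise ValueError("raw must be 'Y' or 'N'")
    return begin, col, output, raw


# Passes silently and returns the arguments unchanged.
check_extract_args(begin=2, col=5, output="dataframe", raw="N")
```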

Usage Example

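import requests_html
from bs4 import BeautifulSoup
import py_html_table.py_html_table as pyht

# Fetch the page and parse it (the 'lxml' parser requires the lxml package)
url = 'https://en.wikipedia.org/wiki/List_of_Presidents_of_the_United_States'
session = requests_html.HTMLSession()
r = session.get(url)
content = BeautifulSoup(r.content, 'lxml')

# Select the first table with the "wikitable" class
all_tables = content.select(".wikitable")
table = all_tables[0]

col = 9               # total number of columns in the table
begin = 2             # row at which to begin scraping
output = 'dataframe'  # 'list', 'dataframe', or 'csv'
raw = 'N'             # 'N' returns only the text of each cell
pyht.extract(table, begin, col, output, raw)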
