Python library to extract data from HTML Tables with rowspan

py-html-table Package

This is a simple package that uses BeautifulSoup to extract data from HTML tables, including tables whose cells use rowspan.

Installation

pip install 'beautifulsoup4==4.5.3'
pip install py_html_table

Declare

import py_html_table

Parameters

Parameter | Meaning | Sample Values
table | Python variable containing the HTML code of the table | any variable name
col | Total number of columns in the table (counting starts from 1) | 5
begin | Row number at which to begin scraping (counting starts from 0) | 2
output | Type of output required | list, dataframe, or csv

NOTE: All of these variables must be provided as input to the package.
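As a rough illustration of how `begin` and `col` relate to a table, the sketch below treats an already-extracted table as a list of rows. It is not the package's code: `select_rows` and `to_csv_text` are hypothetical helpers, and dropping rows whose width differs from `col` is an assumption made only for this example. It uses nothing beyond the standard library.

```python
import csv
import io


def select_rows(grid, begin, col):
    """Hypothetical helper: keep rows from index `begin` (0-based) onward
    that contain exactly `col` cells. The width filter is an assumption."""
    return [row for row in grid[begin:] if len(row) == col]


def to_csv_text(rows):
    """Hypothetical helper: serialise a list of rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


grid = [
    ["Table caption"],     # row 0: title row, skipped because begin = 2
    ["Name", "Year"],      # row 1: header row, skipped because begin = 2
    ["Alice", "2001"],     # row 2: first data row
    ["Alice", "2002"],
]

rows = select_rows(grid, begin=2, col=2)
print(rows)               # list-style result
print(to_csv_text(rows))  # the same rows as CSV text
```

With `begin = 2`, the two leading rows are skipped, which mirrors the "Starts from 0" convention in the parameter table.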

Usage Example

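import requests_html
from bs4 import BeautifulSoup
import py_html_table.py_html_table as pyht

# Fetch the page and parse it with the lxml parser
url = 'https://en.wikipedia.org/wiki/List_of_Presidents_of_the_United_States'
session = requests_html.HTMLSession()
r = session.get(url)
content = BeautifulSoup(r.content, 'lxml')

# Select the first table carrying the "wikitable" class
all_tables = content.select(".wikitable")
table = all_tables[0]

col = 9               # total number of columns in the table
begin = 2             # row number at which to begin scraping
output = 'dataframe'  # one of 'list', 'dataframe' or 'csv'
raw = 'N'             # not described in the parameter list

# Call extract through the alias bound by the import above
pyht.extract(table, begin, col, output, raw)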
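To show why rowspan needs special handling, here is a minimal sketch of copying a spanning cell down into the rows it covers. It is not the package's implementation: it uses the standard library's `html.parser` instead of BeautifulSoup, the class name `RowspanTableParser` is hypothetical, and it ignores `colspan` and nested tables.

```python
from html.parser import HTMLParser


class RowspanTableParser(HTMLParser):
    """Hypothetical sketch: collect table cells, repeating rowspan cells downward."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows, each a list of cell strings
        self._row = None     # row currently being built
        self._pending = {}   # column index -> [text, rows still to fill]
        self._cell = None    # text chunks of the cell currently open
        self._span = 1       # rowspan of the cell currently open

    def _fill_pending(self):
        # Copy carried-over cells into the current row at their column positions.
        while len(self._row) in self._pending:
            col = len(self._row)
            text, left = self._pending[col]
            self._row.append(text)
            if left > 1:
                self._pending[col] = [text, left - 1]
            else:
                del self._pending[col]

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._fill_pending()
            self._cell = []
            self._span = int(dict(attrs).get("rowspan") or 1)

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = "".join(self._cell).strip()
            if self._span > 1:
                # Remember this cell for the rows below it.
                self._pending[len(self._row)] = [text, self._span - 1]
            self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._fill_pending()
            self.rows.append(self._row)
            self._row = None


html = """
<table>
  <tr><th>Name</th><th>Year</th></tr>
  <tr><td rowspan="2">Alice</td><td>2001</td></tr>
  <tr><td>2002</td></tr>
</table>
"""

parser = RowspanTableParser()
parser.feed(html)
print(parser.rows)
# [['Name', 'Year'], ['Alice', '2001'], ['Alice', '2002']]
```

Without the carry-over step, the third row would contain only `2002` and its columns would no longer line up with the header.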

Download files

Source Distribution

py_html_table-0.0.4.tar.gz (2.7 kB)

Uploaded Source

Built Distribution

py_html_table-0.0.4-py3-none-any.whl (4.0 kB)

Uploaded Python 3

File details

Details for the file py_html_table-0.0.4.tar.gz.

File metadata

  • Download URL: py_html_table-0.0.4.tar.gz
  • Size: 2.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.0 setuptools/40.6.2 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.5

File hashes

Hashes for py_html_table-0.0.4.tar.gz
Algorithm | Hash digest
SHA256 | 131ed241e7f90095ac2a7a2b703f85bda6bc4c7ce0d9010b864101c2d3c1654f
MD5 | e7f960d0146427b501c6c324c441ad34
BLAKE2b-256 | 771e55a9068c0fedef676b8888b23449c5315140cf1ebd93adf68f3dd7566aa9

File details

Details for the file py_html_table-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: py_html_table-0.0.4-py3-none-any.whl
  • Size: 4.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.0 setuptools/40.6.2 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.5

File hashes

Hashes for py_html_table-0.0.4-py3-none-any.whl
Algorithm | Hash digest
SHA256 | aef297086b4548ce54fe0caa56c3a0611b962c3b41136a6955abf1104029b47a
MD5 | 54620b81991cc6625ffbda31f0d0fda9
BLAKE2b-256 | cbcd7147000b21a6b6376e48b7bc58d3c7184b58e19f417cb8c9fe52ea4dafe6
