Python library to extract data from HTML Tables with rowspan
py-html-table Package
This is a simple package that uses BeautifulSoup to extract data from HTML tables whose cells use rowspan.
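To show why rowspan needs special handling, here is a minimal sketch of the general technique: a cell with rowspan="2" is missing from the following row, so its value has to be carried down to keep the columns aligned. This is not the package's own implementation. It uses only the standard library's html.parser so that it runs without extra dependencies, and the names TableCellParser, expand_rowspans, and SAMPLE_HTML are hypothetical. It ignores colspan and nested tables.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the cells of each <tr> as dicts with 'text' and 'rowspan'."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cells per <tr>
        self._cell = None  # cell currently being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            span = dict(attrs).get("rowspan") or 1
            self._cell = {"text": "", "rowspan": int(span)}

    def handle_data(self, data):
        if self._cell is not None:
            self._cell["text"] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._cell["text"] = self._cell["text"].strip()
            self.rows[-1].append(self._cell)
            self._cell = None


def expand_rowspans(html):
    """Return the table as a list of rows, repeating values hidden by rowspan."""
    parser = TableCellParser()
    parser.feed(html)
    grid = []
    carry = {}  # column index -> (text, number of rows still to fill)
    for cells in parser.rows:
        row, col, pending = [], 0, list(cells)
        while pending or col in carry:
            if col in carry:
                # This column is covered by a cell from an earlier row.
                text, left = carry[col]
                row.append(text)
                if left > 1:
                    carry[col] = (text, left - 1)
                else:
                    del carry[col]
            else:
                cell = pending.pop(0)
                row.append(cell["text"])
                if cell["rowspan"] > 1:
                    carry[col] = (cell["text"], cell["rowspan"] - 1)
            col += 1
        grid.append(row)
    return grid


# Made-up sample table, for illustration only.
SAMPLE_HTML = """
<table>
<tr><th>Name</th><th>Term</th><th>Party</th></tr>
<tr><td rowspan="2">A</td><td>1</td><td>X</td></tr>
<tr><td>2</td><td>Y</td></tr>
<tr><td>B</td><td>3</td><td rowspan="2">Z</td></tr>
<tr><td>C</td><td>4</td></tr>
</table>
"""

for row in expand_rowspans(SAMPLE_HTML):
    print(row)
```

Every row in the result has three entries: "A" is repeated in the third row and "Z" in the fifth, which is the shape a list, dataframe, or CSV output needs.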
Installation
pip install 'beautifulsoup4==4.5.3'
pip install py_html_table
Declare
import py_html_table
Parameters
Parameter | Meaning | Sample Values |
---|---|---|
table | Python variable containing the HTML code of the table | any variable name |
col | Total number of columns in the table; counting starts from 1 | 5 |
begin | Row number at which to begin scraping; counting starts from 0 | 2 |
output | Type of output that you need | list, dataframe, or csv |
NOTE: All of these parameters must be provided as input to the package.
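As a rough illustration of how begin and col could act on already-extracted rows, the sketch below skips the first begin rows and keeps the first col columns, then writes the result as CSV text. This is one plausible reading of the parameter descriptions, not the package's confirmed behaviour. The helpers trim_grid and to_csv_text are hypothetical and are not part of py_html_table, and the sample data is made up.

```python
import csv
import io


def trim_grid(grid, begin, col):
    """Skip the first `begin` rows (0-based) and keep the first `col` columns."""
    return [row[:col] for row in grid[begin:]]


def to_csv_text(rows):
    """Serialise rows to CSV text using the standard library."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up rows: two header rows followed by two data rows.
grid = [
    ["Group", "Group", "Detail", "Notes"],
    ["Id", "Name", "Value", "Notes"],
    ["1", "alpha", "10", "n/a"],
    ["2", "beta", "20", "n/a"],
]

# begin=2 skips both header rows; col=3 drops the trailing "Notes" column.
body = trim_grid(grid, begin=2, col=3)
print(to_csv_text(body))
```

Under this reading, begin = 2 suits a table with two header rows, since the first data row sits at index 2.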
Usage Example
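from bs4 import BeautifulSoup
import requests_html
import py_html_table.py_html_table as pyht

# Fetch the page and parse it (the 'lxml' parser must be installed)
url = 'https://en.wikipedia.org/wiki/List_of_Presidents_of_the_United_States'
session = requests_html.HTMLSession()
r = session.get(url)
content = BeautifulSoup(r.content, 'lxml')

# Pick the first table with the "wikitable" class
all_tables = content.select(".wikitable")
table = all_tables[0]

col = 9               # total number of columns in the table
begin = 2             # row number at which to begin scraping
output = 'dataframe'  # 'list', 'dataframe', or 'csv'
raw = 'N'

# Call extract through the alias bound by the import above
pyht.extract(table, begin, col, output, raw)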