Python library to extract data from HTML tables with rowspan
py-html-table Package
This is a simple package that uses BeautifulSoup to extract data from HTML tables, including tables whose cells span multiple rows (`rowspan`).
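Handling rowspan comes down to copying a spanning cell's value into the rows below it, in the same column. The following is a minimal pure-Python sketch of that idea, not this package's implementation. The function name `expand_rowspans` and its input format (each row given as a list of `(text, rowspan)` pairs in document order) are assumptions made for illustration, and `colspan` is deliberately ignored.

```python
def expand_rowspans(rows):
    """Turn rows of (text, rowspan) pairs into a rectangular grid of strings.

    Hypothetical helper, not part of py-html-table.
    A cell with rowspan=n fills its column in its own row and in the
    n - 1 rows below it, so its text is copied down into those rows.
    colspan is not handled in this sketch.
    """
    grid = []
    pending = {}  # column index -> (text, number of rows below still to fill)
    for row in rows:
        cells = iter(row)
        out = []
        col = 0
        while True:
            if col in pending:
                # This column is still covered by a rowspan cell from above.
                text, left = pending.pop(col)
                if left > 1:
                    pending[col] = (text, left - 1)
                out.append(text)
            else:
                cell = next(cells, None)
                if cell is None:
                    break  # no more cells in this row
                text, span = cell
                out.append(text)
                if span > 1:
                    # Remember to repeat this value in the next span - 1 rows.
                    pending[col] = (text, span - 1)
            col += 1
        grid.append(out)
    return grid
```

For example, `expand_rowspans([[("Name", 1), ("Party", 1)], [("A", 2), ("X", 1)], [("Y", 1)]])` returns `[["Name", "Party"], ["A", "X"], ["A", "Y"]]`: the third row has only one `<td>` in the HTML, and the spanned value "A" is filled in for its first column.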
Installation
```
pip install py-html-table
```
Import
```python
import py_html_table.py_html_table as pyht
```
Parameters
| Parameter | Meaning | Sample values |
|---|---|---|
| table | Python variable holding the HTML of the table (in the example below, an element selected with BeautifulSoup) | any variable name |
| begin | Index of the row at which scraping begins; counting starts from 0 | 2 |
| col | Total number of columns in the table; counting starts from 1 | 5 |
| output | Type of output required | 'list', 'dataframe', or 'csv' |
| raw | 'Y' returns the exact content of each table cell; 'N' returns only its text | 'Y' or 'N' |
NOTE: All five parameters are required; each one must be passed when calling `pyht.extract`.
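One plausible reading of how `begin`, `col` and `output` shape the result is sketched below. This is an assumption based only on the parameter descriptions above, not on the package's source: `shape_output` is a hypothetical helper, it expects the table to have already been turned into a grid of strings, and the 'dataframe' branch needs pandas installed.

```python
import csv
import io


def shape_output(grid, begin, col, output="list"):
    """Hypothetical helper: slice an expanded table grid and format it.

    begin  -- 0-based index of the first row to keep
    col    -- number of columns to keep (a count, starting from 1)
    output -- 'list', 'csv' or 'dataframe'
    """
    # Skip rows before index `begin` and keep only the first `col` columns.
    rows = [row[:col] for row in grid[begin:]]
    if output == "list":
        return rows
    if output == "csv":
        buf = io.StringIO()
        csv.writer(buf).writerows(rows)  # default line terminator is "\r\n"
        return buf.getvalue()
    if output == "dataframe":
        import pandas as pd  # only needed for this branch
        return pd.DataFrame(rows)
    raise ValueError(f"unsupported output type: {output!r}")
```

With this reading, `shape_output([["h1", "h2"], ["a", "b"]], 1, 2, "csv")` skips the header row and returns `"a,b\r\n"`.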
Usage Example
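```python
import requests_html
from bs4 import BeautifulSoup
import py_html_table.py_html_table as pyht

url = 'https://en.wikipedia.org/wiki/List_of_Presidents_of_the_United_States'

# Fetch the page
session = requests_html.HTMLSession()
r = session.get(url)

# Parse the HTML ('lxml' requires the lxml package to be installed)
# and take the first table with the "wikitable" class
content = BeautifulSoup(r.content, 'lxml')
all_tables = content.select(".wikitable")
table = all_tables[0]

col = 9               # total number of columns in the table
begin = 2             # start scraping at row index 2 (counting from 0)
output = 'dataframe'  # 'list', 'dataframe' or 'csv'
raw = 'N'             # 'N' = cell text only, 'Y' = exact cell content

result = pyht.extract(table, begin, col, output, raw)
```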
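The example above needs network access and depends on a live Wikipedia page whose layout may change. For checking rowspan behavior offline, the information that matters per cell is its text and its `rowspan` attribute. The sketch below collects exactly that from a small inline table using only the standard library's `html.parser`. It is an illustration, not part of py-html-table (which uses BeautifulSoup): the class name `TableCellCollector` is hypothetical, and it assumes simple tables with explicit closing tags and no nested tables.

```python
from html.parser import HTMLParser


class TableCellCollector(HTMLParser):
    """Hypothetical helper: collect (text, rowspan) pairs for each <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # [text parts, rowspan] while inside <td>/<th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            # A missing or valueless rowspan attribute means a span of 1.
            span = dict(attrs).get("rowspan") or "1"
            self._cell = [[], int(span)]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace, similar to taking only the text.
            text = " ".join("".join(self._cell[0]).split())
            self._row.append((text, self._cell[1]))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


html = """
<table>
  <tr><th>Name</th><th>Party</th></tr>
  <tr><td rowspan="2">A</td><td>X</td></tr>
  <tr><td>Y</td></tr>
</table>
"""
collector = TableCellCollector()
collector.feed(html)
print(collector.rows)
# [[('Name', 1), ('Party', 1)], [('A', 2), ('X', 1)], [('Y', 1)]]
```

Note that the third row yields only one cell: the value for its first column comes from the `rowspan="2"` cell above it, which is the case this package is meant to handle.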