Python library to extract data from HTML Tables with rowspan
Project description
py-html-table Package
This is a simple package that uses BeautifulSoup to extract data from HTML tables, including tables whose cells use rowspan.
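To illustrate why rowspan needs special handling, here is a minimal sketch of one way to expand rowspan cells so that every row ends up with the same number of columns. It is not the package's own implementation: the names `RowspanTableParser` and `expand_rowspans` are hypothetical, and the sketch uses only the standard-library `html.parser` rather than BeautifulSoup so that it runs without extra dependencies.

```python
from html.parser import HTMLParser


class RowspanTableParser(HTMLParser):
    """Collect table cells as (text, rowspan) pairs, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # each row is a list of (text, rowspan) tuples
        self._cell = None   # text fragments of the cell being read, or None
        self._span = 1      # rowspan of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []
            self._span = int(dict(attrs).get("rowspan") or 1)

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append(("".join(self._cell).strip(), self._span))
            self._cell = None


def expand_rowspans(html):
    """Return a list of rows in which rowspan cells are repeated downwards."""
    parser = RowspanTableParser()
    parser.feed(html)
    pending = {}  # column index -> (text, number of rows still to fill)
    grid = []
    for parsed_cells in parser.rows:
        cells = list(parsed_cells)
        row, col = [], 0
        while cells or any(c >= col for c in pending):
            if col in pending:
                # This column is covered by a cell that started in an earlier row.
                text, left = pending[col]
                row.append(text)
                if left > 1:
                    pending[col] = (text, left - 1)
                else:
                    del pending[col]
            elif cells:
                text, span = cells.pop(0)
                row.append(text)
                if span > 1:
                    pending[col] = (text, span - 1)
            else:
                row.append("")  # no cell supplied for this column
            col += 1
        grid.append(row)
    return grid


sample = (
    "<table>"
    "<tr><td rowspan='2'>A</td><td>1</td></tr>"
    "<tr><td>2</td></tr>"
    "<tr><td>B</td><td>3</td></tr>"
    "</table>"
)
expanded = expand_rowspans(sample)
```

In the sample, the cell `A` spans two rows, so the second `<tr>` contains only one `<td>`. After expansion, `A` is repeated in the second row and all three rows have two columns.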
Installation
pip install 'beautifulsoup4==4.5.3'
pip install py_html_table
Declare
import py_html_table
Parameters
Parameter | Meaning | Sample Values |
---|---|---|
table | Python variable containing the HTML code of the table | any variable name |
col | Total number of columns in the table (counted from 1) | 5 |
begin | Row number at which to begin scraping (counted from 0) | 2 |
output | Type of output that you need | list (or) dataframe (or) csv |
raw | 'Y' returns the complete HTML content of each cell; 'N' returns only the text inside the HTML | Y (or) N |
Usage Example
import requests_html
from bs4 import BeautifulSoup
import py_html_table

url = 'https://en.wikipedia.org/wiki/List_of_Presidents_of_the_United_States'

# Fetch the page and parse it (the 'lxml' parser requires the lxml package to be installed)
session = requests_html.HTMLSession()
r = session.get(url)
content = BeautifulSoup(r.content, 'lxml')
all_tables = content.select(".wikitable")

# Assumption: the first matching table is the one to extract
table = all_tables[0]
col = 9 # Total number of columns in the table. Starts from 1
begin = 2 # Row number at which to begin scraping. Starts from 0
output = 'dataframe' # options - list, dataframe, csv
raw = 'N' # 'Y' -> complete HTML content. 'N' -> only text inside HTML

result = py_html_table.extract(table,begin,col,output,raw)
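If you request the list output and want to write the CSV yourself, the standard-library `csv` module is enough. The sketch below assumes the list output is a list of rows, each row being a list of strings; that shape is an assumption and is not confirmed by the package description. The helper name `rows_to_csv` and the sample rows are hypothetical.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialise a list of rows (each a list of strings) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows, shaped the way the list output is assumed to be.
sample_rows = [
    ["No.", "Name", "Party"],
    ["1", "George Washington", "Unaffiliated"],
]
csv_text = rows_to_csv(sample_rows)
```

Writing to an in-memory `io.StringIO` buffer keeps the helper free of file handling; to save the result to disk, write `csv_text` to a file opened in text mode.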
Project details
Download files
Hashes for py_html_table-0.0.2-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | d7aff87160867dc00e80b23f831b1907a0026cbc3c5dc0dfdb07dfe3a3ff5f42 |
MD5 | 34d0d1a7aca484dbda3e5b0587693ff4 |
BLAKE2b-256 | 05782e25cb83f3d6b5ede7d82709d07d78a54e4f6085a7f94cefafb4e69a6230 |