Python library to extract data from HTML Tables with rowspan
py-html-table Package
This is a simple package that uses BeautifulSoup to extract data from HTML tables, including tables whose cells use rowspan.
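To illustrate what handling rowspan involves, here is a minimal sketch that repeats a cell with `rowspan="n"` in the rows beneath it, so every row ends up with the same number of columns. It is not the package's implementation: it uses the standard-library `html.parser` instead of BeautifulSoup so that it runs without extra dependencies, the names `RowspanTableParser` and `expand_rowspans` are hypothetical, and `colspan` and nested tables are deliberately ignored.

```python
from html.parser import HTMLParser


class RowspanTableParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, repeating rowspan cells downward."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows, each a list of cell strings
        self._row = None     # row currently being built
        self._cell = None    # text fragments of the cell currently open
        self._span = 1       # rowspan of the cell currently open
        self._pending = {}   # column index -> (text, number of rows still to fill)

    def _fill_pending(self):
        # Copy carried-over cells into the current row at their column positions.
        while len(self._row) in self._pending:
            col = len(self._row)
            text, left = self._pending[col]
            self._row.append(text)
            if left > 1:
                self._pending[col] = (text, left - 1)
            else:
                del self._pending[col]

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._fill_pending()
            self._cell = []
            self._span = int(dict(attrs).get("rowspan") or 1)

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = "".join(self._cell).strip()
            if self._span > 1:
                # Remember this cell so the following rows receive a copy.
                self._pending[len(self._row)] = (text, self._span - 1)
            self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._fill_pending()
            self.rows.append(self._row)
            self._row = None


def expand_rowspans(html):
    """Return the table as a list of rows with rowspan cells duplicated."""
    parser = RowspanTableParser()
    parser.feed(html)
    return parser.rows
```

Calling `expand_rowspans` on a table in which one cell spans two rows returns two full rows that both contain that cell's text.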
Installation
pip install py-html-table
Declare
import py_html_table.py_html_table as pyht
Parameters
Parameter | Meaning | Sample Values |
---|---|---|
table | Python variable containing the HTML code of the table | any variable name |
begin | Index of the row at which to begin scraping; counting starts from 0 | 2 |
col | Total number of columns in the table; counting starts from 1 | 5 |
output | Type of output required | 'list', 'dataframe', or 'csv' |
raw | 'Y' to get the exact content inside each table cell; 'N' to get only the text | 'Y' or 'N' |
NOTE: All of these parameters must be provided as input to the package.
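As a rough illustration of how `begin`, `col`, and `output` might act on rows that have already been extracted, consider the sketch below. The function `select_rows` is hypothetical and is not part of py-html-table. It assumes that `begin` skips leading rows, that `col` keeps only rows with exactly that many cells, and that `'csv'` means CSV text; the package's real behavior may differ, and the `'dataframe'` output is left out to avoid a pandas dependency.

```python
import csv
import io


def select_rows(rows, begin, col, output="list"):
    """Hypothetical post-processing step; not the package's own code."""
    # Assumption: skip the first `begin` rows (0-based), e.g. header rows,
    # and keep only rows that have exactly `col` cells.
    kept = [row for row in rows[begin:] if len(row) == col]
    if output == "list":
        return kept
    if output == "csv":
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerows(kept)
        return buffer.getvalue()
    raise ValueError(f"unsupported output type: {output!r}")
```

With `begin=1` and `col=2`, a header row and any row that does not have two cells are dropped before the output is produced.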
Usage Example
import requests_html
from bs4 import BeautifulSoup
import py_html_table.py_html_table as pyht

# Fetch the page that contains the table
url = 'https://en.wikipedia.org/wiki/List_of_Presidents_of_the_United_States'
session = requests_html.HTMLSession()
r = session.get(url)

# Parse the HTML (the 'lxml' parser requires the lxml package to be installed)
content = BeautifulSoup(r.content, 'lxml')

# Select every table with the "wikitable" class and take the first one
all_tables = content.select(".wikitable")
table = all_tables[0]

# Extraction parameters (see the Parameters table)
col = 9
begin = 2
output = 'dataframe'
raw = 'N'

pyht.extract(table, begin, col, output, raw)