boxfish: lightweight table extraction from HTML
What is it?
Boxfish is a lightweight tool for table extraction from HTML pages.
Main features
- Easy configuration. No knowledge of CSS selectors or XPath required.
- Fast table extraction to CSV files.
- Integration of requests and selenium.
Quick start
import boxfish as bf
import pandas as pd
# Define the table layout of a URL using strings from two of its rows.
aurl = ""
row1 = ""
row2 = ""
# Build a configuration
aconfig = bf.build(url=aurl, rows=[row1, row2])
# Extract a table
data = bf.extract(aconfig, url=aurl)
# View results
df = pd.DataFrame(data)
df.head()
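Since CSV output is listed as a main feature, the following is a minimal sketch of writing extracted rows to CSV text with the standard library. The shape of the data returned by `bf.extract` is not documented here, so `sample_rows` (a list of dictionaries) and the helper `rows_to_csv` are hypothetical stand-ins, not part of the boxfish API.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of row dictionaries to CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Assumed stand-in for extracted table data; the real output may be shaped differently.
sample_rows = [
    {"name": "Alice", "score": "1"},
    {"name": "Bob", "score": "2"},
]

csv_text = rows_to_csv(sample_rows, fieldnames=["name", "score"])
print(csv_text)
```

If the results are already in a pandas DataFrame as in the quick start, `df.to_csv("table.csv", index=False)` achieves the same thing in one call.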
Where to get it?
Boxfish is available on PyPI and GitHub.
pip install boxfish
Dependencies
The main dependencies are:
- BeautifulSoup4, a Python library for pulling data out of HTML and XML files.
- lxml, a powerful and Pythonic XML processing library.
- Requests, a simple, yet elegant, HTTP library.
- Selenium, automated web browser interaction from Python.
License
Boxfish is available under the MIT license.
Limitations
Boxfish extracts text from HTML. To check whether the HTML of a page contains the text of interest, open the page in a browser, then inspect the HTML in the developer tools via Ctrl+Shift+I.
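The same check can be scripted. The sketch below uses only the standard library and is independent of boxfish: `html_contains_text` is a hypothetical helper that collects the visible text of an HTML string (ignoring `script` and `style` content) and reports whether a given string appears in it. The sample HTML is made up for illustration.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_contains_text(html, needle):
    """Return True if `needle` occurs in the visible text of `html`."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    # Normalize runs of whitespace so line breaks in the markup do not matter.
    text = " ".join(" ".join(collector.chunks).split())
    return needle in text


# Made-up sample page: the table text is in the HTML, "Secret" only in a script.
sample_html = (
    "<html><body><table><tr><td>Price</td><td>42</td></tr></table>"
    "<script>var hidden = 'Secret';</script></body></html>"
)
print(html_contains_text(sample_html, "Price"))
print(html_contains_text(sample_html, "Secret"))
```

If the text is missing from the raw HTML of a page but visible in the browser, the page is probably rendered by JavaScript, which is the case where a browser-driven fetch such as Selenium is typically needed.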
Documentation
Full documentation is available here.
Download files
Source Distribution
- boxfish-0.1.2.tar.gz (27.6 kB)
Built Distribution
- boxfish-0.1.2-py3-none-any.whl (30.7 kB)
File details
Details for the file boxfish-0.1.2.tar.gz.
File metadata
- Download URL: boxfish-0.1.2.tar.gz
- Upload date:
- Size: 27.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 623cd4d507f255e9299b80ae0a3ff8d8b52388245b86cab102c55b08968c1152
MD5 | 49a2e27fb32bb9060003509c9a620d93
BLAKE2b-256 | e61edf51a537cca5bb1facda648e836f029ee923cfa743734cc6da6fa9cdc9d5
File details
Details for the file boxfish-0.1.2-py3-none-any.whl.
File metadata
- Download URL: boxfish-0.1.2-py3-none-any.whl
- Upload date:
- Size: 30.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0a6958437343290d653f3bb07b595d55d2e4b75f4770b562f643806f8270618b
MD5 | 4079d37d1c28ee525f2bef1cfd1f5607
BLAKE2b-256 | 6da95ad8e613959e0cfa335dcefb6dbf810606eb2751d9f34bc3486975e2da25