
Effortlessly convert HTML tables to JSON with this Python-based tool.

Project description

HTML Table to JSON Converter


Description

This project reads a table from an HTML file and writes it out as a JSON file. It uses BeautifulSoup to parse the HTML and pandas to convert the table to JSON. The code is organized according to Clean Architecture, Clean Code, and SOLID principles to keep it modular, readable, and maintainable.

Project Structure

html-table-to-json/
├── src/
│   ├── main.py
│   ├── services/
│   │   ├── html_parser.py
│   │   ├── json_converter.py
│   │   └── file_handler.py
│   └── utils/
│       └── logger.py
├── requirements.txt
├── README.md
├── LICENSE.md
└── .gitignore

Installation

Prerequisites

  • Python 3.6 or higher
  • pip (Python package manager)

Steps

  1. Clone the repository:

    git clone https://github.com/your-username/html-table-to-json.git
    cd html-table-to-json
    
  2. Create a virtual environment:

    python -m venv venv
    
  3. Activate the virtual environment:

    • On Windows:
      venv\Scripts\activate
      
    • On macOS/Linux:
      source venv/bin/activate
      
  4. Install dependencies:

    pip install -r requirements.txt
    

Usage

To convert a table from an HTML file to JSON, follow these steps:

  1. Navigate to the src directory:

    cd src
    
  2. Run the main.py script, passing the path of the HTML file as an argument:

    python main.py path/to/your/file.html
    
  3. The resulting JSON will be saved to output/out.json.

Example

If you have an HTML file named table.html in the root of the project, run the following from the src directory:

python main.py ../table.html

The JSON will be saved in output/out.json and logs will be stored in logs/html_table_to_json.log.

Code Structure

main.py

Entry point of the program. Coordinates reading the HTML file, parsing, conversion, and writing the output JSON.
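
The coordination described above could be sketched roughly as follows. This is not the project's actual main.py: the function name `run`, the `convert` callable, and the `output_path` parameter are hypothetical, chosen only to show how argument parsing, reading, conversion, and writing might fit together. The default output path mirrors the output/out.json location mentioned in the Usage section.

```python
import argparse
import sys
from pathlib import Path
from typing import Callable, List


def run(argv: List[str], convert: Callable[[str], str],
        output_path: str = "output/out.json") -> int:
    """Read the HTML file named in `argv`, convert it, and write the JSON.

    `convert` is a hypothetical callable that maps HTML text to JSON text.
    Returns a process exit code: 0 on success, 1 if the input is missing.
    """
    parser = argparse.ArgumentParser(description="Convert an HTML table to JSON.")
    parser.add_argument("html_file", help="path to the input HTML file")
    args = parser.parse_args(argv)

    source = Path(args.html_file)
    if not source.is_file():
        print(f"error: {source} is not a file", file=sys.stderr)
        return 1

    json_text = convert(source.read_text(encoding="utf-8"))

    # Create the output directory on first use so the write cannot fail on it.
    target = Path(output_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json_text, encoding="utf-8")
    return 0
```

A script could then end with `sys.exit(run(sys.argv[1:], convert=some_converter))`, where `some_converter` stands in for whatever parsing and conversion functions the services package provides.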

services/file_handler.py

Contains functions for reading and writing files.
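
As an illustration of what such helpers might look like, here is a minimal sketch using only the standard library. The names `read_file` and `write_file` are hypothetical and not taken from the project's source; the UTF-8 encoding is also an assumption.

```python
from pathlib import Path


def read_file(path: str) -> str:
    """Return the text content of the file at `path` (hypothetical helper)."""
    return Path(path).read_text(encoding="utf-8")


def write_file(path: str, content: str) -> None:
    """Write `content` to `path`, creating parent directories if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
```

Creating the parent directory inside `write_file` means a path such as output/out.json works even when the output directory does not exist yet.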

services/html_parser.py

Contains functions for parsing HTML using BeautifulSoup.
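
One way such parsing could work is sketched below. The function name `extract_table` is hypothetical, and the sketch assumes that only the first table in the document matters and that the first row containing th cells is the header row; the project itself may handle these cases differently.

```python
from typing import List, Tuple

from bs4 import BeautifulSoup


def extract_table(html: str) -> Tuple[List[str], List[List[str]]]:
    """Return (headers, rows) for the first <table> in `html` (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element found")

    headers: List[str] = []
    rows: List[List[str]] = []
    for tr in table.find_all("tr"):
        header_cells = tr.find_all("th")
        if header_cells and not headers:
            # Assumption: the first row containing <th> cells is the header row.
            headers = [th.get_text(strip=True) for th in header_cells]
            continue
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:
            rows.append(cells)
    return headers, rows
```

The sketch uses Python's built-in "html.parser" backend so that no extra parser package is required.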

services/json_converter.py

Contains functions for converting the HTML table to JSON using pandas.
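
A conversion along these lines could be sketched as shown below. The function name `rows_to_json` and its inputs (a list of header names plus a list of rows) are hypothetical, and the choice of one JSON object per table row is an assumption about the output shape rather than something the project documents.

```python
from typing import List

import pandas as pd


def rows_to_json(headers: List[str], rows: List[List[str]]) -> str:
    """Serialize table rows as a JSON array of objects (hypothetical helper)."""
    # Fall back to pandas' default integer column labels if no headers were found.
    frame = pd.DataFrame(rows, columns=headers or None)
    # orient="records" yields one JSON object per table row.
    return frame.to_json(orient="records", force_ascii=False)
```

Passing `force_ascii=False` keeps non-ASCII cell text readable in the output instead of escaping it.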

utils/logger.py

Configures and initializes the logger to record important events and errors.
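
A logger setup of this kind might look like the following sketch, built on the standard logging module. The function name `get_logger`, the log format, and the INFO level are assumptions; only the default log file path comes from the Example section.

```python
import logging
from pathlib import Path


def get_logger(name: str = "html_table_to_json",
               log_file: str = "logs/html_table_to_json.log") -> logging.Logger:
    """Return a logger that writes to `log_file` (hypothetical helper)."""
    logger = logging.getLogger(name)
    if logger.handlers:
        # Already configured; avoid attaching a second handler.
        return logger

    logger.setLevel(logging.INFO)
    Path(log_file).parent.mkdir(parents=True, exist_ok=True)
    handler = logging.FileHandler(log_file, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Checking `logger.handlers` before configuring keeps repeated calls from producing duplicate log lines.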

Contribution

  1. Fork the project
  2. Create a branch for your feature (git checkout -b feature/new-feature)
  3. Commit your changes (git commit -m 'Add new feature')
  4. Push to the branch (git push origin feature/new-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License. See the LICENSE.md file for more details.

Contact

For more information, contact via email at thiagoarturschumann@gmail.com.
