
A Python library and CLI tool to convert tables in PDF files to CSV files.


PDF to CSV Converter

This project provides a tool to convert tables from PDF files into CSV format using the Docling library. It extracts tables from PDFs and saves them as CSV files, optionally reversing text for right-to-left languages.

How It Works

  1. PDF Input: Provide the path to the PDF file you want to convert.
  2. Table Extraction: The tool uses Docling's DocumentConverter to extract tables from the PDF.
  3. DataFrame Conversion: Each extracted table is converted into a pandas DataFrame.
  4. Optional Text Reversal: If the rtl option is enabled, text in the DataFrame is reversed.
  5. CSV Output: The DataFrames are saved as CSV files in the specified output directory.
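
The steps above can be sketched roughly as follows. This is a minimal illustration, not the package's actual source: the function names `reverse_text` and `pdf_tables_to_csv`, the output file naming scheme, and the decision to reverse header text along with cell text are all assumptions. The Docling calls (`DocumentConverter().convert`, `result.document.tables`, `table.export_to_dataframe`) reflect Docling's documented usage, but signatures may differ between versions.

```python
from pathlib import Path


def reverse_text(value):
    """Reverse string values; leave non-strings (numbers, None) unchanged."""
    return value[::-1] if isinstance(value, str) else value


def pdf_tables_to_csv(pdf_path, output_dir, rtl=False):
    """Hypothetical helper: write one CSV file per table found in the PDF."""
    # Imported lazily so reverse_text can be used without Docling installed.
    from docling.document_converter import DocumentConverter

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Steps 1-2: read the PDF and let Docling detect its tables.
    result = DocumentConverter().convert(pdf_path)

    written = []
    for index, table in enumerate(result.document.tables, start=1):
        # Step 3: Docling table -> pandas DataFrame.
        df = table.export_to_dataframe()

        # Step 4 (optional): reverse text for right-to-left languages.
        # Reversing the headers too is an assumption of this sketch.
        if rtl:
            df = df.apply(lambda col: col.map(reverse_text))
            df.columns = [reverse_text(c) for c in df.columns]

        # Step 5: save as CSV; the file naming scheme here is made up.
        target = out / f"{Path(pdf_path).stem}_table_{index}.csv"
        df.to_csv(target, index=False)
        written.append(target)
    return written
```

A call such as `pdf_tables_to_csv("report.pdf", "out", rtl=True)` would then return the list of CSV paths it wrote, one per detected table.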

Dependencies

This project depends heavily on the Docling library for PDF table extraction. Make sure Docling is installed before running the converter.
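
One way to confirm the dependencies are present before running a conversion is a quick import check like the one below. The helper name `missing_packages` is hypothetical, and including `pandas` in the default list is an assumption based on the DataFrame step described above.

```python
import importlib.util


def missing_packages(names=("docling", "pandas")):
    """Return the top-level packages from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```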

TODO:

  • Convert data types to numeric

