
Convert PDF to Markdown using Azure AI Document Intelligence and upload the result to S3. Built by the Credeed AI team to help AI agents understand PDF documents.

Project description

credeed-pdf-to-markdown

Convert PDF files into Markdown format using Azure AI Document Intelligence and store results in AWS S3.

It supports both online PDF URLs and local PDF files. The output Markdown file is publicly accessible via a generated S3 URL.
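The exact URL format the package generates is not documented here. As an illustration only, a public S3 object is typically reachable at a virtual-hosted-style URL like the one this sketch builds. The function name and the object key `markdown/sample report.md` are hypothetical.

```python
from urllib.parse import quote


def public_s3_url(bucket: str, region: str, key: str) -> str:
    """Build a virtual-hosted-style URL for an object in a public S3 bucket.

    Assumption: the package returns URLs in this shape; compare with the URL
    it actually prints before relying on it.
    """
    # Percent-encode the key but keep "/" so folder-like prefixes survive.
    encoded_key = quote(key, safe="/")
    return f"https://{bucket}.s3.{region}.amazonaws.com/{encoded_key}"


print(public_s3_url("my-bucket", "ap-southeast-1", "markdown/sample report.md"))
# prints https://my-bucket.s3.ap-southeast-1.amazonaws.com/markdown/sample%20report.md
```

Encoding the key matters once file names contain spaces or non-ASCII characters, which are common in uploaded PDFs.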



About Credeed

Credeed is an AI-powered platform that helps SMEs build strong credit profiles, making it easier to access funding and grow their business. Tailored for SMEs, Credeed offers an AI-augmented, hyper-personalised experience in your credit-building journey.

Keywords: KYC Report, Financial Health, Financial Risk, Credit Profile, Risk Assessment, Risk Management, Sanctions Compliance, Intelligence Platform, Company Risk, Identity Verification


Features

  • Extract text and layout from PDFs using Azure AI Document Intelligence
  • Auto-convert the result to clean Markdown
  • Upload the Markdown to Amazon S3 with public read access
  • Accept both PDF URLs and local file uploads
  • Use it as a Python library, CLI tool, or Flask API
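These features combine into a short pipeline: a local PDF is uploaded to S3 first (the permissions section notes that local PDFs are uploaded), the PDF is converted to Markdown, and the Markdown is uploaded and its public URL returned. The sketch below is a hypothetical outline of that flow, not the package's code. The function, the step signatures, and the idea that the analyzer fetches the PDF by URL are all assumptions, and the two steps are passed in as callables so the flow can be exercised with in-memory fakes.

```python
from typing import Callable, Optional

# Hypothetical step signatures; the package's real internals may differ.
AnalyzeFn = Callable[[str], str]             # PDF URL -> Markdown text
UploadFn = Callable[[bytes, str, str], str]  # (data, key, content type) -> public URL


def pdf_to_markdown_url(
    analyze: AnalyzeFn,
    upload: UploadFn,
    pdf_url: Optional[str] = None,
    pdf_bytes: Optional[bytes] = None,
    name: str = "document",
) -> str:
    """Run the pipeline for one PDF and return the public URL of the Markdown."""
    if (pdf_url is None) == (pdf_bytes is None):
        raise ValueError("pass exactly one of pdf_url or pdf_bytes")
    if pdf_bytes is not None:
        # Local PDFs are uploaded first so the analyzer can fetch them by URL.
        pdf_url = upload(pdf_bytes, f"{name}.pdf", "application/pdf")
    markdown = analyze(pdf_url)  # extract text and layout as Markdown
    return upload(markdown.encode("utf-8"), f"{name}.md", "text/markdown")


# Demo with in-memory fakes standing in for Azure and S3.
uploaded = {}


def fake_upload(data: bytes, key: str, content_type: str) -> str:
    uploaded[key] = data
    return f"https://example-bucket.s3.amazonaws.com/{key}"


def fake_analyze(url: str) -> str:
    return f"# Converted from {url}\n"


print(pdf_to_markdown_url(fake_analyze, fake_upload, pdf_bytes=b"%PDF-1.7", name="sample"))
# prints https://example-bucket.s3.amazonaws.com/sample.md
```

In a real setup, `analyze` might wrap the `azure-ai-documentintelligence` client's `begin_analyze_document` call with the `prebuilt-layout` model and Markdown output, and `upload` might wrap boto3's `put_object`. Neither is confirmed to be how this package works internally.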

Installation

pip install credeed-pdf-to-markdown

Example Usage

As a Python Library

from credeed_pdf_to_markdown import PdfToMarkdownConverter

converter = PdfToMarkdownConverter(
    azure_endpoint="https://<your-endpoint>.cognitiveservices.azure.com/",
    azure_key="<your-azure-key>",
    aws_access_key="<aws-access-key>",
    aws_secret_key="<aws-secret-key>",
    s3_bucket="<your-s3-bucket>",
    s3_region="ap-southeast-1"
)

# From PDF URL
markdown_url = converter.convert_from_url("https://example.com/sample.pdf")
print("Markdown URL:", markdown_url)

# From local file
markdown_url = converter.convert_from_file("sample.pdf")
print("Markdown URL:", markdown_url)
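Hard-coding keys in source files makes them easy to leak. A minimal sketch, assuming you keep the settings in environment variables instead: the keyword-argument names come from the example above, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are AWS's conventional variable names, and the other variable names (`AZURE_DOCINTEL_ENDPOINT`, `AZURE_DOCINTEL_KEY`, `S3_BUCKET`, `S3_REGION`) are hypothetical.

```python
import os

# Environment variable -> PdfToMarkdownConverter keyword argument.
# The Azure and S3 variable names are hypothetical; rename them to suit your setup.
ENV_TO_KWARG = {
    "AZURE_DOCINTEL_ENDPOINT": "azure_endpoint",
    "AZURE_DOCINTEL_KEY": "azure_key",
    "AWS_ACCESS_KEY_ID": "aws_access_key",
    "AWS_SECRET_ACCESS_KEY": "aws_secret_key",
    "S3_BUCKET": "s3_bucket",
    "S3_REGION": "s3_region",
}


def converter_kwargs_from_env(environ=os.environ) -> dict:
    """Collect the converter's keyword arguments, failing early if any are unset."""
    missing = [name for name in ENV_TO_KWARG if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {kwarg: environ[name] for name, kwarg in ENV_TO_KWARG.items()}
```

With the variables set, the constructor call becomes `converter = PdfToMarkdownConverter(**converter_kwargs_from_env())`. Checking for missing values up front gives a clear error before any network call is attempted.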

As a CLI Tool

# From PDF URL
credeed-pdf-to-markdown --url https://example.com/sample.pdf

# From local PDF file
credeed-pdf-to-markdown --file /path/to/your.pdf
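To convert a whole folder, the CLI can be called once per file from a small script. This sketch only uses the `--file` flag shown above; it assumes the CLI prints the Markdown URL on stdout and exits with a non-zero status on failure, neither of which is documented here. The script name `convert_folder.py` is hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def cli_command(pdf_path: Path) -> list:
    """Build the CLI invocation for one local PDF using the --file flag."""
    return ["credeed-pdf-to-markdown", "--file", str(pdf_path)]


def convert_folder(folder: str) -> None:
    """Convert every *.pdf in a folder with one CLI call per file."""
    for pdf in sorted(Path(folder).glob("*.pdf")):
        # check=True raises CalledProcessError if the CLI exits with a non-zero status.
        result = subprocess.run(cli_command(pdf), capture_output=True, text=True, check=True)
        # Assumption: the CLI prints the Markdown URL (or a line containing it) on stdout.
        print(f"{pdf.name}: {result.stdout.strip()}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python convert_folder.py /path/to/pdfs
    convert_folder(sys.argv[1])
```

Passing the command as a list (rather than a shell string) avoids quoting problems with file names that contain spaces.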

As a Flask API

# Start the API:
python app.py

# POST request to:
http://127.0.0.1:5000/convert

# With File Upload:
curl -X POST http://127.0.0.1:5000/convert \
     -F "pdf_file=@your.pdf"

# With PDF URL:
curl -X POST http://127.0.0.1:5000/convert \
     -F "pdf_url=https://example.com/your.pdf"
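The same requests can be sent from Python with only the standard library. This sketch posts the form fields used in the curl examples (`pdf_file`, `pdf_url`) as multipart/form-data. The helper names are hypothetical, and because the response format of `/convert` is not documented here, it returns the raw response body.

```python
import uuid
import urllib.request
from pathlib import Path
from typing import Dict, Optional, Tuple


def encode_multipart(fields: Dict[str, str], files: Dict[str, Tuple[str, bytes]]) -> Tuple[bytes, str]:
    """Encode text fields and file uploads as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(part.encode("utf-8"))
    for name, (filename, data) in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            "Content-Type: application/pdf\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def post_pdf(endpoint: str, pdf_path: Optional[str] = None, pdf_url: Optional[str] = None) -> str:
    """POST a local PDF (as pdf_file) or a PDF URL (as pdf_url) to the /convert endpoint."""
    fields = {"pdf_url": pdf_url} if pdf_url else {}
    files = {}
    if pdf_path:
        path = Path(pdf_path)
        files["pdf_file"] = (path.name, path.read_bytes())
    body, content_type = encode_multipart(fields, files)
    request = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        # The response format is not documented, so return the raw body as text.
        return response.read().decode("utf-8")
```

With the API running locally, `post_pdf("http://127.0.0.1:5000/convert", pdf_url="https://example.com/your.pdf")` mirrors the second curl command. If adding a dependency is acceptable, the `requests` library's `files=` and `data=` arguments do the same encoding for you.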

Required AWS S3 Bucket Permissions

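This tool uploads Markdown files and local PDFs to S3 with public read access. The AWS credentials you pass to the converter therefore need permission to upload objects (s3:PutObject), and the bucket's S3 Block Public Access settings must allow public bucket policies, since otherwise S3 rejects or ignores a public policy. Ensure your bucket has a policy like this (replace your-bucket-name with your bucket name):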

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicRead",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}
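The policy can also be generated instead of edited by hand. With the default empty prefix, this sketch reproduces the policy above for any bucket name. The optional `prefix` parameter is a hypothetical refinement that limits public reads to one key prefix rather than the whole bucket; it only works if that prefix covers every object the tool needs to make public, which this page does not specify.

```python
import json


def public_read_policy(bucket: str, prefix: str = "") -> str:
    """Return a public-read bucket policy as a JSON string.

    An empty prefix grants public reads on the whole bucket; a prefix such as
    "markdown/" (hypothetical) restricts them to keys starting with it.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicRead",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(public_read_policy("my-bucket"))
```

The resulting string can be applied with boto3, for example `boto3.client("s3").put_bucket_policy(Bucket="my-bucket", Policy=public_read_policy("my-bucket"))`, or pasted into the bucket's Permissions tab in the AWS console.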

License

MIT © 2025 Credeed



Download files


Source Distribution

credeed_pdf_to_markdown-0.1.0.tar.gz (3.7 kB)

Uploaded Source

Built Distribution


credeed_pdf_to_markdown-0.1.0-py3-none-any.whl (4.5 kB)

Uploaded Python 3

File details

Details for the file credeed_pdf_to_markdown-0.1.0.tar.gz.

File metadata

  • Download URL: credeed_pdf_to_markdown-0.1.0.tar.gz
  • Upload date:
  • Size: 3.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.0

File hashes

Hashes for credeed_pdf_to_markdown-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        fec3d7bd6ede630bc9c65d5ce5b4a1da7a469dc64e202e94ef121a4d2d932b8b
MD5           f980cca69c76ccadf21b908ea5df55a9
BLAKE2b-256   ac097fc236f36deb6a2457e9c6393c8839e493ca3f99502dad534538c11eff1a


File details

Details for the file credeed_pdf_to_markdown-0.1.0-py3-none-any.whl.

File hashes

Hashes for credeed_pdf_to_markdown-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        c56925a066c11857570ce98a70c2a855551afb01936ad4ef775201e40d191a06
MD5           0c6a0765f252091554761b128dd5a671
BLAKE2b-256   431e9ebd92d0f8818d71f1d3fbd106155c8fc7ac6cad08afa8475a86064030bc
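A downloaded file can be checked against these digests with Python's `hashlib`. This sketch hashes the file in chunks and compares the result with the SHA256 values listed for version 0.1.0. The helper names are illustrative, and the lookup is keyed by the original file names, so a renamed download will raise `KeyError`.

```python
import hashlib
import os

# SHA256 digests published above for version 0.1.0.
EXPECTED_SHA256 = {
    "credeed_pdf_to_markdown-0.1.0.tar.gz": "fec3d7bd6ede630bc9c65d5ce5b4a1da7a469dc64e202e94ef121a4d2d932b8b",
    "credeed_pdf_to_markdown-0.1.0-py3-none-any.whl": "c56925a066c11857570ce98a70c2a855551afb01936ad4ef775201e40d191a06",
}


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    return sha256_of(path) == EXPECTED_SHA256[os.path.basename(path)]
```

pip can perform the same check at install time: a requirements line such as `credeed-pdf-to-markdown==0.1.0 --hash=sha256:c56925a066c11857570ce98a70c2a855551afb01936ad4ef775201e40d191a06` switches pip into hash-checking mode, which then also requires every dependency to be pinned with a hash.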

