Convert PDF to Markdown using Azure AI Document Intelligence and upload the result to S3. Provided by the Credeed AI team, it can be used by AI agents to understand PDF documents.
credeed-pdf-to-markdown
Convert PDF files into Markdown format using Azure AI Document Intelligence and store results in AWS S3.
Supports both online PDF URLs and local PDF files. The output Markdown file is publicly accessible via a generated S3 URL.
About Credeed
Credeed is an AI-powered platform that helps SMEs build strong credit profiles, making it easier to access funding and grow their business. Tailored for SMEs, Credeed offers an AI-augmented, hyper-personalised experience in your credit-building journey.
Keywords: KYC Report, Financial Health, Financial Risk, Credit Profile, Risk Assessment, Risk Management, Sanctions Compliance, Intelligence Platform, Company Risk, Identity Verification
Features
- Extract text and layout from PDF using Azure AI Document Intelligence
- Auto-convert to clean Markdown format
- Upload Markdown to Amazon S3 with public access
- Supports both PDF URLs and local file uploads
- Usable as a Python library, CLI tool, or Flask API
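The "public access" feature implies that each uploaded Markdown file ends up at a plain HTTPS address. A minimal sketch of how such an address can be built, assuming the standard virtual-hosted-style S3 URL layout; the helper name `public_s3_url` is hypothetical, and the exact URL format this package returns is not documented here:

```python
from urllib.parse import quote


def public_s3_url(bucket: str, region: str, key: str) -> str:
    """Build a virtual-hosted-style URL for an object in a public bucket.

    Hypothetical helper for illustration; not part of credeed-pdf-to-markdown.
    """
    # Percent-encode the object key but keep "/" so nested keys stay readable.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"
```

For example, `public_s3_url("<your-s3-bucket>", "ap-southeast-1", "sample.md")` yields an address under `<your-s3-bucket>.s3.ap-southeast-1.amazonaws.com`.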
Installation
pip install credeed-pdf-to-markdown
Example Usage
from credeed_pdf_to_markdown import PdfToMarkdownConverter
converter = PdfToMarkdownConverter(
    azure_endpoint="https://<your-endpoint>.cognitiveservices.azure.com/",
    azure_key="<your-azure-key>",
    aws_access_key="<aws-access-key>",
    aws_secret_key="<aws-secret-key>",
    s3_bucket="<your-s3-bucket>",
    s3_region="ap-southeast-1"
)
# From PDF URL
markdown_url = converter.convert_from_url("https://example.com/sample.pdf")
print("Markdown URL:", markdown_url)
# From local file
markdown_url = converter.convert_from_file("sample.pdf")
print("Markdown URL:", markdown_url)
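When the input may be either a URL or a local path, a small wrapper can pick the right method. The sketch below is illustrative only: `convert_any` is a hypothetical helper, and it assumes nothing beyond the `convert_from_url` and `convert_from_file` methods shown above.

```python
from urllib.parse import urlparse


def convert_any(converter, source: str) -> str:
    """Route `source` to the matching converter method.

    Hypothetical helper; `converter` is any object that exposes
    convert_from_url() and convert_from_file().
    """
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        # Looks like an online PDF.
        return converter.convert_from_url(source)
    # Anything else is treated as a local file path.
    return converter.convert_from_file(source)
```

Calling `convert_any(converter, "sample.pdf")` would then behave like `converter.convert_from_file("sample.pdf")`.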
As a CLI Tool
# From PDF URL
credeed-pdf-to-markdown --url https://example.com/sample.pdf
# From local PDF file
credeed-pdf-to-markdown --file /path/to/your.pdf
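For readers wrapping the library in their own script, the two flags above map naturally onto `argparse`. This is a sketch, not the package's actual CLI source: `build_parser` is a hypothetical name, and treating `--url` and `--file` as mutually exclusive is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the --url / --file flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="credeed-pdf-to-markdown")
    # Assumption: exactly one input source must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--url", help="URL of an online PDF")
    source.add_argument("--file", help="path to a local PDF file")
    return parser
```

After `args = build_parser().parse_args()`, the script can check `args.url` or `args.file` and call `convert_from_url` or `convert_from_file` accordingly.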
As a Flask API
# Start the API:
python app.py
# POST request to:
http://127.0.0.1:5000/convert
# With File Upload:
curl -X POST http://127.0.0.1:5000/convert \
  -F "pdf_file=@your.pdf"
# With PDF URL:
curl -X POST http://127.0.0.1:5000/convert \
  -F "pdf_url=https://example.com/your.pdf"
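The URL variant of the request can also be sent from Python. The sketch below uses only the standard library and builds the same kind of multipart form that `curl -F` sends. The helper names are hypothetical, and because the response format is not documented here, the body is returned as raw text.

```python
import urllib.request
import uuid

API_URL = "http://127.0.0.1:5000/convert"  # endpoint from the curl examples


def encode_form(fields: dict) -> tuple:
    """Encode plain text fields as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")
    body = "".join(parts).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def build_convert_request(pdf_url: str, api_url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request carrying the pdf_url field."""
    body, content_type = encode_form({"pdf_url": pdf_url})
    return urllib.request.Request(
        api_url, data=body, headers={"Content-Type": content_type}, method="POST"
    )


def convert_via_api(pdf_url: str) -> str:
    """Send the request; assumes the Flask API is already running locally."""
    with urllib.request.urlopen(build_convert_request(pdf_url)) as resp:
        return resp.read().decode("utf-8")
```

With the API running, `print(convert_via_api("https://example.com/your.pdf"))` shows whatever the `/convert` endpoint returns.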
Required AWS S3 Bucket Permissions
This tool uploads Markdown files and local PDFs to S3 with public read access. Ensure your bucket has a policy like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicRead",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}
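To avoid hand-editing the bucket name, the same policy can be generated in Python. This is a minimal sketch; `build_public_read_policy` is a hypothetical helper, not something this package provides.

```python
import json


def build_public_read_policy(bucket: str) -> dict:
    """Return the public-read policy shown above for the given bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicRead",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                # Applies to every object in the bucket, not the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


# Serialize for pasting into the S3 console or passing to an SDK call.
policy_json = json.dumps(build_public_read_policy("your-bucket-name"), indent=2)
```

The resulting string could then be applied with, for example, boto3's `put_bucket_policy(Bucket=..., Policy=policy_json)`, provided the bucket's Block Public Access settings allow public policies.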
License
MIT © 2025 Credeed
File details
Details for the file credeed_pdf_to_markdown-0.1.0.tar.gz.
File metadata
- Download URL: credeed_pdf_to_markdown-0.1.0.tar.gz
- Upload date:
- Size: 3.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fec3d7bd6ede630bc9c65d5ce5b4a1da7a469dc64e202e94ef121a4d2d932b8b |
| MD5 | f980cca69c76ccadf21b908ea5df55a9 |
| BLAKE2b-256 | ac097fc236f36deb6a2457e9c6393c8839e493ca3f99502dad534538c11eff1a |
File details
Details for the file credeed_pdf_to_markdown-0.1.0-py3-none-any.whl.
File metadata
- Download URL: credeed_pdf_to_markdown-0.1.0-py3-none-any.whl
- Upload date:
- Size: 4.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c56925a066c11857570ce98a70c2a855551afb01936ad4ef775201e40d191a06 |
| MD5 | 0c6a0765f252091554761b128dd5a671 |
| BLAKE2b-256 | 431e9ebd92d0f8818d71f1d3fbd106155c8fc7ac6cad08afa8475a86064030bc |