Ws-Mark-Flow AI Converter

Convert files from various sources (SharePoint, S3, Azure Blob, etc.) to Markdown and upload to destinations (Google Drive, SharePoint, etc.).

Features

  • Multi-source support: SharePoint, S3, Azure Blob Storage (extensible)
  • Multi-destination support: Google Drive, SharePoint, S3 (extensible)
  • File conversion: PDF, DOCX, PPTX, XLSX, CSV, images, and more → Markdown
  • Incremental conversion: Converts only files that are not already in the destination
  • Job persistence: MongoDB-backed job storage for resumable pipelines
  • REST API: FastAPI-based API for job management
  • Progress tracking: Real-time conversion progress and statistics
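
The incremental conversion feature listed above can be illustrated with a small sketch. It assumes that a converted file keeps its source path and only swaps the extension for `.md`; that naming rule and the helper names `markdown_name` and `pending_files` are hypothetical and not taken from the package.

```python
from pathlib import PurePosixPath


def markdown_name(source_path: str) -> str:
    """Map a source path to the Markdown path assumed at the destination."""
    return str(PurePosixPath(source_path).with_suffix(".md"))


def pending_files(source_paths, destination_paths):
    """Return the source files whose Markdown counterpart is not in the destination yet."""
    existing = set(destination_paths)
    return [p for p in source_paths if markdown_name(p) not in existing]


# Example: only b.docx still needs converting.
todo = pending_files(["docs/a.pdf", "docs/b.docx"], ["docs/a.md"])
print(todo)
```

A set lookup keeps the comparison linear in the number of files, which matters when a source library holds many documents.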

Architecture

┌─────────────┐     ┌──────────────┐     ┌───────────────┐
│   Source    │────▶│  Converter   │────▶│  Destination  │
│ (SharePoint)│     │ (MarkItDown) │     │(Google Drive) │
└─────────────┘     └──────────────┘     └───────────────┘
                           │
                    ┌──────▼──────┐
                    │   MongoDB   │
                    │ (Job Store) │
                    └─────────────┘
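
As a rough illustration of the diagram, the sketch below wires a source, a converter, and a destination together and records progress in a job object so that a rerun skips finished files. It is a simplified stand-in, not the package's code: the `Job` class and `run_job` function are hypothetical, and an in-memory object replaces the MongoDB job store.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical job record; the real project persists jobs in MongoDB."""
    job_id: str
    done: set = field(default_factory=set)      # paths already converted
    failed: dict = field(default_factory=dict)  # path -> error message


def run_job(job, files, convert, upload):
    """Convert each (path, content) pair and upload the result, skipping finished paths."""
    for path, content in files:
        if path in job.done:
            continue  # resumable: work recorded by an earlier run is not repeated
        try:
            upload(path, convert(content))
            job.done.add(path)
            job.failed.pop(path, None)
        except Exception as exc:  # record the failure and continue with the next file
            job.failed[path] = str(exc)
    return job


# Example with stand-in callables instead of SharePoint, MarkItDown and Google Drive.
uploaded = {}
job = run_job(Job("demo"), [("a.txt", b"hi")], lambda b: b.decode().upper(), uploaded.__setitem__)
print(job.done, uploaded)
```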

Installation

# Install dependencies
uv pip install -r requirements.txt

# Copy environment file
cp .env.example .env
# Edit .env with your MongoDB URI

# Run with auto-reload
uvicorn src.app:app --reload --port 8000

API Documentation

Supported Integrations

Sources

  • SharePoint (sharepoint): Microsoft Graph API
  • More coming: S3, Azure Blob, Local filesystem

Destinations

  • Google Drive (google_drive): Google Drive API v3
  • More coming: SharePoint, S3, Azure Blob

Supported File Types

Files are converted with Microsoft MarkItDown, Docling, or LLM-based analysis for complex PDFs and images.

  • Documents: PDF, DOCX, DOC, RTF, TXT
  • Presentations: PPTX, PPT
  • Spreadsheets: XLSX, XLS, CSV
  • Web: HTML, XML, JSON, YAML
  • Images: PNG, JPG, GIF, BMP, TIFF (OCR)
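
The list above can be expressed as a simple lookup, for example to filter a source listing before conversion. This is a sketch only: the `SUPPORTED` mapping and the `file_category` helper are hypothetical names, and the package may detect file types differently.

```python
from pathlib import PurePosixPath

# Extensions copied from the list above, grouped by category.
SUPPORTED = {
    "documents": {"pdf", "docx", "doc", "rtf", "txt"},
    "presentations": {"pptx", "ppt"},
    "spreadsheets": {"xlsx", "xls", "csv"},
    "web": {"html", "xml", "json", "yaml"},
    "images": {"png", "jpg", "gif", "bmp", "tiff"},
}


def file_category(name: str):
    """Return the category for a file name, or None if the extension is not listed."""
    ext = PurePosixPath(name).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED.items():
        if ext in extensions:
            return category
    return None


print(file_category("Quarterly Report.PDF"))
```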

Configuration

Main Environment Variables

Variable          Default                     Description
AUTH_USERNAME     admin                       Basic auth username
AUTH_PASSWORD     yourpassword                Basic auth password
MONGODB_URI       mongodb://localhost:27017   MongoDB connection string
MONGODB_DATABASE  converter                   Database name
TEMP_DIR          ./.data/converter           Temporary file storage
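
To show how these variables and their defaults fit together, here is a minimal sketch that reads them from the environment. The `Settings` class and `load_settings` function are hypothetical; the package may load its configuration in another way (for example from the `.env` file directly).

```python
import os
from dataclasses import dataclass

# Defaults copied from the table above.
DEFAULTS = {
    "AUTH_USERNAME": "admin",
    "AUTH_PASSWORD": "yourpassword",
    "MONGODB_URI": "mongodb://localhost:27017",
    "MONGODB_DATABASE": "converter",
    "TEMP_DIR": "./.data/converter",
}


@dataclass(frozen=True)
class Settings:
    auth_username: str
    auth_password: str
    mongodb_uri: str
    mongodb_database: str
    temp_dir: str


def load_settings(env=None) -> Settings:
    """Build settings from a mapping (os.environ by default), falling back to DEFAULTS."""
    env = os.environ if env is None else env
    values = {name.lower(): env.get(name, default) for name, default in DEFAULTS.items()}
    return Settings(**values)


settings = load_settings({"MONGODB_DATABASE": "demo"})
print(settings.mongodb_database, settings.mongodb_uri)
```

The default password is a placeholder; set AUTH_PASSWORD explicitly before exposing the API.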

Development

🔖 requirements

  • install uv (virtual environment and package management)
# "py" is the Windows launcher; on Linux/macOS use: python -m pip install --upgrade uv
py -m pip install --upgrade uv
# create venv
uv venv
# activate venv
# Windows: .venv\Scripts\activate
# Linux/macOS: source .venv/bin/activate
  • update project requirements
uv pip install --upgrade -r requirements.txt
  • install build tools
uv pip install --upgrade setuptools build twine

🪛 build

  • Windows (PowerShell 7+, required for &&): clean dist and build package
if (Test-Path ./dist) {rm ./dist -r -force}; `
python -m build && twine check dist/*
  • Linux/macOS: clean dist and build package
rm -rf ./dist
python -m build && twine check dist/*

📦 test / 🧪 debugger

Install the package in editable mode from the project directory

uv pip install -U -e .
uv pip show ws-mark-flow

code quality tools

# .\src\robot
uv pip install -U scanreq "prospector[with_everything]"
## unused requirements
scanreq -r requirements.txt -p ./src
## style/linting
prospector ./src -t pylint -t pydocstyle
## code quality/complexity
prospector ./src -t vulture -t mccabe -t mypy 
## security
prospector ./src -t dodgy -t bandit
## package
prospector ./src -t pyroma

✈️ publish

  • pypi

    twine upload --verbose dist/* 
    

Docker

  • Build the Docker image (override version at build time if needed)
docker build -t ws-mark-flow ./app

# Copy environment file
cp .env.example ./app/.env
# Edit ./app/.env

docker run -p 80:80 --env-file ./app/.env ws-mark-flow
# If MongoDB runs on the host, use host.docker.internal in MONGODB_URI and add the host mapping
docker run --add-host=host.docker.internal:host-gateway -p 80:80 --env-file ./app/.env ws-mark-flow
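
When MongoDB runs on the Docker host, a `localhost` connection string has to point at `host.docker.internal` inside the container. The helper below is a hypothetical convenience for generating that value, not part of the package; it assumes a single-host URI such as the default `mongodb://localhost:27017`.

```python
from urllib.parse import urlsplit, urlunsplit


def uri_for_container(uri: str, host_alias: str = "host.docker.internal") -> str:
    """Replace a localhost host in a single-host MongoDB URI with the Docker host alias."""
    parts = urlsplit(uri)
    if parts.hostname not in ("localhost", "127.0.0.1"):
        return uri  # already points at a remote host; leave unchanged
    userinfo, _, _ = parts.netloc.rpartition("@")
    netloc = host_alias if parts.port is None else f"{host_alias}:{parts.port}"
    if userinfo:
        netloc = f"{userinfo}@{netloc}"
    return urlunsplit(parts._replace(netloc=netloc))


print(uri_for_container("mongodb://localhost:27017"))
```

The resulting value would go into MONGODB_URI in `./app/.env` before starting the container with the `--add-host` option shown above.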

License

MIT
