Ws-Mark-Flow AI Converter
Convert files from various sources (SharePoint, S3, Azure Blob, etc.) to Markdown and upload to destinations (Google Drive, SharePoint, etc.).
Features
- Multi-source support: SharePoint, S3, Azure Blob Storage (extensible)
- Multi-destination support: Google Drive, SharePoint, S3 (extensible)
- File conversion: PDF, DOCX, PPTX, XLSX, CSV, images, and more → Markdown
- Incremental conversion: Only converts files not already in destination
- Job persistence: MongoDB-backed job storage for resumable pipelines
- REST API: FastAPI-based API for job management
- Progress tracking: Real-time conversion progress and statistics
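The incremental conversion feature can be pictured as a set difference between the source listing and the Markdown files already present at the destination. The following is a minimal sketch under stated assumptions, not the package's implementation: the function names `markdown_name` and `pending_files` are hypothetical, and so is the naming rule that a converted file keeps its source path with `.md` appended.

```python
from typing import Iterable, List, Set


def markdown_name(source_path: str) -> str:
    # Assumed naming rule (not confirmed by the project): append ".md"
    # so that report.pdf and report.docx never map to the same output.
    return f"{source_path}.md"


def pending_files(source_paths: Iterable[str], destination_paths: Set[str]) -> List[str]:
    # Keep only the source files whose Markdown counterpart is missing.
    return [p for p in source_paths if markdown_name(p) not in destination_paths]


source = ["docs/report.pdf", "docs/slides.pptx", "notes.txt"]
destination = {"docs/report.pdf.md"}
print(pending_files(source, destination))  # ['docs/slides.pptx', 'notes.txt']
```

Running the job a second time with an updated destination listing would skip everything already converted, which is what makes repeated runs cheap.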
Architecture
┌─────────────┐     ┌──────────────┐     ┌───────────────┐
│   Source    │────▶│  Converter   │────▶│  Destination  │
│ (SharePoint)│     │ (MarkItDown) │     │(Google Drive) │
└─────────────┘     └──────────────┘     └───────────────┘
                           │
                    ┌──────▼──────┐
                    │   MongoDB   │
                    │ (Job Store) │
                    └─────────────┘
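The flow in the diagram can be sketched in a few lines of Python. Everything named here is hypothetical and only illustrates the shape of the pipeline: the `Source` and `Destination` protocols, `JobRecord` (an in-memory stand-in for the job document stored in MongoDB), `run_pipeline`, and the in-memory classes used for the demo are not taken from the package.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Protocol


class Source(Protocol):
    def list_files(self) -> Iterable[str]: ...
    def read(self, path: str) -> bytes: ...


class Destination(Protocol):
    def write(self, path: str, markdown: str) -> None: ...


@dataclass
class JobRecord:
    """In-memory stand-in for a job document that would live in MongoDB."""
    converted: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)


def run_pipeline(source: Source, convert: Callable[[bytes], str],
                 destination: Destination, job: JobRecord) -> JobRecord:
    # Read each source file, convert it, write the result, and record the outcome.
    for path in source.list_files():
        try:
            destination.write(f"{path}.md", convert(source.read(path)))
            job.converted.append(path)
        except Exception:  # sketch only: record the failure and keep going
            job.failed.append(path)
    return job


class MemorySource:
    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = files

    def list_files(self) -> Iterable[str]:
        return list(self.files)

    def read(self, path: str) -> bytes:
        return self.files[path]


class MemoryDestination:
    def __init__(self) -> None:
        self.files: Dict[str, str] = {}

    def write(self, path: str, markdown: str) -> None:
        self.files[path] = markdown


source = MemorySource({"a.txt": b"hello", "b.bin": b"\xff\xfe"})
destination = MemoryDestination()
# The lambda stands in for a real converter; it fails on non-UTF-8 bytes.
job = run_pipeline(source, lambda data: data.decode("utf-8"), destination, JobRecord())
print(job.converted, job.failed)  # ['a.txt'] ['b.bin']
```

Keeping the job record separate from the source and destination is what would allow a persisted job to be inspected or resumed after a failure.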
Installation
# Install dependencies
uv pip install -r requirements.txt
# Copy environment file
cp .env.example .env
# Edit .env with your MongoDB URI
# Run with auto-reload
uvicorn src.app:app --reload --port 8000
API Documentation
- API docs: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- OpenAPI spec: http://localhost:8000/openapi.json
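Because the OpenAPI spec is served as JSON, the available operations can be listed without opening the browser. The sketch below assumes the server is running locally on port 8000 as shown above; `fetch_spec` and `list_operations` are hypothetical helper names, and the `/jobs` paths in the sample are placeholders rather than confirmed endpoints. If the deployment enforces basic auth on this URL, the request would also need credentials.

```python
import json
from typing import List, Tuple
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def fetch_spec(base_url: str = "http://localhost:8000") -> dict:
    # Download the OpenAPI document that FastAPI serves at /openapi.json.
    with urlopen(f"{base_url}/openapi.json", timeout=10) as response:
        return json.load(response)


def list_operations(spec: dict) -> List[Tuple[str, str]]:
    # Flatten the "paths" object into sorted (METHOD, path) pairs,
    # skipping non-method keys such as "parameters".
    operations = []
    for path, item in sorted(spec.get("paths", {}).items()):
        for method in sorted(item):
            if method in HTTP_METHODS:
                operations.append((method.upper(), path))
    return operations


# Placeholder spec for illustration; call fetch_spec() against a running server instead.
sample_spec = {
    "paths": {
        "/jobs": {"get": {}, "post": {}},
        "/jobs/{job_id}": {"get": {}, "parameters": []},
    }
}
for method, path in list_operations(sample_spec):
    print(method, path)
```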
Supported Integrations
Sources
- SharePoint (`sharepoint`): Microsoft Graph API
- More coming: S3, Azure Blob, Local filesystem
Destinations
- Google Drive (`google_drive`): Google Drive API v3
- More coming: SharePoint, S3, Azure Blob
Supported File Types
Files are converted using Microsoft MarkItDown, Docling, or LLM-based analysis for complex PDFs and images.
- Documents: PDF, DOCX, DOC, RTF, TXT
- Presentations: PPTX, PPT
- Spreadsheets: XLSX, XLS, CSV
- Web: HTML, XML, JSON, YAML
- Images: PNG, JPG, GIF, BMP, TIFF (OCR)
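The list above maps naturally onto an extension lookup, which is handy for filtering a source listing before starting a job. This is a minimal sketch: `SUPPORTED_TYPES` and `category_for` are hypothetical names, and the table only mirrors the extensions listed here, not the package's actual detection logic.

```python
from pathlib import PurePath
from typing import Optional

# Extensions copied from the list above, grouped by category.
SUPPORTED_TYPES = {
    "Documents": {"pdf", "docx", "doc", "rtf", "txt"},
    "Presentations": {"pptx", "ppt"},
    "Spreadsheets": {"xlsx", "xls", "csv"},
    "Web": {"html", "xml", "json", "yaml"},
    "Images": {"png", "jpg", "gif", "bmp", "tiff"},
}


def category_for(filename: str) -> Optional[str]:
    # Compare the lower-cased extension (without the dot) against each category.
    extension = PurePath(filename).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED_TYPES.items():
        if extension in extensions:
            return category
    return None


print(category_for("Report.PDF"))   # Documents
print(category_for("archive.zip"))  # None
```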
Configuration
Main Environment Variables
| Variable | Default | Description |
|---|---|---|
| `AUTH_USERNAME` | `admin` | Basic auth username |
| `AUTH_PASSWORD` | `yourpassword` | Basic auth password |
| `MONGODB_URI` | `mongodb://localhost:27017` | MongoDB connection string |
| `MONGODB_DATABASE` | `converter` | Database name |
| `TEMP_DIR` | `./.data/converter` | Temporary file storage |
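To show how these variables and their defaults fit together, here is a small sketch that reads them from the environment. The `Settings` dataclass and `load_settings` function are hypothetical and are not claimed to match how the application loads its configuration; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    auth_username: str
    auth_password: str
    mongodb_uri: str
    mongodb_database: str
    temp_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Fall back to the documented default whenever a variable is unset.
    return Settings(
        auth_username=env.get("AUTH_USERNAME", "admin"),
        auth_password=env.get("AUTH_PASSWORD", "yourpassword"),
        mongodb_uri=env.get("MONGODB_URI", "mongodb://localhost:27017"),
        mongodb_database=env.get("MONGODB_DATABASE", "converter"),
        temp_dir=env.get("TEMP_DIR", "./.data/converter"),
    )


# Passing a plain dict makes the override behaviour easy to see.
settings = load_settings({"MONGODB_DATABASE": "converter_test"})
print(settings.mongodb_uri, settings.mongodb_database)
```

The default password is a placeholder, so it is worth overriding `AUTH_PASSWORD` in `.env` before exposing the API.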
Development
🔖 requirements
- install uv (virtual environment and package manager)
py -m pip install --upgrade uv
# create venv
uv venv
# activate venv
# Windows: .venv\Scripts\activate
# Linux/macOS: source .venv/bin/activate
- update project requirements
uv pip install --upgrade -r requirements.txt
- build tools
uv pip install --upgrade setuptools build twine
🪛 build
- Windows (PowerShell): clean dist and build the package
if (Test-Path ./dist) {rm ./dist -r -force}; `
python -m build && twine check dist/*
- Linux/macOS: clean dist and build the package
[ -d ./dist ] && rm -rf ./dist
python -m build && twine check dist/*
📦 test / 🧪 debugger
Install the package in editable mode from the project root
uv pip install -U -e .
uv pip show ws-mark-flow
code quality tools
# .\src\robot
uv pip install -U scanreq prospector[with_everything]
## unused requirements
scanreq -r requirements.txt -p ./src
## style/linting
prospector ./src -t pylint -t pydocstyle
## code quality/complexity
prospector ./src -t vulture -t mccabe -t mypy
## security
prospector ./src -t dodgy -t bandit
## package
prospector ./src -t pyroma
✈️ publish
twine upload --verbose dist/*
Docker
- Build the Docker image (override version at build time if needed)
docker build -t ws-mark-flow ./app
# Copy environment file
cp .env.example ./app/.env
# Edit .env
docker run -p 80:80 --env-file ./app/.env ws-mark-flow
# use host.docker.internal for MongoDB connection from container to host
docker run --add-host=host.docker.internal:host-gateway -p 80:80 --env-file ./app/.env ws-mark-flow
License
MIT