FileFlow Agent
A modular, scheduler-driven data transfer platform built with Python. FileFlow automates the movement of files between configurable storage backends with support for cron scheduling, processing pipelines, deduplication, backup, and retention policies.
Features
- Multi-backend connectors — Local filesystem, SFTP, AWS S3, SCP, HDFS
- Per-job connection settings — Define connection properties (host, port, user, password) independently for each job, so multiple distinct SFTP endpoints can run side by side
- Cron scheduling — APScheduler with per-job cron expressions
- Processing pipeline — Compress, decompress, and rename files in transit
- Deduplication — SQLite-backed tracking to prevent duplicate transfers
- Backup & retention — Configurable backup directories with automatic retention-based cleanup
- Transfer verification — Size match, checksum, and existence checks
- Neumorphic dashboard — Responsive, clean soft-UI interface for real-time monitoring and configuration management
- REST API — Health checks, transfer stats, job listing, and log streaming
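Transfer verification with the checksum method boils down to hashing both copies and comparing digests. The sketch below illustrates the idea behind the three documented methods (size match, checksum, existence); the function names are illustrative, not FileFlow's actual API:

```python
import hashlib
from pathlib import Path

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large transfers don't exhaust memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_transfer(source: str, destination: str, method: str = "checksum") -> bool:
    """Compare source and destination using one of the documented methods."""
    if method == "existence":
        return Path(destination).exists()
    if method == "size_match":
        return Path(source).stat().st_size == Path(destination).stat().st_size
    if method == "checksum":
        return sha256_of(source) == sha256_of(destination)
    raise ValueError(f"unknown verification method: {method}")
```

A size match is cheap but can miss corruption; a checksum is slower but catches any byte-level difference, which is why the two are offered as separate methods.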
Architecture
```
├── configs/                  # YAML job definitions
│   ├── jobs.yaml
│   └── test_jobs.yaml
├── src/fileflow_agent/
│   ├── api/                  # FastAPI endpoints + dashboard serving
│   ├── config/               # Pydantic models and settings loader
│   ├── connectors/           # Source/destination connector implementations
│   ├── logging/              # Structured rotating logger
│   ├── processing/           # File processing pipeline
│   ├── scheduler/            # APScheduler integration
│   ├── services/             # Transfer, backup, retention, verification
│   ├── static/               # Dashboard frontend (HTML/CSS/JS)
│   ├── tracking/             # SQLite transfer history & deduplication
│   ├── utils/                # Checksum utilities
│   └── main.py               # Application entrypoint
├── test_*.py                 # Unit and integration tests
├── .env.example
├── run.sh                    # Startup script
├── pyproject.toml
├── requirements.txt
└── README.md
```
Getting Started
Prerequisites
- Python 3.10+
- pip
Installation & Workspace Setup
FileFlow Agent is distributed as a standard pip-installable package. Installing it adds a new command-line tool, `fileflow`, to your environment.
```shell
# 1. Install via pip (in a virtual environment or globally)
pip install fileflow-agent

# 2. Initialize a workspace
#    This creates localized databases, configuration templates, and log directories.
fileflow init ~/my_fileflow_workspace

# 3. Start the agent from the configured workspace
fileflow start ~/my_fileflow_workspace --port 7345
```
Once running, open http://localhost:7345 to access the Neumorphic monitoring dashboard.
Configuration
The fileflow init command will automatically scaffold a .env and configs/jobs.yaml in your chosen workspace directory.
- Environment config (`~/my_fileflow_workspace/.env`) — set your UI authentication credentials and, if needed, global AWS/SFTP credentials.
- Job config (`~/my_fileflow_workspace/configs/jobs.yaml`) — edit this file manually, or configure jobs entirely from the web dashboard without touching YAML.
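As a rough illustration, the scaffolded `.env` might contain entries like the fragment below. The variable names here are assumptions for illustration only, not FileFlow's documented keys — check the generated `.env.example` for the real ones:

```
# UI authentication (illustrative names)
FILEFLOW_UI_USER=admin
FILEFLOW_UI_PASSWORD=change-me

# Optional global credentials, e.g. the standard AWS variables
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
```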
This is what a YAML job definition looks like:
```yaml
jobs:
  - job_id: daily_backup
    enabled: true
    schedule: "0 */6 * * *"
    source:
      type: local
      path: /data/incoming
      file_pattern: "*.csv"
    destination:
      type: s3
      path: archive/csv
      bucket: my-bucket
    processing:
      enabled: true
      steps:
        - compress
    backup:
      enabled: true
      location: backups/daily
      retention_days: 30
    verification:
      method: size_match
```
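To make the schema above concrete, here is a minimal sketch of how such a job definition could be modeled and validated in Python. FileFlow itself uses Pydantic models (see `src/fileflow_agent/config/`); this standalone dataclass version only illustrates the shape of the data and is not the package's actual API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Endpoint:
    type: str                       # "local", "sftp", "s3", "scp", or "hdfs"
    path: str
    file_pattern: Optional[str] = None
    bucket: Optional[str] = None    # only meaningful for s3 endpoints

@dataclass
class Job:
    job_id: str
    schedule: str                   # standard 5-field cron expression
    source: Endpoint
    destination: Endpoint
    enabled: bool = True

def load_job(raw: dict) -> Job:
    """Build a Job from a parsed-YAML dict, validating the basics."""
    if len(raw.get("schedule", "").split()) != 5:
        raise ValueError(f"{raw.get('job_id')}: schedule must be a 5-field cron expression")
    return Job(
        job_id=raw["job_id"],
        schedule=raw["schedule"],
        enabled=raw.get("enabled", True),
        source=Endpoint(**raw["source"]),
        destination=Endpoint(**raw["destination"]),
    )
```

Note that `"0 */6 * * *"` fires every six hours on the hour, so a job like the `daily_backup` example above actually runs four times a day.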
Dashboard
The built-in Neumorphic web dashboard provides:
| View | Description |
|---|---|
| Overview | Transfer stats (total, success, failed, duplicates) and recent transfer table |
| Configuration | Form-based job editor — add, edit, delete jobs and reload the scheduler live |
| System Logs | Real-time log viewer with auto-refresh |
API Endpoints
| Method | Path | Description |
|---|---|---|
| GET | `/health` | Health check |
| GET | `/jobs` | List configured jobs |
| GET | `/transfers` | Recent transfer records |
| GET | `/stats/summary` | Aggregated transfer statistics |
| GET | `/logs/recent` | Recent log entries |
| GET | `/api/config` | Read raw YAML config |
| POST | `/api/config` | Save config and reload scheduler |
Extending Connectors
Implement `SourceConnector` or `DestinationConnector` from `connectors/base.py` and register it in `connectors/factory.py`:
```python
from fileflow_agent.connectors.base import SourceConnector

class MySourceConnector(SourceConnector):
    def list_files(self, path, pattern=None):
        ...

    def download_file(self, remote_path, local_path):
        ...

    def get_metadata(self, remote_path):
        ...
```
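As a concrete illustration, the sketch below fills in those three methods for a local-filesystem source. To keep it self-contained it declares its own stand-in base class; in a real extension you would subclass `SourceConnector` from `connectors/base.py` instead, and the return shapes used here (a list of file names, a metadata dict) are assumptions rather than FileFlow's documented contract:

```python
import fnmatch
import shutil
from abc import ABC, abstractmethod
from pathlib import Path

class SourceConnector(ABC):
    """Stand-in for fileflow_agent.connectors.base.SourceConnector."""
    @abstractmethod
    def list_files(self, path, pattern=None): ...
    @abstractmethod
    def download_file(self, remote_path, local_path): ...
    @abstractmethod
    def get_metadata(self, remote_path): ...

class LocalSourceConnector(SourceConnector):
    def list_files(self, path, pattern=None):
        """Return file names under `path`, optionally filtered by a glob pattern."""
        names = [p.name for p in Path(path).iterdir() if p.is_file()]
        return sorted(fnmatch.filter(names, pattern) if pattern else names)

    def download_file(self, remote_path, local_path):
        """For a local source, 'download' is just a copy that preserves metadata."""
        shutil.copy2(remote_path, local_path)

    def get_metadata(self, remote_path):
        stat = Path(remote_path).stat()
        return {"size": stat.st_size, "modified": stat.st_mtime}
```

An SFTP or S3 variant would follow the same pattern, swapping the `pathlib`/`shutil` calls for the relevant client library while keeping the three-method interface intact.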
Contributing
Contributions are welcome. Please open an issue first to discuss what you'd like to change.
- Fork the repository (https://github.com/emoncse/fileflow)
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
License
This project is open source and available under the MIT License.
File details
Details for the file fileflow_agent-0.4.0.tar.gz.
File metadata
- Download URL: fileflow_agent-0.4.0.tar.gz
- Size: 50.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 441d1f93516e61e0f3bae5523a63bf4cc57d217a840272d1c97229b9b08bc79a |
| MD5 | 6f4df2ed79adb59556ab386b7ab807fb |
| BLAKE2b-256 | 838b3c9db107ffd73c166bde61caae3229d587945a87abb4d67f415396dd0863 |
Provenance
The following attestation bundles were made for fileflow_agent-0.4.0.tar.gz:
Publisher: workflow.yml on emoncse/fileflow
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fileflow_agent-0.4.0.tar.gz
- Subject digest: 441d1f93516e61e0f3bae5523a63bf4cc57d217a840272d1c97229b9b08bc79a
- Sigstore transparency entry: 1441011711
- Permalink: emoncse/fileflow@7fc046045af120c6cbd04844538016bea7e456e4
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/emoncse
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yml@7fc046045af120c6cbd04844538016bea7e456e4
- Trigger Event: release
File details
Details for the file fileflow_agent-0.4.0-py3-none-any.whl.
File metadata
- Download URL: fileflow_agent-0.4.0-py3-none-any.whl
- Size: 57.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0157d7987f931fbfd02496698bdcd766aaf32f6c4fb122b9ce23ef22c57c2371 |
| MD5 | 85e93b1549528007c564654bb6694bca |
| BLAKE2b-256 | 5501f967c630f9a6864b78462d2a3b7a6db1e105a4066aeeec434e2291e7b21e |
Provenance
The following attestation bundles were made for fileflow_agent-0.4.0-py3-none-any.whl:
Publisher: workflow.yml on emoncse/fileflow
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fileflow_agent-0.4.0-py3-none-any.whl
- Subject digest: 0157d7987f931fbfd02496698bdcd766aaf32f6c4fb122b9ce23ef22c57c2371
- Sigstore transparency entry: 1441011860
- Permalink: emoncse/fileflow@7fc046045af120c6cbd04844538016bea7e456e4
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/emoncse
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yml@7fc046045af120c6cbd04844538016bea7e456e4
- Trigger Event: release