FileFlow Agent

A modular, scheduler-driven data transfer platform built with Python. FileFlow automates the movement of files between configurable storage backends with support for cron scheduling, processing pipelines, deduplication, backup, and retention policies.

Features

  • Multi-backend connectors — Local filesystem, SFTP, AWS S3, SCP, HDFS
  • Advanced job configuration — Define connection properties (host, port, user, password) independently for each job, enabling multiple distinct SFTP transfers
  • Cron scheduling — APScheduler with per-job cron expressions
  • Processing pipeline — Compress, decompress, and rename files in transit
  • Deduplication — SQLite-backed tracking to prevent duplicate transfers
  • Reliable backup & retention — Configurable backup directories with automatic strict retention cleanup
  • Transfer verification — Size match, checksum, and existence checks
  • Neumorphic dashboard — Responsive 'Soft UI' interface for real-time monitoring and configuration management
  • REST API — Health checks, transfer stats, job listing, and log streaming
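Deduplication and the checksum utilities work together: hash each file, then skip anything already recorded. A minimal sketch of how SQLite-backed tracking might look (the table name and schema here are illustrative, not FileFlow's actual schema):

```python
import hashlib
import sqlite3


def file_digest(path: str) -> str:
    """SHA-256 of a file, read in chunks so large files stay memory-friendly."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


class TransferTracker:
    """Illustrative SQLite-backed dedup tracker (not FileFlow's real schema)."""

    def __init__(self, db_path: str = ":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS transfers (digest TEXT PRIMARY KEY)"
        )

    def seen(self, digest: str) -> bool:
        """True if a file with this digest was already transferred."""
        row = self.conn.execute(
            "SELECT 1 FROM transfers WHERE digest = ?", (digest,)
        ).fetchone()
        return row is not None

    def record(self, digest: str) -> None:
        """Remember a digest; repeats are harmless thanks to INSERT OR IGNORE."""
        self.conn.execute(
            "INSERT OR IGNORE INTO transfers (digest) VALUES (?)", (digest,)
        )
        self.conn.commit()
```

A transfer loop would call `seen(file_digest(path))` before moving a file and `record(...)` afterwards.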

Architecture

├── configs/                # YAML job definitions
│   ├── jobs.yaml
│   └── test_jobs.yaml
├── src/fileflow_agent/
│   ├── api/                # FastAPI endpoints + dashboard serving
│   ├── config/             # Pydantic models and settings loader
│   ├── connectors/         # Source/Destination connector implementations
│   ├── logging/            # Structured rotating logger
│   ├── processing/         # File processing pipeline
│   ├── scheduler/          # APScheduler integration
│   ├── services/           # Transfer, backup, retention, verification
│   ├── static/             # Dashboard frontend (HTML/CSS/JS)
│   ├── tracking/           # SQLite transfer history & deduplication
│   ├── utils/              # Checksum utilities
│   └── main.py             # Application entrypoint
├── test_*.py               # Unit and integration tests
├── .env.example
├── run.sh                  # Easy startup script
├── pyproject.toml
├── requirements.txt
└── README.md
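The scheduler/ module delegates real cron handling to APScheduler, but the idea behind a per-job five-field cron expression can be shown with a toy matcher (simplified: it supports only `*`, `*/n` steps, and literal numbers; real cron also allows ranges, lists, and a Sunday=0 day-of-week convention):

```python
from datetime import datetime


def _field_matches(field: str, value: int) -> bool:
    """Match one cron field: '*', a '*/n' step, or a literal number."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return value % int(field[2:]) == 0
    return int(field) == value


def cron_matches(expr: str, when: datetime) -> bool:
    """True if `when` satisfies a 'minute hour day-of-month month day-of-week' expression."""
    minute, hour, dom, month, dow = expr.split()
    return (
        _field_matches(minute, when.minute)
        and _field_matches(hour, when.hour)
        and _field_matches(dom, when.day)
        and _field_matches(month, when.month)
        and _field_matches(dow, when.weekday())  # toy: Python's Monday=0, unlike cron
    )
```

For example, `"0 */6 * * *"` (used in the job definition below) matches at 00:00, 06:00, 12:00, and 18:00.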

Getting Started

Prerequisites

  • Python 3.10+
  • pip

Installation & Workspace Setup

FileFlow Agent is distributed as a standalone pip-installable package. Installing it adds a new command-line tool, fileflow, to your system.

# 1. Install via Pip (In a virtual environment or globally)
pip install fileflow-agent

# 2. Initialize a secure Workspace
# This creates localized databases, configuration templates, and log directories.
fileflow init ~/my_fileflow_workspace

# 3. Start the Agent from the configured workspace
fileflow start ~/my_fileflow_workspace --port 8000

Once running, open http://localhost:8000 to access the Neumorphic monitoring dashboard.

Configuration

The fileflow init command will automatically scaffold a .env and configs/jobs.yaml in your chosen workspace directory.

  1. Environment config (~/my_fileflow_workspace/.env): set your UI authentication credentials and, if needed, global AWS/SFTP master keys.

  2. Job config (~/my_fileflow_workspace/configs/jobs.yaml): edit this file manually, or configure jobs entirely from the web dashboard without touching YAML.

This is what a YAML job definition looks like:

jobs:
  - job_id: daily_backup
    enabled: true
    schedule: "0 */6 * * *"

    source:
      type: local
      path: /data/incoming
      file_pattern: "*.csv"

    destination:
      type: s3
      path: archive/csv
      bucket: my-bucket

    processing:
      enabled: true
      steps:
        - compress

    backup:
      enabled: true
      location: backups/daily
      retention_days: 30

    verification:
      method: size_match
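The verification.method field selects one of the advertised checks (existence, size match, checksum). A sketch of what those checks might boil down to (the function name is illustrative, not FileFlow's actual API):

```python
import hashlib
import os


def verify_transfer(src: str, dst: str, method: str = "size_match") -> bool:
    """Illustrative post-transfer check mirroring the documented methods."""
    if method == "existence":
        return os.path.exists(dst)
    if method == "size_match":
        return os.path.getsize(src) == os.path.getsize(dst)
    if method == "checksum":
        def sha256(path: str) -> str:
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            return h.hexdigest()
        return sha256(src) == sha256(dst)
    raise ValueError(f"unknown verification method: {method}")
```

Note the trade-off: existence is cheapest, size_match catches truncated transfers, and checksum catches any corruption at the cost of reading both files in full.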


Dashboard

The built-in Neumorphic web dashboard provides:

View           Description
Overview       Transfer stats (total, success, failed, duplicates) and a recent-transfer table
Configuration  Form-based job editor: add, edit, and delete jobs, and reload the scheduler live
System Logs    Real-time log viewer with auto-refresh

API Endpoints

Method  Path            Description
GET     /health         Health check
GET     /jobs           List configured jobs
GET     /transfers      Recent transfer records
GET     /stats/summary  Aggregated transfer statistics
GET     /logs/recent    Recent log entries
GET     /api/config     Read the raw YAML config
POST    /api/config     Save the config and reload the scheduler
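Since every endpoint returns JSON, the standard library is enough to script against a running agent (the base URL below assumes the default http://localhost:8000 from the quick start):

```python
import json
from urllib.request import urlopen


def get_json(base_url: str, path: str):
    """GET a FileFlow endpoint and decode its JSON body."""
    with urlopen(base_url.rstrip("/") + path) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (with the agent running locally):
#   health = get_json("http://localhost:8000", "/health")
#   stats = get_json("http://localhost:8000", "/stats/summary")
```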

Extending Connectors

To add a new backend, implement SourceConnector or DestinationConnector from connectors/base.py and register your class in connectors/factory.py:

from fileflow_agent.connectors.base import SourceConnector

class MySourceConnector(SourceConnector):
    def list_files(self, path, pattern=None):
        """Return the files under `path`, optionally filtered by a glob pattern."""
        ...

    def download_file(self, remote_path, local_path):
        """Copy the remote file to `local_path`."""
        ...

    def get_metadata(self, remote_path):
        """Return metadata (e.g. size and modification time) for the remote file."""
        ...
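For a concrete feel, here is a toy local-filesystem source written standalone (in the real project it would subclass SourceConnector and be registered in connectors/factory.py; method semantics here are an assumption based on the method names):

```python
import fnmatch
import os
import shutil


class LocalSourceConnector:
    """Toy local-filesystem source implementing the three-method interface."""

    def list_files(self, path, pattern=None):
        """List files in `path`, optionally filtered by a glob such as '*.csv'."""
        names = os.listdir(path)
        if pattern:
            names = fnmatch.filter(names, pattern)
        return [os.path.join(path, n) for n in sorted(names)]

    def download_file(self, remote_path, local_path):
        # For a local source, "download" is just a copy that preserves mtime.
        shutil.copy2(remote_path, local_path)

    def get_metadata(self, remote_path):
        st = os.stat(remote_path)
        return {"size": st.st_size, "modified": st.st_mtime}
```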

Publishing to PyPI

If you are a maintainer, you can package and release new versions to the official Python Package Index (PyPI).

  1. Update the version = "X.Y.Z" string inside pyproject.toml.
  2. Ensure you have the build tools installed:
    pip install build twine
    
  3. Remove old build artifacts and generate the new Wheel (.whl) and Source Tarball (.tar.gz):
    rm -rf build dist src/*.egg-info
    python -m build
    
  4. Upload the built packages to PyPI securely using Twine:
    twine upload dist/*
    
    (You will be prompted for your PyPI API token, which should be prefixed with pypi-)

Contributing

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is open source and available under the MIT License.
