Streaming ingestion pipeline for AMFI NAV and scheme master data

⚡ amfi-stream

Streaming-first ingestion for AMFI mutual fund data.

Turn raw AMFI files into clean, schema-safe, analytics-ready Arrow tables — in parallel, without hacks.


🚀 The problem (you already know this)

AMFI data is:

  • inconsistent
  • semi-structured
  • painful to clean
  • not pipeline-friendly

Every existing tool assumes:

“just fetch and parse it”

That breaks at scale.


⚡ The shift

Don’t query AMFI. Ingest it properly.


🧩 What amfi-stream does

Raw AMFI Data
(NAV + Scheme Files)
        ↓
⚡ amfi-stream
(stream + sanitize + normalize)
        ↓
Arrow Tables (typed, clean)
        ↓
Polars / DuckDB / Pandas / Spark

✨ Why people switch

  • ⚡ Streaming instead of batch downloads
  • 🧼 Automatic normalization (no manual cleaning)
  • 🧱 Strong schema via Apache Arrow
  • 🧵 Parallel ingestion engine
  • 📊 Directly usable in analytics tools
  • 🐼 No Pandas dependency

🆚 Alternatives (quick reality check)

Tool        Model       Why it breaks
mfapi.in    API calls   One request per fund → slow
navpipe     SDK         Needs pre-known fund list
mftool      Scraper     Fragile, breaks silently
AMFI site   Raw files   No structure

amfi-stream:

✔ Dataset-level ingestion
✔ Streaming + parallel
✔ Schema enforced
✔ Built for pipelines


⚡ Quick start

from amfi_stream import (
    AMFIPipeline,
    stream_latest_nav,
    stream_scheme_master,
    stream_historical_nav,
)

# One job per dataset to ingest.
jobs = [
    stream_scheme_master(),
    stream_latest_nav(),
    # Start and end dates for the historical NAV range.
    stream_historical_nav("1-May-2025", "1-May-2026"),
]

# Run the jobs with up to 4 workers.
with AMFIPipeline(max_workers=4) as pipeline:
    result = pipeline.run(jobs)

print(result.latest_nav)
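For intuition about what running several jobs with max_workers=4 can look like, here is a generic sketch built on Python's standard ThreadPoolExecutor. It is not amfi-stream's implementation; run_jobs and the two stand-in job functions are hypothetical names used only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_jobs(jobs: dict[str, Callable[[], object]], max_workers: int = 4) -> dict[str, object]:
    """Run independent jobs concurrently and collect results by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the jobs overlap in time.
        futures = {name: pool.submit(job) for name, job in jobs.items()}
        # result() blocks until that job finishes and re-raises its exception, if any.
        return {name: future.result() for name, future in futures.items()}


# Stand-in jobs; a real job would download and parse a dataset.
def fake_scheme_master() -> list[str]:
    return ["scheme rows"]


def fake_latest_nav() -> list[str]:
    return ["nav rows"]


results = run_jobs({
    "scheme_master": fake_scheme_master,
    "latest_nav": fake_latest_nav,
})
```

Threads suit this kind of workload because downloads spend most of their time waiting on the network.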

📦 Output

AMFIResult(
    scheme_master=pa.Table | None,
    latest_nav=pa.Table | None,
    historical_nav=pa.Table | None,
)
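Since every field can be None, downstream code should check before using a table. The sketch below uses a stand-in dataclass, ResultLike, with the same three field names; whether AMFIResult is itself a dataclass, and exactly when a field is None (presumably when the matching job was not run), are assumptions.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class ResultLike:
    """Stand-in with the same field names as AMFIResult (hypothetical)."""
    scheme_master: Optional[Any] = None
    latest_nav: Optional[Any] = None
    historical_nav: Optional[Any] = None


def available(result: ResultLike) -> dict[str, Any]:
    """Return only the fields that actually hold a table."""
    return {
        f.name: getattr(result, f.name)
        for f in fields(result)
        if getattr(result, f.name) is not None
    }


# Only one dataset was produced in this made-up run.
partial = ResultLike(latest_nav=["nav rows"])
present = available(partial)
```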

Typed. Predictable. Analytics-ready.


🏗 Architecture

URLs
 ↓
Streaming Engine
 ↓
Sanitizer
 ↓
Parser
 ↓
Arrow Tables
 ↓
Normalizers
 ↓
Pipeline Output
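The sanitizer and parser stages above can be pictured as chained generators that handle one line at a time, so the full file never has to sit in memory. This is a sketch, not the library's code: the semicolon-separated six-field layout, the function names, and the sample rows are assumptions made for illustration.

```python
from typing import Iterable, Iterator, Optional


def sanitize(lines: Iterable[str]) -> Iterator[str]:
    """Strip surrounding whitespace and drop blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def to_float(text: str) -> Optional[float]:
    """Convert a NAV string to float, or None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def parse(lines: Iterable[str], n_fields: int = 6) -> Iterator[dict]:
    """Yield one record per data row, skipping headings and the column header."""
    for line in lines:
        parts = [p.strip() for p in line.split(";")]
        # Data rows are assumed to have six fields and a numeric scheme code.
        if len(parts) != n_fields or not parts[0].isdigit():
            continue
        yield {
            "scheme_code": int(parts[0]),
            "scheme_name": parts[3],
            "nav": to_float(parts[4]),
            "date": parts[5],
        }


# Made-up sample in the assumed layout.
SAMPLE = [
    "Scheme Code;ISIN Div Payout/ ISIN Growth;ISIN Div Reinvestment;Scheme Name;Net Asset Value;Date",
    "",
    "Open Ended Schemes(Debt Scheme - Banking and PSU Fund)",
    "Example Mutual Fund",
    "100001;INF000000001;-;Example Fund - Growth;12.3456;01-May-2025",
    "100002;INF000000002;-;Example Fund - IDCW;N.A.;01-May-2025",
]

records = list(parse(sanitize(SAMPLE)))
```

Each stage pulls from the previous one lazily, so a network stream can be plugged in where SAMPLE is used here.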

🔥 Design principles

  • Streaming > batch
  • Schema > guesswork
  • Arrow > DataFrame conversions
  • Deterministic > fragile parsing
  • Minimal > bloated

🔮 Coming soon

  • Derived analytics-ready columns
  • Enhanced schema layers
  • Faster historical ingestion

🤝 Contributing

If you’ve ever fought AMFI data, you already know why this exists.

Open areas:

  • Performance tuning
  • Enhanced schema creation
  • Benchmark comparison
  • Tests
  • Documentation and docstrings

⭐ If this helped you

Give it a star — it helps more people discover a better way to handle AMFI data.


📜 License

Apache 2.0

Source Distribution

amfi_stream-0.3.0.tar.gz (12.9 kB view details)

Uploaded Source

Built Distribution

amfi_stream-0.3.0-py3-none-any.whl (13.8 kB view details)

Uploaded Python 3

File details

Details for the file amfi_stream-0.3.0.tar.gz.

File metadata

  • Download URL: amfi_stream-0.3.0.tar.gz
  • Upload date:
  • Size: 12.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for amfi_stream-0.3.0.tar.gz
Algorithm Hash digest
SHA256 fd99969c8e8b647b949591863f8ab2f25554a318e54bdf328c11a738c204a758
MD5 f20d2d880c148aa395ab18a4244eddda
BLAKE2b-256 62c448aac070d38e3afcde48abecb3b0fdb6ece20e0423d7e76511d2a89bf42e

Provenance

The following attestation bundles were made for amfi_stream-0.3.0.tar.gz:

Publisher: publish.yml on MSM2002/amfi-stream

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file amfi_stream-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: amfi_stream-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 13.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for amfi_stream-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6b2cfc894e6b6710b7af7ecf5bfa23fbb09f6faa3ffe39eb79ed57def988638a
MD5 a1015de0abfc045152ee288746f9c143
BLAKE2b-256 4f82e3c6573c499f4c4eed1438fb0462bd5fbb36b829e53c6b6ae3acf34d9754

Provenance

The following attestation bundles were made for amfi_stream-0.3.0-py3-none-any.whl:

Publisher: publish.yml on MSM2002/amfi-stream

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
