Streaming ingestion pipeline for AMFI NAV and scheme master data

amfi-stream

Streaming-first ingestion pipeline for AMFI mutual fund data, built on Apache Arrow.

It transforms raw AMFI datasets into schema-safe, analytics-ready tables using a lightweight, parallel streaming engine.


What is amfi-stream

amfi-stream is a data ingestion layer that sits between AMFI data sources and analytics tools.

It is designed for:

  • Streaming ingestion of NAV and scheme master data
  • Automatic normalization of AMFI formats
  • Schema enforcement using Apache Arrow
  • Parallel data fetching and processing
  • Clean outputs for downstream analytics systems

Ecosystem overview

AMFI Data Sources (NAV, Scheme Files)
↓
amfi-stream
(Streaming Ingestion Engine)
↓
Sanitization + Normalization
(Arrow Schema Enforcement)
↓
Apache Arrow Tables
↓
Downstream Analytics Tools
(Polars / DuckDB / Pandas / Spark)

amfi-stream is a streaming ingestion and normalization layer, not a data API wrapper or analytics engine.


Ecosystem comparison

Solution          | Type               | Access Model             | Structure             | Multi-fund Support       | Streaming | Cost | Key Limitation
amfi-stream       | Ingestion pipeline | Bulk streaming ingestion | Arrow schema enforced | Native dataset-level     | Yes       | Free | Focused on ingestion, not APIs
mfapi.in          | API service        | REST endpoints           | JSON structured       | Client-side aggregation  | Limited   | Free | Request-per-fund model
navpipe           | SDK                | Fund-code queries        | Polars output         | Requires fund list       | Yes       | Free | Not dataset ingestion
mftool            | Library            | Scraping-based           | Partial               | Manual aggregation       | No        | Free | Fragile parsing logic
AMFI India Portal | Raw source         | File downloads           | None                  | Post-processing required | No        | Free | Unstructured format

Core design principle

  • Most tools assume: Data is already structured and ready to consume.
  • amfi-stream assumes: Data is streamed, raw, and must be normalized deterministically before analysis.
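
As a rough illustration of deterministic normalization, the sketch below converts one raw text row into typed values. The field order, the "N.A." placeholder, the "01-Jan-2024" date style, and the helper names `normalize_nav` and `normalize_date` are assumptions made for this example, not amfi-stream's actual rules.

```python
from datetime import date, datetime
from typing import Optional


def normalize_nav(raw: str) -> Optional[float]:
    """Return the NAV as a float, or None when the text is not numeric."""
    try:
        return float(raw.strip())
    except ValueError:
        return None


def normalize_date(raw: str) -> date:
    """Parse a day-month-year string with an abbreviated month name.

    %b depends on the process locale; English month names are assumed here.
    """
    return datetime.strptime(raw.strip(), "%d-%b-%Y").date()


# Hypothetical raw row: scheme code, scheme name, NAV, date (all text).
raw_row = ["100001", " Example Fund - Growth ", "12.3456", "01-Jan-2024"]

clean_row = {
    "scheme_code": int(raw_row[0]),
    "scheme_name": raw_row[1].strip(),
    "nav": normalize_nav(raw_row[2]),
    "date": normalize_date(raw_row[3]),
}
```

The same input always produces the same typed output, and a non-numeric NAV becomes an explicit None instead of a string that breaks a later aggregation.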

Features

  • Streaming ingestion via HTTP (fsspec)
  • Automatic AMFI data sanitization
  • Schema enforcement using Apache Arrow
  • Parallel execution engine
  • Composable job-based architecture
  • Arrow-native outputs (no Pandas required)
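
The idea of running independent jobs in parallel can be sketched with the standard library alone. This is not the library's engine: `fetch_scheme_master`, `fetch_latest_nav`, and `run_jobs` are hypothetical stand-ins, and a real job would stream data over HTTP instead of returning fixed rows.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


# Hypothetical stand-ins for ingestion jobs.
def fetch_scheme_master() -> list[dict]:
    return [{"scheme_code": 100001, "scheme_name": "Example Fund - Growth"}]


def fetch_latest_nav() -> list[dict]:
    return [{"scheme_code": 100001, "nav": 12.3456}]


def run_jobs(
    jobs: dict[str, Callable[[], list[dict]]], max_workers: int = 4
) -> dict[str, list[dict]]:
    """Run independent jobs concurrently and collect results by job name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(job) for name, job in jobs.items()}
        # result() blocks until the job finishes and re-raises its exception.
        return {name: future.result() for name, future in futures.items()}


results = run_jobs(
    {"scheme_master": fetch_scheme_master, "latest_nav": fetch_latest_nav}
)
```

Threads suit this kind of workload because the jobs spend most of their time waiting on network I/O.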

Quick start

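from amfi_stream import AMFIPipeline, stream_latest_nav, stream_scheme_master

# Each job describes one dataset to ingest.
jobs = [
    stream_scheme_master(),
    stream_latest_nav(),
]

# Run the jobs with up to four workers.
with AMFIPipeline(max_workers=4) as pipeline:
    result = pipeline.run(jobs)

# Each dataset is exposed as an attribute of the result.
print(result.latest_nav)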

Output Format

All outputs are returned as PyArrow tables:

AMFIResult(
    scheme_master=pa.Table,
    latest_nav=pa.Table,
    historical_nav=None  # coming soon
)

Architecture

URL Sources → Streaming Engine → Sanitizer → CSV Parser → Arrow Tables → Normalisers → Pipeline Output
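
The stage chain above can be approximated with lazy generators, as in the following sketch. It is a simplified stand-in under stated assumptions: the source is an in-memory buffer instead of an HTTP stream, the Arrow step is omitted, and `sanitize`, `parse`, and `normalise` are hypothetical names, not the library's components.

```python
import csv
import io
from typing import Iterable, Iterator

# Hypothetical in-memory stand-in for a streamed response body.
SOURCE = io.StringIO(
    "Scheme Code;Scheme Name;Net Asset Value\n"
    "\n"
    "Example Mutual Fund\n"
    "100001;Example Fund - Growth;12.3456\n"
    "100002;Example Fund - IDCW;10.0000\n"
)


def sanitize(lines: Iterable[str]) -> Iterator[str]:
    """Pass through only lines that look like delimited rows."""
    for line in lines:
        if ";" in line:
            yield line


def parse(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Parse semicolon-delimited text; the first row supplies column names."""
    yield from csv.DictReader(lines, delimiter=";")


def normalise(records: Iterable[dict[str, str]]) -> Iterator[dict[str, object]]:
    """Rename columns and convert text fields to typed values."""
    for record in records:
        yield {
            "scheme_code": int(record["Scheme Code"]),
            "scheme_name": record["Scheme Name"].strip(),
            "nav": float(record["Net Asset Value"]),
        }


# Each stage is lazy: no line is read until the final consumer iterates.
records = list(normalise(parse(sanitize(SOURCE))))
```

Chaining generators keeps memory use proportional to one row at a time, which is the property a streaming pipeline relies on.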


Roadmap

  • Scheme Master ingestion (available)
  • Latest NAV ingestion (available)
  • Historical NAV ingestion (planned)

Design Philosophy

  • Streaming over batch processing
  • Schema-first ingestion
  • Apache Arrow as canonical format
  • Minimal dependencies
  • Deterministic, reproducible pipelines

Historical NAV (coming soon)

Historical NAV ingestion is the next planned feature.

It will enable:

  • Date-range based ingestion from AMFI
  • Chunked streaming over large time windows
  • Unified output with latest NAV schema
  • Full time-series dataset construction
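
Since this feature is not implemented yet, the following is only a sketch of how a large date range might be split into smaller windows for chunked ingestion. The `date_chunks` helper and the 30-day window are hypothetical choices, not a documented AMFI limit or a planned API.

```python
from datetime import date, timedelta
from typing import Iterator


def date_chunks(
    start: date, end: date, max_days: int
) -> Iterator[tuple[date, date]]:
    """Split the inclusive range [start, end] into windows of at most max_days days."""
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    cursor = start
    while cursor <= end:
        chunk_end = min(cursor + timedelta(days=max_days - 1), end)
        yield cursor, chunk_end
        # The next window starts the day after this one ends, so windows
        # never overlap and no date is skipped.
        cursor = chunk_end + timedelta(days=1)


chunks = list(date_chunks(date(2024, 1, 1), date(2024, 3, 31), max_days=30))
```

Each window could then be fetched as its own job and the resulting tables concatenated into one time series.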

Contributing

This project is released under the Apache 2.0 License, and contributions are welcome.

Areas where contributions are especially useful:

  • Historical NAV ingestion implementation
  • Performance improvements in ingestion engine
  • Additional normalization rules for AMFI formats
  • Test coverage expansion

License

Apache License 2.0


Download files

Source Distribution

amfi_stream-0.1.0.tar.gz (11.7 kB)

Uploaded Source

Built Distribution

amfi_stream-0.1.0-py3-none-any.whl (11.9 kB)

Uploaded Python 3

File details

Details for the file amfi_stream-0.1.0.tar.gz.

File metadata

  • Download URL: amfi_stream-0.1.0.tar.gz
  • Upload date:
  • Size: 11.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for amfi_stream-0.1.0.tar.gz
Algorithm Hash digest
SHA256 fbf11550af4ffc344a050b53a3703ae944d20b5380b15425081fbefff8421491
MD5 5db476439ab5262e2d5ce2fdd9215987
BLAKE2b-256 0383363e0f66ddfe5dc63c30049b670defeb2c6b1de29065f56a4d020c8e1edc

Provenance

The following attestation bundles were made for amfi_stream-0.1.0.tar.gz:

Publisher: publish.yml on MSM2002/amfi-stream

File details

Details for the file amfi_stream-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: amfi_stream-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 11.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for amfi_stream-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 52f647592e13b510b8156f0dc64b05ede6142e25fc97d4c199eceabf9d9bd2fc
MD5 1df99c3268ff8219c9351ecd82ec5b3b
BLAKE2b-256 8f1a803bfd775f093af80c6dd473f82cc69416cc481bbb991764c8e7d546898e

Provenance

The following attestation bundles were made for amfi_stream-0.1.0-py3-none-any.whl:

Publisher: publish.yml on MSM2002/amfi-stream