Streaming ingestion pipeline for AMFI NAV and scheme master data

amfi-stream

Streaming-first ingestion pipeline for AMFI mutual fund data, built on Apache Arrow.

It transforms raw AMFI datasets into schema-safe, analytics-ready tables using a lightweight, parallel streaming engine.


What is amfi-stream

amfi-stream is a data ingestion layer that sits between AMFI data sources and analytics tools.

It is designed for:

  • Streaming ingestion of NAV and scheme master data
  • Automatic normalization of AMFI formats
  • Schema enforcement using Apache Arrow
  • Parallel data fetching and processing
  • Clean outputs for downstream analytics systems

Ecosystem overview

AMFI Data Sources (NAV, Scheme Files)
↓
amfi-stream
(Streaming Ingestion Engine)
↓
Sanitization + Normalization
(Arrow Schema Enforcement)
↓
Apache Arrow Tables
↓
Downstream Analytics Tools
(Polars / DuckDB / Pandas / Spark)

amfi-stream is a streaming ingestion and normalization layer, not a data API wrapper or analytics engine.


Ecosystem comparison

| Solution          | Type               | Access Model             | Structure             | Multi-fund Support       | Streaming | Cost | Key Limitation                 |
|-------------------|--------------------|--------------------------|-----------------------|--------------------------|-----------|------|--------------------------------|
| amfi-stream       | Ingestion pipeline | Bulk streaming ingestion | Arrow schema enforced | Native dataset-level     | Yes       | Free | Focused on ingestion, not APIs |
| mfapi.in          | API service        | REST endpoints           | JSON structured       | Client-side aggregation  | Limited   | Free | Request-per-fund model         |
| navpipe           | SDK                | Fund-code queries        | Polars output         | Requires fund list       | Yes       | Free | Not dataset ingestion          |
| mftool            | Library            | Scraping-based           | Partial               | Manual aggregation       | No        | Free | Fragile parsing logic          |
| AMFI India Portal | Raw source         | File downloads           | None                  | Post-processing required | No        | Free | Unstructured format            |

Core design principle

  • Most tools assume: Data is already structured and ready to consume.
  • amfi-stream assumes: Data is streamed, raw, and must be normalized deterministically before analysis.
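To make "normalized deterministically" concrete, here is a minimal standard-library sketch of mapping one raw record onto fixed, typed keys. The raw field names, the `01-May-2025` date format, and the set of placeholder values are assumptions about the AMFI files, and `normalize_row` is a hypothetical helper, not part of amfi-stream.

```python
from datetime import date, datetime
from typing import Optional

# Placeholder strings treated as "no value" (an assumption about the raw files).
MISSING = {"", "-", "N.A."}


def parse_nav(raw: str) -> Optional[float]:
    """Return the NAV as a float, or None for a placeholder value."""
    text = raw.strip()
    if text in MISSING:
        return None
    return float(text.replace(",", ""))


def parse_nav_date(raw: str) -> date:
    """Parse a date written like 01-May-2025 (assumed format, English month names)."""
    return datetime.strptime(raw.strip(), "%d-%b-%Y").date()


def normalize_row(row: dict) -> dict:
    """Map one raw record onto fixed, typed keys; the same input always gives the same output."""
    return {
        "scheme_code": int(row["Scheme Code"].strip()),
        "scheme_name": " ".join(row["Scheme Name"].split()),  # collapse stray whitespace
        "nav": parse_nav(row["Net Asset Value"]),
        "nav_date": parse_nav_date(row["Date"]),
    }


# Illustrative record, not real AMFI data.
record = normalize_row({
    "Scheme Code": " 100001",
    "Scheme Name": "Example  Fund -  Growth ",
    "Net Asset Value": "12.3456",
    "Date": "01-May-2025",
})
```

Because every rule is a pure function of the input text, re-running the same raw file yields identical rows.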

Features

  • Streaming ingestion via HTTP (fsspec)
  • Automatic AMFI data sanitization
  • Schema enforcement using Apache Arrow
  • Parallel execution engine
  • Composable job-based architecture
  • Arrow-native outputs (no Pandas required)
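The combination of a job-based design and a parallel engine can be pictured with a small standard-library sketch. `Job` and `run_jobs` are hypothetical names invented for this example, and the lambdas stand in for real streaming downloads; how amfi-stream schedules work internally is not described here.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Job:
    """A named unit of work; `fetch` stands in for a streaming download."""
    name: str
    fetch: Callable[[], list]


def run_jobs(jobs: list, max_workers: int = 4) -> dict:
    """Run all jobs concurrently and return their results keyed by job name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the jobs overlap, then collect the results.
        futures = {job.name: pool.submit(job.fetch) for job in jobs}
        # future.result() re-raises any exception raised in the worker thread.
        return {name: future.result() for name, future in futures.items()}


jobs = [
    Job("scheme_master", lambda: ["row-a", "row-b"]),
    Job("latest_nav", lambda: ["row-c"]),
]
results = run_jobs(jobs)
```

Threads suit this kind of workload because the jobs spend most of their time waiting on network I/O rather than on the CPU.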

Quick start

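from amfi_stream import (
    AMFIPipeline,
    stream_historical_nav,
    stream_latest_nav,
    stream_scheme_master,
)

# Describe the datasets to ingest as a list of jobs.
jobs = [
    stream_scheme_master(),
    stream_latest_nav(),
    stream_historical_nav("1-May-2025", "1-May-2026"),
]

# Run the jobs with up to four parallel workers.
with AMFIPipeline(max_workers=4) as pipeline:
    result = pipeline.run(jobs)

# Each field of the result is a PyArrow table or None.
print(result.latest_nav)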
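The quick start passes date bounds as strings such as "1-May-2025". Should a caller want to split a long range into smaller requests, the sketch below shows one way to do it with the standard library. Whether amfi-stream does anything like this internally is not stated; `date_windows` is a hypothetical helper and the 90-day default is an arbitrary choice for illustration.

```python
from datetime import date, datetime, timedelta
from typing import Iterator, Tuple


def parse_bound(text: str) -> date:
    """Parse a bound written like 1-May-2025 (English month abbreviations)."""
    return datetime.strptime(text, "%d-%b-%Y").date()


def date_windows(start: str, end: str, max_days: int = 90) -> Iterator[Tuple[date, date]]:
    """Yield consecutive inclusive (start, end) windows of at most max_days days."""
    lo, hi = parse_bound(start), parse_bound(end)
    if lo > hi:
        raise ValueError("start must not be after end")
    one_day = timedelta(days=1)
    while lo <= hi:
        # The window is inclusive, so it ends max_days - 1 days after it starts.
        window_end = min(lo + timedelta(days=max_days - 1), hi)
        yield lo, window_end
        lo = window_end + one_day


windows = list(date_windows("1-May-2025", "1-May-2026"))
```

The windows are contiguous and non-overlapping, so each date in the range is requested exactly once.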

Output Format

pipeline.run(jobs) returns an AMFIResult in which each field is either a PyArrow table or None:

AMFIResult(
    scheme_master=pa.Table | None,
    latest_nav=pa.Table | None,
    historical_nav=pa.Table | None,
)

Architecture

URL Sources → Streaming Engine → Sanitizer → CSV Parser → Arrow Tables → Normalizers → Pipeline Output
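The first stages of this flow can be sketched as chained generators, so that rows move through one at a time without the whole file being held in memory. The sketch uses only the standard library and stops before the Arrow stage. The semicolon-delimited layout, the six-field row shape, and the sample text are assumptions about the raw NAV file, and `sanitize` and `parse` are hypothetical stand-ins for the real stages.

```python
import csv
import io
from typing import Iterable, Iterator

EXPECTED_FIELDS = 6  # assumed number of fields in a NAV data row


def sanitize(lines: Iterable[str]) -> Iterator[str]:
    """Keep only data rows; drop blank lines, the header, and section titles."""
    for line in lines:
        line = line.strip()
        # A data row has the expected number of delimiters and starts with a digit.
        if line.count(";") == EXPECTED_FIELDS - 1 and line[:1].isdigit():
            yield line


def parse(lines: Iterable[str]) -> Iterator[list]:
    """Split sanitized rows into fields using the csv module."""
    yield from csv.reader(lines, delimiter=";")


# Illustrative text in the assumed layout, not real AMFI data.
SAMPLE = """\
Scheme Code;ISIN Div Payout/ ISIN Growth;ISIN Div Reinvestment;Scheme Name;Net Asset Value;Date

Open Ended Schemes(Example Category)

Example Mutual Fund

100001;INF000000001;-;Example Fund - Growth;12.3456;01-May-2025
100002;INF000000002;-;Example Fund - IDCW;10.0000;01-May-2025
"""

# io.StringIO stands in for a streamed HTTP response body.
rows = list(parse(sanitize(io.StringIO(SAMPLE))))
```

Each stage consumes an iterator and returns one, which is what lets stages be reordered or replaced independently.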


Coming Soon

An enhanced output schema is planned that adds derived, analytics-ready columns to the raw AMFI NAV data.

The aim is a more structured, computation-friendly dataset on top of the standard AMFI format, one that needs less post-processing in downstream tools and fits more easily into Arrow-native analytical workflows.


Design Philosophy

  • Streaming over batch processing
  • Schema-first ingestion
  • Apache Arrow as canonical format
  • Minimal dependencies
  • Deterministic, reproducible pipelines

Contributing

This project is released under the Apache 2.0 License, and contributions are welcome.

Areas where contributions are especially useful:

  • Historical NAV ingestion implementation
  • Performance improvements in ingestion engine
  • Additional normalization rules for AMFI formats
  • Test coverage expansion

License

Apache License 2.0

