
A package for mapping over lists


functional_list

Functional programming for Python lists with Spark RDD-style transformations


🎯 Overview

`functional_list` brings functional programming paradigms to Python lists, inspired by Apache Spark RDD operations. It provides both eager (`ListMapper`) and lazy (`LazyListMapper`) execution modes, making data transformations more expressive and chainable.

✨ Key Features

  • 🔗 Functional-style transformations: `map`, `filter`, `reduce`, `flat_map`, `reduce_by_key`, and more
  • ⚡ Multiple execution backends:
    • `Serial` - Simple sequential execution
    • `Local` - Multi-threaded or multi-process parallelization
    • `Async` - Asynchronous I/O operations
    • `Ray` - Distributed computing with Ray
    • `Dask` - Distributed computing with Dask
  • 💤 Lazy evaluation: Build transformation pipelines that execute only when needed
  • 📁 File I/O support: Read from CSV, JSON, JSONL, Parquet, and text files
  • 🚀 Cython-accelerated operations: Optional compiled extensions for performance-critical operations
  • 🐍 Fully typed: Complete type hints for better IDE support and type checking
  • 📦 Zero required dependencies: Install only what you need with optional extras

📋 Requirements

  • Python 3.10+ (Python 3.6-3.9 are not supported in recent versions)

📦 Installation

Basic Installation

pip install functional-list

Or using `uv`:

uv add functional-list

Installation with Optional Features

Install with specific backends or I/O support:

# Quote the requirement so shells such as zsh do not treat the brackets as a glob pattern
# For Ray distributed computing
pip install "functional-list[ray]"
# For Dask distributed computing
pip install "functional-list[dask]"
# For Parquet/CSV file I/O support
pip install "functional-list[io]"
# Install everything
pip install "functional-list[all]"

With `uv`:

uv add "functional-list[all]"

🚀 Quick Start

Basic Usage

from functional_list import ListMapper
# Create a ListMapper
numbers = ListMapper[int](1, 2, 3, 4, 5)
# Chain transformations
result = (
    numbers
    .map(lambda x: x * x)           # [1, 4, 9, 16, 25]
    .filter(lambda x: x % 2 == 0)   # [4, 16]
    .reduce(lambda x, y: x + y)     # 20
)
print(result)  # 20
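
For comparison, the same chain can be written with list comprehensions and `functools.reduce`. This is a plain-Python sketch of what each step computes, not a description of how `ListMapper` is implemented:

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

squares = [x * x for x in numbers]            # map    -> [1, 4, 9, 16, 25]
evens = [x for x in squares if x % 2 == 0]    # filter -> [4, 16]
total = reduce(lambda x, y: x + y, evens)     # reduce -> 20

print(total)  # 20
```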

Word Count Example

The classic MapReduce word count example:

from functional_list import ListMapper
# Given: a list of text documents
document = ListMapper[str](
    "python is good",
    "python is better than x",
    "python is the best",
)
# When: perform word count using functional transformations
word_counts = (
    document
    .flat_map(lambda line: line.split())      # Split into words
    .map(lambda word: (word, 1))              # Create (word, count) pairs
    .reduce_by_key(lambda x, y: x + y)        # Sum counts by word
)
# Then: result is a list of (word, count) tuples
print(word_counts)
# Output: [('than', 1), ('the', 1), ('best', 1), ('better', 1), 
#          ('good', 1), ('is', 3), ('python', 3), ('x', 1)]
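
To make the `reduce_by_key` step concrete, here is a minimal standard-library sketch of the same word count. The `reduce_by_key` helper below is a hypothetical stand-in written for illustration; the library's own implementation and output ordering may differ:

```python
from typing import Callable, Iterable, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def reduce_by_key(
    pairs: Iterable[tuple[K, V]], fn: Callable[[V, V], V]
) -> list[tuple[K, V]]:
    """Combine the values of all pairs that share a key using fn."""
    merged: dict[K, V] = {}
    for key, value in pairs:
        # First occurrence stores the value; later ones are folded in with fn.
        merged[key] = fn(merged[key], value) if key in merged else value
    return list(merged.items())


lines = ["python is good", "python is better than x", "python is the best"]
words = [word for line in lines for word in line.split()]   # flat_map
pairs = [(word, 1) for word in words]                        # map
counts = dict(reduce_by_key(pairs, lambda x, y: x + y))      # reduce_by_key

print(counts["python"], counts["is"], counts["x"])  # 3 3 1
```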

Working with Standard List Operations

`ListMapper` maintains compatibility with Python's built-in list operations:

from functional_list import ListMapper
my_list = ListMapper[int](2, 4, 9, 13, 15, 20)
# Standard list operations work as expected
my_list.append(55)
print(my_list)  # [2, 4, 9, 13, 15, 20, 55]
# Indexing and slicing
print(my_list[0])     # 2
print(my_list[1:4])   # [4, 9, 13]
# Length
print(len(my_list))   # 7
# Chain functional operations
result = (
    my_list
    .map(lambda x: x * x)
    .filter(lambda x: x % 2 == 0)
    .reduce(lambda x, y: x + y)
)
print(result)  # 420  (4 + 16 + 400)

💤 Lazy Evaluation

Use `LazyListMapper` (obtained here by calling `.lazy()` on a `ListMapper`) for deferred execution: transformations are computed only when the results are needed.

from functional_list import ListMapper
# Convert to lazy mode
lazy_pipeline = (
    ListMapper[int](1, 2, 3, 4, 5)
    .lazy()                              # Switch to lazy evaluation
    .map(lambda x: x * 2)
    .filter(lambda x: x > 5)
    .map(lambda x: x ** 2)
)
# No computation happens yet!
# Materialize the results
result = lazy_pipeline.collect()         # Now computation happens
print(result)  # [36, 64, 100]
# Or iterate (also materializes)
for item in lazy_pipeline:
    print(item)
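
The idea of deferred execution can be illustrated with ordinary Python generators. The sketch below mirrors the pipeline above and records when the mapping function actually runs; it shows the general technique only and makes no claim about how `LazyListMapper` works internally:

```python
calls = []


def double(x):
    calls.append(x)        # record every time the function really runs
    return x * 2


# Generator expressions describe the pipeline without running it.
doubled = (double(x) for x in [1, 2, 3, 4, 5])
pipeline = (y ** 2 for y in doubled if y > 5)

calls_before = list(calls)
print(calls_before)        # [] : nothing has been computed yet

result = list(pipeline)    # consuming the generator triggers the work
print(result)              # [36, 64, 100]
print(calls)               # [1, 2, 3, 4, 5]
```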

⚡ Execution Backends

Choose the right backend for your workload:

Serial Backend (Default)

from functional_list import ListMapper
data = ListMapper[int](1, 2, 3, 4, 5)
result = data.map(lambda x: x * 2).collect()

Local Backend (Multi-threading/Multi-processing)

from functional_list import ListMapper, LocalBackend
data = ListMapper[int](*range(1000))  # unpack so each number is one element
# Use threading for I/O-bound tasks
result = data.map(
    lambda x: expensive_io_operation(x),
    backend=LocalBackend(use_threads=True, max_workers=10)
).collect()
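# Use multiprocessing for CPU-bound tasks.
# Pass a module-level function directly: the standard pickle module cannot
# serialize lambdas, and process pools typically rely on pickling.
result = data.map(
    expensive_cpu_operation,
    backend=LocalBackend(use_processes=True, max_workers=4)
).collect()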
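
The thread-versus-process choice can be sketched with the standard library's `concurrent.futures`. The `parallel_map` helper and its parameters are hypothetical names chosen for this illustration and are not part of `functional_list`:

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def parallel_map(fn, items, use_processes=False, max_workers=4):
    """Apply fn to every item with a thread or process pool, keeping input order."""
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with pool_cls(max_workers=max_workers) as pool:
        return list(pool.map(fn, items))


def slow_io(x):
    time.sleep(0.05)       # stand-in for a network or disk wait
    return x * 2


# Threads overlap the waiting time of I/O-bound calls.
doubled = parallel_map(slow_io, range(8), max_workers=8)
print(doubled)  # [0, 2, 4, 6, 8, 10, 12, 14]
```

With `use_processes=True`, the function must be picklable (defined at module level), and on platforms that spawn worker processes the call should sit under an `if __name__ == "__main__":` guard.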

Async Backend

from functional_list import ListMapper, AsyncBackend
import asyncio
async def async_fetch(url):
    # Your async code here
    pass
data = ListMapper[str]("url1", "url2", "url3")
result = data.map(async_fetch, backend=AsyncBackend()).collect()
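
Mapping an `async` function over a collection usually means scheduling all coroutines and awaiting them together. A minimal `asyncio` sketch of that pattern follows; `fake_fetch` and `map_async` are hypothetical helpers, and `AsyncBackend` may be implemented differently:

```python
import asyncio


async def fake_fetch(url: str) -> str:
    await asyncio.sleep(0.01)          # stand-in for real network I/O
    return f"body of {url}"


async def map_async(fn, items):
    # gather runs the coroutines concurrently and returns results in input order
    return await asyncio.gather(*(fn(item) for item in items))


pages = asyncio.run(map_async(fake_fetch, ["url1", "url2", "url3"]))
print(pages)  # ['body of url1', 'body of url2', 'body of url3']
```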

Ray Backend (Distributed Computing)

from functional_list import ListMapper, RayBackend
# Requires: pip install functional-list[ray]
data = ListMapper[int](*range(10000))  # unpack so each number is one element
result = data.map(
    lambda x: complex_computation(x),
    backend=RayBackend(num_cpus=8)
).collect()

Dask Backend (Distributed Computing)

from functional_list import ListMapper, DaskBackend
# Requires: pip install functional-list[dask]
data = ListMapper[int](*range(10000))  # unpack so each number is one element
result = data.map(
    lambda x: complex_computation(x),
    backend=DaskBackend(n_workers=4)
).collect()

📁 File I/O Operations

`functional_list` provides built-in support for reading data from various file formats:

Supported Formats

| Format | Description | Requires |
|---|---|---|
| CSV | Comma-separated values | Built-in |
| JSON | JSON arrays or objects | Built-in |
| JSONL | JSON Lines (one object per line) | Built-in |
| Parquet | Columnar storage format | `pyarrow` |
| Text | Plain text files | Built-in |

Reading CSV Files

from functional_list import ListMapper
from functional_list.io import CSVReadOptions
# Read CSV with custom options
users = ListMapper.from_csv(
    "users.csv",
    options=CSVReadOptions(
        skip_header=True,
        delimiter=",",
        encoding="utf-8"
    ),
    transform=lambda row: {
        "name": row[0],
        "age": int(row[1]),
        "email": row[2]
    }
)
# Process the data
adults = users.filter(lambda user: user["age"] >= 18)

Reading JSON Files

from functional_list import ListMapper
# Read JSON array
data = ListMapper.from_json("data.json")
# Read and transform
names = (
    ListMapper.from_json("users.json")
    .map(lambda user: user["name"])
    .filter(lambda name: len(name) > 3)
)

Reading JSONL Files

from functional_list import ListMapper
# Each line is a separate JSON object
events = ListMapper.from_jsonl("events.jsonl")
# Process streaming logs
errors = (
    events
    .filter(lambda e: e.get("level") == "ERROR")
    .map(lambda e: e["message"])
)
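
JSON Lines stores one JSON document per line, so a reader can decode the file line by line. A minimal standard-library sketch, using a hypothetical `read_jsonl` helper and made-up sample events:

```python
import io
import json


def read_jsonl(source):
    """Yield one decoded object per non-empty line."""
    for line in source:
        line = line.strip()
        if line:
            yield json.loads(line)


log = io.StringIO(
    '{"level": "INFO", "message": "started"}\n'
    '{"level": "ERROR", "message": "disk full"}\n'
    "\n"
    '{"level": "ERROR", "message": "timeout"}\n'
)
errors = [e["message"] for e in read_jsonl(log) if e.get("level") == "ERROR"]
print(errors)  # ['disk full', 'timeout']
```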

Reading Parquet Files

from functional_list import ListMapper
# Read entire Parquet file
data = ListMapper.from_parquet("data.parquet")
# Read specific columns only
users = ListMapper.from_parquet(
    "users.parquet",
    columns=["name", "age", "country"]
)
# Process efficiently
summary = (
    users
    .filter(lambda u: u["country"] == "USA")
    .map(lambda u: u["age"])
    .reduce(lambda x, y: x + y)
)

Reading Text Files

from functional_list import ListMapper
from functional_list.io import TextReadOptions
# Read with options
lines = ListMapper.from_text(
    "log.txt",
    options=TextReadOptions(
        strip_lines=True,      # Remove whitespace
        skip_empty=True,       # Skip empty lines
        encoding="utf-8"
    )
)
# Process log file
error_lines = (
    lines
    .filter(lambda line: "ERROR" in line)
    .map(lambda line: line.split("|"))
)

📚 Core API Reference

Transformation Methods

| Method | Description | Example |
|---|---|---|
| `map(fn)` | Apply function to each element | `data.map(lambda x: x * 2)` |
| `filter(fn)` | Keep elements where fn returns True | `data.filter(lambda x: x > 0)` |
| `flat_map(fn)` | Map and flatten results | `data.flat_map(lambda x: [x, x*2])` |
| `reduce(fn)` | Reduce to single value | `data.reduce(lambda x, y: x + y)` |
| `reduce_by_key(fn)` | Reduce grouped by key | `pairs.reduce_by_key(lambda x, y: x + y)` |
| `group_by(fn)` | Group elements by key function | `data.group_by(lambda x: x % 2)` |
| `sort(key, reverse)` | Sort elements | `data.sort(key=lambda x: x)` |
| `distinct()` | Remove duplicates | `data.distinct()` |
| `take(n)` | Take first n elements | `data.take(10)` |
| `sample(n)` | Random sample of n elements | `data.sample(5)` |
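
For readers unfamiliar with some of these operations, the sketch below shows plain-Python analogues of `flat_map`, `distinct`, and `group_by`. It only illustrates the concepts; the library's return types and element ordering are not documented here and may differ:

```python
from itertools import chain

data = [3, 1, 3, 2, 1]

# flat_map: map each element to a list, then flatten one level
flat = list(chain.from_iterable([x, x * 2] for x in data))

# distinct: drop duplicates; dict.fromkeys keeps first-seen order
unique = list(dict.fromkeys(data))

# group_by: collect elements under the key returned by the function
groups = {}
for x in data:
    groups.setdefault(x % 2, []).append(x)

print(flat)    # [3, 6, 1, 2, 3, 6, 2, 4, 1, 2]
print(unique)  # [3, 1, 2]
print(groups)  # {1: [3, 1, 3, 1], 0: [2]}
```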

Aggregation Methods

| Method | Description | Example |
|---|---|---|
| `count()` | Count elements | `data.count()` |
| `sum()` | Sum numeric elements | `data.sum()` |
| `mean()` | Calculate mean | `data.mean()` |
| `min()` | Find minimum | `data.min()` |
| `max()` | Find maximum | `data.max()` |
| `collect()` | Materialize to list | `lazy_data.collect()` |

🎓 Advanced Examples

Processing Log Files

from functional_list import ListMapper
from datetime import datetime
# Parse and analyze log files
errors_by_hour = (
    ListMapper.from_text("app.log")
    .filter(lambda line: "ERROR" in line)
    .map(lambda line: line.split("|"))
    .map(lambda parts: {
        "timestamp": datetime.fromisoformat(parts[0]),
        "message": parts[2]
    })
    .map(lambda e: (e["timestamp"].hour, 1))
    .reduce_by_key(lambda x, y: x + y)
    .sort(key=lambda x: x[1], reverse=True)
)
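
The pipeline above implies log lines of the form `timestamp|level|message`. Assuming that layout, and using made-up sample lines, the same count of errors per hour can be sketched with the standard library:

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample data in the assumed "timestamp|level|message" layout
log_lines = [
    "2024-01-01T10:05:00|INFO|service started",
    "2024-01-01T10:17:42|ERROR|database unreachable",
    "2024-01-01T11:02:13|ERROR|request timed out",
    "2024-01-01T10:59:59|ERROR|disk almost full",
]

errors_per_hour = Counter(
    datetime.fromisoformat(line.split("|")[0]).hour
    for line in log_lines
    if "ERROR" in line
)

# most_common() sorts by count, highest first
print(errors_per_hour.most_common())  # [(10, 2), (11, 1)]
```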

ETL Pipeline

from functional_list import ListMapper
# Load from multiple sources (parse_user and enrich_user are user-defined helpers)
csv_users = ListMapper.from_csv("users.csv", transform=parse_user)
json_users = ListMapper.from_json("new_users.json")
# Combine and process
all_users = (
    csv_users
    .union(json_users)
    .distinct()
    .filter(lambda u: u["active"])
    .map(lambda u: enrich_user(u))
)
# Save results
all_users.to_json("processed_users.json")

Parallel Web Scraping

from functional_list import ListMapper, LocalBackend
import requests
def fetch_page(url):
    return requests.get(url).text
urls = ListMapper[str](
    "https://example.com/page1",
    "https://example.com/page2",
)
# Fetch pages in parallel
pages = urls.map(
    fetch_page,
    backend=LocalBackend(use_threads=True, max_workers=10)
)
# Extract data (parse_html and extract_links are user-defined helpers)
results = pages.map(parse_html).flat_map(extract_links).distinct()

🔧 Performance Tips

  1. Choose the right backend: Use `LocalBackend` with threads for I/O-bound tasks and with processes for CPU-bound tasks
  2. Use lazy evaluation: Build pipelines with `.lazy()` to optimize execution
  3. Cache intermediate results: Use `.cache()` on expensive computations
  4. Batch operations: Combine multiple transformations before materializing
  5. Use Cython accelerators: Ensure extensions are compiled for numerical operations
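
One reason lazy evaluation can help is that a consumer needing only a few results never pays for the rest. The generator-based sketch below demonstrates that general effect in plain Python; whether `.lazy()` pipelines in this library short-circuit in the same way is an assumption to verify against the documentation:

```python
from itertools import islice

evaluated = []


def expensive(x):
    evaluated.append(x)    # track how many elements were actually processed
    return x * x


lazy_squares = (expensive(x) for x in range(1_000_000))
first_three = list(islice(lazy_squares, 3))   # pull only three results

print(first_three)      # [0, 1, 4]
print(len(evaluated))   # 3
```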

🤝 Contributing

Contributions are welcome! Please check out our GitLab repository.

Development Setup

# Clone the repository
git clone https://gitlab.com/Tantelitiana22/list-function-python-project.git
cd list-function-python-project
# Install with development dependencies
uv sync --group dev --extra all
# Run tests
uv run pytest
# Run type checking
uv run mypy ./src/functional_list/
# Run linters
uv run flake8 ./src/functional_list/
uv run pylint ./src/functional_list/

📖 Documentation

Full Documentation (MkDocs)

Complete documentation is available at https://sensational-cobbler-2b96f1.netlify.app/

To build and serve the documentation locally:

uv sync --group dev
mkdocs serve -f documentation/mkdocs.yml

Quick API Reference

from functional_list import ListMapper
# List all methods
print(dir(ListMapper))
# Get documentation for a specific method
print(ListMapper.map.__doc__)
# Get help
help(ListMapper.reduce_by_key)

❓ FAQ & Troubleshooting

Why am I getting "module not found" errors for Ray/Dask?

You need to install the optional dependencies:

pip install "functional-list[ray]"   # For Ray
pip install "functional-list[dask]"  # For Dask
pip install "functional-list[all]"   # For everything
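
To check which optional packages are importable before selecting a backend, a small standard-library probe can help. The module names `ray`, `dask`, and `pyarrow` are assumed to be the import names behind the extras, and `missing_extras` is a hypothetical helper:

```python
from importlib.util import find_spec


def missing_extras(modules=("ray", "dask", "pyarrow")):
    """Return the optional modules that cannot be imported in this environment."""
    return [name for name in modules if find_spec(name) is None]


# An empty list means every optional dependency is available.
print(missing_extras())
```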

Can I use this with Python 3.9 or earlier?

No, `functional_list` requires Python 3.10+. Earlier versions are not supported.

How do I improve performance for large datasets?

  1. Use lazy evaluation: `.lazy()` to defer execution
  2. Choose appropriate backends (Ray/Dask for distributed computing)
  3. Use `.cache()` for intermediate results you'll reuse
  4. Ensure Cython extensions are compiled

Does this work with async functions?

Yes! Use the `AsyncBackend`:

from functional_list import ListMapper, AsyncBackend
async def async_operation(x):
    # Your async code
    pass
data = ListMapper[int](1, 2, 3)
result = data.map(async_operation, backend=AsyncBackend()).collect()

📄 License

This project is licensed under the terms specified in the LICENSE file.

👤 Author

Andrianarivo Tantelitiana RAKOTOARIJAONA


⭐ If you find this library useful, please consider giving it a star on GitLab!
