
A High-Performance, Production-Ready Python Implementation of C# LINQ with Deferred Execution.


PyLINQ (linqex)



📋 About the Project

🚀 Why PyLINQ?

Data manipulation in Python often leads to highly nested comprehensions, unreadable functional chains (map, filter, reduce), or unnecessary memory overhead when processing large data streams.

linqex brings the elegance and power of C# LINQ (Language Integrated Query) directly into the Python ecosystem. It lets you query, transform, and manipulate iterable sequences with a fluent, declarative syntax, while remaining fully type-annotated and lightweight.

🚀 The Power of Deferred Execution (Lazy Evaluation)

Standard Python list comprehensions compute the entire result set in memory at once. If you only need the first 3 matching elements from a 10 GB log file, loading it all into memory is disastrous.

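linqex is built on a lazy-evaluation architecture using native Python generators (yield) and the C-implemented itertools module. The pipeline you define (e.g., .where().select().order_by()) does not execute until a terminal operation such as .to_list(), .first(), or .count() is invoked. Streaming operators such as .where() and .select() handle one element at a time, so their memory use stays O(1) regardless of input size. Buffering operators such as .order_by() and .group_by() must still hold their whole input in memory. As long as a pipeline avoids buffering steps, it can process datasets far larger than available RAM.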
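To make the deferred-execution model concrete, here is a minimal sketch of how a lazy pipeline can be built from generators. The `LazySeq` class and its methods are hypothetical illustrations, not linqex's actual implementation. The point is that `where` and `select` only wrap generators, and no source element is read until a terminal method such as `first` or `to_list` pulls values through.

```python
from typing import Callable, Generic, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class LazySeq(Generic[T]):
    """Hypothetical minimal lazy sequence for illustration; not linqex's code."""

    def __init__(self, source: Iterable[T]) -> None:
        self._source = source

    def __iter__(self) -> Iterator[T]:
        return iter(self._source)

    # Intermediate operations: wrap the source in a generator and do no work yet.
    def where(self, predicate: Callable[[T], bool]) -> "LazySeq[T]":
        return LazySeq(x for x in self._source if predicate(x))

    def select(self, selector: Callable[[T], U]) -> "LazySeq[U]":
        return LazySeq(selector(x) for x in self._source)

    # Terminal operations: these actually pull values through the pipeline.
    def first(self) -> T:
        for x in self._source:
            return x
        raise ValueError("sequence contains no elements")

    def to_list(self) -> List[T]:
        return list(self._source)


pulled: List[int] = []  # records which source items were actually read


def numbers() -> Iterator[int]:
    for i in range(1_000_000):
        pulled.append(i)
        yield i


pipeline = LazySeq(numbers()).where(lambda n: n > 0 and n % 7 == 0).select(lambda n: n * 10)
print(len(pulled))       # 0: building the pipeline read nothing
print(pipeline.first())  # 70
print(len(pulled))       # 8: only items 0..7 were ever produced
```

Because `first` stops after the first match, only eight of the million source items are generated.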

✨ Key Features

  • Broad C# LINQ Parity: Supports most LINQ operators from .NET 8, including the .NET 6 additions .chunk(), .max_by(), and .distinct_by().
  • Deferred Execution: Chain as many operations as you want. The engine computes only what a terminal operation requests, and only when it requests it.
  • Pythonic Fast-Paths: When the source is an in-memory sequence (such as a list or tuple), .count() and .element_at() use Python's __len__ and __getitem__ to answer in O(1) time instead of iterating the source, and .reverse() can walk the sequence backwards by index instead of first buffering it.
  • Low Memory Overhead: Uses __slots__ across all classes, so instances carry no per-instance __dict__. This keeps memory usage low even when creating millions of group or ordering objects.
  • Strict Exception Parity: Mirrors C#'s exception behavior. For example, .single() raises an exception when the sequence contains more than one element, and .to_dict() raises on duplicate keys instead of silently overwriting them, ensuring data integrity.
  • Full Type Annotations: Annotated with Python typing generics (Generic[T], TypeVar), which enables accurate IDE autocomplete (VS Code, PyCharm) and checking with static analyzers such as mypy.
  • Stable Multi-Level Sorting: Supports .order_by().then_by_descending() chaining without re-evaluating the source, built on Python's stable Timsort algorithm.
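Two of the features above, the len()-based fast path and stable multi-level sorting, can be sketched in plain Python. The helpers `count` and `order_by_then_by_descending` below are hypothetical and are not linqex's implementation; they only illustrate the techniques. The sort relies on a documented property of Python's `list.sort`: it is stable, even with `reverse=True`, so sorting by the least significant key first and the most significant key last produces a correct multi-level order.

```python
from collections.abc import Sized
from typing import Any, Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def count(source: Iterable[T]) -> int:
    """Count elements, using len() when the source supports it."""
    if isinstance(source, Sized):
        return len(source)  # fast path: O(1) for lists, tuples, dicts, sets
    return sum(1 for _ in source)  # slow path: must walk the whole iterator


def order_by_then_by_descending(
    source: Iterable[T],
    primary: Callable[[T], Any],
    secondary: Callable[[T], Any],
) -> List[T]:
    """Sort ascending by `primary`, breaking ties by `secondary` descending."""
    items = list(source)
    items.sort(key=secondary, reverse=True)  # tie-breaker key first
    items.sort(key=primary)                  # stability keeps tied items in tie-breaker order
    return items


people: List[Tuple[str, int]] = [("Dev", 28), ("HR", 35), ("Dev", 42), ("Dev", 22)]
print(count(people))             # 4, via len()
print(count(p for p in people))  # 4, via iteration
print(order_by_then_by_descending(people, lambda p: p[0], lambda p: p[1]))
# [('Dev', 42), ('Dev', 28), ('Dev', 22), ('HR', 35)]
```

Two stable passes are used instead of one composite tuple key because a descending order on non-numeric keys cannot be expressed by simply negating the key.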

⚙️ Architectural Notes

Engineering facts developers need to know when using this library:

  1. The Generator Exhaustion Reality: Python generators can only be traversed once. If you pass a generator expression (x for x in ...) into Enumerable and execute a terminal operation like .count(), the generator is consumed, and a subsequent .to_list() returns an empty list. To run multiple terminal operations, pass an in-memory collection (such as a list) to Enumerable, or call .to_list() once and reuse the result.
  2. Terminal vs. Intermediate Operations: Methods like where, select, and skip are Intermediate (they return a new Enumerable and do no work). Methods like to_list, count, sum, and first are Terminal (they force the evaluation of the pipeline).
  3. Lookup vs. Dictionary: In LINQ, a Dictionary maps one key to one value, while a Lookup maps one key to a collection of values; linqex follows this distinction strictly. In addition, requesting a non-existent key from a .to_lookup() result returns an empty Enumerable instead of raising a KeyError, so grouped data can be accessed without a prior membership check.
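Notes 1 and 3 can be reproduced with the standard library alone. The sketch below first shows generator exhaustion, then a minimal lookup whose missing keys yield an empty result. `SimpleLookup` is a hypothetical class for illustration only; it returns a plain list, whereas linqex's lookup returns an empty Enumerable.

```python
from collections import defaultdict
from typing import Callable, Dict, Generic, Hashable, Iterable, List, TypeVar

T = TypeVar("T")
K = TypeVar("K", bound=Hashable)

# Note 1: a generator can be iterated only once.
gen = (n for n in range(5))
print(sum(1 for _ in gen))  # 5 (this consumes the generator)
print(list(gen))            # [] (nothing left)

# Materializing once gives a source that can be read repeatedly.
data = list(n for n in range(5))
print(len(data), sum(data))  # 5 10


# Note 3: a lookup maps one key to many values; missing keys give an empty result.
class SimpleLookup(Generic[K, T]):
    """Hypothetical minimal lookup for illustration; not the linqex class."""

    def __init__(self, source: Iterable[T], key: Callable[[T], K]) -> None:
        self._groups: Dict[K, List[T]] = defaultdict(list)
        for item in source:
            self._groups[key(item)].append(item)

    def __getitem__(self, k: K) -> List[T]:
        # .get() avoids inserting the missing key into the defaultdict
        return list(self._groups.get(k, []))


words = SimpleLookup(["apple", "avocado", "banana"], key=lambda w: w[0])
print(words["a"])  # ['apple', 'avocado']
print(words["z"])  # [] instead of a KeyError
```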

🚀 Getting Started

🛠️ Dependencies

  • No external dependencies.
  • Only Python Standard Library (itertools, collections, functools, typing).
  • Fully compatible with Python 3.9+.

📦 Installation

The library has zero external dependencies and works natively with Python's core toolkit.

  1. Install from PyPI with pip

    pip install linqex
    
  2. Alternatively, clone the repository to work from source

    git clone https://github.com/TahsinCr/python-linqex.git
    

💻 Usage Examples

1. Standard Data Transformation & Filtering

Cleanly filter, sort, and project data without nested comprehensions.

from linqex import Enumerable

data = [
    {"name": "Alice", "age": 28, "role": "Dev"},
    {"name": "Bob", "age": 35, "role": "HR"},
    {"name": "Charlie", "age": 42, "role": "Dev"},
    {"name": "Dave", "age": 22, "role": "Dev"}
]

# Pipeline is lazy. No iteration happens yet.
devs = (Enumerable(data)
    .where(lambda x: x["role"] == "Dev")
    .where(lambda x: x["age"] > 25)
    .order_by_descending(lambda x: x["age"])
    .select(lambda x: x["name"]))

# Terminal operation executes the pipeline
print(devs.to_list()) 
# Output: ['Charlie', 'Alice']

2. Aggregations and Fast-Paths

Finding the maximum element based on a specific property, similar to .MaxBy() in C#.

from linqex import Enumerable

inventory = [
    {"id": 1, "product": "Laptop", "price": 1200},
    {"id": 2, "product": "Mouse", "price": 45},
    {"id": 3, "product": "Monitor", "price": 300}
]

stream = Enumerable(inventory)

# Finds the actual dictionary object of the most expensive item
most_expensive = stream.max_by(lambda x: x["price"])
print(most_expensive["product"]) # Output: Laptop

# O(1) fast-path count: the source is a list, so len() is used instead of iteration
total_items = stream.count()
print(total_items) # Output: 3
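For comparison, Python's built-in `max` with a `key` argument performs the same kind of single-pass selection on plain iterables and, like `max_by`, returns the element itself rather than its key. This is only a standard-library point of reference, not a description of how linqex implements `max_by`.

```python
inventory = [
    {"id": 1, "product": "Laptop", "price": 1200},
    {"id": 2, "product": "Mouse", "price": 45},
    {"id": 3, "product": "Monitor", "price": 300},
]

most_expensive = max(inventory, key=lambda x: x["price"])
print(most_expensive["product"])  # Laptop

# On ties, the built-in max() returns the first maximal element it encounters.
tied = [("a", 5), ("b", 5)]
print(max(tied, key=lambda t: t[1]))  # ('a', 5)

# An empty input raises ValueError unless a default is supplied.
print(max([], key=lambda x: x, default=None))  # None
```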

3. Massive Data Chunking (Memory Safe)

Process millions of records in chunks for database batch inserts without blowing up the RAM.

from linqex import Enumerable

def massive_database_stream():
    for i in range(1, 1000000):
        yield {"id": i, "status": "pending"}

stream = Enumerable(massive_database_stream())

# Groups data into lists of 500 items lazily
batches = stream.chunk(500)

for batch in batches.take(3): # Only process the first 3 batches
    print(f"Executing SQL bulk insert for {len(batch)} items...")
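Lazy chunking can be sketched with `itertools.islice`: each batch reads at most `size` items from a shared iterator, so only the batches you actually consume are ever produced. The `chunk` function below is a hypothetical helper, not linqex's implementation; on Python 3.12+, `itertools.batched` offers the same behavior but yields tuples instead of lists.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunk(source: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to `size` items, reading the source lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(source)
    while True:
        batch = list(islice(it, size))  # reads at most `size` items
        if not batch:
            return
        yield batch


read_ids: List[int] = []  # records which records the stream actually produced


def massive_database_stream() -> Iterator[dict]:
    for i in range(1, 1000000):
        read_ids.append(i)
        yield {"id": i, "status": "pending"}


batches = chunk(massive_database_stream(), 500)
for batch in islice(batches, 3):  # plays the role of .take(3)
    print(f"Executing SQL bulk insert for {len(batch)} items...")
print(len(read_ids))  # 1500: only three batches were ever read
```

The final batch may be shorter than `size`, for example `chunk(range(7), 3)` yields `[0, 1, 2]`, `[3, 4, 5]`, then `[6]`.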

4. Grouping & Analytics (group_by)

Easily group data by a specific key and perform aggregate calculations on the sub-groups.

from linqex import Enumerable

orders = [
    {"customer": "C1", "amount": 100},
    {"customer": "C2", "amount": 50},
    {"customer": "C1", "amount": 200},
    {"customer": "C3", "amount": 300}
]

report = (Enumerable(orders)
    .group_by(lambda o: o["customer"])
    .select(lambda group: {
        "customer": group.key,
        "total_spent": group.sum(lambda x: x["amount"]),
        "order_count": group.count()
    })
    .to_list())

# [{'customer': 'C1', 'total_spent': 300, 'order_count': 2}, ...]
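To show what grouping does under the hood, here is a minimal sketch of a group-by in plain Python. The `group_by` helper is hypothetical and is not linqex's implementation. It keeps groups in the order in which each key first appears (the same ordering C#'s GroupBy documents) and, unlike `itertools.groupby`, does not require sorted input. The trade-off is that it must read the entire source before returning any group.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
K = TypeVar("K", bound=Hashable)


def group_by(source: Iterable[T], key: Callable[[T], K]) -> List[Tuple[K, List[T]]]:
    """Group items by key, keeping groups in first-seen key order."""
    groups: Dict[K, List[T]] = {}
    for item in source:
        groups.setdefault(key(item), []).append(item)
    return list(groups.items())  # dicts preserve insertion order (Python 3.7+)


orders = [
    {"customer": "C1", "amount": 100},
    {"customer": "C2", "amount": 50},
    {"customer": "C1", "amount": 200},
    {"customer": "C3", "amount": 300},
]

report = [
    {"customer": k, "total_spent": sum(o["amount"] for o in g), "order_count": len(g)}
    for k, g in group_by(orders, lambda o: o["customer"])
]
print(report)
# [{'customer': 'C1', 'total_spent': 300, 'order_count': 2}, {'customer': 'C2', 'total_spent': 50, 'order_count': 1}, {'customer': 'C3', 'total_spent': 300, 'order_count': 1}]
```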

5. Relational Inner Joins in Memory

Merge two disparate data sources safely and efficiently.

from linqex import Enumerable

employees = [{"id": 1, "name": "Alice", "dept_id": 10}, {"id": 2, "name": "Bob", "dept_id": 20}]
departments = [{"id": 10, "name": "Engineering"}, {"id": 20, "name": "Sales"}]

joined_data = Enumerable(employees).join(
    inner=departments,
    outer_key=lambda e: e["dept_id"],
    inner_key=lambda d: d["id"],
    selector=lambda e, d: f"{e['name']} works in {d['name']}"
).to_list()

# ['Alice works in Engineering', 'Bob works in Sales']
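An in-memory inner join is typically a hash join: index the inner sequence by key once, then stream the outer sequence and emit one result per match. The `hash_join` generator below is a hypothetical sketch whose parameter names mirror the example above; it is not linqex's code. Its output keeps the order of the outer sequence and, for each outer element, the order of its matching inner elements, which is also the ordering C#'s Join documents. Outer elements with no match are dropped.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, TypeVar

TOuter = TypeVar("TOuter")
TInner = TypeVar("TInner")
K = TypeVar("K", bound=Hashable)
R = TypeVar("R")


def hash_join(
    outer: Iterable[TOuter],
    inner: Iterable[TInner],
    outer_key: Callable[[TOuter], K],
    inner_key: Callable[[TInner], K],
    selector: Callable[[TOuter, TInner], R],
) -> Iterator[R]:
    # Build phase: index the inner sequence by key (runs on the first next() call).
    index: Dict[K, List[TInner]] = defaultdict(list)
    for item in inner:
        index[inner_key(item)].append(item)
    # Probe phase: stream the outer sequence, yielding one result per match.
    for o in outer:
        for i in index.get(outer_key(o), []):
            yield selector(o, i)


employees = [
    {"id": 1, "name": "Alice", "dept_id": 10},
    {"id": 2, "name": "Bob", "dept_id": 20},
    {"id": 3, "name": "Carol", "dept_id": 30},  # no matching department
]
departments = [{"id": 10, "name": "Engineering"}, {"id": 20, "name": "Sales"}]

joined = list(hash_join(
    employees,
    departments,
    outer_key=lambda e: e["dept_id"],
    inner_key=lambda d: d["id"],
    selector=lambda e, d: f"{e['name']} works in {d['name']}",
))
print(joined)  # ['Alice works in Engineering', 'Bob works in Sales']
```

Building the index costs memory proportional to the inner sequence, so the smaller source is usually the better choice for `inner`.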

🙏 Acknowledgments and License

This project is open source under the MIT License.

If you find any bugs or want to make an architectural contribution, feel free to open an Issue or submit a Pull Request on GitHub!

📫 Contact

X: @TahsinCrs

Linkedin: @TahsinCr

Email: TahsinCrs@gmail.com

Project details



This version

2.0

Download files


Source Distribution

linqex-2.0.tar.gz (14.9 kB)

Built Distribution


linqex-2.0-py3-none-any.whl (14.0 kB)

File details

Details for the file linqex-2.0.tar.gz.

File metadata

  • Download URL: linqex-2.0.tar.gz
  • Upload date:
  • Size: 14.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for linqex-2.0.tar.gz
Algorithm Hash digest
SHA256 c38c103461bfe1f74e3dff4204c9e0c63b1a0de0c6b3eb217c35f6ece09bfccd
MD5 8c61450b09012258cbf626536d99b937
BLAKE2b-256 2fafb89fc86c1fbb7aeb0b75fe1ceda17296e362e8e136d277f0e9422c490f68


File details

Details for the file linqex-2.0-py3-none-any.whl.

File metadata

  • Download URL: linqex-2.0-py3-none-any.whl
  • Upload date:
  • Size: 14.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for linqex-2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 653d8effd913ea76170880688b43e3f1e0d81fece838f452f3a269b4da440ebf
MD5 b14ead50aeffa269abacc26f97433daf
BLAKE2b-256 90be21f16edd3b7027fc13ff1b6438898f9aa4881b7234b19a2c054393e49a00

