nestfind

Powerful deep search for nested dict/list structures in Python.

Traverses arbitrarily nested dict/list data using a flexible path-based syntax, supporting fallback paths, multiple sources, wildcard matching, predicate filtering, and more.

Installation

pip install nestfind

Quick Start

from nestfind import deep_search

data = {
    "user": {
        "profile": {
            "name": "Alice",
            "email": "alice@example.com"
        }
    }
}

deep_search(data, "user", "profile", "name")   # → "Alice"
deep_search(data, "email")                      # → "alice@example.com"  (wide search)
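
To make the idea of a "wide search" concrete, here is a minimal sketch of a depth-first key lookup over nested dicts and lists. It is not nestfind's implementation (the README does not show it); the helper name `find_key` is hypothetical and only illustrates the concept.

```python
from typing import Any, Iterator


def find_key(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, depth-first, in insertion order."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: the same key may also appear deeper down.
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


data = {"user": {"profile": {"name": "Alice", "email": "alice@example.com"}}}

# Taking only the first hit mirrors a "return the first match" default.
first_email = next(find_key(data, "email"), None)
print(first_email)  # alice@example.com
```

Because the generator is lazy, taking the first match stops the traversal early, while `list(find_key(...))` collects all matches.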

Path Segment Types

| Segment | Description | Example |
|---|---|---|
| str | Wide search key: finds the key anywhere in the nested structure | "name" |
| str + "!" | Condition key: returns the parent dict containing this key | "uri!" |
| str + "?" | Optional key: exact match only, skips wide search if not found | "nickname?" |
| "*" | Wildcard: matches ALL keys/items at this level | "*" |
| int | List index: exact positional access, negative indices supported | 0, -1 |
| callable | Predicate filter: includes an item only if the callable returns a truthy value | lambda u: u.get("active") |
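
The segment types in the table can be pictured as a small dispatcher that advances a set of "current nodes" one segment at a time. The sketch below covers only exact keys, `"*"`, integer indices, and predicates (no wide search, `!`, or `?`). The names `step` and `walk` are hypothetical and the semantics are an assumption, not nestfind's actual code.

```python
from typing import Any, Callable, List, Union

Segment = Union[str, int, Callable[[Any], Any]]


def step(nodes: List[Any], seg: Segment) -> List[Any]:
    """Apply one path segment to every current node and collect the results."""
    out: List[Any] = []
    for node in nodes:
        if callable(seg):
            # Predicate: keep only the items for which the callable is truthy.
            items = node if isinstance(node, list) else [node]
            out.extend(item for item in items if seg(item))
        elif isinstance(seg, int):
            # List index, negative values count from the end.
            if isinstance(node, list) and -len(node) <= seg < len(node):
                out.append(node[seg])
        elif seg == "*":
            # Wildcard: fan out over all dict values or list items.
            if isinstance(node, dict):
                out.extend(node.values())
            elif isinstance(node, list):
                out.extend(node)
        elif isinstance(node, dict) and seg in node:
            # Plain string: exact key lookup in this simplified sketch.
            out.append(node[seg])
    return out


def walk(data: Any, *segments: Segment) -> List[Any]:
    nodes = [data]
    for seg in segments:
        nodes = step(nodes, seg)
    return nodes


data = {
    "users": [
        {"id": 1, "name": "Alice", "active": True},
        {"id": 2, "name": "Bob", "active": False},
    ]
}
print(walk(data, "users", "*", "name"))                         # ['Alice', 'Bob']
print(walk(data, "users", lambda u: u.get("active"), "name"))   # ['Alice']
print(walk(data, "users", -1, "id"))                            # [2]
```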

Modes

Single path

deep_search(data, "a", "b", "c")

Fallback mode — tries paths in order, returns first non-empty result

deep_search(data, ["uri"], ["browser_native_hd_url"])
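
As a rough illustration of "try paths in order, return the first non-empty result", the sketch below uses exact traversal only. The helpers `get_path` and `first_available` are hypothetical, and treating `None`, `""`, `[]`, and `{}` as "empty" is an assumption; the README does not define the term.

```python
from typing import Any, Sequence

_MISSING = object()  # sentinel that distinguishes "not found" from a stored None


def get_path(data: Any, path: Sequence[Any]) -> Any:
    """Follow `path` exactly; return _MISSING if any step fails."""
    node = data
    for seg in path:
        try:
            node = node[seg]
        except (KeyError, IndexError, TypeError):
            return _MISSING
    return node


def first_available(data: Any, *paths: Sequence[Any], default: Any = None) -> Any:
    """Return the value of the first path that resolves to something non-empty."""
    for path in paths:
        value = get_path(data, path)
        if value is not _MISSING and value not in (None, "", [], {}):
            return value
    return default


media = {"uri": "", "browser_native_hd_url": "https://cdn.example.com/hd.mp4"}
print(first_available(media, ["uri"], ["browser_native_hd_url"]))
# https://cdn.example.com/hd.mp4  (the empty "uri" is skipped)
```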

Multi-source mode — each list is [source, *keys]

deep_search([source1, "key1"], [source2, "key2", "key3"])
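
The `[source, *keys]` convention maps naturally onto Python's starred unpacking. The following sketch shows that shape with exact traversal only. `first_from_sources` is a hypothetical helper, not part of nestfind.

```python
from typing import Any


def first_from_sources(*specs: list, default: Any = None) -> Any:
    """Each spec is [source, *keys]; return the first path that resolves."""
    for source, *keys in specs:
        node = source
        try:
            for key in keys:
                node = node[key]
        except (KeyError, IndexError, TypeError):
            continue  # this source lacks the path, try the next one
        if node is not None:
            return node
    return default


story = {}
post = {"media": {"url": "https://img.example.com/post.jpg"}}

print(first_from_sources([story, "url"], [post, "media", "url"]))
# https://img.example.com/post.jpg
```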

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| return_first | bool | True | Return the first match (True) or a list of all matches (False) |
| default | Any | None | Value to return if nothing is found |
| type_filter | type or tuple | None | Only return results of this type |
| value_filter | callable | None | Only return results where value_filter(v) is truthy |
| transform | callable | None | Apply a function to each result before returning |
| max_depth | int | None | Maximum nesting depth for wide search |
| exclude_keys | list[str] | None | Skip these keys during wide search |
| strict | bool | False | Disable wide search: exact path traversal only |
| with_path | bool | False | Return (value, path) tuples instead of bare values |
| debug | bool | False | Enable debug logging |
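
One way to reason about how `type_filter`, `value_filter`, `transform`, `return_first`, and `default` combine is as a post-processing step over raw matches. The sketch below assumes filters run before the transform. That ordering, and the helper name `finalize`, are assumptions for illustration and are not confirmed by this README.

```python
from typing import Any, Callable, Iterable, Optional, Tuple, Union


def finalize(
    matches: Iterable[Any],
    *,
    return_first: bool = True,
    default: Any = None,
    type_filter: Optional[Union[type, Tuple[type, ...]]] = None,
    value_filter: Optional[Callable[[Any], Any]] = None,
    transform: Optional[Callable[[Any], Any]] = None,
) -> Any:
    """Filter raw matches, transform the survivors, then pick first/all/default."""
    results = []
    for value in matches:
        if type_filter is not None and not isinstance(value, type_filter):
            continue
        if value_filter is not None and not value_filter(value):
            continue
        results.append(transform(value) if transform is not None else value)
    if not results:
        return default
    return results[0] if return_first else results


raw_matches = ["12400", "837", "n/a"]
print(finalize(raw_matches, return_first=False, value_filter=str.isdigit, transform=int))
# [12400, 837]
```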

Examples

from nestfind import deep_search, DeepSearch

data = {
    "users": [
        {"id": 1, "name": "Alice", "active": True},
        {"id": 2, "name": "Bob",   "active": False},
    ]
}

# Get all names using wildcard
deep_search(data, "users", "*", "name", return_first=False)
# → ["Alice", "Bob"]

# Filter with predicate
deep_search(data, "users", lambda u: u.get("active"), "name")
# → "Alice"

# Return with path
deep_search(data, "name", with_path=True)
# → ("Alice", ["users", 0, "name"])

# Type filter
deep_search(data, "id", type_filter=int)
# → 1

# Class wrapper — bind config once, reuse
ds = DeepSearch(exclude_keys=["metadata"], max_depth=5)
ds(data, "users", "*", "name", return_first=False)
# → ["Alice", "Bob"]

DeepSearch class

Bind configuration once and reuse across calls:

class FacebookMapper:
    deep_search = DeepSearch(exclude_keys=["metadata"])

    def map(self, raw):
        return self.deep_search(raw, "user", "name")
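
The "bind once, reuse" pattern resembles `functools.partial` from the standard library. The sketch below is only an analogy: `search` is a hypothetical stand-in function, `Mapper` is a hypothetical class, and nothing here reflects how `DeepSearch` is actually implemented.

```python
from functools import partial
from typing import Any


def search(data: Any, *keys: str, default: Any = None, exclude_keys: tuple = ()) -> Any:
    """Stand-in for a configurable search (exact dict traversal only)."""
    node = data
    for key in keys:
        if key in exclude_keys or not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


class Mapper:
    # staticmethod makes it explicit that no `self` should be passed to the
    # pre-configured callable when it is accessed through an instance.
    find = staticmethod(partial(search, exclude_keys=("metadata",)))

    def map(self, raw: dict) -> Any:
        return self.find(raw, "user", "name")


print(Mapper().map({"user": {"name": "Alice"}}))  # Alice
```

Keyword arguments given at call time still override the bound ones, which matches the "override per-call" idea.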

Advanced Examples

Parsing inconsistent API responses

Real-world APIs often return the same data under different keys depending on the endpoint or version. Use fallback mode to handle all variants transparently:

# Instagram-style response — video URL can live under many keys
media = {
    "video_versions": [
        {"type": 101, "url": "https://cdn.example.com/video_hd.mp4"},
        {"type": 102, "url": "https://cdn.example.com/video_sd.mp4"},
    ]
}

url = deep_search(
    media,
    ["video_versions", 0, "url"],       # preferred: first video version
    ["video_dash_manifest"],             # fallback 1
    ["browser_native_hd_url"],           # fallback 2
    ["browser_native_sd_url"],           # fallback 3
)
# → "https://cdn.example.com/video_hd.mp4"

Multi-source with priority

When you have multiple raw payloads and want the first one that has a given value:

post    = {"media": {"image_versions": {"candidates": [{"url": "https://img.example.com/post.jpg"}]}}}
story   = {}   # empty / missing
reel    = {"image_versions": {"candidates": [{"url": "https://img.example.com/reel.jpg"}]}}

thumbnail = deep_search(
    [story,  "image_versions", "candidates", 0, "url"],
    [post,   "media", "image_versions", "candidates", 0, "url"],
    [reel,   "image_versions", "candidates", 0, "url"],
)
# → "https://img.example.com/post.jpg"  (story was empty, post matched first)

Wildcard + predicate chaining

Collect the video URL of every video item in a feed that has more than 1M views:

feed = {
    "items": [
        {"media_type": 2, "view_count": 1_500_000, "video_url": "https://cdn.example.com/a.mp4"},
        {"media_type": 1, "view_count": 3_000_000, "image_url": "https://cdn.example.com/b.jpg"},
        {"media_type": 2, "view_count": 800_000,   "video_url": "https://cdn.example.com/c.mp4"},
        {"media_type": 2, "view_count": 2_200_000, "video_url": "https://cdn.example.com/d.mp4"},
    ]
}

viral_videos = deep_search(
    feed,
    "items",
    lambda item: item.get("media_type") == 2 and item.get("view_count", 0) > 1_000_000,
    "video_url",
    return_first=False,
)
# → ["https://cdn.example.com/a.mp4", "https://cdn.example.com/d.mp4"]

Condition key "!" — grab the parent dict

Useful when you need the whole object that contains a specific key, not just the value at that key:

story = {
    "reel": {
        "items": [
            {
                "id": "abc123",
                "media": {
                    "uri": "https://cdn.example.com/story.mp4",
                    "width": 1080,
                    "height": 1920,
                }
            }
        ]
    }
}

# Get the entire media dict that contains "uri", not just the uri value
media_obj = deep_search(story, "media", "uri!")
# → {"uri": "https://cdn.example.com/story.mp4", "width": 1080, "height": 1920}

# Now you can access sibling keys directly
print(media_obj["width"], media_obj["height"])   # 1080 1920
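
Conceptually, a condition key returns the enclosing dict rather than the value. A minimal sketch of that idea, using a hypothetical helper `find_parent` (not nestfind's implementation):

```python
from typing import Any, Optional


def find_parent(node: Any, key: str) -> Optional[dict]:
    """Return the first dict (depth-first) that has `key` as a direct key."""
    if isinstance(node, dict):
        if key in node:
            return node
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_parent(child, key)
        if found is not None:
            return found
    return None


story = {
    "reel": {
        "items": [
            {"id": "abc123", "media": {"uri": "https://cdn.example.com/story.mp4", "width": 1080}}
        ]
    }
}
parent = find_parent(story, "uri")
print(parent["width"])  # 1080
```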

Optional key "?" — graceful missing fields

Skip a segment silently when it may or may not exist, without falling back to wide search:

user_a = {"profile": {"display_name": "Alice",  "nickname": "ali"}}
user_b = {"profile": {"display_name": "Bob"}}   # no nickname

# "nickname?" won't error or wide-search if missing — just moves on
for user in [user_a, user_b]:
    label = deep_search(
        user,
        "profile", "nickname?",     # use nickname if present …
        default=deep_search(user, "profile", "display_name"),  # … else display_name
    )
    print(label)
# → "ali"
# → "Bob"

with_path — audit where a value came from

When debugging deeply nested structures, knowing where a value was found is as important as the value itself:

config = {
    "services": {
        "auth": {
            "database": {
                "host": "db-auth.internal",
                "port": 5432,
            }
        },
        "api": {
            "database": {
                "host": "db-api.internal",
                "port": 5432,
            }
        }
    }
}

results = deep_search(config, "host", return_first=False, with_path=True)
# → [
#     ("db-auth.internal", ["services", "auth", "database", "host"]),
#     ("db-api.internal",  ["services", "api",  "database", "host"]),
# ]

for value, path in results:
    print(" → ".join(str(p) for p in path), "=", value)
# services → auth → database → host = db-auth.internal
# services → api  → database → host = db-api.internal
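
Path tracking can be added to a recursive search by threading the keys and indices visited so far through each call. The sketch below shows one way to do that. `find_with_path` is a hypothetical name and the code is an illustration, not the library's code.

```python
from typing import Any, Iterator, List, Tuple


def find_with_path(node: Any, key: str, path: tuple = ()) -> Iterator[Tuple[Any, List[Any]]]:
    """Yield (value, path) for every occurrence of `key`, depth-first."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v, [*path, k]
            yield from find_with_path(v, key, (*path, k))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            # List positions are recorded as integer path elements.
            yield from find_with_path(item, key, (*path, i))


config = {
    "services": {
        "auth": {"database": {"host": "db-auth.internal"}},
        "api": {"database": {"host": "db-api.internal"}},
    }
}
hits = list(find_with_path(config, "host"))
for value, path in hits:
    print(".".join(str(p) for p in path), "=", value)
# services.auth.database.host = db-auth.internal
# services.api.database.host = db-api.internal
```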

transform + value_filter: extract and reshape in one pass

raw = {
    "stats": {
        "impressions": "12400",   # string from API
        "clicks":      "837",
        "spend":       "42.50",
    }
}

# Pull all numeric-looking strings and cast to float in one call
values = deep_search(
    raw,
    "stats",
    "*",
    return_first=False,
    value_filter=lambda v: isinstance(v, str) and v.replace(".", "").isdigit(),
    transform=float,
)
# → [12400.0, 837.0, 42.5]
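
A caveat about the predicate above: `v.replace(".", "").isdigit()` accepts strings such as "1.2.3", which `float()` rejects with `ValueError`, and it rejects negative numbers such as "-3.5". If that matters for your data, a stricter predicate can attempt the conversion itself. The helper name `is_float_string` is hypothetical:

```python
from typing import Any


def is_float_string(value: Any) -> bool:
    """True only for strings that float() can actually parse."""
    if not isinstance(value, str):
        return False
    try:
        float(value)
    except ValueError:
        return False
    return True


print(is_float_string("42.50"))   # True
print(is_float_string("1.2.3"))   # False
print(is_float_string("-3.5"))    # True
```

Note that `float()` also accepts strings like "nan", "inf", and values with surrounding whitespace. Add further checks if those should be excluded.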

exclude_keys + max_depth — scoped search in large payloads

Prevent the wide search from wandering into noisy or irrelevant subtrees:

response = {
    "data": {
        "user": {"id": 1, "name": "Alice"},
    },
    "metadata": {
        "user": {"id": 999, "name": "__system__"},   # should be ignored
    },
    "debug": {
        "trace": {"user": {"id": -1}}                # deep noise, also ignored
    }
}

name = deep_search(
    response,
    "user", "name",
    exclude_keys=["metadata", "debug"],
    max_depth=3,
)
# → "Alice"  (metadata and debug subtrees are skipped entirely)
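
The effect of excluding keys and capping depth can be sketched as two early exits in a recursive search. The depth convention used here (root container is depth 0, each nested container adds 1) is an assumption. The README does not say how nestfind counts depth, and `scoped_find` is a hypothetical helper.

```python
from typing import Any, Iterator, Optional


def scoped_find(
    node: Any,
    key: str,
    *,
    exclude_keys: tuple = (),
    max_depth: Optional[int] = None,
    _depth: int = 0,
) -> Iterator[Any]:
    """Depth-first key search that prunes excluded keys and deep subtrees."""
    if max_depth is not None and _depth > max_depth:
        return  # too deep: stop descending
    if isinstance(node, dict):
        for k, v in node.items():
            if k in exclude_keys:
                continue  # skip the whole subtree under this key
            if k == key:
                yield v
            yield from scoped_find(v, key, exclude_keys=exclude_keys,
                                   max_depth=max_depth, _depth=_depth + 1)
    elif isinstance(node, list):
        for item in node:
            yield from scoped_find(item, key, exclude_keys=exclude_keys,
                                   max_depth=max_depth, _depth=_depth + 1)


response = {
    "data": {"user": {"id": 1, "name": "Alice"}},
    "metadata": {"user": {"id": 999, "name": "__system__"}},
}
print(list(scoped_find(response, "name")))                              # ['Alice', '__system__']
print(list(scoped_find(response, "name", exclude_keys=("metadata",))))  # ['Alice']
print(list(scoped_find(response, "name", max_depth=1)))                 # []
```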

strict=True — exact path, no surprises

When you know the exact structure and want to disable wide search for performance or correctness:

data = {
    "a": {
        "b": {
            "c": 42,
            "extra": {"c": 999}   # would be found by wide search
        }
    }
}

deep_search(data, "a", "b", "c")                    # → 42  (exact path matches, so no wide search is needed)
deep_search(data, "a", "b", "c", strict=True)       # → 42  (exact path only)
deep_search(data, "b", "c",     strict=True)        # → None (strict: won't descend into "a" automatically)

Reusable mapper class with DeepSearch

Bind a shared configuration at the class level and override per-call as needed:

from nestfind import DeepSearch

class InstagramMediaMapper:
    ds = DeepSearch(exclude_keys=["debug", "logging"], max_depth=8)

    def map(self, raw: dict) -> dict:
        return {
            "id":        self.ds(raw, "pk"),
            "shortcode": self.ds(raw, "code"),
            "type":      self.ds(raw, "media_type", type_filter=int),
            "url":       self.ds(
                             raw,
                             ["video_versions", 0, "url"],
                             ["image_versions", "candidates", 0, "url"],
                         ),
            "width":     self.ds(raw, "original_width",  type_filter=int),
            "height":    self.ds(raw, "original_height", type_filter=int),
            "owner_id":  self.ds(raw, "owner", "pk"),
            "timestamp": self.ds(raw, "taken_at",        type_filter=int),
        }

License

MIT
