Powerful deep search for nested dict/list structures
nestfind
Powerful deep search for nested dict/list structures in Python.
Traverses arbitrarily nested dict/list data using a flexible path-based syntax,
supporting fallback paths, multiple sources, wildcard matching, predicate filtering, and more.
Installation
pip install nestfind
Quick Start
from nestfind import deep_search
data = {
    "user": {
        "profile": {
            "name": "Alice",
            "email": "alice@example.com"
        }
    }
}
deep_search(data, "user", "profile", "name") # → "Alice"
deep_search(data, "email") # → "alice@example.com" (wide search)
Path Segment Types
| Segment | Description | Example |
|---|---|---|
| str | Wide search key: finds key anywhere in nested structure | "name" |
| str + "!" | Condition key: returns the parent dict containing this key | "uri!" |
| str + "?" | Optional key: exact match only, skips wide search if not found | "nickname?" |
| "*" | Wildcard: matches ALL keys/items at this level | "*" |
| int | List index: exact positional access, supports negative | 0, -1 |
| callable | Predicate filter: include item only if callable returns truthy | lambda u: u.get("active") |
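One way to read the table is as a small dispatch on the type and suffix of each segment. The sketch below illustrates that reading only. `classify_segment` and the kind labels it returns are hypothetical, and nestfind may parse segments differently internally.

```python
from typing import Any, Tuple


def classify_segment(segment: Any) -> Tuple[str, Any]:
    """Map a path segment to a (kind, payload) pair, following the table above."""
    if callable(segment):
        return ("predicate", segment)
    if isinstance(segment, int):
        return ("index", segment)
    if isinstance(segment, str):
        if segment == "*":
            return ("wildcard", None)
        if len(segment) > 1 and segment.endswith("!"):
            return ("condition", segment[:-1])  # strip the "!" marker
        if len(segment) > 1 and segment.endswith("?"):
            return ("optional", segment[:-1])  # strip the "?" marker
        return ("key", segment)
    raise TypeError(f"unsupported segment: {segment!r}")


kinds = [classify_segment(s)[0] for s in ("name", "uri!", "nickname?", "*", -1)]
print(kinds)  # ['key', 'condition', 'optional', 'wildcard', 'index']
```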
Modes
Single path
deep_search(data, "a", "b", "c")
Fallback mode — tries paths in order, returns first non-empty result
deep_search(data, ["uri"], ["browser_native_hd_url"])
Multi-source mode — each list is [source, *keys]
deep_search([source1, "key1"], [source2, "key2", "key3"])
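Fallback mode amounts to trying each candidate path in order and stopping at the first one that yields a value. The sketch below shows that idea with exact path traversal only. `get_exact` and `first_match` are hypothetical helpers, and treating `None` as "empty" is an assumption; nestfind's own notion of an empty result may be broader.

```python
from typing import Any, Sequence


def get_exact(data: Any, path: Sequence[Any]) -> Any:
    """Follow `path` one step at a time; return None as soon as a step is missing."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return None
    return current


def first_match(data: Any, *paths: Sequence[Any]) -> Any:
    """Return the value of the first path that resolves to something other than None."""
    for path in paths:
        value = get_exact(data, path)
        if value is not None:
            return value
    return None


media = {"browser_native_hd_url": "https://cdn.example.com/video_hd.mp4"}
url = first_match(media, ["uri"], ["browser_native_hd_url"])
print(url)  # https://cdn.example.com/video_hd.mp4
```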
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| return_first | bool | True | Return first match or list of all matches |
| default | Any | None | Value to return if nothing found |
| type_filter | type or tuple | None | Only return results of this type |
| value_filter | callable | None | Only return results where value_filter(v) is truthy |
| transform | callable | None | Apply function to each result before returning |
| max_depth | int | None | Maximum nesting depth for wide search |
| exclude_keys | list[str] | None | Skip these keys during wide search |
| strict | bool | False | Disable wide search (exact path traversal only) |
| with_path | bool | False | Return (value, path) tuples instead of bare values |
| debug | bool | False | Enable debug logging |
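The result-shaping parameters (`type_filter`, `value_filter`, `transform`, `return_first`, `default`) can be thought of as a post-processing step applied to the raw matches. The sketch below is one plausible ordering, not nestfind's confirmed behavior: `finalize` is a hypothetical function, and applying the filters before `transform` is an assumption.

```python
from typing import Any, Callable, Iterable, Optional, Tuple, Union


def finalize(
    matches: Iterable[Any],
    *,
    return_first: bool = True,
    default: Any = None,
    type_filter: Optional[Union[type, Tuple[type, ...]]] = None,
    value_filter: Optional[Callable[[Any], Any]] = None,
    transform: Optional[Callable[[Any], Any]] = None,
) -> Any:
    """Filter, transform, and shape raw matches (assumed order: filters, then transform)."""
    results = []
    for value in matches:
        if type_filter is not None and not isinstance(value, type_filter):
            continue
        if value_filter is not None and not value_filter(value):
            continue
        results.append(transform(value) if transform is not None else value)
    if not results:
        return default  # nothing survived the filters
    return results[0] if return_first else results


numbers = finalize(
    ["12400", "n/a", "42.50"],
    return_first=False,
    value_filter=lambda v: v.replace(".", "", 1).isdigit(),
    transform=float,
)
print(numbers)  # [12400.0, 42.5]
```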
Examples
from nestfind import deep_search, DeepSearch
data = {
    "users": [
        {"id": 1, "name": "Alice", "active": True},
        {"id": 2, "name": "Bob", "active": False},
    ]
}
# Get all names using wildcard
deep_search(data, "users", "*", "name", return_first=False)
# → ["Alice", "Bob"]
# Filter with predicate
deep_search(data, "users", lambda u: u.get("active"), "name")
# → "Alice"
# Return with path
deep_search(data, "name", with_path=True)
# → ("Alice", ["users", 0, "name"])
# Type filter
deep_search(data, "id", type_filter=int)
# → 1
# Class wrapper — bind config once, reuse
ds = DeepSearch(exclude_keys=["metadata"], max_depth=5)
ds(data, "users", "*", "name", return_first=False)
# → ["Alice", "Bob"]
DeepSearch class
Bind configuration once and reuse across calls:
class FacebookMapper:
    deep_search = DeepSearch(exclude_keys=["metadata"])

    def map(self, raw):
        return self.deep_search(raw, "user", "name")
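The "bind once, reuse" pattern can be sketched as a callable object that stores default keyword arguments and merges per-call overrides on top. Everything in this block is hypothetical (`BoundSearch`, `fake_search`, `Mapper`); it shows the general Python pattern and makes no claim about how `DeepSearch` is implemented.

```python
from typing import Any, Callable


class BoundSearch:
    """Stores default keyword arguments and merges per-call overrides (hypothetical)."""

    def __init__(self, func: Callable[..., Any], **defaults: Any) -> None:
        self._func = func
        self._defaults = defaults

    def __call__(self, *args: Any, **overrides: Any) -> Any:
        # Per-call keyword arguments win over the bound defaults.
        return self._func(*args, **{**self._defaults, **overrides})


def fake_search(data: Any, *path: Any, max_depth: Any = None) -> dict:
    """Stand-in for a real search function; it just echoes its arguments."""
    return {"path": path, "max_depth": max_depth}


class Mapper:
    ds = BoundSearch(fake_search, max_depth=8)  # shared by all instances

    def map(self, raw: dict) -> dict:
        return self.ds(raw, "pk")


bound_result = Mapper().map({})
override_result = Mapper.ds({}, "pk", max_depth=2)
print(bound_result)     # {'path': ('pk',), 'max_depth': 8}
print(override_result)  # {'path': ('pk',), 'max_depth': 2}
```

A callable instance stored as a class attribute is not turned into a bound method, because it defines no `__get__`. That is why `self.ds(raw, ...)` receives `raw` as its first argument rather than `self`.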
Advanced Examples
Parsing inconsistent API responses
Real-world APIs often return the same data under different keys depending on the endpoint or version. Use fallback mode to handle all variants transparently:
# Instagram-style response — video URL can live under many keys
media = {
    "video_versions": [
        {"type": 101, "url": "https://cdn.example.com/video_hd.mp4"},
        {"type": 102, "url": "https://cdn.example.com/video_sd.mp4"},
    ]
}
url = deep_search(
    media,
    ["video_versions", 0, "url"],  # preferred: first video version
    ["video_dash_manifest"],       # fallback 1
    ["browser_native_hd_url"],     # fallback 2
    ["browser_native_sd_url"],     # fallback 3
)
# → "https://cdn.example.com/video_hd.mp4"
Multi-source with priority
When you have multiple raw payloads and want the first one that has a given value:
post = {"media": {"image_versions": {"candidates": [{"url": "https://img.example.com/post.jpg"}]}}}
story = {} # empty / missing
reel = {"image_versions": {"candidates": [{"url": "https://img.example.com/reel.jpg"}]}}
thumbnail = deep_search(
    [story, "image_versions", "candidates", 0, "url"],
    [post, "media", "image_versions", "candidates", 0, "url"],
    [reel, "image_versions", "candidates", 0, "url"],
)
# → "https://img.example.com/post.jpg" (story was empty, post matched first)
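The `[source, *keys]` convention maps naturally onto Python's starred unpacking. The sketch below reproduces the priority logic with the standard library only. `dig` and `first_from_sources` are hypothetical helpers that do exact traversal, so they illustrate the ordering rather than nestfind's full search.

```python
from functools import reduce
from operator import getitem
from typing import Any


def dig(source: Any, *keys: Any) -> Any:
    """Apply source[k1][k2]... and return None if any step is missing."""
    try:
        return reduce(getitem, keys, source)
    except (KeyError, IndexError, TypeError):
        return None


def first_from_sources(*specs: list) -> Any:
    """Each spec is [source, *keys]; return the first value that is not None."""
    for source, *keys in specs:
        value = dig(source, *keys)
        if value is not None:
            return value
    return None


post = {"media": {"image_versions": {"candidates": [{"url": "https://img.example.com/post.jpg"}]}}}
story = {}
reel = {"image_versions": {"candidates": [{"url": "https://img.example.com/reel.jpg"}]}}

thumbnail = first_from_sources(
    [story, "image_versions", "candidates", 0, "url"],
    [post, "media", "image_versions", "candidates", 0, "url"],
    [reel, "image_versions", "candidates", 0, "url"],
)
print(thumbnail)  # https://img.example.com/post.jpg
```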
Wildcard + predicate chaining
Collect the video URL of every video item (media_type 2) in a feed that has more than 1M views:
feed = {
    "items": [
        {"media_type": 2, "view_count": 1_500_000, "video_url": "https://cdn.example.com/a.mp4"},
        {"media_type": 1, "view_count": 3_000_000, "image_url": "https://cdn.example.com/b.jpg"},
        {"media_type": 2, "view_count": 800_000, "video_url": "https://cdn.example.com/c.mp4"},
        {"media_type": 2, "view_count": 2_200_000, "video_url": "https://cdn.example.com/d.mp4"},
    ]
}
viral_videos = deep_search(
    feed,
    "items",
    lambda item: item.get("media_type") == 2 and item.get("view_count", 0) > 1_000_000,
    "video_url",
    return_first=False,
)
# → ["https://cdn.example.com/a.mp4", "https://cdn.example.com/d.mp4"]
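For comparison, the same selection can be written as a plain list comprehension, which is a handy way to check the expected output by hand. This block does not use nestfind; `is_viral_video` is a name chosen here for illustration.

```python
feed = {
    "items": [
        {"media_type": 2, "view_count": 1_500_000, "video_url": "https://cdn.example.com/a.mp4"},
        {"media_type": 1, "view_count": 3_000_000, "image_url": "https://cdn.example.com/b.jpg"},
        {"media_type": 2, "view_count": 800_000, "video_url": "https://cdn.example.com/c.mp4"},
        {"media_type": 2, "view_count": 2_200_000, "video_url": "https://cdn.example.com/d.mp4"},
    ]
}


def is_viral_video(item: dict) -> bool:
    """Same predicate as the lambda above: a video with more than 1M views."""
    return item.get("media_type") == 2 and item.get("view_count", 0) > 1_000_000


viral_videos = [
    item["video_url"]
    for item in feed["items"]
    if is_viral_video(item) and "video_url" in item
]
print(viral_videos)
# ['https://cdn.example.com/a.mp4', 'https://cdn.example.com/d.mp4']
```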
Condition key "!" — grab the parent dict
Useful when you need the whole object that contains a specific key, not just the value at that key:
story = {
    "reel": {
        "items": [
            {
                "id": "abc123",
                "media": {
                    "uri": "https://cdn.example.com/story.mp4",
                    "width": 1080,
                    "height": 1920,
                }
            }
        ]
    }
}
# Get the entire media dict that contains "uri", not just the uri value
media_obj = deep_search(story, "media", "uri!")
# → {"uri": "https://cdn.example.com/story.mp4", "width": 1080, "height": 1920}
# Now you can access sibling keys directly
print(media_obj["width"], media_obj["height"]) # 1080 1920
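Conceptually, a condition key changes what the search returns: the dict that holds the key rather than the value under it. A minimal sketch of that idea follows. `find_parent` is a hypothetical helper, and its depth-first order is an assumption rather than documented nestfind behavior.

```python
from typing import Any, Optional


def find_parent(data: Any, key: str) -> Optional[dict]:
    """Return the first dict (depth-first) that contains `key`, or None."""
    if isinstance(data, dict):
        if key in data:
            return data  # return the container, not data[key]
        children = list(data.values())
    elif isinstance(data, list):
        children = data
    else:
        return None
    for child in children:
        parent = find_parent(child, key)
        if parent is not None:
            return parent
    return None


story = {
    "reel": {
        "items": [
            {
                "id": "abc123",
                "media": {"uri": "https://cdn.example.com/story.mp4", "width": 1080, "height": 1920},
            }
        ]
    }
}
media_obj = find_parent(story, "uri")
print(media_obj["width"], media_obj["height"])  # 1080 1920
```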
Optional key "?" — graceful missing fields
Skip a segment silently when it may or may not exist, without falling back to wide search:
user_a = {"profile": {"display_name": "Alice", "nickname": "ali"}}
user_b = {"profile": {"display_name": "Bob"}} # no nickname
# "nickname?" won't error or wide-search if missing — just moves on
for user in [user_a, user_b]:
    label = deep_search(
        user,
        "profile", "nickname?",  # use nickname if present …
        default=deep_search(user, "profile", "display_name"),  # … else display_name
    )
    print(label)
# → "ali"
# → "Bob"
with_path — audit where a value came from
When debugging deeply nested structures, knowing where a value was found is as important as the value itself:
config = {
    "services": {
        "auth": {
            "database": {
                "host": "db-auth.internal",
                "port": 5432,
            }
        },
        "api": {
            "database": {
                "host": "db-api.internal",
                "port": 5432,
            }
        }
    }
}
results = deep_search(config, "host", return_first=False, with_path=True)
# → [
#     ("db-auth.internal", ["services", "auth", "database", "host"]),
#     ("db-api.internal", ["services", "api", "database", "host"]),
# ]
for value, path in results:
    print(" → ".join(str(p) for p in path), "=", value)
# services → auth → database → host = db-auth.internal
# services → api → database → host = db-api.internal
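Path tracking of this kind is typically done by carrying the path down the recursion and extending it at each step. The generator below is a sketch of that technique, not nestfind's code; `iter_matches` is a hypothetical name.

```python
from typing import Any, Iterator, List, Optional, Tuple


def iter_matches(
    data: Any, key: str, path: Optional[List[Any]] = None
) -> Iterator[Tuple[Any, List[Any]]]:
    """Yield (value, path) for every occurrence of `key`, depth-first."""
    path = [] if path is None else path
    if isinstance(data, dict):
        for k, v in data.items():
            if k == key:
                yield v, path + [k]
            # Keep descending so nested occurrences are also reported.
            yield from iter_matches(v, key, path + [k])
    elif isinstance(data, list):
        for i, item in enumerate(data):
            yield from iter_matches(item, key, path + [i])


config = {
    "services": {
        "auth": {"database": {"host": "db-auth.internal", "port": 5432}},
        "api": {"database": {"host": "db-api.internal", "port": 5432}},
    }
}
results = list(iter_matches(config, "host"))
for value, found_at in results:
    print(".".join(str(p) for p in found_at), "=", value)
# services.auth.database.host = db-auth.internal
# services.api.database.host = db-api.internal
```

Building a new list with `path + [k]` at each step keeps sibling branches from sharing and mutating the same path object.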
transform + value_filter: extract and reshape in one pass
raw = {
    "stats": {
        "impressions": "12400",  # string from API
        "clicks": "837",
        "spend": "42.50",
    }
}
# Pull all numeric-looking strings and cast to float in one call
values = deep_search(
    raw,
    "stats",
    "*",
    return_first=False,
    # Strip at most one "." so strings like "1.2.3" are rejected before float() runs
    value_filter=lambda v: isinstance(v, str) and v.replace(".", "", 1).isdigit(),
    transform=float,
)
# → [12400.0, 837.0, 42.5]
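A digit check like the one in `value_filter` rejects negative numbers and exponent notation. When that matters, attempting the conversion and catching the failure is a common alternative. The sketch below shows that approach in plain Python; `to_float_or_none` is a hypothetical helper, and the `"currency"` entry is added here only to show a value being skipped.

```python
from typing import Any, Optional


def to_float_or_none(value: Any) -> Optional[float]:
    """Convert to float, returning None for values float() cannot parse."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


raw = {
    "stats": {
        "impressions": "12400",
        "clicks": "837",
        "spend": "42.50",
        "currency": "USD",  # added for illustration: not numeric, so it is skipped
    }
}
converted = (to_float_or_none(v) for v in raw["stats"].values())
values = [f for f in converted if f is not None]
print(values)  # [12400.0, 837.0, 42.5]
```

Note that `float()` also accepts strings such as `"nan"` and `"inf"`, so an extra check may be needed if those can appear in the payload.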
exclude_keys + max_depth — scoped search in large payloads
Prevent the wide search from wandering into noisy or irrelevant subtrees:
response = {
    "data": {
        "user": {"id": 1, "name": "Alice"},
    },
    "metadata": {
        "user": {"id": 999, "name": "__system__"},  # should be ignored
    },
    "debug": {
        "trace": {"user": {"id": -1}}  # deep noise, also ignored
    }
}
name = deep_search(
    response,
    "user", "name",
    exclude_keys=["metadata", "debug"],
    max_depth=3,
)
# → "Alice" (metadata and debug subtrees are skipped entirely)
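Scoping a recursive search usually comes down to two checks: skip children under excluded keys, and stop once a depth counter passes the limit. The sketch below illustrates both. `find_key_scoped` is hypothetical, and counting the root as depth 0 is an assumption; nestfind may count depth differently.

```python
from typing import Any, Collection, Optional


def find_key_scoped(
    data: Any,
    key: str,
    exclude_keys: Collection[str] = (),
    max_depth: Optional[int] = None,
    _depth: int = 0,
) -> Any:
    """Depth-first search for `key` that skips excluded subtrees and honors a depth limit."""
    if max_depth is not None and _depth > max_depth:
        return None
    if isinstance(data, dict):
        if key in data:
            return data[key]
        # Do not descend into subtrees stored under excluded keys.
        children = [v for k, v in data.items() if k not in exclude_keys]
    elif isinstance(data, list):
        children = data
    else:
        return None
    for child in children:
        found = find_key_scoped(child, key, exclude_keys, max_depth, _depth + 1)
        if found is not None:
            return found
    return None


response = {
    "metadata": {"user": {"id": 999, "name": "__system__"}},
    "data": {"user": {"id": 1, "name": "Alice"}},
}
user = find_key_scoped(response, "user", exclude_keys=("metadata",), max_depth=3)
print(user["name"])  # Alice
```

In this sample `"metadata"` is deliberately listed first, so an unscoped search would return the `__system__` user instead.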
strict=True — exact path, no surprises
When you know the exact structure and want to disable wide search for performance or correctness:
data = {
    "a": {
        "b": {
            "c": 42,
            "extra": {"c": 999}  # would be found by wide search
        }
    }
}
deep_search(data, "a", "b", "c") # → 42 (the exact path matches, so wide search is not needed)
deep_search(data, "a", "b", "c", strict=True) # → 42 (exact path only)
deep_search(data, "b", "c", strict=True) # → None (strict: won't descend into "a" automatically)
Reusable mapper class with DeepSearch
Bind a shared configuration at the class level and override per-call as needed:
from nestfind import DeepSearch
class InstagramMediaMapper:
    ds = DeepSearch(exclude_keys=["debug", "logging"], max_depth=8)

    def map(self, raw: dict) -> dict:
        return {
            "id": self.ds(raw, "pk"),
            "shortcode": self.ds(raw, "code"),
            "type": self.ds(raw, "media_type", type_filter=int),
            "url": self.ds(
                raw,
                ["video_versions", 0, "url"],
                ["image_versions", "candidates", 0, "url"],
            ),
            "width": self.ds(raw, "original_width", type_filter=int),
            "height": self.ds(raw, "original_height", type_filter=int),
            "owner_id": self.ds(raw, "owner", "pk"),
            "timestamp": self.ds(raw, "taken_at", type_filter=int),
        }
License
MIT
Download files
File details
Details for the file nestfind-0.1.1.tar.gz.
File metadata
- Download URL: nestfind-0.1.1.tar.gz
- Upload date:
- Size: 8.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4b202ea926e4275ca8964cf0cff402f207187a68c01f2e1bb889c023ae654d18 |
| MD5 | 3682b123d213ba3437f66780d49bc806 |
| BLAKE2b-256 | 6f3625ab3f40bd8819db4324cbffe5fffe2daff925098db90bba7b36caefb2cf |
File details
Details for the file nestfind-0.1.1-py3-none-any.whl.
File metadata
- Download URL: nestfind-0.1.1-py3-none-any.whl
- Upload date:
- Size: 9.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 03120d2641f4937a71a3309f4e7c7cd95535c6c62148825ca81dbdf36094e8cb |
| MD5 | effbe5cff2105a8a957a84bc5278c859 |
| BLAKE2b-256 | 893e9d6be82c11a1c4f96de47ab9a588eb90233b2184f03d615dc97852ab1014 |