Build bipartite networks from JSON affiliation data
Affiliation Builder is currently in alpha and welcomes feedback from early adopters.
Affiliation Builder
Build bipartite affiliation networks from JSON data using NetworkX.
Overview
Affiliation Builder is a Python package for creating bipartite networks from JSON data on co-affiliation relationships. It transforms structured data about entities (such as people and organizations) and their shared affiliations (such as in events) into NetworkX graph objects for analysis and visualization.
While designed with event-participant data in mind, the package works with any co-affiliation scenario where a set of entities connects to a set of items through shared relationships.
Features
- Flexible JSON input: Supports various JSON structures (arrays, wrapped objects)
- Multiple entity types: Handle different entity types simultaneously (such as persons and organizations)
- Simple and complex entities: Work with string identifiers or objects
- Rich metadata: Preserve all JSON attributes as node properties
- URL support: Load data from local files or URLs
- Comprehensive validation: Detailed error messages and logging
- NetworkX integration: Returns standard NetworkX graph objects
Requirements
The package has been developed and tested with:
- Python 3.9+
- NetworkX 3.0+
- Requests 2.31.0+
Installation
pip install affiliation-builder
Quick Start
from affiliation_builder import build
# Build a bipartite network from JSON data
G = build(
json_path='events.json',
node_set_0_key='events',
node_set_1_keys='participants',
identifier_key='event_id',
node_set_1_identifier_key='person_name'
)
# Returns a standard NetworkX Graph object
print(f"Nodes: {G.number_of_nodes()}")
print(f"Edges: {G.number_of_edges()}")
# Access node sets
node_set_0 = {n for n, d in G.nodes(data=True) if d['bipartite'] == 0}
node_set_1 = {n for n, d in G.nodes(data=True) if d['bipartite'] == 1}
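The Quick Start call does not show the data it reads. As a minimal sketch, the snippet below writes a hypothetical events.json whose shape matches the arguments above: items under "events", each identified by "event_id", and participant objects identified by "person_name". The participant names and the "role" attribute are made up for illustration.

```python
import json
from pathlib import Path

# Hypothetical sample data shaped to match the Quick Start call:
# items live under "events", each item is identified by "event_id",
# and each participant object is identified by "person_name".
sample = {
    "events": [
        {
            "event_id": "evt1",
            "participants": [
                {"person_name": "Alice", "role": "speaker"},
                {"person_name": "Bob", "role": "attendee"},
            ],
        },
        {
            "event_id": "evt2",
            "participants": [
                {"person_name": "Bob", "role": "attendee"},
                {"person_name": "Carol", "role": "attendee"},
            ],
        },
    ]
}

# Write the file as UTF-8, the encoding expected for local files.
path = Path("events.json")
path.write_text(json.dumps(sample, indent=2), encoding="utf-8")

# Read it back to confirm it is valid JSON with the expected top-level key.
loaded = json.loads(path.read_text(encoding="utf-8"))
print(sorted(loaded))  # ['events']
```

With this file in the working directory, the Quick Start call above should have something concrete to read.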
Understanding the Parameters
The build() function has 5 parameters that control how your JSON data maps to the bipartite network:
Parameter 1: json_path (str or Path)
What it is: Path to your local JSON file or URL
Examples:
json_path='data/events.json'
json_path='https://example.com/data.json'
Parameter 2: node_set_0_key (str or None)
What it is: JSON key containing your items (such as events)
Use None if: The JSON is a direct array of items (not wrapped in an object)
Examples:
Wrapped object format (specify the key):
{
"events": [
{"id": "evt1", "participants": ["Alice", "Bob"]},
{"id": "evt2", "participants": ["Bob", "Carol"]}
]
}
node_set_0_key='events'
Direct array format (use None):
[
{"id": "evt1", "participants": ["Alice", "Bob"]},
{"id": "evt2", "participants": ["Bob", "Carol"]}
]
node_set_0_key=None
Parameter 3: node_set_1_keys (str or list of str)
What it is: The JSON key(s) that contain the entities affiliated with each item
Pass a list when: You have multiple entity types (e.g., both persons and organizations)
Examples:
Single entity type:
{"id": "evt1", "participants": ["Alice", "Bob"]}
node_set_1_keys='participants'
Multiple entity types:
{
"id": "evt1",
"persons": ["Alice", "Bob"],
"organizations": ["University A", "Company B"]
}
node_set_1_keys=['persons', 'organizations']
Parameter 4: identifier_key (str)
What it is: JSON key that uniquely identifies each item (e.g., event)
Examples:
{"id": "evt1", "name": "Conference 2024", ...}
identifier_key='id'
Parameter 5: node_set_1_identifier_key (str or None, optional)
What it is: Key to extract identifiers from entity objects (when entities are objects, not strings)
Use None (default) when: Entities are simple strings/numbers
Pass a key when: Entities are objects with multiple attributes
Examples:
Simple entities (strings):
{"id": "evt1", "participants": ["Alice", "Bob"]}
node_set_1_identifier_key=None
Complex entities (objects):
{
"id": "evt1",
"participants": [
{"person_name": "Alice", "role": "speaker", "affiliation": "MIT"},
{"person_name": "Bob", "role": "attendee", "affiliation": "Stanford"}
]
}
# Extract 'Alice' and 'Bob' as node IDs
# All other attributes (role, affiliation) are preserved as node properties
node_set_1_identifier_key='person_name'
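To see how the five parameters fit together, here is a plain-Python sketch of the mapping they describe. It is not the package's implementation: sketch_mapping is a hypothetical helper that works on already-parsed JSON, skips validation and metadata, and only returns the two node sets and the edges between them.

```python
def sketch_mapping(data, node_set_0_key, node_set_1_keys,
                   identifier_key, node_set_1_identifier_key=None):
    """Illustrative only: return (item IDs, entity IDs, edges) for parsed JSON."""
    # node_set_0_key=None means the JSON is already a direct array of items.
    items = data if node_set_0_key is None else data[node_set_0_key]
    # Accept a single key or a list of keys for the entity side.
    if isinstance(node_set_1_keys, str):
        keys = [node_set_1_keys]
    else:
        keys = list(node_set_1_keys)

    item_ids, entity_ids, edges = set(), set(), set()
    for item in items:
        item_id = item[identifier_key]
        item_ids.add(item_id)
        for key in keys:
            for entity in item.get(key, []):
                # Objects need a key to extract the ID; plain values are used as-is.
                if node_set_1_identifier_key is not None and isinstance(entity, dict):
                    entity_id = entity[node_set_1_identifier_key]
                else:
                    entity_id = entity
                entity_ids.add(entity_id)
                edges.add((item_id, entity_id))
    return item_ids, entity_ids, edges


data = {"events": [
    {"id": "evt1", "persons": ["Alice", "Bob"], "organizations": ["University A"]},
    {"id": "evt2", "persons": ["Bob", "Carol"], "organizations": []},
]}
items, entities, edges = sketch_mapping(
    data, "events", ["persons", "organizations"], "id"
)
print(sorted(items))     # ['evt1', 'evt2']
print(sorted(entities))  # ['Alice', 'Bob', 'Carol', 'University A']
print(len(edges))        # 5
```

Bob appears in both events but ends up as a single entity with two edges, which is the co-affiliation structure the package is built around.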
JSON Structure Examples
Example 1: Wrapped Object with Simple Entities
{
"events": [
{"name": "Conference 2024", "participants": ["Alice", "Bob", "Carol"]},
{"name": "Workshop 2024", "participants": ["Bob", "David"]}
]
}
G = build(
json_path='events.json',
node_set_0_key='events',
node_set_1_keys='participants',
identifier_key='name'
)
Example 2: Direct Array with Complex Entities
[
{
"project_id": "proj1",
"members": [
{"name": "Alice", "role": "lead", "department": "Engineering"},
{"name": "Bob", "role": "contributor", "department": "Design"}
]
}
]
G = build(
json_path='projects.json',
node_set_0_key=None, # Direct array
node_set_1_keys='members',
identifier_key='project_id',
node_set_1_identifier_key='name' # Extract name from member objects
)
# All attributes preserved as node properties
print(G.nodes['Alice']) # {'bipartite': 1, 'role': 'lead', 'department': 'Engineering'}
Example 3: Multiple Entity Types
{
"events": [
{
"name": "Summit 2024",
"persons": ["Alice", "Bob"],
"organizations": ["Company A", "University B"]
}
]
}
G = build(
json_path='https://example.com/data/events.json',
node_set_0_key='events',
node_set_1_keys=['persons', 'organizations'], # Multiple types
identifier_key='name'
)
Working with the Output
The build() function returns a standard NetworkX Graph object with bipartite structure for further processing:
import networkx as nx
from affiliation_builder import build
# Build network
G = build('events.json', 'events', 'participants', 'event_id')
# Access node sets
events = {n for n, d in G.nodes(data=True) if d['bipartite'] == 0}
participants = {n for n, d in G.nodes(data=True) if d['bipartite'] == 1}
# Check bipartite validity
print(nx.is_bipartite(G))
# Analyze the network
print(f"Number of events: {len(events)}")
print(f"Number of participants: {len(participants)}")
print(f"Network density: {nx.density(G)}")
# Project to unipartite network
P = nx.bipartite.weighted_projected_graph(G, participants)
print(f"Co-affiliation edges: {P.number_of_edges()}")
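For intuition about the projection step: by default, weighted_projected_graph gives each edge a weight equal to the number of neighbors the two nodes share, here the number of events two participants have in common. The plain-Python sketch below computes the same counts by hand on toy data; the third event, evt3, is a hypothetical addition so that one pair shares more than one event.

```python
from collections import Counter
from itertools import combinations

# Toy affiliation data: event -> participants (evt3 is made up for illustration).
affiliations = {
    "evt1": ["Alice", "Bob"],
    "evt2": ["Bob", "Carol"],
    "evt3": ["Alice", "Bob", "Carol"],
}

# Count, for every pair of participants, how many events they share.
shared = Counter()
for participants in affiliations.values():
    # sorted(set(...)) removes duplicates and gives each pair a stable order.
    for a, b in combinations(sorted(set(participants)), 2):
        shared[(a, b)] += 1

for pair, weight in sorted(shared.items()):
    print(pair, weight)
# ('Alice', 'Bob') 2
# ('Alice', 'Carol') 1
# ('Bob', 'Carol') 2
```

Each pair with a non-zero count corresponds to one edge in the projected graph, and the count is that edge's weight.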
Duplicate Entity Node Handling
When the same entity appears multiple times (such as a participant in several events), the node is created once and edges are added for each affiliation. This is the expected behavior for affiliation networks.
However, if the same entity appears with different attributes in different items, the last set of attributes overwrites earlier sets. For example:
{
"events": [
{
"name": "Event 1",
"participants": [{"name": "Alice", "role": "speaker"}]
},
{
"name": "Event 2",
"participants": [{"name": "Alice", "role": "attendee"}]
}
]
}
After processing, G.nodes['Alice'] will have role: 'attendee' (from Event 2), but not role: 'speaker' (from Event 1).
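Because later attributes silently replace earlier ones, it can help to check the input for such conflicts before building the network. The sketch below is one way to do that on parsed JSON; find_conflicts is a hypothetical helper, not part of the package, and it assumes attribute values are hashable scalars such as strings or numbers.

```python
from collections import defaultdict


def find_conflicts(items, entity_key, id_key):
    """Map entity ID -> {attribute: set of values} for attributes that disagree."""
    seen = defaultdict(lambda: defaultdict(set))
    for item in items:
        for entity in item.get(entity_key, []):
            if not isinstance(entity, dict):
                continue  # simple string entities carry no attributes
            for attr, value in entity.items():
                if attr != id_key:
                    seen[entity[id_key]][attr].add(value)
    # Keep only attributes that were seen with more than one distinct value.
    conflicts = {}
    for entity_id, attrs in seen.items():
        differing = {a: v for a, v in attrs.items() if len(v) > 1}
        if differing:
            conflicts[entity_id] = differing
    return conflicts


events = [
    {"name": "Event 1", "participants": [{"name": "Alice", "role": "speaker"}]},
    {"name": "Event 2", "participants": [{"name": "Alice", "role": "attendee"}]},
]
print(find_conflicts(events, "participants", "name"))
```

For the example above, the result flags Alice's role as having two values, so you can decide whether to merge, rename, or drop the attribute before calling build().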
Limitations
- UTF-8 encoding: Local JSON files must be UTF-8 encoded. Other encodings will raise an error. (URL sources handle encoding automatically based on server response headers.)
- Hashable identifiers: Node IDs must be hashable Python types (strings, numbers, tuples). Lists or dictionaries as identifiers will be skipped with a warning.
- Flat entity lists: Entity values (under node_set_1_keys) must be arrays. Nested structures are not recursively processed.
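If a local file is not UTF-8, one workaround is to re-encode it before loading. The sketch below assumes you already know the source encoding; reencode_to_utf8 is a hypothetical helper, and the Latin-1 file is created only for demonstration.

```python
import json
import tempfile
from pathlib import Path


def reencode_to_utf8(src, dst, source_encoding):
    """Rewrite a text file as UTF-8; the source encoding must be known."""
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
    return Path(dst)


with tempfile.TemporaryDirectory() as tmp:
    # Simulate a legacy file saved as Latin-1.
    legacy = Path(tmp) / "legacy.json"
    legacy.write_text('{"events": [{"id": "Café"}]}', encoding="latin-1")

    # Convert it, then confirm the UTF-8 copy parses and keeps the accent.
    fixed = reencode_to_utf8(legacy, Path(tmp) / "utf8.json", "latin-1")
    data = json.loads(fixed.read_text(encoding="utf-8"))
    print(data["events"][0]["id"])  # Café
```

The converted file can then be passed to build() as json_path.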
Security Considerations
Be aware of potential security risks when processing JSON data from untrusted sources:
Resource Exhaustion
- Large files: No size limits are enforced on JSON files or URL downloads
- Deep nesting: Extremely nested JSON structures could cause memory or stack issues
- Malicious data: An attacker could provide data designed to consume excessive resources
Recommendations
- Trust your sources: Only load JSON from sources you control or trust
- Validate externally: Pre-validate JSON files for size and structure if loading from untrusted sources
- Monitor resources: For production use, implement resource monitoring
- Sandbox if needed: Run in isolated environments if processing untrusted data
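The "validate externally" recommendation can be sketched as a small pre-check on local files. Everything here is an assumption rather than package behavior: load_checked and max_depth are hypothetical helpers, and the default limits (10 MB, 20 levels) are arbitrary placeholders to adjust for your data.

```python
import json
import os
import tempfile


def max_depth(value):
    """Nesting depth of parsed JSON, computed iteratively (scalars count as 0)."""
    deepest, stack = 0, [(value, 1)]
    while stack:
        node, depth = stack.pop()
        if isinstance(node, dict):
            children = node.values()
        elif isinstance(node, list):
            children = node
        else:
            continue  # scalars do not add a nesting level
        deepest = max(deepest, depth)
        stack.extend((child, depth + 1) for child in children)
    return deepest


def load_checked(path, max_bytes=10_000_000, max_nesting=20):
    """Parse a local JSON file only if it passes size and depth checks."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path}: {size} bytes exceeds limit of {max_bytes}")
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except RecursionError as exc:
        # The parser itself can hit the recursion limit on hostile input.
        raise ValueError(f"{path}: nesting too deep to parse") from exc
    depth = max_depth(data)
    if depth > max_nesting:
        raise ValueError(f"{path}: nesting depth {depth} exceeds {max_nesting}")
    return data


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "events.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"events": [{"id": "evt1", "participants": ["Alice"]}]}, fh)
    checked = load_checked(path)
    print(max_depth(checked))  # 4
```

If the check passes, the same path can be handed to build(). This covers local files only; data loaded from a URL would need to be downloaded and checked separately first.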
Future Considerations
Future versions may include:
- Optional max_size parameter for downloads
- Configurable nesting depth limits
- Enhanced validation options
For now: Use this package with data from trusted sources, or implement your own validation layer for untrusted input.
Logging
The package uses Python's logging module. By default, log messages are not displayed. To receive processing information, configure logging in your application:
Display full logging from DEBUG level upward:
import logging
from affiliation_builder import build
logging.getLogger('affiliation_builder').setLevel(logging.DEBUG)
logging.getLogger('affiliation_builder').addHandler(logging.StreamHandler())
To log only messages at INFO level and above, set the level to logging.INFO instead.
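A slightly fuller setup, as a sketch using only the standard logging module: it adds timestamps through a formatter and guards against attaching a second handler when the setup code runs more than once. The format string is just one possible choice.

```python
import logging

logger = logging.getLogger("affiliation_builder")
logger.setLevel(logging.INFO)

# Guard against attaching a second handler if this setup code runs twice.
if not logger.handlers:
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
```

Without the guard, running the setup again (for example by re-executing a notebook cell) would print every message twice.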
Examples
See the examples/ directory for:
- example.json: Sample JSON data structure
- example.ipynb: Complete Jupyter Notebook with test analysis and visualization
Changelog
v0.2.0
- Added support for single entities as JSON objects: input data no longer requires entities to be wrapped in a list when there is only one item
v0.1.0
- Initial release
- Core functionality for building bipartite affiliation networks from JSON data
- Support for flexible input formats
- Comprehensive error handling and logging
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests on GitHub.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Citation
If you use this software in your research, please cite:
@software{fruehwirth2025affiliation,
author = {Frühwirth, Timo},
title = {Affiliation Builder: Build bipartite affiliation networks from JSON data},
year = {2025},
url = {https://github.com/timofruehwirth/affiliation-builder},
version = {0.1.0}
}
Acknowledgments
Built with NetworkX for network analysis and Requests for HTTP functionality.