A tool for visualizing categorical data over time.
PyCatFlow
A Python package for visualizing categorical data over time using temporal flow diagrams.
Overview
PyCatFlow is a specialized visualization tool designed to represent temporal developments in categorical data. It creates flow diagrams that show how categories evolve, appear, and disappear over time periods, making it ideal for analyzing trends in datasets with temporal and categorical dimensions.
Key Features
- Temporal Flow Visualization: Create dynamic flow diagrams showing category changes over time
- Multiple Connection Types: Choose from semi-curved, curved, or straight connection styles
- Data Input: Support for CSV files
- Customizable Appearance: Extensive options for colors, spacing, labels, and legends
- Export Capabilities: Generate high-quality SVG and PNG outputs
- Professional Output: Publication-ready visualizations with comprehensive styling options
Installation
PyPI Installation
pip install pycatflow
Development Installation
git clone https://github.com/bumatic/PyCatFlow.git
cd PyCatFlow
pip install -r requirements-dev.txt
pip install -e .
Alternative using extras:
pip install -e ".[dev]"
System Dependencies
PyCatFlow requires Cairo for PNG export functionality. Install Cairo using your system's package manager:
macOS (using Homebrew):
brew install cairo
Ubuntu/Debian:
sudo apt-get install libcairo2-dev
Windows: Follow the instructions at cairographics.org
Additional Python Dependencies: For PNG export functionality, install:
pip install cairosvg
Quick Start
Basic Usage
import pycatflow as pcf

# Load and parse data
data = pcf.read_file(
    "data.csv",
    columns="time_period",
    nodes="category",
    categories="subcategory"
)

# Create visualization
viz = pcf.visualize(
    data,
    spacing=20,
    width=800,
    connection_type="semi-curved"
)

# Export results
viz.save_svg('output.svg')
viz.save_png('output.png')

# Display in Jupyter
viz
Data Format Requirements
Your CSV data should contain at minimum:
- Time periods: Column indicating different time points
- Categories: Column with categorical data to track over time
- Subcategories (optional): Additional categorical dimension for color coding
Example data structure:
time_period,category,subcategory
2020,LibraryA,Core
2020,LibraryB,Optional
2021,LibraryA,Core
2021,LibraryC,New
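To make concrete what a flow diagram of this data conveys, the following minimal sketch computes which categories appear, disappear, or persist between consecutive time periods. It uses only the standard library and does not call PyCatFlow; the helper names `categories_by_period` and `transitions` are hypothetical and exist only for this illustration.

```python
import csv
import io
from collections import defaultdict

# The sample rows from the example above, inlined so the sketch is self-contained.
SAMPLE_CSV = """time_period,category,subcategory
2020,LibraryA,Core
2020,LibraryB,Optional
2021,LibraryA,Core
2021,LibraryC,New
"""


def categories_by_period(csv_text):
    """Map each time period to the set of categories observed in it."""
    periods = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        periods[row["time_period"]].add(row["category"])
    return dict(periods)


def transitions(periods):
    """Yield one dict per pair of consecutive periods describing the changes."""
    # Periods are compared as strings, which is adequate for four-digit years.
    ordered = sorted(periods)
    for prev, curr in zip(ordered, ordered[1:]):
        yield {
            "from": prev,
            "to": curr,
            "appeared": sorted(periods[curr] - periods[prev]),
            "disappeared": sorted(periods[prev] - periods[curr]),
            "persisted": sorted(periods[prev] & periods[curr]),
        }


changes = list(transitions(categories_by_period(SAMPLE_CSV)))
for change in changes:
    print(change)
```

For the sample data, LibraryA persists from 2020 to 2021, LibraryB disappears, and LibraryC appears. These are the events the diagram renders as continuing, ending, and starting flows.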
Advanced Configuration
Visualization Parameters
viz = pcf.visualize(
    data,
    # Layout
    spacing=50,  # Space between time periods
    width=1200,  # Canvas width (auto if None)
    height=800,  # Canvas height (auto if None)
    # Node appearance
    node_size=10,  # Base node size
    minValue=1,  # Minimum node size
    maxValue=20,  # Maximum node size
    node_scaling="linear",  # Scaling method
    # Connections
    connection_type="semi-curved",  # "semi-curved", "curved", "straight"
    line_opacity=0.5,  # Connection transparency
    # Colors
    color_categories=True,  # Color by subcategory
    color_startEnd=True,  # Highlight start/end nodes
    palette=("viridis", 10),  # Matplotlib colormap
    # Labels
    show_labels=True,  # Display node labels
    label_text="item",  # "item", "item_count", "item_category"
    label_position="nodes",  # "nodes", "start_end"
    # Legend
    legend=True,  # Include legend
    # Sorting
    sort_by="frequency"  # "frequency", "alphabetical", "category"
)
Data Loading Options
# File loading with custom parameters
data = pcf.read_file(
    "data.csv",
    columns="time_col",  # Time period column
    nodes="category_col",  # Category column
    categories="subcat_col",  # Subcategory column (optional)
    orientation="horizontal",  # Data layout
    delimiter=",",  # Custom delimiter
    column_order="order_col"  # Column for custom time ordering
)

# Direct string parsing (csv_string holds the CSV content as a str)
data = pcf.read(
    csv_string,
    columns="time_col",
    nodes="category_col"
)
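The `csv_string` passed to `read()` above can come from any source. As a minimal sketch, it could be assembled in memory with the standard library. The helper `rows_to_csv_string` is hypothetical, and the assumption that `read()` accepts newline-separated CSV text with a header row is inferred from the example rather than confirmed.

```python
import csv
import io


def rows_to_csv_string(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    # Use "\n" line endings instead of the csv module's default "\r\n".
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"time_col": "2020", "category_col": "ItemA"},
    {"time_col": "2020", "category_col": "ItemB"},
    {"time_col": "2021", "category_col": "ItemA"},
]
csv_string = rows_to_csv_string(rows, ["time_col", "category_col"])
print(csv_string)
```

The resulting string uses the same column names as the `read()` call above, so it could be passed as its first argument.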
Examples
Example 1: Software Dependencies Over Time
import pycatflow as pcf

# Load dependency data
data = pcf.read_file(
    "dependencies.csv",
    columns="year",
    nodes="library",
    categories="type"
)

# Create professional visualization
viz = pcf.visualize(
    data,
    spacing=30,
    width=1000,
    connection_type="curved",
    color_categories=True,
    label_text="item_count",
    legend=True
)

viz.save_svg('dependencies_flow.svg')
Example 2: Custom Styling
# Create visualization with custom colors
viz = pcf.visualize(
    data,
    palette=("Set3", 12),
    nodes_color="#f0f0f0",
    start_node_color="#2e8b57",
    end_node_color="#dc143c",
    line_opacity=0.7,
    label_color="#333333"
)
API Reference
Core Functions
read_file(filepath, **kwargs)
Load and parse data from CSV file.
Parameters:
- filepath (str): Path to CSV file
- columns (str): Column name containing time periods
- nodes (str): Column name containing categories to track
- categories (str, optional): Column name for subcategories
- orientation (str): "horizontal" or "vertical" data layout
- delimiter (str, optional): CSV delimiter (auto-detected if None)
Returns:
dict: Structured data ready for visualization
visualize(data, **kwargs)
Generate flow visualization from structured data.
Parameters:
- data (dict): Output from read_file() or read()
- spacing (int): Space between time periods (default: 50)
- connection_type (str): "semi-curved", "curved", or "straight"
- color_categories (bool): Enable category-based coloring
- legend (bool): Include legend in output
Returns:
drawsvg.Drawing: SVG visualization object
Visualization Methods
The returned visualization object supports:
- save_svg(filename): Export as SVG
- save_png(filename): Export as PNG (requires cairosvg)
- Display in Jupyter notebooks directly
Data Format Specifications
Horizontal Format (Recommended)
Time periods in one column, categories in another:
time_period,category,subcategory
2020,ItemA,TypeX
2020,ItemB,TypeY
2021,ItemA,TypeX
2021,ItemC,TypeZ
Vertical Format
Time periods as column headers:
category,2020,2021,2022
ItemA,TypeX,TypeX,
ItemB,TypeY,,TypeY
ItemC,,TypeZ,TypeZ
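The two layouts carry the same information. The sketch below converts the vertical example into horizontal rows using only the standard library. It assumes that each non-empty cell holds the subcategory of that item in that period and that an empty cell means the item is absent. This reading of the example is an assumption, and `vertical_to_horizontal` is a hypothetical helper, not part of PyCatFlow.

```python
import csv
import io

# The vertical example from above, inlined so the sketch is self-contained.
VERTICAL_CSV = """category,2020,2021,2022
ItemA,TypeX,TypeX,
ItemB,TypeY,,TypeY
ItemC,,TypeZ,TypeZ
"""


def vertical_to_horizontal(csv_text):
    """Return (time_period, category, subcategory) tuples, ordered by period."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    periods = header[1:]  # every column after the first is a time period
    rows = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        category, cells = record[0], record[1:]
        for period, subcategory in zip(periods, cells):
            if subcategory.strip():  # an empty cell means "absent in this period"
                rows.append((period, category, subcategory.strip()))
    # list.sort is stable, so the item order within each period is preserved.
    rows.sort(key=lambda row: row[0])
    return rows


long_rows = vertical_to_horizontal(VERTICAL_CSV)
for row in long_rows:
    print(",".join(row))
```

In practice, passing `orientation="vertical"` to `read_file()` should make this conversion unnecessary. The sketch only illustrates how the two formats relate.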
Changelog
Version 0.2.0 (2024)
Major Update: drawSVG 2.x Migration
Breaking Changes
- Updated drawSVG dependency: Now requires drawsvg>=2.0 (previously drawSVG<2.0)
- API method names: Updated to snake_case following drawSVG 2.x conventions
  - viz.saveSvg() → viz.save_svg()
  - viz.savePng() → viz.save_png()
- Package name: Import statement unchanged (import drawsvg), but package name is now lowercase
Migration Notes
Users upgrading from version 0.1.x should:
- Update method calls: save_svg() and save_png() instead of camelCase versions
- Install updated dependencies: pip install "drawsvg>=2.0" cairosvg (the quotes stop the shell from treating > as output redirection)
- Existing visualization outputs will be functionally identical with minor coordinate improvements
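Code that has to run against both 0.1.x and 0.2.x during a transition could dispatch on whichever method name the drawing object exposes. The sketch below is one possible approach, not something PyCatFlow provides. `save_svg_compat` and the two stand-in classes are hypothetical, and the stand-ins record filenames instead of writing files.

```python
def save_svg_compat(drawing, filename):
    """Save via save_svg() if present, else fall back to the legacy saveSvg().

    Returns the name of the method that was used.
    """
    for method_name in ("save_svg", "saveSvg"):
        method = getattr(drawing, method_name, None)
        if callable(method):
            method(filename)
            return method_name
    raise AttributeError("object has neither save_svg() nor saveSvg()")


class LegacyDrawing:
    """Hypothetical stand-in for a drawSVG 1.x drawing (camelCase API)."""

    def __init__(self):
        self.saved = []

    def saveSvg(self, filename):
        self.saved.append(filename)


class ModernDrawing:
    """Hypothetical stand-in for a drawsvg 2.x drawing (snake_case API)."""

    def __init__(self):
        self.saved = []

    def save_svg(self, filename):
        self.saved.append(filename)


print(save_svg_compat(LegacyDrawing(), "flow.svg"))  # prints "saveSvg"
print(save_svg_compat(ModernDrawing(), "flow.svg"))  # prints "save_svg"
```

The same pattern would apply to `save_png()` and `savePng()`. Once all environments are on 0.2.x, calling `viz.save_svg()` directly is simpler.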
Version 0.1.x (2021-2023)
- Initial release with drawSVG 1.x support
- Core visualization functionality
- Basic CSV data loading
- SVG and PNG export capabilities
- Multiple connection types and styling options
Development and Contributing
Setting Up Development Environment
git clone https://github.com/bumatic/PyCatFlow.git
cd PyCatFlow
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
pip install -e .
Running Tests
# Using pytest (recommended)
python -m pytest tests/ -v
# With coverage report
python -m pytest tests/ --cov=pycatflow --cov-report=html
Code Style
The project follows Python best practices:
- PEP 8 style guidelines
- Comprehensive docstrings
- Type hints where appropriate
- Professional error handling
Troubleshooting
Common Issues
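PNG Export Not Working
Install cairosvg and make sure the Cairo system library is present (see System Dependencies):
pip install cairosvg
Import Errors
Ensure all dependencies are installed (quote the version specifier so the shell does not interpret > as redirection):
pip install "drawsvg>=2.0" matplotlib cairosvg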
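To see which of these dependencies the current interpreter can locate, a short check with `importlib` may help. This is a minimal sketch; `missing_modules` is a hypothetical helper, and the module names are taken from the install command above.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


REQUIRED = ("drawsvg", "matplotlib", "cairosvg")

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed dependencies can be located.")
```

Note that `find_spec` only locates a module without importing it, so a located `cairosvg` does not guarantee that the Cairo system library it relies on is installed.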
Data Loading Issues
- Verify CSV format matches expected structure
- Check column names match those specified in parameters
- Ensure file encoding is UTF-8
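The checks above can be automated before handing a file to `read_file()`. The following sketch verifies UTF-8 decoding, the presence of the expected columns, and that at least one data row exists. It uses only the standard library, and `check_csv` is a hypothetical helper rather than part of PyCatFlow.

```python
import csv
import os
import tempfile


def check_csv(path, required_columns, delimiter=","):
    """Return a list of human-readable problems; an empty list means none found."""
    try:
        with open(path, encoding="utf-8", newline="") as handle:
            reader = csv.DictReader(handle, delimiter=delimiter)
            header = reader.fieldnames or []
            row_count = sum(1 for _ in reader)
    except UnicodeDecodeError as exc:
        return [f"file is not valid UTF-8: {exc}"]

    problems = []
    missing = [col for col in required_columns if col not in header]
    if missing:
        problems.append(
            f"missing columns: {', '.join(missing)} (found: {', '.join(header)})"
        )
    if row_count == 0:
        problems.append("no data rows")
    return problems


# Demonstration on a temporary file that lacks the 'subcategory' column.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "data.csv")
    with open(demo_path, "w", encoding="utf-8", newline="") as handle:
        handle.write("time_period,category\n2020,LibraryA\n")
    for problem in check_csv(demo_path, ["time_period", "category", "subcategory"]):
        print(problem)
```

A mismatch between the `delimiter` argument and the file's actual separator typically shows up here as missing columns, because the whole header line is then read as a single field.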
Performance Considerations
- Large datasets (>1000 categories) may require increased spacing
- Complex connection types (curved) take longer to render
- PNG export is slower than SVG due to rasterization
Related Resources
- Tutorial Article: Medium article with detailed explanation
- Interactive Tutorial: Jupyter Notebook with widgets
- Example Data: Sample datasets available in the example/ directory
Citation
If you use PyCatFlow in your research, please cite:
Marcus Burkhardt, and Herbert Natta. 2021. "PyCatFlow: A Python Package for Visualizing Categorical Data over Time". Zenodo. https://doi.org/10.5281/zenodo.5531785.
License
PyCatFlow is released under the MIT License. See LICENSE file for details.
Credits
- Conceptualization: Marcus Burkhardt
- Implementation: Marcus Burkhardt and Herbert Natta (@herbertmn)
- Inspiration: Rankflow visualization tool by Bernhard Rieder
For questions, issues, or contributions, please visit the GitHub repository.