Data collection from Git merge conflicts.
conflict-collection
Data collection toolkit for Git merge conflicts: structural types, social signals, and similarity metrics.
conflict-collection builds on conflict-parser to turn raw in-progress merges into rich, typed records suitable for research, analytics, or ML.
✨ Features
- Classify and enumerate merge conflict cases (modify/modify, add/add, delete/modify, etc.)
- Produce structured frozen dataclasses with per-side contents & resolution metadata
- Build canonical 5‑tuples (A,B,O,M,R) for dataset curation
- Extract social / ownership signals (recency, blame composition, integrator priors)
- Compute a 3‑way anchored similarity ratio between two resolution candidates
- Fully typed (PEP 561) & test‑covered
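
To make the conflict categories above concrete, here is a minimal sketch of how unmerged paths can be labelled from `git status --porcelain` output. It is not the library's implementation: `UNMERGED_CODES` and `classify_unmerged` are hypothetical names, and the labels are Git's own descriptions of its two-letter unmerged codes rather than the `conflict_type` values this package returns.

```python
# Two-letter codes that `git status --porcelain` reports for unmerged paths.
UNMERGED_CODES = {
    "DD": "both deleted",
    "AU": "added by us",
    "UD": "deleted by them",
    "UA": "added by them",
    "DU": "deleted by us",
    "AA": "both added",
    "UU": "both modified",
}


def classify_unmerged(porcelain_output: str) -> dict[str, str]:
    """Map each unmerged path to Git's description of its conflict state."""
    cases: dict[str, str] = {}
    for line in porcelain_output.splitlines():
        # Porcelain v1 lines look like "XY path": two status letters, a space, the path.
        code, path = line[:2], line[3:]
        if code in UNMERGED_CODES:
            cases[path] = UNMERGED_CODES[code]
    return cases


# Hypothetical sample output; the last line is an ordinary modification, not a conflict.
sample = "UU src/app.py\nAA docs/new.md\nDU old/legacy.py\n M README.md\n"
print(classify_unmerged(sample))
```

Roughly speaking, `UU` corresponds to a modify/modify case, `AA` to add/add, and `DU`/`UD` to delete/modify.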
📦 Install
```bash
pip install conflict-collection
# or with documentation extras (quoted so shells such as zsh do not expand the brackets)
pip install "conflict-collection[docs]"
```
🚀 Quick Start
```python
from conflict_collection.collectors.conflict_type import collect as collect_conflicts
from conflict_collection.collectors.societal import collect as collect_social
from conflict_collection.metrics.anchored_ratio import anchored_ratio

# 1) Enumerate conflict cases after a merge produced conflicts
cases = collect_conflicts(repo_path='.', resolution_sha='<resolved-commit-sha>')
print(len(cases), 'cases')
print(cases[0].conflict_type, cases[0].conflict_path)

# 2) Capture social signals
signals = collect_social(repo_path='.')
for path, rec in signals.items():
    print(path, rec.ours_author, rec.owner_commits_ours)

# 3) Similarity metric example
O = 'line1\nline2\nline3'
R = 'line1\nX\nline3'
R_hat = 'line1\nY\nline3'
print('anchored ratio =', anchored_ratio(O, R, R_hat))
```
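
To illustrate why anchoring a comparison to the base `O` matters, the sketch below contrasts a plain text ratio with one computed only over the lines each candidate changed relative to the base. This is an assumption-laden illustration built on the standard library's `difflib`, not the algorithm behind `anchored_ratio`; `changed_lines` and `sketch_anchored_similarity` are hypothetical helpers, and deletions are ignored for brevity.

```python
import difflib


def changed_lines(base: str, candidate: str) -> list[str]:
    """Return the lines of `candidate` that were inserted or replaced relative to `base`."""
    base_lines, cand_lines = base.splitlines(), candidate.splitlines()
    matcher = difflib.SequenceMatcher(a=base_lines, b=cand_lines, autojunk=False)
    out: list[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            out.extend(cand_lines[j1:j2])
    return out


def sketch_anchored_similarity(base: str, r: str, r_hat: str) -> float:
    """Compare two candidates using only their edits against the shared base."""
    edits_r = "\n".join(changed_lines(base, r))
    edits_hat = "\n".join(changed_lines(base, r_hat))
    return difflib.SequenceMatcher(a=edits_r, b=edits_hat, autojunk=False).ratio()


O = "line1\nline2\nline3"
R = "line1\nX\nline3"
R_hat = "line1\nY\nline3"

# A plain ratio is inflated by the unchanged context that both candidates inherit from O.
plain = difflib.SequenceMatcher(a=R, b=R_hat, autojunk=False).ratio()
anchored = sketch_anchored_similarity(O, R, R_hat)
print("plain ratio =", round(plain, 3))
print("sketch anchored ratio =", anchored)
```

In this sketch the two candidates share every untouched line, so the plain ratio is high even though their actual edits (`X` versus `Y`) have nothing in common.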
📚 Documentation
Full docs (usage guides + auto-generated API reference) are published with MkDocs & mkdocstrings:
https://jinu-jang.github.io/conflict-collection
Local build:
```bash
pip install -e ".[docs]"
mkdocs serve
```
🧩 Data Models
| Model | Purpose |
|---|---|
| Typed Conflict Cases | Frozen dataclasses per conflict archetype |
| `Conflict5Tuple` | Canonical (A,B,O,M,R) capture |
| Social Signals | Ownership & recency metrics per file |
| Anchored Ratio | Algorithmic similarity between two edits |
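
As a rough picture of what a frozen, typed record looks like, the following sketch defines a stand-in for an (A,B,O,M,R) tuple. The class name `FiveTupleSketch`, its field names, and the role assigned to each letter in the comments are all assumptions made for illustration; consult the API reference for the real `Conflict5Tuple` definition.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class FiveTupleSketch:
    """Hypothetical stand-in for an (A, B, O, M, R) record."""

    a: str  # assumed: contents on one side of the merge
    b: str  # assumed: contents on the other side
    o: str  # assumed: contents at the common ancestor
    m: str  # assumed: merged text still containing conflict markers
    r: str  # assumed: final resolved contents


record = FiveTupleSketch(
    a="x = 1\n",
    b="x = 2\n",
    o="x = 0\n",
    m="<<<<<<< ours\nx = 1\n=======\nx = 2\n>>>>>>> theirs\n",
    r="x = 2\n",
)

try:
    record.r = "x = 3\n"  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    print("record is immutable")
```

Because the dataclass is frozen and all fields are strings, instances are hashable, which makes deduplicating records in a set straightforward during dataset curation.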
🔬 Testing
```bash
git clone https://github.com/jinu-jang/conflict-collection
cd conflict-collection
pip install -e ".[dev]"
pytest -q
```
🤝 Contributing
PRs welcome! Please:
- Add or update tests
- Run `black . && isort . && pytest -q`
- If adding public API, include docstrings & update docs nav (`mkdocs.yml`)
See docs/contributing.md for details.
📄 License
MIT © 2025 Jinu Jang
🔖 Status
Beta. Interfaces may change before 0.1. Feedback appreciated.
File details
Details for the file conflict_collection-0.0.2.tar.gz.
File metadata
- Download URL: conflict_collection-0.0.2.tar.gz
- Upload date:
- Size: 17.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4466f0a3d20be7f5a2922369ad00d06de6feac8a52c534bd5e04b25021c8f246 |
| MD5 | 86a6b720ccbcab2ce0914b1ed8f4cb3f |
| BLAKE2b-256 | 7b2ca315c954de23a7ec6a18f6ef76c162c6768ee1c933e4c3c8ce797406e6f1 |
File details
Details for the file conflict_collection-0.0.2-py3-none-any.whl.
File metadata
- Download URL: conflict_collection-0.0.2-py3-none-any.whl
- Upload date:
- Size: 20.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7845a824811f8228d36d0d5f2cb1f6ba3a5110907499712ee2578b0a04c3374f |
| MD5 | c9417c5c64e535dd13ecd96ffc481a00 |
| BLAKE2b-256 | 9261d693ae011b0996ac9747f6e5c0055181367b7c89b9bd98612e7bf43fffcb |