
Data collection from Git merge conflicts.


conflict-collection


Data collection toolkit for Git merge conflicts: structural types, social signals, and similarity metrics.

conflict-collection builds on conflict-parser to turn raw in-progress merges into rich, typed records suitable for research, analytics, or ML.


✨ Features

  • Classify and enumerate merge conflict cases (modify/modify, add/add, delete/modify, etc.)
  • Produce structured frozen dataclasses with per-side contents & resolution metadata
  • Build canonical 5‑tuples (A,B,O,M,R) for dataset curation
  • Extract social / ownership signals (recency, blame composition, integrator priors)
  • Compute a 3‑way anchored similarity ratio between two resolution candidates
  • Fully typed (PEP 561) & test‑covered
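
To make the classification idea concrete, here is a minimal sketch of how conflict cases can be told apart from the Git index alone. It relies on a Git fact: an unmerged path has index entries at stage 1 (base), stage 2 (ours), and stage 3 (theirs), as listed by `git ls-files -u`. The `ConflictKind` enum and `classify` function are hypothetical names for illustration and are not the types exported by conflict-collection.

```python
from enum import Enum
from typing import Iterable


class ConflictKind(Enum):
    """Hypothetical labels; the library's own type names may differ."""
    MODIFY_MODIFY = "modify/modify"   # both sides changed the file
    ADD_ADD = "add/add"               # both sides added the file
    DELETE_MODIFY = "delete/modify"   # ours deleted, theirs modified
    MODIFY_DELETE = "modify/delete"   # ours modified, theirs deleted
    DELETE_DELETE = "delete/delete"   # both sides deleted the file
    ADDED_BY_US = "added by us"
    ADDED_BY_THEM = "added by them"


# Which index stages exist for an unmerged path determines the case.
# Stage 1 = common base, stage 2 = ours, stage 3 = theirs.
_STAGES_TO_KIND = {
    frozenset({1, 2, 3}): ConflictKind.MODIFY_MODIFY,
    frozenset({2, 3}): ConflictKind.ADD_ADD,
    frozenset({1, 3}): ConflictKind.DELETE_MODIFY,
    frozenset({1, 2}): ConflictKind.MODIFY_DELETE,
    frozenset({1}): ConflictKind.DELETE_DELETE,
    frozenset({2}): ConflictKind.ADDED_BY_US,
    frozenset({3}): ConflictKind.ADDED_BY_THEM,
}


def classify(stages: Iterable[int]) -> ConflictKind:
    """Map the set of index stages present for one path to a conflict kind."""
    key = frozenset(stages)
    if key not in _STAGES_TO_KIND:
        raise ValueError(f"not an unmerged stage set: {sorted(key)}")
    return _STAGES_TO_KIND[key]


print(classify({1, 2, 3}).value)
print(classify({2, 3}).value)
```

A lookup table keyed by a frozenset keeps the mapping exhaustive and easy to audit against the seven unmerged states that `git status` reports.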

📦 Install

pip install conflict-collection
# or with documentation extras (quoted so shells such as zsh do not expand the brackets)
pip install "conflict-collection[docs]"

🚀 Quick Start

from conflict_collection.collectors.conflict_type import collect as collect_conflicts
from conflict_collection.collectors.societal import collect as collect_social
from conflict_collection.metrics.anchored_ratio import anchored_ratio

# 1) Enumerate conflict cases after a merge produced conflicts
cases = collect_conflicts(repo_path='.', resolution_sha='<resolved-commit-sha>')
print(len(cases), 'cases')
if cases:  # guard against an empty result before indexing
    print(cases[0].conflict_type, cases[0].conflict_path)

# 2) Capture social signals
signals = collect_social(repo_path='.')
for path, rec in signals.items():
    print(path, rec.ours_author, rec.owner_commits_ours)

# 3) Similarity metric example
O = 'line1\nline2\nline3'
R = 'line1\nX\nline3'
R_hat = 'line1\nY\nline3'
print('anchored ratio =', anchored_ratio(O, R, R_hat))
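
For intuition about why a similarity score might be anchored to the base O, the sketch below contrasts a naive comparison of the two candidates with one that compares only what each candidate changed relative to O. This is an illustration built on the standard library's `difflib`, under the assumption that "anchored" means "measured relative to the base"; it is not the algorithm behind `anchored_ratio`, which may be defined differently. The helper names `changed_lines` and `edit_similarity` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def changed_lines(base: str, candidate: str) -> List[str]:
    """Lines of `candidate` that do not line up with `base` (inserted or replaced)."""
    base_lines = base.splitlines()
    cand_lines = candidate.splitlines()
    matcher = SequenceMatcher(a=base_lines, b=cand_lines, autojunk=False)
    changed: List[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changed.extend(cand_lines[j1:j2])
    return changed


def edit_similarity(base: str, r: str, r_hat: str) -> float:
    """Similarity of the two candidates' edits only, ignoring text both share with base.

    Limitation of this sketch: pure deletions contribute no lines, so two
    candidates that only delete text compare as identical.
    """
    edits_r = changed_lines(base, r)
    edits_hat = changed_lines(base, r_hat)
    return SequenceMatcher(a=edits_r, b=edits_hat, autojunk=False).ratio()


def naive_similarity(r: str, r_hat: str) -> float:
    """Plain line-based similarity with no reference to the base."""
    return SequenceMatcher(a=r.splitlines(), b=r_hat.splitlines(), autojunk=False).ratio()


O = 'line1\nline2\nline3'
R = 'line1\nX\nline3'
R_hat = 'line1\nY\nline3'
print('naive     =', naive_similarity(R, R_hat))
print('edit-only =', edit_similarity(O, R, R_hat))
```

In this example the naive score is inflated by the two context lines that both candidates inherit unchanged from O, while the edit-only score reflects that the candidates disagree on the one line they actually changed.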

📚 Documentation

Full docs (usage guides + auto-generated API reference) are published with MkDocs & mkdocstrings:

https://jinu-jang.github.io/conflict-collection

Local build:

pip install -e ".[docs]"
mkdocs serve

🧩 Data Models

| Model | Purpose |
| --- | --- |
| Typed Conflict Cases | Frozen dataclasses per conflict archetype |
| Conflict5Tuple | Canonical (A,B,O,M,R) capture |
| Social Signals | Ownership & recency metrics per file |
| Anchored Ratio | Algorithmic similarity between two edits |
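
As a rough picture of what a frozen 5-tuple record can look like, here is a minimal sketch using the standard `dataclasses` module. The class name `FiveTupleSketch`, its field names, and the reading of the letters (A and B as the two sides, O as the common base, M as the merged text with conflict markers, R as the final resolution) are assumptions made for illustration; consult the API reference for the real `Conflict5Tuple` definition.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional


@dataclass(frozen=True)
class FiveTupleSketch:
    """Hypothetical stand-in for a canonical (A, B, O, M, R) record."""
    a: Optional[str]  # assumed: content on side A ("ours"); None if absent
    b: Optional[str]  # assumed: content on side B ("theirs"); None if absent
    o: Optional[str]  # assumed: content in the common base; None if absent
    m: str            # assumed: merged text including conflict markers
    r: Optional[str]  # assumed: resolved content; None if the file was removed


record = FiveTupleSketch(
    a='line1\nours\n',
    b='line1\ntheirs\n',
    o='line1\n',
    m='line1\n<<<<<<< ours\nours\n=======\ntheirs\n>>>>>>> theirs\n',
    r='line1\nours\n',
)

# Frozen dataclasses reject attribute assignment after construction.
try:
    record.r = 'something else'
except FrozenInstanceError:
    print('record is immutable')

# With the default eq=True, frozen instances are also hashable,
# which makes deduplicating a dataset straightforward.
unique = {record, record}
print(len(unique))
```

Immutability matters for dataset curation because a record that cannot change after capture can be hashed, cached, and compared safely.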

🔬 Testing

git clone https://github.com/jinu-jang/conflict-collection
cd conflict-collection
pip install -e ".[dev]"
pytest -q

🤝 Contributing

PRs welcome! Please:

  1. Add or update tests
  2. Run black . && isort . && pytest -q
  3. If adding public API, include docstrings & update docs nav (mkdocs.yml)

See docs/contributing.md for details.


📄 License

MIT © 2025 Jinu Jang


🔖 Status

Beta. Interfaces may change before 0.1. Feedback appreciated.
