SIB: Self-Improving Benchmark agent framework

SIB (Self-Improving Benchmark)

A framework for building self-improving AI agents that autonomously refine their performance on benchmark tasks.

Installation

pip install sib-agent

Usage

from sib import Generation, GenerationLog, ScoreTracker

# Track generations in a self-improvement loop
log = GenerationLog()

gen = log.new_generation(config={"model": "claude-sonnet", "temperature": 0.7})
# ... run your agent ...
gen.finish(result={"accuracy": 0.82})

gen = log.new_generation(config={"model": "claude-sonnet", "temperature": 0.5})
# ... run improved agent ...
gen.finish(result={"accuracy": 0.91})

# Find the best generation
best = log.best("accuracy")
print(best.generation_id, best.result)  # 1 {'accuracy': 0.91}

# Save and reload logs
log.save("run_log.json")
log = GenerationLog.load("run_log.json")

# Track scores across generations
tracker = ScoreTracker()
tracker.record("accuracy", 0.82)
tracker.record("accuracy", 0.91)

summary = tracker.summarise("accuracy")
print(summary.improvement)   # 0.09 (approximately; exact digits depend on float rounding)
print(summary.is_improving)  # True
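
The usage above does not show how `improvement` and `is_improving` are derived. The following is a minimal stand-in sketch, assuming `improvement` is the last recorded score minus the first and `is_improving` means that difference is positive. It is not the library's implementation: `ScoreSummary`, `summarise_scores`, and the `first`, `last`, and `best` fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ScoreSummary:
    """Hypothetical summary record; not the class shipped with sib."""
    first: float
    last: float
    best: float
    improvement: float
    is_improving: bool


def summarise_scores(scores: Sequence[float]) -> ScoreSummary:
    """Summarise a non-empty score history under the assumed semantics."""
    if not scores:
        raise ValueError("need at least one recorded score")
    first, last = scores[0], scores[-1]
    # Assumption: improvement compares the latest score with the earliest one.
    improvement = last - first
    return ScoreSummary(
        first=first,
        last=last,
        best=max(scores),
        improvement=improvement,
        is_improving=improvement > 0,
    )


summary = summarise_scores([0.82, 0.91])
print(round(summary.improvement, 2))  # 0.09
print(summary.is_improving)           # True
```

The raw difference `0.91 - 0.82` is not exactly `0.09` in binary floating point, so the sketch rounds before printing and comparisons should use a tolerance.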

License

MIT

Download files

Source Distribution

sib_agent-0.1.0.tar.gz (3.8 kB)

Uploaded Source

Built Distribution

sib_agent-0.1.0-py3-none-any.whl (4.3 kB)

Uploaded Python 3

File details

Details for the file sib_agent-0.1.0.tar.gz.

File metadata

  • Download URL: sib_agent-0.1.0.tar.gz
  • Size: 3.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for sib_agent-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       7b438fff9a16050fd2bfc20bfac2f1a8b9b7afd826585a2e1a18884349b23f70
MD5          6a6a790ff47e3c8fe347dbd3cb3d9df7
BLAKE2b-256  f9854f458af21f9e9b45cde09724df3aef7fbd3a6cb2a8757c098cab6d271ea8

File details

Details for the file sib_agent-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: sib_agent-0.1.0-py3-none-any.whl
  • Size: 4.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for sib_agent-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       41c8b76410fe8089418927603cbecb8e8f8a6424af01b3d911148189c2f89629
MD5          f9dac130e4589df854e39b88f201403c
BLAKE2b-256  e4a9a2c7a9725bc261df0c125f5ec1168580b9cc8f2e27fe69bfea0cf2b758e1
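
The published SHA256 digests can be used to check a downloaded file before installing it. Below is a minimal sketch using only the Python standard library. The helper name `sha256_matches` is illustrative, and the wheel path is an assumption about where the file was saved locally.

```python
import hashlib
from pathlib import Path


def sha256_matches(path: Path, expected_hex: str, chunk_size: int = 65536) -> bool:
    """Return True if the file's SHA256 digest equals expected_hex."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed local download location; adjust to wherever the wheel was saved.
    wheel = Path("sib_agent-0.1.0-py3-none-any.whl")
    expected = "41c8b76410fe8089418927603cbecb8e8f8a6424af01b3d911148189c2f89629"
    if wheel.exists():
        print("digest OK" if sha256_matches(wheel, expected) else "digest MISMATCH")
```

The same helper works for the source distribution by swapping in its file name and SHA256 digest.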
