Minimal self-evolving autonomous agent framework

🌟 Overview

GenericAgent is a minimal, self-evolving autonomous agent framework. Its core is just ~3,300 lines of code. Through 7 atomic tools + a 92-line Agent Loop, it grants any LLM system-level control over a local computer — covering browser, terminal, filesystem, keyboard/mouse input, screen vision, and mobile devices (ADB).

Its design philosophy: don't preload skills — evolve them.

Every time GenericAgent solves a new task, it automatically crystallizes the execution path into a skill for direct reuse later. The longer you use it, the more skills accumulate, forming a skill tree that belongs entirely to you, grown from 3,300 lines of seed code.

🤖 Self-Bootstrap Proof — Everything in this repository, from installing Git and running git init to every commit message, was completed autonomously by GenericAgent. The author never opened a terminal once.

📋 Core Features

  • Self-Evolving: Automatically crystallizes each task into a skill. Capabilities grow with every use, forming your personal skill tree.
  • Minimal Architecture: ~3,300 lines of core code. Agent Loop is just 92 lines. No complex dependencies, zero deployment overhead.
  • Strong Execution: Injects into a real browser (preserving login sessions). 7 atomic tools take direct control of the system.
  • High Compatibility: Supports Claude / Gemini / Kimi and other major models. Cross-platform.

🧬 Self-Evolution Mechanism

This is what fundamentally distinguishes GenericAgent from every other agent framework.

[New Task] --> [Autonomous Exploration] (install deps, write scripts, debug & verify) -->
[Crystallize Execution Path into skill] --> [Write to Memory Layer] --> [Direct Recall on Next Similar Task]

| What you say | What the agent does the first time | Every time after |
| --- | --- | --- |
| "Read my WeChat messages" | Install deps → reverse DB → write read script → save skill | one-line invoke |
| "Monitor stocks and alert me" | Install mootdx → build selection flow → configure cron → save skill | one-line start |
| "Send this file via Gmail" | Configure OAuth → write send script → save skill | ready to use |

After a few weeks, your agent instance will have a skill tree no one else in the world has — all grown from 3,300 lines of seed code.
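
The explore-once, recall-afterwards flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `SkillStore`, `explore_and_build`, and `run_task` are hypothetical, the store is an in-memory dict rather than the project's memory layer, and task matching is a naive lowercase comparison.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory skill store (assumption: the real project persists
# skills in its memory layer; this dict only illustrates the idea).
SkillStore = Dict[str, Callable[[], str]]


def explore_and_build(task: str) -> Callable[[], str]:
    """Stand-in for the expensive first run (install deps, write scripts, debug)."""
    def skill() -> str:
        return f"done: {task}"
    return skill


def run_task(task: str, skills: SkillStore, log: List[str]) -> str:
    key = task.strip().lower()  # naive task matching, for illustration only
    if key not in skills:
        log.append("explore")
        skills[key] = explore_and_build(task)  # crystallize the execution path
    else:
        log.append("recall")  # similar task seen before: reuse the saved skill
    return skills[key]()


skills: SkillStore = {}
log: List[str] = []
first = run_task("Send this file via Gmail", skills, log)
second = run_task("send this file via gmail", skills, log)
print(log)
```

The second call skips exploration entirely, which is the "every time after" column of the table in miniature.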

🎯 Demo Showcase

  • 🧋 Food Delivery Order: "Order me a milk tea". Navigates the delivery app, selects items, and completes checkout automatically.
  • 📈 Quantitative Stock Screening: "Find GEM stocks with EXPMA golden cross, turnover > 5%". Screens stocks with quantitative conditions.
  • 🌐 Autonomous Web Exploration: Autonomously browses and periodically summarizes web content.
  • 💰 Expense Tracking: "Find expenses over ¥2K in the last 3 months". Drives Alipay via ADB.

🚀 Quick Start

Method 1: Standard Installation

# 1. Install from PyPI (after release)
pip install genericagent

# 2. Prepare config in current working directory
curl -L -o mykey.py https://raw.githubusercontent.com/lsdefine/GenericAgent/main/mykey_template.py
# Edit mykey.py and fill in your LLM API Key

# 3. Launch desktop UI
genericagent-launch

Or install from source:

git clone https://github.com/lsdefine/GenericAgent.git
cd GenericAgent
pip install .
cp mykey_template.py mykey.py
genericagent-launch

Method 2: Windows Portable Version (Recommended for beginners)

Download portable version (19MB, unzip and run)

Full guide: WELCOME_NEW_USER.md

Method 3: Android (Termux)

cd /sdcard/ga
python agentmain.py

🤖 Bot Interfaces (Optional)

QQ Bot

Uses qq-botpy WebSocket long connection — no public webhook required:

pip install qq-botpy

Add to mykey.py:

qq_app_id = "YOUR_APP_ID"
qq_app_secret = "YOUR_APP_SECRET"
qq_allowed_users = ["YOUR_USER_OPENID"]  # or ['*'] for public access

Then start the bot:

python frontends/qqapp.py
# or launch together with the desktop floating window
python launch.pyw --qq

Create a bot at the QQ Open Platform to get AppID / AppSecret. After the first message, user openid is logged in temp/qqapp.log.
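
To make the allow-list semantics concrete, here is a minimal sketch of how a frontend might interpret an `*_allowed_users` setting. The function `is_allowed` is hypothetical and not taken from the project; the only behavior drawn from this README is that `['*']` means public access.

```python
from typing import Iterable, Union

UserId = Union[str, int]


def is_allowed(user_id: UserId, allowed_users: Iterable[UserId]) -> bool:
    """Return True if user_id may talk to the bot (hypothetical helper)."""
    allowed = list(allowed_users)
    # '*' acts as a wildcard that opens the bot to everyone.
    return "*" in allowed or user_id in allowed


qq_allowed_users = ["YOUR_USER_OPENID"]
print(is_allowed("YOUR_USER_OPENID", qq_allowed_users))
print(is_allowed("someone_else", qq_allowed_users))
print(is_allowed("someone_else", ["*"]))
```

Keeping the list explicit is the safer default, since the agent behind the bot has system-level control over the host machine.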

Lark (Feishu)

pip install lark-oapi

Add to mykey.py:

fs_app_id = "cli_xxx"
fs_app_secret = "xxx"
fs_allowed_users = ["ou_xxx"]  # or ['*']

Then start the bot:

python frontends/fsapp.py          # or python launch.pyw --feishu

  • Inbound support: text, rich text post, images, files, audio, media, interactive cards / share cards
  • Outbound support: streaming progress cards, image replies, file / media replies
  • Vision model: Images are sent as true multimodal input to OpenAI Vision-compatible backends on the first turn

Full setup: assets/SETUP_FEISHU.md

WeCom (Enterprise WeChat)

pip install wecom_aibot_sdk

Add to mykey.py:

wecom_bot_id = "your_bot_id"
wecom_secret = "your_bot_secret"
wecom_allowed_users = ["your_user_id"]
wecom_welcome_message = "Hello, I'm online."

Then start the bot:

python frontends/wecomapp.py       # or python launch.pyw --wecom

DingTalk

pip install dingtalk-stream

Add to mykey.py:

dingtalk_client_id = "your_app_key"
dingtalk_client_secret = "your_app_secret"
dingtalk_allowed_users = ["your_staff_id"]  # or ['*']

Then start the bot:

python frontends/dingtalkapp.py    # or python launch.pyw --dingtalk

Telegram Bot

Add to mykey.py:

tg_bot_token = 'YOUR_BOT_TOKEN'
tg_allowed_users = [YOUR_USER_ID]

Then start the bot:

python frontends/tgapp.py

📊 Comparison with Similar Tools

| Feature | GenericAgent | OpenClaw | Claude Code |
| --- | --- | --- | --- |
| Codebase | ~3,300 lines | ~530,000 lines | Open-sourced (large) |
| Deployment | pip install + API Key | Multi-service orchestration | CLI + subscription |
| Browser Control | Real browser (session preserved) | Sandbox / headless browser | Via MCP plugin |
| OS Control | Mouse/kbd, vision, ADB | Multi-agent delegation | File + terminal |
| Self-Evolution | Autonomous skill growth | Plugin ecosystem | Stateless between sessions |
| Out of the Box | 10 .py files + 5 skills | Hundreds of modules | Rich CLI toolset |

🧠 How It Works

GenericAgent accomplishes complex tasks through Layered Memory × Minimal Toolset × Autonomous Execution Loop, continuously accumulating experience during execution.

1️⃣ Layered Memory System

Memory crystallizes throughout task execution, letting the agent build stable, efficient working patterns over time.

  • L0 (Meta Rules): Core behavioral rules and system constraints of the agent
  • L2 (Global Facts): Stable knowledge accumulated over long-term operation
  • L3 (Task Skills): Workflows for completing specific task types
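
As a rough mental model, the three layers listed above can be pictured as one structure that is flattened into prompt context before each task. The sketch below is an assumption-laden illustration: the `LayeredMemory` class, its field names, and the `build_context` method are hypothetical, and the sample rules and facts are invented for the demo.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LayeredMemory:
    """Hypothetical container for the three layers named in this README."""
    meta_rules: List[str] = field(default_factory=list)              # L0
    global_facts: Dict[str, str] = field(default_factory=dict)       # L2
    task_skills: Dict[str, List[str]] = field(default_factory=dict)  # L3

    def build_context(self, task_type: str) -> str:
        """Flatten the layers into text, adding L3 steps only if a skill matches."""
        lines = ["[L0] Meta rules"] + self.meta_rules
        lines += ["[L2] Global facts"]
        lines += [f"{key}: {value}" for key, value in sorted(self.global_facts.items())]
        steps = self.task_skills.get(task_type)
        if steps:
            lines += [f"[L3] Skill: {task_type}"]
            lines += [f"{i}. {step}" for i, step in enumerate(steps, 1)]
        return "\n".join(lines)


# Sample content below is invented purely for illustration.
memory = LayeredMemory(
    meta_rules=["Ask before deleting files"],
    global_facts={"os": "Windows"},
    task_skills={"send_email": ["Load OAuth token", "Run send script"]},
)
print(memory.build_context("send_email"))
```

Rules and facts apply to every task, while a skill is pulled in only when the task type matches, which keeps the context short for unrelated work.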

2️⃣ Autonomous Execution Loop

Perceive environment state → Task reasoning → Execute tools → Write experience to memory → Loop

The entire core loop is just 92 lines of code (agent_loop.py).
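
The loop above can be illustrated with a toy version. This is not the contents of agent_loop.py: the `agent_loop` signature, the `Policy` type, and the scripted policy that stands in for the LLM are all assumptions made for the sketch.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]
# A policy maps (task, history) to (tool name, argument); "done" ends the loop.
# In a real agent this role is played by the LLM.
Policy = Callable[[str, List[str]], Tuple[str, str]]


def agent_loop(task: str, policy: Policy, tools: Dict[str, Tool],
               memory: List[str], max_turns: int = 10) -> str:
    history: List[str] = []  # observations perceived so far
    for _ in range(max_turns):
        tool_name, arg = policy(task, history)  # task reasoning
        if tool_name == "done":
            memory.append(f"{task} -> {arg}")  # write experience to memory
            return arg
        observation = tools[tool_name](arg)  # execute tool
        history.append(f"{tool_name}({arg}) = {observation}")
    return "max turns reached"


def scripted_policy(task: str, history: List[str]) -> Tuple[str, str]:
    """Toy stand-in for the LLM: call one tool, then finish with its result."""
    if not history:
        return "echo", "hello"
    return "done", history[-1]


memory: List[str] = []
result = agent_loop("greet", scripted_policy, {"echo": lambda s: s.upper()}, memory)
print(result)
print(memory)
```

The `max_turns` cap is a design choice of this sketch: it keeps a confused policy from looping forever.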

3️⃣ Minimal Toolset

GenericAgent provides only 7 atomic tools, forming the foundational capabilities for interacting with the outside world.

| Tool | Function |
| --- | --- |
| code_run | Execute arbitrary code |
| file_read | Read files |
| file_write | Write files |
| file_patch | Patch / modify files |
| web_scan | Perceive web content |
| web_execute_js | Control browser behavior |
| ask_user | Human-in-the-loop confirmation |

Additionally, 2 memory management tools (update_working_checkpoint, start_long_term_update) allow the agent to persist context and accumulate experience across sessions.
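
One way to picture a small atomic toolset is a name-to-function registry that the loop dispatches into. The sketch below covers only the three file tools. The tool names come from the table above, but the argument names, return values, the `dispatch` helper, and the "match exactly once" patch rule are assumptions for illustration.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict


def file_read(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


def file_write(path: str, content: str) -> str:
    Path(path).write_text(content, encoding="utf-8")
    return f"wrote {len(content)} chars"


def file_patch(path: str, old: str, new: str) -> str:
    """Replace one unique occurrence of `old`; refuse ambiguous patches."""
    text = file_read(path)
    if text.count(old) != 1:
        raise ValueError("patch target must match exactly once")
    file_write(path, text.replace(old, new))
    return "patched"


# Hypothetical registry: tool name -> implementation.
TOOLS: Dict[str, Callable[..., str]] = {
    "file_read": file_read,
    "file_write": file_write,
    "file_patch": file_patch,
}


def dispatch(call: dict) -> str:
    """Route a tool call of the form {'name': ..., 'args': {...}}."""
    return TOOLS[call["name"]](**call["args"])


with tempfile.TemporaryDirectory() as tmp:
    target = str(Path(tmp) / "note.txt")
    dispatch({"name": "file_write", "args": {"path": target, "content": "hello world"}})
    dispatch({"name": "file_patch", "args": {"path": target, "old": "world", "new": "agent"}})
    final_text = dispatch({"name": "file_read", "args": {"path": target}})

print(final_text)
```

Requiring a unique match in `file_patch` is a deliberate choice in this sketch: a failed patch is easier for an agent to recover from than a silently wrong one.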

4️⃣ Capability Extension Mechanism

GenericAgent can create new tools dynamically.

Via code_run, GenericAgent can dynamically install Python packages, write new scripts, call external APIs, or control hardware at runtime — crystallizing temporary abilities into permanent tools.
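
The step from a temporary ability to a permanent tool can be sketched as "run the generated script, and if it works, save it". The code below is an assumption, not the project's implementation: this `code_run` is a simplified stand-in that runs Python source in a subprocess, and `crystallize` is a hypothetical helper.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def code_run(source: str, timeout: float = 30.0) -> str:
    """Run Python source in a subprocess and return its stdout (simplified stand-in)."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True, text=True, timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.strip()


def crystallize(name: str, source: str, tools_dir: Path) -> Path:
    """Save a script as a reusable tool file, but only after it runs cleanly."""
    code_run(source)  # verify before persisting
    path = tools_dir / f"{name}.py"
    path.write_text(source, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    script = "print(sum(range(5)))"
    saved = crystallize("sum_demo", script, Path(tmp))
    saved_name = saved.name
    rerun_output = code_run(saved.read_text(encoding="utf-8"))

print(saved_name, rerun_output)
```

Running the script before saving means a broken experiment never becomes a permanent tool.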

[Figure: GenericAgent workflow diagram]

⭐ Support

If this project helped you, please consider leaving a Star! 🙏

You're also welcome to join our GenericAgent Community Group for discussion, feedback, and co-building 👏

📄 License

MIT License — see LICENSE


