
Speechless LLM based Agents

LLM based agents with proactive interactions, long-term memory, external tool integration, and local deployment capabilities.

Speechless.AI

Speechless.AI is committed to integrating the superior language processing and deep reasoning capabilities of large language models into practical business applications. By enhancing the model's language understanding, knowledge accumulation, and text creation abilities, and introducing long-term memory, external tool integration, and local deployment, our aim is to establish an intelligent collaborative partner that can independently interact, continuously evolve, and closely align with various business scenarios.

  • Firstly, we focus on building a large model with enhanced reasoning capabilities, ensuring its outstanding performance in language processing and logical analysis.

  • Next, we design and implement an efficient operational framework for the intelligent entity. This framework not only supports rapid deployment and invocation of the model but also boasts features like autonomous interaction, real-time feedback adjustment, context awareness, and long-term memory. For instance, in customer service scenarios, the intelligent entity can provide more precise and personalized responses based on a user's historical interactions and current context. In content recommendation scenarios, it can dynamically adjust its strategies by capturing real-time shifts in user interests.

  • Ultimately, we integrate it with real business scenarios, ensuring that the intelligent entity seamlessly aligns with various business processes, delivering tangible value to enterprises.

What's New

[Overview diagram: speechless.ai.overview]

Speechless.Tools

The speechless-tools-7b model is fine-tuned on speechless-coding-7b-16k-tora following the guidance of the ToolLlama project, and aims to empower open-source LLMs with the ability to handle thousands of diverse real-world APIs.

speechless-tools-7b-dfs vs chatgpt-cot

| Dataset | Win Rate |
| --- | --- |
| G1_instruction | 0.465 |
| G1_category | 0.495 |
| G1_tool | 0.505 |
| G2_instruction | 0.61 |
| G2_category | 0.585 |
| G3_instruction | 0.66 |

speechless-tools-7b-dfs vs toolllama-dfs

| Dataset | Win Rate |
| --- | --- |
| G1_instruction | 0.45 |
| G1_category | 0.45 |
| G1_tool | 0.51 |
| G2_instruction | 0.53 |
| G2_category | 0.575 |
| G3_instruction | 0.46 |

Models

Models Repository

Legend: ⭐️ My Focus · 🔥🔥🔥 DL > 10K/month · 🔥🔥 DL > 7K/month · 🔥 DL > 3K/month

Mar. 2024

Feb. 2024

Jan. 2024

Dec. 2023

Nov. 2023

Oct. 2023

Sep. 2023

Aug. 2023

CodeLlama based Models

Mistral based Models

Tora based Models

  • ⭐️ speechless-coding-7b-16k-tora 2023.11.01

    Fine-tuned on llm_agents/tora-code-7b-v1.0. The primary goal is to enhance the code generation capability of the model, thereby achieving a large-scale intelligent agent base model with good planning and reasoning abilities.

    HumanEval & MultiPL-E

    | HumanEval-Python | Python | Java | JavaScript | CPP | Rust | Go | Shell | Julia | D | Lua | PHP | R |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | 52.44 | 55.96 | 37.84 | 46.93 | 37.48 | 29.01 | 28.99 | 12.11 | 31.47 | 12.05 | 26.52 | 39.25 | 22.09 |
  • 🔥🔥 speechless-tora-code-7b-v1.0 2023.10.10

    GPTQ / GGUF / AWQ quantized versions by TheBloke

Llama2 based Models

Datasets

speechless.finetune

# Initialize a new fine-tune task workspace
python -m speechless.finetune init --task_name my_task

# Run fine-tuning for the task
python -m speechless.finetune run --task_name my_task

# Merge the trained adapter into the base model
python -m speechless.finetune merge --task_name my_task

# Back up the task artifacts
python -m speechless.finetune backup --task_name my_task

# List all tasks
python -m speechless.finetune list

Install speechless

pip install speechless

Prepare train dataset

The training dataset is a JSONL file, with each line containing one JSON-formatted instruction record. The data format is as follows:

{
    "conversations":[
        {"from": "human", "value": "Human's instruction"},
        {"from": "assistant", "value": "Assistant's response"}
    ],
    "prompt_type": "alpaca", # Currently supports 'alpaca' and 'toolllama-multi-rounds'; defaults to 'alpaca' if prompt_type is empty.
    "system_prompt": "", # The default alpaca system prompt is used if this field is empty; otherwise this value is used as the system prompt for this instruction.
    "category": "my_category" # User-defined category; can be any string.
}
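
As a quick illustration, here is a minimal, self-contained Python sketch that writes records in this format to a JSONL file (the file name and record contents are illustrative, not part of the package):

import json

# Illustrative training records in the format described above.
records = [
    {
        "conversations": [
            {"from": "human", "value": "Write a Python function that reverses a string."},
            {"from": "assistant", "value": "def reverse_string(s):\n    return s[::-1]"},
        ],
        "prompt_type": "alpaca",
        "system_prompt": "",
        "category": "my_category",
    },
]

# Write one JSON object per line, as expected for a JSONL dataset.
with open("my_task_train.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")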

Run Fine-tune

#!/bin/bash
SCRIPT_PATH=$(cd $(dirname ${BASH_SOURCE[0]}); pwd)

# -------------------- Model --------------------
export MODELS_ROOT_DIR=/opt/local/llm_models/huggingface.co
export BASE_MODEL_PATH=${MODELS_ROOT_DIR}/llm_agents/tora-code-7b-v1.0
export TEST_MODEL_PATH=${MODELS_ROOT_DIR}/speechlessai/$(basename ${PWD})

# -------------------- Dataset --------------------
export SPEECHLESS_DATA_DIR=/opt/local/datasets/speechless_data
export DATASET=${SPEECHLESS_DATA_DIR}/speechless-toolbench-multi-rounds.jsonl
export DATASET_FORMAT=dialog

# -------------------- Environment --------------------
export SPEECHLESS_ROOT=/path/to/speechless  # adjust: root of the speechless source tree, used for PYTHONPATH below
export OUTPUT_DIR=./outputs
export RAY_memory_monitor_refresh_ms=0

# -------------------- Task --------------------
export TASK_NAME=$(basename ${TEST_MODEL_PATH})
export TASK_CHECKPOINT_DIR=${OUTPUT_DIR}
export WANDB_PROJECT=${TASK_NAME}

# -------------------- Train --------------------
export SAVE_STEPS=10
export EVAL_STEPS=10
export WARMUP_STEPS=10
export MAX_TRAIN_SAMPLES=1000000  # effectively no cap; lower this to subsample the training set
export MAX_EVAL_SAMPLES=200
export EVAL_DATASET_SIZE=0.005
export GROUP_BY_LENGTH=False
export LR_SCHEDULER_TYPE=cosine
export LEARNING_RATE=2e-4

export BITS=4
export LORA_R=32
export LORA_ALPHA=256

export MODEL_MAX_LENGTH=32768
export ROPE_THETA=1000000
export SLIDING_WINDOW=8192

export NUM_GPUS=2
export NUM_TRAIN_EPOCHS=3

export SAVE_STRATEGY=epoch
export SAVE_TOTAL_LIMIT="--save_total_limit ${NUM_TRAIN_EPOCHS}"

export PER_DEVICE_TRAIN_BATCH_SIZE=2
export GRADIENT_ACCUMULATION_STEPS=16
export MAX_MEMORY_MB=32000

PYTHONPATH=${SPEECHLESS_ROOT} \
torchrun --nnodes=1 --nproc_per_node=${NUM_GPUS} \
    -m speechless.finetune.finetune_dialog \
    --task_name ${TASK_NAME} \
    --run_name $(date +%Y%m%d-%H%M%S) \
    --model_name_or_path ${BASE_MODEL_PATH} \
    --output_dir ${OUTPUT_DIR} \
    --num_train_epochs ${NUM_TRAIN_EPOCHS} \
    --data_seed 10042 \
    --save_strategy ${SAVE_STRATEGY} \
    ${SAVE_TOTAL_LIMIT} \
    --evaluation_strategy steps \
    --eval_dataset_size ${EVAL_DATASET_SIZE} \
    --save_steps ${SAVE_STEPS} \
    --eval_steps ${EVAL_STEPS} \
    --warmup_steps ${WARMUP_STEPS} \
    --max_train_samples ${MAX_TRAIN_SAMPLES} \
    --max_eval_samples ${MAX_EVAL_SAMPLES} \
    --dataloader_num_workers 3 \
    --logging_strategy steps \
    --logging_steps 1 \
    --report_to tensorboard \
    --remove_unused_columns False \
    --do_train \
    --max_memory_MB ${MAX_MEMORY_MB} \
    --bits ${BITS} \
    --lora_r ${LORA_R} \
    --lora_alpha ${LORA_ALPHA} \
    --lora_dropout 0.05 \
    --lora_modules all \
    --double_quant \
    --quant_type nf4 \
    --bf16 \
    --sliding_window ${SLIDING_WINDOW} \
    --rope_theta ${ROPE_THETA} \
    --dataset ${DATASET} \
    --dataset_format ${DATASET_FORMAT} \
    --max_new_tokens ${MODEL_MAX_LENGTH} \
    --model_max_len ${MODEL_MAX_LENGTH} \
    --per_device_train_batch_size ${PER_DEVICE_TRAIN_BATCH_SIZE} \
    --gradient_accumulation_steps ${GRADIENT_ACCUMULATION_STEPS} \
    --per_device_eval_batch_size 1 \
    --learning_rate ${LEARNING_RATE} \
    --lr_scheduler_type ${LR_SCHEDULER_TYPE} \
    --weight_decay 0.0 \
    --seed 10042 \
    --optim paged_adamw_8bit \
    --gradient_checkpointing True \
    --group_by_length ${GROUP_BY_LENGTH} \
    --ddp_find_unused_parameters False \
    --force_remove_overlength_samples False \
    --flash_attention True 

speechless.quant

Speechless currently supports GGUF quantization, including the following types: q4_k_m, q5_k_m, q8_0.

# quant_type: q4_km/q5_km/q8_0
python -m speechless.quant llamacpp --model_path path/to/hf/model --llamacpp_quant_type <quant_type>
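
As a quick sanity check on the output (a sketch assuming a local llama.cpp build; the binary name, which varies between llama.cpp versions, and the GGUF file name are assumptions, not part of speechless):

# Smoke-test the quantized model with llama.cpp; the GGUF file name below is illustrative.
./llama-cli -m path/to/hf/model/ggml-model-q4_k_m.gguf -p "Write a haiku about code." -n 64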

speechless.infer

Ollama is used as the default backend, and litellm as the default frontend API.

The typical usage pattern is to access models through a unified OpenAI-compatible API interface; the backend defaults to a GGUF Q4_K_M quantized model.

python -m speechless.infer litellm_proxy --litellm_port 18342
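
Once the proxy is up, any OpenAI-compatible client can talk to it. Below is a minimal sketch using the openai Python package; the model name my_task is a hypothetical placeholder for whatever model the proxy serves, and the API key is a dummy since a local proxy typically does not verify it:

from openai import OpenAI

# Point an OpenAI-compatible client at the local litellm proxy started above.
client = OpenAI(base_url="http://127.0.0.1:18342", api_key="none")

response = client.chat.completions.create(
    model="my_task",  # hypothetical model name registered with the proxy
    messages=[{"role": "user", "content": "Hello, who are you?"}],
)
print(response.choices[0].message.content)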

Import GGUF into ollama

python -m speechless.infer ollama_create path/to/gguf/file
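
After the import, the model can be listed and queried with the standard Ollama CLI (the model name below is a placeholder for whatever name ollama_create registered):

# Confirm the model was registered, then chat with it.
ollama list
ollama run my-model "Hello, who are you?"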

speechless.api.server

python -m speechless.api.server \
    start \
    --model ${TASK_MODEL_PATH} \
    --backbone vllm \
    --host 0.0.0.0 \
    --port 5001
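
Assuming the server exposes an OpenAI-compatible route, as the inference section above suggests (the exact path and model name here are assumptions; adjust to the server's actual API), it can be exercised with curl:

# Hypothetical OpenAI-style request against the server started above.
curl http://127.0.0.1:5001/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "my_task", "messages": [{"role": "user", "content": "Hello!"}]}'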

speechless.eval

Speechless supports HumanEval, MultiPL-E, SQLEval, and lm-evaluation-harness.

lm-evaluation-harness

LMEVAL_OUTPUT_DIR=eval_results/lm_eval/${TASK_NAME}

# lmeval
python -m speechless.eval.lmeval \
    --do_gen \
    --model hf-causal-experimental \
    --model_args pretrained=${TEST_MODEL_PATH},use_accelerate=True \
    --batch_size 4 \
    --output_path ${LMEVAL_OUTPUT_DIR} 

# lmeval_show_results
python -m speechless.eval.lmeval \
    --show_results \
    --output_path ${LMEVAL_OUTPUT_DIR}

HumanEval

Execute the HumanEval generate command on the GPU server where the model is located.

HUMANEVAL_OUTPUT_DIR=eval_results/human_eval/${TASK_NAME}

# humaneval
PYTHONPATH=${SPEECHLESS_ROOT} \
python -m speechless.eval.humaneval \
    --do_gen \
    --do_eval \
    --model ${TEST_MODEL_PATH} \
    --output_dir ${HUMANEVAL_OUTPUT_DIR}

# humaneval_show_results
PYTHONPATH=${SPEECHLESS_ROOT} \
python -m speechless.eval.humaneval \
    --show_result \
    --output_path ${HUMANEVAL_OUTPUT_DIR}

bigcode-evaluation-harness

docker pull ghcr.io/bigcode-project/evaluation-harness
docker tag ghcr.io/bigcode-project/evaluation-harness evaluation-harness

MultiPL-E

docker pull ghcr.io/bigcode-project/evaluation-harness-multiple
docker tag ghcr.io/bigcode-project/evaluation-harness-multiple evaluation-harness-multiple
python -m speechless.eval.multiple \
    generate \
    --name ${TASK_MODEL_PATH} \
    --output_dir_prefix ${EVAL_OUTPUT_DIR}

python -m speechless.eval.multiple \
    eval \
    --results_dir ${EVAL_OUTPUT_DIR}

SQLEval

python -m speechless.eval.sqleval \
    generate \
    --model ${TASK_MODEL_PATH} \
    --output_dir ${EVAL_OUTPUT_DIR}

python -m speechless.eval.sqleval \
    eval \
    --eval_dir ${EVAL_OUTPUT_DIR}
