Evo Researcher
Overview
This project iterates on the research process of Autonolas AI Mechs, with the goal of making better-informed predictions.
It contains two primary features:
- An information research function
- An information grading function
Additionally, Mech's predict capability has been ported into a function that can also be independently run from this repo.
Below is a high-level explanation of how each of them is implemented.
Research Function
The research function takes a question, like "Will Twitter implement a new misinformation policy before the 2024 elections?"
and will then:
- Generate n web search queries
- Re-rank the queries using an LLM call, and then select the most relevant ones
- Search the web for each query, using Tavily
- Scrape and sanitize the content of each result's website
- Use LangChain's RecursiveCharacterTextSplitter to split the content of all pages into chunks
- Create embeddings of all chunks, and store the source of each as metadata
- Iterate over the queries selected in step 2 and, for each one, run a vector search for the most relevant chunks
- Aggregate the chunks from the previous steps and prepare a report
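The chunking, embedding, and vector-search steps can be sketched in a few lines. This is a minimal, dependency-free illustration and not the project's actual code: split_text is a naive stand-in for RecursiveCharacterTextSplitter, embed uses word counts where a real pipeline would call an embedding model, and the function names, parameters, and example URLs are all hypothetical.

```python
import math
import re
from collections import Counter


def split_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Naive fixed-size splitter with overlap (stand-in for a real text splitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (assumption, not a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def research_chunks(queries: list[str], pages: dict[str, str], top_k: int = 2) -> list[tuple[str, str]]:
    """Return deduplicated (source, chunk) pairs most similar to each query.

    `pages` maps a source URL to its already scraped and sanitized text.
    """
    # Embed every chunk once and keep its source alongside it as metadata.
    index = [
        (embed(chunk), source, chunk)
        for source, text in pages.items()
        for chunk in split_text(text)
    ]
    selected: list[tuple[str, str]] = []
    seen: set[tuple[str, str]] = set()
    for query in queries:
        query_vector = embed(query)
        ranked = sorted(index, key=lambda item: cosine(query_vector, item[0]), reverse=True)
        for _, source, chunk in ranked[:top_k]:
            if (source, chunk) not in seen:
                seen.add((source, chunk))
                selected.append((source, chunk))
    return selected
```

The returned pairs are what a report-writing step would aggregate. Keeping the source next to each chunk is what allows the report to cite where a statement came from.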
Grading Function
For the implementation of this function, the information quality criteria were selected from https://guides.lib.unc.edu/evaluating-info/evaluate, ignoring usability and intended audience.
Upon receiving a question like "Will Twitter implement a new misinformation policy before the 2024 elections?"
and information, it will:
- Create an evaluation plan
- Perform the evaluation of the information according to the plan from the previous step
- Extract the scores from the evaluation
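As an illustration of the last step, the sketch below pulls numeric scores out of a free-text evaluation. It assumes the evaluation contains lines shaped like "Criterion: N/10". That format, the criterion names in the usage example, and the function name extract_scores are assumptions made for this example, not the project's actual output format.

```python
import re

# Matches lines such as "Accuracy: 8/10" or "- Currency: 6.5/10" (assumed format).
SCORE_LINE = re.compile(
    r"^[ \t]*[-*]?[ \t]*([A-Za-z][A-Za-z ]*?)[ \t]*:[ \t]*(\d+(?:\.\d+)?)[ \t]*/[ \t]*10[ \t]*$",
    re.MULTILINE,
)


def extract_scores(evaluation: str) -> dict[str, float]:
    """Map each lowercased criterion name to its score; lines in other shapes are ignored."""
    return {
        name.strip().lower(): float(value)
        for name, value in SCORE_LINE.findall(evaluation)
    }
```

For example, extract_scores("Accuracy: 8/10\n- Currency: 6.5/10") yields one entry per criterion, while surrounding commentary lines are skipped.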
Predict
Ported implementation from: https://github.com/valory-xyz/mech/blob/main/tools/prediction_request_embedding/prediction_sentence_embedding.py
Installation
poetry install
poetry shell
Run
Research
With Evo:
poetry run python ./evo_researcher/main.py research "Will Twitter implement a new misinformation policy before the 2024 elections?" evo
With Autonolas:
poetry run python ./evo_researcher/main.py research "Will Twitter implement a new misinformation policy before the 2024 elections?" autonolas
Predict
poetry run python ./evo_researcher/main.py predict "Will Twitter implement a new misinformation policy before the 2024 elections?" ./outputs/myinfopath
Evaluate
poetry run python ./evo_researcher/main.py evaluate "Will Twitter implement a new misinformation policy before the 2024 elections?" ./outputs/myinfopath
Test
Run all questions
pytest
Run specific questions
Use pytest's -k flag and a string matcher. Example:
pytest -k "Twitter"
Example results
- Evo Reports: https://hackmd.io/mrCRBJyiTi-aO1gSFPjoDg
- Evo Information excerpts: https://hackmd.io/VQGJgHD1SImZrDR7puauJA?both
- Autonolas Information excerpts: https://hackmd.io/5qt_0HkvQyuGZ2r0RqJtoQ
Ideas for future improvement
For the researcher:
- Use LLM re-ranking, as Cursor does, to optimize context space and reduce noise
- Use self-consistency: generate several reports and compare them to choose the best one, or even merge their information
- Plan research using more complex techniques like tree of thoughts
- Implement a research loop, where research is performed and then evaluated. If the evaluation scores are under a certain threshold, iterate again to gather missing information or different sources, etc.
- Perform web searches under different topic or category focuses, as Tavily does. For example, some questions benefit more from "social media focused" research: gathering information from Twitter threads and blog articles. Others benefit more from prioritizing scientific papers, institutional statements, and so on.
- Identify strong claims and perform sub-searches to verify them. This is the basis of AI-powered fact-checkers such as https://fullfact.org/
- Evaluate source credibility
- Further iterate over chunking and vector-search strategies
- Use HyDE
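The research loop idea above could look roughly like the sketch below. It is only an outline under stated assumptions: research and evaluate are hypothetical callables passed in by the caller (they are not functions from this repo), scores are assumed to be a non-empty mapping of criterion name to number, and the default threshold and round limit are arbitrary.

```python
from typing import Callable


def research_loop(
    question: str,
    research: Callable[[str, list[str]], str],
    evaluate: Callable[[str, str], dict[str, float]],
    threshold: float = 7.0,
    max_rounds: int = 3,
) -> tuple[str, dict[str, float]]:
    """Repeat research until every score reaches `threshold` or rounds run out.

    `research(question, weak_criteria)` returns a report; `weak_criteria` lists
    the criteria that scored below the threshold in the previous round.
    `evaluate(question, report)` returns a score per criterion.
    """
    best_report: str = ""
    best_scores: dict[str, float] = {}
    weak_criteria: list[str] = []
    for _ in range(max_rounds):
        report = research(question, weak_criteria)
        scores = evaluate(question, report)
        if not scores:
            raise ValueError("evaluate() returned no scores")
        # Keep the report whose weakest criterion is strongest so far.
        if not best_scores or min(scores.values()) > min(best_scores.values()):
            best_report, best_scores = report, scores
        weak_criteria = [name for name, value in scores.items() if value < threshold]
        if not weak_criteria:
            break
    return best_report, best_scores
```

Passing the weak criteria back into the research step is one possible way to steer the next round toward missing information. Capping the number of rounds bounds the cost when the threshold is never reached.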
For the information evaluator/grader:
- Use self-consistency to generate several scores and choose the most repeated ones.
- Enhance the evaluation and reduce its biases through the implementation of more advanced techniques, like the ones described here https://arxiv.org/pdf/2307.03025.pdf and here https://arxiv.org/pdf/2305.17926.pdf
- Further evaluate biases towards writing-style, length, among others described here: https://arxiv.org/pdf/2308.02575.pdf and mitigate them
- Evaluate using different evaluation criteria
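The self-consistency idea for the grader can be sketched as a per-criterion majority vote over several evaluation runs. The function name, the criterion names in the usage note, and the tie-breaking rule (prefer the lower score) are assumptions made for this example.

```python
from collections import Counter


def most_repeated_scores(runs: list[dict[str, int]]) -> dict[str, int]:
    """For each criterion, pick the score that appears most often across runs.

    Ties are broken conservatively by choosing the lowest tied score.
    """
    criteria = sorted({name for run in runs for name in run})
    result: dict[str, int] = {}
    for name in criteria:
        votes = Counter(run[name] for run in runs if name in run)
        top_count = max(votes.values())
        result[name] = min(score for score, count in votes.items() if count == top_count)
    return result
```

Given three runs that score a criterion 8, 8, and 7, the vote returns 8. Integer scores are assumed here because exact repeats are unlikely with fine-grained decimal scores.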