# MASA SDK - Masa's AI Software Architecture
MASA is a project for data retrieval, quality control, and orchestration. It currently provides tools to retrieve data from Twitter using the Masa Protocol Node API, with plans to expand to other data sources and functionalities in the future.
**Note:** This SDK requires a Masa Protocol Node to be running on the system. Instructions on how to install and run a node can be found here.
## Quick Start
- Install the MASA package:

  ```bash
  pip install masa-ai
  ```

  If you encounter issues running or installing `masa-ai`, please refer to the [System Requirements](#system-requirements) section to ensure you have the necessary system dependencies installed.
- Create a `request_list.json` file with the queries you'd like to process. This file can be placed anywhere on your system. Here is an example of what the `request_list.json` might look like:

  ```json
  [
      {
          "scraper": "XTwitterScraper",
          "endpoint": "data/twitter/tweets/recent",
          "priority": 1,
          "params": {
              "query": "#example",
              "count": 100
          }
      },
      {
          "scraper": "XTwitterScraper",
          "endpoint": "data/twitter/tweets/recent",
          "priority": 2,
          "params": {
              "query": "from:example_user",
              "count": 50
          }
      }
  ]
  ```

  An example `request_list.json` file is included in the package. You can find it in the examples folder at the following path:

  ```bash
  EXAMPLE_PATH=$(pip show masa-ai | grep Location | awk '{print $2"/masa_ai/examples/request_list.json"}')
  echo "Example request_list.json path: $EXAMPLE_PATH"
  ```
- Use the MASA CLI:

  ```bash
  masa-ai-cli <command> [options]
  ```

  Available commands:

  - `process [path_to_requests_json]`: Process all requests (both resumed and new).
  - `docs [page_name]`: Rebuild and view the documentation for the specified page (`page_name` is optional).
  - `data`: List the scraped data files.
  - `list-requests [--statuses STATUS_LIST]`: List requests filtered by statuses.
  - `clear-requests [REQUEST_IDS]`: Clear queued or in-progress requests by IDs.

  Examples:

  ```bash
  # Process requests from a JSON file
  masa-ai-cli process /path/to/request_list.json

  # View the usage documentation
  masa-ai-cli docs usage

  # List scraped data files
  masa-ai-cli data

  # List queued and in-progress requests
  masa-ai-cli list-requests

  # List requests with specific statuses
  masa-ai-cli list-requests --statuses completed,failed

  # Clear all queued and in-progress requests
  masa-ai-cli clear-requests

  # Clear specific requests by IDs
  masa-ai-cli clear-requests req1,req2,req3
  ```
- Accessing Scraped Data:

  By default, the data that is scraped is saved to the current working directory under the `data` folder. You can designate a different directory by setting the `DATA_DIRECTORY` in the configuration. To list all scraped data files, use the following command:

  ```bash
  masa-ai-cli data
  ```

  This will display the structure of the `data` folder and list all the files contained within it.
- Recommendations for Accessing and Using Scraped Data:

  - Command Line: You can navigate to the `data` folder using the command line to view and manipulate the files directly.

    IMPORTANT: The `data` folder is created when you run the `masa-ai-cli process [path_to_requests_json]` command.

    ```bash
    # Navigate to the data directory
    cd /path/to/your/data_directory
    ```

    If you have set a custom `DATA_DIRECTORY` in your configuration, replace `/path/to/your/data_directory` with the path you have designated. You can use this path to access data for further processing, analysis, and utilization with agents.
For detailed usage instructions, please refer to the Usage Guide.
## Managing Requests
The MASA CLI provides commands to manage your data retrieval requests.
### Listing Requests
You can list the current requests that are queued or in progress:

```bash
masa-ai-cli list-requests
```

By default, this command lists requests with statuses `queued` and `in_progress`. You can specify other statuses using the `--statuses` option:

```bash
masa-ai-cli list-requests --statuses completed,failed
```

To list all requests regardless of their status:

```bash
masa-ai-cli list-requests --statuses all
```
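The filtering semantics above can be sketched in a few lines of Python. This is a hypothetical helper for illustration only: the request-dict shape and the `filter_requests` function are not the SDK's internal format or API, just a model of how the comma-separated `--statuses` value (with `all` disabling the filter) behaves.

```python
def filter_requests(requests, statuses="queued,in_progress"):
    """Return requests whose status appears in the comma-separated
    `statuses` string; "all" disables filtering, mirroring the CLI's
    default and its `--statuses all` behavior."""
    if statuses == "all":
        return list(requests)
    wanted = set(statuses.split(","))
    return [r for r in requests if r.get("status") in wanted]
```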
### Clearing Requests
To clear all requests that are queued or in progress:

```bash
masa-ai-cli clear-requests
```

To clear specific requests by their IDs:

```bash
masa-ai-cli clear-requests req1,req2,req3
```

Requests that are cleared will have their status changed to `cancelled` and will not be processed.
## Configuration
The project uses YAML files for configuration:

- `configs/settings.yaml`: Main configuration file containing settings for the Twitter API, request management, and logging.
- `configs/.secrets.yaml`: (Optional) File for storing sensitive information like API keys.

The `settings.yaml` file is loaded using Dynaconf, which allows for easy environment-based configuration management.
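To make the shape of such a file concrete, a hypothetical `settings.yaml` fragment might look like the following. The key names here are purely illustrative assumptions (only `DATA_DIRECTORY` is mentioned elsewhere in this document); consult the packaged `configs/settings.yaml` for the real schema.

```yaml
# Illustrative only -- key names are assumptions, not the SDK's schema.
default:
  twitter:
    BASE_URL: "http://localhost:8080/api/v1/"   # local Masa Protocol Node
  request_manager:
    DATA_DIRECTORY: "data"                      # where scraped data is written
  logging:
    LOG_LEVEL: "INFO"
```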
## Advanced Twitter Search
The Masa Protocol Node API provides advanced search capabilities for retrieving Twitter data. Some of the available search options include:
- Hashtag Search: `#hashtag`
- Mention Search: `@username`
- From User Search: `from:username`
- Keyword Exclusion: `-keyword`
- OR Operator: `term1 OR term2`
- Geo-location Based Search: `geocode:latitude,longitude,radius`
- Language-Specific Search: `lang:language_code`
For more details, refer to the Masa Protocol Twitter Docs.
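These operators compose by simple concatenation into the `query` string of a request. The sketch below is a hypothetical convenience helper, not part of the SDK; it assembles a query from the operators listed above so the result can be dropped into a `request_list.json` entry.

```python
def build_query(*terms, exclude=(), lang=None):
    """Assemble a Twitter search string from the operators above.

    `terms` may already use operators like "#tag", "@user", or
    "from:user"; `exclude` words get a "-" prefix; `lang` appends
    a "lang:" clause. Plain string assembly, for illustration.
    """
    parts = list(terms)
    parts += [f"-{word}" for word in exclude]
    if lang:
        parts.append(f"lang:{lang}")
    return " ".join(parts)
```

For example, `build_query("#example", "from:example_user", exclude=["spam"], lang="en")` yields a single query string combining all four operators.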
## Project Structure
- `masa_ai/`: Main package directory
  - `configs/`: Configuration files
  - `connections/`: API connection handlers
  - `tools/`: Core functionality modules
    - `qc/`: Quality control tools
    - `retrieve/`: Data retrieval tools
    - `utils/`: Utility functions
  - `orchestration/`: Request management and processing
  - `logs/`: Log files
  - `data/`: Scraped data
  - `examples/`: Example files
## System Requirements {#system-requirements}
If you run into issues running or installing masa-ai, ensure you have the necessary system dependencies installed.
### On Debian-based systems (e.g., Ubuntu)

Install `build-essential`:

```bash
sudo apt-get update
sudo apt-get install -y build-essential
```
### On Red Hat-based systems (e.g., CentOS)

Install Development Tools:

```bash
sudo yum groupinstall 'Development Tools'
```
### On macOS

Install Xcode Command Line Tools:

```bash
xcode-select --install
```
### On Windows

- Download and install the Microsoft Visual C++ Build Tools.
- Ensure that the installation includes the "Desktop development with C++" workload.
- Install `make` using Chocolatey:

  ```powershell
  choco install make
  ```
## Dependencies

Key dependencies include:

- Data Processing: `numpy`, `pandas`
- API Interaction: `requests`
- Configuration: `dynaconf`, `pyyaml`, `python-dotenv`
- Quality Control: `colorlog`
- Progress Display: `tqdm`
- Documentation: `sphinx`, `sphinx_rtd_theme`, `recommonmark`, `myst-parser`
- Jupyter Notebooks: `jupyter`, `notebook`, `ipykernel`
- Database Interaction: `psycopg2-binary`
- Data Parsing: `feedparser`
For a full list of dependencies, refer to `pyproject.toml`.
## Documentation
The MASA project uses Sphinx to generate its documentation. The documentation is automatically rebuilt and opened when you run the `docs` command of `masa-ai-cli`.
To view the documentation:

```bash
masa-ai-cli docs [page_name]
```
This command rebuilds and opens the documentation for the specified page. The `[page_name]` argument is optional; if no page name is provided, the documentation for the entire project is displayed.
## License
This project is licensed under the MIT License. See the LICENSE file for details.
## Download files
### masa_ai-0.2.7.tar.gz

File metadata:
- Download URL: masa_ai-0.2.7.tar.gz
- Upload date:
- Size: 51.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.4 CPython/3.13.0 Darwin/24.0.0
File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `6a2fe81b0dd75b631b9825b6d3107666b4c2be14fbd815e28b53aea65a4b124e` |
| MD5 | `24222bc66448d504d67c20cb5192b05e` |
| BLAKE2b-256 | `770e6e07daefc16de93b8f779fd0f349ce9c48c240ecc551c0e1c5a6d7071ca5` |
### masa_ai-0.2.7-py3-none-any.whl

File metadata:
- Download URL: masa_ai-0.2.7-py3-none-any.whl
- Upload date:
- Size: 74.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.4 CPython/3.13.0 Darwin/24.0.0
File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `7000ce89756d52de4814559024af3616c6e23c652113fa9aefa9a3b481a46562` |
| MD5 | `67bbf07db4c5e8dbc645de2d2d1406c9` |
| BLAKE2b-256 | `1c0de3a8f5514e499a6fc0f35fb3131010c2df73c98c19007eb01c430c7915c8` |