CEDARScript grammar.js for tree-sitter
CEDARScript
A SQL-like language for efficient code analysis, transformations, and tool use. Most useful for AI code assistants.
Table of Contents
- What is CEDARScript?
- How to use it?
- CEDARScript ELI5'ed
- Technical Overview
- Key Features
- Supported Languages
- How can CEDARScript be used
- Proposals
- Related
What is CEDARScript?
It's a domain-specific language designed to improve how AI coding assistants interact with codebases and communicate their code modification intentions.
It provides a standardized way to express complex code modification and analysis operations, making it easier for AI-assisted development tools to understand and execute these tasks.
It also works as a gateway to external tools, so that the LLM can easily call local shell commands, external HTTP API endpoints, etc.
How to use it
- You can easily install a tool that supports CEDARScript.
- Then, just ask the AI assistant to fix a bug or something in your codebase.
The assistant will write CEDARScript commands that will be executed by the CEDARScript runtime editor.
CEDARScript ELI5'ed
The Magical Librarian analogy
Imagine a vast library (your codebase) with millions of books (files) across thousands of shelves (directories).
Traditional code editing is like manually searching through each book, line by line, character by character, to find
relevant information or make changes.
CEDARScript, on the other hand, is like having a magical librarian with superpowers, like:
- TurboKognition Boost (Code Analysis):
  - This librarian can act as an Omniscient Cataloger who can instantly tell you where any piece of information is located across all books.
  - Want to know every place where a specific protagonist (function) is mentioned, or where he or she was born? Or find all the chapters (classes) that discuss a particular topic (variable usage)? The librarian provides this information immediately, without having to flip through pages (waste precious tokens).
- The GanzPunktGenau Editing Powers (Code Manipulation):
  - When you want to make changes, instead of specifying exact page and line numbers, you can give high-level instructions. For example, "Add this new paragraph after the first mention of 'dragons' in the fantasy section" or "Move the chapter about 'time travel' to come before 'parallel universes' in all science fiction books." The librarian understands these abstract instructions and makes the precise edits across all relevant books, handling details like page layout and consistent formatting.
This magical librarian (CEDARScript) collaborates with the LLM and allows it to assume the role of an Architect who can work with your vast library of code at a higher level, making both understanding and modifying your codebase faster and more intuitive. It bridges the gap between the LLM's high-level intent and the nitty-gritty details of code structure, allowing the architect to focus on the 'what' while it handles the 'how' of code analysis and modification.
Audio overview / Podcasts There are a few podcasts discussing CEDARScript you can listen to:
Technical Overview
CEDARScript (Concise Examination, Development, And Refactoring Script) is a SQL-like language designed to
lower costs and improve the efficiency and accuracy of AI code assistants. It enables offloading low-level code syntax and
structure concerns, such as indentation and line counting, from the LLMs.
It aims to improve how AI coding assistants interact with codebases and communicate their code modification intentions
by providing a standardized and concise way to express complex code analysis and modification operations, making it easier for
AI-assisted development tools to understand and execute these tasks.
CEDARScript transforms LLMs from code writers into code architects.
The Architect doesn't need to specify every tiny detail - instead of spending expensive tokens writing out complete code changes, it simply provides high-level blueprints using CEDARScript commands like `UPDATE FILE "main.py" MOVE FUNCTION "execute" INSERT AFTER FUNCTION "plan"`.
This division of labor between the architect and CEDARScript is not just efficient - it's economical. The Architect (LLM) conserves valuable resources (tokens) by focusing on strategic decisions rather than character- or line-level editing tasks.
The CEDARScript runtime then handles all the minute details - precise line numbers, indentation counts, and syntax consistency - at zero token cost.
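To illustrate how a runtime might turn an identifier marker such as `FUNCTION "execute"` into the exact line range that the LLM no longer has to count, here is a minimal Python sketch. It uses the standard-library `ast` module for brevity; that choice is an assumption made for illustration and is not the actual CEDARScript runtime (this package ships a tree-sitter grammar). The helper name `find_function_span` and the sample source are hypothetical.

```python
import ast

# Hypothetical sample file contents, used only for this illustration.
SAMPLE = '''\
def plan():
    return "plan"

def execute():
    steps = plan()
    return steps
'''


def find_function_span(source: str, name: str) -> tuple[int, int]:
    """Return the 1-based (first_line, last_line) of the function called `name`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name == name:
            # lineno is the `def` line; end_lineno is the last line of the body.
            return node.lineno, node.end_lineno
    raise LookupError(f"function {name!r} not found")


print(find_function_span(SAMPLE, "execute"))  # prints (4, 6)
```

The caller only supplies a name; the line arithmetic happens locally, which is the division of labor described above.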
Let's get to know the 3 primary functions offered by CEDARScript:
- Code Analysis to quickly get to know a large code base without having to read all contents of all files.
  - The CEDARScript runtime searches through the whole code base and only returns the relevant results, thus reducing the token traffic between the LLM and the user;
  - This can be used to more quickly understand key aspects of the codebase, search for all or specific identifiers (classes, methods, functions or variables) defined across ALL files of the project or in specific ones, etc.
  - Search results can include not only identifier definitions (in whole or only the signature or summary), but also call-sites and usages of an identifier;
  - These results can be useful not only when the LLM needs to read them, but also when the LLM wants to show some parts of the code to the user (why send a function to the user if the LLM can simply `SELECT` it and have the CEDARScript runtime show the contents?)
- Code Manipulation and Refactoring:
  - The CEDARScript runtime bears the brunt of file editing by locating the exact line numbers and characters to change, which indentation levels to apply to each line and so on, allowing the CEDARScript commands to focus instead on higher levels of abstraction, like identifier names, line markers, relative indentations and positions (`AFTER`, `BEFORE`, `INTO` a function, its `BODY`, at the `TOP` or `BOTTOM` of it...)
- Tool Use: The runtime acts as a gateway through which the LLM can send and receive information. This opens up many possibilities.
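As a rough illustration of the Code Analysis function, the sketch below returns only the signature lines of the classes and functions defined in a source string, rather than the full file contents. It is a simplified stand-in written with Python's `ast` module, not the CEDARScript runtime; the function name `list_signatures` and the sample class are hypothetical, and multi-line signatures are reduced to their first line.

```python
import ast

# Hypothetical sample source used only for this illustration.
SAMPLE = '''\
class BaseConverter:
    def convert(self, number):
        return str(number)

    def encode(self, i):
        return self.convert(i)
'''


def list_signatures(source: str) -> list[str]:
    """Return the first line of every class/function definition, in file order."""
    lines = source.splitlines()
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            found.append((node.lineno, lines[node.lineno - 1].strip()))
    # ast.walk is breadth-first, so sort by line number to restore file order.
    return [text for _, text in sorted(found)]


for signature in list_signatures(SAMPLE):
    print(signature)
```

Sending three short signature lines back to the LLM instead of the whole class is the kind of token saving the list above describes.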
Key Features:
- Learning Curve
  - For humans: its SQL-like syntax allows for intuitive code querying and manipulation (however, humans don't even need to learn it, as its primary purpose is to offer LLMs an easy language with which they can write simple, concise commands to modify code or analyse it);
  - For AIs: some prompt engineering is enough to enable most LLMs (even cheaper ones like Gemini Flash) to learn it well. Other forms of fine-tuning are planned, so that even SLMs (Small Language Models) like Microsoft's Phi 3 could be able to learn CEDARScript. This has the potential to unlock locally-deployed SLMs to be used as AI code assistants.
- Shows improved results in refactoring benchmarks when compared to standard diff formats
  - Gemini 1.5 Flash outperformed Claude 3.5 Sonnet
    - Pass rate: 76.4% (beats Sonnet 3.5 at 64.0%)
    - Well-formed cases: 94.4% (beats Sonnet 3.5 at 76.4%)
- Reduced token usage via semantic-level code transformations, not character-by-character matching;
- Scalable to larger codebases with minimal token usage;
- Project-wide refactorings can be performed with a single, concise command
- Avoids wasted time and tokens on failed search/replace operations caused by misplaced spaces, indentations or typos;
- High-level abstractions for complex refactoring operations via refactoring languages (currently supports Rope syntax);
- Relative indentation for easily maintaining proper code structure;
- Allows fetching or modifying targeted parts of code;
- Locations in code: Doesn't use line numbers. Instead, offers more resilient alternatives, like:
  - Line markers. Ex: `LINE "if name == 'some name':"`
  - Identifier markers (`VARIABLE`, `FUNCTION`, `CLASS`). Ex: `FUNCTION 'my_function'`
- Language-agnostic design for versatile code analysis
- Code analysis operations return results in XML format for easier parsing and processing by LLM (Large Language Model) systems.
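The relative indentation feature listed above can be pictured with a small Python sketch: a block is written without caring about its absolute indentation, and the final indentation is derived from a reference line plus a relative level. The exact semantics used by the CEDARScript runtime are not spelled out here, so the interpretation below (reference line's indentation plus `relative_level` indent units, four spaces per unit) is an assumption, and `apply_relative_indentation` is a hypothetical helper.

```python
import textwrap


def apply_relative_indentation(
    block: str,
    reference_line: str,
    relative_level: int,
    indent_unit: str = "    ",  # assumed: four spaces per level
) -> str:
    """Re-indent `block` relative to the indentation of `reference_line`."""
    if relative_level < 0:
        raise ValueError("this sketch only supports non-negative levels")
    # Leading whitespace of the reference line is the base indentation.
    base = reference_line[: len(reference_line) - len(reference_line.lstrip())]
    prefix = base + indent_unit * relative_level
    # Normalize the block to column 0 first, then apply the computed prefix.
    return textwrap.indent(textwrap.dedent(block), prefix)


new_body = apply_relative_indentation(
    "log('encoding')\nreturn convert(i)\n",
    reference_line="    def encode(self, i):",
    relative_level=1,
)
print(new_body, end="")
```

Here the two new lines end up eight spaces deep, one level inside the method, without the author of the block ever counting spaces.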
Supported Languages
Currently, CEDARScript theoretically supports Python, Kotlin, PHP, Rust, Go, C++, C, Java, JavaScript, Lua, Fortran, Scala and C#, but only Python has been tested so far.
COBOL and MATLAB: Initial queries for these languages are ready, but the Tree-Sitter parsers for them still need to be included.
Projects using the CEDARScript Language
- CEDARScript Integration: Aider - Provides CEDARScript edit format for Aider
- CEDARScript AST Parser (Python)
- CEDARScript Editor
- CEDARScript Prompt Engineering
  - Provides prompts that teach CEDARScript to LLMs
  - Also includes real conversations held via Aider in which an LLM uses this language to propose code modifications
How can CEDARScript be used?
Improving LLM <-> codebase interactions
CEDARScript can be used as a way to standardize and improve how AI coding assistants interact with codebases, learn about your code, and communicate their code modification intentions while keeping token usage low.
This efficiency allows for more complex operations within token limits.
It provides a concise way to express complex code modification and analysis operations, making it easier for AI-assisted development tools to understand and perform these tasks.
Codebase Interaction Examples
Quick example: turn a method into a top-level function, using a `CASE` filter with `REGEX`:
UPDATE FILE "baseconverter.py"
MOVE FUNCTION "convert"
INSERT BEFORE CLASS "BaseConverter"
RELATIVE INDENTATION 0;
-- Update the call sites in encode() and decode() methods to use the top-level convert() function
UPDATE CLASS "BaseConverter"
FROM FILE "baseconverter.py"
REPLACE BODY
WITH CASE -- Filter each line in the function body through this CASE filter
WHEN REGEX r"self\.convert\((.*?)\)"
THEN REPLACE r"convert(\1)"
END;
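For readers more familiar with Python than with the `CASE`/`REGEX` filter, the call-site rewrite above corresponds roughly to a per-line `re.sub` with the same pattern and replacement. This is only an equivalent sketch in plain Python, not how the CEDARScript runtime is implemented; `rewrite_call_sites` and the sample body are hypothetical.

```python
import re

# Same pattern and replacement as in the CASE filter above.
CALL_PATTERN = re.compile(r"self\.convert\((.*?)\)")


def rewrite_call_sites(body: str) -> str:
    """Apply the regex to each line, leaving non-matching lines untouched."""
    return "\n".join(
        CALL_PATTERN.sub(r"convert(\1)", line) for line in body.splitlines()
    )


# Hypothetical class body used only for this illustration.
SAMPLE_BODY = (
    "def encode(self, i):\n"
    "    return self.convert(i)\n"
    "def decode(self, s):\n"
    "    return int(self.convert(s))"
)
print(rewrite_call_sites(SAMPLE_BODY))
```

Lines that do not match the pattern pass through unchanged, which mirrors the filter-each-line behavior described in the comment of the CEDARScript command.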
Use an ED script to change a function:
UPDATE FILE "app/main.py" REPLACE FUNCTION "calculate_total" WITH ED '''
-- Add type hints to parameters
1s/calculate_total(base_amount, tax_rate, discount, apply_shipping)/calculate_total(base_amount: float, tax_rate: float, discount: float, apply_shipping: bool) -> float/
-- Add docstring after function definition
1a
"""
Calculate the total amount including tax, shipping, and discount.
Args:
base_amount: Base price of the item
tax_rate: Tax rate as decimal (e.g., 0.1 for 10%)
discount: Discount as decimal (e.g., 0.2 for 20%)
apply_shipping: Whether to add shipping cost
Returns:
float: Final calculated amount rounded to 2 decimal places
"""
.
-- Add logging before return
/return/i
logger.info(f"Calculated total amount: {subtotal:.2f}")
.
''';
There are many more examples to look at...
Use as a refactoring language / diff format
One can use CEDARScript to concisely and unambiguously represent code modifications at a higher level than a standard diff format can.
IDEs can store the local history of files in CEDARScript format, and this can also be used for searches.
Tool Use
If explicit configuration is set, the CEDARScript runtime can act as a gateway through which an LLM can:
- Call local commands (`ls`, `grep`, `find`, `open`)
- Run scripts
- Call external HTTP API services
- See the user's screen and take control of the mouse and keyboard
- Possibilities are numerous...
The output from the external tool is captured and sent back to the LLM.
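A gateway of this kind can be sketched in a few lines of Python: check the requested command against an explicit configuration, run it, and capture its output so it can be handed back to the LLM. Everything here is an assumption made for illustration (the `ALLOWED_COMMANDS` allowlist, the `run_tool` name, the timeout); it is not the CEDARScript runtime's actual configuration mechanism.

```python
import shlex
import subprocess

# Hypothetical explicit configuration: only these commands may be called.
ALLOWED_COMMANDS = {"ls", "grep", "find", "echo"}


def run_tool(command_line: str, timeout: float = 10.0) -> str:
    """Run an allowlisted command and return its captured output as text."""
    argv = shlex.split(command_line)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not allowed: {command_line!r}")
    # No shell is involved: argv is passed directly to the program.
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    # Both streams are returned so the LLM also sees error messages.
    return result.stdout + result.stderr


print(run_tool("echo hello from the gateway"), end="")
```

Rejecting anything outside the allowlist before execution keeps the default closed, which matches the "if explicit configuration is set" condition above.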
Tool Use Examples
Run Python scripts to find the correct answer for certain types of problems
-- Suppose the LLM has difficulty counting letters...
-- It can delegate the counting to a Python script:
CALL LANGUAGE "python" WITH CONTENT '''
print("Refrigerator".lower().count('r'))
''';
-- Using env var
CALL LANGUAGE "python"
ENV CONTENT '''WORD=Refrigerator'''
WITH CONTENT '''
import os
print(os.environ['WORD'].lower().count('r'))
''';
-- Using env var from the host computer
CALL LANGUAGE "python"
ENV INHERIT ONLY 'WORD'
WITH CONTENT '''
import os
print(os.environ['WORD'].lower().count('r'))
''';
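One way to picture what `ENV INHERIT ONLY 'WORD'` could mean on the host side is a child process whose environment contains only the named variables. The sketch below is a guess at that behavior, not the runtime's implementation; `call_python` is a hypothetical helper, and the small `_SYSTEM_VARS` passthrough is an extra assumption added so the interpreter can start on most platforms.

```python
import os
import subprocess
import sys

# Assumption: a few system variables are always passed so Python can start.
_SYSTEM_VARS = ("PATH", "SYSTEMROOT", "LD_LIBRARY_PATH")


def call_python(content: str, inherit_only=(), extra_env=None) -> str:
    """Run `content` in a child interpreter with a restricted environment."""
    wanted = tuple(inherit_only) + _SYSTEM_VARS
    env = {name: os.environ[name] for name in wanted if name in os.environ}
    env.update(extra_env or {})  # variables supplied inline by the command
    result = subprocess.run(
        [sys.executable, "-c", content],
        env=env, capture_output=True, text=True, timeout=30,
    )
    return result.stdout


os.environ["WORD"] = "Refrigerator"      # stands in for a host variable
os.environ["UNRELATED_SECRET"] = "hide"  # must not reach the child process
script = (
    "import os\n"
    "print(os.environ['WORD'].lower().count('r'))\n"
    "print('UNRELATED_SECRET' in os.environ)\n"
)
print(call_python(script, inherit_only=["WORD"]), end="")
```

The child sees `WORD` but not `UNRELATED_SECRET`, so a script written by the LLM only has access to what the command explicitly asked to inherit.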
Obtain the current local weather
CALL COMMAND
ENV INHERIT ONLY 'LOCATION' -- Get the current location from the host env var
WITH CONTENT r'''
#!/bin/bash
curl -s "wttr.in/$LOCATION?format=%l:+%C+%t,+feels+like+%f,+%h+humidity"
''';
Get a list of image files in the current working dir
CALL LANGUAGE "bash"
WITH CONTENT r'''
find . -type f -name "*.jpg"
''';
Take a peek at the user's screen and right-click on the user's clock widget
CALL LANGUAGE "python"
WITH CONTENT r'''
import pyautogui
from datetime import datetime
# Take screenshot and save it
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
screenshot_path = f"screen_{timestamp}.png"
pyautogui.screenshot(screenshot_path)
# Print the path so the LLM can analyze the image
print(f"IMAGE_PATH={screenshot_path}")
''';
After the LLM takes a look at the screenshot, it finds the clock and sends a mouse click:
CALL LANGUAGE "python"
ENV r'''
X=1850 # Coordinates provided by LLM after image analysis
Y=12 # Coordinates provided by LLM after image analysis
'''
WITH CONTENT r'''
import pyautogui
import os
# Get coordinates from environment
x = int(os.environ['X'])
y = int(os.environ['Y'])
# Move and click
pyautogui.moveTo(x, y, duration=1.0)
pyautogui.click()
print(f"Clicked at ({x}, {y})")
''';
Other Ideas to Explore
- Code review systems for automated, in-depth code assessments
- Automated code documentation and explanation tools
- ...
Proposals
Related
- .QL - Object-oriented query language that enables querying Java source code using SQL-like syntax;
- JQL (Java Query Language) - Allows querying Java source code with SQL. It's designed for Java code analysis and linting;
- Joern - While primarily focused on C/C++, Joern is an open-source code analysis platform that uses a custom graph database to store code property graphs. It allows querying code using a Scala-based domain-specific language;
- Codebase Context Suite - A comprehensive tool for managing codebase context, generating prompts, and enhancing development workflows;
- CONVENTIONS.md
See Also
Unrelated
- Cedar Policy Language ('CEDARScript' is not a policy language. 'Cedar' and 'CEDARScript' are totally unrelated.)