Gossiphs = Gossip Graphs

"Zero setup" general code file relationship analysis. With Python & Rust. Based on tree-sitter and git analysis.

What is it

Gossiphs analyzes commit history and the relationships between variable declarations and references in your codebase to produce a relationship graph of its code files.

It also lets developers query the symbols declared in each file and search for their references across the entire codebase, which enables more complex analysis.

graph TD
    A[main.py] --- S1[func_main] --- B[module_a.py]
    A --- S2[Handler] --- C[module_b.py]
    B --- S3[func_util] --- D[utils.py]
    C --- S3[func_util] --- D
    A --- S4[func_init] --- E[module_c.py]
    E --- S5[process] --- F[module_d.py]
    E --- S6[Processor] --- H[module_e.py]
    H --- S7[transform] --- I[module_f.py]
    I --- S3[func_util] --- D

Supported Languages

We are expanding language support using Tree-Sitter queries, which keeps the cost of adding a language low. If you're interested, see the Contribution section.

Language Status
Rust
Python
TypeScript
JavaScript
Golang
Java
Kotlin
Swift

You can see the rule files here.

Usage

Python

pip install gossiphs

Analyze your codebase with networkx within 30 lines:

import networkx as nx
from gossiphs import GraphConfig, create_graph, Graph

# Point the config at the repository to analyze.
config = GraphConfig()
config.project_path = "../.."
graph: Graph = create_graph(config)

nx_graph = nx.DiGraph()

for each_file in graph.files():
    # One node per file, with its metadata attached.
    nx_graph.add_node(each_file, metadata=graph.file_metadata(each_file))

    # One edge per related file, labeled with the symbols that link the two files.
    related_files = graph.related_files(each_file)
    for each_related_file in related_files:
        related_symbols = {each_symbol.symbol.name for each_symbol in each_related_file.related_symbols}

        nx_graph.add_edge(
            each_file,
            each_related_file.name,
            related_symbols=list(related_symbols)
        )

print(f"NetworkX graph created with {nx_graph.number_of_nodes()} nodes and {nx_graph.number_of_edges()} edges.")

# Print every edge together with the symbols behind it.
for src, dest, data in nx_graph.edges(data=True):
    print(f"{src} -> {dest}, related symbols: {data['related_symbols']}")

Output:

NetworkX graph created with 13 nodes and 27 edges.
src/server.rs -> src/main.rs, related symbols: ['server_main']
src/main.rs -> src/graph.rs, related symbols: ['default']
src/main.rs -> examples/mini.rs, related symbols: ['default']
src/main.rs -> src/server.rs, related symbols: ['main']
src/symbol.rs -> src/graph.rs, related symbols: ['link_file_to_symbol', 'list_references', 'list_references_by_definition', 'id', 'enhance_symbol_to_symbol', 'add_file', 'add_symbol', 'list_definitions', 'list_symbols', 'new', 'link_symbol_to_symbol', 'get_symbol']
...
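
Once the edges are available, a common follow-up is a rough impact analysis: given a changed file, which other files can be reached through related-file edges? The sketch below is a minimal illustration that uses only the standard library, so it runs without gossiphs or networkx installed. The edge list is hypothetical sample data shaped like the output above, `impacted_files` is a hypothetical helper rather than part of gossiphs, and the sketch assumes that an edge `a -> b` means a change in `a` may affect `b`.

```python
from collections import deque

# Hypothetical sample edges, shaped like the (src, dest) pairs printed above.
EDGES = [
    ("src/server.rs", "src/main.rs"),
    ("src/main.rs", "src/graph.rs"),
    ("src/main.rs", "examples/mini.rs"),
    ("src/main.rs", "src/server.rs"),
    ("src/symbol.rs", "src/graph.rs"),
]


def impacted_files(edges, changed_file):
    """Return every file reachable from changed_file by following edges (BFS)."""
    # Build an adjacency map: source file -> set of directly related files.
    adjacency = {}
    for src, dest in edges:
        adjacency.setdefault(src, set()).add(dest)

    seen = {changed_file}
    queue = deque([changed_file])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)

    seen.discard(changed_file)  # the changed file itself is not reported
    return sorted(seen)


print(impacted_files(EDGES, "src/server.rs"))
```

With the `nx_graph` built earlier, `nx.descendants(nx_graph, changed_file)` should return the same set of files.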

More examples can be found here.

Others

We also provide a CLI and additional usage options, making it easy to directly export CSV files or start an HTTP service.

See usage page.

Goal & Motivation

Create a file relationship index with:

  • low cost
  • acceptable accuracy
  • high versatility for nearly any code repository

Code navigation is a fascinating subject that plays a pivotal role in various domains, such as:

  • Guiding the context during the development process within an IDE.
  • Facilitating more convenient code browsing on websites.
  • Analyzing the impact of code changes in Continuous Integration (CI) systems.
  • ...

In the past, I tried applying LSP/LSIF and techniques like GitHub's Stack-Graphs to impact analysis, and ran into different challenges with each. For our needs, an approach akin to Stack-Graphs comes closest to what we want. Its main drawback is that it requires writing highly language-specific rules, which is a considerable investment given that we do not need such high-precision data.

We make some trade-offs on the challenges Stack-Graphs currently faces in order to reach the following goals:

  • Zero repo-specific configuration: It can be applied to most languages and repositories without additional configuration.
  • Low extension cost: Adding rules for a new language is inexpensive.
  • Acceptable precision: We sacrifice some precision, while aiming to keep it at an acceptable level.

How it works

Gossiphs constructs a graph that interconnects symbols of definitions and references.

  1. Extract imports and exports: Identify the imports and exports of each file.
  2. Connect nodes: Establish connections between potential definition and reference nodes.
  3. Refine edges with commit histories: Utilize commit histories to refine the relationships between nodes.

Unlike stack-graphs, we have omitted the highly complex scope analysis and instead opted to refine our edges using commit histories. This approach significantly reduces the complexity of rule writing, as the rules only need to specify which types of symbols should be exported or imported for each file.
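
The three steps above can be illustrated with a toy sketch. It is not the Gossiphs implementation: the sample data, the function names, and the scoring (one point for a name match plus one point per commit that touched both files) are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Step 1 (assumed input): symbols each file exports (defines) and imports (references).
EXPORTS = {
    "utils.py": {"func_util"},
    "module_a.py": {"func_main"},
    "other.py": {"func_util"},  # a same-named symbol in an unrelated file
}
IMPORTS = {
    "main.py": {"func_main"},
    "module_a.py": {"func_util"},
}

# Hypothetical commit history: the set of files touched by each commit.
COMMITS = [
    {"module_a.py", "utils.py"},
    {"module_a.py", "utils.py", "main.py"},
    {"main.py", "module_a.py"},
]


def connect(exports, imports):
    """Step 2: link each referencing file to every file defining the same name."""
    edges = defaultdict(set)
    for ref_file, names in imports.items():
        for def_file, defined in exports.items():
            shared = names & defined
            if shared and def_file != ref_file:
                edges[(ref_file, def_file)] |= shared
    return edges


def refine(edges, commits):
    """Step 3: weight each candidate edge by how often both files changed together."""
    weights = {}
    for ref_file, def_file in edges:
        co_changes = sum(
            1 for files in commits if ref_file in files and def_file in files
        )
        weights[(ref_file, def_file)] = 1 + co_changes
    return weights


candidates = connect(EXPORTS, IMPORTS)
weights = refine(candidates, COMMITS)
for edge, weight in sorted(weights.items()):
    print(edge, sorted(candidates[edge]), weight)
```

In this sample, the ambiguous `func_util` reference from `module_a.py` favors `utils.py` (weight 3) over `other.py` (weight 1), because only `utils.py` has changed together with `module_a.py`. No scope analysis is involved; the commit history does the disambiguation.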

While there is undoubtedly a trade-off in precision, the benefits are clear:

  1. Minimal impact on accuracy: In practical scenarios, the loss of precision is not as significant as one might expect.
  2. Commit history relevance: The use of commit history to reflect the influence between code segments aligns well with our objectives.
  3. Language support: We can easily support the vast majority of programming languages, meeting the analysis needs of various types of repositories.

Precision

Static analysis has its limits, dynamic binding being one example. Gossiphs is therefore unlikely to match the accuracy of LSP, but it can be accurate enough for the use cases it targets.

We measure accuracy by comparing our results with those of LSP/LSIF. Static inference is very unlikely to recover every reference relationship that LSP finds, but in strict mode the precision of the relationships we do report remains high. In normal mode, you can decide whether to accept a relationship based on the weight returned.

Repo | Precision (Strict Mode) | Graph Generated Time
https://github.com/williamfzc/srctx | 80/80 = 100% | 83.139791ms
https://github.com/gin-gonic/gin | 160/167 = 95.80838% | 310.6805ms
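
For reference, each precision figure above is the ratio of confirmed relationships to reported relationships. The sketch below shows that arithmetic, together with a weight threshold of the kind normal mode leaves to the caller. The edge tuples, the `min_weight` parameter, and both helper names are hypothetical and not part of the Gossiphs API.

```python
def precision(confirmed, reported):
    """Percentage of reported relationships that the reference (LSP/LSIF) confirms."""
    if reported == 0:
        return 0.0
    return confirmed / reported * 100


def filter_by_weight(edges, min_weight):
    """Keep only (src, dest, weight) edges whose weight reaches min_weight."""
    return [edge for edge in edges if edge[2] >= min_weight]


# Counts taken from the table above.
print(f"{precision(80, 80):.2f}%")
print(f"{precision(160, 167):.2f}%")

# Hypothetical weighted edges, standing in for what a normal-mode query might return.
edges = [("a.go", "b.go", 5), ("a.go", "c.go", 1), ("b.go", "c.go", 3)]
print(filter_by_weight(edges, min_weight=3))
```

Raising `min_weight` trades recall for precision: fewer relationships are kept, but the ones that remain are better supported.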

Contribution

The project is still in a very early and experimental stage. If you are interested, please leave your thoughts through an issue. In the short term, we hope to build better support for more languages.

You just need to:

  1. Edit rules in src/rule.rs
  2. Test it in src/extractor.rs
  3. Try it with your repo in src/graph.rs

License

Apache 2.0
