Skip to main content

Core library for directory mapping and summarization

Project description

dirmapper-core

A directory mapping library that aids in visualization and directory structuring.

Features

  • Generate directory structures in various styles (tree, list, flat, etc.)
  • Apply custom formatting to directory structures (plain text, JSON, HTML, Markdown, etc.)
  • Ignore specific files and directories using patterns
  • Summarize directory contents using AI (local or API-based)

Installation

To install the library, use pip:

pip install dirmapper-core

Usage

Generating Directory Structure

You can generate a directory structure using the DirectoryStructureGenerator class. Here is an example:

# Example usage

from dirmapper_core.generator.directory_structure_generator import DirectoryStructureGenerator
from dirmapper_core.ignore.path_ignorer import PathIgnorer
from dirmapper_core.styles.tree_style import TreeStyle
from dirmapper_core.formatter.formatter import PlainTextFormatter

# Define ignore patterns
ignore_patterns = [
    SimpleIgnorePattern('.git/'),
    SimpleIgnorePattern('.github/'),
    SimpleIgnorePattern('__pycache__/')
]

# Initialize PathIgnorer
path_ignorer = PathIgnorer(ignore_patterns)

# Initialize DirectoryStructureGenerator
generator = DirectoryStructureGenerator(
    root_dir='./path/to/your/directory',
    output='output.txt',
    ignorer=path_ignorer,
    sorting_strategy=AscendingSortStrategy(case_sensitive=False),
    style=TreeStyle(),
    formatter=PlainTextFormatter()
)

# Generate the directory structure into the style specified when initializing DirectoryStructureGenerator
structure = generator.generate() # Returns str

Generating a directory structure results in a formatted string depending on your style and formatter. Here is an example of the TreeStyle:

/path/to/root/directory
├── .devcontainer
│   ├── devcontainer.json
│   └── Dockerfile
├── .DS_Store
├── .git
├── .gitignore
├── .mapping-ignore
├── game
│   ├── __init__.py
│   ├── colors.py
│   ├── config.py
│   ├── display.py
│   ├── fonts.py
│   ├── game_loop.py
│   ├── obstacles.py
│   ├── score.py
│   └── snake.py
├── high_scores.yaml
├── main.py
├── main.spec
├── README.md
└── requirements.txt

See the styles folder for all valid style examples.

You may also generate a raw DirectoryStructure object if you set the parameter styled to False in the generate() method. This allows access to the DirectoryStructure class helper functions and access to DirectoryItems

# ...
structure = generator.generate(styled=False) # Returns DirectoryStructure object

# Access file content lazily
for item in structure.to_list():
    if item.metadata.get('content') is not None:
        print(f"Content of {item.path}: {item.content}")

Creating Directory Structure from Template

You can create a directory structure from a template using the StructureWriter class. Here is an example:

from dirmapper_core.writer.structure_writer import StructureWriter

# Define the base path where the structure will be created
base_path = 'Path/To/Your/Project'

# Define the structure template
structure_template = {
    "meta": {
        "version": "2.0",
        "source": "dirmapper",
        "author": "root",
        "root_path": base_path,
        "creation_date": "2024-11-01T20:06:14.510200",
        "last_modified": "2024-11-01T20:06:14.510211"
    },
    "template": {
        "requirements.txt": {},
        "tests/": {
            "test1.py": {},
            "__init__.py": {}
        },
        "docs/": {},
        "README.md": {},
        "setup.py": {},
        ".gitignore": {},
        "src/": {
            "snake_game/": {
                "__init__.py": {},
                "utils/": {
                    "helper.py": {}
                },
                "main.py": {}
            }
        }
    }
}

# Initialize StructureWriter
writer = StructureWriter(base_path)

# Create the directory structure
writer.create_structure(structure_template)

# Write the structure to OS file system
writer.write_structure()

Writing Directory Structure to Template File

You can write the generated directory structure to a template file using the write_template function. Here is an example:

from dirmapper_core.writer.template_writer import write_template

# Define the path to the template file
template_path = 'path/to/your/template.json'

# Define the structure template
structure_template = {
    "meta": {
        "version": "2.0",
        "source": "dirmapper",
        "author": "root",
        "root_path": template_path,
        "creation_date": "2024-11-01T20:06:14.510200",
        "last_modified": "2024-11-01T20:06:14.510211"
    },
    "template": {
        "requirements.txt": {},
        "tests/": {
            "test1.py": {},
            "__init__.py": {}
        },
        "docs/": {},
        "README.md": {},
        "setup.py": {},
        ".gitignore": {},
        "src/": {
            "snake_game/": {
                "__init__.py": {},
                "utils/": {
                    "helper.py": {}
                },
                "main.py": {}
            }
        }
    }
}

write_template(template_path, structure_template)

You may also use a valid styled directory string (i.e. tree) as your structure_template in the example above to write a YAML or JSON template file.

Writing a Directory Structure Formatted String to Template

You can create a JSON template from a formatted directory structure string. Here is a valid example using the tree style:

import json
from dirmapper_core.writer.template_parser import TemplateParser

tp = TemplateParser()
template = """
/path/to/root/directory
├── .git/
├── .github/
│   └── workflows/
│       ├── check_version.yaml
│       ├── publish.yaml
│       └── update-homebrew-formula.yml
├── .gitignore
├── .pytest_cache/
├── CHANGELOG.md
├── docs/
├── LICENSE
├── Makefile
├── pyproject.toml
├── README.md
├── requirements.txt
├── src/
│   └── dirmapper/
│       ├── __init__.py
│       ├── ai/
│       │   └── summarizer.py
│       ├── api_client.py
│       ├── config.py
│       ├── config_manager.py
│       ├── data/
│       │   └── .mapping-ignore
│       ├── main.py
│       ├── reader/
│       │   ├── __init__.py
│       │   └── reader.py
│       ├── token_manager.py
│       └── writer/
│           ├── __init__.py
│           └── writer.py
└── tests/
    ├── __init__.py
    └── test_main.py
"""
parsed_template = tp.parse_from_directory_structure(template)
print(json.dumps(parsed_template, indent=4))

Other allowable styles include list, flat, and indentation.

Summarizing Directory Structure

You can summarize the directory structure using the DirectorySummarizer class. Here is an example:

from dirmapper_core.generator.directory_structure_generator import DirectoryStructureGenerator
from dirmapper_core.ai.summarizer import DirectorySummarizer
from dirmapper_core.models.directory_structure import DirectoryStructure
from dirmapper_core.models.directory_item import DirectoryItem
from dirmapper_core.styles.tree_style import TreeStyle

# Initialize DirectorySummarizer with configuration
config = {
    "use_local": False,
    "api_token": "your_openai_api_token",
    "summarize_file_content": True,  # Enable file content summarization
    "max_file_summary_words": 50     # Limit file summaries to 50 words
}

summarizer = DirectorySummarizer(config)

# Create a DirectoryStructure instance
root_dir = "path/to/root/project"
dsg = DirectoryStructureGenerator("path/to/root/project")
structure = dsg.generate(styled = False)

# Generate summaries
result = summarizer.summarize(structure)
print(json.dumps(result, indent=2))

# Nicely format your short summaries to the terminal
print(TreeStyle.write_structure_with_short_summaries(structure))

In the above example, you see that the result from the DirectorySummarizer is a dict for easy JSON convertability. If you want to view the directory in a tree style, you can use the write_structure_with_short_summaries method which takes a DirectoryStructure object.

For more details on the DirectorySummarizer class, see the summarizer.py file. You can also read the AI Summarizer documentation for more information.

Summarizing Files

You can summarize individual files using the FileSummarizer class. Here is an example.

from dirmapper_core.ai.summarizer import FileSummarizer

preferences = {
    "use_local": False,
    "api_token": "your_openai_api_token_here"
}

file_summarizer = FileSummarizer(preferences)
file_path = "/path/to/your/file.py"
summary = file_summarizer.summarize_file(file_path, max_words=150)
print("File Summary:")
print(summary)

Paginating Directory Structure

You can paginate a large directory structure into smaller chunks using the DirectoryPaginator class. Here is an example:

from dirmapper_core.models.directory_structure import DirectoryStructure
from dirmapper_core.utils.paginator import DirectoryPaginator

# Create a DirectoryStructure instance
structure = DirectoryStructure()

# Add items to the structure
# ... (add items to the structure)

# Initialize DirectoryPaginator
paginator = DirectoryPaginator(max_items_per_page=20)

# Paginate the directory structure
paginated_structures = paginator.paginate(structure)

# Process each paginated structure
for paginated_structure in paginated_structures:
    print(paginated_structure)

Configuration

Ignoring Files and Directories

You can specify files and directories to ignore using the .mapping-ignore file or by providing patterns directly to the PathIgnorer class.

Example .mapping-ignore file:

# Ignore .git directory
.git/
# Ignore all __pycache__ directories
regex:^.*__pycache__$
# Ignore all .pyc files
regex:^.*\.pyc$
# Ignore all .egg-info directories
regex:^.*\.egg-info$
# Ignore all dist directories
regex:^.*dist$

Custom Styles and Formatters

You can create custom styles and formatters by extending the BaseStyle and Formatter classes, respectively.

Appendix

Working with DirectoryStructure Class

All styles and many functions work with the DirectoryStructure class under the hood. This class is essentially an abstracted list of DirectoryItem objects. Below is a sample usage of initializing, adding items, and converting to a specialized dictionary object.

# Create a new DirectoryStructure instance
structure = DirectoryStructure()

# Add items to the structure
structure.add_item(DirectoryItem('/path/to/game', 0, 'game', {
    'type': 'directory',
    'content': None,
    'summary': None,
    'short_summary': None,
    'tags': []
}))

structure.add_item(DirectoryItem('/path/to/game/game_loop.py', 1, 'game_loop.py', {
    'type': 'file',
    'content': 'def main_loop():\n    while True:\n        update()\n        render()',
    'summary': None,
    'short_summary': None,
    'tags': []
}))

structure.add_item(DirectoryItem('/path/to/game/snake.py', 1, 'snake.py', {
    'type': 'file',
    'content': 'class Snake:\n    def __init__(self):\n        self.length = 1',
    'summary': None,
    'short_summary': None,
    'tags': []
}))

    # Convert to nested dictionary
    nested_dict = structure.to_nested_dict(use_json_style=False)

    # Print the result in a readable format
    print(json.dumps(nested_dict, indent=2))

The above code will generate the following output to the console:

{
  "path": {
    "to": {
      "game": {
        "__keys__": {
          "type": "directory",
          "content": null,
          "summary": null,
          "short_summary": null,
          "tags": []
        },
        "game_loop.py": {
          "__keys__": {
            "type": "file",
            "content": "def main_loop():\n    while True:\n        update()\n        render()",
            "summary": null,
            "short_summary": null,
            "tags": []
          }
        },
        "snake.py": {
          "__keys__": {
            "type": "file",
            "content": "class Snake:\n    def __init__(self):\n        self.length = 1",
            "summary": null,
            "short_summary": null,
            "tags": []
          }
        }
      }
    }
  }
}

If you chose to set the parameter use_json_style in the method to_nested_dict to True, the output would be a JSON-style dict from the JSONStyle Class.

Working with DirectoryItem Class

The most basic element of a directory structure is an item represented by the DirectoryItem class. This class is an abstracted object representing a file or directory. Each object holds valuable metadata about the underlying item, including summaries of the contents which can be generated with AI.

    DirectoryItem(
        path='/path/to/project/README.md',
        level=1,
        name='README.md',
        metadata={
            'type': 'file',
            'content': '# Project Documentation',
            'content_hash': 'a1b2c3d4e5f6g7h8i9j0',
            'summary': None,
            'short_summary': None,
            'tags': []
        }
    )

Contributing

Contributions are welcome! Please open an issue or submit a pull request on GitHub.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page