
Tool for detecting sensitive data leaks in files and cloud storage


Data Leak Inspector

Find sensitive data. Fix risky permissions.

Data Leak Inspector is a CLI tool that helps you identify potentially exposed files in your storage systems, starting with Google Drive.

Instead of scanning file contents, DLI analyzes metadata and permissions to quickly highlight files that may be publicly accessible or shared.

🚀 Features (v0.1)

  • 🔍 Metadata-based scanning (no file content access)
  • ☁️ Google Drive integration
  • 📂 Scans all files, including those in nested folders
  • 🔐 Exposure detection based on permissions
    • Public (anyone with the link)
    • Shared (users, groups, domain)
    • Private
  • 💬 Human-readable explanations
  • 📊 Clean CLI output with summaries
  • ⚡ Progress bar during scanning
  • 🧪 Demo dataset for quick testing
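The "scans all files" and progress-bar features can be sketched as below. This is not DLI's code: `iter_all_files` and `progress_line` are hypothetical helpers, and the page shape assumes a Drive API v3 `files.list` response (a `files` list plus an optional `nextPageToken`). Because `files.list` is not limited to one folder unless a `q` filter is passed, a flat paginated listing should also reach files in nested folders without a recursive folder walk.

```python
from typing import Callable, Dict, Iterator, Optional

# Hypothetical page fetcher: given a page token (None for the first page), return a dict
# shaped like a Drive API v3 files.list response, e.g.
#   {"files": [{"id": "...", "name": "..."}], "nextPageToken": "..."}
# With google-api-python-client it could be wired up as:
#   lambda token: service.files().list(
#       pageSize=100, pageToken=token,
#       fields="nextPageToken, files(id, name, mimeType, modifiedTime, permissions(type, role))",
#   ).execute()
ListPage = Callable[[Optional[str]], Dict]


def iter_all_files(list_page: ListPage) -> Iterator[Dict]:
    """Yield every file on every page, following nextPageToken until it is absent."""
    token: Optional[str] = None
    while True:
        page = list_page(token)
        yield from page.get("files", [])
        token = page.get("nextPageToken")
        if not token:
            return


def progress_line(index: int, total: int, name: str, width: int = 12) -> str:
    """Render one progress line in the style of the README's example output."""
    filled = width * index // total
    percent = 100 * index // total
    bar = "█" * filled + "░" * (width - filled)
    return f"Scanning {index}/{total}: {name} {bar} {percent}%"


if __name__ == "__main__":
    # Fake two-page listing to exercise the pagination loop offline.
    pages = {
        None: {"files": [{"name": "payroll.xlsx"}], "nextPageToken": "p2"},
        "p2": {"files": [{"name": "team_notes.docx"}]},
    }
    files = list(iter_all_files(lambda token: pages[token]))
    for i, f in enumerate(files, start=1):
        print(progress_line(i, len(files), f["name"]))
```

Passing a fake `list_page` (a function returning canned dicts) keeps the pagination logic testable without network access or credentials.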

🧠 How It Works

DLI does not read file contents.

Instead, it analyzes:

  • File permissions (Google Drive API)
  • Sharing settings
  • Basic metadata (name, type, timestamps)
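A minimal sketch of permission-based classification, assuming each file's permissions arrive as Drive API v3 permission objects with a `type` (`user`, `group`, `domain`, `anyone`) and a `role` (`owner`, `writer`, `commenter`, `reader`, and so on). `Exposure` and `classify_exposure` are illustrative names, not DLI's actual API. As far as the Drive v3 reference describes, a file's `permissions` list is only returned when the caller is allowed to share that file, so a real scanner would also need to handle a missing list.

```python
from enum import Enum
from typing import Iterable, Mapping


class Exposure(Enum):
    PUBLIC = "public"    # a permission of type "anyone" exists (link sharing)
    SHARED = "shared"    # non-owner grants to users, groups, or a domain
    PRIVATE = "private"  # only the owner's own permission


SHARE_TYPES = {"user", "group", "domain"}


def classify_exposure(permissions: Iterable[Mapping[str, str]]) -> Exposure:
    """Map Drive API v3 permission objects (dicts with "type" and "role") to an exposure level."""
    perms = list(permissions)
    # Public wins: any "anyone" permission means the link alone grants access.
    if any(p.get("type") == "anyone" for p in perms):
        return Exposure.PUBLIC
    # Any grant to a user, group, or domain other than the owner entry counts as sharing.
    if any(p.get("type") in SHARE_TYPES and p.get("role") != "owner" for p in perms):
        return Exposure.SHARED
    return Exposure.PRIVATE


if __name__ == "__main__":
    owner = {"type": "user", "role": "owner"}
    print(classify_exposure([owner, {"type": "anyone", "role": "reader"}]))
    print(classify_exposure([owner, {"type": "group", "role": "writer"}]))
    print(classify_exposure([owner]))
```

Checking for `anyone` first means a file that is both link-shared and shared with named users is reported at the more severe level.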

This allows:

✔ Faster scans
✔ Narrower API permissions required
✔ Easier approval when requesting Google API access
✔ Better privacy guarantees

📸 Example Output

Scanning 12/120: payroll.xlsx ████████████ 100%

SCAN RESULTS (BASIC)

[PUBLIC ] payroll_2024.xlsx
          → anyone with link (reader)

[SHARED ] team_notes.docx
          → shared with 3 user(s)

[PRIVATE] personal.txt
          → only accessible by owner

Summary:
  Total files: 120
  Public: 5
  Shared: 18
  Private: 97
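The report layout above can be reproduced with a small formatter. This sketch assumes each finding already carries a level and a human-readable reason; `Finding` and `render` are hypothetical names, not part of DLI. Left-padding the level to seven characters (`{level:<7}`) is what lines up `[PUBLIC ]`, `[SHARED ]`, and `[PRIVATE]`, and `collections.Counter` returns 0 for levels with no findings, so the summary never raises a `KeyError`.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

LEVELS = ("PUBLIC", "SHARED", "PRIVATE")


@dataclass
class Finding:
    name: str
    level: str   # one of LEVELS
    reason: str  # human-readable explanation, e.g. "shared with 3 user(s)"


def render(findings: List[Finding]) -> str:
    """Format findings and a per-level summary like the README's example output."""
    lines = ["SCAN RESULTS (BASIC)", ""]
    for f in findings:
        lines.append(f"[{f.level:<7}] {f.name}")  # pad to 7 so the brackets line up
        lines.append(f"          → {f.reason}")   # 10 spaces: aligns under the file name
        lines.append("")
    counts = Counter(f.level for f in findings)    # missing levels count as 0
    lines.append("Summary:")
    lines.append(f"  Total files: {len(findings)}")
    lines.extend(f"  {level.capitalize()}: {counts[level]}" for level in LEVELS)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render([
        Finding("payroll_2024.xlsx", "PUBLIC", "anyone with link (reader)"),
        Finding("personal.txt", "PRIVATE", "only accessible by owner"),
    ]))
```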

⚙️ Installation

git clone https://github.com/yourusername/data-leak-inspector.git
cd data-leak-inspector


pip install -e .

🧪 Run with Demo Data

dli scan --demo

☁️ Google Drive Setup

1. Create credentials

  • Go to the Google Cloud Console
  • Enable the Google Drive API
  • Create OAuth credentials (Desktop app)
  • Download the JSON file and name it credentials.json

2. Place credentials

Create the directory ~/Documents/dli/ and save credentials.json inside it:

~/Documents/dli/credentials.json

3. Run scan

dli scan --gdrive

On first run:

  • A browser window will open so you can sign in and grant access
  • A token.json file will be created
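The first-run behavior described above matches the standard installed-app OAuth flow in Google's Python client libraries. The sketch below assumes the `google-auth`, `google-auth-oauthlib`, and `google-api-python-client` packages, a metadata-only scope (`drive.metadata.readonly`), and that `token.json` is written next to `credentials.json`; the README confirms none of these, and `config_dir` and `load_credentials` are hypothetical helpers.

```python
from pathlib import Path
from typing import Optional

# Assumption: a metadata-only scope; the README does not state which scope DLI requests.
SCOPES = ["https://www.googleapis.com/auth/drive.metadata.readonly"]


def config_dir() -> Path:
    """Directory the README uses for credentials.json (~/Documents/dli/)."""
    return Path.home() / "Documents" / "dli"


def load_credentials(directory: Optional[Path] = None):
    """Reuse token.json when valid, refresh it when expired, else run the browser OAuth flow."""
    # Imported lazily so this module can be loaded without the Google libraries installed.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow

    directory = directory or config_dir()
    token_path = directory / "token.json"  # assumption: stored next to credentials.json
    creds = None
    if token_path.exists():
        creds = Credentials.from_authorized_user_file(str(token_path), SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                str(directory / "credentials.json"), SCOPES
            )
            creds = flow.run_local_server(port=0)  # opens a browser for consent
        token_path.write_text(creds.to_json())
    return creds
```

With credentials in hand, a Drive v3 client would be built with `googleapiclient.discovery.build("drive", "v3", credentials=load_credentials())`.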

🧾 CLI Usage

dli scan [OPTIONS]

Options

Option      Description
--demo      Use the built-in demo dataset
--gdrive    Scan Google Drive
--verbose   Show debug logs
--quiet     Show only errors
--report    Export results to JSON
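One way the documented options could be wired up with `argparse`, as a sketch only: treating `--demo`/`--gdrive` as a required either-or choice, `--verbose`/`--quiet` as mutually exclusive, and `--report` as taking an output path are all assumptions, and `build_parser`/`log_level` are hypothetical names.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="dli")
    sub = parser.add_subparsers(dest="command", required=True)
    scan = sub.add_parser("scan", help="Scan a storage source for exposed files")
    # Assumption: exactly one data source per run.
    source = scan.add_mutually_exclusive_group(required=True)
    source.add_argument("--demo", action="store_true", help="Use the built-in demo dataset")
    source.add_argument("--gdrive", action="store_true", help="Scan Google Drive")
    # Assumption: verbosity flags cannot be combined.
    noise = scan.add_mutually_exclusive_group()
    noise.add_argument("--verbose", action="store_true", help="Show debug logs")
    noise.add_argument("--quiet", action="store_true", help="Show only errors")
    # Assumption: --report takes an output path; the README does not say.
    scan.add_argument("--report", metavar="PATH", help="Export results to JSON")
    return parser


def log_level(args: argparse.Namespace) -> int:
    """Translate the verbosity flags into a logging level."""
    if args.verbose:
        return logging.DEBUG
    if args.quiet:
        return logging.ERROR
    return logging.INFO


if __name__ == "__main__":
    args = build_parser().parse_args()
    logging.basicConfig(level=log_level(args))
    logging.info("scan requested: demo=%s gdrive=%s", args.demo, args.gdrive)
```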

📁 Project Structure

leak_inspector/
├── application/
│   ├── scanner.py
│   ├── risk_evaluator.py
│   └── ports/
├── domain/
│   ├── models.py
│   ├── enums.py
│   └── reporting.py
├── infrastructure/
│   ├── storage/
│   └── gdrive/
└── interfaces/
    └── cli/
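The layout appears to follow a ports-and-adapters (hexagonal) style: `application/` holds use cases and the `ports/` they depend on, `infrastructure/` holds concrete adapters such as `gdrive/`, and `domain/` holds plain models. A minimal sketch of that dependency direction, using `typing.Protocol` for the port; every class name here is hypothetical, and the module placements in the comments are guesses from the tree, not the project's actual code.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Protocol


@dataclass
class FileRecord:              # might live in domain/models.py
    name: str
    permissions: List[dict] = field(default_factory=list)


class StoragePort(Protocol):   # might live in application/ports/
    def list_files(self) -> Iterable[FileRecord]: ...


class DemoStorage:             # an adapter, e.g. somewhere under infrastructure/
    def __init__(self, records: List[FileRecord]) -> None:
        self._records = records

    def list_files(self) -> Iterable[FileRecord]:
        return iter(self._records)


class Scanner:                 # application/scanner.py depends only on the port
    def __init__(self, storage: StoragePort) -> None:
        self._storage = storage

    def public_files(self) -> List[str]:
        """Names of files carrying an "anyone" (link-sharing) permission."""
        return [
            r.name
            for r in self._storage.list_files()
            if any(p.get("type") == "anyone" for p in r.permissions)
        ]


if __name__ == "__main__":
    store = DemoStorage([
        FileRecord("payroll_2024.xlsx", [{"type": "anyone", "role": "reader"}]),
        FileRecord("personal.txt", [{"type": "user", "role": "owner"}]),
    ])
    print(Scanner(store).public_files())
```

Because `Scanner` only sees the `StoragePort` protocol, a Google Drive adapter and a demo adapter are interchangeable, which is presumably what lets `--demo` and `--gdrive` share one scanning path.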

🔐 Exposure Levels

Level     Description
PUBLIC    Accessible by anyone with the link
SHARED    Shared with specific users, groups, or a domain
PRIVATE   Only accessible by the owner

⚠️ Limitations (v0.1)

  • ❌ No content scanning (no PII detection)
  • ❌ No Google Docs content parsing
  • ❌ Risk assessment is heuristic and based on metadata only

🛣 Roadmap

v0.1 (current)

  • Metadata scanning
  • Google Drive integration
  • Exposure detection

🧠 Philosophy

DLI is designed to:

  • ✔ Minimize permissions
  • ✔ Respect user privacy
  • ✔ Deliver fast insights
  • ✔ Be transparent in analysis

🤝 Contributing

Contributions are welcome.

  1. Fork the repo
  2. Create a branch
  3. Submit a PR


Download files


Source Distribution

data_leak_inspector-0.1.0.tar.gz (15.6 kB)

Uploaded Source

Built Distribution


data_leak_inspector-0.1.0-py3-none-any.whl (24.5 kB)

Uploaded Python 3

File details

Details for the file data_leak_inspector-0.1.0.tar.gz.

File metadata

  • Download URL: data_leak_inspector-0.1.0.tar.gz
  • Upload date:
  • Size: 15.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for data_leak_inspector-0.1.0.tar.gz
Algorithm Hash digest
SHA256 759f41b53fc41354a96bcdb8b658e559416138151cc1bf4b42ef7a552a10f7a0
MD5 62687cb7e3682cdc79ce9e75436edb78
BLAKE2b-256 4c1ec67566505c5c297159a2d7ad5b098fd946223c3e5d3c6c4aa388103dcd13


File details

Details for the file data_leak_inspector-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for data_leak_inspector-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c55db341a9159089c575419978160c535f8c793af1649c4656aadfb89dc23e14
MD5 8c93326e28247e6485ef5e75102e6078
BLAKE2b-256 05393025e2f512b2fcaf8e0d352e0b67512a842ab0232dc639b289f85a384843

