
File Crawler index files and search credentials.

Project description


FileCrawler officially supports Python 3.7+.

Main features

  • List all file contents
  • Index file contents into Elasticsearch
  • Run OCR on several file types (with the Tika library)
  • Look for hard-coded credentials
  • Much more...

Parsers:

  • PDF files
  • Microsoft Office files (Word, Excel, etc.)
  • X509 certificate files
  • Image files (JPG, PNG, GIF, etc.)
  • Java packages (JAR and WAR)
  • APK files (disassembled with APKTool)
  • Compressed files (zip, tar, gzip, etc.)

Extractors:

  • AWS credentials
  • GitHub and GitLab credentials

Installing

pip install -U filecrawler
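Since FileCrawler officially supports Python 3.7+, it can help to confirm the interpreter meets that requirement before installing. A minimal check (not part of the package itself):

```python
import sys

# FileCrawler officially supports Python 3.7+, so fail fast on older interpreters.
assert sys.version_info >= (3, 7), (
    "FileCrawler requires Python 3.7+, found " + sys.version.split()[0]
)
print("Python version OK:", sys.version.split()[0])
```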

Running

Config file

Create a sample config file with default parameters

filecrawler --create-config -v

Edit the configuration file config.yml with your desired parameters

Note: You must adjust the Elasticsearch URL parameter before continuing
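For orientation, the Elasticsearch section of the config might look roughly like the sketch below. The key names here are assumptions for illustration only, not the tool's actual schema; check the config.yml generated by --create-config for the real structure.

```yaml
# Illustrative only: these key names are assumptions, not FileCrawler's actual schema.
# Use the file generated by `filecrawler --create-config` as the source of truth.
elasticsearch:
  nodes:
    - url: http://127.0.0.1:9200   # adjust to point at your Elasticsearch endpoint
```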

Run

filecrawler --index-name filecrawler --path /mnt/client_files --crawler --elastic -T 30 -v

Help

$ filecrawler -h

File Crawler v0.1.1 by Helvio Junior
File Crawler index files and search credentials.
https://github.com/helviojunior/filecrawler
    
usage: 
    filecrawler module [flags]

Available Modules:
  --crawler                  Crawler folder and files

Global Flags:
  --index-name [index name]  Crawler name
  --path [folder path]       Folder path to be indexed
  --config [config file]     Configuration file. (default: ./fileindex.yml)
  --db [sqlite file]         Filename to save status of indexed files. (default: ~/.filecrawler/{index_name}/indexer.db)
  -T [tasks]                 number of connects in parallel (per host, default: 16)
  --create-config            Create config sample
  --clear-session            Clear old file status and reindex all files
  -h, --help                 show help message and exit
  -v                         Specify verbosity level (default: 0). Example: -v, -vv, -vvv

Use "filecrawler [module] --help" for more information about a command.

Credits

This project was inspired by:

  1. FSCrawler
  2. Gitleaks

Note: Some parts of the code were ported from these two projects

To do

Check the TODO file

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

FileCrawler-0.1.1.tar.gz (23.4 MB)


File details

Details for the file FileCrawler-0.1.1.tar.gz.

File metadata

  • Download URL: FileCrawler-0.1.1.tar.gz
  • Upload date:
  • Size: 23.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.2

File hashes

Hashes for FileCrawler-0.1.1.tar.gz
Algorithm Hash digest
SHA256 f64dd3a3e9734e69977b7a150324aef4a3da37970124388a9e8baf29c33252e6
MD5 a5fa0c57b09b9a2a2ad035044cea4bef
BLAKE2b-256 e79052bde485007e0b5dca0619c7338997978ef71a75639cb3a745051b7feaf6
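To verify a downloaded archive against the published SHA256 digest, you can hash the file locally and compare. A minimal sketch using the filename and digest from the table above:

```python
import hashlib
import os

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Published digest for FileCrawler-0.1.1.tar.gz (from the hashes table above).
expected = "f64dd3a3e9734e69977b7a150324aef4a3da37970124388a9e8baf29c33252e6"

# Compare only if the archive is present in the current directory.
if os.path.exists("FileCrawler-0.1.1.tar.gz"):
    actual = sha256_of("FileCrawler-0.1.1.tar.gz")
    print("OK" if actual == expected else "MISMATCH: " + actual)
```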

See more details on using hashes in the PyPI documentation.
