Scan your filesystem to look for files that are a potential GDPR risk

DriveScanner

DriveScanner is a Python library created by The Analytics Lab, which is powered by Cmotions. It helps you identify files on your filesystem that could pose a GDPR risk. To do this, it scans the file contents for specific information such as IBANs, social security numbers (Dutch: BSN), telephone numbers, email addresses, credit card numbers and more.
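
To give a feel for what this kind of content scanning involves, here is a minimal sketch using only the standard library. It is not taken from DriveScanner's source and does not describe how the library works internally: the patterns and the helper names (`iban_is_valid`, `bsn_is_valid`, `find_candidates`) are assumptions made for illustration. It pairs simple regular expressions with two well-known checksum rules, the IBAN mod-97 check and the Dutch BSN "elfproef", to cut down on false positives.

```python
import re

# Illustrative patterns only; these are assumptions, not DriveScanner's own.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")
BSN_RE = re.compile(r"\b\d{9}\b")


def iban_is_valid(iban: str) -> bool:
    """Mod-97 check: move the first four characters to the end,
    map letters to numbers (A=10 ... Z=35) and test remainder == 1."""
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def bsn_is_valid(bsn: str) -> bool:
    """Dutch 'elfproef': weights 9..2 for the first eight digits,
    -1 for the last; the weighted sum must be divisible by 11."""
    weights = [9, 8, 7, 6, 5, 4, 3, 2, -1]
    total = sum(w * int(d) for w, d in zip(weights, bsn))
    return total % 11 == 0


def find_candidates(text: str) -> dict:
    """Return pattern matches that also pass their checksum (where one exists)."""
    return {
        "email": EMAIL_RE.findall(text),
        "iban": [m for m in IBAN_RE.findall(text) if iban_is_valid(m)],
        "bsn": [m for m in BSN_RE.findall(text) if bsn_is_valid(m)],
    }


sample = "Mail jan@example.com, IBAN NL91ABNA0417164300, BSN 111222333, ref 123456789"
print(find_candidates(sample))
```

The nine-digit reference number `123456789` matches the BSN pattern but fails the elfproef, so it is dropped. A real scanner would also need to handle spaced or formatted numbers, which this sketch ignores.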

Installation

Install DriveScanner using pip:

pip install drivescanner

Usage

import drivescanner

# set the location of the files you want to scan
# all files in all subdirectories will also be taken into account
file_path = "C:/MyFiles"
file_list = drivescanner.list_all_files(file_path)

# create an overview of all the filetypes on our example drive
drivescanner.extension_stats(file_list)

# if we want we can include/exclude certain extensions
file_list = drivescanner.select_files(file_list, include=["xlsx", "xls", "docx", "doc", "pdf", "ppt", "pptx"], exclude=None)

# now we are ready to scan all the files in the list
resultdict = drivescanner.scan_drive(file_list)

# and calculate the risk score for all scanned files
# some files may have caused problems and were not scanned;
# you retrieve those in a separate dataframe
df_result, df_not_processed = drivescanner.calculate_severity(resultdict)

# that's it, now you can use and inspect the result any way you like
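
The first steps of the workflow above (listing every file under a directory, then keeping only certain extensions) can be pictured with a small standard-library sketch. This is an assumption about the general technique, not DriveScanner's implementation: `walk_files` and `filter_by_extension` are hypothetical helpers, and the real `list_all_files` and `select_files` may behave differently, for example in how they treat letter case or leading dots.

```python
import os
from pathlib import Path


def walk_files(root):
    """Return every file path under root, including all subdirectories."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)


def filter_by_extension(paths, include=None, exclude=None):
    """Keep paths whose extension is in include (if given) and not in exclude.

    Extensions are compared case-insensitively and without the leading dot.
    """
    include = {e.lower() for e in include} if include else None
    exclude = {e.lower() for e in exclude} if exclude else set()
    kept = []
    for p in paths:
        ext = Path(p).suffix.lstrip(".").lower()
        if include is not None and ext not in include:
            continue
        if ext in exclude:
            continue
        kept.append(p)
    return kept


if __name__ == "__main__":
    files = walk_files(".")
    office_files = filter_by_extension(files, include=["xlsx", "docx", "pdf"])
    print(f"{len(office_files)} of {len(files)} files selected")
```

Narrowing the list before scanning is worthwhile on large drives, since opening and parsing each document is likely the expensive step.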

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. Please make sure to update tests as appropriate.

License

GNU General Public License v3.0

Contributors

Jeanine Schoonemann, Rick Flamand, Sem Frankenberg, Wim Verboom and Wouter van Gils