
Scan your filesystem to look for files that are a potential GDPR risk


DriveScanner

DriveScanner is a Python library created by The Analytics Lab, which is powered by Cmotions. It helps you identify files on your filesystem that could pose a GDPR risk. To do this, it scans the contents of each file for specific information such as IBANs, social security numbers (Dutch: BSN), telephone numbers, email addresses, credit card numbers and more.
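
To give a feel for what this kind of content scan involves, here is a minimal sketch using only the Python standard library. It is not DriveScanner's actual implementation: the function names (`bsn_is_valid`, `iban_is_valid`, `find_candidates`) and the simplified regular expressions are assumptions made purely for illustration.

```python
import re

# Simplified, illustrative patterns (assumptions); real detection needs stricter rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IBAN_RE = re.compile(r"\b[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}\b")
BSN_RE = re.compile(r"\b[0-9]{9}\b")


def bsn_is_valid(number: str) -> bool:
    """Dutch 'eleven test': the weighted digit sum must be divisible by 11.

    Passing the test is necessary, not sufficient, for a number to be a real BSN.
    """
    if not re.fullmatch(r"[0-9]{9}", number):
        return False
    weights = [9, 8, 7, 6, 5, 4, 3, 2, -1]
    total = sum(int(digit) * weight for digit, weight in zip(number, weights))
    return total % 11 == 0


def iban_is_valid(iban: str) -> bool:
    """IBAN checksum: move the first four characters to the end,
    map letters to numbers (A=10 ... Z=35) and check that the result mod 97 is 1."""
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(char, 36)) for char in rearranged)
    return int(digits) % 97 == 1


def find_candidates(text: str) -> dict:
    """Return pattern matches that also pass their checksum (where one exists)."""
    return {
        "email": EMAIL_RE.findall(text),
        "iban": [m for m in IBAN_RE.findall(text) if iban_is_valid(m)],
        "bsn": [m for m in BSN_RE.findall(text) if bsn_is_valid(m)],
    }
```

Combining a pattern match with a checksum, where the identifier has one, is a common way to reduce false positives: a random nine-digit number only rarely passes the eleven test.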

Installation

Install DriveScanner using pip:

pip install drivescanner

Usage

import drivescanner

# set the location of the files you want to scan
# all files in all subdirectories will also be taken into account
file_path = "C:/MyFiles"
file_list = drivescanner.list_all_files(file_path)

# create an overview of all the filetypes on our example drive
drivescanner.extension_stats(file_list)

# if we want we can include/exclude certain extensions
file_list = drivescanner.select_files(file_list, include=["xlsx", "xls", "docx", "doc", "pdf", "ppt", "pptx"], exclude=None)

# now we are ready to scan all the files in the list
resultdict = drivescanner.scan_drive(file_list)

# and calculate the risk score for all scanned files
# some files may cause problems and cannot be scanned
# you retrieve those in a separate dataframe
df_result, df_not_processed = drivescanner.calculate_severity(resultdict)

# that's it, now you can use and inspect the result any way you like
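
The first steps of the workflow above (listing files recursively, summarising extensions, filtering on extension) can be approximated with the standard library, which may help when deciding which extensions to include before running a scan. This is a sketch under assumptions, not the library's internals: `list_files`, `count_extensions` and `filter_by_extension` are hypothetical helper names, not part of DriveScanner.

```python
from collections import Counter
from pathlib import Path


def list_files(root: str) -> list[Path]:
    """Return every regular file under root, including all subdirectories."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file())


def count_extensions(files: list[Path]) -> Counter:
    """Count files per lower-cased extension (without the leading dot)."""
    return Counter(p.suffix.lower().lstrip(".") for p in files)


def filter_by_extension(files: list[Path], include: list[str]) -> list[Path]:
    """Keep only the files whose extension appears in include (case-insensitive)."""
    wanted = {ext.lower() for ext in include}
    return [p for p in files if p.suffix.lower().lstrip(".") in wanted]
```

For example, `count_extensions(list_files("C:/MyFiles")).most_common(10)` would show the ten most frequent file types under that folder.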

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. Please make sure to update tests as appropriate.

License

GNU General Public License v3.0

Contributors

Jeanine Schoonemann, Rick Flamand, Sem Frankenberg, Wim Verboom and Wouter van Gils

Download files

Source Distribution

drivescanner-0.0.4.tar.gz (38.4 kB)

Built Distribution

drivescanner-0.0.4-py3-none-any.whl (26.7 kB, Python 3)
