Scan your filesystem to look for files that are a potential GDPR risk
DriveScanner
DriveScanner is a Python library created by The Analytics Lab, which is powered by Cmotions. It helps you identify files on your filesystem that could pose a GDPR risk. To do this, it scans the file contents for specific information such as IBANs, social security numbers (Dutch: BSN), telephone numbers, email addresses, credit card numbers and more.
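To give a feel for what scanning file contents for such information can involve, here is a minimal sketch using regular expressions from the standard library. It is not DriveScanner's implementation: the patterns, the `find_pii` function and the `iban_checksum_ok` helper are hypothetical names chosen for illustration, and real-world detection would need stricter rules.

```python
import re

# Illustrative patterns only; these are assumptions, not DriveScanner's own rules.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}


def iban_checksum_ok(candidate: str) -> bool:
    """Validate an IBAN candidate with the standard mod-97 check."""
    # Move the country code and check digits to the end,
    # then map letters to numbers (A=10 ... Z=35).
    rearranged = candidate[4:] + candidate[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def find_pii(text: str) -> dict[str, list[str]]:
    """Return the email addresses and checksum-valid IBANs found in text."""
    hits = {"email": PATTERNS["email"].findall(text)}
    hits["iban"] = [
        match for match in PATTERNS["iban"].findall(text) if iban_checksum_ok(match)
    ]
    return hits


sample = "Contact jane.doe@example.com or pay to NL91ABNA0417164300 today."
print(find_pii(sample))
```

The checksum step shows why a pattern match alone is usually not enough: many strings look like an IBAN, but only those passing the mod-97 check are structurally valid.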
Installation
Install DriveScanner using pip
pip install drivescanner
Usage
import drivescanner
# set the location of the files you want to scan
# all files in all subdirectories will also be taken into account
file_path = "C:/MyFiles"
file_list = drivescanner.list_all_files(file_path)
# create an overview of all the filetypes on our example drive
drivescanner.extension_stats(file_list)
# if we want we can include/exclude certain extensions
file_list = drivescanner.select_files(file_list, include=["xlsx", "xls", "docx", "doc", "pdf", "ppt", "pptx"], exclude=None)
# now we are ready to scan all the files in the list
resultdict = drivescanner.scan_drive(file_list)
# and calculate the risk score for all scanned files
# some files might cause problems and therefore not be scanned
# you retrieve those in a separate dataframe
df_result, df_not_processed = drivescanner.calculate_severity(resultdict)
# that's it, now you can use and inspect the result any way you like
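# The listing, counting and filtering steps above can be pictured with the
# standard-library sketch below. It is an illustration under assumptions,
# not DriveScanner's code: list_files, count_extensions and keep_extensions
# are hypothetical helpers that only mimic the idea of those steps.

```python
from collections import Counter
from pathlib import Path


def list_files(root: str) -> list[Path]:
    """Collect every file under root, including all subdirectories."""
    return [path for path in Path(root).rglob("*") if path.is_file()]


def count_extensions(files: list[Path]) -> Counter:
    """Count files per lower-cased extension, without the leading dot."""
    return Counter(path.suffix.lower().lstrip(".") for path in files)


def keep_extensions(files: list[Path], include: list[str]) -> list[Path]:
    """Keep only the files whose extension is in the include list."""
    wanted = {ext.lower() for ext in include}
    return [path for path in files if path.suffix.lower().lstrip(".") in wanted]
```

# Restricting the list to document-like extensions before scanning keeps the
# run focused on files that are likely to contain readable text.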
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. Please make sure to update tests as appropriate.
License
GNU General Public License v3.0
Contributors
Jeanine Schoonemann, Rick Flamand, Sem Frankenberg, Wim Verboom and Wouter van Gils
File details
Details for the file drivescanner-0.0.4.tar.gz.
File metadata
- Download URL: drivescanner-0.0.4.tar.gz
- Upload date:
- Size: 38.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | ee4734ea666af6f5bb0ad01839d4882c2c269bd9c4c62b0f95e98ddc0d9c0027
MD5 | b929ddf47ab1b331ca77dab577870b9d
BLAKE2b-256 | 8cd5b8d2fbe0cd8a3f635ff8c562211bb44a5339065c258223c492f67d07c609
File details
Details for the file drivescanner-0.0.4-py3-none-any.whl.
File metadata
- Download URL: drivescanner-0.0.4-py3-none-any.whl
- Upload date:
- Size: 26.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | bd04e3aeda592fb4af12876254aee962358c03179fd30dd168d337312764fd63
MD5 | 7d842a9c0ced9c7974fc338a85be994b
BLAKE2b-256 | 5de9c941d0dcf07fa7c86d6cf77a853527852d1abb506576fe0dca7c917c1180
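If you download one of these files by hand, you can compare its SHA256 digest with the value listed for it. The sketch below uses only Python's standard `hashlib` module; the `sha256_of_file` helper is a hypothetical name, and the local file path in the commented usage assumes the wheel was saved in the current directory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 value listed for drivescanner-0.0.4-py3-none-any.whl
EXPECTED_WHEEL_SHA256 = (
    "bd04e3aeda592fb4af12876254aee962358c03179fd30dd168d337312764fd63"
)

# Usage, assuming the wheel sits in the current directory:
# assert sha256_of_file("drivescanner-0.0.4-py3-none-any.whl") == EXPECTED_WHEEL_SHA256
```

A mismatch means the file differs from the one that was uploaded, so it should not be installed.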