Guesses the file type/mime/encoding of files. It uses the binaries from File and Libmagic.
Project description
Guesses the file type/mime/encoding of files
pip install get-file-type
Guesses the file type/mime/encoding of files. It uses the binaries from https://github.com/julian-r/file-windows/releases
(File and Libmagic build with Visual Studio) - They are included in this package
Args:
files_folders (list or str): A list of file/folder paths or a single file/folder path.
maxsubfolders (int, optional): Maximum number of subfolders to scan. Default is -1, which means no limit.
pandas_dataframe (bool, optional): Determines if the results should be returned as a pandas DataFrame.
Requires pandas to be installed. Default is False.
verbose (bool, optional): Determines if verbose output should be displayed. Default is True.
Returns:
list or pd.DataFrame: A list of file type information for each file or a pandas DataFrame if pandas_dataframe is True.
Raises:
Exception: If pandas is not installed and pandas_dataframe is set to True.
Example:
from get_file_type import guess_filetypes
result_list = guess_filetypes(
files_folders=[
r"C:\Users\hansc\Pictures\fastcpy", # png file without ending
r"C:\Users\hansc\Pictures\fastcpy - Copy.png", # an actual png file, to check if files with the correct ending are ignored
r"C:\Users\hansc\Pictures\cppcomp.jpg", # a .txt file with the wrong ending
r"E:\destinationcopytemp5", # internet cache files - whole folder will be scanned
],
maxsubfolders=-1, # if you want to limit the number of subfolders to scan, -1 means no limit
pandas_dataframe=False, # return the results as a pd.DataFrame (pandas must be installed)
verbose=True, # visual output
)
result_df = guess_filetypes(
files_folders=[
r"C:\Users\hansc\Pictures\fastcpy", # png file without ending
r"C:\Users\hansc\Pictures\fastcpy - Copy.png", # an actual png file, to check if files with the correct ending are ignored
r"C:\Users\hansc\Pictures\cppcomp.jpg", # a .txt file with the wrong ending
r"E:\destinationcopytemp5", # internet cache files - whole folder will be scanned
],
maxsubfolders=-1,
pandas_dataframe=True,
verbose=True,)
output:
[[['C:\\Users\\hansc\\Pictures\\fastcpy', 'image/png', 'charset=binary', ('png',), ('C:\\Users\\hansc\\Pictures\\fastcpy.png',)]]]
[[['C:\\Users\\hansc\\Pictures\\fastcpy - Copy.png', 'image/png', 'charset=binary', ('png',), ('C:\\Users\\hansc\\Pictures\\fastcpy - Copy.png',)]]]
[[['C:\\Users\\hansc\\Pictures\\cppcomp.jpg', 'text/plain', 'charset=us-ascii', ('conf', 'def', 'in', 'ini', 'list', 'log', 'text', 'txt'), ('C:\\Users\\hansc\\Pictures\\cppcomp.jpg.conf', 'C:\\Users\\hansc\\Pictures\\cppcomp.jpg.def', 'C:\\Users\\hansc\\Pictures\\cppcomp.jpg.in', 'C:\\Users\\hansc\\Pictures\\cppcomp.jpg.ini', 'C:\\Users\\hansc\\Pictures\\cppcomp.jpg.list', 'C:\\Users\\hansc\\Pictures\\cppcomp.jpg.log', 'C:\\Users\\hansc\\Pictures\\cppcomp.jpg.text', 'C:\\Users\\hansc\\Pictures\\cppcomp.jpg.txt')]]]
[[['E:\\destinationcopytemp5\\00000000__2023_05_13_22_33_46\\Users\\hansc\\AppData\\Local\\Temp\\RBX-64BA6ED6.log', 'text/plain', 'charset=us-ascii', ('conf', 'def', 'in', 'ini', 'list', 'log', 'text', 'txt'), ('E:\\destinationcopytemp5\\00000000__2023_05_13_22_33_46\\Users\\hansc\\AppData\\Local\\Temp\\RBX-64BA6ED6.log',)]]]
[[['E:\\destinationcopytemp5\\00000000__2023_05_13_22_33_46\\Users\\hansc\\AppData\\Local\\Temp\\RBX-DF39BC9A.log', 'text/plain', 'charset=us-ascii', ('conf', 'def', 'in', 'ini', 'list', 'log', 'text', 'txt'), ('E:\\destinationcopytemp5\\00000000__2023_05_13_22_33_46\\Users\\hansc\\AppData\\Local\\Temp\\RBX-DF39BC9A.log',)]]]
aa_filename aa_mime aa_encoding aa_possible_extensions aa_possible_filenames
0 C:\Users\hansc\Pictures\fastcpy image/png charset=binary (png,) (C:\Users\hansc\Pictures\fastcpy.png,)
1 C:\Users\hansc\Pictures\fastcpy - Copy.png image/png charset=binary (png,) (C:\Users\hansc\Pictures\fastcpy - Copy.png,)
2 C:\Users\hansc\Pictures\cppcomp.jpg text/plain charset=us-ascii (conf, def, in, ini, list, log, text, txt) (C:\Users\hansc\Pictures\cppcomp.jpg.conf, C:\Users\hansc\Pictures\cppcomp.jpg.def, C:\Users\hansc\Pictures\cppcomp.jpg.in, C:\Users\hansc\Pictures\cppcomp.jpg.ini, C:\Users\hansc\Pictures\cppcomp.jpg.list, C:\Users\hansc\Pictures\cppcomp.jpg.log, C:\Users\hansc\Pictures\cppcomp.jpg.text, C:\Users\hansc\Pictures\cppcomp.jpg.txt)
3 E:\destinationcopytemp5\00000000__2023_05_13_22_33_46\Users\hansc\AppData\Local\Temp\RBX-64BA6ED6.log text/plain charset=us-ascii (conf, def, in, ini, list, log, text, txt) (E:\destinationcopytemp5\00000000__2023_05_13_22_33_46\Users\hansc\AppData\Local\Temp\RBX-64BA6ED6.log,)
4 E:\destinationcopytemp5\00000000__2023_05_13_22_33_46\Users\hansc\AppData\Local\Temp\RBX-DF39BC9A.log text/plain charset=us-ascii (conf, def, in, ini, list, log, text, txt) (E:\destinationcopytemp5\00000000__2023_05_13_22_33_46\Users\hansc\AppData\Local\Temp\RBX-DF39BC9A.log,)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
get_file_type-0.10.tar.gz
(577.5 kB
view details)
Built Distribution
get_file_type-0.10-py3-none-any.whl
(616.8 kB
view details)
File details
Details for the file get_file_type-0.10.tar.gz
.
File metadata
- Download URL: get_file_type-0.10.tar.gz
- Upload date:
- Size: 577.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.10
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | f77b21b0bf9f4a2f19092c1f78e2022d3ebfcfbf577474990e2cd111cd7af763 |
|
MD5 | b8567f842dead695083e5a8b54040e0d |
|
BLAKE2b-256 | ef8bf7158c12a7435e2bbd1b892fbc256abe45cdb76aba7f3983775284e3a198 |
File details
Details for the file get_file_type-0.10-py3-none-any.whl
.
File metadata
- Download URL: get_file_type-0.10-py3-none-any.whl
- Upload date:
- Size: 616.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.10
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | fc33ee2953c5103e88e6100499021793596b54cd6b5f3d17235af80d074d713f |
|
MD5 | f9761ee29ef866a2a6ff8644b387456d |
|
BLAKE2b-256 | cd2713062797d98d44fc3bf2413f3381a8dd51d44cc4a65cd6de8c97d93e57cf |