
Guesses the file type/mime/encoding of files. It uses the binaries from File and Libmagic.

Project description

Guesses the file type/mime/encoding of files

pip install get-file-type

    Guesses the file type/mime/encoding of files. It uses the binaries from https://github.com/julian-r/file-windows/releases
    (File and Libmagic built with Visual Studio); they are included in this package.
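The README does not show how the bundled binaries are called. As a minimal sketch, assuming the bundled `file` executable is run with its standard `-b --mime` options (which print a line such as `image/png; charset=binary`), splitting that line at the semicolon yields values shaped like the aa_mime and aa_encoding columns in the output below. `run_file_mime` and `parse_file_mime_output` are hypothetical helpers, not part of get-file-type, and the executable name `file` is an assumption.

```python
import subprocess
from typing import Tuple


def run_file_mime(path: str, file_exe: str = "file") -> str:
    # Hypothetical helper: file_exe would point at the bundled file.exe on Windows.
    # -b omits the file name, --mime prints "<mime type>; charset=<encoding>".
    completed = subprocess.run(
        [file_exe, "-b", "--mime", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


def parse_file_mime_output(line: str) -> Tuple[str, str]:
    """Split 'image/png; charset=binary' into ('image/png', 'charset=binary')."""
    mime, _, encoding = line.strip().partition(";")
    return mime.strip(), encoding.strip()
```

On a machine where `file` is on the PATH, `parse_file_mime_output(run_file_mime("some_file"))` would return a (mime, encoding) pair.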


    Args:
        files_folders (list or str): A list of file/folder paths or a single file/folder path.
        maxsubfolders (int, optional): Maximum number of subfolders to scan. Default is -1, which means no limit.
        pandas_dataframe (bool, optional): Determines if the results should be returned as a pandas DataFrame.
                                           Requires pandas to be installed. Default is False.
        verbose (bool, optional): Determines if verbose output should be displayed. Default is True.
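files_folders accepts either one path or a list of file and folder paths. A minimal sketch of how such input could be expanded into individual files, assuming maxsubfolders acts as a depth limit below each given folder (0 = only the folder itself, -1 = no limit); the package may interpret the parameter differently, and `iter_files` is a hypothetical name.

```python
import os
from typing import Iterator, List, Union


def iter_files(files_folders: Union[str, List[str]], maxsubfolders: int = -1) -> Iterator[str]:
    """Yield file paths from a single path or a list of file/folder paths.

    Assumption: maxsubfolders is a depth limit below each folder
    (0 = only the folder itself, -1 = unlimited).
    """
    if isinstance(files_folders, str):
        files_folders = [files_folders]
    for entry in files_folders:
        if os.path.isfile(entry):
            yield entry
            continue
        for dirpath, dirnames, filenames in os.walk(entry):
            # Depth of the current folder relative to the folder that was passed in
            rel = os.path.relpath(dirpath, entry)
            depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
            if maxsubfolders >= 0 and depth >= maxsubfolders:
                dirnames.clear()  # stop os.walk from descending any further
            for name in filenames:
                yield os.path.join(dirpath, name)
```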

    Returns:
        list or pd.DataFrame: A list of file type information for each file or a pandas DataFrame if pandas_dataframe is True.

    Raises:
        Exception: If pandas is not installed and pandas_dataframe is set to True.
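The Raises entry implies that pandas is an optional dependency, needed only when pandas_dataframe=True. A sketch of that pattern, assuming each result is a five-element row in the column order of the DataFrame output below; `COLUMNS`, `rows_to_records` and `rows_to_dataframe` are hypothetical names, not the package's API.

```python
from typing import Dict, List, Sequence

# Column names taken from the DataFrame output shown in the example
COLUMNS = ["aa_filename", "aa_mime", "aa_encoding", "aa_possible_extensions", "aa_possible_filenames"]


def rows_to_records(rows: Sequence[Sequence[object]]) -> List[Dict[str, object]]:
    # Each row is assumed to hold the five values in the order printed in the example output
    return [dict(zip(COLUMNS, row)) for row in rows]


def rows_to_dataframe(rows: Sequence[Sequence[object]]):
    # Import pandas lazily so it stays an optional dependency
    try:
        import pandas as pd
    except ImportError as exc:
        raise Exception("pandas_dataframe=True requires pandas to be installed") from exc
    return pd.DataFrame(rows_to_records(rows), columns=COLUMNS)
```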

    Example:
from get_file_type import guess_filetypes
result_list = guess_filetypes(
    files_folders=[
        r"C:\Users\hansc\Pictures\fastcpy",  # png file without ending
        r"C:\Users\hansc\Pictures\fastcpy - Copy.png",  # an actual png file, to check if files with the correct ending are ignored
        r"C:\Users\hansc\Pictures\cppcomp.jpg",  # a .txt file with the wrong ending
        r"E:\destinationcopytemp5",  # internet cache files - whole folder will be scanned
    ],
    maxsubfolders=-1,  # if you want to limit the number of subfolders to scan, -1 means no limit
    pandas_dataframe=False,  # return the results as a pd.DataFrame (pandas must be installed)
    verbose=True,  # visual output
)

result_df = guess_filetypes(
    files_folders=[
        r"C:\Users\hansc\Pictures\fastcpy",  # png file without ending
        r"C:\Users\hansc\Pictures\fastcpy - Copy.png",  # an actual png file, to check if files with the correct ending are ignored
        r"C:\Users\hansc\Pictures\cppcomp.jpg",  # a .txt file with the wrong ending
        r"E:\destinationcopytemp5",  # internet cache files - whole folder will be scanned
    ],
    maxsubfolders=-1,
    pandas_dataframe=True,  # return a pandas DataFrame instead of a list (pandas must be installed)
    verbose=True,
)
    output:
    [[['C:\\Users\\hansc\\Pictures\\fastcpy', 'image/png', 'charset=binary', ('png',), ('C:\\Users\\hansc\\Pictures\\fastcpy.png',)]]]
    [[['C:\\Users\\hansc\\Pictures\\fastcpy - Copy.png', 'image/png', 'charset=binary', ('png',), ('C:\\Users\\hansc\\Pictures\\fastcpy - Copy.png',)]]]
    [[['C:\\Users\\hansc\\Pictures\\cppcomp.jpg', 'text/plain', 'charset=us-ascii', ('conf', 'def', 'in', 'ini', 'list', 'log', 'text', 'txt'), ('C:\\Users\\hansc\\Pictures\\cppcomp.jpg.conf', 'C:\\Users\\hansc\\Pictures\\cppcomp.jpg.def', 'C:\\Users\\hansc\\Pictures\\cppcomp.jpg.in', 'C:\\Users\\hansc\\Pictures\\cppcomp.jpg.ini', 'C:\\Users\\hansc\\Pictures\\cppcomp.jpg.list', 'C:\\Users\\hansc\\Pictures\\cppcomp.jpg.log', 'C:\\Users\\hansc\\Pictures\\cppcomp.jpg.text', 'C:\\Users\\hansc\\Pictures\\cppcomp.jpg.txt')]]]
    [[['E:\\destinationcopytemp5\\00000000__2023_05_13_22_33_46\\Users\\hansc\\AppData\\Local\\Temp\\RBX-64BA6ED6.log', 'text/plain', 'charset=us-ascii', ('conf', 'def', 'in', 'ini', 'list', 'log', 'text', 'txt'), ('E:\\destinationcopytemp5\\00000000__2023_05_13_22_33_46\\Users\\hansc\\AppData\\Local\\Temp\\RBX-64BA6ED6.log',)]]]
    [[['E:\\destinationcopytemp5\\00000000__2023_05_13_22_33_46\\Users\\hansc\\AppData\\Local\\Temp\\RBX-DF39BC9A.log', 'text/plain', 'charset=us-ascii', ('conf', 'def', 'in', 'ini', 'list', 'log', 'text', 'txt'), ('E:\\destinationcopytemp5\\00000000__2023_05_13_22_33_46\\Users\\hansc\\AppData\\Local\\Temp\\RBX-DF39BC9A.log',)]]]


       aa_filename                                                                                             aa_mime     aa_encoding       aa_possible_extensions                      aa_possible_filenames
    0  C:\Users\hansc\Pictures\fastcpy                                                                         image/png   charset=binary    (png,)                                      (C:\Users\hansc\Pictures\fastcpy.png,)
    1  C:\Users\hansc\Pictures\fastcpy - Copy.png                                                              image/png   charset=binary    (png,)                                      (C:\Users\hansc\Pictures\fastcpy - Copy.png,)
    2  C:\Users\hansc\Pictures\cppcomp.jpg                                                                     text/plain  charset=us-ascii  (conf, def, in, ini, list, log, text, txt)  (C:\Users\hansc\Pictures\cppcomp.jpg.conf, C:\Users\hansc\Pictures\cppcomp.jpg.def, C:\Users\hansc\Pictures\cppcomp.jpg.in, C:\Users\hansc\Pictures\cppcomp.jpg.ini, C:\Users\hansc\Pictures\cppcomp.jpg.list, C:\Users\hansc\Pictures\cppcomp.jpg.log, C:\Users\hansc\Pictures\cppcomp.jpg.text, C:\Users\hansc\Pictures\cppcomp.jpg.txt)
    3  E:\destinationcopytemp5\00000000__2023_05_13_22_33_46\Users\hansc\AppData\Local\Temp\RBX-64BA6ED6.log  text/plain  charset=us-ascii  (conf, def, in, ini, list, log, text, txt)  (E:\destinationcopytemp5\00000000__2023_05_13_22_33_46\Users\hansc\AppData\Local\Temp\RBX-64BA6ED6.log,)
    4  E:\destinationcopytemp5\00000000__2023_05_13_22_33_46\Users\hansc\AppData\Local\Temp\RBX-DF39BC9A.log  text/plain  charset=us-ascii  (conf, def, in, ini, list, log, text, txt)  (E:\destinationcopytemp5\00000000__2023_05_13_22_33_46\Users\hansc\AppData\Local\Temp\RBX-DF39BC9A.log,)
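The output above suggests a simple rule for aa_possible_filenames: if the current extension is already among the possible extensions, the original name is kept (fastcpy - Copy.png, the .log files); otherwise every possible extension is appended to the full name (fastcpy.png, cppcomp.jpg.txt). The sketch below reproduces that rule and uses it to list files whose names do not match their detected type. The rule is inferred from the example output only, not from the package source, and `possible_filenames` and `misnamed_files` are hypothetical names.

```python
import os
from typing import List, Sequence, Tuple


def possible_filenames(path: str, extensions: Sequence[str]) -> Tuple[str, ...]:
    """Hypothetical reconstruction of the aa_possible_filenames column."""
    current_ext = os.path.splitext(path)[1].lstrip(".").lower()
    if current_ext in extensions:
        # The current extension already fits the detected type: keep the name unchanged
        return (path,)
    # Otherwise append every candidate extension to the full name
    # (e.g. fastcpy -> fastcpy.png, cppcomp.jpg -> cppcomp.jpg.txt)
    return tuple(f"{path}.{ext}" for ext in extensions)


def misnamed_files(rows: Sequence[Sequence[object]]) -> List[str]:
    """Return the paths whose current name is not among their possible filenames."""
    # Assumed row layout, as in the printed output:
    # [filename, mime, encoding, possible_extensions, possible_filenames]
    return [str(row[0]) for row in rows if row[0] not in row[4]]
```

Applied to the five rows printed above, misnamed_files would flag fastcpy and cppcomp.jpg, the two files whose extension is missing or wrong.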



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

get_file_type-0.10.tar.gz (577.5 kB)

Uploaded Source

Built Distribution

get_file_type-0.10-py3-none-any.whl (616.8 kB)

Uploaded Python 3

File details

Details for the file get_file_type-0.10.tar.gz.

File metadata

  • Download URL: get_file_type-0.10.tar.gz
  • Upload date:
  • Size: 577.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.10

File hashes

Hashes for get_file_type-0.10.tar.gz
Algorithm    Hash digest
SHA256       f77b21b0bf9f4a2f19092c1f78e2022d3ebfcfbf577474990e2cd111cd7af763
MD5          b8567f842dead695083e5a8b54040e0d
BLAKE2b-256  ef8bf7158c12a7435e2bbd1b892fbc256abe45cdb76aba7f3983775284e3a198
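The SHA256 digests listed here can be checked locally before installing. A minimal sketch using only the standard library (hashlib, hmac); `sha256_matches` is a hypothetical helper name.

```python
import hashlib
import hmac


def sha256_matches(path: str, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the SHA256 digest of the file at path equals expected_hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large archives need not fit in memory
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    # Constant-time comparison of the two hex strings
    return hmac.compare_digest(digest.hexdigest(), expected_hex.lower())
```

For an unmodified download, `sha256_matches("get_file_type-0.10.tar.gz", "f77b21b0bf9f4a2f19092c1f78e2022d3ebfcfbf577474990e2cd111cd7af763")` is expected to return True. pip's hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file) performs the same check during installation.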


File details

Details for the file get_file_type-0.10-py3-none-any.whl.

File metadata

  • Download URL: get_file_type-0.10-py3-none-any.whl
  • Upload date:
  • Size: 616.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.10

File hashes

Hashes for get_file_type-0.10-py3-none-any.whl
Algorithm    Hash digest
SHA256       fc33ee2953c5103e88e6100499021793596b54cd6b5f3d17235af80d074d713f
MD5          f9761ee29ef866a2a6ff8644b387456d
BLAKE2b-256  cd2713062797d98d44fc3bf2413f3381a8dd51d44cc4a65cd6de8c97d93e57cf

