A file integrity verifier based on the format of the file.

integv

integv is a file integrity verifier based on the format of the file. It can check the integrity of several types of files without any additional information such as Content-Length or a checksum. Its main goal is to detect file corruption (mostly truncation) caused by network glitches during download, but it can be used for other purposes as well.

Installation

pip install integv

Why integv

Sometimes when you download media files using requests, a network glitch happens and the downloaded file is corrupted. If there's a Content-Length header, you can compare it to the size of the downloaded file. Most of the time, however, media files are served using HTTP chunked transfer encoding, and there's no Content-Length header, so you can't tell whether the downloaded file is intact. That's where integv helps: feed it the downloaded file and it verifies the file's integrity without any other information such as Content-Length. All integv needs is the type of the file.
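
The workflow described above (download, verify, retry on failure) can be sketched as below. This is only an illustration: fetch_until_intact is a hypothetical helper, not part of integv, and it takes the download and verification functions as parameters so that it does not assume any particular HTTP client.

```python
from typing import Callable


def fetch_until_intact(
    fetch: Callable[[], bytes],
    verify: Callable[..., bool],
    file_type: str,
    attempts: int = 3,
) -> bytes:
    """Call fetch() until verify() accepts the payload or attempts run out.

    fetch_until_intact is a hypothetical helper written for this example.
    verify is expected to accept (data, file_type=...) and return a bool.
    """
    for _ in range(attempts):
        data = fetch()
        if verify(data, file_type=file_type):
            return data
    raise IOError(f"no intact {file_type} file after {attempts} attempts")
```

With requests and integv installed, a call could look like fetch_until_intact(lambda: requests.get(url).content, integv.verify, "mp4"), where url is a placeholder for your own download address.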

integv has two main advantages.

  1. integv is light. It is written in pure Python with zero dependencies, which makes it portable and easy to integrate into your project.

  2. integv is fast. It does not try to decode the file; it only checks the key points in the file, so it is much faster than solutions that decode the file.

Here's a comparison of verifying a 70 MB mp4 file with integv and with FFmpeg. integv takes about 60 microseconds per run, while FFmpeg takes about 11 seconds.

python3 -m timeit "import integv;integv.FileIntegrityVerifier().verify('../test.mp4')"
5000 loops, best of 5: 61.4 usec per loop

python3 -m timeit "import subprocess;subprocess.run('ffmpeg -v error -i ../test.mp4 -f null -', shell=True)"
1 loop, best of 5: 11.2 sec per loop
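
To give a feel for why a structural check can be this cheap, here is a rough sketch (not integv's actual code). RIFF containers such as wav, avi and webp store the size of the remaining data in bytes 4 to 7 of the header, so a truncation check is a single comparison regardless of file size. The names riff_length_matches and make_riff are made up for this example, and the sketch ignores padding and trailing data.

```python
import struct


def riff_length_matches(data: bytes) -> bool:
    """Return True if the size declared in the RIFF header matches the data.

    Illustrative only: real files may carry padding or trailing bytes that
    a production verifier would have to handle.
    """
    if len(data) < 12 or data[:4] != b"RIFF":
        return False
    # Bytes 4..7: little-endian size of everything after the 8-byte header.
    (declared,) = struct.unpack_from("<I", data, 4)
    return len(data) == 8 + declared


def make_riff(payload: bytes) -> bytes:
    """Build a minimal RIFF container around payload (test data only)."""
    return b"RIFF" + struct.pack("<I", len(payload)) + payload


sample = make_riff(b"WAVE" + b"\x00" * 20)
intact_ok = riff_length_matches(sample)          # header size matches
truncated_ok = riff_length_matches(sample[:-1])  # one byte short
```

The check reads only the first eight bytes and the total length, which is why no decoding is needed to notice a shortened file.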

Quick Start

import integv

# load a test mp4 file
file_path = "./test/sample/video/sample.mp4"
with open(file_path, "rb") as f:
    file = f.read()

# verify using the file and file_type
# file_type can be a simple filename extension like "mp4" or "jpg"
# or you can provide a full MIME type like "video/mp4" or "image/jpeg"
integv.verify(file, file_type="mp4") # True

# a corrupted file (in this case, shortened by one byte) will not pass the verification
integv.verify(file[:-1], file_type="mp4") # False

# the file input for the verifier can be bytes or a binary file-like object
with open(file_path, "rb") as f:
    integv.verify(f, file_type="mp4") # True

# it can also be a string representing a file path
# if the file path contains a proper filename extension, the file_type is not needed.
integv.verify(file_path) # True

Supported types

Video

  • mp4: video/mp4
  • mkv: video/x-matroska
  • webm: video/webm
  • avi: video/vnd.avi
  • flv*: video/x-flv

* Not f4v. An f4v file is essentially an mp4 file with a different extension, so use the mp4 integrity verifier for f4v files.

Image

  • jpeg: image/jpeg
  • png: image/png
  • gif: image/gif
  • webp: image/webp

Audio

  • wav: audio/x-wav
  • ogg: audio/ogg

Limitation of integv

An integv verifier checks a file only by the format information embedded in it, such as the file size in the header, the chunk size in each chunk header, and end-of-file markers. It does not try to decode the file, which keeps integv fast and simple. But that also means false negatives are possible (some corrupted files go undetected). As a baseline, every integv verifier must be highly sensitive to shortened files, which are very common among files downloaded from the network. Some formats, such as png, embed checksums, which makes their verification more reliable. Do not use integv for any kind of security verification, because a bad file that passes verification can easily be forged.
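
As an illustration of what checking embedded format information can look like, the sketch below walks the chunks of a png file using only facts from the PNG format: an 8-byte signature, then chunks made of a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 over type and data, ending with an IEND chunk. It is a simplified example under those assumptions and not integv's implementation. The names png_structure_ok and make_chunk are invented here.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_structure_ok(data: bytes) -> bool:
    """Walk PNG chunks, checking lengths, CRCs and the final IEND chunk."""
    if not data.startswith(PNG_SIGNATURE):
        return False
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        (length,) = struct.unpack_from(">I", data, pos)
        end = pos + 12 + length  # length field + type + data + CRC
        if end > len(data):
            return False  # chunk runs past the end: file was shortened
        chunk_type = data[pos + 4 : pos + 8]
        body = data[pos + 8 : pos + 8 + length]
        (stored_crc,) = struct.unpack_from(">I", data, end - 4)
        if zlib.crc32(chunk_type + body) != stored_crc:
            return False  # content was altered
        if chunk_type == b"IEND":
            return end == len(data)  # nothing may follow IEND
        pos = end
    return False  # ran out of data before reaching IEND


def make_chunk(chunk_type: bytes, body: bytes = b"") -> bytes:
    """Build one well-formed chunk (test data only)."""
    crc = zlib.crc32(chunk_type + body)
    return struct.pack(">I", len(body)) + chunk_type + body + struct.pack(">I", crc)


# Structurally valid but not a decodable image: the checker never decodes.
sample = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", b"\x00" * 13)
    + make_chunk(b"IDAT", b"payload")
    + make_chunk(b"IEND")
)
flipped = sample[:42] + bytes([sample[42] ^ 0xFF]) + sample[43:]
```

Note that the sample passes even though it is not a real image, and anyone can recompute the CRCs after tampering. Both points are consistent with the warning above about false negatives and security.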

Effectiveness of integv on different types of corruption

Types of corruption:

  • Small Deletion at the End of the file. (SDE)

A few bytes of data were deleted at the end of the file. The length of the file is reduced.

Original file:  ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
Corrupted file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXY
  • Large Deletion at the End of the file. (LDE)

A large chunk of data was deleted at the end of the file. The length of the file is reduced.

Original file:  ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
Corrupted file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNO
  • Small Substitution at the End of the file. (SSE)

A few bytes of data were substituted at the end of the file. The length of the file remains the same.

Original file:  ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
Corrupted file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYA
  • Large Substitution at the End of the file. (LSE)

A large chunk of data was substituted at the end of the file. The length of the file remains the same.

Original file:  ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
Corrupted file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNAAAAAAAAAAAA
  • Small Deletion at a Random position of the file. (SDR)

A few bytes of data were deleted at a random position of the file. The length of the file is reduced.

Original file:  ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
Corrupted file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLNOPQRSTUVWXYZ
                                                      ^
  • Large Deletion at a Random position of the file. (LDR)

A large chunk of data was deleted at a random position of the file. The length of the file is reduced.

Original file:  ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
Corrupted file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLYZ
                                                      ^
  • Small Substitution at a Random position of the file. (SSR)

A few bytes of data were substituted at a random position of the file. The length of the file remains the same.

Original file:  ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
Corrupted file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLANOPQRSTUVWXYZ
                                                      ^
  • Large Substitution at a Random position of the file. (LSR)

A large chunk of data was substituted at a random position of the file. The length of the file remains the same.

Original file:  ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
Corrupted file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLAAAAAAAAAAWXYZ
                                                      ^
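
The eight corruption types above can be reproduced on any byte string with four small helpers, which is handy for testing a verifier against your own sample files. The helper names below are made up for this sketch, and the "small" versus "large" distinction is simply the value of n you pass in.

```python
def delete_end(data: bytes, n: int) -> bytes:
    """SDE / LDE: drop the last n bytes."""
    return data[: len(data) - n]


def substitute_end(data: bytes, n: int, filler: bytes = b"A") -> bytes:
    """SSE / LSE: overwrite the last n bytes, keeping the length."""
    return data[: len(data) - n] + filler * n


def delete_at(data: bytes, pos: int, n: int) -> bytes:
    """SDR / LDR: drop n bytes starting at pos."""
    return data[:pos] + data[pos + n :]


def substitute_at(data: bytes, pos: int, n: int, filler: bytes = b"A") -> bytes:
    """SSR / LSR: overwrite n bytes starting at pos, keeping the length."""
    return data[:pos] + filler * n + data[pos + n :]


original = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ" * 2
sde = delete_end(original, 1)         # last byte removed
sdr = delete_at(original, 38, 1)      # the second "M" removed
ssr = substitute_at(original, 38, 1)  # the second "M" replaced by "A"
```

Position 38 is the second "M" in the sample string, the same spot the caret marks in the examples above.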

Effectiveness Table

In my personal experience, the most common types of corruption when downloading files with requests or similar libraries are SDE and LDE.

:smiley: means integv is effective against (detects) that type of corruption; :frowning: means it is not.

Format     SDE         LDE         SSE         LSE         SDR         LDR         SSR         LSR
mp4        :smiley:    :smiley:    :frowning:  :smiley:    :smiley:    :smiley:    :frowning:  :smiley:
mkv        :smiley:    :smiley:    :frowning:  :smiley:    :smiley:    :smiley:    :frowning:  :smiley:
webm       :smiley:    :smiley:    :frowning:  :smiley:    :smiley:    :smiley:    :frowning:  :smiley:
avi        :smiley:    :smiley:    :frowning:  :frowning:  :smiley:    :smiley:    :frowning:  :frowning:
flv        :smiley:    :smiley:    :smiley:    :smiley:    :smiley:    :smiley:    :frowning:  :smiley:
jpeg       :smiley:    :smiley:    :smiley:    :smiley:    :frowning:  :frowning:  :frowning:  :frowning:
png        :smiley:    :smiley:    :smiley:    :smiley:    :smiley:    :smiley:    :smiley:    :smiley:
gif        :smiley:    :smiley:    :smiley:    :smiley:    :frowning:  :frowning:  :frowning:  :frowning:
webp       :smiley:    :smiley:    :frowning:  :frowning:  :smiley:    :smiley:    :frowning:  :frowning:
wav        :smiley:    :smiley:    :frowning:  :frowning:  :smiley:    :smiley:    :frowning:  :frowning:
ogg        :smiley:    :smiley:    :frowning:  :smiley:    :smiley:    :smiley:    :frowning:  :smiley:
ogg(slow)  :smiley:    :smiley:    :smiley:    :smiley:    :smiley:    :smiley:    :smiley:    :smiley:
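
The pattern in the jpeg row (end-of-file corruption caught, mid-file corruption missed) is what one would expect from a check that leans on start and end markers. The toy checker below assumes exactly that and nothing more: it only looks for the JPEG start-of-image marker FF D8 and end-of-image marker FF D9. It is meant to show the mechanism and is not necessarily how integv verifies jpeg files.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def has_jpeg_markers(data: bytes) -> bool:
    """Toy check: the data starts with SOI and ends with EOI."""
    return len(data) >= 4 and data.startswith(SOI) and data.endswith(EOI)


# Stand-in payload between the markers; not a decodable image.
sample = SOI + bytes(range(32)) + EOI

# True means the corrupted data still passes, i.e. the corruption is missed.
results = {
    "SDE": has_jpeg_markers(sample[:-1]),
    "SSE": has_jpeg_markers(sample[:-1] + b"\x00"),
    "SDR": has_jpeg_markers(sample[:10] + sample[11:]),
    "SSR": has_jpeg_markers(sample[:10] + b"\xaa" + sample[11:]),
}
```

Damage at the end of the data breaks the end marker and is caught, while a deletion or substitution in the middle leaves both markers in place and slips through. Formats with per-chunk checksums, such as png, do not have this blind spot.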

Advanced Usage

Using a FileIntegrityVerifier object

You can use a FileIntegrityVerifier object to verify your file just like integv.verify.

from integv import FileIntegrityVerifier

verifier = FileIntegrityVerifier()
verifier.verify("./test/sample/video/sample.mp4") # True

Specialized File Integrity Verifier

There are specialized file integrity verifiers for different types of files. You can find them in integv.video, integv.image and integv.audio. They are used exactly like FileIntegrityVerifier, except that file_type is not needed.

from integv.video import MP4IntegrityVerifier

verifier = MP4IntegrityVerifier()
verifier.verify("./test/sample/video/sample.mp4") # True

Optional slow argument in verifier initialization

A boolean argument, slow, can be passed when initializing a verifier. It enables more sophisticated verification aimed at eliminating false negatives, at the cost of more time. The default value of slow is False. For now, only one verifier, OGGIntegrityVerifier, has a slow verification method.

from integv import FileIntegrityVerifier

verifier = FileIntegrityVerifier()
slow_verifier = FileIntegrityVerifier(slow=True)

file_path = "./test/sample/audio/sample.ogg"
verifier.verify(file_path) # True
slow_verifier.verify(file_path) # also True, but slower

