The super storage backend
If making sure your file uploads are never duplicated is more important than organising your files into neat folders, you might want to try this package.
You can use the storage backend globally by adding the following to your Django settings:
DEFAULT_FILE_STORAGE = 'dedupebackend.storage.DedupedStorage'
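On Django 4.2 and later, the default storage is chosen through the `STORAGES` setting; `DEFAULT_FILE_STORAGE` was deprecated in 4.2 and removed in 5.1. A sketch of the equivalent setting follows. It assumes dedupebackend runs unchanged on those Django versions, which this README does not confirm.

```python
# settings.py (Django 4.2+). Assumption: dedupebackend works on these versions.
STORAGES = {
    "default": {
        # Same backend path as in the DEFAULT_FILE_STORAGE line above.
        "BACKEND": "dedupebackend.storage.DedupedStorage",
    },
    # Overriding STORAGES replaces the whole dict, so restate the
    # staticfiles entry (this is Django's own default value).
    "staticfiles": {
        "BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage",
    },
}
```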
If you want to use the other features offered by dedupebackend, you need to add dedupebackend to INSTALLED_APPS like this:
    INSTALLED_APPS = [
        'dedupebackend',  # position in the list does not matter
        # ...
    ]
Adding dedupebackend to INSTALLED_APPS gives you an admin page where you can check your uploaded files. As mentioned, dedupebackend just throws everything into one large folder, but that does not mean you cannot add structure to the storage, just not at the filesystem level. Add structure through relations to other models instead. Adding categories, for example, is easy enough:
    from django.db import models

    class FileCategory(models.Model):
        # Many-to-many, so one file can sit in several categories.
        files = models.ManyToManyField('dedupebackend.UniqueFile')
        name = models.TextField()
If you want to add a filter to the dedupebackend admin, try something like this:
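    from django.contrib import admin

    from dedupebackend.admin import UniqueFileAdmin
    from dedupebackend.models import UniqueFile

    # Replace the stock admin with one that also filters on category.
    admin.site.unregister(UniqueFile)

    class CategoryUniqueFileAdmin(UniqueFileAdmin):
        # tuple() keeps the concatenation valid if list_filter is a list.
        list_filter = tuple(UniqueFileAdmin.list_filter) + ('filecategory__name',)

    admin.site.register(UniqueFile, CategoryUniqueFileAdmin)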
That might need some work, I never tested it :p
There are some fields in dedupebackend you can use instead of the Django FileField and ImageField. They come with a picker that lets you select a file from the ones already uploaded.
Use something like this:
    from django.db import models

    from dedupebackend.fields import UniqueFileField, UniqueImageField

    class KoeHenkModel(models.Model):
        name = models.TextField()
        file = UniqueFileField("A normal file, nothing special")
        image = UniqueImageField("an image")
How does it work?
Well, for each uploaded file, dedupebackend creates a file on disk named after the hash of the file, much as git does (I actually tried to use libgit2 for this, but git is bad with deletions). Alongside that file, a table holds a record with some information about it. The primary key of this table is the hash value of the file, so adding duplicates is impossible, barring hash collisions.
The fields actually render a file form field on top of a foreign key model field. The storage backend returns the hash value as the file name, and it can return a file object when given such a hash value.
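To make the idea concrete, here is a minimal standalone sketch of hash-named storage. It is not dedupebackend's code: the class name `ContentAddressedStore`, the in-memory `records` dict standing in for the database table, and the choice of SHA-1 are all assumptions made for illustration.

```python
import hashlib
import os
import tempfile


class ContentAddressedStore:
    """Toy store (hypothetical): one flat folder, files named after their hash."""

    def __init__(self, root):
        self.root = root
        # Stand-in for the table; the hash is the key, like a primary key.
        self.records = {}

    def save(self, original_name, content):
        # SHA-1 is an assumption; the README does not say which hash is used.
        digest = hashlib.sha1(content).hexdigest()
        if digest not in self.records:
            # First time these bytes are seen: write them once.
            with open(os.path.join(self.root, digest), "wb") as fh:
                fh.write(content)
            self.records[digest] = {"original_name": original_name,
                                    "size": len(content)}
        # The hash doubles as the "file name" handed back to the caller.
        return digest

    def open(self, digest):
        # Given a hash, return a file object for the stored bytes.
        return open(os.path.join(self.root, digest), "rb")


root = tempfile.mkdtemp()
store = ContentAddressedStore(root)
first = store.save("report.pdf", b"same bytes")
second = store.save("copy-of-report.pdf", b"same bytes")
```

Both uploads come back with the same name and only one file ends up on disk, which is the whole point. Anything like folders or categories has to live in related records, not in the path.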