Synchronize the metadata from local files in the DB

Project description

Use this app to scan local files and store their metadata, and optionally their text content, in the database (FileMetadata model). Multiple directories can be configured for indexing (settings.FILEMETADATA_LOOKUP_DIRS). Once the information is registered in the DB, you can use Django features (filters, export, etc.) or other apps to work with the data. For example, this app can serve as a basis for implementing protected download pages or for making file contents searchable in a search tool.

This version supports Python 3.6+ and Django 2.2+.

Installation

Install the package with pip:

$ pip install django-filemetadata

Add the app to INSTALLED_APPS:

INSTALLED_APPS = (
    "filemetadata",
)

Execute makemigrations/migrate.

Configuration

Configure the directories to look for the files in the settings:

FILEMETADATA_LOOKUP_DIRS = ['/folder1/folder2', '/folder3/folder4']
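
If the indexed folders share a common parent, the list can be built from one base path instead of repeating it. The following is only a sketch: the base directory `/srv/files` and the sub-folder names are hypothetical, and it assumes the setting expects plain strings, as in the example above.

```python
from pathlib import PurePosixPath

# Hypothetical base directory, used only for illustration.
FILES_ROOT = PurePosixPath("/srv/files")

# The setting name comes from this README; the sub-folder names are made up.
# Paths are converted to str to match the string form shown above.
FILEMETADATA_LOOKUP_DIRS = [
    str(FILES_ROOT / "reports"),
    str(FILES_ROOT / "manuals"),
]
```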

Utilization

Index the data with the management command

usage:  filemetadata_index [-f FOLDERS] [-c] [-d] [-s] [-x] [-n] [-a]

Update the file metadata found in the directories into the DB.

optional arguments:
  -f FOLDERS            Folder(s) to index (comma separated)
  -c                    Clear the data before reindex
  -d                    Delete only the data from these folders and exit
  -s                    Index the symlinks (Do not follow it)
  -x                    Extract the content of the file (text)
  -n                    Non-reentrant mode (Not recursive)
  -a                    Abort on errors

For example:

Reindex the files configured in settings:

python manage.py filemetadata_index

Or specify the directories explicitly:

python manage.py filemetadata_index -f /folder1/folder2,/folder3

Or just delete the data from these folders (not recursive in this case):

python manage.py filemetadata_index -d -n -f /folder1/folder2,/folder3

Go to Admin and check the data in the FileMetadata model.
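
The command can also be triggered from Python code, for instance from a scheduled task, through Django's `call_command`. The sketch below is an assumption-laden illustration, not part of this package: the helper names `build_index_args` and `run_index` are hypothetical, and only the flags listed in the usage text above are used.

```python
def build_index_args(folders=None, clear=False, extract_text=False, recursive=True):
    """Translate keyword options into the command-line flags documented above."""
    args = []
    if folders:
        # -f takes a single comma-separated value.
        args += ["-f", ",".join(folders)]
    if clear:
        args.append("-c")  # clear the data before reindexing
    if extract_text:
        args.append("-x")  # extract the text content of the files
    if not recursive:
        args.append("-n")  # do not descend into sub-folders
    return args


def run_index(**options):
    """Hypothetical wrapper; requires a configured Django project."""
    # Imported lazily so that build_index_args works without Django installed.
    from django.core.management import call_command

    call_command("filemetadata_index", *build_index_args(**options))
```

Inside a configured project, `run_index(folders=["/folder1/folder2", "/folder3"], extract_text=True)` would correspond to `python manage.py filemetadata_index -f /folder1/folder2,/folder3 -x`.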

Customization

Support for .pdf files

This app is compatible with the 'PyPDF4' library. If PyPDF4 is installed, it can be used to extract the content of PDF files when needed.

Custom extractor

You can replace the function that extracts the contents of the files with a more specific one if necessary. To do this, override the function 'func_extract_text' in the indexer.py module:

from filemetadata import indexer

def my_extractor(posixpath_obj):
    ...
    return file_content

indexer.func_extract_text = my_extractor

or override the extract_text method of the FileIndexer class:

from filemetadata.indexer import FileIndexer

class MyFileIndexer(FileIndexer):
    def extract_text(self, file_obj):
        ...
        return file_content
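
As a minimal sketch of the first approach (replacing 'func_extract_text'), the extractor below reads only files with a known text suffix. The suffix list, the UTF-8 encoding, and the choice to return an empty string for other files are assumptions made for this illustration; the README does not state what the indexer expects for unsupported files.

```python
from pathlib import Path

# Hypothetical list of suffixes treated as plain text.
TEXT_SUFFIXES = {".txt", ".md", ".csv"}


def my_extractor(posixpath_obj):
    """Return the text of plain-text files, or an empty string otherwise."""
    path = Path(posixpath_obj)
    if path.suffix.lower() not in TEXT_SUFFIXES:
        # Assumption: an empty string means "no content extracted".
        return ""
    # Undecodable bytes are replaced rather than raising an error.
    return path.read_text(encoding="utf-8", errors="replace")


# Registration, as described above (needs the package installed):
# from filemetadata import indexer
# indexer.func_extract_text = my_extractor
```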

Tests

To run the tests

python load_tests.py
