A decorator that connects django model and data generator function

These details have not been verified by PyPI

Project links

Homepage

Project description

django-pipe2db

django-pipe2db

Concepts

A decorator that written by wrapping orm method of django models
It maps the relationship between the models and data via nested dictionary

Features

It bridges Python functions and django models
Create and update data to database via models
Automatically create and modify tables by wrapping manage.py commands from django as makemigrations and migrate
Load minimum django settings for can use django orm as standalone that without using the django project
Insertion of data with the same relationship as foreignkey and manytomany fields
Inserting a content file object as an image field

Install and Import

pip install django-pipe2db

# crawler.py
from pipe2db import pipe
from pipe2db import setupdb

Quick Start

1. Using django orm as standalone

Create models.py in the directory that will be used as the Django app
example for minimum project directory structure. see

Project
│  __main__.py
│
└─bookstore
    │  insert.py
    │  
    └─db
          models.py

# models.py
from django.db import models


class Author(models.Model):
    email = models.EmailField('Email', unique=True)
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)
    date_of_birth = models.DateField(null=True, blank=True)
    date_of_death = models.DateField('Died', null=True, blank=True)

    class Meta:
        db_table = 'author'

# insert.py
from pipe2db import pipe, setupdb


setupdb() # find models automatically
# setupdb('bookstore.db') # or more explicitly 

# The key of the data and the field names of the model are matched
author1 = {
    'email': 'xman1@google.com',
    'first_name': 'charse',
    'last_name': 'javie',
    'date_of_birth': '1975-07-25',
    'date_of_death': '1995-07-11'
}
author2 = {
    'email': 'yman1@google.com',
    'first_name': 'jin',
    'last_name': 'gray',
    'date_of_birth': '1925-07-25',
    'date_of_death': '1999-01-21'
}


@pipe({
    'model': 'db.Author', 
    'unique_key': 'email', # unique values of table as pk
    # 'method': 'update' # If uncomment, works in update mode
})
def insert(*args, Author, **kwargs):
    # You Can get model class via argumenting at generator function

    # from django.apps import apps # or via get_model method of django
    # Author = apps.get_model('db.Author') 

    queryset = Author.objects.all()

    yield from [author1, author2, author3]

run examples

python bookstore/insert.py

2. Using with django project

Since DJANGO_SETTINGS_MODULE is already setted, it's not need to call setupdb
django site example

run via shell which excuted by 'python manage.py shell' command of django manage
python manage.py shell
In [1]: from yourpackage.insert import insert
In [2]: insert()

id	email	first_name	last_name	date_of_birth	date_of_death
1	xman1@google.com	charse	javie	1975-07-25	1995-07-11
2	yman1@google.com	jin	gray	1925-07-25	1999-01-21
3	batman1@google.com	wolverin	jack	1988-07-25	NULL

Useage

Argument of pipe decorator as context

A context is a dictionary that describes the relationship between the model and the data
In the following examples, the elements that make up the context are explained step by step

model

django model to pipe data written as string literals

# some_crawler.py
from pip2db import pipe

@pipe({
    'model': 'db.Author'
    # 'model': 'yourapp.YourModel' on django project
})
def abc_crawler():
    ...
    yield row

It is also a good way to assign and use a variable to increase reusability When expressing nested relationships in relational data, not assigning them as variables can result in repeatedly creating the same context.

# assign to variable crawler.py

# It seems to better way
context_author = {
    'model': 'db.Author'
}

@pipe(context_author)
def abcd_crawler(*args, **kwargs):
    yield ..

It is also possible to specify the model by directly importing it, but in the case of standalone, you must declare setupdb before importing the model

# dose not look good.py

from pipe2db import setupdb, pipe

setupdb()
from .db.models import Author

context_author = {'model': Author}

@pipe(context_author)
def abc():
    yield ..

Another way to refer to the model class

Using Django's apps module

from django.apps import apps

Author = apps.get_model('db.Author')

Specify the model name as an argument to the generator function

# An example of controlling a generator based on data in a database
@pipe(context_author)
def abc_crawler(rows, response, Author):
    visited = Author.objects.values_list('review_id', flat=True)
    for row in rows:
        if row['id'] in visited:
            break
        yield row

unique_key

key to identify data like as primary key
If you don't specify it, creating data will be duplicated
To identify data with one or several keys as unique_together

# models.py

# unique key model
class Author(models.Model):
    ...
    first_name = models.CharField(max_length=100, unique=True)
    ...

# uniqufy_by_one.py

context_author = {
    'model': 'db.Author',
    'unique_key': 'first_name'
}

If uniqueness is not guaranteed with one key, add another

# models.py

# unique together model
class Author(models.Model):
   ...
   first_name = models.CharField(max_length=100)
   last_name = models.CharField(max_length=100)

   class Meta:
       unique_together = ['first_name', 'last_name']
   ...

#unique_together.py

context_author = {
   'model': 'db.Author',
   'unique_key': ['first_name', 'last_name']
}

method

Creates or updates data with a unique key specified
Defaults is create
In create mode, data is inserted based on unique.
In update mode as wrapper update_or_create of django method, creates records if they don't exist, otherwise modifies existing records

# incorrect create.py
from pipe2db import pipe

author_incorrect = {
    'email': 'batman1@google.com',
    'first_name': 'who', # incorrect
    'last_name': 'jackman',
    'date_of_birth': '1988-07-25', # incorrect
    'date_of_death': None
}

context = {
    'model': 'db.Author',
    'unique_key': 'email',
    # 'method': 'create' no need to specify if create
}

@pipe(context)
def gen_author(...):
    yield author_incorrect

result table

id email first_name last_name date_of_birth date_of_death

3 batman1@google.com who jackman 1988-07-25 NULL

id	email	first_name	last_name	date_of_birth	date_of_death
3	batman1@google.com	who	jackman	1988-07-25	NULL

# correct as update.py
from pipe2db import pipe

author_corrected = {
    'email': 'batman1@google.com',
    'first_name': 'Hugh', # correct
    'last_name': 'jackman',
    'date_of_birth': '1968-10-12', # correct
    'date_of_death': None
}

context = {
    'model': 'db.Author',
    'unique_key': 'email',
    'method': 'update', # for update record by corrected data
}

@pipe(context)
def gen_author(...):
    yield author_corrected

result table

id email first_name last_name date_of_birth date_of_death

3 batman1@google.com Hugh jackman 1968-10-12 NULL

id	email	first_name	last_name	date_of_birth	date_of_death
3	batman1@google.com	Hugh	jackman	1968-10-12	NULL

rename_fields

Dictionary of between data and model as key:field mapping
Used when the data key and the model field name are different

# models.py
from django.db import models


class Author(models.Models):
    ...
    ...

class Book(models.Model):
    title = models.CharField(max_length=200) 
    isbn = models.CharField('ISBN', max_length=13, unique=True)

    class Meta:
        db_table = 'book'

# book_crawler.py

context = {
    'model': 'db.Book',
    'unique_key': 'isbn',
    'rename_fields': {
        'header' : 'title', 
        'book_id': 'isbn',
    }
}
# map header -> title, book_id -> isbn

@pipe(context)
def book_crawler(abc, defg, jkl=None):
    book_list = [
        {
            'header': 'oh happy day', # header to title
            'book_id': '1234640841',
        },
        {
            'header': 'oh happy day',
            'book_id': '9214644250',
        },
    ]
    yield from book_list

exclude_fields

List of keys to excluds
Used when the data has a key that is not in the field names in the model
Filter too much information from data that model cannot consume

# bookcrawler.py
from pipe2db import pipe
...
...

context = {
    'model': 'db.Book',
    'unique_key': 'isbn',
    'rename_fields': {
        'header' : 'title', 
        'book_id': 'isbn',
    },
    'exclude_fields': ['status'] # exclude
}

@pipe(context)
def book_crawler(abc, defg, jkl=None):
    book_list = [
        {
            'header': 'oh happy day', # header to title
            'book_id': '1234640841',
            'status': 'on sales', # status is not needed in Book model
        },
        {
            'header': 'oh happy day',
            'book_id': '9214644250',
            'sstatus': 'no stock',
        },
    ]
    yield from book_list

Mapping of Relative Data

foreignkey_fields

Creat records by generation according to the foreign key relationship between tables
Recursively nest parent dict to children dict
There are two way of create relationship data

# models.py
# two models of related with foreign key
from django.db import models


class Author(models.Model):
    email = models.EmailField('Email', unique=True)
    name = models.CharField(max_length=100)

    class Meta:
        db_table = 'author'


class Book(models.Model):
    author = models.ForeignKey('Author', on_delete=models.CASCADE, null=True) # fk
    isbn = models.CharField('ISBN', max_length=13, unique=True)
    title = models.CharField(max_length=200)

    class Meta:
        db_table = 'book'

# some crawler.py
from pipe2db import pipe

# 1. Generate data of book author nested

context_author = {
    'model': 'db.Author',
    'unique_key': 'email',
    'method': 'update'
}

context_book = {
    'model': 'db.Book',
    'unique_key': 'isbn',
    'foreignkey_fields': {
        'book': context_author
    }
}

# author data is nested in book data
@pipe(context_book)
def parse_book():
    author1 = {
        'email': 'pbr112@naver.com',
        'name': 'hs moon',
    }
    book = {
        'author': author1,
        'title': 'django-pipe2db',
        'isbn': '291803928123'
    }
    yield book

# some crawler.py 
from pipe2db import pipe

# 2. Generate data of author and book sequentially

@pipe(context_author)
def parse_author():
    author1 = {
        'email': 'pbr112@naver.com',
        'name': 'hs moon',
    }
    yield author1

# create author first
author1 = parse_author()

# create book after and connect fk relation to author
@pipe(context_book)
def parse_book():
    book = {
        'author': author1['email'], # Since the author has already been created, it possible to pass email as pk of author only
        # 'author': author1, # or same as above
        'title': 'django-pipe2db',
        'isbn': '291803928123'
    }
    yield book

manytomany_fields

Create data for manytomany relationships
Generate data with nesting the children m2m data in the parent data key in the form of a list

# models.py 
from django.db import models


class Book(models.Model):
    title = models.CharField(max_length=200)
    isbn = models.CharField('ISBN', max_length=13, unique=True)

    genre = models.ManyToManyField('db.Genre')

    class Meta:
        db_table = 'book'


class Genre(models.Model):
    name = models.CharField(max_length=200, unique=True)

    class Meta:
        db_table = 'genre'

# m2m_generator.py
from pipe2db import pipe

context_genre = {
    'model': 'db.Genre',
    'unique_key': 'name'
}

context_book = {
    'model': 'db.Book',
    'unique_key': 'isbn',
    'manytomany_fields': {
        'genre': context_genre
    }
}

@pipe(context_book)
def gen_book_with_genre():
    genre1 = {'name': 'action'}
    genre2 = {'name': 'fantasy'}

    book1 = {
        'title': 'oh happy day', 'isbn': '2828233644', 'genre': [genre2], # nest genres to list
    }
    book2 = {
        'title': 'python', 'isbn': '9875230846', 'genre': [genre1, genre2],
    }
    book3 = {
        'title': 'java', 'isbn': '1234640841', # has no genre
    }
    yield from [book1, book2, book3]

See complicate context and data nested level example

Create record with contentfiles

contentfile_fields

Saving file via ContentFile class from django.core.files module
source_url_field is specified as meta data for determinding file name

# models.py
from django.db import models

class BookImage(models.Model):
    img = models.ImageField()

    class Meta:
        db_table = 'bookimage'

from pipe2db import pipe

@pipe({
    'model': 'db.BookImage',
    'contentfile_fields': {
        'img': {
            'source_url_field': 'src',
        }
    },
    'exclude_fields': ['src'] # when model dose not need src data
})
def image_crawler(response):
    image_data = {
        'img': 'response_content',
        'src': response.url #  needed for extracting filename as source_url_field
    }
    yield image_data

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

This version

1.0.3

Aug 2, 2022

1.0.2

Jul 26, 2022

1.0.1

Jul 25, 2022

0.0.8

Jul 23, 2022

0.0.7

Jul 23, 2022

0.0.6

Jul 23, 2022

0.0.5

Jul 22, 2022

0.0.4

May 11, 2022

0.0.3

Aug 28, 2021

0.0.2

Aug 21, 2021

0.0.1

Aug 18, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

django-pipe2db-1.0.3.tar.gz (15.1 kB view details)

Uploaded Aug 2, 2022 Source

File details

Details for the file django-pipe2db-1.0.3.tar.gz.

File metadata

Download URL: django-pipe2db-1.0.3.tar.gz
Upload date: Aug 2, 2022
Size: 15.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.8.9

File hashes

Hashes for django-pipe2db-1.0.3.tar.gz
Algorithm	Hash digest
SHA256	`01719bf9ef3d40bae5823585520d545fbbab97d6ad924f1ff93c21365fbadf0f`
MD5	`1a2ea7b30a1c2a94546fefb1e3a4952a`
BLAKE2b-256	`799ec62cb230ff160a29063830171c910c0c013a9f737a8ac1371cf71a31b2e3`

See more details on using hashes here.

django-pipe2db 1.0.3

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

django-pipe2db

Contents

Concepts

Features

Install and Import

Quick Start

1. Using django orm as standalone

2. Using with django project

Useage

Argument of pipe decorator as context

model

unique_key

method

rename_fields

exclude_fields

foreignkey_fields

manytomany_fields

contentfile_fields

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

File details

File metadata

File hashes