Module 2.2

The Data has Value -- Let's store it!

The web applications that we built in the previous part of this course handled and stored data within the application. This means that when the web server or the application is restarted, the data is lost.

Almost every web application has the need for storing and retrieving data. The data can be stored into the file system as files by the application, or the responsibility of storing the data can be given out to a separate application such as a database management system. Database management systems have been built with the explicit purpose of storing and retrieving data in a robust and efficient manner, and they can reside both on the same server as the web application or somewhere elsewhere on the internet.

Although there are vast differences between the term database management system and database, we will use the term 'database' to cover these both. Similarly, although there are many types of database systems, we will mostly focus on relational databases.

Python and SQLite

We will use SQLite with python since it is ridiculously easy to use. SQLite is a naive SQL database engine, which makes it very easy to use, for example one does not need to care about setting up database user rights because there are no users. While SQLite is an ideal engine for learning SQL, more serious projects should use more refined database engines such as MySQL or PostgreSQL. Note that all of them support vanilla SQL commands, the differences are within performance, extensions to SQL, as well as access rights.

SQLite stores its database in a file, for example db.sqlite. Accessing the file in Python can be as follows.

import sqlite3

conn = sqlite3.connect('db.sqlite')
cursor = conn.cursor()

Once the cursor has been established, we can use cursor.execute() to execute a single SQL command or cursor.executescript() to execute multiple SQL commands. If we modify the database, then we should save the changes with conn.commit(). For more information, see Python's sqlite library.


Objects and Databases

When working with classes, objects and databases, the need for transforming database query results into objects arises (see Object-relational mapping, ORM). As this is a very typical task, an abundance of tools have been created for the task.

ORM tools offer — among other things — the functionality needed to transform existing classes into database schemas and to build queries using objects. This has created a situation where large parts of the typical database interaction is no longer implemented by the developer, but by a framework. The developer effectively only needs to implement the database interaction in those parts, where the existing approaches are not sufficient.

Django Models

The convention is that persistent data structures that app wants to store in a database is defined The data structures should be defined a subclass of models.model.


from django.db import models

class Person(models.Model):
    name = models.TextField()
    age = models.IntegerField()

The class allows us to manipulate the database. We can add a new Person with

bob = Person.objects.create(name='Bob', age=42)
alice = Person.objects.create(name='Alice', age=37)

We can fetch all persons with

persons = Person.objects.all()

We can query the database with

bobs = Person.objects.filter(name='Bob')
middleaged = Person.objects.filter(age__gte=40)
alice = Person.objects.get(name='Alice') # Works only if there is a unique entry

For query notation, see the documentation.

We can update the entries (as long as we save them)

bob.age = 45

The models are saved in a database. The default database is a sqlite stored in db.sqlite file. The same database is also used for storing user sessions, as well as, registered users and admins (something that django provides as a built-in service).

Whenever the model specifications are changed, Django needs to be told to update the schema in db.sqlite. Two commands are required

python3 makemigrations
python3 migrate

The first command will make a migration Python file, located in pages/migrations and starting with a number, while the second command updates the database. During the early development cycle it may be the case that migrating the database is too cumbersome with the existing database. If there is no valuable information in db.sqlite a valid cop-out strategy is to delete the migration files and the database and sync the database from scratch. Naturally, this is not a good idea if the database has any valuable information.


Database Transactions

Transactions are used to verify that all the database operations in a group are executed, or that none of them are. Database management systems offer support for implementing transactions, but, as we often work outside the database, additional steps are needed.

Transactions matter also if there are multiple database users, which may lead to unintented consequences. Consider the following classic problem of transforming money from one account to another.

def transfer(sender, receiver, amount):
    acc1 = Account.objects.get(iban=sender)
    acc2 = Account.objects.get(iban=receiver)

    acc1.balance -= amount
    acc2.balance += amount

Consider two threads A and B calling transfer at the same time with the following sequence:

  • Thread A retrieves the accounts
  • Thread B retrieves the accounts
  • Thread B updates and saves the accounts
  • Thread A updates and saves the accounts

If both sender and receiver were the same, the net effect of this operation is that the action of Thread B will be overwritten by Thread A. While this sequence of events is unlikely to happen, we should obviously prevent it from happening in the first place.

In order to do that, Django provides transaction.atomic either as a decorator for a function

from django.db import transaction

def transfer(sender, receiver, amount):
    acc1 = Account.objects.get(iban=sender)
    acc2 = Account.objects.get(iban=receiver)

    acc1.balance -= amount
    acc2.balance += amount

or as a context manager within a function

from django.db import transaction

def transfer(sender, receiver, amount):
	with transaction.atomic():
		acc1 = Account.objects.get(iban=sender)
		acc2 = Account.objects.get(iban=receiver)

		acc1.balance -= amount
		acc2.balance += amount

What happens in the background is the following. Whenever transaction.atomic is reached Django sends BEGIN TRANSACTION command to SQLite (Actually it sends SAVEPOINT but it is almost the same thing). Once the function is done Django sends COMMIT to SQLite which completes the transaction.

SQLite does not allow any writes whenever there is an open (second) transaction. So in our previous example, both threads A and B will fail, by throwing an exception. More importantly, the fail will happen only during the commit. That is, the local objects acc1 and acc2 that are currently held in memory during the call of transfer are updated, and have different values, but the database itself is not updated.

SQLite locks the entire database, that is, as long as transaction is open no writes are possible. To see this effect in practice, we can open a connection to the database and open a transaction without closing it

$ sqlite3 src/db.sqlite3 
SQLite version 3.30.1 2019-10-10 20:19:45
Enter ".help" for usage hints.

As long as this connection is open, no other connection (manual or Django) can write to the database. This is a very conservative approach (and one of the downsides of SQLite). Fancier database engines do not lock the database, and select for update is probably needed to lock the objects that are about to be modified.


Handling object relationships

When working with databases, information in one table can refer to information in another table. A customer — for example — can have multiple orders, and each order points to a specific customer. In Django, the references for such a case would be written using a ForeignKey.

from django.db import models

from django.contrib.auth.models import User

class Account(models.Model):
    owner = models.ForeignKey(User, on_delete=models.CASCADE)
    iban = models.TextField()

Here User is a Django built-in model for users.

The above statement allows users to have multiple accounts. If only one account is allowed per user one can use OneToOneField.

We can access the User object from Account object with

acc = Account.objects.get(pk=0)
user = acc.owner

We can also cross-search accounts using owner's information

accounts_owned_by_johns = Account.objects.filter(owner__first_name='John')
You have reached the end of this section! Continue to the next section:

Remember to check your points from the ball on the bottom-right corner of the material!