The Data has Value -- Let's store it!
The web applications that we built in the previous part of this course handled and stored data within the application. This means that when the web server or the application is restarted, the data is lost.
Almost every web application has the need for storing and retrieving data. The data can be stored into the file system as files by the application, or the responsibility of storing the data can be given out to a separate application such as a database management system. Database management systems have been built with the explicit purpose of storing and retrieving data in a robust and efficient manner, and they can reside both on the same server as the web application or somewhere elsewhere on the internet.
Although there are vast differences between the term database management system and database, we will use the term 'database' to cover these both. Similarly, although there are many types of database systems, we will mostly focus on relational databases.
Python and SQLite
We will use SQLite with python since it is ridiculously easy to use. SQLite is a naive SQL database engine, which makes it very easy to use, for example one does not need to care about setting up database user rights because there are no users. While SQLite is an ideal engine for learning SQL, more serious projects should use more refined database engines such as MySQL or PostgreSQL. Note that all of them support vanilla SQL commands, the differences are within performance, extensions to SQL, as well as access rights.
SQLite stores its database in a file, for example db.sqlite
. Accessing the file
in Python can be as follows.
import sqlite3
conn = sqlite3.connect('db.sqlite')
cursor = conn.cursor()
Once the cursor has been established, we can use cursor.execute()
to execute a single
SQL command or cursor.executescript()
to execute multiple SQL commands. If we
modify the database, then we should save the changes with conn.commit()
.
For more information, see Python's sqlite library.
Objects and Databases
When working with classes, objects and databases, the need for transforming database query results into objects arises (see Object-relational mapping, ORM). As this is a very typical task, an abundance of tools have been created for the task.
ORM tools offer — among other things — the functionality needed to transform existing classes into database schemas and to build queries using objects. This has created a situation where large parts of the typical database interaction is no longer implemented by the developer, but by a framework. The developer effectively only needs to implement the database interaction in those parts, where the existing approaches are not sufficient.
Django Models
The convention is that persistent data structures that app wants to store in a database
is defined models.py
. The data structures should be defined a subclass of models.model
.
# models.py
from django.db import models
class Person(models.Model):
name = models.TextField()
age = models.IntegerField()
The class allows us to manipulate the database. We can add a new Person with
bob = Person.objects.create(name='Bob', age=42)
alice = Person.objects.create(name='Alice', age=37)
We can fetch all persons with
persons = Person.objects.all()
We can query the database with
bobs = Person.objects.filter(name='Bob')
middleaged = Person.objects.filter(age__gte=40)
alice = Person.objects.get(name='Alice') # Works only if there is a unique entry
For query notation, see the documentation.
We can update the entries (as long as we save them)
bob.age = 45
bob.save()
The models are saved in a database. The default database is a sqlite stored in
db.sqlite
file. The same database is also used for storing user sessions, as
well as, registered users and admins (something that django provides as a
built-in service).
Whenever the model specifications are changed, Django needs to be told to
update the schema in db.sqlite
. Two commands are required
python3 manage.py makemigrations
python3 manage.py migrate
The first command will make a migration Python file, located in
pages/migrations
and starting with a number, while the second command updates
the database. During the early development cycle it may be the case that
migrating the database is too cumbersome with the existing database. If there
is no valuable information in db.sqlite
a valid cop-out strategy is to delete
the migration files and the database and sync the database from scratch.
Naturally, this is not a good idea if the database has any valuable information.
Database Transactions
Transactions are used to verify that all the database operations in a group are executed, or that none of them are. Database management systems offer support for implementing transactions, but, as we often work outside the database, additional steps are needed.
Transactions matter also if there are multiple database users, which may lead to unintented consequences. Consider the following classic problem of transforming money from one account to another.
def transfer(sender, receiver, amount):
acc1 = Account.objects.get(iban=sender)
acc2 = Account.objects.get(iban=receiver)
acc1.balance -= amount
acc2.balance += amount
acc1.save()
acc2.save()
Consider two threads A and B calling transfer
at the same time with the following sequence:
- Thread A retrieves the accounts
- Thread B retrieves the accounts
- Thread B updates and saves the accounts
- Thread A updates and saves the accounts
If both sender and receiver were the same, the net effect of this operation is that the action of Thread B will be overwritten by Thread A. While this sequence of events is unlikely to happen, we should obviously prevent it from happening in the first place.
In order to do that, Django provides transaction.atomic
either as a decorator for a function
from django.db import transaction
@transaction.atomic
def transfer(sender, receiver, amount):
acc1 = Account.objects.get(iban=sender)
acc2 = Account.objects.get(iban=receiver)
acc1.balance -= amount
acc2.balance += amount
acc1.save()
acc2.save()
or as a context manager within a function
from django.db import transaction
def transfer(sender, receiver, amount):
with transaction.atomic():
acc1 = Account.objects.get(iban=sender)
acc2 = Account.objects.get(iban=receiver)
acc1.balance -= amount
acc2.balance += amount
acc1.save()
acc2.save()
What happens in the background is the following. Whenever transaction.atomic
is reached Django sends BEGIN TRANSACTION
command to SQLite (Actually it
sends SAVEPOINT
but it is almost the same thing).
Once the function is done Django sends COMMIT
to SQLite which completes the transaction.
SQLite does not allow any writes whenever there is an open (second) transaction.
So in our previous example, both threads A and B will fail, by throwing an exception.
More importantly, the fail will happen only during the commit. That is, the local
objects acc1
and acc2
that are currently held in memory during the call of transfer
are updated,
and have different values, but the database itself is not updated.
SQLite locks the entire database, that is, as long as transaction is open no writes are possible. To see this effect in practice, we can open a connection to the database and open a transaction without closing it
$ sqlite3 src/db.sqlite3
SQLite version 3.30.1 2019-10-10 20:19:45
Enter ".help" for usage hints.
sqlite> BEGIN TRANSACTION;
sqlite>
As long as this connection is open, no other connection (manual or Django) can write to the database. This is a very conservative approach (and one of the downsides of SQLite). Fancier database engines do not lock the database, and select for update is probably needed to lock the objects that are about to be modified.
Handling object relationships
When working with databases, information in one table can refer to information
in another table. A customer — for example — can have multiple orders, and
each order points to a specific customer. In Django, the references for such a
case would be written using a ForeignKey
.
from django.db import models
from django.contrib.auth.models import User
class Account(models.Model):
owner = models.ForeignKey(User, on_delete=models.CASCADE)
iban = models.TextField()
Here User
is a Django built-in model for users.
The above statement allows users to have multiple accounts.
If only one account is allowed per user one can use OneToOneField
.
We can access the User object from Account object with
acc = Account.objects.get(pk=0)
user = acc.owner
We can also cross-search accounts using owner's information
accounts_owned_by_johns = Account.objects.filter(owner__first_name='John')
Remember to check your points from the ball on the bottom-right corner of the material!