Kubernetes, Local to Production with Django: 3 - Postgres with Migrations on Minikube

Mark Gituma
12 min read · Jan 15, 2018


So far, we have created and deployed a basic Django app to the minikube Kubernetes cluster. However, to experience the true power of Django, it needs to be connected to a persistent datastore such as a PostgreSQL database.

The code for this part of the series can be found on GitHub in the part_3-postgres branch.

Edits

  • The following updates were made in July 2019 - added storageClassName: manual to the PersistentVolume and PersistentVolumeClaim spec files.
  • Made the following major updates in November 2020 - updated to python 3.8, updated to Django 3.1.2, updated to postgres 13, added a run script for migrations as opposed to jobs.

Objectives

The objectives for this tutorial are:

  • To utilize the Secret resource API to handle credentials used by the PostgreSQL and Django pods.
  • To utilize the Persistent Volume subsystem for PostgreSQL data storage.
  • To initialize the PostgreSQL pod to use the persistent volume object.
  • To expose the database as a service so that it can be accessed within the cluster.
  • To update the Django application so that it can access the PostgreSQL database.
  • To run migrations using the Job API and, alternatively, as a shell command in a deployed pod.

Requirements

Some basic knowledge of Kubernetes is assumed; if not, refer to the previous tutorial post for an introduction to minikube.

Minikube needs to be up and running which can be done by:

$ minikube start

The minikube docker daemon needs to be used instead of the host docker daemon. This can be done by running:

$ eval $(minikube docker-env)

To view the resources that exist on the local cluster, open the minikube dashboard using the command:

$ minikube dashboard

This opens a new tab on the browser and displays the objects that are on the cluster.

From the github repo, the Kubernetes manifest files can be found in:

$ kubernetes_django/deploy/..

The rest of the tutorial will assume the above is the current working directory when applying the Kubernetes manifests.

TL;DR

To get the tutorial code up and running, execute the following sequence of commands:

# Setup project
$ git clone git@github.com:gitumarkk/kubernetes_django.git
$ cd kubernetes_django/deploy/kubernetes/
$ git checkout part_3-postgres
# Configure minikube
$ minikube start
$ eval $(minikube docker-env)
$ minikube dashboard # Open dashboard in new browser
# Apply Manifests
$ kubectl apply -f postgres/
$ kubectl apply -f django/
# Show service in browser
$ minikube service django-service # Wait if not ready
# Delete cluster when done
$ minikube delete

1. Setting up PostgreSQL

1.1 What is PostgreSQL?

PostgreSQL is a popular open source Object Relational Database Management System with a strong reputation for reliability, data integrity and correctness. It’s the database of choice for many Django projects as it’s easy to integrate.

Even though we are going to install a database within the Kubernetes cluster, my personal preference is to use a database-as-a-service offering such as AWS RDS. Such services provide features that can be hard to implement and manage well, including:

  • Managed high availability and redundancy.
  • Ease of scaling up or down (i.e. vertical scaling) by increasing or decreasing the computational resource for the database instance to handle variable load.
  • Ease of scaling in and out (i.e. horizontal scaling) by creating replicated instances to handle high read traffic.
  • Automated snapshots and backups with the ability to easily roll back a database.
  • Performing security patches, database monitoring etc.

However, there exist certain scenarios where a cloud-native database solution is not feasible and Kubernetes must therefore be used to manage the database instances, e.g. when running locally or on premises.

1.2 Persistent Volumes

Data in PostgreSQL needs to be persisted for long term storage. The default storage location in PostgreSQL is /var/lib/postgresql/data. However, Kubernetes pods are designed to have ephemeral storage, which means that once a pod dies, all the data within the pod is lost. To mitigate this, Kubernetes has the concept of volumes. There are several volume options, but the recommended type for databases is the PersistentVolume subsystem.

Kubernetes focuses on loose coupling: it aims to decouple how a volume is provided on the platform from how it's consumed by pods. This is because the volume provisioning definition usually depends on the platform the cluster is running on, e.g. the volume definition for cloud-native solutions might differ from on-premise solutions. As a result, Kubernetes provides two API resources: the PersistentVolume and the PersistentVolumeClaim.

PersistentVolume

A PersistentVolume (PV) resource is a piece of storage on the cluster, just as a node is a physical device or a VM. It has a lifecycle independent of any pod, and it captures the details of how the storage is implemented, e.g. NFS, iSCSI, or a cloud-provider-specific storage system.

In layman’s terms, the PV says, “I am going to create 100 GBs of storage on the cluster based on a platform-specific storage class, e.g. AWS Elastic Block Store. I have no idea who is going to use it, but whoever wants it can claim it”. The manifest file to create a PV is:
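The embedded manifest is not reproduced here, so the following is a minimal sketch of postgres/volume.yaml consistent with this section: the name, 2Gi capacity and ReadWriteOnce mode match the kubectl output shown below, storageClassName: manual reflects the July 2019 update (the older output below still shows the default standard class), and the hostPath path and labels are illustrative.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-pv
  labels:
    app: postgres
spec:
  storageClassName: manual   # added in the July 2019 update
  capacity:
    storage: 2Gi             # matches the capacity shown by kubectl get pv below
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /data/postgres-pv  # illustrative; keep the path under /data/ on minikube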

Taking a logical walk through the manifest file:

  • The spec: capacity: storage field states how much capacity to create.
  • The spec: accessModes field defines how the pod will utilize the volume.
  • The spec: hostPath field states that the volume should live in the local file system. This is ideal for minikube but not for a production application. (On macOS it’s tempting to set the hostPath to a /Users/../<data> path; however, from experience, setting a root path other than /data/ prevents the Postgres pod from initializing due to a permissions issue. I am not sure if it’s a bug or a constraint.)

To create the volume, run the following:

$ kubectl apply -f postgres/volume.yaml

To get the volume details, the command to execute is:

$ kubectl get pv
NAME          CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                  STORAGECLASS   REASON    AGE
postgres-pv   2Gi        RWO            Retain           Bound     default/postgres-pvc   standard                 3m

This result can also be ascertained by viewing the minikube dashboard.

PersistentVolumeClaim

A PersistentVolumeClaim (PVC) is a request for storage by the user and allows the user to consume abstract storage resources on the cluster. Claims can request specific sizes and access modes in the cluster. In layman’s terms, the PVC says, “I want 2 GBs of storage somewhere on the network, I don’t know where it is or how it’s made, but I assume it exists and I want to lay a claim to it”. In this way the volume definition and volume consumption remain separate. The manifest to create a PVC is:
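Again as a minimal sketch of postgres/volume_claim.yaml, consistent with the claim shown by kubectl get pvc below (the storageClassName mirrors the July 2019 update):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
  labels:
    app: postgres
spec:
  storageClassName: manual   # must match the PersistentVolume above
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi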

To create the volume claim, run the following:

$ kubectl apply -f postgres/volume_claim.yaml

To see our created persistent volume claim, we run:

$ kubectl get pvc
NAME           STATUS    VOLUME        CAPACITY   ACCESS MODES   STORAGECLASS   AGE
postgres-pvc   Bound     postgres-pv   2Gi        RWO            standard       3m

1.3 Postgres Credentials

Credentials are required to create the Postgres database, as well as to access it from our Django application. Furthermore, we don’t want the credentials stored in version control or baked directly into the image. To this end, Kubernetes provides the Secret object for storing sensitive data. A Secret object can be created using a declarative file specification or from the command line. For this section, we will use the declarative file specification.
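A sketch of postgres/secrets.yaml; the Secret name postgres-credentials is a hypothetical placeholder (check the repo for the actual name), while the user and password keys match the fields discussed below, and the values shown are examples only:

apiVersion: v1
kind: Secret
metadata:
  name: postgres-credentials   # hypothetical name; see postgres/secrets.yaml in the repo
type: Opaque
data:
  user: cG9zdGdyZXM=           # example: echo -n "postgres" | base64
  password: c3VwZXJzZWNyZXQ=   # example: echo -n "supersecret" | base64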

The points of interest are the data: user and data: password fields. These contain base64-encoded strings, where the encoding can be generated from the command line by running:

$ echo -n "<string>" | base64

It’s worth noting that even though base64 is an encoded format, it’s not encrypted, and care should be taken when managing the secrets file. The Secret object is then added to the Kubernetes cluster using:

$ kubectl apply -f postgres/secrets.yaml

The result can be ascertained by viewing the minikube dashboard.

1.4 Postgres Deployment

The Postgres manifest for deployment is similar to the Django manifest, with the exception that we use the postgres container image (postgres:13 as of the November 2020 update, originally postgres:9.6.6), which is fetched from the hub.docker.com registry. The database credentials stored in the Secret object are also passed in, and the volume claim is mounted into the pod. The PersistentVolumeClaim is mounted at /var/lib/postgresql/data, which is the default location used to store the database; the data, however, is physically stored in the PersistentVolume.
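A sketch of postgres/deployment.yaml capturing the points above. The deployment name matches the pod names that appear later in the tutorial, while the POSTGRES_DB value (taken from the Django settings shown later) and the Secret name are assumptions:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres-deployment
  labels:
    app: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:13
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_DB
              value: kubernetes_django          # assumed to match NAME in the Django settings
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials    # hypothetical Secret name from section 1.3
                  key: user
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: password
          volumeMounts:
            - name: postgres-volume-mount
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: postgres-volume-mount
          persistentVolumeClaim:
            claimName: postgres-pvc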

The deployment is created in our cluster by running:

$ kubectl apply -f postgres/deployment.yaml

The result can be ascertained by viewing the minikube dashboard, but can take a few minutes to complete successfully.

1.5 Postgres Service

The postgres service is similar to the Django service we described in the previous tutorial, and the configuration file is as follows.
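A minimal sketch of postgres/service.yaml; the selector labels are assumed to match those used in the Postgres deployment:

apiVersion: v1
kind: Service
metadata:
  name: postgres-service   # this name is what Django uses as POSTGRES_HOST
  labels:
    app: postgres
spec:
  selector:
    app: postgres
  ports:
    - protocol: TCP
      port: 5432
      targetPort: 5432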

Kubernetes allows the use of the service name, i.e. postgres-service, for domain name resolution to the pod IP. This means that instead of an IP address, we will use postgres-service as the host when connecting to our deployed database instance from Django.

The beauty of using Kubernetes is that with a slight modification to the above Service description, we can use a database external to the cluster, and as long as the service name remains constant, Kubernetes will be able to find the external database. We will investigate this in detail when we focus on deploying the Kubernetes cluster to cloud-native solutions (see part 5 of this series). To add your Postgres service to the Kubernetes cluster, run:

$ kubectl apply -f postgres/service.yaml

The result can be ascertained by viewing the minikube dashboard.

2. Django

2.1 settings.py

By default, Django uses an SQLite database configuration. To switch to Postgres, the following modifications need to be made to the DATABASES variable in the settings.py file.

import os  # required for os.getenv

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'kubernetes_django',
        'USER': os.getenv('POSTGRES_USER'),
        'PASSWORD': os.getenv('POSTGRES_PASSWORD'),
        'HOST': os.getenv('POSTGRES_HOST'),
        'PORT': os.getenv('POSTGRES_PORT', 5432),
    }
}

The database credentials will be inferred from the environment variables as it’s not a good idea to store them directly in the settings file which is to be checked into version control.

In addition, the django-health-check Python module is used to determine whether a successful connection to the database has been made. In a more advanced configuration, the module provides convenience endpoints which can be utilized by Kubernetes to determine the health status of a pod.
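For reference, a typical django-health-check configuration looks roughly like the following; this is a sketch and the repo’s exact settings may differ (this project serves the checks at the root path).

# settings.py - register the health check apps (fragment; assumes INSTALLED_APPS is defined above)
INSTALLED_APPS += [
    'health_check',      # core health check framework
    'health_check.db',   # verifies that the default database connection works
]

# urls.py - this project exposes the checks at the root path /
from django.urls import include, path

urlpatterns = [
    path('', include('health_check.urls')),
]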

Once these changes have been made, the Django image needs to be rebuilt using:

$ docker build -t <IMAGE_NAME>:<TAG> .

Note: this should be done within the minikube docker daemon so that Kubernetes can discover the local docker image. The <TAG> parameter should be different from the previous build to allow the deployment to be updated in the cluster.

2.2 Updating the Django Deployment

Once the Django docker image has been built, we need to update our existing Django deployment in django/deployments.yaml to use the new Django image as well as provide the required environmental variables.

The update to the deployment file includes:

  • Updating the project image to the new version.
  • Setting the database credentials as environment variables which are passed into the settings file. The POSTGRES_USER and POSTGRES_PASSWORD variables are extracted from the Secret object that was created earlier. The POSTGRES_HOST variable takes on the value of the Postgres service that we created; as explained above, Kubernetes maps the postgres-service name to the actual pod IP.
  • Adding a bash script, i.e. run.sh, which runs the migrations (more on migrations later on) and then starts a gunicorn server; the script itself is shown in section 3.1, and a sketch of the updated container section follows this list.
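The relevant container section of django/deployments.yaml might look roughly like the following. This is a sketch: the image name and tag, the Secret name (postgres-credentials) and the container port are illustrative assumptions, so check the repo for the exact values.

containers:
  - name: django
    image: django_app:1.1                      # hypothetical rebuilt image name and tag
    command: ["/bin/sh", "-c", "/app/run.sh"]  # run migrations, then start gunicorn
    ports:
      - containerPort: 8000                    # assumed gunicorn port
    env:
      - name: POSTGRES_USER
        valueFrom:
          secretKeyRef:
            name: postgres-credentials         # hypothetical Secret name from section 1.3
            key: user
      - name: POSTGRES_PASSWORD
        valueFrom:
          secretKeyRef:
            name: postgres-credentials
            key: password
      - name: POSTGRES_HOST
        value: postgres-service                # resolved by cluster DNS to the Postgres pod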

We now apply the updated deployment to the cluster using kubectl apply -f <filename>.yaml. Once the deployment has completed, we can check whether the database has been configured. To do this, we get the Django service name using:

$ kubectl get service

And with the retrieved service name, we can use minikube to launch the service on the browser by running:

$ minikube service <service_name>

This opens a browser with our running Django application. To see whether the database has been connected and initialized properly, we need to navigate to the health check endpoint; the path set for this project is / (to create your own, follow the django-health-check documentation). This should show a failing database status:

This is the expected result because the database has not yet been initialized properly. To initialize the database (assuming it has been set up correctly), the database migrations need to be run to create the tables so that the health check can pass.

3. Database Migrations

Database migrations are utilized in many environments to update or revert the database schema. There are several ways to run database migrations, but this tutorial focuses on three:

  • Using a bash script as seen earlier (this is my usual go to).
  • Using the Job API.
  • Directly as a shell command via a pod.

All of them have their advantages and disadvantages which we will briefly touch on.

3.1 Using a bash script

This is the simplest way that I have found to run migrations. In my experience, simply adding a bash script and initiating it as a container command (as seen earlier) prevents a lot of headaches and complexity, and it is quite easy to automate. The following is an example with two simple commands, but it can be more complex depending on the use case.
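A minimal sketch of such a run.sh, assuming the project’s WSGI module is kubernetes_django.wsgi and that gunicorn listens on port 8000 (the repo’s actual script may differ):

#!/bin/sh
# Apply any pending database migrations, then start the application server.
python /app/manage.py migrate
gunicorn kubernetes_django.wsgi:application --bind 0.0.0.0:8000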

3.2 Job Resource

A Job creates a pod which runs a single task to completion. This task can be anything, but in our case, it is the Django migration. The job configuration file is:
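A sketch of django/job-migration.yaml; the image tag and Secret name are the same hypothetical placeholders used in the Django deployment sketch above, and the job name matches the pod name shown in the output below:

apiVersion: batch/v1
kind: Job
metadata:
  name: django-migrations
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: django-migrations
          image: django_app:1.1                    # hypothetical; use the same image as the deployment
          command: ["python", "/app/manage.py", "migrate"]
          env:
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials       # hypothetical Secret name from section 1.3
                  key: user
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: password
            - name: POSTGRES_HOST
              value: postgres-service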

The manifest to create a Job object is similar to a Deployment, with some slight modifications, such as running the migrations by setting the appropriate management command in the spec: template: spec: containers: command field.

The Job is scheduled on the cluster using:

$ kubectl apply -f django/job-migration.yaml

To see if the Job was completed, we need to get all the pods:

$ kubectl get pods --show-all
NAME                                  READY     STATUS      RESTARTS   AGE
django-migrations-sw5s4               0/1       Completed   0          1m
django-pod-fdbc9bc9b-mkqt4            1/1       Running     0          28m
postgres-deployment-946db9dfb-2xcxm   1/1       Running     0          28m

We can see that the django-migrations job has been completed. The full name of the migration pod is then used to fetch the logs, which show that the migration completed successfully.

$ kubectl logs django-migrations-sw5s4
Operations to perform:
  Apply all migrations: admin, auth, contenttypes, db, sessions
Running migrations:
  Applying contenttypes.0001_initial... OK

To confirm that Django is connected to the database, navigate back to the root URL in the browser; if everything worked out well, you should see the following.

The downside of using the Job resource to run migrations is that the migrations cannot be run again without modifying the manifest file, i.e. by updating the image name. This is usually fine when deploying new versions of the codebase; however, in the scenario where the migrations need to be re-run with the same image, the Job object has to be deleted from the cluster before it can be run again:

$ kubectl delete -f deploy/kubernetes/django/job-migration.yaml

3.3 With the CLI

Migrations can also be executed from the shell of a running container using the kubectl exec command. To run the migrations, we first need to get the name of the running pod of interest:

$ kubectl get pods

Once the pod name has been found, the migrations can be run by:

$ kubectl exec <pod_name> -- python /app/manage.py migrate

This should show the same results as running the migrations using the job.

4. Conclusion

Now that we have seen how to install and run Postgres on a Kubernetes cluster locally, the next step will be to add Redis as a cache and message broker, as well as Celery for asynchronous task processing.

If you have any questions or anything needs clarification, you can book a time with me on https://mbele.io/mark

5. Troubleshooting

During the course of this tutorial, I kept running into problems with minikube, including but not limited to:

  • Minikube unable to open the dashboard or service in a new browser.
  • Kubectl external service not exposed properly.

To fix these problems, on occasion I had to delete all my deployments and services, after which I restarted minikube and redeployed. In one particularly extreme case, I had to restart my laptop and repeat the deployment steps.

6. Credits

7. Tutorial Links

Part 1, Part 2, Part 3, Part 4, Part 5, Part 6
