Kubernetes, Local to Production with Django: 4 - Celery with Redis and Flower

So far we have covered how to deploy a Django application in a local Kubernetes cluster, we have then integrated it with a PostgreSQL database and run migrations on the database using the Job Controller.

  • The following updates were made in November 2020 — updated to kubernetes 1.19.2 updated to python 3.8, updated to Django 3.1.2, updated to postgres 13, updated to celery 4.4.7.

In this part of the tutorial, we will look at how to deploy a celery application with Redis as a message broker and introduce the concept of monitoring by adding the Flower module, thus the following points are to be covered:

  • Deploy Redis into our Kubernetes cluster, and add a Service to expose Redis to the django application.
  • Update the Django application to use Redis as a message broker and as a cache.
  • Create celery tasks in the Django application and have a deployment to process tasks from the message queue using the celery worker command and a separate deployment for running periodic tasks using the celery beat command.
  • Add the celery flower package as a deployment and expose it as a service to allow access from a web browser.

Some basic knowledge of Kubernetes is assumed, if not, refer to the previous tutorial post for an introduction to minikube.

Minikube needs to be up and running which can be done by:

$ minikube start

The minikube docker daemon needs to be used instead of the host docker daemon which can be done by:

$ eval $(minikube docker-env)

To view the resources that exist on the local cluster, the minikube dashboard will be utilized using the command:

$ minikube dashboard

This opens a new tab on the browser and displays the objects that are in the cluster.

The code for this part of the series can be found on Github in the part_4-redis-celery branch. From the github repo, the Kubernetes manifest files can be found in:

$ kubernetes_django/deploy/..

The rest of the tutorial will assume the above is the current working directory when applying the Kubernetes manifests.

To get the tutorial code up and running, execute the following sequence of commands:

# Setup project
$ git clone git@github.com:gitumarkk/kubernetes_django.git
$ cd kubernetes_django/deploy/kubernetes/
$ git checkout part_4-redis-celery
# Configure minikube
$ minikube start
$ eval $(minikube docker-env)
$ minikube dashboard # Open dashboard in new browser
# Apply Manifests
$ kubectl apply -f postgres/ # See dashboard in browser
$ kubectl apply -f redis/
$ kubectl apply -f django/
$ kubectl apply -f celery/
$ kubectl apply -f flower/
# Show services in browser
$ minikube service django-service # Wait if not ready
$ minikube service flower-service
# Delete cluster when done
$ minikube delete

1. Background

In a typical web application, we have the critical request/response cycle which needs to have a short latency e.g. user authentication. We also have the longer running background tasks that can have a more tolerable latency, hence does not immediately impact the user experience e.g image/document processing. As such, background tasks are typically run as asynchronous processes outside the request/response thread.

Celery is a popular python task queue with a focus on real time processing. It has good Django integration making it easy to set up. It utilizes the producer consumer design pattern where:

  • A producer creates the task.
  • The task is placed in a messaging queue.
  • Consumers subscribed to the messaging queue can receive the messages and process the tasks in a different queue.

In the case of celery, it’s both a producer and a consumer i.e. it acts as a producer when an asynchronous task is called in the request/response thread thus adding a message to the queue, as well as listening to the message queue and processing the message in a different thread. This means we can use the exact same codebase for both the producer and consumer.

For celery to work effectively, a broker is required for message transport. There are several brokers that can be utilized, which include RabbitMQ, Redis, Kafka etc. For this tutorial we will use Redis as a message broker, even though not as complete as RabbitMQ, Redis is quite good as a cache datastore as well and thus we can cover 2 use cases in one.

Finally, we will add basic monitoring for celery by adding the Flower package, which is a web based tool for monitoring and administering Celery clusters.

2. Deploying Redis

In a high availability setup, Redis is run using the Master Slave architecture which has fault tolerance and allows for faster data accessibility in high traffic systems. In this setup the Redis application is replicated across a number of hosts that have copies of the same data so that if one host goes down, the data is still available. The master is the host that writes data and coordinates sorts and reads on the other host called slaves.

For our use case though, we will be running a trivial application where celery will be deployed on a single host thus one master and no slaves.

The Deployment Controller manifest to manage the Redis application on the cluster is:

Where:

  • The spec: selector: matchLabels field specifies which pods this deployment applies to i.e. the pods containing the labels in matchLabels .
  • The spec: replicas field shows only 1 pod is to be deployed.
  • The spec: template: labels field contains key value pairs that are used to identify the pod.
  • The spec: template: spec: containers field indicates the name of the Redis container as well as the image to use i.e. image: redis which is pulled from the docker registry. We introduce the concept of resources: requests where we can limit the cpu and memory foot print to prevent the pod from overusing the host resources. The containerPort field shows that we are exposes port 6379 from the container to the pod (but cannot be accessed outside the pod without creating a service).

The deployment is created in our cluster by running:

$ kubectl apply -f redis/deployment.yaml

The result can be verified by viewing the minikube dashboard.

To allow Redis to be accessed outside the pod, we need to create a Kubernetes service. The Service manifest file is as follows:

Where:

  • The spec: ports field shows the 6379 targetPort in the pod should be mapped to the 6379 port in the cluster.
  • The spec: selector field indicates the pods which this service should be applied to based on the specified key value pair.

The service is created in our cluster by running:

$ kubectl apply -f redis/service.yaml

The result can be verified by viewing the minikube dashboard.

3. Updating the Code base

In order to add celery functionality, a few updates are needed to be made to the Django application.

The following requirements file is required to make sure our application works as expected.

celery==4.4.7
Django==3.1.2
django-health-check==3.14.3
django-redis==4.12.1
flower==0.9.5
gunicorn==20.0.4
kombu==4.6.10
psycopg2-binary==2.9.6
redis==3.5.3

We need to add Celery configuration as well as caching configuration. Both should have access to the Redis service that was created which exposes the Redis deployment.

# REDIS
REDIS_URL = "redis://{host}:{port}/1".format(
host=os.getenv('REDIS_HOST', 'localhost'),
port=os.getenv('REDIS_PORT', '6379')
)
# CELERY
CELERY_BROKER_URL = REDIS_URL
CELERY_RESULT_BACKEND = REDIS_URL
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TASK_SERIALIZER = 'json'
# CACHE
CACHES = {
"default": {
"BACKEND": "django_redis.cache.RedisCache",
"LOCATION": REDIS_URL,
"OPTIONS": {
"CLIENT_CLASS": "django_redis.client.DefaultClient"
},
"KEY_PREFIX": "example"
}
}

The CELERY_BROKER_URL is composed of the REDIS_HOST and REDIS_PORT that are passed in as environmental variables and combined to form the REDIS_URL variable. The REDIS_URL is then used as the CELERY_BROKER_URL and is where the messages will be stored and read from the queue. Caching uses the django_redis module where the REDIS_URL is used as the cache store.

The <mysite>/<mysite>/celery.py file then needs to be created as is the recommended way that defines the Celery instance. The file should have the following configuration:

# http://docs.celeryproject.org/en/latest/django/first-steps-with-django.htmlfrom __future__ import absolute_import, unicode_literals
import os
from celery import Celery
os.environ.setdefault('DJANGO_SETTINGS_MODULE', '<mysite>.settings')app = Celery('<mysite>')
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()
app.conf.beat_schedule = {
'display_time-20-seconds': {
'task': 'demoapp.tasks.display_time',
'schedule': 20.0
},
}
@app.task(bind=True)
def debug_task(self):
print('Request: {0!r}'.format(self.request))

In order to ensure that the app get’s loaded when django starts, the celery.py file needs to be imported in <mysite>/<mysite>/__init__.py file:

from __future__ import absolute_import, unicode_literals# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app
__all__ = ['celery_app']

The demoapp/task.py file contains a simple function to display the time and then returns. This is used by celery beat as defined in the <mysite>/<mysite>/celery.py file.

from datetime import datetime
from celery import shared_task
@shared_task
def display_time():
print("The time is %s :" % str(datetime.now()))
return True

Now that the codebase has been updated, the docker image needs to be rebuilt and the tag needs to be updated. Before this happens, make sure the minikube docker daemon is active by running:

$ eval $(minikube docker-env)

The command to build the Django docker image with the updated codebase is:

$ docker build -t <IMAGE_NAME>:<TAG>

The <TAG> parameter should be different from the previous build to allow the deployment to be updated in the cluster.

4. Deploying Django with Celery

Once the changes have been made to the codebase and the docker image has been built, we need to update the Django image in the cluster; as well as create new deployments for the celery worker and the celery beat cron job. Finally the Flower monitoring service will be added to the cluster. However, as we will soon see, the Deployment Controller manifests file for all 4 will be similar where the only difference is the containerPort definition and the command used to run the images. For the sake of this tutorial, the duplication of code will be allowed but in later tutorials, we will look at how to use Helm to parametrize the templates.

The reason separate deployments are needed as opposed to one deployment containing multiple containers in a pod, is that we need to be able to scale our applications independently. Consider the following scenarios:

  • There is a high front end traffic and low asynchronous tasks, this means our django web application replica count will increase to handle the load while everything else remains constant.
  • Other times the asynchronous task load might spike when processing numerous tasks while the web requests remain constant, in this scenario we need to increase the celery worker replicas while keeping everything else constant.
  • The flower monitoring tool and the cron job usually have a much lower load so the replica count will remain low.

The Django image in the cluster needs to be updated with the new image as well as passing the now required REDIS_HOST which is the name of the Redis service that was created. Thus, the Django Controller manifest needs to be updated to the following:

The only update we made to the Deployment manifest file is updating the image and passing in the REDIS_HOST. The deployment is created in our cluster by running:

$ kubectl apply -f django/deployment.yaml

The result can be verified by viewing the minikube dashboard.

The celery worker manifest file is similar to the django deployment manifest file as can be seen below:

The only difference is that we now have a start command to start the celery worker as well as we don’t need to expose a container port as it’s unnecessary. The deployment is created in our cluster by running:

$ kubectl apply -f django/worker-deployment.yaml

The result can be verified by viewing the minikube dashboard.

To have a celery cron job running, we need to start celery with the celery beat command as can be seen by the deployment below.

The deployment is created in our cluster by running:

$ kubectl apply -f django/celery-beat-deployment.yaml

The result can be verified by viewing the minikube dashboard.

The flower deployment needs to be created to enable Flower monitoring on the Celery Kubernetes cluster, the Deployment manifest is:

Similar to the Celery deployments, it has different command to run the container. In addition port 5555 is exposed to allow the pod to be accessed from outside. Some environmental variables which are not necessary are removed, however the REDIS_HOST is still required. To prevent an overuse of resources, limits are then set. The deployment is created in the cluster by running:

$ kubectl apply -f flower/worker-deployment.yaml

The result can be verified by viewing the minikube dashboard.

The flower deployment exposes the container on port 5555, however this cannot be accessed from outside the pod. To allow for internet access, a service needs to be created by using the following manifest file:

The service is created in the cluster by running:

$ kubectl apply -f flower/service.yaml

The result can be verified by viewing the minikube dashboard.

5. Results

To confirm the celery worker and cron jobs are running, the pod names need to be retrieved by running:

$ kubectl get podsNAME                             READY     STATUS    RESTARTS   AGE
celery-beat-fbd55d667-8qczf 1/1 Running 0 5m
celery-worker-7b9849b5d6-ndfjd 1/1 Running 0 5m
django-799d45fb77-7cn48 1/1 Running 0 22m
flower-f7b5479f5-84dhc 1/1 Running 0 19m
postgres-68fdbc869b-rmljm 1/1 Running 3 4d
redis-847499c948-jfvb6 1/1 Running 0 8m

To view the results of the cron job i.e. celery beat:

$ kubectl logs celery-beat-fbd55d667-8qczf[2018-01-22 16:51:41,132: INFO/MainProcess] beat: Starting...[2018-01-22 17:21:17,481: INFO/MainProcess] Scheduler: Sending due task display_time-20-seconds (demoapp.tasks.display_time)[2018-01-22 17:21:17,492: DEBUG/MainProcess] demoapp.tasks.display_time sent. id->4f9ea7fa-066d-4cc8-b84a-0231e4357de5[2018-01-22 17:21:17,493: DEBUG/MainProcess] beat: Waking up in 19.97 seconds.

This shows the periodic tasks are running every 20 seconds, and pushes the tasks to the Redis queue.

To view the results of the worker:

$ kubectl logs celery-worker-7b9849b5d6-ndfjd[2018-01-22 16:51:41,250: INFO/MainProcess] Connected to redis://redis-service:6379/1[2018-01-22 17:21:37,477: INFO/MainProcess] Received task: demoapp.tasks.display_time[4f9ea7fa-066d-4cc8-b84a-0231e4357de5][2018-01-22 17:21:37,478: WARNING/ForkPoolWorker-1] The time is 2018-01-22 17:21:37.478215 :[2018-01-22 17:21:37,478: INFO/ForkPoolWorker-1] Task demoapp.tasks.display_time[4f9ea7fa-066d-4cc8-b84a-0231e4357de5] succeeded in 0.0007850109977880493s: True

The cron job tasks are then received where the relevant function is run, in this case it’s the display_time command.

To confirm that all the health checks are okay:

$ minikube service django-service

This should open a new browser tab where the following output displayed by the django-health-check library.

To make sure the celery flower dashboard is running:

$ minikube service flower-service

This should open a new browser tab where the following output is displayed:

6. Conclusion

A lot of ground has been covered in this tutorial i.e. creating a Redis deployment, running asynchronous task deployments in Kubernetes as well as implement monitoring.

The next tutorial will focus on deploying the cluster to AWS using Kops.

If you have any questions or anything needs clarification, you can book a time with me on https://mbele.io/mark

7. Credits

8. Tutorial Links

Part 1, Part 2, Part 3, Part 4, Part 5, Part 6

Ask me anything or request a 10 minute video call on https://mbele.io/mark