IAN WALDRON

Adding a Limited User to Dockerfile Upends My Kubernetes Deployment

Adding a limited user to a Dockerfile without considering downstream effects caused unanticipated problems and debugging that could have been avoided.
January 27, 2024

Background

I have an app ready for deployment to a Kubernetes cluster. The plan is to first containerize the project with Docker, test the container locally with Docker Compose, and finally deploy to the k8s cluster and expose it with an NGINX reverse proxy service. Within each container, I'm running Gunicorn as an application-level WSGI server for multiprocess handling. Without thinking too deeply on the matter, I bind Gunicorn to port 80 to be consistent across configuration layers.
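For local testing, the Compose setup was something along these lines (the service name and host port below are placeholders, not the project's exact file, but the idea is that Gunicorn listens on port 80 inside the container):

# docker-compose.yml (rough sketch)

services:
   app:
      build: .
      environment:
         - PORT=80
      ports:
         - "8000:80"   # host port 8000 forwards to Gunicorn on container port 80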

There isn't a need to use port 80 with Gunicorn, since this isn't the public-facing service, but this is a first pass at configuring my deployment. I suspect there's even official documentation advising against using port 80, but I wasn't able to find any mention of it when browsing last. I wasn't too worried about it, though, since the plan was to return later to fine-tune and tighten security.

Deployment

With a good test on Docker Compose, I deployed the project to the cluster. The app was deployed with a LoadBalancer service listening on port 80. The deployment passed initial testing, so I began revising my configurations. Additionally, I added GitHub Actions workflows to the project to manage CI/CD. The workflows automatically build the container, push it to the container registry, and update the image on the cluster.
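The Service side of that deployment was, roughly, a LoadBalancer forwarding port 80 straight through to the container (names here are placeholders):

# service.yml (roughly)

apiVersion: v1
kind: Service
metadata:
   name: app-service
spec:
   type: LoadBalancer
   selector:
      app: my-app
   ports:
      - port: 80         # port the LoadBalancer listens on publicly
        targetPort: 80   # port Gunicorn binds to inside the container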

Where the Problems Begin

Returning to my Dockerfile, I add a limited user so I'm not running as root, which tightens security a bit. The addition looked like:


# Dockerfile
...
RUN \
   apt-get update && apt-get upgrade -y && \
   # project dependencies, etc
   ...
       # create an unprivileged user to run the app
       adduser \
       --disabled-password \
       --gecos "" \
       --no-create-home \
       my-limited-user

USER my-limited-user

Also, my k8s Deployment sets an environment variable that's captured by Gunicorn:


# deployment.yml

...
spec:
   template:
      ...
      spec:
         ...
         containers:
            ...
            - env:
               - name: PORT
                 value: "80"

# entrypoint.sh

#!/bin/sh
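# Bind Gunicorn to $PORT if it's set, otherwise fall back to 8000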
APP_PORT=${PORT:-8000}
cd /app/
/py/bin/gunicorn --worker-tmp-dir /dev/shm app.wsgi:application \
--bind "0.0.0.0:${APP_PORT}"

Right away I have issues. Fortunately, the problem looks easy to resolve after checking the logs:


[2024-01-28 04:26:21 +0000] [7] [INFO] Starting gunicorn 21.2.0
[2024-01-28 04:26:21 +0000] [7] [ERROR] Retrying in 1 second.
[2024-01-28 04:26:22 +0000] [7] [ERROR] Retrying in 1 second.
[2024-01-28 04:26:23 +0000] [7] [ERROR] Retrying in 1 second.
[2024-01-28 04:26:24 +0000] [7] [ERROR] Retrying in 1 second.
[2024-01-28 04:26:25 +0000] [7] [ERROR] Retrying in 1 second.
[2024-01-28 04:26:26 +0000] [7] [ERROR] Can't connect to ('0.0.0.0', 80)

Gunicorn can't bind to port 80 because it's running as the limited user, not root. In Linux, ports below 1024 are privileged, so a non-root process hits a permissions error when it tries to bind to one. In this case, I had forgotten to change the environment variable 'PORT' to something more reasonable for Gunicorn, such as 8000. All I need to do to resolve the problem is change the environment variable to 8000, along with the Deployment's containerPort and the LoadBalancer Service's targetPort directives.
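Concretely, that change touches three places, roughly like this (only the relevant fields shown):

# deployment.yml

            - env:
                 - name: PORT
                   value: "8000"
              ports:
                 - containerPort: 8000

# service.yml

   ports:
      - port: 80           # the public port stays at 80
        targetPort: 8000   # forwards to Gunicorn's unprivileged port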

Worse, Not Better

With the CI/CD workflows, a push to the main branch will automatically rebuild the container, push it to the container registry, and pull the new image into the cluster's pods. I fully expected this fix to deploy without a hitch. I happen to have the GitHub Actions dashboard up and notice the workflow triggered by my last push is burning through more minutes than anticipated and still going. Something appears to be holding up the workflow, but I can't imagine what the problem could be.

I check on the pods to see if the patch was deployed successfully with kubectl get pods -w. To my surprise, the pods were stuck in a CrashLoopBackOff cycle. It looks like the update was not successful. After checking the logs another time, it's clear the original issue remains unsolved: Gunicorn is trying to bind to port 80 and running into a permissions issue as a limited user. I confirm this by logging the port values with echo $PORT and echo $APP_PORT. Both come back as 80.
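The checks at this point look something like the following (the pod name is a placeholder):

kubectl get pods -w          # pods cycling through CrashLoopBackOff
kubectl logs <pod-name>      # same Gunicorn error: can't connect to ('0.0.0.0', 80)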

That doesn't make any sense, because I'm sure I updated the environment variable PORT to 8000. Why would Gunicorn still be using port 80? I then spend considerable time attempting to uncover why my environment variable is being overwritten; somewhere in my configs or environment secrets there must be something responsible for reverting the port back to 80.

The Fix

After a bit of struggle, it dawns on me to review how I'm handling the container build and deployment in the GitHub Actions workflow. One thing immediately stands out: I'm using kubectl set image <image> to update the pods' container image. That's an issue because I updated an environment variable in the Deployment. The image was being updated successfully, but kubectl set image doesn't touch the rest of the Deployment spec, so the environment variables weren't changing as expected. The new PORT value wasn't being overwritten; it was never applied to the cluster in the first place.
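In other words, the workflow was only ever running the first of the two commands below, while the env change needed the second (names here are illustrative):

# only swaps the container image; the rest of the Deployment spec is left alone
kubectl set image deployment/my-app my-app=registry.example.com/my-app:latest

# re-applies the whole manifest, including the updated PORT value
kubectl apply -f deployment.yml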

All I needed to do was apply the Deployment so the environment variables would update, so I ran kubectl apply -f /app/deployment.yml once again. A quick check on the pods showed everything running smoothly. Problem solved.

Final Thoughts

Once again, I found myself chasing solutions to avoidable problems. I'd like to say there's a lesson learned in this story, that better planning will happen in the future. But that's not the case. I continue to make these mistakes, and often. On the bright side, debugging problems forces time spent in the docs, forums, etc., and I find myself more knowledgeable about my stack afterwards.