Background
I have two applications that live in separate resource groups (a collection of app servers, database servers, object storage, and supporting resources). After evaluating the resource demands of these applications, I determined both could live in a single cloud resource group with namespacing rather than independent projects.
Typically, I'd dive right in and start standing up components. In this case, however, a little more discretion is needed since these are public-facing apps in production. These are my notes.
Specifically, I need to:
- Dump the existing database, provision a new one, and populate it with the data,
- Migrate object storage not related to app-specific static files (e.g., dynamic user content) to a new bucket,
- Review and restructure my CI/CD workflows and update Actions secrets as necessary,
- Provision a new namespace on the app server group and stand up deployments, services, ingress, and a certificate,
- Test that deployments were successful,
- Destroy the old resources so we're no longer billed (i.e., the whole point of this exercise).
These items aren't perfectly chronological, nor do I imagine the list is exhaustive, but it should be pretty close. I'll probably bounce around as necessary. Think of these items as a rough sketch of the high-level tasks that lie ahead.
And because this discussion is high-level, not every step will be acknowledged, nor every shell command identified. Project-specific tasks and configurations would be too narrow to be useful. What I'm looking to accomplish is a roadmap broad enough to let me deploy a dissimilar app in a mostly similar fashion.
Time to get to work.
Terminology
I'm using "resource groups" to refer to a set of cloud resources which is common terminology for Azure and AWS. The cloud used in my examples is Digital Ocean which uses the term "projects" instead. When you see one or the other, know I'm using these terms interchangeably.
Database
Migrating a database often seems more daunting than it is. While there are plenty of tools and documentation around the job, the stakes are high, so a little hesitation is reasonable. We can approach this a couple of different ways. First, we can just dump and restore the database, schema and all, with pg_dump and pg_restore respectively for PostgreSQL databases. Alternatively, we can separately configure the destination database and transfer only the data.
Personally, when given the choice, I prefer manual steps through the shell where I have granular control. I'm going to provision the replacement database on the destination cluster as if I were starting a new project. This way I'll know for sure that roles, permissions, and so forth are structured exactly how I'd like them to be. The one exception is creating high-level objects, where there isn't much advantage to using the shell.
To begin, I'm going to create the database instance on the database cluster as well as my user accounts using the Digital Ocean UI. I'll create two users: a limited user that the application will use for reads/writes and a more privileged user for table creations, mods, and drops. The naming convention I use for these resources is <project_name>_db, <db_name>_user, and <db_name>_migrator respectively.
SQL Naming Conventions
Underscores are used in the resource names mentioned above. Names in SQL generally consist of letters, numbers, and underscores, and begin with a letter. Avoid special characters, including dashes. Names with special characters often need to be wrapped in quotes to perform operations. Refer to your SQL database's documentation for a list of special characters and the recommended treatment.
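For reference, if you'd rather create these objects through the shell instead of the UI, the rough psql equivalent looks something like the following. The connection string and names are placeholders following the convention above.
# create the database and both roles from shell (placeholders throughout)
psql "<admin_connection_string>" -c "CREATE DATABASE <project_name>_db;"
psql "<admin_connection_string>" -c "CREATE ROLE <db_name>_user WITH LOGIN PASSWORD '<not-a-secure-password>';"
psql "<admin_connection_string>" -c "CREATE ROLE <db_name>_migrator WITH LOGIN PASSWORD '<not-a-secure-password>';"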
When each user account is created, Digital Ocean will generate a password. Copy the database credentials over to whatever password/secrets vault you're using for your project. In my case, I'm using GitHub Actions workflows in my CI/CD pipeline, so I'll copy the usernames and respective passwords over to GitHub as they're created.
Since I'm here, I might as well add the rest of the database configuration to my secrets vault, which will later be used as environment variables. Here's an example of what those key/value pairs may look like.
DB_REQUIRE_SSL=true
DB_NAME=<db_name>
DB_HOST=<db_host>
DB_PORT=5432
DB_USER=<limited_user>
DB_PASSWORD=<not-a-secure-password>
DB_MIGRATOR=<privileged_user>
DB_MIGRATOR_PASSWORD=<not-a-secure-password>
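Since these are headed to GitHub anyway, one way to add them without clicking through the UI is the GitHub CLI, assuming gh is installed and authenticated against the target repository:
# add repository secrets for Actions (run from within the repo, or add --repo <owner>/<repo>)
gh secret set DB_PASSWORD --body "<not-a-secure-password>"
gh secret set DB_MIGRATOR_PASSWORD --body "<not-a-secure-password>"
# repeat for the remaining DB_* keys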
Docker PostgreSQL
If you're using the official Docker PostgreSQL image, your environment variable keys need to be adjusted. For example, "DB_PASSWORD" should instead be "POSTGRES_PASSWORD." See the official documentation at hub.docker.com/_/postgres for more information.
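For reference, the mapping for the official image looks roughly like this; see the image documentation for the full list of supported variables.
POSTGRES_DB=<db_name>
POSTGRES_USER=<db_user>
POSTGRES_PASSWORD=<not-a-secure-password>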
I'll go into managing secrets and environment variables in a bit more detail in the "CI/CD Workflows" section.
Last, I want to configure my database, create and update roles, and establish the minimum necessary permissions to run my project. To make these changes to the database, I'll establish shell access using psql. Your credentials are available through the Digital Ocean UI.
What your permissions should look like is specific to your project. If you're looking for guidance on how to do this for a Django application, check out the article Postgresql: Better security for Django Applications. Again, the objective is to grant the least permissions necessary to get the job done.
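As a rough sketch of what that might look like for a limited user and a migrator (the names, connection string, and public schema here are assumptions, so adjust to your project):
# establish least-privilege access from psql; run against the new database
psql "<admin_connection_string>" <<'SQL'
GRANT CONNECT ON DATABASE <project_name>_db TO <db_name>_user;
GRANT USAGE ON SCHEMA public TO <db_name>_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO <db_name>_user;
ALTER DEFAULT PRIVILEGES FOR ROLE <db_name>_migrator IN SCHEMA public
  GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO <db_name>_user;
GRANT USAGE, CREATE ON SCHEMA public TO <db_name>_migrator;
SQL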
Lock down Database Connections
Make sure you lock down your database to only accept connections from origins that need access (the application cluster, machines you need shell access from, etc.).
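On Digital Ocean, trusted sources can be managed through the UI or from the shell with doctl. The commands below are a sketch, so confirm the flags against doctl databases firewalls --help:
# restrict inbound database connections to trusted sources only
doctl databases firewalls append <database_cluster_id> --rule k8s:<k8s_cluster_id>
doctl databases firewalls append <database_cluster_id> --rule ip_addr:<your_ip_address>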
With these steps complete, we now have a database our application is able to connect to once deployed. It still doesn't have any tables specific to our project, and thus no data, but we'll address that later. The reason being, I want to create my tables using the migration files from my Django application. To do this, I need a running pod to execute migrate.
Migrate Complete Database
As mentioned earlier, we can just migrate the entire database, including the schema, table structures, and so on, in addition to the data. While the multi-step process is my preference, migrating the entire database is much simpler and less work. If you're confident there are no database-level changes to make and everything in the source is structured exactly how you'd like it, then consider the following approach.
Assuming you have the PostgreSQL shell utilities already installed on your machine, we'll first create a dump file using pg_dump. Two things to note here. First, the client version must match the database version on the cluster. If your machine is running PG 14 but the cluster is running PG 15, the command will fail. Second, pg_restore requires a non-text archive file. The command will fail with plain text SQL files, the default output of pg_dump.
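A quick way to compare the two versions before dumping anything:
# client version on your machine
pg_dump --version
# server version on the source cluster
psql "<connection_string>" -c "SHOW server_version;"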
Accommodating for the above, we can dump the entire database with the following:
# dump database
pg_dump -Fc -f data.dump "<connection_string>"
The "-Fc" flag is short for --format=custom
and outputs the dump into the PostgreSQL archive format that pg_restore
expects. The "-f" flag identifies the output file of "data.dump" which is currently set up to write to the current working directory of the local machine. Make sure there aren't bandwidth limitations that will affect the transfer or you should consider standing up a pod in the destination cluster to perform the operations from. Last, the connection string contains your credentials for accessing the source database as well as identifying the database.
Next, we'll restore from the dump file.
# restore database
pg_restore -d "<connection_string>" data.dump
The connection string in this case is for the destination database. The "data.dump" file is the output file created in the previous step. These two operations alone should have your database ready to go.
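Before moving on, a quick sanity check against the destination doesn't hurt; the table name below is just a placeholder.
# list tables and spot-check a row count on the destination
psql "<destination_connection_string>" -c "\dt"
psql "<destination_connection_string>" -c "SELECT count(*) FROM <some_table>;"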
Migrate Object Storage
Next up on the to-do list, I need to move objects between buckets. While Digital Ocean currently supports moving objects within buckets (e.g., between folders), you can't move objects between separate buckets in different regions using the UI. Instead, we'll do this from shell.
Is Moving Buckets Necessary?
Object storage lives outside of your resource group to begin with, so moving buckets isn't strictly necessary. To be thorough, I'm going to demonstrate the process, but know that you could just as easily continue using the same bucket(s) in situations like this.
Before starting, I want to identify what it is that I want to move. Primarily, I want images and resources accumulated through engagement with the system. That is, user and admin generated content.
While there are quite a number of static assets related to the application itself (CSS, JavaScript files, and so on), my deployment is stateless, so these files also live in the codebase and can easily be re-established. I'm looking for the stuff that's irreplaceable.
To move files between buckets I'm going to use rclone. This is a shell utility that allows you to work directly with your cloud storage resources. My article How to Migrate Data Between S3 Buckets Using RCLONE goes into detail on how to configure and work with this tool if you're unfamiliar. Specifically, the command we're looking to run to transfer files between buckets is as follows:
# sync files between buckets
rclone sync spaces-<origin_bucket_region>:<origin_bucket_name> spaces-<destination_bucket_region>:<destination_bucket_name>
Please note, this will only copy the contents. If you're interested in removing them from the origin, you'll need to delete them as well.
# delete origin bucket contents
rclone delete spaces-<origin_bucket_region>:<origin_bucket_name>
This can be refined to a single step with move. Contents will be transferred and won't persist in the origin bucket. However, I prefer the discrete steps mentioned above. That way, if I make an error along the way, I can sleep well knowing the files are still unaffected in the origin bucket. Only after I've thoroughly tested the destination resources (double-checking object counts to make sure the transfer was successful, and so on) will I go back and delete the origin resources.
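For that double-check, rclone can report object counts and sizes, or compare the two sides directly:
# report object count and total size for each bucket
rclone size spaces-<origin_bucket_region>:<origin_bucket_name>
rclone size spaces-<destination_bucket_region>:<destination_bucket_name>
# or have rclone verify the buckets match
rclone check spaces-<origin_bucket_region>:<origin_bucket_name> spaces-<destination_bucket_region>:<destination_bucket_name>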
With those operations out of the way, we should have our resources in the correct destination. If you're interested in how to set up cloud storage for a Django-specific application and how to serve those resources over a CDN, check out my article How to Set Up Django with Separate S3 Buckets for Media and Static Files.
The last thing I'll do while working with object storage is set up an API key more appropriate for production. This key should again have the least privileges required to run the application and be restricted to the project's bucket only. With those credentials generated, I'll add them to my secrets vault. We need the access key and the secret key. In this situation, I'll use key/environment variable names like:
STORAGE_ACCESS_KEY=<access_key>
STORAGE_SECRET_KEY=<secret_key>
API Key Security Risk
Do not use the full access key used with shell operations. Instead, create a key with only the minimum necessary permissions and privileges and scoped only to the relevant buckets.
With that, we have storage set up for our application.
Kubeconfig & Namespace
Before we jump into CI/CD workflows, I'm going to pull forward our objective of provisioning a namespace on the destination resource group. In Kubernetes, this is relatively easy. First, we need to navigate over to the Digital Ocean UI and download our kubeconfig.yaml file.
The config file is what grants us access to the resource group. Treat it as carefully as you would any API key. Personally, I store this file along with my other Kubernetes configurations. The difference is, I exclude the file from version control. Pushing an unencrypted config file to a remote repo is a very bad idea from a security perspective. You can always download it again.
I keep mine on my local machine only and explicitly exclude it from version control in .gitignore. Furthermore, I encrypt it using GPG. When I need to work in a cluster, I'll decrypt the file. And when I'm done with whatever operations, I re-encrypt it and then clear the environment variable KUBECONFIG.
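My encrypt/decrypt loop looks roughly like the following; gpg prompts for a passphrase with --symmetric, and the filenames are just my own convention.
# encrypt the kubeconfig at rest (produces kubeconfig.yaml.gpg) and remove the plaintext
gpg --symmetric --cipher-algo AES256 kubeconfig.yaml
rm kubeconfig.yaml
# decrypt when it's time to work in the cluster
gpg --output kubeconfig.yaml --decrypt kubeconfig.yaml.gpg
# when finished, remove the plaintext copy and clear the variable
rm kubeconfig.yaml
unset KUBECONFIG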
Next, we need the Kubernetes utility that allows us to work with our resources remotely. This is, of course, kubectl. If it's not already installed, refer to the docs for comprehensive installation instructions. For my macOS machine, I'll double-check it's installed (it is, I use it every day - but you know, for the sake of this article), install it if not, and make sure it's up to date.
# install/update kubectl
# make sure brew is up-to-date
brew update
# check if kubectl is installed
kubectl version --client
# if it is, let's run update to make sure it's the latest package
brew upgrade kubectl
# and if not, install it
brew install kubectl
With our config file downloaded and kubectl installed, we're finally ready to create that namespace. All we need to do is set the KUBECONFIG environment variable that the kubectl utility depends on, and then we can run our "create namespace" command.
# export environment variable and create namespace
export KUBECONFIG=~/path-to-config/kubeconfig.yaml
kubectl create namespace <namespace-name>
Let's confirm the resource was established.
# confirm namespace was created
kubectl get namespace
After running this command, you should see your newly created namespace listed in the output.
The "kubectl get" command is semantic meaning that "kubectl get namespaces" (plural) works in addition to "kubectl get namespace" which is the singular and correct form of the resource. The works the same with "get pods" and so on.
We're now ready to update our workflows.
CI/CD Workflows
With some of the structural components out of the way, we'll now turn our attention to workflows. CI/CD workflows are generally project-specific, so I'm not going to go into detail on this subject. Instead, I'll point out what I'm looking to update.
First, I want to update any references to namespaces in my project. It's possible the same namespace was used on the origin cluster that will be used on the destination cluster. Even for clusters hosting a single application I'm still going to namespace my app's deployment to provide separation from other supporting services and to give greater control over security. I wouldn't deploy a project to the default namespace.
But let's assume, for the sake of conversation, that the namespaces are different. This is why I established the destination namespace in the previous step. We'll update any namespace references with this value.
Next up and most likely to be affected, we need to update references to the cluster. Workflows will either have variables or hard-coded cluster references that need to be updated to the destination resource.
Last and somewhat project-specific, I'm going to ensure that any remaining environment variables my project depends on are entered into my secrets vault so I can utilize a workflow to get the secrets into the cluster's namespace. Check out my article Securely Update Kubernetes Secrets with Manual GitHub Workflows for ideas on how to accomplish this.
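That article covers a workflow-driven approach; for reference, the manual equivalent of getting a handful of values into the namespace is a single command (the names below are placeholders).
# create the application's secret in the destination namespace
kubectl create secret generic <app>-secrets -n <namespace> \
  --from-literal=DB_PASSWORD=<not-a-secure-password> \
  --from-literal=STORAGE_SECRET_KEY=<secret_key>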
Stand Up Deployments, Services, Ingress & Certificate
With our database, storage, namespace, and workflows configured, the infrastructure exists to finally deploy the project. I like to keep all my yaml configuration files in a repository specific to the cluster but separate from the project. If multiple projects exist, I create directories for each project's namespace. For example:
.
├── .kube/
│   └── kubeconfig.yaml
├── project-1/
│   ├── app-certificate.yaml
│   ├── app-deployment.yaml
│   ├── app-ingress.yaml
│   └── app-service.yaml
├── project-2/
│   ├── app-certificate.yaml
│   ├── app-deployment.yaml
│   ├── app-ingress.yaml
│   └── app-service.yaml
├── ingress-nginx/
│   ├── charts/
│   │   └── ...
│   ├── configmaps.yaml
│   └── values.yaml
└── .gitignore
From the project's directory, I'll deploy each resource using:
kubectl apply -f <resource_filename> -n <namespace>
Generally, I start with deployments and work backwards. Next I'll deploy the services followed by the project's ingress configuration. Last, I'll deploy the certificate after ensuring DNS is configured and live.
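In practice, that order looks like this from the project's directory:
# deploy in order: deployment, service, ingress, then certificate
kubectl apply -f app-deployment.yaml -n <namespace>
kubectl apply -f app-service.yaml -n <namespace>
kubectl apply -f app-ingress.yaml -n <namespace>
kubectl apply -f app-certificate.yaml -n <namespace>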
But first, we need an image to deploy.
# build image
docker build . -t <tag_name>:latest
# log in to image repo, e.g., Digital Ocean (or your provider's equivalent)
doctl registry login
# tag image for repo
docker tag <tag_name>:latest <image_repo_location>/<image_repo>/<tag_name>:latest
# push to repo
docker push <image_repo_location>/<image_repo>/<tag_name>:latest
Once the image is on the image repository, we're ready to deploy. When I'm deploying resources, I like to watch the pods start up without any issues. In a separate shell, I'll run:
# watch pods
kubectl get pods -n <namespace> -w
The "-w" flag for "watch" allows me to ensure each pod is running as expected or address any issues if pods are crashing before continuing on deploying resources.
Migrate Data
At this stage, our application should be running and exposed to the outside world. The only problem is that there isn't any data. I'll now grab the data from the origin database. This will look substantially similar to dumping and restoring the entire database, except this time we'll use the "--data-only" flag.
# dump database
pg_dump -Fc --data-only -f data.dump "<connection_string>"
# restore database
pg_restore --data-only -d "<connection_string>" data.dump
We should now be up and running. But let's say I want to make a change to the data - risky, but there are situations where this is necessary. In one such scenario, imagine we changed the schema name between the origin and destination databases. It may be easier to modify the data in plain text SQL form.
To accomplish this, I'll still use pg_dump, but I'll drop the "-Fc" flag. Additionally, I'll change the file extension of the output file to ".sql" to more appropriately align with what's generated.
# dump database
pg_dump --data-only -f data.sql "<connection_string>"
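With the dump in plain text, a schema rename can be as simple as a search and replace. The schema names here are hypothetical, and note that BSD/macOS sed expects -i '' rather than -i.
# rename a schema throughout the dump (GNU sed shown; use sed -i '' on macOS)
sed -i 's/old_schema\./new_schema./g' data.sql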
With an SQL file, we can no longer use pg_restore. Fortunately, good old psql handles SQL files without much hassle.
# restore database data
psql "<connection_string>" < data.sql
And with that, our application migration is complete.
Last Steps
Now comes the fun part: testing. I thoroughly inspect all transferred data, be it the database, object storage, whatever. I check object counts, and I cherry-pick specific resources and inspect their contents. This stage is really a trade-off between the sensitivity of the data and the value of your time. If the data is low stakes, perhaps a bit of high-level testing is sufficient. If the data is mission-critical, maybe you should spend some time in the weeds ensuring the migration was successful.
Once we're comfortable the migration was successful, and only then, we can go back to our cloud provider and destroy the origin resources. For object storage, I'm going to do this through the shell using rclone. For resources like clusters, the UI is probably sufficient. Be sure to move slowly and methodically during these operations. There's no rush, and certain things cannot be undone. Destroying the wrong resource could really ruin your day, so be careful.
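For the object storage side, rclone can remove the origin bucket and its contents in one shot; triple-check the remote and bucket name before running it.
# remove the origin bucket and everything in it
rclone purge spaces-<origin_bucket_region>:<origin_bucket_name>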
Final Thoughts
Migrating a project takes a bit of work and a bit of focus to make sure the process is done correctly. I like to clear my day for jobs like this. The process shouldn't take a day (really, it shouldn't take more than a couple of hours), but I want to make sure my attention is completely directed at the job at hand to avoid missteps.
Here we looked at the many steps involved in the process, from database and object storage migration, to configuring resources and workflows, to finally standing up deployments in the destination cluster. While migrating projects isn't the most common job in devops, it can be a necessary one when there are opportunities to use resources more efficiently.