Background
When it comes to managing S3-compatible buckets ("S3 bucket[s]" henceforth), there's only so much you can do through your cloud provider's UI. I would go so far as to say there's only so much you'd want to do through the UI even if the capabilities exist. Considering the sensitive nature of storage operations, especially for resources in production, you want to perform these operations in the most stable and predictable environment possible.
That's where shell utilities come into play. Commands are strictly defined and (hopefully) well documented. You know what you're going to get, without having to trust that an abstracted UI workflow will behave the way you expect.
The shell utility I prefer for S3 buckets is rclone. The tool has a comprehensive suite of commands covering the vast majority of what you're likely to need. It's easy to configure and get up and running quickly. And the commands are simple to use and well documented.
What follows is a discussion of how to configure the utility, sync/migrate files between buckets, and purge orphaned files once your operations are complete.
Shell Environment
I'm on Mac. Certain commands will be specific to macOS/Zsh.
Configuration
The first thing you'll need to do is acquire an API key for use with rclone. Make sure you configure this key with sufficient privileges for the buckets you need to work with and the operations you need to perform.
I'm setting up a full-access API key with privileges across buckets to make it easy. When the work I need to do is complete, I'll delete this key. If you're unsure how to generate API keys for storage buckets, refer to your cloud provider's documentation.
With my key generated and in hand, I'll move over to my shell, Zsh in my case since I'm on Mac. From the shell, either download the program or make sure it's up to date if it's already on your system.
# obtain rclone
# first make sure brew is up-to-date
brew update
# see if rclone already exists on your system
rclone --version
# if not, install it
brew install rclone
# if it is, update it
brew upgrade rclone
For more information on installing rclone, including installation in other environments, refer to the official installation docs.
Next, we'll set up rclone to connect with our cloud provider. We'll need to create a config file. To do this, I prefer to use nano. In my case, I'm using Digital Ocean Spaces Object Storage. Where you see references to Digital Ocean, replace with your cloud provider's details if different.
# rclone config
nano ~/.config/rclone/rclone.conf
# enter the following replacing placeholders as necessary
[spaces-<bucket_region>]
type = s3
provider = DigitalOcean
env_auth = false
access_key_id = <your_spaces_access_key>
secret_access_key = <your_spaces_secret_key>
endpoint = <bucket_region>.digitaloceanspaces.com
acl = private
Prefix
Note, I'm using a "spaces-" prefix on my region names. This is just to indicate this is a Digital Ocean resource. Modify as necessary for your specific cloud provider.
Repeat this process for however many regions you'll be accessing.
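For example, a second stanza for a hypothetical nyc3 region (substitute whichever regions you actually use) might look like this:
[spaces-nyc3]
type = s3
provider = DigitalOcean
env_auth = false
access_key_id = <your_spaces_access_key>
secret_access_key = <your_spaces_secret_key>
endpoint = nyc3.digitaloceanspaces.com
acl = private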
Now exit nano with Ctrl + X, then press Y and Enter to save.
Since this file grants permission to our resources, let's make sure only we have access.
# restrict permissions for your config file
chmod 600 ~/.config/rclone/rclone.conf
This command grants the owner (us) read and write permissions (4 [read] + 2 [write] = 6) and removes all permissions for the group and everyone else (the two trailing zeros).
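To confirm the change took effect, list the file and check the mode bits:
# verify the permissions on the config file
ls -l ~/.config/rclone/rclone.conf
# expect the output to begin with -rw-------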
Assuming our configuration was entered without errors, we're now ready to begin working with our cloud resources.
Inspect Data
Before we begin performing operations across our data, it may be wise to inspect what resources exist where. I like to cross-reference what's presented through my cloud provider's UI against what's returned through shell commands just to ensure things are how I expect them to be. Working with cloud storage can be very unforgiving. Move slowly and thoughtfully.
First, let's take a look at what remotes we have configured.
# list what remotes you have in your config file
rclone listremotes
We'll see as output the labels associated with each region entered (e.g., spaces-<bucket_region>). Now let's see what buckets exist in those regions with rclone lsd.
# see what buckets exist at a particular region
rclone lsd spaces-<bucket_region>:
Make sure the colon is present or the command will fail. You can take a look within a particular bucket by appending its name to the far side of that colon.
# inspect the contents of a particular bucket
rclone lsd spaces-<bucket_region>:<bucket_name>
No trailing colon needed this time. And we can further inspect within directories.
# inspect the contents of a particular bucket directory
rclone lsd spaces-<bucket_region>:<bucket_name>/<a_particular_directory>
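Beyond lsd, I sometimes lean on ls and size to sanity-check object counts and total sizes before a migration. A quick sketch with the same placeholders:
# list every object in a bucket along with its size
rclone ls spaces-<bucket_region>:<bucket_name>
# summarize the total number of objects and their combined size
rclone size spaces-<bucket_region>:<bucket_name>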
If we're fully satisfied that the data we're looking to work with exists where we expect it to, then we're ready to proceed with operations.
Transfer Data
The command I find myself using most often is rclone sync. Whether I'm taking a snapshot or migrating data due to a restructuring, this is generally what I'll use. This command doesn't come without its hazards, however.
Potential for Data Loss
The "sync" command updates the destination to match the source. If there are existing resources, they will be destroyed. You should always test this operation with the "--dry-run/-n" OR "--interactive/-i" flags.
Because sync updates the destination to match the source, existing data in the destination can be overwritten or removed. If this is a potential issue, consider using the copy command instead, which transfers new and changed files but never deletes anything from the destination.
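As a sketch of what those precautions look like with the placeholders used below, a dry run reports what would change without touching anything, and copy offers the non-destructive alternative:
# preview what sync would do without modifying anything
rclone sync --dry-run spaces-<origin_bucket_region>:<origin_bucket_name> spaces-<destination_bucket_region>:<destination_bucket_name>
# non-destructive alternative: copy adds and updates files but never deletes
rclone copy spaces-<origin_bucket_region>:<origin_bucket_name> spaces-<destination_bucket_region>:<destination_bucket_name>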
Let's give sync a try.
# sync files between buckets
rclone sync spaces-<origin_bucket_region>:<origin_bucket_name> spaces-<destination_bucket_region>:<destination_bucket_name>
This operation will take a bit of time depending on how many resources are being relocated. Just like when we inspected data in the previous section, this command can also be performed deeper at the directory level instead of just the bucket level as demonstrated above.
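For instance, a sync scoped to a single directory, using a hypothetical assets/ directory as the example, would look like this:
# sync just one directory rather than the entire bucket
rclone sync spaces-<origin_bucket_region>:<origin_bucket_name>/assets spaces-<destination_bucket_region>:<destination_bucket_name>/assets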
Cross-Cluster Copy Issues
Note: Some S3-compatible providers, such as DigitalOcean Spaces, do not support server-side copy operations (e.g., CopyObject) between buckets, even within the same region, because buckets may be treated as "cross-cluster" internally. Server-side copy still works within a single bucket, and cross-region transfers default to client-side anyway. To work around this, add the "--disable copy" flag to your command, forcing rclone to handle the transfer client-side by downloading from the source and uploading to the destination. Note that this may consume more bandwidth and time compared to server-side operations.
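Applied to the sync above, the workaround looks like this:
# force a client-side transfer when server-side copy is unsupported
rclone sync --disable copy spaces-<origin_bucket_region>:<origin_bucket_name> spaces-<destination_bucket_region>:<destination_bucket_name>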
Next, let's take a look at how to destroy data.
Destroy Data
First, let's take a look at rclone delete.
# delete the files within a bucket
rclone delete spaces-<origin_bucket_region>:<origin_bucket_name>
The delete command will remove files within a particular path not otherwise excluded by the --exclude filter and similar flags. Additionally, if you supply the --rmdirs flag, the command will remove empty directories but preserve the root.
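As a sketch of those flags in combination, using a hypothetical *.log exclusion:
# preview the deletion, skipping anything that matches *.log
rclone delete --dry-run --exclude "*.log" spaces-<origin_bucket_region>:<origin_bucket_name>
# run it for real and remove any directories left empty
rclone delete --exclude "*.log" --rmdirs spaces-<origin_bucket_region>:<origin_bucket_name>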
If you need to destroy the entire bucket, then you're going to need more firepower. The command rclone purge is what you're looking for.
# the nuclear option
rclone purge spaces-<origin_bucket_region>:<origin_bucket_name>
As with sync, I also perform a dry run using the --dry-run/-n flag before performing this operation.
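That precaution looks like this:
# preview exactly what purge would remove before committing
rclone purge --dry-run spaces-<origin_bucket_region>:<origin_bucket_name>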
Use Caution With Delete/Purge
Delete/purge operations are often permanent unless versioning, recovery, snapshots, etc., are explicitly enabled. Only run these operations once you're entirely sure the origin resources are no longer needed and the destination has been thoroughly tested.
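One way to gain that confidence, assuming the same origin and destination remotes used above, is rclone check, which compares the two sides without modifying either:
# verify that the destination matches the origin before destroying anything
rclone check spaces-<origin_bucket_region>:<origin_bucket_name> spaces-<destination_bucket_region>:<destination_bucket_name>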
With our files transferred and the origin destroyed, we can remove the configuration.
Cleanup
I'll now delete my full-access API key through my cloud provider's UI, since this key poses a significant security risk given its broad permissions. I'll also delete the rclone config file from my local machine. With the key no longer in existence, there's no reason to keep the config around.
# delete rclone config file
rm ~/.config/rclone/rclone.conf
If I were keeping the key and config file around, I would absolutely encrypt them using something like GPG. However, I never do, because it just doesn't take that much time to reconfigure each time I need to perform operations like the ones discussed here.
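For reference, a minimal sketch of that encryption with GPG (assuming it's installed) might look like the following; rclone also offers its own built-in config encryption via rclone config if you'd rather stay within the tool.
# encrypt the config with a passphrase, producing rclone.conf.gpg
gpg --symmetric ~/.config/rclone/rclone.conf
# remove the plaintext copy once the encrypted file exists
rm ~/.config/rclone/rclone.conf
# decrypt again when needed
gpg --output ~/.config/rclone/rclone.conf --decrypt ~/.config/rclone/rclone.conf.gpg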
Other Considerations
In the workflows demonstrated here, I first performed copy operations followed by delete operations to move resources from one location to another. "But isn't there a command move for this exact situation!?" Yes, and please don't yell.
Using move would be roughly equivalent to sync/copy followed by delete. Again, it's real up there in the cloud. Mistakes can happen and sometimes you can't get yourself out of the hole you've dug. I prefer to execute my operations step-by-step and be able to inspect the destination before destroying the origin. That's simply my preference.
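For completeness, the single-step alternative with the same placeholders would be:
# move copies files to the destination, then deletes them from the origin
rclone move spaces-<origin_bucket_region>:<origin_bucket_name> spaces-<destination_bucket_region>:<destination_bucket_name>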
Final Thoughts
Every now and again we find ourselves needing to perform manual operations on our cloud storage. Perhaps we're migrating existing data to a new location, or maybe we're taking a snapshot of a bucket to create a backup copy. Either way, the rclone utility is a great tool for these scenarios. Finally, be sure to move carefully and deliberately, for some things cannot be undone.