Tidy Cloud AWS #50 - To restore files and faith

Hello all!

Welcome to the next issue of the Tidy Cloud AWS newsletter! In this issue, I rant a bit about a recent experience with restoring individual items from a backup of S3 in AWS Backup.


AWS Backup bad experience

Recently, I was tasked with writing a guideline for restoring individual files/objects from an S3 backup made with AWS Backup. You may ask, why would you need to back up S3? The durability is super great, and you can enable versioning to keep multiple copies!

All that is true. However, making a restore with plain AWS tools (AWS Console or AWS CLI) is not a joyful experience. AWS says the experience using AWS Backup is better, so using it sounded like good peace of mind in case something goes wrong.

The project set up the backup vault and backup plans via AWS CDK. Backups were running just fine. Now was the time to do a restore and document how that is done!

In the AWS Backup service view, you pick the backup vault to do a restore from, and then select a recovery point, and select the Restore action.

In the view that is now presented, you can select whether to restore the whole bucket, or restore individual items. If you choose individual items, you will see a text input field to enter an S3 URI into. There is no option to browse the content.

You can enter up to 5 URIs, and a URI can refer to a directory.

Next, you select an S3 bucket to restore to, which can be the original one or a different bucket. Here, you actually get a drop-down list of buckets.

For encryption of the restored data, you can choose to use the same encryption key as the original, or a different one.
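For anyone who would rather script these inputs than type them into the console, here is a minimal sketch of building the restore metadata in Python. The function name is hypothetical, and the metadata key names (`DestinationBucketName`, `ItemsToRestore`, `EncryptionType`, `KMSKey`) and the JSON encoding of the item list are assumptions based on the AWS Backup documentation for S3 restores, so verify them before relying on this.

```python
import json

MAX_ITEMS = 5  # at most five item URIs per restore job


def build_s3_restore_metadata(items, destination_bucket, kms_key_arn=None):
    """Build a Metadata map for an item-level S3 restore.

    The key names used here are assumptions, not confirmed by the newsletter.
    """
    if not 1 <= len(items) <= MAX_ITEMS:
        raise ValueError(f"expected 1 to {MAX_ITEMS} items, got {len(items)}")
    for uri in items:
        if not uri.startswith("s3://"):
            raise ValueError(f"not an S3 URI: {uri}")

    metadata = {
        "DestinationBucketName": destination_bucket,
        # Metadata values are strings, so the list is JSON-encoded (assumption).
        "ItemsToRestore": json.dumps(list(items)),
        # Keep the same encryption as the original objects by default.
        "EncryptionType": "original",
    }
    if kms_key_arn:
        # Use a different KMS key for the restored objects.
        metadata["EncryptionType"] = "SSE-KMS"
        metadata["KMSKey"] = kms_key_arn
    return metadata
```

The resulting dict would be passed as `Metadata` to the boto3 `start_restore_job` call, together with `RecoveryPointArn`, `IamRoleArn` and `ResourceType`.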

You also need to pick an IAM role that AWS Backup will assume to run the restore job. The default AWS Backup role does not work for our setup, but we had created a suitable role with AWS CDK. The name was not easy to spot, though, given how messy CDK-generated resource names are.
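One way to narrow down the candidates is to list the roles whose trust policy allows the AWS Backup service to assume them. This is a rough sketch, not what the project did. The function name is hypothetical, and it assumes a boto3 IAM client (`boto3.client("iam")`) is passed in.

```python
import json
from urllib.parse import unquote


def backup_assumable_roles(iam_client):
    """Return names of IAM roles that trust backup.amazonaws.com."""
    names = []
    paginator = iam_client.get_paginator("list_roles")
    for page in paginator.paginate():
        for role in page["Roles"]:
            policy = role.get("AssumeRolePolicyDocument") or {}
            if isinstance(policy, str):
                # boto3 normally returns a parsed dict; handle raw text too.
                policy = json.loads(unquote(policy))
            statements = policy.get("Statement", [])
            if isinstance(statements, dict):
                statements = [statements]
            for statement in statements:
                principal = statement.get("Principal", {})
                if not isinstance(principal, dict):
                    continue  # e.g. a wildcard principal "*"
                services = principal.get("Service", [])
                if isinstance(services, str):
                    services = [services]
                if (statement.get("Effect") == "Allow"
                        and "backup.amazonaws.com" in services):
                    names.append(role["RoleName"])
                    break
    return names
```

This only tells you which roles AWS Backup may assume. It does not check that a role has the permissions the restore needs.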

The restore

Finally, after entering all the data, it is time to start the restore job. The object in S3 that I had picked was about 10 KB in size, so I thought it should hopefully be quick. I was wrong…

The restore job took about 27 minutes to complete! I looked in the destination bucket, but the object was not there… I looked in the restore job itself, but there was no information about what had been restored, and no sign of any error.

I tried again, double-checking the name of the object, and entering the data again. This time the job only took 8 minutes, but there was no restored file in the destination bucket.

Next, I entered an S3 URI for a directory as an item to restore, and kicked off another restore job. This time it took 24 minutes to complete, and when I looked into the destination bucket, I could see the expected directories and objects! Success!

For my final test, I restored a complete bucket to another S3 bucket. This took 6 minutes, for somewhere between 50 and 100 GB of data.
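The waiting and checking in these tests can also be scripted. Below is a minimal sketch, assuming boto3 clients for AWS Backup and S3 are passed in. The function names and the polling defaults are hypothetical choices. Since the restore job itself does not say what it restored, the second function simply lists what is in the destination bucket under a prefix.

```python
import time

FINAL_STATES = ("COMPLETED", "ABORTED", "FAILED")


def wait_for_restore(backup_client, restore_job_id,
                     poll_seconds=30, timeout_seconds=3600):
    """Poll a restore job until it reaches a final state and return it."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = backup_client.describe_restore_job(RestoreJobId=restore_job_id)
        if job["Status"] in FINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"restore job {restore_job_id} still running")
        time.sleep(poll_seconds)


def restored_keys(s3_client, bucket, prefix):
    """List keys under a prefix in the destination bucket.

    A single list_objects_v2 call returns at most 1000 keys, which is
    enough to check whether a small restore produced anything at all.
    """
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in response.get("Contents", [])]
```

A job that ends as `COMPLETED` with an empty key list is the silent failure described above, so checking both is worthwhile.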

Why do things suck?

After these tests, I thought that there must be a better way to do this restore. Why does AWS have a user interface that is so horrible?

I checked the boto3 documentation (the AWS Python SDK) to see what could be done. Perhaps not surprisingly, the APIs provide no insight into the contents of a recovery point, and you can provide at most 5 URIs when restoring individual items.
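To illustrate what the API does offer, here is a minimal sketch that lists the S3 recovery points in a vault, newest first. It assumes a boto3 AWS Backup client (`boto3.client("backup")`) is passed in, and the function name is hypothetical. Each entry carries fields such as `RecoveryPointArn`, `CreationDate` and `Status`, but nothing about the objects inside.

```python
def list_s3_recovery_points(backup_client, vault_name):
    """Return the S3 recovery points in a vault, newest first."""
    points = []
    kwargs = {"BackupVaultName": vault_name, "ByResourceType": "S3"}
    while True:
        page = backup_client.list_recovery_points_by_backup_vault(**kwargs)
        points.extend(page.get("RecoveryPoints", []))
        token = page.get("NextToken")
        if not token:
            break
        # Ask for the next page of results.
        kwargs["NextToken"] = token
    return sorted(points, key=lambda p: p["CreationDate"], reverse=True)
```

The ARN of the chosen recovery point is what a restore job takes as input, so you still need to know the object URIs from somewhere else.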

So the user interface in the AWS Console just reflects the limitations of the API itself. This API is not specific to S3 either. I assume the API was not originally intended for restoring single files or small groups of items, and that this has been bolted onto the original interface.

Since the whole-bucket restore was actually faster than the single-object and single-directory restores, my guess is that they do a complete restoration of one or more snapshots under the hood, then extract the individual items. Since a backup likely comprises a full snapshot plus a number of delta snapshots, the restore time may depend on the number of snapshots they need to process.

Also, my second attempt at extracting an individual item was significantly less slow than the first attempt, which suggests that some caching may be involved.

Customer obsession?

AWS prides itself on being a customer-obsessed company, meaning that they listen to their customers when deciding what to build. They are also very keen on backward compatibility and seldom deprecate or shut down a service or API.

These are good things. All the service teams also seem to work independently of each other, which makes them move pretty fast. It also results in inconsistent experiences.

S3 backup and restore of individual items may be a checkbox item that some customers needed for compliance reasons, and they may not even care that much about how well it works. Since S3 is extremely durable, how many have bothered to check that they can restore items from an S3 backup? Probably not enough to complain to AWS.

If this is something you care about, let AWS know about it! If not, tick that checkbox, sigh, and move on, or use something else.

I do not think AWS has a very good track record when it comes to usability. Somehow that does not seem to make it to the top of the customer obsession priorities most of the time. It makes for a large ecosystem of companies selling third-party tools and consultancy to make up for that shortcoming.

You can find older newsletters and more at Tidy Cloud AWS. You will also find other useful articles around AWS automation and infrastructure-as-software.

Until next time,