boto-rsync – Limitations and Workarounds

boto-rsync is a great tool for interacting with object storage systems like S3, but it’s not without limitations. We all know about the 5GB limit for a single PUT, which isn’t a problem for clients that can handle multipart upload. Sadly, boto-rsync doesn’t handle that, and until someone patches it, we need a way to break up large objects. This can crudely be done by cutting the file into pieces under 5GB with split before uploading.
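
A minimal sketch of the split step, assuming a POSIX shell. The function name split_for_upload, the 4096m default chunk size, and the paths in the trailing comment are illustrative choices, not part of boto-rsync.

```shell
#!/bin/sh
# Split a file into fixed-size pieces named <file>.part.aa, <file>.part.ab, ...
# chunk_size uses split's -b syntax; 4096m (4 GiB) stays under the 5GB PUT limit.
split_for_upload() {
    file=$1
    chunk_size=${2:-4096m}
    split -b "$chunk_size" "$file" "$file.part."
}

# Example (hypothetical file name and bucket path):
#   split_for_upload /data/staging/backup.tar
#   rm /data/staging/backup.tar        # keep only the pieces in the upload dir
#   boto-rsync /data/staging/ s3://bucketname/remote/path/
```

With the default two-letter suffixes, split can produce at most 676 pieces per file, which is plenty at this chunk size.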

The disadvantage of this is that retrieved pieces have to be manually catted back together, which obviously isn’t always a good solution.
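
Reassembly after download can be sketched like this, assuming the pieces follow a <file>.part.aa, <file>.part.ab, ... naming scheme (an assumption of this sketch, as is the function name join_parts).

```shell
#!/bin/sh
# Concatenate <file>.part.* back into <file>.
# The shell expands the glob in sorted order, so the suffixes aa, ab, ac, ...
# are joined in the same order split wrote them.
join_parts() {
    file=$1
    cat "$file".part.* > "$file"
}

# Example (hypothetical file name):
#   join_parts backup.tar
```

It is worth comparing a checksum of the rebuilt file against the original before deleting the pieces.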

boto-rsync’s other weakness is in handling UTF8 filenames. An improperly-encoded filename will throw a 400 Bad Request and cause the script to choke and die, rather than gracefully skipping the failing file and moving on. Renaming the affected files so that their names are valid UTF8 fixes this.
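
One way to do the renaming is sketched below, using find and iconv. It assumes the bad names are really ISO-8859-1 (set SRC_ENC if yours differ) and that no name contains a newline. The function names fix_name and fix_tree are made up for this sketch.

```shell
#!/bin/sh
# Assumed source encoding of the broken names; override via the environment.
SRC_ENC=${SRC_ENC:-ISO-8859-1}

# Rename one entry if its last path component is not valid UTF-8.
fix_name() {
    path=$1
    dir=$(dirname "$path")
    base=$(basename "$path")
    # iconv exits non-zero when its input is not valid UTF-8.
    if ! printf '%s' "$base" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
        new=$(printf '%s' "$base" | iconv -f "$SRC_ENC" -t UTF-8)
        # Skip the rename if the conversion produced nothing.
        [ -n "$new" ] && mv -- "$path" "$dir/$new"
    fi
}

fix_tree() {
    root=$1
    # Pass 1: directories, deepest first so parent paths are still valid.
    find "$root" -depth -type d | while IFS= read -r p; do fix_name "$p"; done
    # Pass 2: files, once every directory has its final name.
    find "$root" -type f | while IFS= read -r p; do fix_name "$p"; done
}

# Example (hypothetical path):
#   fix_tree /data/to-upload
```

Running fix_tree on a copy first is a good idea, since a wrong guess at SRC_ENC yields valid but wrong names.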

Not pretty, but it works. Note that directories need to be checked and renamed before the files inside them are handled.

UPDATE – These issues have both been addressed in https://github.com/dreamhost/boto_rsync