Using rsync to back up from OSX to Linux

Warning: The discussion below relates to my attempts to use rsync to back up a directory on a Mac to a directory on a remote Linux host. I never worked out how to get it to back up all of the unusual metadata that Mac filesystems include, but it works well enough for my purposes.

I’ve been using rsync for quite some time to back up directories between Linux and other types of Unix systems. It works great. It’s often as simple as:

rsync -av -e ssh source_dir user@remotehost:destdir

Little did I know that rsyncing between a Mac and a non-Mac Unix is actually quite complicated. Unlike most other Unix filesystems, the HFS+ filesystem that Macs use includes a bunch of extra ‘metadata’ such as resource forks, finder-flags, finder-locks, a special type of creation-date, bsd-flags etc. Obviously, if you’re copying a Mac directory elsewhere for backup purposes, you’d like to be able to restore all your files plus this ‘extra metadata’.

Apparently the oldish version of rsync (v2.6.9 on my OSX 10.5.7) has the Apple-specific -E option to capture this extra ‘metadata’, but it only really works if you’re copying to another directory on the same machine, or to another Mac running the same modified Apple rsync. Apparently, more recent versions of rsync (the latest is 3.0.6) include features that allow them to save and restore most of the OSX-specific ‘metadata’.

I ended up installing the latest version of rsync on my MacBook using MacPorts. Interestingly, it is v3.0.5 plus some additional patches for OSX (the fileflags and crtimes patches). It installs under /opt/local/bin. I also installed backup-bouncer, which can create an OSX volume containing a set of test files, each with some of the more unusual features of OSX. You then use your directory copying program to copy the test files elsewhere, and use backup-bouncer to check whether all the OSX metadata was copied across.

The MacPorts rsync is really good at rsyncing between two local directories or to a directory on another Mac. For example, if I do the following, backup-bouncer’s verify tells me my ‘testdir’ passes all tests:

sudo /opt/local/bin/rsync -av --crtimes --hard-links --acls --xattrs --fileflags /Volumes/Src testdir

However, I can’t do that when transferring to a non-Mac. rsync generally requires that the version of rsync at the destination end also understands your command line arguments. I attempted to compile rsync on my Debian Lenny server using the same build technique the MacPorts version uses (i.e. with the crtimes and fileflags patches). This does not work, since those two extra patches require a bunch of OSX-specific header files. So that left me with compiling the normal rsync 3.0.5 from http://www.samba.org/rsync/download.html. That compiled fine on my Debian box, and I ended up copying it over the /usr/bin/rsync that I already had on the server.
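
Since both ends need to agree on the options, it can help to compare the rsync versions before running a backup. The following is only a sketch: rsync_version is a hypothetical helper name, and it assumes the first line of `rsync --version` has the form "rsync  version X.Y.Z  protocol version N", which may not hold for every build.

```shell
# Print the version field from the first line of `rsync --version` output
# read on stdin. rsync_version is a hypothetical helper, not part of rsync.
# Assumes the first line looks like: "rsync  version 3.0.5  protocol version 30"
rsync_version() {
    awk 'NR == 1 { print $3 }'
}

# Usage (commented out, since it needs the real hosts):
#   /opt/local/bin/rsync --version | rsync_version
#   ssh user@host rsync --version | rsync_version
```

If the two commands print different versions, options such as --xattrs or --acls may be rejected by the older end.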

So I couldn’t use the crtimes or fileflags patches. Backup-bouncer doesn’t class those things as ‘critical’, so I thought that from a backup point of view, I’d leave them out. So I tried this to copy data to the Linux box ‘host’:

/opt/local/bin/rsync -av --hard-links --acls --xattrs -e ssh /Volumes/Src user@host:/backup

That also did not work. The /backup filesystem on the Linux host was ext3, but you need to use additional mount options if you want to use extended attributes or ACLs, so I remounted my backup partition:

umount /backup
mount -t ext3 -o user_xattr,acl /dev/hdc1 /backup
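
A quick way to confirm that the remount took effect is to try writing a user extended attribute on the backup filesystem. This is a minimal sketch that assumes the setfattr and getfattr tools (from the attr package on Debian) are installed on the Linux host; xattr_probe and the attribute name user.probe are made up for illustration.

```shell
# Return 0 if a user extended attribute can be written and read back on a
# scratch file inside the given directory, 1 if not, 2 if no scratch file
# could be created. xattr_probe is a hypothetical helper name.
xattr_probe() {
    dir=$1
    probe=$(mktemp "$dir/xattr-probe.XXXXXX") || return 2
    if setfattr -n user.probe -v ok "$probe" 2>/dev/null &&
       [ "$(getfattr --only-values -n user.probe "$probe" 2>/dev/null)" = "ok" ]; then
        status=0
    else
        status=1
    fi
    rm -f "$probe"   # always clean up the scratch file
    return $status
}

# Usage (commented out, since it needs the real mount point):
#   xattr_probe /backup && echo "user xattrs work on /backup"
```

This only checks user extended attributes, not ACLs, so a passing probe does not prove the acl option is active.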

Now it works a bit better, but a lot of stuff still fails. The basic problem is that there are still a bunch of OSX-specific metadata bits that rsync on the Mac can read, but the Linux end does not know how to store them. For this, rsync has the --fake-super option. It basically encodes the extra stuff into extended attributes on the remote system.

I finally ended up with the following, which MOSTLY works. It still fails on some of the non-critical backup-bouncer tests. It also cannot handle large amounts of metadata.

  • Back up a directory from the Mac using:
    /opt/local/bin/rsync -aHv --acls --xattrs -e ssh  \
    --rsync-path="rsync --fake-super"  source_dir user@linuxhost:/backup/latest
  • Restore a directory using:
    /opt/local/bin/rsync -aHv --acls --xattrs -e ssh \
    --rsync-path="rsync --fake-super"   user@linuxhost:/backup/latest restore_dir
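
The two commands above differ only in the order of source and destination, so they can be wrapped in a pair of shell functions. This is just a sketch of one way to do it: the names RSYNC, REMOTE, mac_rsync, backup and restore are my own, and the defaults are simply the paths used in this post.

```shell
#!/bin/sh
# Sketch of a wrapper around the backup and restore commands.
# RSYNC and REMOTE are hypothetical settings; override them as needed.
RSYNC=${RSYNC:-/opt/local/bin/rsync}
REMOTE=${REMOTE:-user@linuxhost:/backup/latest}

# Run rsync with the options shared by both directions.
mac_rsync() {
    "$RSYNC" -aHv --acls --xattrs -e ssh \
        --rsync-path="rsync --fake-super" "$@"
}

# backup SOURCE_DIR: copy a local directory to the remote backup location.
backup() {
    mac_rsync "$1" "$REMOTE"
}

# restore RESTORE_DIR: copy the remote backup into a local directory.
restore() {
    mac_rsync "$REMOTE" "$1"
}
```

With this sourced into a shell, `backup source_dir` and `restore restore_dir` run the same commands as the two bullet points.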

One thing I have read is that the space ext3 allows for extended attributes is quite small, and if a file has a lot of metadata, this small space can fill up. I’ve noticed this with backups of photos. The resource fork info for each photo is quite substantial, and I get alerts like this when I run rsync:

rsync: rsync_xal_set: lsetxattr("Pictures/iPhoto Library/Originals/2008/8:11:2008/IMG_0023.JPG",
"user.com.apple.ResourceFork") failed: No space left on device (28)

On OSX, you can examine attributes with the xattr command:

$ cd 'Pictures/iPhoto Library/Originals/2008/8:11:2008/'
$ xattr IMG_0023.JPG
com.apple.FinderInfo
com.apple.ResourceFork
com.apple.metadata:kMDItemSupportFileType
$ xattr -pl com.apple.ResourceFork IMG_0023.JPG

When you enter that last command you end up with a hex dump of the resource fork. The last line of my output starts at offset B0B0, which puts the fork at about 45 Kbytes. I think ext3 has a limit of 3.9K for all of a file’s extended attributes, hence the error. I found this post that suggested using XFS as it can handle much larger extended attribute data (maybe 64K).
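
The size estimate can be checked with a little shell arithmetic. This is a rough sketch: hex_to_bytes and fits_in are hypothetical helper names, and the 4096-byte figure is an assumption (one 4 KiB filesystem block) standing in for the ext3 limit, not a measured value.

```shell
# Convert the hex offset of the last hex dump line into a byte count.
# That last line can hold up to 16 more bytes, so this is a lower bound.
hex_to_bytes() {
    printf '%d\n' "0x$1"
}

# Succeed if a value of SIZE bytes fits under LIMIT bytes.
fits_in() {
    [ "$1" -le "$2" ]
}

size=$(hex_to_bytes B0B0)
echo "resource fork: at least $size bytes"

# 4096 is an assumed limit (one 4 KiB block), used only for illustration.
if fits_in "$size" 4096; then
    echo "fits within the assumed limit"
else
    echo "too large for the assumed limit"
fi
```

Hex B0B0 is 45232 in decimal, roughly eleven times the assumed limit, which is consistent with the "No space left on device" error above.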

I tried XFS, but ultimately gave up on it, since I always had bad delete performance. Now I’ve settled on reiserfs. It can also handle a large extended attribute set (NB: you have to specify acl,user_xattr with reiserfs, just like in the ext3 example). I still get a few errors with my backups related to backing up metadata, but it works well enough for my purposes.