Rotating Backup Directories using cp -al (hardlinks) to Save Disk Space
The copy command “cp -al”, as provided by GNU cp on Linux (other Unix flavors may lack the -l option), copies a directory tree by creating “hard links” to the files instead of duplicating them. The nice thing about this command is that it doesn’t create an actual copy of the file on disk; instead, it creates a “link”, a second directory entry pointing to the same file data. The result is basically a “snapshot” of that directory at a point in time. The net result is that you can have 10 “copies” of a 10G file that take up a total of only 10G, plus a little space for the extra directory entries.
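To see the space saving for yourself, here is a minimal sketch run in a scratch directory. It assumes GNU coreutils (cp, stat, du), and the names data and data_snapshot are made up for the demonstration.

```shell
# Work in a throwaway directory so nothing real is touched
work=$(mktemp -d)
cd "$work" || exit 1

# A small tree holding one 1 MB file (names are illustrative)
mkdir data
dd if=/dev/zero of=data/big.bin bs=1024 count=1024 2>/dev/null

# "Copy" the tree using hard links instead of duplicating file data
cp -al data data_snapshot

# Both names report the same inode number: they are the same file on disk
inode_orig=$(stat -c %i data/big.bin)
inode_snap=$(stat -c %i data_snapshot/big.bin)
links=$(stat -c %h data/big.bin)
echo "original inode: $inode_orig"
echo "snapshot inode: $inode_snap"
echo "link count:     $links"

# du counts hard-linked data only once, so the total stays close to 1 MB
du -sh "$work"
```

The two inode numbers should match and the link count should be 2, which is the sign that no file data was duplicated.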
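This nifty behavior makes cp -al, when combined with rsync, ideal for backup systems. One can use the cp -al command to take a “snapshot” of a given directory tree at a given time, at the expense of very little additional disk space. The snapshot stays intact because rsync, by default, writes each changed file to a temporary file and renames it over the old one rather than overwriting the shared data in place. I use this script in concert with my rsync_backup.pl script to keep 21 days of “snapshot” backups of each of my machines.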
#!/usr/bin/perl
use POSIX;
# Rotates backup directories w/ cp -al (hardlinks)
# Deletes directories older than $KEEP_DAYS
# Runs each night ahead of backup process
# (c) 2009 eddie@eddieoneverything.com
$KEEP_DAYS = 21;
$LOGFILE = "/var/log/rotate_backups";
@BACKUP_DIRS = (
    '/mnt/backup/hansel',
    '/mnt/backup/tiger',
    '/mnt/backup/june'
);
$ts = get_timestamp();
open hLOG, ">>$LOGFILE" or die "can't open log file $LOGFILE: $!";
print hLOG "=" x 80, "\n";
print hLOG "Run START at " . `date` . "\n";
print hLOG "=" x 80, "\n";
## Do the rotation
print hLOG "Do today's rotation\n";
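foreach $dir ( @BACKUP_DIRS ){
    print hLOG "\t" , `date`;
    $newfn = $dir ."_" . $ts;
    # Quote both paths so names containing spaces survive the shell
    $cmd = "cp -al \"$dir\" \"$newfn\"";
    print hLOG "Execute Command: $cmd\n";
    # system() returns the exit status, so a failed copy is logged rather than ignored
    system($cmd) == 0 or print hLOG "Command FAILED (status $?): $cmd\n";
}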
## Delete old directories
print hLOG "Delete Old Directories\n";
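foreach $dir ( @BACKUP_DIRS ){
    # Split the path into its parent directory and its final component
    unless ($dir =~ m/^(.+)\/(.+?)$/) {
        print hLOG "Skip $dir: cannot split into parent and name\n";
        next;
    }
    $base = $1;
    $stub = $2;
    #print "dir is $dir\nBASE: $base\nSTUB:$stub\n";
    opendir hDIR, "$base" or die "can't open directory $base: $!";
    # Snapshot directories are named <stub>_<timestamp>
    @dirlist = grep { /^\Q$stub\E_/ } readdir(hDIR);
    closedir hDIR;
    foreach $d (@dirlist){
        print hLOG "\t" , `date`;
        # Skip names without a timestamp; after a failed match $1..$3 would
        # still hold the values captured for the previous directory
        unless ($d =~ /^\Q$stub\E_([0-9]+)_([0-9]+)_([0-9]+)_.+$/) {
            print hLOG "Skip $d (no timestamp in name)\n";
            next;
        }
        $year = $1;
        $month = $2;
        $day = $3;
        if (dirIsOlder($year, $month, $day)){
            #print "$d\n";
            $remove_dir = $base . '/' . $d;
            $cmd = "rm -Rf \"$remove_dir\"";
            print hLOG "Execute command: $cmd\n";
            system($cmd) == 0 or print hLOG "Command FAILED (status $?): $cmd\n";
        }else{
            print hLOG "Keep $d\n";
        }
    }
}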
print hLOG "=" x 80, "\n";
print hLOG "Run END at " . `date` . "\n";
print hLOG "=" x 80, "\n";
close hLOG;
# ------------- Subroutines & functions -----------------
sub get_timestamp {
    # Zero-padded local time, e.g. 2010_01_24__20_30_00 (strftime comes from POSIX)
    return strftime('%Y_%m_%d__%H_%M_%S', localtime(time));
}
sub dirIsOlder{
    ($fyear, $fmonth, $fday) = @_;
    #print "Check $year - $month - $day \n";
    $now = mktime(localtime());
    # mktime expects a 1-based day of month, a 0-based month, and years since 1900
    $then = mktime(0, 0, 0, $fday, $fmonth-1, $fyear-1900, 0, 0);
    $diff_sec = $now - $then;
    $days_since = $diff_sec / 24 / 60 / 60;
    #print "n: $now. t: $then. Diff ($diff_sec) = $days_since\n";
    #print "$days_since $fyear-$fmonth-$fday\n";
    ## Directory is too old once its age exceeds the retention period
    if ($days_since > $KEEP_DAYS){
        return 1;
    }else{
        return 0;
    }
}
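
The snapshots made by this rotation stay frozen only as long as the backup tool replaces changed files instead of writing into them. The sketch below imitates both cases by hand in a scratch directory: an in-place append, which leaks into the snapshot, and a write-to-temporary-then-rename, which is how rsync updates files unless it is run with --inplace. File and directory names are invented for the demo, and GNU cp is assumed.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

mkdir live
echo "version 1" > live/notes.txt
cp -al live snap            # hard-linked snapshot of the tree

# Case 1: writing in place changes the shared data, so the snapshot changes too
echo "appended" >> live/notes.txt
inplace_lines=$(wc -l < snap/notes.txt)
echo "snapshot lines after in-place append: $inplace_lines"

# Take a fresh snapshot for the second case
rm -rf snap
cp -al live snap
before=$(cat snap/notes.txt)

# Case 2: write a new file and rename it over the old name (rsync-style).
# The live name now points at new data; the snapshot keeps the old data.
echo "version 2" > live/notes.txt.tmp
mv live/notes.txt.tmp live/notes.txt
after=$(cat snap/notes.txt)
live_now=$(cat live/notes.txt)

echo "snapshot unchanged: $([ "$before" = "$after" ] && echo yes || echo no)"
echo "live copy now:      $live_now"
```

If the backup job writes into existing files in place, older snapshots will silently change along with the live copy, so it is worth checking which mode your backup command uses.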


January 24th, 2010 at 8:30 pm
Ugh, the article couldn’t be more wrong about the hard link principle. It is appalling how the author misrepresents the POSIX behaviour of hardlinks. For starters, the hardlink is never “broken” when new information is written to one of the files. That behaviour belongs to copy-on-write filesystems, which are, as of now, not yet implemented on mainstream Linux. Reiser4, the future btrfs, and cow-ext3 are three filesystems I know of that exhibit this feature.
Plain hardlink never breaks. A simple operation:
echo -n "TEST" > a
cp -al a b
echo " ME" >> b
cat a # shows "TEST ME"
cat b # shows "TEST ME"
would show that files a and b are identical. Any writes to file a will show up in b, and vice versa. This is what a hardlink is: multiple file names pointing to the same data on disk.
So unless each and every program first checks whether the file is hard linked, and then does an unlink and copy, there is no way a system will do that automatically.
February 9th, 2010 at 10:41 am
You’re right, Leszek. I have updated the post with a more accurate description. Thanks for the heads up.