Tar'ing files from a file
November 9th, 2011
I recently had to take a copy of a web server to test some layout stuff but when I zipped it up, it was over 6GB. This was too much for me to transfer, so I had a look to see what was taking up all the space. It turned out that most of it was video files and tutorials. I didn’t really need them for the work I was doing so I wanted to filter them out and tar up the rest.
I did a find to get a list of all the files under the document root, but found that when I invoked the tar command it kept including the video files.
find docroot -print > files.txt
grep -v -i "\.wmv$" files.txt > filtered-files.txt
tar -cvj -T filtered-files.txt -f docroot.tar.bz2
It took a while to realise what was going on, so I thought I’d document the trap I fell into. The find command lists directories as well as files, and tar recurses into any directory it is given, which meant that the tar command was effectively:
tar -cvj -f docroot.tar.bz2 docroot/file1.txt docroot/movies_dir docroot/file2.txt
The grep I issued didn’t make any difference because the list still included the movies’ parent directory, and tar archives a directory together with everything inside it.
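The trap is easy to reproduce on a small scale. The following is a minimal sketch, assuming GNU tar and a throwaway scratch tree. The names file1.txt, movies_dir and intro.wmv are invented for the demonstration, and compression is left out to keep it short.

```shell
# Scratch reproduction; every file and directory name here is made up.
work=$(mktemp -d)
cd "$work"
mkdir -p docroot/movies_dir
echo page > docroot/file1.txt
echo video > docroot/movies_dir/intro.wmv

# Same steps as in the post: the list still contains the directories
# docroot and docroot/movies_dir after the grep.
find docroot -print > files.txt
grep -v -i '\.wmv$' files.txt > filtered-files.txt

# tar recurses into each listed directory, so the video comes along anyway.
tar -c -T filtered-files.txt -f trap.tar

# Show any .wmv entries that ended up in the archive.
tar -tf trap.tar | grep -i '\.wmv$'
```

The final command still lists docroot/movies_dir/intro.wmv, even though grep removed that path from filtered-files.txt.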
So I modified my find command to only include files and it worked as expected.
find docroot -type f -print > files.txt
grep -v -i "\.wmv$" files.txt > filtered-files.txt
tar -cvj -T filtered-files.txt -f docroot.tar.bz2
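The same filtering can also be done in one pipeline without the intermediate files. This is only a sketch, assuming GNU find and GNU tar. The scratch tree and its file names are invented, and compression is again left out. Passing names null-delimited (-print0 with --null) is an addition here that copes with spaces in file names.

```shell
# Scratch tree; the names are hypothetical.
work=$(mktemp -d)
cd "$work"
mkdir -p docroot/movies_dir
echo page > "docroot/file 1.txt"
echo video > docroot/movies_dir/intro.WMV

# -type f keeps directories out of the list, ! -iname drops the videos
# regardless of case, and -T - makes tar read the names from stdin.
find docroot -type f ! -iname '*.wmv' -print0 |
  tar -c --null -T - -f docroot.tar

# List what was archived.
tar -tf docroot.tar
```

One side effect of listing only regular files is that empty directories never reach tar, so they are missing from the archive.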
I kept finding different movie formats, so the grep command was getting longer and longer. In retrospect it would have been better to let tar do the filtering with its --exclude option:
tar -cvj -f docroot.tar.bz2 --exclude="*.wmv" --exclude="*.mpg" --exclude="*.mpeg" docroot
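Unlike the grep -i used earlier, --exclude patterns are case-sensitive by default, so a file such as intro.WMV would still be archived. The sketch below assumes GNU tar, whose --ignore-case option makes the exclude patterns that follow it case-insensitive. The scratch tree and its file names are invented, and compression is omitted to keep the check short.

```shell
# Scratch tree; the names are hypothetical.
work=$(mktemp -d)
cd "$work"
mkdir -p docroot/movies_dir docroot/empty_dir
echo page > docroot/file1.txt
echo video > docroot/movies_dir/intro.WMV
echo video > docroot/movies_dir/demo.mpeg

# --ignore-case is placed before the --exclude options so it applies to them.
tar -c -f docroot.tar --ignore-case \
    --exclude='*.wmv' --exclude='*.mpg' --exclude='*.mpeg' docroot

# Print any video that slipped through, or a confirmation if none did.
tar -tf docroot.tar | grep -i -E '\.(wmv|mpg|mpeg)$' || echo "no videos archived"
```

Because tar walks docroot itself here, directory entries such as empty_dir are kept in the archive, which the find -type f approach would have dropped.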

