Tutorial for unpacking and cataloguing lots of Java archives
This is a tutorial and example of how Bash increases productivity.
A Java archive is just a zip archive, so it contains lots of files. We want our own index of what each archive contains, so we are going to unpack them all and list their contents. Here is the situation.
$ ls -l
total 48616
-rwxr-xr-x 1 davidnewcomb staff 12952991 28 Nov 13:57 MRN.ear
-rwxr-xr-x 1 davidnewcomb staff 11642532 15 Sep 2017 MRN.war
-rwxr-xr-x 1 davidnewcomb staff 289943 21 May 2018 client.jar
Let's practice with one to start with. Java comes with a jar program to help you work with the file format. First we need a folder name:
file="MRN.ear"
dir=`echo $file | sed 's/\./_/'`
This writes a line containing MRN.ear into the pipe. sed takes each line, looks for the first period and replaces it with an underscore. The result, MRN_ear, is assigned to a variable called dir.
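To check what the substitution produces, a quick sketch like the one below can be run on its own. The name client.v2.jar is made up for illustration; it shows why it matters that sed, without the g flag, only touches the first period.

```shell
file="MRN.ear"

# Replace only the first period on the line with an underscore
dir=$(echo "$file" | sed 's/\./_/')
echo "$dir"

# Hypothetical name with two periods, to compare first-only and global replacement
multi="client.v2.jar"
first_only=$(echo "$multi" | sed 's/\./_/')
every=$(echo "$multi" | sed 's/\./_/g')
echo "$first_only"
echo "$every"
```

Either form gives a usable folder name; the tutorial's commands use the first-only version.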
Next we are going to do the work using our new variable. First make a directory:
mkdir $dir
Then go into it:
mkdir $dir ; cd $dir
Next unpack the archive. We are now one directory below the archive, so we need to prefix ../ to the path.
mkdir $dir ; cd $dir ; jar -xf ../$file
Finally we need to make the archive catalogue. So back up a directory level and run find to list what we want:
mkdir $dir ; cd $dir ; jar -xf ../$file ; cd .. ; find $dir > $dir.find.txt
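The catalogue file is simply the output of find, one path per line. Here is a small sketch of that last step, using a hand-made directory as a stand-in for an unpacked archive (the file names inside it are invented, and the scratch directory from mktemp is only there to keep the experiment out of the way):

```shell
# Build a small stand-in for an unpacked archive (made-up file names)
work=$(mktemp -d)
cd "$work" || exit 1
dir="MRN_ear"
mkdir -p "$dir/META-INF"
touch "$dir/META-INF/MANIFEST.MF" "$dir/client.jar"

# find prints the directory itself and everything below it, one path per line
find "$dir" > "$dir.find.txt"
cat "$dir.find.txt"
```

find does not promise any particular order, so if you want to compare catalogues later it may be worth piping the output through sort first.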
That'll work with one of them, but we have hundreds! No problem, we can just wrap it all in a loop.
for file in `ls`; do dir=`echo $file | sed 's/\./_/'` ; mkdir $dir ; cd $dir ; jar -xf ../$file ; cd .. ; find $dir > $dir.find.txt ; done
for runs once for each item in a list separated by white space (or line breaks). We are running ls to generate the list. Each item in the list is assigned to $file in turn, and then we just follow the same steps as before.
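Because the list is split on white space, a file name containing a space would be handed to the loop in pieces. The sketch below shows the difference between looping over the output of ls and looping over a glob; the file name "my client.jar" is made up for the demonstration, and the glob form is offered as an alternative rather than what the one-liner above uses.

```shell
# Scratch directory with one ordinary name and one name containing a space
work=$(mktemp -d)
cd "$work" || exit 1
touch "MRN.ear" "my client.jar"

# Looping over ls output: the list is split on every space and line break
count_ls=0
for file in `ls`; do count_ls=$((count_ls + 1)); done

# Looping over a glob: each file name arrives whole
count_glob=0
for file in *; do count_glob=$((count_glob + 1)); done

echo "ls: $count_ls items, glob: $count_glob items"
```

With archive names like the ones in the listing above there are no spaces, so the ls version behaves the same as the glob.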
We have some choices now, depending on what resources you have. Bash can run tasks in the background, so you don't have to sit waiting for a long command to complete; you can get on with something else for a bit! You push a task into the background by adding an ampersand to the end. For one command it's
echo "hello" &
or we can group several commands in parentheses and push the whole group into the background like so:
(cat /etc/hosts | grep 127) &
(cd bin ; ls) &
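One thing worth knowing when you start backgrounding groups: the shell carries on immediately, so if a later step needs the results you have to wait for them. A minimal sketch, using the wait built-in (not mentioned above) and sleep as a stand-in for slow work:

```shell
work=$(mktemp -d)

# Two groups started in the background; the shell does not pause for either
(sleep 1; echo "first" > "$work/a.txt") &
(sleep 1; echo "second" > "$work/b.txt") &
echo "both jobs started"

# wait blocks until every background job of this shell has finished
wait
cat "$work/a.txt" "$work/b.txt"
```

Without the wait, the cat would most likely run before either file had been written.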
Let's add that to our command:
for file in `ls`; do (dir=`echo $file | sed 's/\./_/'` ; mkdir $dir ; cd $dir ; jar -xf ../$file ; cd .. ; find $dir > $dir.find.txt ) & done
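There's no need for a semicolon before the closing parenthesis, because the parenthesis ends the last command in the group, and none before done either, because the ampersand already terminates the statement. So we can drop both.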
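With hundreds of archives, starting every job at once may be more than the machine can comfortably handle. One possible way to hold back, sketched under the assumption that running in batches is acceptable: count the jobs as they start and wait whenever a batch is full. The max_jobs value and the numbered placeholder jobs are hypothetical; the body of the subshell stands in for the unpack-and-catalogue steps.

```shell
max_jobs=4      # hypothetical limit; pick whatever suits your machine
running=0
work=$(mktemp -d)

for n in 1 2 3 4 5 6 7 8 9 10; do
    # Placeholder for the unpack-and-catalogue steps
    ( echo "job $n" > "$work/$n.txt" ) &
    running=$((running + 1))

    # Once a batch is full, let it finish before starting more
    if [ "$running" -ge "$max_jobs" ]; then
        wait
        running=0
    fi
done

# Pick up whatever is left from the final, partly filled batch
wait
ls "$work" | wc -l
```

This is cruder than a proper job queue, since each batch waits for its slowest member, but it keeps to the spirit of a throwaway one-liner.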
It can be a bit tricky to write all of this the first time and get it right, so there are a few little tricks you can use while you are experimenting. Firstly, write-protect the starting files; we don't want to overwrite them by mistake:
chmod -w *
While testing, I would give the for loop an expression that produces just one result. You could try either of these:
for file in `ls | head -1`
for file in MRN.ear
Also you want to start from a clean slate each time, so remove anything with an underscore in it or anything ending with .txt:
rm -rf *_* *.txt ; for file in `ls | head -1`
When we think we have got it right, remove the | head -1 and run it for real.
I've talked it through in terms of the thought process behind building the command, so it has taken a bit longer to go through. When you work in this environment, it's not long before you start to think in terms of running commands together, using the background and lots of piping to get what you want. It becomes second nature, and you can do some incredible things. Sometimes those lines will go into a bigger bash program or be used in a scheduled task, but most of the time, surprisingly, they are just thrown away because the thing you wanted to do is now done and you won't have to do it again. It's not even worth putting it in a file!