What's eating all my disk? Using 'du' in Linux
I've just been awakened courtesy of an alert from Nagios kindly advising that a host is down. That was very quickly resolved, but whilst I was awake I noticed a warning for disk space utilisation on one of the Selenium test runners used in our development environment, so I decided to take a quick look.
If you are already a Linux user, then you might be familiar with the 'df' command. df can be used to show how much free space there is on, by default, all mounted file systems. Running the 'df' command produces output such as:
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda4       118G  6.2G  106G   6% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
udev            7.9G  4.0K  7.9G   1% /dev
tmpfs           1.6G  1.6M  1.6G   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            7.9G   26M  7.8G   1% /run/shm
none            100M   16K  100M   1% /run/user
/dev/sda7        98M   34M   64M  35% /boot/efi
/dev/sda5       102G  3.6G   93G   4% /home
Passing the '-h' flag to df means produce human-readable output. As you can see, there's very little utilisation in my file systems, but for the purposes of this article, assume I'm running out of space on /dev/sda4, which is mounted as '/'. How can I find out what's eaten all of the available disk space? I give you 'du', short for disk usage. Issuing the following command will start to give a picture of how large each top-level folder in the root of our file system is:
du -sh /*
If you aren't running as the root user, you may need to use sudo to avoid getting a series of 'permission denied' errors:
sudo du -sh /*
And you may still want to redirect standard error to /dev/null, so as to avoid seeing errors for special files and directories, such as those under /proc:
$ sudo du -sh * 2>/dev/null
9.8M	bin
80M	boot
4.0K	cdrom
4.0K	dev
25M	etc
7.1G	home
0	initrd.img
324M	lib
3.5M	lib32
4.0K	lib64
16K	lost+found
8.0K	media
28K	mnt
374M	opt
0	proc
150M	root
2.3M	run
16M	sbin
4.0K	srv
0	sys
52K	tmp
4.5G	usr
725M	var
0	vmlinuz
The '-s' flag supplied to du means summarise: in other words, return the total disk usage for each file or folder in the supplied path. The '-h' again means return human-readable output. Now we can see, almost at a glance, what is using the most disk space within the root of our file system ('/'). We can then drill down into a folder. The '/usr' folder is using 4.5 GB of disk, so let's take a peek:
$ sudo du -sh /usr/* 2>/dev/null
148M	/usr/bin
44K	/usr/games
39M	/usr/include
2.5G	/usr/lib
76M	/usr/lib32
33M	/usr/local
23M	/usr/sbin
1.6G	/usr/share
128M	/usr/src
Now we can see that the biggest culprit is the '/usr/lib' folder. I already know that there are a lot of files within that folder, as it's where most of the shared libraries live. Running the same command on that folder would produce a flood of output and completely spam our terminal. So I'm going to use one last trick with the du command: using it to find just the ten biggest entries in the /usr/lib folder:
$ sudo du -s -B 1048576 /usr/lib/* 2>/dev/null | sort -nr | head -n 10
816	/usr/lib/x86_64-linux-gnu
250	/usr/lib/i386-linux-gnu
239	/usr/lib/libreoffice
190	/usr/lib/chromium-browser
119	/usr/lib/python2.7
94	/usr/lib/firefox
92	/usr/lib/nvidia-331
82	/usr/lib/jvm
81	/usr/lib/thunderbird
61	/usr/lib/virtualbox
So instead of asking for human-readable output, I've specified a block size ('-B') of 1048576 bytes, or 1 megabyte. I could have used '-BM' instead, but I wanted the example to be explicit about how block size works. I've then numerically sorted that output, reversed the sort order (largest first), and used head to return just the first ten rows.
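As an aside, if your system has GNU coreutils (as most Linux distributions do), you can keep the human-readable sizes and still sort correctly: GNU sort's '-h' flag understands the K/M/G suffixes that 'du -h' emits. A minimal sketch of the same /usr/lib example, assuming GNU du and sort:

```shell
# Ten largest entries under /usr/lib, biggest first, with
# human-readable sizes. sort -h compares numeric values while
# honouring the K/M/G suffixes produced by du -h.
sudo du -sh /usr/lib/* 2>/dev/null | sort -hr | head -n 10
```

The same pipeline works against any directory, so it's a handy one-liner to keep around the next time a disk space alert wakes you up.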