What's eating all my disk? Using 'du' in Linux
I've just been awakened courtesy of an alert from Nagios kindly advising that a host is down. That was very quickly resolved, but whilst I was awake I noticed a warning for disk space utilisation on one of the Selenium test runners used in our development environment, so I decided to take a quick look.
If you're already a Linux user, then you might be familiar with the 'df' command. By default, df shows how much free space there is on every mounted file system. Running the 'df' command elicits results such as:
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda4       118G  6.2G  106G   6% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
udev            7.9G  4.0K  7.9G   1% /dev
tmpfs           1.6G  1.6M  1.6G   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            7.9G   26M  7.8G   1% /run/shm
none            100M   16K  100M   1% /run/user
/dev/sda7        98M   34M   64M  35% /boot/efi
/dev/sda5       102G  3.6G   93G   4% /home
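As an aside, df also accepts one or more paths if you only care about a particular file system. For example, to check just the root file system (reusing the same figures shown above):

$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda4       118G  6.2G  106G   6% /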
Passing the '-h' flag to df tells it to produce human readable output. As you can see, there's very little utilisation on my file systems, but for the purposes of this article, assume I'm running out of space on /dev/sda4, which is mounted at '/'. How can I find out what's eaten all of the available disk space? I give you 'du', short for disk usage. Issuing the following command will start to give a picture of the size of each top level folder in the root of our file system:
du -sh /*
If you aren't running as the root user, you may need to use sudo to avoid getting a series of permission denied errors:
sudo du -sh /*
And you may still want to redirect standard error to /dev/null, so as to avoid seeing errors for special files and directories such as those under /proc. Running it from the root of the file system:
$ sudo du -sh * 2>/dev/null
9.8M bin
80M boot
4.0K cdrom
4.0K dev
25M etc
7.1G home
0 initrd.img
324M lib
3.5M lib32
4.0K lib64
16K lost+found
8.0K media
28K mnt
374M opt
0 proc
150M root
2.3M run
16M sbin
4.0K srv
0 sys
52K tmp
4.5G usr
725M var
0 vmlinuz
The '-s' flag supplied to du means summarise, or in other words, return the total disk usage for each file or folder in the supplied path. The '-h' again means return human readable output. Now we can see, almost at a glance, what's using the most disk space within the root of our file system ('/'). We could then drill down into a folder. The '/usr' folder is using 4.5 GB of disk, so let's take a peek.
$ sudo du -sh /usr/* 2>/dev/null
148M /usr/bin
44K /usr/games
39M /usr/include
2.5G /usr/lib
76M /usr/lib32
33M /usr/local
23M /usr/sbin
1.6G /usr/share
128M /usr/src
Now we can see that the biggest culprit is the '/usr/lib' folder. I know already that there are a lot of files within that folder, as it's where most of the shared libraries live. Running the same command on that folder would elicit lots of output and completely spam our standard output. So I'm going to use one last trick with the du command: I'm going to use it to find just the ten biggest entries in the /usr/lib folder:
sudo du -s -B 1048576 /usr/lib/* 2>/dev/null | sort -nr | head -n 10
816 /usr/lib/x86_64-linux-gnu
250 /usr/lib/i386-linux-gnu
239 /usr/lib/libreoffice
190 /usr/lib/chromium-browser
119 /usr/lib/python2.7
94 /usr/lib/firefox
92 /usr/lib/nvidia-331
82 /usr/lib/jvm
81 /usr/lib/thunderbird
61 /usr/lib/virtualbox
So instead of asking for human readable output, I've specified a block size ('-B') of 1048576 bytes, or 1 megabyte. I could have used '-BM' instead, but I wanted to be explicit in the example as to how block size works. I've then numerically sorted that output in reverse order (largest first) and used head to return just the first ten rows.
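For completeness, the '-BM' shorthand version of the same pipeline would look like this (a sketch of the equivalent command, rather than output captured from the same machine):

sudo du -s -BM /usr/lib/* 2>/dev/null | sort -nr | head -n 10

And if your version of GNU sort is recent enough to support the '-h' flag, you can keep du's human readable output and still sort correctly by size:

sudo du -sh /usr/lib/* 2>/dev/null | sort -hr | head -n 10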