return to first page linux journal archive

Hints & Tips for Optimizing Linux Disk Usage

It seems that no matter how much hard disk storage you have, it quickly becomes full. Because Linux is free, many users are on a limited budget and can't afford to simply buy more hardware. This article describes some simple techniques for making the best use of the disk storage you have now.

By Jeff Tranter

Half Full or Half Empty?

The first question to be answered is, how full is your disk? The best way to find out is with the "df" command. This is what my system shows:

% df
Filesystem  1024-blocks   Used   Available   Capacity   Mounted on
/dev/hda1         94247  73427       16108        82%   /

In this case I have slightly more than 16 megabytes available on my single 94 megabyte partition, making it 82% full. If you have multiple disk partitions and/or drives, you will see an entry in the listing for each mounted partition.

You may think that as long as you are under 100%, everything is fine. This is not the case, for several reasons.

First, you need some space in reserve for temporary files. Compiling, for example, requires that various intermediate files be stored in a temporary directory. If you don't have enough space for these files, the command will fail, possibly in strange ways. A normal user could even bring the system down by filling up the root disk.

Second, once your disk approaches full capacity, performance starts to degrade. The file system has to spend more time looking for free blocks on the disk to use. The amount of performance degradation depends on how full the disk is, and the type of file system. A good rule of thumb is to keep the disk below 80% full if possible, and definitely below 90%.
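The rule of thumb above can be checked automatically. The following is a minimal sketch, not a definitive tool: the function name check_usage and the WARN and CRIT variables are illustrative, and it assumes the POSIX "df -P" output format, in which the capacity is the fifth column and the mount point is the sixth (mount points containing spaces are not handled).

```shell
# Warn about filesystems above the rule-of-thumb thresholds.
# check_usage reads "df -P" style output on standard input;
# WARN and CRIT are illustrative defaults of 80 and 90 percent.
check_usage() {
    awk -v warn="${WARN:-80}" -v crit="${CRIT:-90}" '
        NR > 1 {                      # skip the header line
            pct = $5
            sub(/%/, "", pct)         # "82%" becomes "82"
            if (pct + 0 >= crit)      print "CRITICAL " $6 " " pct "%"
            else if (pct + 0 >= warn) print "WARNING " $6 " " pct "%"
        }'
}
```

Run it as "df -P | check_usage". A root partition that is 82% full would be reported as "WARNING / 82%".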

Load Balancing

You may have multiple disk drives, and you may have divided one or more drives into separate partitions. This is one way, for example, to avoid having the root partition filled by user data. The disadvantage is that you may find that one partition is full, even though others have space available.

If this is the case, one option is to repartition the drives to better balance the load. This will require backing up all your data first, since repartitioning is destructive.

A second choice is to move some of the files from one partition or disk to another. Using symbolic links, you may be able to do this without affecting other applications.
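Moving a directory and leaving a symbolic link behind might look like the sketch below. The function name relocate_dir is hypothetical, and the destination should be given as an absolute path so that the link resolves from anywhere.

```shell
# relocate_dir SRC DESTPARENT: move directory SRC under DESTPARENT
# (a directory on a partition with free space) and leave a symbolic
# link at the old location. DESTPARENT should be an absolute path.
relocate_dir() {
    src=$1
    dest=$2/$(basename "$1")
    if [ ! -d "$src" ] || [ -L "$src" ]; then
        echo "not a real directory: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "already exists: $dest" >&2
        return 1
    fi
    mv "$src" "$dest" &&      # mv copies and deletes when crossing filesystems
        ln -s "$dest" "$src"  # the old path now resolves to the new location
}
```

For example, "relocate_dir /usr/games /home" (both paths chosen only for illustration) would leave /usr/games as a link to /home/games. Do this while nothing has files open in the directory being moved.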

You likely have at least one swap partition. Choosing its size is a tradeoff between having adequate swap space and losing valuable filesystem space. If you find you have more swap space than needed (the "free" command is useful here), then you can consider repartitioning to make the swap partition smaller. If you only rarely require a lot of swap space (e.g. during large compiles), an option is to use a smaller swap partition and to add a swap file on those occasions when extra swap is needed.
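A temporary swap file can be added and removed along the following lines. This is a sketch under stated assumptions: the helper names (run, add_swapfile, remove_swapfile) and the DRY_RUN variable are illustrative, the commands must be run as root, and it assumes a mkswap that determines the size from the file itself, as current versions do.

```shell
# Temporary swap file helpers. Run as root; set DRY_RUN=1 to print
# the commands instead of executing them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$@"; else "$@"; fi
}

add_swapfile() {              # add_swapfile FILE SIZE_IN_KB
    run dd if=/dev/zero of="$1" bs=1024 count="$2" &&
    run chmod 600 "$1" &&     # swap contents should not be readable by users
    run mkswap "$1" &&        # write the swap signature
    run swapon "$1"           # start using the file as swap
}

remove_swapfile() {           # remove_swapfile FILE
    run swapoff "$1" &&       # stop using the file before deleting it
    run rm -f "$1"
}
```

With DRY_RUN=1 set, "add_swapfile /tmp/swapfile 8192" (an 8 megabyte file at an arbitrary path) only prints the four commands, which is a safe way to review them first.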

If your drives are load balanced and disk space is still a problem, or if you have a single partition, then you are ready for the techniques described in the rest of this article.

Finding the Big Files

A useful concept is the Pareto or 80/20 principle. Applied here, it says that 80% of the disk space is contained in 20% of the files (more or less). How do you find this 20%? One tool is the "find" command. The following command line, for example, will find all of the files on the system that are larger than 1 million bytes (you may have to run this as root so you have access to all the directories; expect it to take a while to run, depending on disk size):

    % find / -size +1000000c -ls

This command tells find to locate any file larger than 1 million characters and print out its directory information. If you are not familiar with the find command, read up on it in the man page. It can be a useful tool for identifying areas to attack.
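To see the worst offenders first, the same search can be sorted by size. The sketch below is one possible variation rather than part of the original command: the function name big_files is hypothetical, -xdev is added so the search stays on one filesystem, and du -k reports the disk space used in kilobytes rather than the byte count.

```shell
# big_files DIR [MIN_BYTES]: list regular files under DIR that are
# larger than MIN_BYTES (default 1000000), largest first. Each line
# shows the disk space used in kilobytes, then the path.
big_files() {
    find "$1" -xdev -type f -size +"${2:-1000000}"c \
        -exec du -k {} + 2>/dev/null |
        sort -rn
}
```

For example, "big_files / | head -20" shows the twenty largest files on the root filesystem.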

Removing Files

The obvious solution to freeing up disk space is to delete files. With some detective work, you may find a lot of baggage on your system that you don't need. First though, two important warnings:

On an SLS or Slackware system, it is easy to identify and remove software packages. Review the packages that you have installed, see how much disk space they require, and delete the ones you don't need.

Some of the things that can save a lot of space if they are not needed are:

You can remove the nroff source for the man pages (in /usr/man/man?) and keep only the compressed, formatted pages (in /usr/man/cat?). Make sure all man pages have been formatted first.

The full distribution of emacs lisp source files (*.el) is quite large. You can just keep the byte compiled (*.elc) files.

If you have compiled the Linux kernel, remove the object files (e.g. cd /usr/src/linux ; make clean).

You may have accumulated some old libraries in /lib and /usr/lib. Some binaries may still require them though. You may be able to determine whether they are used by running ldd on your binaries. This is one area I suggest you avoid unless you know what you are doing, because you can easily cause your system to break if you delete needed libraries.

If you run the X window system, the font files take a lot of disk space (they are usually in /usr/lib/X11). Make sure the .pcf font files are compressed. You may be able to delete some of the more obscure fonts as well. If you compress or delete any fonts, be sure to run the mkfontdir command afterward.

I found on my system that there were a number of duplicate files in /bin and /usr/bin. Eliminating duplicates can save space, but watch that you are not just deleting links to the same file. The diff command is useful here for comparing directories of files (e.g. diff /bin /usr/bin).
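Telling true duplicates apart from links can be scripted. The following is a minimal sketch, assuming a shell whose test command supports the -ef operator (bash and most modern shells do); the function name dup_report is hypothetical.

```shell
# dup_report DIR1 DIR2: for each file name present in both directories,
# report whether the two entries are the same file (a hard or symbolic
# link, so deleting one frees nothing), identical copies, or different.
dup_report() {
    for a in "$1"/*; do
        b=$2/$(basename "$a")
        [ -f "$a" ] && [ -f "$b" ] || continue
        if [ "$a" -ef "$b" ]; then        # same device and inode
            echo "link:      $a $b"
        elif cmp -s "$a" "$b"; then       # same contents, separate storage
            echo "duplicate: $a $b"
        else
            echo "differs:   $a $b"
        fi
    done
}
```

Only the lines marked "duplicate" are candidates for saving space, for example from "dup_report /bin /usr/bin".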

Certain files grow without limit and should be periodically purged. These include:

    /etc/wtmp
    /etc/ftmp
    /var/log/notice

In general you should truncate these files to zero length (e.g., cp /dev/null /var/log/notice) rather than deleting the files, or the programs that use them may not create new files.
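Truncating in place can be wrapped in a small helper. This is a sketch only; the function name truncate_logs is hypothetical, and files that do not exist are skipped rather than created.

```shell
# truncate_logs FILE...: empty each existing file in place. The file
# keeps its inode, owner, and permissions, so a program that expects
# it to exist carries on writing to it.
truncate_logs() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            : > "$f"          # redirecting from the null command truncates
        fi
    done
}
```

For example, "truncate_logs /etc/wtmp /var/log/notice" run as root. You may want to look through a log file, or keep a compressed copy, before emptying it.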

Finally, there may be files that you need periodically but do not need on-line all the time (e.g. the Linux kernel source). Consider backing these up to floppy or tape and restoring them on an as-needed basis.

Compressing Files

If you cannot delete files, there are several options available for making them smaller.

Compression programs can make files significantly smaller. The GNU gzip program provides excellent compression. It is ideal for compressing data files such as documents and source code that do not need to be on-line at all times. The --best option to gzip (equivalent to -9) provides the highest level of compression.
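To see how much a given file gains from gzip at its highest compression level, something like the following sketch can be used. The function name squeeze is hypothetical; note that gzip replaces the original file with a .gz file.

```shell
# squeeze FILE: compress FILE with gzip at its highest level and
# report the size before and after, in bytes. FILE becomes FILE.gz.
squeeze() {
    before=$(wc -c < "$1")
    gzip -9 "$1" || return 1      # -9 is the short form of --best
    after=$(wc -c < "$1.gz")
    echo "$1: $((before)) -> $((after)) bytes"
}
```

The file can be restored with "gzip -d", or read without uncompressing it on disk using "gzip -dc FILE.gz".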

Emacs can also be configured to automatically uncompress files when editing; see the lisp file uncompress.el included with GNU emacs version 19. Another space saving with emacs is to compress info pages; GNU emacs version 19 automatically handles reading compressed info pages.

If you maintain multiple revisions of files, particularly source code, then a revision control system such as RCS can save space. Only the differences between file revisions are stored.

Compressing man pages was mentioned previously. Traditionally this is done using the standard compress program. Some man programs can handle gzip-compressed man pages as well (you may need to continue using the .Z file extension).

Existing binary files sometimes contain debug information. They can be made smaller using the strip command. This was the case, for example, for the GNU emacs binary I downloaded from an archive site.

For new compiles, you can minimize the size of the binaries (once they are debugged) by compiling without debug and with full optimization (e.g. -O2 for gcc). The linker flags -s and -N can also be useful; see the gcc and ld man pages for more details.

Advanced Techniques

More advanced users can try some of the more sophisticated techniques described in this section.

The Linux second extended filesystem reserves a portion of the disk for use only by root. This is useful for preventing users from filling the disk, but may not be desirable in some cases (e.g. a disk containing user files only). By default 5% of the disk is reserved. You can change this when formatting a partition; see the mke2fs man page for more details.

Transparently Compressed Executables (TCX) is a utility for compressing binary files in a way that still allows them to be executed. The author is Stewart Forster. It saves disk space at the expense of a minor delay in startup time. It is most useful for large programs that are infrequently executed. I found this package to be quite effective and reliable; it can be found on your local Linux archive site.

For compressing data files, the zlibc package can be used. It works by patching the Linux system calls such as open that read data files, and uncompresses the files on the fly. I have not used this package, but the author claims that compression ratios of 1:3 can easily be achieved. The package can be found on Linux archive sites in both source and binary form; the author is Alain Knaff.

There have been discussions about implementing a compressed filesystem for Linux, similar in concept to what exists for MS-DOS. The ext2 filesystem already has some hooks to allow this to be added. This may be the ultimate solution for optimizing disk usage.


For More Information:

More information on the ideas discussed in this article can be found in the following sources. All are available on the major Linux archive sites.

Linux Kernel Hacker's Guide, Michael K. Johnson

Linux System Administrator's Guide, Lars Wirzenius

Linux man pages

See also the documentation included with the packages mentioned in this article (zlibc, tcx, etc.).

By day, Jeff Tranter is a software designer for a large telecommunications manufacturer in Kanata, Ontario (Canada's Silicon Valley North). For him, Linux is the realization of a dream to have an affordable Unix compatible system at home that rivals commercial workstations and is just plain fun to hack around with.
