"The Linux Gazette...making Linux just a little more fun!"


(?) The Answer Guy (!)


By James T. Dennis, answerguy@ssc.com
Starshine Technical Services, http://www.starshine.org/


(?) Listing "Just the Links": It's the only way, Luke

From Jerry Giles on Thu, 05 Nov 1998

Sorry for the intrusion but I came across your name while browsing for Linux. I am currently in a CIS program at the local college and a recent test had an item I still can't find the answer to. The professor asked what command to use to list "only the linked files" in a directory. He is expecting us to use ls with flags, I guess, but I've looked at all the flags given in the text and nothing seems to address this. Can you help?

Thanks, jerry giles

(!) Either you misunderstand, or your professor isn't being very precise. The 'ls' command "lists links" --- all directory entries are links! Some of these are symbolic links; others are "hard" links (which we think of as "normal" directory entries). The 'ls' command can't list anything but links. It can list other information that it extracts from the inodes to which each of these links points (via the stat() function).
So, the question is essentially meaningless as you've relayed it.
Now, if the question was about listing symbolic links there are a couple of simple answers that do make sense.
ls -l | grep '^l'
... this filters a "long" listing of all the links (hard and "symbolic") and displays only those which start with the letter l. In a "long" directory listing the first block of characters (field) is a string which encodes the type and permissions of the files to which these directory links point. (l is "symlink", d for "directory", s for "socket", p for "FIFO/named pipe", b and c for "block" and "character" special device nodes --- normally only found under the /dev/ directory --- and "-" (dash) for "regular" files).
The second field in a long listing is the "link count." This tells you how many "hard links" point to the same inode that this one does.
Here's an example of my own root directory:
drwxr-xr-x  14 root     root         1024 Sep 27 17:19 .
drwxr-xr-x  14 root     root         1024 Sep 27 17:19 ..
-rw-r--r--   2 root     root       219254 Sep 27 17:19 System.map
drwxr-xr-x   2 root     root         2048 Sep 12 03:25 bin
drwxr-xr-x   2 root     root         1024 Sep 27 17:20 boot
drwxr-xr-x   2 root     root         1024 Aug 31 06:40 cdrom
drwxr-xr-x  21 root     root         4096 Nov  4 03:12 etc
lrwxrwxrwx   1 root     root           15 Apr 20  1998 home -> /usr/local/home
drwxr-xr-x   5 root     root         2048 Sep 16 23:48 lib
drwxr-xr-x   2 root     root        12288 Mar 10  1998 lost+found
drwxr-xr-x   9 root     root         1024 Aug 31 06:40 mnt
lrwxrwxrwx   1 root     root           14 Mar 31  1998 opt -> /usr/local/opt
dr-xr-xr-x  63 root     root            0 Oct 13 02:25 proc
drwx--x--x  13 root     root         2048 Oct 31 17:47 root
drwxr-xr-x   5 root     root         2048 Sep 16 23:48 sbin
drwxrwxrwt   8 temp     root         3072 Nov  5 09:33 tmp
drwxr-xr-x  30 root     root         1024 Aug 31 13:32 usr
lrwxrwxrwx   1 root     root           13 Aug 31 06:40 var -> usr/local/var
-rw-r--r--   1 root     root       732668 Sep 27 17:19 vmlinuz
This was generated with the command: 'ls -al /'
The number in the second field (the first number on each of these lines) is the "link count." This is the number of hard links (non-symlinks) that point to the same inode. Thus my root directory has 14 links to it. The ".." entry for each of /'s subdirectories points back up to it. In other words /usr/.. points back to /, so do /etc/.., /dev/.., and all the others that are just one level down from it. /usr/local/.. points to /usr and so on.
We see that 'System.map' has a link count of 2. That means that there is another name for this file. Somewhere on this filesystem there is another hard link to it.
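A small experiment makes the link count concrete. This is only a sketch, assuming GNU 'stat' (its -c option and the %h, %i and %n formats); the scratch directory and the names "original", "hardlink" and "symlink" are made up for illustration.

```shell
# Work in a scratch directory so nothing real is touched.
demo=$(mktemp -d)

echo "hello" > "$demo/original"
ln "$demo/original" "$demo/hardlink"   # a second hard link to the same inode
ln -s original "$demo/symlink"         # a symlink gets an inode of its own

# %h = hard link count, %i = inode number, %n = name (GNU stat formats).
stat -c '%h %i %n' "$demo/original" "$demo/hardlink" "$demo/symlink"

# Keep the numbers around so they can be compared after cleanup.
link_count=$(stat -c %h "$demo/original")
inode_original=$(stat -c %i "$demo/original")
inode_hardlink=$(stat -c %i "$demo/hardlink")
inode_symlink=$(stat -c %i "$demo/symlink")

rm -rf "$demo"
```

The two hard links should report the same inode number and a link count of 2, while the symlink reports a different inode with a link count of 1.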
Most Unix newbies are used to thinking of the 'ls' command as a listing of files. This is wrong. The 'ls' command is a listing of links to files. When you add parameters like "-l" to the 'ls' command, you are listing the links, AND SOME INFORMATION ABOUT THE FILES TO WHICH THEY POINT. (Under the hood the 'ls' command is "stat()'ing" each of these entries). A Unix/Linux directory consists of a list of names and inode numbers. All of the rest of the information that we associate with the file (its type, ownership, permissions, link count, all three time/date stamps, size, and --- most importantly --- the list of blocks that contains the file's contents) is stored in the inode.
To understand the difference better, create a subdirectory (~/tmp/experiment). Put a few arbitrary links into that (use the 'ln' command to make "hard links" and the 'ln -s' command to make some symlinks, and maybe some 'cp' commands to copy in a few files). Now use the 'chmod' command to remove your own execute ("x") rights to that directory ('chmod a-x ~/tmp/experiment').
You should be able to do an 'ls' command (be sure to use the real 'ls' command --- NOT SOME ALIAS, SHELL FUNCTION OR SCRIPT). That should work. (If it doesn't --- you probably have 'ls' alias'ed to 'ls --color' or something like that --- try issuing the command /bin/ls, or try the command 'unalias ls' for the duration of this experiment). Once you can issue the 'ls' command with no arguments and get a list of the file names in the "~/tmp/experiment" directory, try 'ls -l' or 'ls -i'.
You should get a whole stream of "Permission denied" messages. Note that you also have to do all of this from outside of the directory. Issuing the 'cd' command to get into a directory requires that you have "execute" permission to that directory.
You get these "Permission denied" errors because, to give any information about a file other than its link name, the 'ls' command needs to access the inode (which requires "execute" permission on the directory). You can do an 'ls' or an 'ls -a' on the directory --- because these only provide lists of the link names. These variations of the command don't need access to any other information about the files (which is all stored in the inode).
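The whole experiment can be scripted. The following is a sketch under a few assumptions: it uses a 'mktemp' scratch directory rather than ~/tmp/experiment, the file names are invented, and it should be run as an ordinary user (root bypasses these permission checks, so 'ls -l' would simply succeed).

```shell
dir=$(mktemp -d)
echo data > "$dir/a_file"
ln "$dir/a_file" "$dir/a_hardlink"
ln -s a_file "$dir/a_symlink"

chmod a-x "$dir"   # keep read permission, drop search ("execute") permission

# 'command ls' skips any shell function named ls.
names=$(command ls "$dir")   # only needs the link names
echo "$names"

# This needs every inode; expect "Permission denied" for a non-root user.
command ls -l "$dir" || echo "ls -l could not stat the entries"

chmod u+x "$dir"   # restore access so the cleanup can work
rm -rf "$dir"
```

The plain 'ls' prints all three names, because reading the directory itself only takes read permission.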
So, now that you (hopefully) understand what links really are --- you can understand something about the 'rm' command.
'rm' doesn't remove files. 'rm' removes links to files. The filesystem driver then checks the link count. If that's "zero" (and no process still has the file open) then the file is actually removed.
Note the important element here: file removal happens indirectly, as part of the filesystem's maintenance. The 'rm' and similar commands just call "unlink()" (the system call).
There was also an extra clause I snuck in. If I open a file (with an editor, for example) and then I use 'rm' to remove that file, what happens? (Let's assume that there was only one hard link to the file).
Nothing spectacular. The link count is zero but the file is open. The filesystem maintenance routines leave the inode and the data blocks of the file alone so long as the file is open. As soon as the file is closed, these routines will detect the zero link count and then remove the file. If a dozen processes have the file open --- then all of them must close it before the file is truly removed.
Removal actually involves a few steps. All of the data blocks that are allocated to the file are reassigned to the "free list." You can think of the free list as a "special file" that "owns" all of the free space on the disk. The actual implementation is different for different filesystems. Then the inode is marked as deleted, or it's "zero'd out" (filesystem and version specific).
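The open-file case described above is easy to watch from the shell. This is a minimal sketch; the scratch file comes from 'mktemp' and the choice of file descriptor 3 is arbitrary.

```shell
f=$(mktemp)
echo "still here" > "$f"

exec 3< "$f"   # open the file for reading on descriptor 3
rm "$f"        # unlink its only name; the link count drops to zero

[ -e "$f" ] || echo "the name is gone"

# The data is still readable through the open descriptor.
contents=$(cat <&3)
echo "$contents"

exec 3<&-      # close it; only now can the inode and blocks be freed
```

The name disappears immediately, but the contents stay readable until the last descriptor is closed.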
Now, back to your original question:
A more precise way to find all of the "symlinks" in a directory is to use the 'find' command. Try the command:
find / -maxdepth 1 -type l -print
... (GNU 'find' defaults to "-print" so you can leave that off under Linux).
The "-maxdepth 1" part is to prevent 'find' from traversing down the whole file tree. (Note: I tend to use "file tree" or "file hierarchy" to refer to all the files *and all the mounted filesystems* below a point, and "filesystem" to refer to all of the files on a single mounted fs. This is a subtle point of confusion).
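If 'find' isn't handy, the shell's own "-L" test can do the same job for a single directory. The function below is a sketch, not a standard utility: the name "list_symlinks" is made up, and its glob deliberately skips "." and ".." (and would also miss any name that begins with two dots).

```shell
# list_symlinks DIR: print the names in DIR (default ".") that are symlinks.
list_symlinks() {
    dir=${1:-.}
    # Visible entries first, then dot-files other than "." and "..".
    for entry in "$dir"/* "$dir"/.[!.]*; do
        # -L is true for a symlink even when its target does not exist.
        # An unmatched glob stays literal and simply fails this test.
        if [ -L "$entry" ]; then
            printf '%s\n' "${entry##*/}"
        fi
    done
}
```

Called as 'list_symlinks /' on the root directory shown earlier, it would be expected to print home, opt and var.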
Now, if the question was "find all of the regular files with a link count greater than 1" you'd use:
find ... -maxdepth 1 -type f -links +1
... where the ellipsis is a list of one or more directories and/or filenames and the other parameters test for the various conditions that I described (and prevent traversal down the tree, of course). In GNU find many of the numeric conditions can be specified as "+x" "x" or "-x" --- where +x means "more than 'x'", -x means "less than 'x'" and just x means "exactly x." That's a subtlety of the 'find' command.
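To see the "+x" and "x" forms side by side, here is a sketch that builds a scratch directory and queries it. It assumes GNU 'find' (for -maxdepth), and the file names are invented for the demonstration.

```shell
demo=$(mktemp -d)
echo one > "$demo/shared"
ln "$demo/shared" "$demo/shared_again"   # this inode now has a link count of 2
echo two > "$demo/lonely"                # this one keeps a link count of 1

# "+1" means more than one link; a bare "1" means exactly one.
multi=$(find "$demo" -maxdepth 1 -type f -links +1 | LC_ALL=C sort)
single=$(find "$demo" -maxdepth 1 -type f -links 1)

echo "more than one link:"; echo "$multi"
echo "exactly one link:";   echo "$single"

rm -rf "$demo"
```

Both names of the shared inode show up in the first list, and only "lonely" in the second.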
A last interpretation of this question that I can imagine is: find all of the links to a given file (inode). To do this you start with the inode. If it is not a directory (*) and it has a link count of more than one then search the whole filesystem for any other link that has a matching inode. This is a non-trivial question for a first-term Unix student. It entails writing a script in a few parts.
* (We don't have to search for the additional hard links to directories, because they should all be in ./*/.. --- that is they are all . or .. entries in the current directory and the ones just below us. If you were to use some custom code to force the creation of some other hard link to a directory --- fsck would probably have fits about the anomaly in the directory structure. Some versions of Unix have historically allowed root (superuser) to create hard links to directories --- but the GNU utilities under Linux won't allow it --- so you'd have to write your own code or you'd have to directly modify the fs with a hex editor).
I'll just walk through one example to get us warmed up:
In my root directory example above I saw that System.map had a link count of 2. It's a regular file. So I want to find the other link to it.
First I find the inode.
'ls -ail /' gives us:
      2 drwxr-xr-x  14 root     root         1024 Sep 27 17:19 .
      2 drwxr-xr-x  14 root     root         1024 Sep 27 17:19 ..
     13 -rw-r--r--   2 root     root       219254 Sep 27 17:19 System.map
   4019 drwxr-xr-x   2 root     root         2048 Sep 12 03:25 bin
  56241 drwxr-xr-x   2 root     root         1024 Sep 27 17:20 boot
     14 lrwxrwxrwx   1 root     root           13 Aug 31 06:40 var
(etc).
... the numbers in the first field here are the inodes --- the filesystem data structures to which these links point. We note that the '.' and '..' (current and parent directories) both point to the same inode *for the root directory*. (For any other directory this would not be the case).
... so I want to find all links on this filesystem (*) which point to inode number 13.
So, here's the command to do that:
find / -mount -inum 13
... whoa! That was easy. The "-mount" option tells the find command not to traverse across any mount points (it's the same as the -xdev option).
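The same search can be rehearsed safely in a scratch tree. This sketch assumes GNU 'find'; the "-samefile" test is a GNU extension that saves looking up the inode number by hand, and the directory layout here is invented.

```shell
demo=$(mktemp -d)
mkdir "$demo/sub"
echo map > "$demo/System.map"
ln "$demo/System.map" "$demo/sub/other_name"   # the "other" hard link

# First field of 'ls -i' is the inode number.
inode=$(ls -i "$demo/System.map" | awk '{print $1}')

by_inum=$(find "$demo" -mount -inum "$inode" | LC_ALL=C sort)
by_samefile=$(find "$demo" -mount -samefile "$demo/System.map" | LC_ALL=C sort)

echo "$by_inum"

rm -rf "$demo"
```

Both searches should list the same two paths: the original name and the link tucked away in the subdirectory.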
To do this for each of the items in a directory -- the hard part is to find the root of the filesystem on which each file resides. In my example this was deceptively easy because the link I was looking at was in the root directory (which obviously is at the root of its filesystem).
If I had a script or program that would "find the root of the filesystem on which a given file resided" (let's call it "fsrootof") --- then I could write the rest of this script:
find ... -type f -links +1 -printf "%i %p\n" | while read -r i f; do
    find "$(fsrootof "$f")" -mount -inum "$i"
done
... this is a bit of shell script code that uses 'find' to generate a list of the inodes and names/paths (the -printf option to the first 'find') of "regular files" with link counts greater than 1. That list is fed into a simple shell loop (a mill) that reads each line as an "inode" and a "path" (later referred to as $i and $f respectively). The body of that loop calls my mythical script or program to find the "root of the filesystem of the file" --- and uses that as the search point for the second find command.
Just off hand I can't think of a way to implement this 'fsrootof' command using simple shell scripting. It would probably best be done as a C program or a Perl script (making direct use of some system calls to stat the file and some other trick to traverse upwards (following .. links) until we cross a mountpoint). I'd have to dig up the sources to the 'find' command to see how they do that.
So, maybe I'll leave that as the "Linux Gazette Reader Challenge" (implement 'fsrootof' as described above).
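For readers who want a head start, here is one possible sketch of 'fsrootof', not a definitive answer. It assumes GNU 'stat' (the "-c %d" format prints the device number) and simply climbs parent directories until the device number changes. That heuristic can overshoot where one filesystem is mounted inside itself (bind mounts, for example), and recent GNU coreutils may answer the question directly with 'stat -c %m'.

```shell
# fsrootof PATH: print the top directory of the filesystem that holds PATH.
fsrootof() {
    # Start from PATH itself if it is a directory, else from its directory.
    if [ -d "$1" ]; then
        dir=$(cd "$1" >/dev/null && pwd -P)
    else
        dir=$(cd "$(dirname "$1")" >/dev/null && pwd -P)
    fi || return 1

    dev=$(stat -c %d "$dir") || return 1   # device number of the start point

    # Climb while the parent lives on the same device; stop at "/".
    while [ "$dir" != / ]; do
        parent=$(dirname "$dir")
        [ "$(stat -c %d "$parent")" = "$dev" ] || break
        dir=$parent
    done
    printf '%s\n' "$dir"
}
```

With this function defined in the same shell, the loop shown earlier can call 'fsrootof' on each path that 'find' reports.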


Copyright © 1998, James T. Dennis
Published in The Linux Gazette Issue 35 December 1998

