The recursive-lengths-list-many-files
function requires a list
of files as its argument. For our test examples, we constructed such
a list by hand; but the Emacs Lisp source directory is too large for
us to do for that. Instead, we need to use the directory-files
function to construct a list for us.
The directory-files
function takes three arguments: the first
argument is the name of a directory, a string; a non-nil
second
argument causes the function to return the files' absolute pathnames;
and the third argument is a selector. If it contains a regular
expression (rather than nil
), only pathnames that match that
regular expression are returned.
Thus, on my system,
(length (directory-files "../lisp" t "\\.el$"))
tells me that my version 19.25 Lisp sources directory contains 307 `.el' files.
An expression to sort the list returned by
recursive-lengths-list-many-files
looks like this:
(sort (recursive-lengths-list-many-files (directory-files "../lisp" t "\\.el$")) '<)
Our immediate goal is to generate a list that tells us how many
function definitions contain fewer than 10 words and symbols, how many
contain between 10 and 19 words and symbols, how many contain between
20 and 29 words and symbols, and so on. With a sorted list of
numbers, this is easy: count how many elements of the list are smaller
than 10, then, after moving past the numbers just counted, count how
many are smaller than 20, then, after moving past the numbers just
counted, count how many are smaller than 30, and so on. Each of the
numbers, 10, 20, 30, 40, and the like, is one larger than the top of
that range. We can call the list of such numbers the
top-of-ranges
list.
If we wanted to, we could generate this list automatically, but it is simpler to write a list manually. Here it is:
(defvar top-of-ranges '(10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240 250 260 270 280 290 300) "List specifying ranges for `defuns-per-range'.")
To change the ranges, we edit this list.
Next, we need to write the function that creates the list of the
number of definitions within each range. Clearly, this function must
take the sorted-lengths
and the top-of-ranges
lists
as arguments.
The defuns-per-range
function must do two things again and
again: it must count the number of definitions within a range
specified by the current top-of-range value; and it must shift to the
next higher value in the top-of-ranges
list after counting the
number of definitions in the current range. Since each of these
actions is repetitive, we can use while
loops for the job.
One loop counts the number of definitions in the range defined by the
current top-of-range value, and the other loop selects each of the
top-of-range values in turn.
Several entries of the sorted-lengths
list are counted for each
range; this means that the loop for the sorted-lengths
list
will be inside the loop for the top-of-ranges
list, like a
small gear inside a big gear.
The inner loop counts the number of definitions within the range. It
is a simple counting loop of the type we have seen before.
(See section A Loop with an Incrementing Counter.)
The true-or-false test of the loop tests whether the value from the
sorted-lengths
list is smaller than the current value of the
top of the range. If it is, the function increments the counter and
tests the next value from the sorted-lengths
list.
The inner loop looks like this:
(while length-element-smaller-than-top-of-range (setq number-within-range (1+ number-within-range)) (setq sorted-lengths (cdr sorted-lengths)))
The outer loop must start with the lowest value of the
top-of-ranges
list, and then be set to each of the succeeding
higher values in turn. This can be done with a loop like this:
(while top-of-ranges body-of-loop... (setq top-of-ranges (cdr top-of-ranges)))
Put together, the two loops look like this:
(while top-of-ranges ;; Count the number of elements within the current range. (while length-element-smaller-than-top-of-range (setq number-within-range (1+ number-within-range)) (setq sorted-lengths (cdr sorted-lengths))) ;; Move to next range. (setq top-of-ranges (cdr top-of-ranges)))
In addition, in each circuit of the outer loop, Emacs should record
the number of definitions within that range (the value of
number-within-range
) in a list. We can use cons
for
this purpose. (See section cons
.)
The cons
function works fine, except that the list it
constructs will contain the number of definitions for the highest
range at its beginning and the number of definitions for the lowest
range at its end. This is because cons
attaches new elements
of the list to the beginning of the list, and since the two loops are
working their way through the lengths' list from the lower end first,
the defuns-per-range-list
will end up largest number first.
But we will want to print our graph with smallest values first and the
larger later. The solution is to reverse the order of the
defuns-per-range-list
. We can do this using the
nreverse
function, which reverses the order of a list.
For example,
(nreverse '(1 2 3 4))
produces:
(4 3 2 1)
Note that the nreverse
function is "destructive"---that is,
it changes the list to which it is applied; this contrasts with the
car
and cdr
functions, which are non-destructive. In
this case, we do not want the original defuns-per-range-list
,
so it does not matter that it is destroyed. (The reverse
function provides a reversed copy of a list, leaving the original list
as is.)
Put all together, the defuns-per-range
looks like this:
(defun defuns-per-range (sorted-lengths top-of-ranges) "SORTED-LENGTHS defuns in each TOP-OF-RANGES range." (let ((top-of-range (car top-of-ranges)) (number-within-range 0) defuns-per-range-list) ;; Outer loop. (while top-of-ranges ;; Inner loop. (while (and ;; Need number for numeric test. (car sorted-lengths) (< (car sorted-lengths) top-of-range)) ;; Count number of definitions within current range. (setq number-within-range (1+ number-within-range)) (setq sorted-lengths (cdr sorted-lengths))) ;; Exit inner loop but remain within outer loop. (setq defuns-per-range-list (cons number-within-range defuns-per-range-list)) (setq number-within-range 0) ; Reset count to zero. ;; Move to next range. (setq top-of-ranges (cdr top-of-ranges)) ;; Specify next top of range value. (setq top-of-range (car top-of-ranges))) ;; Exit outer loop and count the number of defuns larger than ;; the largest top-of-range value. (setq defuns-per-range-list (cons (length sorted-lengths) defuns-per-range-list)) ;; Return a list of the number of definitions within each range, ;; smallest to largest. (nreverse defuns-per-range-list)))
The function is straightforward except for one subtle feature. The true-or-false test of the inner loop looks like this:
(and (car sorted-lengths) (< (car sorted-lengths) top-of-range))
instead of like this:
(< (car sorted-lengths) top-of-range)
The purpose of the test is to determine whether the first item in the
sorted-lengths
list is less than the value of the top of the
range.
The simple version of the test works fine unless the
sorted-lengths
list has a nil
value. In that case, the
(car sorted-lengths)
expression function returns
nil
. The <
function cannot compare a number to
nil
, which is an empty list, so Emacs signals an error and
stops the function from attempting to continue to execute.
The sorted-lengths
list always becomes nil
when the
counter reaches the end of the list. This means that any attempt to
use the defuns-per-range
function with the simple version of
the test will fail.
We solve the problem by using the (car sorted-lengths)
expression in conjunction with the and
expression. The
(car sorted-lengths)
expression returns a non-nil
value so long as the list has at least one number within it, but
returns nil
if the list is empty. The and
expression
first evaluates the (car sorted-lengths)
expression, and
if it is nil
, returns false without evaluating the
<
expression. But if the (car sorted-lengths)
expression returns a non-nil
value, the and
expression
evaluates the <
expression, and returns that value as the value
of the and
expression.
This way, we avoid an error.
See section forward-paragraph
: a Goldmine of Functions, for more information about and
.
Here is a short test of the defuns-per-range
function. First,
evaluate the expression that binds (a shortened)
top-of-ranges
list to the list of values, then evaluate the
expression for binding the sorted-lengths
list, and then
evaluate the defuns-per-range
function.
;; (Shorter list than we will use later.) (setq top-of-ranges '(110 120 130 140 150 160 170 180 190 200)) (setq sorted-lengths '(85 86 110 116 122 129 154 176 179 200 265 300 300)) (defuns-per-range sorted-lengths top-of-ranges)
The list returned looks like this:
(2 2 2 0 0 1 0 2 0 0 4)
Indeed, there are two elements of the sorted-lengths
list
smaller than 110, two elements between 110 and 119, two elements
between 120 and 129, and so on. There are four elements with a value
of 200 or larger.
Go to the first, previous, next, last section, table of contents.