Programming in Emacs Lisp - Words and Symbols


What to Count?

When we first start thinking about how to count the words in a function definition, the first question is (or ought to be) what are we going to count? When we speak of `words' with respect to a Lisp function definition, we are actually speaking, in large part, of `symbols'. For example, the following multiply-by-seven function contains the five symbols defun, multiply-by-seven, number, *, and 7. (Strictly speaking, 7 is a number rather than a symbol, but it is counted here as one item.) In addition, in the documentation string, it contains the four words `Multiply', `NUMBER', `by', and `seven'. The symbol `number' appears twice, so the definition contains a total of ten words and symbols.

(defun multiply-by-seven (number)
  "Multiply NUMBER by seven."
  (* 7 number))

However, if we mark the multiply-by-seven definition with C-M-h (mark-defun), and then call count-words-region on it, we will find that count-words-region claims the definition has eleven words, not ten! Something is wrong!

The problem is twofold: count-words-region does not count the `*' as a word, and it counts the single symbol, multiply-by-seven, as containing three words. The hyphens are treated as if they were interword spaces rather than intraword connectors: `multiply-by-seven' is counted as if it were written `multiply by seven'.
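Both halves of the problem can be seen in isolation with a small sketch. The helper name my-word-pieces is hypothetical, and the sketch uses split-string rather than the code of count-words-region itself; it assumes Emacs Lisp syntax rules and breaks a piece of text at every run of characters that Emacs does not treat as part of a word.

```emacs-lisp
;; Hypothetical helper for illustration; not part of count-words-region.
(defun my-word-pieces (token)
  "Return the list of word-like pieces of TOKEN under Emacs Lisp syntax."
  ;; Assumption: the Emacs Lisp mode syntax table decides what a word is.
  (with-syntax-table emacs-lisp-mode-syntax-table
    ;; Split at every run of non-word characters, dropping empty pieces.
    (split-string token "\\W+" t)))

(my-word-pieces "multiply-by-seven") ; three pieces: multiply, by, seven
(my-word-pieces "*")                 ; no pieces at all
```

The single symbol multiply-by-seven yields three pieces, and `*' yields none, which matches the two miscounts described above.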

The cause of this confusion is the regular expression search within the count-words-region definition that moves point forward word by word. In the canonical version of count-words-region, the regexp is:

"\\w+\\W*"

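This regular expression is a pattern defining one or more word constituent characters followed by zero or more characters that are not word constituents. Because the hyphen and the `*' are not word constituents, the pattern stops after each part of `multiply-by-seven' and never matches the `*' as a word of its own. What is meant by `word constituent characters' brings us to the issue of syntax, which is worth a section of its own.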
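To see how this regexp arrives at eleven, here is a minimal sketch that counts its matches in a piece of text. The function name my-count-regexp-words is hypothetical, and the loop is a simplified stand-in for the search inside count-words-region, not its actual definition; it assumes the text should be scanned with the Emacs Lisp mode syntax table.

```emacs-lisp
;; Hypothetical helper for illustration; a simplified stand-in for
;; the search loop, not the real count-words-region.
(defun my-count-regexp-words (text)
  "Return the number of matches of the word regexp in TEXT."
  (with-temp-buffer
    (insert text)
    ;; Assumption: Emacs Lisp syntax rules decide what a word is.
    (with-syntax-table emacs-lisp-mode-syntax-table
      (goto-char (point-min))
      (let ((count 0))
        ;; Each successful search moves point past one `word'.
        (while (re-search-forward "\\w+\\W*" nil t)
          (setq count (1+ count)))
        count))))

(my-count-regexp-words
 "(defun multiply-by-seven (number)
  \"Multiply NUMBER by seven.\"
  (* 7 number))")
```

Evaluating the last expression returns 11: the regexp matches `multiply', `by', and `seven' separately in the function name and skips over the `*' as part of the non-word text between `seven.' and `7'.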
