It's fairly common for programs to need some simple kinds of lexical analysis and parsing, such as splitting a command string up into tokens. You can do this with the strtok function, declared in the header file `string.h' as

    char *strtok (char *newstring, const char *delimiters)

A string is split into tokens by making a series of calls to strtok.
The string to be split up is passed as the newstring argument on the first call only. The strtok function uses this to set up some internal state information. Subsequent calls to get additional tokens from the same string are indicated by passing a null pointer as the newstring argument. Calling strtok with another non-null newstring argument reinitializes the state information. It is guaranteed that no other library function ever calls strtok behind your back (which would mess up this internal state information).
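The first-call/subsequent-call convention is easiest to see in a loop. The following is a minimal sketch; collect_tokens is a hypothetical helper, not a library function, and the buffer passed to it is assumed to be a writable array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the library): stores up to MAX
   tokens of BUF into OUT and returns how many were stored.  BUF must
   be writable, because strtok overwrites the delimiter that ends each
   token with a null character.  */
size_t
collect_tokens (char *buf, const char *delimiters, char *out[], size_t max)
{
  size_t n = 0;
  /* First call: pass the string itself to set up the internal state.  */
  char *token = strtok (buf, delimiters);
  while (token != NULL && n < max)
    {
      out[n++] = token;
      /* Later calls: pass a null pointer to continue with the same string.  */
      token = strtok (NULL, delimiters);
    }
  return n;
}
```

The loop stops as soon as strtok returns a null pointer, which is how the end of the token sequence is signalled.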
The delimiters argument is a string that specifies a set of delimiters that may surround the token being extracted. All the initial characters that are members of this set are discarded. The first character that is not a member of this set of delimiters marks the beginning of the next token. The end of the token is found by looking for the next character that is a member of the delimiter set. This character in the original string newstring is overwritten by a null character, and the pointer to the beginning of the token in newstring is returned.
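The scanning rule just described can be expressed with the standard functions strspn and strcspn. The sketch below is an illustration under that assumption, not glibc's actual implementation; scan_token is a hypothetical name, and unlike strtok it keeps its position in an explicit cursor argument instead of internal state.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper mirroring the scanning rule described above.
   *CURSOR points at the text still to be scanned and is advanced past
   each token.  Illustration only, not glibc's implementation.  */
char *
scan_token (char **cursor, const char *delimiters)
{
  /* Discard initial characters that are members of the delimiter set.  */
  char *start = *cursor + strspn (*cursor, delimiters);
  if (*start == '\0')
    {
      *cursor = start;          /* only delimiters were left */
      return NULL;
    }
  /* The token ends at the next delimiter, or at the end of the string.  */
  char *end = start + strcspn (start, delimiters);
  if (*end != '\0')
    {
      *end = '\0';              /* overwrite the delimiter with a null */
      *cursor = end + 1;        /* resume at the following character */
    }
  else
    *cursor = end;              /* token runs to the end of the string */
  return start;
}
```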
On the next call to strtok, the search begins at the character just beyond the one that marked the end of the previous token. Note that the set of delimiters, delimiters, does not have to be the same on every call in a series of calls to strtok.
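Changing the delimiter set between calls lets one pass pick apart text with different separators. As a minimal sketch, the hypothetical helper split_assignment below reads a key ending at `=' and then a value ending at `;'; the input format is an assumption chosen for illustration.

```c
#include <string.h>

/* Hypothetical helper: splits "KEY=VALUE;..." into KEY and VALUE by
   using a different delimiter set on each of two strtok calls.
   Returns 1 if both parts were found, 0 otherwise.  */
int
split_assignment (char *buf, char **key, char **value)
{
  *key = strtok (buf, "=");      /* the key ends at the first '=' */
  *value = strtok (NULL, ";");   /* the value ends at the next ';' */
  return *key != NULL && *value != NULL;
}
```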
If the end of the string newstring is reached, or if the remainder of the string consists only of delimiter characters, strtok returns a null pointer.
Warning: Since strtok alters the string it is parsing, you should always copy the string to a temporary buffer before parsing it with strtok. If you allow strtok to modify a string that came from another part of your program, you are asking for trouble; that string might be part of a data structure that could be used for other purposes during the parsing, when alteration by strtok makes the data structure temporarily inaccurate.
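A minimal sketch of the copy-first approach follows. The helper name count_tokens is hypothetical, and the copy is made with malloc and memcpy so that only standard C functions are assumed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in LINE without modifying it,
   by running strtok over a private copy.  Returns (size_t) -1 if the
   copy cannot be allocated.  */
size_t
count_tokens (const char *line, const char *delimiters)
{
  size_t len = strlen (line);
  char *copy = malloc (len + 1);
  if (copy == NULL)
    return (size_t) -1;
  memcpy (copy, line, len + 1);   /* include the terminating null */

  size_t n = 0;
  for (char *tok = strtok (copy, delimiters); tok != NULL;
       tok = strtok (NULL, delimiters))
    n++;

  free (copy);                    /* LINE itself was never touched */
  return n;
}
```

Because strtok only ever writes into the copy, the caller's string stays intact and can safely be part of a shared data structure.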
The string that you are operating on might even be a constant. Then when strtok tries to modify it, your program will get a fatal signal for writing in read-only memory. See section Program Error Signals.
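The usual way to avoid this is to parse an array initialized from the literal rather than the literal itself. A minimal sketch, with first_field_length as a hypothetical helper:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the length of the first comma-separated
   field of a fixed string.  */
size_t
first_field_length (void)
{
  /* Wrong: a string literal may be stored in read-only memory, so
       char *p = "alpha,beta";
       strtok (p, ",");
     has undefined behavior (on typical systems, a fatal SIGSEGV).  */

  /* Right: an array initialized from the literal is a writable copy.  */
  char buf[] = "alpha,beta";
  char *field = strtok (buf, ",");
  return field != NULL ? strlen (field) : 0;
}
```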
This is a special case of a general principle: if a part of a program does not have as its purpose the modification of a certain data structure, then it is error-prone to modify the data structure temporarily.
The function strtok is not reentrant. See section Signal Handling and Nonreentrant Functions, for a discussion of where and why reentrancy is important.
Here is a simple example showing the use of strtok.
    #include <string.h>
    #include <stddef.h>

    ...

    char string[] = "words separated by spaces -- and, punctuation!";
    const char delimiters[] = " .,;:!-";
    char *token;

    ...

    token = strtok (string, delimiters);  /* token => "words" */
    token = strtok (NULL, delimiters);    /* token => "separated" */
    token = strtok (NULL, delimiters);    /* token => "by" */
    token = strtok (NULL, delimiters);    /* token => "spaces" */
    token = strtok (NULL, delimiters);    /* token => "and" */
    token = strtok (NULL, delimiters);    /* token => "punctuation" */
    token = strtok (NULL, delimiters);    /* token => NULL */
The GNU C library contains two more functions for tokenizing a string which overcome the limitation of non-reentrancy.
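The function strtok_r is declared as

    char *strtok_r (char *newstring, const char *delimiters, char **save_ptr)

Just like strtok, this function splits the string into several tokens which can be accessed by successive calls to strtok_r.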
The difference is that the information about the next token is not kept in internal state. Instead, the caller has to provide another argument, save_ptr, which is a pointer to a char pointer. Calling strtok_r with a null pointer for newstring, while leaving save_ptr unchanged between the calls, continues the scan without limiting reentrancy.
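Because each scan carries its own save_ptr, two scans can be interleaved, which is impossible with strtok's single internal state. A minimal sketch follows; sum_values and the "name=value;..." input format are hypothetical, and the _POSIX_C_SOURCE definition is an assumption made so that strict compilers declare strtok_r.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: request the POSIX strtok_r */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: sums the values in a string such as
   "a=1;b=2;c=3".  The outer scan splits on ';' and the inner scan
   splits each field on '=', each with its own save pointer.  */
long
sum_values (char *buf)
{
  long total = 0;
  char *outer_save;   /* state for the scan over fields */
  for (char *field = strtok_r (buf, ";", &outer_save); field != NULL;
       field = strtok_r (NULL, ";", &outer_save))
    {
      char *inner_save;   /* independent state for this one field */
      char *name = strtok_r (field, "=", &inner_save);
      char *value = strtok_r (NULL, "=", &inner_save);
      if (name != NULL && value != NULL)
        total += strtol (value, NULL, 10);
    }
  return total;
}
```

Fields without a value, such as a bare "y", are simply skipped by the inner check.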
This function was proposed for POSIX.1b and can be found on many systems which support multi-threading.
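The function strsep is declared as

    char *strsep (char **string_ptr, const char *delimiter)

It takes a second reentrant approach that avoids the additional first argument: the caller initializes the moving pointer *string_ptr to the start of the string. Successive calls to strsep move the pointer along the tokens separated by delimiter, returning the address of the next token and updating *string_ptr to point to the beginning of the next token. Unlike strtok and strtok_r, strsep does not skip runs of delimiter characters: it returns an empty string for each pair of adjacent delimiters. When no delimiter remains, it sets *string_ptr to a null pointer, and a call made with *string_ptr already null returns a null pointer.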
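Keeping empty tokens makes strsep a natural fit for records whose fields may be blank. A minimal sketch, with split_record as a hypothetical helper; the _DEFAULT_SOURCE definition is an assumption so that glibc declares strsep under strict compiler modes.

```c
#define _DEFAULT_SOURCE  /* assumption: glibc exposes strsep with this */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits a comma-separated record into at most
   MAX fields.  An empty field between two adjacent commas is kept, so
   field positions are preserved.  */
size_t
split_record (char *record, char *fields[], size_t max)
{
  char *cursor = record;   /* the caller-initialized moving pointer */
  size_t n = 0;
  char *field;
  while (n < max && (field = strsep (&cursor, ",")) != NULL)
    fields[n++] = field;
  return n;
}
```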
This function was introduced in 4.3BSD and therefore is widely available.
Here is how the above example looks when strsep is used.
    #include <string.h>
    #include <stddef.h>

    ...

    char string[] = "words separated by spaces -- and, punctuation!";
    const char delimiters[] = " .,;:!-";
    char *running;
    char *token;

    ...

    running = string;
    token = strsep (&running, delimiters);    /* token => "words" */
    token = strsep (&running, delimiters);    /* token => "separated" */
    token = strsep (&running, delimiters);    /* token => "by" */
    token = strsep (&running, delimiters);    /* token => "spaces" */
    token = strsep (&running, delimiters);    /* token => "" */
    token = strsep (&running, delimiters);    /* token => "" */
    token = strsep (&running, delimiters);    /* token => "" */
    token = strsep (&running, delimiters);    /* token => "and" */
    token = strsep (&running, delimiters);    /* token => "" */
    token = strsep (&running, delimiters);    /* token => "punctuation" */
    token = strsep (&running, delimiters);    /* token => "" */
    token = strsep (&running, delimiters);    /* token => NULL */
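When strtok-like behavior is wanted but reentrancy matters, one option is to call strsep in a loop and ignore the empty strings. The sketch below assumes that approach; count_nonempty is a hypothetical helper, and _DEFAULT_SOURCE is again assumed for glibc.

```c
#define _DEFAULT_SOURCE  /* assumption: glibc exposes strsep with this */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the non-empty tokens strsep produces,
   which are the same tokens strtok would return for this input.  */
size_t
count_nonempty (char *text, const char *delimiters)
{
  char *running = text;
  size_t n = 0;
  char *token;
  while ((token = strsep (&running, delimiters)) != NULL)
    if (*token != '\0')   /* skip empty strings from adjacent delimiters */
      n++;
  return n;
}
```

Applied to the string of the example above, this counts the six words found by the strtok version.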