

Collation Functions

In some locales, the conventions for lexicographic ordering differ from the strict numeric ordering of character codes. For example, in Spanish most letters with diacritical marks such as accents are not considered distinct letters for the purposes of collation. On the other hand, traditional Spanish collation treats the two-character sequence `ll' as a single letter that is collated immediately after `l'.

You can use the functions strcoll and strxfrm (declared in the header file `string.h') to compare strings using a collation ordering appropriate for the current locale. These functions use the locale selected for the LC_COLLATE category, so you can control their behavior by setting the locale for that category; see section Locales and Internationalization.
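A minimal sketch of selecting the collation locale, assuming the program should follow the user's environment (with setlocale and an empty locale name, POSIX consults LC_ALL, then LC_COLLATE, then LANG) and fall back to the standard "C" locale if the environment names a locale that is not installed. The helper name select_collation_locale is hypothetical.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: select the collation locale named by the
   environment, falling back to the "C" locale if that fails.
   Only the LC_COLLATE category is changed; other categories
   (such as LC_CTYPE) keep their current settings.
   Returns the name of the locale now in effect for LC_COLLATE.  */
static const char *
select_collation_locale (void)
{
  const char *name = setlocale (LC_COLLATE, "");

  if (name == NULL)
    /* The "C" locale is always available.  */
    name = setlocale (LC_COLLATE, "C");
  return name;
}
```

After this call, strcoll and strxfrm use the selected locale. Changing only LC_COLLATE, rather than LC_ALL, keeps the sketch from affecting number formatting or character classification.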

In the standard C locale, the collation sequence for strcoll is the same as that for strcmp.
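The following sketch checks that claim for a pair of strings. Since only the sign of a comparison result is meaningful, it normalizes both results to -1, 0, or 1 before comparing them. The names sign_of and coll_agrees_with_cmp are hypothetical; every C program starts in the "C" locale until it calls setlocale, so no setup is needed to observe agreement.

```c
#include <string.h>

/* Map a comparison result to -1, 0 or 1 so that results from
   different comparison functions can be checked for agreement.  */
static int
sign_of (int r)
{
  return (r > 0) - (r < 0);
}

/* Return nonzero if strcoll and strcmp order A and B the same way.
   This holds in the "C" locale; in other locales it often does not
   (for example, letter case or accents may be weighted differently).  */
static int
coll_agrees_with_cmp (const char *a, const char *b)
{
  return sign_of (strcoll (a, b)) == sign_of (strcmp (a, b));
}
```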

Conceptually, these functions work by applying a mapping that transforms the characters in a string into a byte sequence representing the string's position in the collating sequence of the current locale. Comparing two such byte sequences byte by byte, as strcmp does, is equivalent to comparing the original strings with the locale's collating sequence.

The function strcoll performs this translation implicitly, in order to do one comparison. By contrast, strxfrm performs the mapping explicitly. If you are making multiple comparisons using the same string or set of strings, it is likely to be more efficient to use strxfrm to transform all the strings just once, and subsequently compare the transformed strings with strcmp.
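To make that equivalence concrete, here is a sketch that transforms two strings with strxfrm, compares the results with strcmp, and checks that the ordering agrees with strcoll on the originals. The function names are hypothetical, and the transformed strings are held in variable-length arrays sized by calling strxfrm with a size of zero, which is adequate for short strings but not for very long ones.

```c
#include <string.h>

/* Map a comparison result to -1, 0 or 1.  */
static int
sign_of (int r)
{
  return (r > 0) - (r < 0);
}

/* Return nonzero if strcmp on the strxfrm transformations of A and B
   orders them the same way as strcoll on A and B themselves.  */
static int
xfrm_matches_coll (const char *a, const char *b)
{
  /* With a size of zero, strxfrm only reports the length
     (excluding the terminating null character).  */
  size_t na = strxfrm (NULL, a, 0);
  size_t nb = strxfrm (NULL, b, 0);
  char ta[na + 1];
  char tb[nb + 1];

  strxfrm (ta, a, na + 1);
  strxfrm (tb, b, nb + 1);
  return sign_of (strcmp (ta, tb)) == sign_of (strcoll (a, b));
}
```

In a real program the point is to keep ta and tb around: once a string is transformed, every further comparison involving it is a plain strcmp.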

Function: int strcoll (const char *s1, const char *s2)
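The strcoll function is similar to strcmp but uses the collating sequence of the current locale for collation (the LC_COLLATE locale). Like strcmp, it returns a value that is negative, zero, or positive according to whether s1 collates before, the same as, or after s2; only the sign of the result is meaningful.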
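As a small illustration of relying only on the sign, here is a sketch of a hypothetical helper, is_collation_sorted, that checks whether an array of strings is already in order according to the current locale.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return nonzero if the N strings in ITEMS are in
   nondecreasing order according to the current LC_COLLATE locale.
   Only the sign of each strcoll result is examined.  */
static int
is_collation_sorted (const char *const *items, size_t n)
{
  size_t i;

  for (i = 1; i < n; i++)
    if (strcoll (items[i - 1], items[i]) > 0)
      return 0;
  return 1;
}
```

In the "C" locale, "Apple" sorts before "apple" because uppercase letters have smaller character codes; in a locale with a different collation, the same array may no longer count as sorted.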

Here is an example of sorting an array of strings, using strcoll to compare them. The actual sort algorithm is not written here; it comes from qsort (see section Array Sort Function). The job of the code shown here is to say how to compare the strings while sorting them. (Later on in this section, we will show a way to do this more efficiently using strxfrm.)

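#include <stdlib.h>
#include <string.h>

/* This is the comparison function used with qsort.  qsort passes
   pointers to the array elements, and each element is a char *.  */

int
compare_elements (const void *v1, const void *v2)
{
  char *const *p1 = v1;
  char *const *p2 = v2;

  return strcoll (*p1, *p2);
}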

/* This is the entry point---the function to sort
   strings using the locale's collating sequence. */

void
sort_strings (char **array, int nstrings)
{
  /* Sort array by comparing the strings.  qsort takes the
     element count before the element size.  */
  qsort (array, nstrings, sizeof (char *),
         compare_elements);
}

Function: size_t strxfrm (char *to, const char *from, size_t size)
The function strxfrm transforms the string from using the collation transformation determined by the locale currently selected for collation (the LC_COLLATE locale), and stores the transformed string in the array to. Up to size characters (including a terminating null character) are stored.

The behavior is undefined if the strings to and from overlap; see section Copying and Concatenation.

The return value is the length of the entire transformed string, not counting the terminating null character. This value is not affected by the value of size, but if it is greater than or equal to size, the transformed string did not entirely fit in the array to. In this case, only as much of the string as actually fits was stored; portable programs should not rely even on that, because ISO C leaves the contents of the array indeterminate. To get the whole transformed string, call strxfrm again with an output array of at least the returned length plus one.
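A minimal sketch of the retry strategy, assuming the caller supplies an initial size guess; the function name xfrm_retry is hypothetical. Because the return value of strxfrm does not depend on size, the second attempt always has exactly enough room.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: transform S, starting with a buffer of GUESS
   bytes and retrying once with a larger buffer if strxfrm reports,
   by returning a value >= the size passed, that the result did not
   fit.  Returns a malloc'd string that the caller must free, or NULL
   if memory runs out.  */
static char *
xfrm_retry (const char *s, size_t guess)
{
  size_t size = guess > 0 ? guess : 1;
  char *buf = NULL;

  for (;;)
    {
      char *bigger = realloc (buf, size);
      size_t needed;

      if (bigger == NULL)
        {
          free (buf);
          return NULL;
        }
      buf = bigger;

      needed = strxfrm (buf, s, size);
      if (needed < size)
        return buf;             /* The whole transformed string fit.  */

      /* It did not fit, so the buffer contents are not usable.
         Retry with room for the whole string plus its null.  */
      size = needed + 1;
    }
}
```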

The transformed string may be longer than the original string, and it may also be shorter.

If size is zero, no characters are stored in to. In this case, strxfrm simply returns the length of the transformed string, not counting the terminating null character. This is useful for determining how large an array to allocate: it must hold that many characters plus one for the terminating null. It does not matter what to is if size is zero; to may even be a null pointer.
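Here is a sketch of that query-then-allocate pattern; the name xfrm_dup is hypothetical. The first call passes a null pointer and a size of zero to learn the length, and the second call fills a buffer one byte larger to leave room for the terminating null character.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd, exactly sized copy of the
   strxfrm transformation of S, or NULL if memory runs out.
   The caller must free the result.  */
static char *
xfrm_dup (const char *s)
{
  /* Size zero: nothing is stored, only the length is returned.  */
  size_t length = strxfrm (NULL, s, 0);
  char *buf = malloc (length + 1);

  if (buf != NULL)
    strxfrm (buf, s, length + 1);
  return buf;
}
```

This always transforms each string twice. Guessing a buffer size first and retrying only when the guess is too small, as the next example does, usually transforms each string once at the cost of sometimes over-allocating.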

Here is an example of how you can use strxfrm when you plan to do many comparisons. It does the same thing as the previous example, but it can be much faster, because it transforms each string only once, no matter how many times that string is compared with others. When there are many strings, even the time needed to allocate and free storage is much less than the time saved.

#include <stdlib.h>
#include <string.h>

struct sorter { char *input; char *transformed; };

/* This is the comparison function used with qsort
   to sort an array of struct sorter. */

int
compare_elements (const void *v1, const void *v2)
{
  const struct sorter *p1 = v1;
  const struct sorter *p2 = v2;

  return strcmp (p1->transformed, p2->transformed);
}

/* This is the entry point---the function to sort
   strings using the locale's collating sequence. */

void
sort_strings_fast (char **array, int nstrings)
{
  struct sorter temp_array[nstrings];
  int i;

  /* Set up temp_array.  Each element contains
     one input string and its transformed string. */
  for (i = 0; i < nstrings; i++)
    {
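      /* Guess twice the input length, plus one so that the
         guess is never zero.  */
      size_t length = strlen (array[i]) * 2 + 1;
      char *transformed;
      size_t transformed_length;

      temp_array[i].input = array[i];

      /* First try a buffer that is perhaps big enough.
         xmalloc and xrealloc are error-checking wrappers around
         malloc and realloc; they are not part of the C library
         and must be defined by the program.  */
      transformed = (char *) xmalloc (length);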

      /* Transform array[i].  */
      transformed_length = strxfrm (transformed, array[i], length);

      /* If the buffer was not large enough, resize it
         and try again.  */
      if (transformed_length >= length)
        {
          /* Allocate the needed space. +1 for terminating
             NUL character.  */
          transformed = (char *) xrealloc (transformed,
                                           transformed_length + 1);

          /* The return value is not interesting because we know
             how long the transformed string is.  */
          (void) strxfrm (transformed, array[i], transformed_length + 1);
        }

      temp_array[i].transformed = transformed;
    }

  /* Sort temp_array by comparing transformed strings. */
  qsort (temp_array, nstrings, sizeof (struct sorter),
         compare_elements);

  /* Put the elements back in the permanent array
     in their sorted order. */
  for (i = 0; i < nstrings; i++)
    array[i] = temp_array[i].input;

  /* Free the strings we allocated. */
  for (i = 0; i < nstrings; i++)
    free (temp_array[i].transformed);
}

Compatibility Note: The string collation functions were introduced by ISO C (they first appeared in the 1989 ANSI C standard). Older C dialects have no equivalent feature.

