
Linux Programming Hints

Strange I/O

For several good reasons, the Linux standard library implements standard I/O (stdio) in a somewhat strange way. Unfortunately, many programs make unwarranted assumptions about how stdio is implemented, and those assumptions keep them from compiling properly under Linux. I have alluded to this problem before in this column; this month I will explain how to fix such source code so that it compiles under any operating system, including Linux.

by Michael K. Johnson

Linux stdio is not exactly non-standard; that would imply that there is a real standard for how standard I/O is supposed to be implemented. Theoretically, all I/O operations that use the stdio library should only use the "published" FILE mechanisms, which are abstract, and should not pay any attention to details. Unfortunately, many stdio implementations are rather slow, and do not provide functionality that programs need.

Instead of writing a working replacement for stdio, many programmers chose to abuse the stdio interface by directly accessing "private" members of the FILE structure that are not guaranteed to be the same from system to system. In practice, this worked very well from system to system, because almost all the systems came from the same source and a prototype using the same variable names was widely available.

Programmers learned, for instance, that the _cnt member of the FILE structure contained the number of bytes which had been read by the library but not yet read by the application, and that the _ptr member pointed into the buffer holding the characters that had been pre-fetched by the library. It was general knowledge that behind the scenes, the _filbuf() (sometimes called __filbuf()) routine was called to cause the stdio library to read more characters.
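To make the roles of _cnt, _ptr and the fill routine concrete, here is a minimal sketch of the traditional scheme. It does not touch the real FILE type: struct old_file, old_filbuf(), old_open(), old_read_all() and OLD_GETC() are hypothetical names invented for this illustration, and the "device" is just a string in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a traditional FILE; not a real libc type. */
struct old_file {
    int            _cnt;      /* bytes buffered but not yet consumed */
    unsigned char *_ptr;      /* next unread byte in the buffer */
    unsigned char  _buf[4];   /* tiny buffer so refills happen often */
    const char    *src;       /* pretend device the "library" reads from */
    size_t         src_left;  /* bytes remaining on the device */
};

/* Stand-in for _filbuf(): refill the buffer and return the first
 * new byte, or -1 when the device is exhausted. */
int old_filbuf(struct old_file *f)
{
    size_t n = f->src_left < sizeof f->_buf ? f->src_left : sizeof f->_buf;

    if (n == 0) {
        f->_cnt = 0;
        return -1;
    }
    memcpy(f->_buf, f->src, n);
    f->src += n;
    f->src_left -= n;
    f->_ptr = f->_buf + 1;    /* the first byte is returned directly */
    f->_cnt = (int)n - 1;
    return f->_buf[0];
}

/* The classic getc() shape: fast path from the buffer,
 * slow path through the fill routine. */
#define OLD_GETC(f) (--(f)->_cnt >= 0 ? (int)*(f)->_ptr++ : old_filbuf(f))

void old_open(struct old_file *f, const char *s)
{
    assert(s != NULL);
    f->_cnt = 0;              /* empty buffer: first read triggers a fill */
    f->_ptr = f->_buf;
    f->src = s;
    f->src_left = strlen(s);
}

/* Drain the stream through OLD_GETC into out (NUL-terminated);
 * returns the number of bytes read. cap must be at least 1. */
size_t old_read_all(struct old_file *f, char *out, size_t cap)
{
    size_t n = 0;
    int c;

    while (n + 1 < cap && (c = OLD_GETC(f)) != -1)
        out[n++] = (char)c;
    out[n] = '\0';
    return n;
}
```

Code that pokes at _cnt and _ptr directly is, in effect, inlining the fast path of OLD_GETC(), which is why it breaks as soon as the structure stops having those two members.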

This worked as long as everyone used similar stdio implementations. Many well-respected applications used these methods to get around stdio; GNU emacs and the Rand MH mail handler are among them.

Linux is different.

The Linux stdio is based on the GNU libg++ iostream I/O. The FILE structure looks, in part, like this (from libio.h):

int _flags;           /* High-order word is _IO_MAGIC; rest is flags. */
#define _IO_file_flags _flags
char* _IO_read_ptr;   /* Current read pointer */
char* _IO_read_end;   /* End of get area. */
char* _IO_read_base;   /* Start of putback+get area. */
char* _IO_write_base; /* Start of put area. */
char* _IO_write_ptr;  /* Current put pointer. */
char* _IO_write_end;  /* End of put area. */
char* _IO_buf_base;   /* Start of reserve area. */
char* _IO_buf_end;    /* End of reserve area. */

This isn't at all the same. It is better optimized: instead of a single _ptr element, it has one pointer for reading and one for writing, and a buffer area for each as well. Instead of keeping a count of the characters in the buffer, it keeps a pointer to the end of each buffer alongside the current pointer. This design makes it easier to use all sorts of things as files, including shared memory, SYSV IPC, and anything else that fits the paradigm; it is dynamically extensible. It is also shared between C++ and C, which makes the C++ iostream implementation more robust because of the extra testing it gets as a standard I/O package.
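The practical consequence of the pointer-pair layout is that a count is never stored; it is computed from two pointers. The following sketch shows the idea on a made-up structure. struct new_file and the new_*() functions are hypothetical names for this illustration only and are not part of libio.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the read/write pointer sets in the
 * iostream-based FILE; not the real structure. */
struct new_file {
    char *read_ptr;    /* current read pointer */
    char *read_end;    /* end of get area */
    char *write_ptr;   /* current put pointer */
    char *write_end;   /* end of put area */
};

/* Unread bytes: what count-based code would have kept in _cnt. */
ptrdiff_t new_read_avail(const struct new_file *f)
{
    assert(f->read_ptr <= f->read_end);
    return f->read_end - f->read_ptr;
}

/* Free space left in the put area. */
ptrdiff_t new_write_room(const struct new_file *f)
{
    assert(f->write_ptr <= f->write_end);
    return f->write_end - f->write_ptr;
}

/* Consume one byte: only the pointer moves, the count follows. */
int new_getc(struct new_file *f)
{
    if (new_read_avail(f) == 0)
        return -1;     /* a real library would refill here */
    return (unsigned char)*f->read_ptr++;
}

/* Store one byte: the put area is tracked independently of the get area. */
int new_putc(struct new_file *f, int c)
{
    if (new_write_room(f) == 0)
        return -1;     /* a real library would flush here */
    *f->write_ptr++ = (char)c;
    return (unsigned char)c;
}
```

Because reading and writing each have their own pointer and end, advancing one never disturbs the other's count.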

If you have worked with the Linux or GNU C libraries in the past, you will notice that the names have changed. They used to be shorter names like _pbase and _pptr that looked like they were related to the old stdio names. In November 1993, the names were changed to what you see above. It is not anticipated that these will change again in the foreseeable future. See the sidebar "Old Names to New" for a listing of how the names changed.

Replacing direct access to the members of the FILE structure with abstract macros can make it possible to compile offending source on any system. Since the Linux stdio makes a distinction between reading and writing, the first thing to determine is whether each code fragment is reading or writing. Then replace each direct use of a FILE member with a macro that is specific to reading or to writing. Finally, write the macros: one set for Linux, and one set for "standard" stdio. Here are some of mine:

[Sidebar: Old Names to New]

Under Linux or another similar stdio implementation:

#ifdef _STDIO_USES_IOSTREAM /* defined in libio.h */
#define FWptr(f) ((f)->_IO_write_ptr)
#define FRptr(f) ((f)->_IO_read_ptr)
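/* Generic pointer: test the direction bit with bitwise &, not && */
#define Fptr(f)  (((f)->_IO_file_flags &	\
		      _IO_CURRENTLY_PUTTING) ?	\
		  FWptr(f) :	\
		  FRptr(f))
/* Room left in the put area, never negative */
#define FWcnt(f) (((f)->_IO_write_end -	\
		  (f)->_IO_write_ptr) > 0 ?	\
		  (f)->_IO_write_end -	\
		  (f)->_IO_write_ptr : 0)
/* Unread bytes left in the get area, never negative */
#define FRcnt(f) (((f)->_IO_read_end -	\
		  (f)->_IO_read_ptr) > 0 ?	\
		  (f)->_IO_read_end -	\
		  (f)->_IO_read_ptr : 0)
/* Generic count: picks FWcnt or FRcnt by direction */
#define Fcnt(f)  (((f)->_IO_file_flags &	\
		      _IO_CURRENTLY_PUTTING) ?	\
		  FWcnt(f) :	\
		  FRcnt(f))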
#define Ffill(f) __underflow(f)
#define Fflsh(f) __overflow(f)

Under "standard" stdio:

#else /* standard stdio */
#define Fptr(f)  ((f)->_ptr)
#define FWptr(f) Fptr(f)
#define FRptr(f) Fptr(f)
#define Fcnt(f)  ((f)->_cnt)
#define FWcnt(f) Fcnt(f)
#define FRcnt(f) Fcnt(f)
#define Ffill(f) _filbuf(f)
#define Fflsh(f) _flsbuf(f)
#endif

Note that some code may use f->_cnt as an lvalue (an expression to which something is assigned). In these cases, f->_ptr will always be assigned to as well, because the standard stdio library requires both to be updated at the same time. In iostream-based stdio, the "count" values in these abstraction macros are calculations, so they cannot be lvalues. They do not need to be assigned to, however: each count depends on a "pointer" value and an "end" value, the "pointer" value is updated, and the "end" value doesn't change, so the count stays correct on its own. Therefore,

f->_ptr++;
f->_cnt--;

becomes (assuming that the code is reading using this pointer):

FRptr(f)++;
#ifndef _STDIO_USES_IOSTREAM
FRcnt(f)--;
#endif

or simply

Fptr(f)++;
#ifndef _STDIO_USES_IOSTREAM
Fcnt(f)--;
#endif

if you are not sure whether the code is reading or writing.
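As a sketch of how such a converted fragment behaves, the code below applies the same pattern to a made-up structure rather than the real FILE. DEMO_USES_IOSTREAM is a hypothetical stand-in for _STDIO_USES_IOSTREAM, struct demo_file and the demo_*() functions are invented for this example, and the FRptr()/FRcnt() definitions here are simplified versions written against that structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch: 1 selects the pointer-pair layout,
 * 0 selects the pointer-plus-count layout. */
#define DEMO_USES_IOSTREAM 1

#if DEMO_USES_IOSTREAM
struct demo_file { char *rptr; char *rend; };
#define FRptr(f) ((f)->rptr)
#define FRcnt(f) ((f)->rend > (f)->rptr ? (f)->rend - (f)->rptr : 0)
#else
struct demo_file { char *_ptr; int _cnt; };
#define FRptr(f) ((f)->_ptr)
#define FRcnt(f) ((f)->_cnt)
#endif

/* Point the demo stream at len bytes of already "pre-fetched" data. */
void demo_init(struct demo_file *f, char *buf, size_t len)
{
    assert(buf != NULL);
#if DEMO_USES_IOSTREAM
    f->rptr = buf;
    f->rend = buf + len;
#else
    f->_ptr = buf;
    f->_cnt = (int)len;
#endif
}

/* Skip leading blanks in the buffered data without calling getc();
 * returns how many bytes were skipped. */
size_t demo_skip_blanks(struct demo_file *f)
{
    size_t skipped = 0;

    while (FRcnt(f) > 0 && *FRptr(f) == ' ') {
        FRptr(f)++;
#if !DEMO_USES_IOSTREAM
        FRcnt(f)--;    /* only the count-based layout stores a count */
#endif
        skipped++;
    }
    return skipped;
}
```

Flipping DEMO_USES_IOSTREAM to 0 should give the same results from demo_skip_blanks(), which is the point of hiding the member names behind macros.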

I will warn you: trying to apply these instructions and macros to code you are porting without understanding the code you are working on is likely to be disastrous. Your application may compile, but quietly lose data if you put the wrong macros in. It is most important not to use the FR*() macros when the library is writing, and not to use the FW*() macros when the library is reading. If you can't tell which is being done, you are far better off using the generic versions, Fptr() and Fcnt(), than you are guessing.
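To see why the direction matters, consider this sketch of a stream whose get and put areas hold different amounts. Everything in it is hypothetical: struct demo_stream, the DF*() macros and demo_generic_count() are invented for the illustration, and DEMO_PUTTING is only a stand-in for the real flag in libio.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical direction bit; a stand-in for _IO_CURRENTLY_PUTTING. */
#define DEMO_PUTTING 0x800

struct demo_stream {
    int   flags;
    char *rptr, *rend;   /* get area */
    char *wptr, *wend;   /* put area */
};

#define DFRcnt(s) ((s)->rend - (s)->rptr)   /* unread bytes */
#define DFWcnt(s) ((s)->wend - (s)->wptr)   /* free space for writing */

/* Generic count: the bitwise & tests one flag bit. A logical &&
 * would be true for any nonzero flags word and always pick DFWcnt. */
#define DFcnt(s)  (((s)->flags & DEMO_PUTTING) ? DFWcnt(s) : DFRcnt(s))

/* What a caller that does not know the direction would see. */
ptrdiff_t demo_generic_count(const struct demo_stream *s)
{
    assert(s->rptr <= s->rend && s->wptr <= s->wend);
    return DFcnt(s);
}
```

With five unread bytes and a hundred bytes of free put space, the read-side and write-side counts differ by a factor of twenty, so code that consults the wrong one will happily read past valid data or believe the buffer is nearly empty.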

There are other mistakes waiting to be made, and I can't cover them all, because I don't know what they all are. The source code to the Linux C library is available via ftp from tsx-11.mit.edu and sunsite.unc.edu, and is distributed with many Linux distributions. Reading the libc source code (usually found in /usr/src/libc-linux/libio/), and understanding what it is doing, is the safest route to knowing what to do when porting code that makes assumptions about stdio. This article alone can only help you along your way; you will still have to understand the program you are porting and the Linux stdio to achieve success.

Michael K. Johnson may be reached by e-mail as johnsonm@merengue.oit.unc.edu
