

GNU tar and POSIX tar

GNU tar was based on an early draft of the POSIX 1003.1 ustar standard. GNU extensions to tar, such as the support for file names longer than 100 characters, use portions of the tar header record which were specified in that POSIX draft as unused. Subsequent changes in POSIX have allocated the same parts of the header record for other purposes. As a result, GNU tar is incompatible with the current POSIX spec, and with tar programs that follow it.

We plan to reimplement these GNU extensions in a new way which is upward compatible with the latest POSIX tar format, but we don't know when this will be done.

In the meantime, there is simply no telling what might happen if you read a GNU tar archive that uses the GNU extensions with some other tar program. So if you want to read the archive with another tar program, be sure to write it using the `--old-archive' option (`-o').

@FIXME{is there a way to tell which flavor of tar was used to write a particular archive before you try to read it?}

Traditionally, old tars limit file names to 100 characters. GNU tar attempted two different approaches to overcome this limit, using and extending a format specified by a draft of P1003.1. The first approach was not that successful, and involved `@MaNgLeD@' file names, or such; the second used `././@LongLink' and other tricks, with better success. In theory, GNU tar should be able to handle file names of practically unlimited length. So, if GNU tar fails to dump and retrieve files whose names are longer than 100 characters, then there is indeed a bug in GNU tar.
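As a rough illustration of the `././@LongLink' approach, here is a minimal sketch that uses Python's standard tarfile module rather than GNU tar itself. It assumes that module's GNU_FORMAT writer is representative of the GNU long-name convention; the helper name write_archive and the sample file name are made up for the example.

```python
import io
import tarfile

def write_archive(name: str, fmt: int) -> bytes:
    """Write one empty member called `name` to an in-memory archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        tar.addfile(tarfile.TarInfo(name))
    return buf.getvalue()

# A name well past the traditional 100-character limit (hypothetical).
long_name = "dir/" + "x" * 200

gnu = write_archive(long_name, tarfile.GNU_FORMAT)

# The real name travels in an extra member named ././@LongLink.
has_longlink = b"././@LongLink" in gnu

# A reader that knows the convention recovers the full name.
with tarfile.open(fileobj=io.BytesIO(gnu), mode="r") as tar:
    names = tar.getnames()

print(has_longlink, names == [long_name])
```

A reader that does not know the convention sees `././@LongLink' as an ordinary member, which is why such archives are not portable.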

But under strict POSIX, the limit was still 100 characters. For various other purposes, GNU tar used areas left unassigned in the POSIX draft. POSIX later revised the P1003.1 ustar format by assigning previously unused header fields, in such a way that the upper limit for file name length was raised to 256 characters. However, the actual POSIX limit oscillates between 100 and 256, depending on the precise location of slashes in the full file name (this is rather ugly). Since GNU tar uses the same fields for quite different purposes, it became incompatible with the latest POSIX standards.

For longer or non-fitting file names, we plan to use yet another set of GNU extensions, but this time complying with the provisions POSIX offers for extending the format, rather than conflicting with it. Whenever an archive uses the old GNU tar extension format or POSIX extensions, whether for very long file names or other specialities, the archive becomes non-portable to other tar implementations. In fact, anything can happen. The most forgiving tars will merely unpack the file using a wrong name, and maybe create another file named something like `@LongName', with the true file name in it. Tars that do not protect themselves may die with a segmentation violation!

Compatibility concerns make this whole matter more difficult, as we will have to support all these formats together for a while. GNU tar should be able to produce and read true POSIX format files, while being able to detect old GNU tar formats, besides old V7 format, and process them conveniently. It will take years before this whole area stabilizes...

There are plans to raise this 100-character limit to 256, and yet produce POSIX conformant archives. Past 256, I do not know yet if GNU tar will go non-POSIX again, or merely refuse to archive the file.

There are plans for GNU tar to support the latest POSIX format more fully, while being able to read old V7 format, GNU (semi-POSIX plus extension), as well as full POSIX. One may ask if there is any part of the POSIX format that we still cannot support. This simple question has a complex answer. Maybe, on closer look, some strong limitations will pop up, but until now, nothing sounds too difficult (but see below). I only have these few pages of POSIX telling about `Extended tar Format' (P1003.1-1990 -- section 10.1.1), and there are references to other parts of the standard I do not have, which should normally enforce limitations on stored file names (I suspect things like fixing what / and NUL mean). There are also some points which the standard does not make clear. Existing practice will then drive what I should do.

POSIX mandates that, when a file name cannot fit within 100 to 256 characters (the variance comes from the fact that a / is ideally needed as the 156th character), or a link name cannot fit within 100 characters, a warning should be issued and the file not be stored. Unless some --posix option is given (or POSIXLY_CORRECT is set), I suspect that GNU tar should disobey this specification, and automatically switch to using GNU extensions to overcome file name or link name length limitations.
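To make the 100-to-256 variance concrete, here is a minimal sketch of the split a ustar writer has to find. It assumes the usual ustar field sizes (a 100-byte name field and a 155-byte prefix field, joined by a slash that is not itself stored). The function split_ustar_name is a hypothetical helper written for this illustration, not code from GNU tar.

```python
from typing import Optional, Tuple

NAME_SIZE = 100    # bytes available in the ustar "name" field
PREFIX_SIZE = 155  # bytes available in the ustar "prefix" field

def split_ustar_name(path: str) -> Optional[Tuple[str, str]]:
    """Return (prefix, name) if `path` fits a ustar header, else None."""
    if len(path.encode("utf-8")) <= NAME_SIZE:
        return "", path                      # fits in the name field alone
    # Try every slash as the split point; the slash itself is dropped.
    for i, ch in enumerate(path):
        if ch != "/":
            continue
        prefix, name = path[:i], path[i + 1:]
        if (0 < len(prefix.encode("utf-8")) <= PREFIX_SIZE
                and 0 < len(name.encode("utf-8")) <= NAME_SIZE):
            return prefix, name
    return None                              # no slash in a usable position

# Best case: 155 + slash + 100 = 256 characters are storable.
best = "p" * 155 + "/" + "n" * 100
# Worst case: 101 characters with no slash cannot be stored at all.
worst = "n" * 101

print(split_ustar_name(best) is not None, split_ustar_name(worst) is None)
```

The same total length can thus be storable or not depending only on where the slashes fall, which is the oscillation complained about above.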

There is a problem, however, which I have not yet studied closely. Given a truly POSIX archive with names having more than 100 characters, I guess that GNU tar up to 1.11.8 will process it as if it were an old V7 archive, and be fooled by some fields which are coded differently. So, the question is whether the next generation of GNU tar should produce POSIX format by default, whenever possible, producing archives that older versions of GNU tar might not be able to read correctly. I fear that we will have to suffer such a choice one of these days, if we want GNU tar to go closer to POSIX. We can rush it. Another possibility is to produce the current GNU tar format by default for a few years, but have GNU tar versions from some 1.POSIX and up able to recognize all three formats, and let older GNU tar fade out slowly. Then, we could switch to producing POSIX format by default, with not much harm to those still having (very old at that time) GNU tar versions prior to 1.POSIX.

POSIX format cannot represent very long names, volume headers, splitting of files in multi-volumes, sparse files, and incremental dumps; these would all be disallowed if --posix is given or POSIXLY_CORRECT is set. Otherwise, if tar is given long names, or `-[VMSgG]', then it should automatically go non-POSIX. I think this is easily granted without much discussion.

Another point is that only mtime is stored in POSIX archives, while GNU tar currently also stores atime and ctime. If we want GNU tar to go closer to POSIX, my choice would be to drop atime and ctime support in the general case. On the other hand, I perceive that full dumps or incremental dumps need atime and ctime support, so for those special applications, POSIX has to be avoided altogether.

A few users requested that --sparse (-S) always be active by default. I think that before replying to them, we have to decide if we want GNU tar to go closer to POSIX in general when producing files. My choice would be to go closer to POSIX in the long run. Besides possible double reading, I do not see any reason not to try saving files as sparse when creating archives which are neither POSIX nor old-V7, so the current --sparse (-S) would become selected by default when producing such archives, whatever the reason is. So, --sparse (-S) alone might be redefined to force GNU-format archives, and recover its previous meaning from this fact.

GNU-format as it exists now can easily fool other POSIX tars, as it uses fields which POSIX considers to be part of the file name prefix. I wonder if it would not be a good idea, in the long run, to try changing GNU-format so that any added field (like ctime, atime, file offset in subsequent volumes, or sparse file descriptions) is wholly and always pushed into an extension block, instead of using space in the POSIX header block. I could manage to do that portably between future GNU tars. Other POSIX tars might then at least be able to provide reasonably correct listings for the archives produced by GNU tar, even if they cannot process them otherwise.

Using these projected extensions might induce older tars to fail. We would use the same approach as for POSIX. I'll put out a tar capable of reading POSIXier, yet extended archives, but it will not produce this format by default, in GNU mode. In a few years, when newer GNU tars have displaced tar 1.11.X and previous, we could switch to producing POSIXier extended archives, with no real harm to users, as almost all existing GNU tars will be ready to read POSIXier format. In fact, I'll do both changes at the same time, in a few years, and just prepare tar for both changes, without effecting them, from 1.POSIX. (Both changes: 1--using POSIX convention for getting over 100 characters; 2--avoiding mangling POSIX headers for GNU extensions, using only POSIX mandated extension techniques).

So, a future tar will have a --posix flag forcing the usage of truly POSIX headers, and so producing archives that previous GNU tars will not be able to read. Once a pretest release announces that feature, it would be particularly useful for users to test how exchangeable archives are between GNU tar with --posix and other POSIX tars.

In a few years, when GNU tar produces POSIX headers by default, --posix will have a strong meaning and will disallow GNU extensions. But in the meantime, for a long while, --posix in GNU tar will not disallow GNU extensions like --label=archive-label (-V archive-label), --multi-volume (-M), --sparse (-S), or very long file or link names. However, --posix with GNU extensions will use POSIX headers with reserved-for-users extensions to headers, and I will be curious to know how well or badly POSIX tars react to these.

GNU tar prior to 1.POSIX, and after 1.POSIX without --posix, generates and checks `ustar  ', with two suffixed spaces. This is sufficient for older GNU tar not to recognize POSIX archives, and consequently to wrongly decide that those archives are in old V7 format. It is a useful bug for me, because GNU tar has other POSIX incompatibilities, and I need to segregate GNU tar semi-POSIX archives from truly POSIX archives, for GNU tar should be somewhat compatible with itself, while migrating closer to the latest POSIX standards. So, I'll be very careful about how and when I make the correction.
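The magic string also gives a rough way to guess which flavor wrote an archive before trying to read it. The sketch below assumes the usual header layout, in which the magic field starts at byte offset 257 of the 512-byte header and is followed by a two-byte version field: POSIX writes `ustar' plus a NUL and then `00', while the GNU form writes `ustar' plus two spaces and a NUL across the same eight bytes. The function tar_flavor and the helper first_header are hypothetical names; the sample archives are built with Python's standard tarfile module, not with GNU tar.

```python
import io
import tarfile

def tar_flavor(header: bytes) -> str:
    """Guess the flavor of a 512-byte tar header block from its magic."""
    if len(header) < 512:
        raise ValueError("need a full 512-byte header block")
    magic = header[257:265]            # magic (6 bytes) + version (2 bytes)
    if magic == b"ustar\x0000":
        return "posix"
    if magic == b"ustar  \x00":
        return "gnu"
    return "v7-or-unknown"             # old V7 headers carry no magic

def first_header(fmt: int) -> bytes:
    """Build a one-member archive in memory and return its first block."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        tar.addfile(tarfile.TarInfo("hello.txt"))
    return buf.getvalue()[:512]

posix_guess = tar_flavor(first_header(tarfile.USTAR_FORMAT))
gnu_guess = tar_flavor(first_header(tarfile.GNU_FORMAT))
blank_guess = tar_flavor(bytes(512))

print(posix_guess, gnu_guess, blank_guess)
```

This looks only at the first header, so it cannot tell whether later members use extensions; it is a hint, not a guarantee.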

