ISO/IEC JTC1/SC22/WG14 N783


Significant outstanding issues
Clive D.W. Feather
clive@demon.net
1997-10-20


Abstract
========
This paper is assembled from those elements of N720, N735, and N739 that
involve significant outstanding issues in the Standard.

Items are given a serial number in this paper, but also carry a note
stating their origin. Items taken from N720 do not have a rationale; the
related DR explains the issues.

Change bars are included where part of a large piece of text is changed,
but not in some items where nearly all the quoted text is changed.
References are relative to Draft 11 pre 3.


Specific items
==============

Item 1
[Was N720 DR 166]
-----------------
Many constraints refer to lvalues, yet the current definition can make
it impossible to tell if something is an lvalue until runtime. [6.3.2.4,
6.3.3.1, 6.3.16 are mentioned; 6.3.3.2 has already been addressed.]
[Wording needed]


Item 2
[Was N720 DR 172]
-----------------
There are a number of defects in the rules for pointer comparison. These
should be fixed. Suitable wording is provided in the original DR.


Item 3
[Was N739 item 11]
------------------
In subclause 6.5.2.1, change paragraph 3 from:

    The expression that specifies the width of a bit-field shall be
    an integral constant expression that has nonnegative value that
    shall not exceed the number of bits in an ordinary object of
    compatible type. If the value is zero, the declaration shall have
    no declarator.

to:

    The expression that specifies the width of a bit-field shall be
    an integral constant expression that has nonnegative value that
 |  shall not exceed the number of bits in an object of the type
 |  that would be specified if the colon and expression had been
 |  omitted. If the value is zero, the declaration shall have
    no declarator.

The current wording doesn't say *what* the type is compatible with.
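
As an illustration of the revised constraint, the sketch below measures
each width against the declared type of the member. The names (flags,
ready, full, count, widest_value) are invented for this example, and it
assumes that unsigned int has no padding bits, so that the number of
bits in the type is sizeof (unsigned int) * CHAR_BIT.

```c
#include <limits.h>

struct flags
{
    unsigned int ready : 1;
    /* Widest width the constraint allows: the number of bits in an
       unsigned int (taken here as sizeof * CHAR_BIT). */
    unsigned int full : sizeof (unsigned int) * CHAR_BIT;
    unsigned int : 0;           /* zero width, so no declarator */
    signed int count : 4;
    /* unsigned int bad : sizeof (unsigned int) * CHAR_BIT + 1;
       would violate the constraint */
};

/* Store the largest unsigned int in the full-width field and read it
   back. */
unsigned int widest_value (void)
{
    struct flags f = { 0 };

    f.full = UINT_MAX;
    return f.full;
}
```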


Item 4
[Was N739 item 12]
------------------
Subclause 6.5.2.2 allows an enumerated type (say /enum e/) to be
compatible with /long/ or even /unsigned long long/. On the other hand,
subclause 6.2.1.1 states that the type converts to /int/ or /unsigned
int/ as part of the integral promotions. This produces the apparent
contradiction that two compatible types promote differently !

There are two alternative approaches to solving this.

(A) Change subclause 6.5.2.2 paragraph 4 from:

    Each enumerated type shall be compatible with an integer type.
    The choice of type is implementation-defined, but shall be capable
    of representing the values of all the members of the enumeration.

to:

 |  Each enumerated type shall be compatible with one of the following
 |  types:
 |      signed char             unsigned char
 |      signed short            unsigned short
 |      signed int              unsigned int
    The choice of type is implementation-defined, but shall be capable
    of representing the values of all the members of the enumeration.

(B) Change subclause 6.2.1.1 paragraph 1 from:

    A /char/, a /short int/, or an /int/ bit-field, or their signed or
    unsigned versions, or an enumeration type, may be used in an
    expression wherever an /int/ or /unsigned int/ may be used. If an
    /int/ can represent all values of the original type, the value is
    converted to an /int/; otherwise, it is converted to an /unsigned
    int/. These are called the /integral promotions/.[37] All other
    arithmetic types are unchanged by the integral promotions.

to:

    A /char/, a /short int/, or an /int/ bit-field, or their signed or
 |  unsigned versions, may be used in an
    expression wherever an /int/ or /unsigned int/ may be used. If an
    /int/ can represent all values of the original type, the value is
    converted to an /int/; otherwise, it is converted to an /unsigned
 |  int/. These are called the /integral promotions/.[37]

 |  An enumeration type may be used in an expression wherever the type
 |  that it is compatible with may be used. The integral promotions
 |  cause the value to be converted in the same way as that compatible
 |  type would be.

    All other arithmetic types are unchanged by the integral promotions.

and in subclause 6.5.2.2, change the first sentence of paragraph 4 from:

    Each enumerated type shall be compatible with an integer type.

to:

 |  Each enumerated type shall be compatible with some signed or
 |  unsigned integral type.

[At present, enumerated types *are* integer types; the intent is to make
them clearly compatible with one of the 10 types named in 6.1.2.5.]


Item 5
[Was N720 DRs 072, 073 and 178]
-------------------------------
These DRs leave the issue of the "struct hack" totally confused.

One way out may be to explicitly bless the following:

    struct hack
    {
        /* other members */
        T last [];  /* Last member may be an array of indeterminate size */
    };

sizeof (struct hack) would equal offsetof (struct hack, last). The
notation is an explicit warning that last will be accessed as a VLA
within malloced memory. In any case, wording will be required.
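
A minimal sketch of how such a structure might be used, assuming a
compiler that accepts the "T last [];" notation suggested above. The
member count, the choice of int for T, and the function hack_alloc are
invented for this example. The sketch relies only on sizeof (struct
hack) covering the members before last.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct hack
{
    size_t count;   /* stands in for "other members" */
    int last [];    /* T is int in this sketch */
};

/* Allocate a struct hack with room for n elements of last, all zero.
   Returns a null pointer on overflow or allocation failure. */
struct hack *hack_alloc (size_t n)
{
    struct hack *p;
    size_t i;

    if (n > (SIZE_MAX - sizeof (struct hack)) / sizeof (int))
        return NULL;
    p = malloc (sizeof (struct hack) + n * sizeof (int));
    if (p == NULL)
        return NULL;
    p->count = n;
    for (i = 0; i < n; i++)
        p->last [i] = 0;
    return p;
}
```

The caller releases the whole object with a single call to free().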


Item 6
[Was N720 DR 142]
-----------------
The Technical Corrigendum given in the DR misses the point. The words
"unless explicitly stated otherwise" aren't needed, because they are
implicit in any reading of the Standard, but in any case they don't solve
the original problem.

What the DR asked about is using #undef with reserved identifiers,
something which is currently strictly conforming. The following change
(suggested in the DR) is necessary to allow an implementation to make use
of flag macros such as _INCLUDED_STDIO_H.

Append to 7.1.3:

    If the program removes (with #undef) any macro definition of an
    identifier in the first group listed above, the behaviour is
    undefined.


Item 7
[Was N735 item 2]
-----------------
There was a long discussion some time ago about the following code:

    printf ("%n foo %n", &i, &i);

and whether it is strictly conforming. I would suggest that we need the
following somewhere in 7.1 (either as a new 7.1.9, or added to 7.1.8 after
paragraph 2):

[1] Except where explicitly stated, there are no sequence points during
    the evaluation of a library function. Where a function's action is
    described in sequential terms, or one function is defined in terms
    of calls to another, this is for the purpose of describing the final
    effect, and does not require the events to actually occur in that
    order, or for an actual call to the other function to occur.

[2] Nevertheless, there is a sequence point immediately before the
    function is called (as specified by subclause 6.3.2.2), and
    immediately before it returns.

[3] Example

    The call:

        int i;
        (printf) ("%n %n", &i, &i)

    invokes undefined behaviour, because it assigns to i twice between
    the same pair of sequence points. Even though printf is defined in
    terms of calls to putc(), it is not required for such a call
    actually to occur, nor for there to be a sequence point before and
    after outputting the space.

There was discussion on this item at London, but no resolution.


Item 8
[Was N735 item 10]
------------------
Locales are currently treated as extremely opaque. It is not possible to
determine whether two locales are equivalent in a category. It is not
even sensible to compare locale strings for equality; the string
returned need not be the same as the string passed in, even if it was
also the string returned from a previous call. That is:

    char *loc;
    char copy_loc [LARGE_ENOUGH];

    loc = setlocale (LC_COLLATE, "C");
    if (strcmp (loc, "C") != 0)
        do_something ();                        // This can happen
    assert (strlen (loc) < LARGE_ENOUGH);
    strcpy (copy_loc, loc);
    loc = setlocale (LC_COLLATE, "C");
    if (strcmp (loc, copy_loc) != 0)
        do_something ();                        // This can happen

I realize that most systems store most locales in files, and therefore
comparing for functional equality is not as simple as it might seem.
However, I would recommend the following as a minimum:

(1) Add to 7.5.1.1 (setlocale()) paragraph 8:
    Furthermore, if this string value is passed to the setlocale
    function with the same category, the result shall be the same string
    value.

(2) Add either a function to compare two locale strings for functional
equivalence in a category, or a function to compare a locale string with
the current locale in a category. Functional equivalence is defined as:
    No behaviour defined in clause 7, other than the result of the
    setlocale function, changes as a result of changing the locale.
Note that "strictly conforming" is not a good term to use in any
comparison.


Item 9
[Was N735 item 11]
------------------
The localeconv() function discusses monetary and non-monetary
formatting, especially the former, but provides no easy way to implement
it. The natural place to do this is the printf() family of functions.
Therefore add to 7.13.6.1 (fprintf()):

  Flag , (comma):
    for d, i, o, u, x, X, f, F, e, E, g, G, a, and A conversions, the
    output shall be grouped in accordance with the /thousands_sep/ and
    /grouping/ fields of the locale. For other conversions, the
    behaviour is undefined.

  Format or flag $ (dollar):
    [It is unclear whether this is better as a flag or a format.]
    Generate a formatted monetary quantity. If it is a format, the
    argument is a double (or long double if L is included). The plus and
    space flags act as if the output already included a sign (even if it
    does not). The # flag specifies international formatting. The minus
    and zero flags can be used. If no precision is specified, the value
    of /frac_digits/ or /int_frac_digits/ from the current locale is
    used; if that is CHAR_MAX, the precision is unspecified. [If it is a
    flag, this would overrule the normal meaning of the precision.]
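
To show what a program has to do by hand at present, here is a sketch of
the grouping step alone. The function group_digits is invented for this
example and is not part of any library; it applies the rules already
given for the /grouping/ member (each element is a group size counted
from the right, CHAR_MAX ends grouping, and the last element repeats) to
a string of digits.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Copy digits to out, inserting sep between groups as described by
   grouping. Returns 0 on success, -1 if out is too small. */
int group_digits (const char *digits, const char *sep,
                  const char *grouping, char *out, size_t outsz)
{
    size_t i = strlen (digits);     /* digits not yet copied */
    size_t seplen = strlen (sep);
    size_t w = outsz;               /* write position, moving left */
    const char *g = grouping;
    int grouped = (*g != '\0' && *g != CHAR_MAX && seplen != 0);
    size_t left = grouped ? (size_t) (unsigned char) *g : 0;

    if (outsz == 0)
        return -1;
    out [--w] = '\0';
    while (i > 0) {
        if (grouped && left == 0) {
            /* Group complete: emit the separator, then move on to the
               next group size (the last one repeats). */
            if (w < seplen)
                return -1;
            w -= seplen;
            memcpy (out + w, sep, seplen);
            if (g [1] == CHAR_MAX)
                grouped = 0;        /* no further grouping */
            else if (g [1] != '\0')
                g++;
            left = (size_t) (unsigned char) *g;
        }
        if (w == 0)
            return -1;
        out [--w] = digits [--i];
        if (grouped)
            left--;
    }
    /* The text was built at the end of out; move it to the front. */
    memmove (out, out + w, outsz - w);
    return 0;
}
```

In use, sep and grouping would come from the /thousands_sep/ and
/grouping/ members of the structure returned by localeconv().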

Issues:

[comma]

Should there be a mechanism to allow the grouping to depend on the
format (e.g. decimal output grouped in threes, hex output grouped in
fours) ?

I am informed that there are circumstances where the /thousands_sep/
character is different for each grouping. For example, a notation
commonly used in Japan (particularly in newspapers) places characters
meaning "myriad", "hundred million", "billion" and so on between the
groups. This would require changing the separator to be a list of
strings, and providing a convention to indicate this (for example, using
CHAR_MAX as the first byte of the string).

[dollar]

I've used the normal rule that the specified precision overrides the
default. An alternative would be that the precision applies only if the
locale-specified value is CHAR_MAX. Which is preferable, or should there
be a way to choose ?

If $ is a flag and is used with %d, should it scale the value to the
appropriate number of fractional digits ? For example, "%$6.2d" might
indicate that the integer is to be printed in /ddd.dd/ form, with 12345
being printed as "123.45". Should %$d and %$i behave differently in this
case ? If $ is a format, should there be an equivalent for integral
types ?

Since this proposal was drafted, it has been pointed out to me that any use
of $ will conflict with the X/Open mechanisms, which use descriptors of the
form "%1$d", "%*2$3$d", and "%*6$.*5$4$d".


Item 10
[Was N735 item 14]
------------------
The Standard is somewhat unclear about the details of stdio buffering.
For example, considering output (the analogous situation arises with
input), a call to fputc() can have one of the following effects:
(1) the character is sent to the underlying system;
(2) the character is written to a buffer;
(3) the character is written to a buffer and then a number of characters
are sent to the underlying system from the buffer;
(4) a number of characters are sent to the underlying system from a
buffer, and then the character is written to the buffer.

In case (1), failure can be reported in a straightforward manner, and it
can be assumed that case (2) never fails. The question is: what will
happen if cases (3) or (4) have a failure during the output, but not
directly as a result of that character (that is, the error occurs
earlier on in the buffer) ?

The present wording of the Standard implies that an error in outputting
a character can only be reported on that call to fputc(), and not on any
subsequent call. This needs to be changed, or buffering becomes a
nonsense - the implementation would be required to *predict* whether a
write will succeed. A suitable location is 7.13.3, and the wording needs
to say something along the following lines:

    If output is buffered, then it may be transmitted to the host
    environment at any subsequent call to fputc(), and shall be
    transmitted no later than the next fflush() call or when the stream
    is closed. Thus a call to fputc() may fail and set the error
    indicator on the stream because of the earlier output. Similarly, if
    input is buffered, a call to fgetc() may cause the error indicator
    to be set even though the same call on an unbuffered stream would
    not (because the error is associated with a later character in the
    input).

    Even if the data is successfully transmitted to the host environment,
    it is possible for an error to occur within the latter. If this happens
    after the stream has been closed, it cannot be reported to the
    application; if it occurs earlier, it is implementation-defined when
    it is so reported.
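
Under wording along these lines, a careful caller cannot treat a
successful fputc() as proof that the character has been transmitted. The
following sketch (put_all is a name invented for this example) shows the
checking that follows: every call is tested, and the result is only
trusted after fflush().

```c
#include <stdio.h>

/* Write the string s to stream. Returns 0 on success, EOF on failure. */
int put_all (FILE *stream, const char *s)
{
    while (*s != '\0') {
        /* A failure here may be caused by earlier buffered output,
           not by this character. */
        if (fputc ((unsigned char) *s++, stream) == EOF)
            return EOF;
    }
    /* Buffered characters need not have been transmitted yet, so an
       error may only become visible at this point. */
    if (fflush (stream) == EOF)
        return EOF;
    return ferror (stream) ? EOF : 0;
}
```

Even a zero result says nothing about errors arising later within the
host environment, as the second paragraph of the wording notes.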

A secondary issue is: can the buffer be sent to the underlying system
other than within a call to fputc(); is asynchronous I/O permitted ? If
so, then:

    When a stream is buffered, characters may be transmitted to or from
    the host environment other than as part of a library function, and
    thus the error indicator for the stream may be set outside such a
    function (the indicator can only be cleared as part of a function
    that explicitly states it does so).


Item 11
[Was N735 item 15]
------------------
Is there a need to provide a way to make the three standard streams be
binary, in the same way that they can already be made wide ? Without it,
there's no strictly-conforming way to write "cat". Even with it there is
the trailing zero byte problem.


Item 12
[Was N735 item 16]
------------------
There is no way to determine whether two fpos_t values represent the
same position in a file. Therefore, it is not possible to do the
following:

    open a file
    read through it, looking for some mark
    note the position using fgetpos()
    rewind
    read through it again to the same position, using calls to fgetpos()
      to determine where you are, rather than recalculating it
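
At present the only portable course is to recalculate the position by
some other means. The sketch below keeps a character count alongside the
fpos_t value for that purpose; the functions find_mark and sum_to_count
are invented for this example, and the summing merely stands in for
whatever the second pass really does.

```c
#include <stdio.h>

/* Read stream until the character mark is seen. On success, store the
   position just after the mark in *pos and return the number of
   characters read so far; return -1 if the mark is not found or the
   position cannot be obtained. */
long find_mark (FILE *stream, int mark, fpos_t *pos)
{
    long count = 0;
    int c;

    while ((c = fgetc (stream)) != EOF) {
        count++;
        if (c == mark)
            return fgetpos (stream, pos) == 0 ? count : -1;
    }
    return -1;
}

/* Second pass: rewind and process characters up to the noted place.
   Two fpos_t values cannot be compared, so the separately kept count
   is what identifies the place. */
long sum_to_count (FILE *stream, long count)
{
    long sum = 0;
    int c;

    rewind (stream);
    while (count-- > 0 && (c = fgetc (stream)) != EOF)
        sum += c;
    return sum;
}
```

The stored fpos_t is still usable with fsetpos(); what is missing is any
way to ask whether the stream has reached it.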

I suggest the following function be added to subclause 7.13.10:

    struct fcmppos fcmppos (fpos_t *a, fpos_t *b, FILE *stream);

    Compares two fpos_t values that refer to the given stream; if either
    argument is a null pointer, the result of a call to fgetpos() on the
    stream is used instead. The resulting structure contains at least
    the following fields:

    int before;   // Less than, equal to, or greater than zero according
                  // to whether /a/ is before, at the same location as,
                  // or after /b/ in the file.
    int mbstate;  // Zero if the two positions have the same multibyte
                  // parsing status.

    If the stream has been written to at any point before the later of
    the two positions, the behaviour is undefined.


Item 13
[Was N735 item 19]
------------------
The specification of the comparison functions for bsearch() and qsort()
(7.14.5.1 and 7.14.5.2) is insufficient to safely code them. In
particular, it does not address the following issues.
(1) Are the pointers to objects within the base array (or the key
object), or can they be to copies ?
(2) Can the comparison alter the values of the pointed-to objects ?
(3) If so, does the alteration persist ?
(4) What are the requirements on the consistency of the comparison
results ?

I propose that comparisons are not allowed to alter the values, and
therefore that the implementation can pass pointers to copies of the
objects. [This, of course, invalidates an item in one of my articles in
CUJ :-]

Therefore add the following immediately after the heading of 7.14.5
(there is currently no text between that and the heading of 7.14.5.1).

[1] These utilities make use of a comparison function. This shall
    behave in the following way.

[2] The implementation shall ensure that the second argument (when
    called from /bsearch/), or both arguments (when called from
    /qsort/), shall be pointers to an element of the array, or to a
    copy of such an element. The first argument when called from
    /bsearch/ shall equal /key/. The function shall make its comparison
    based on the pointed-to objects, and not the specific addresses
    passed to it.

[3] The comparison function shall not alter the contents of the array.
    The implementation may reorder elements of the array between calls
    to the comparison function, but shall not alter the contents of any
    individual element.

[4] When the same object (consisting of /size/ bytes, irrespective of
    its current position in the array) is passed more than once to the
    comparison function, the results shall be consistent with one
    another. That is, for /qsort/ they shall define a total ordering on
    the array, and for /bsearch/ the same object shall always compare
    the same way with the key.

[5] A sequence point occurs immediately before and immediately after
    each call to the comparison function, and also between any call to
    the comparison function and any movement of the objects passed as
    arguments to that call.
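
As an illustration, a comparison function written to these rules might
look like the sketch below. The type struct record and the function
compare_records are invented for this example. The function reads only
the pointed-to objects, never modifies them, and gives the same answer
wherever the objects (or copies of them) happen to be stored.

```c
#include <stdlib.h>

struct record
{
    int key;
    const char *name;
};

/* Order records by key. The result depends only on the values of the
   two objects, not on their addresses, and neither is altered. */
int compare_records (const void *a, const void *b)
{
    const struct record *ra = a;
    const struct record *rb = b;

    /* Yields -1, 0 or 1 without the overflow risk of subtraction. */
    return (ra->key > rb->key) - (ra->key < rb->key);
}
```

The same function serves for both qsort() and bsearch(), since for the
latter the key is simply another struct record.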

If it is felt desirable that the pointers *shall* always point into the
array, then replace paragraph [2] above by:

[2] The implementation shall ensure that the second argument (when
    called from /bsearch/), or both arguments (when called from
    /qsort/), shall be pointers to elements of the array [*]. The
    first argument when called from /bsearch/ shall equal /key/.

    [*] That is, if the value passed is /p/, then the following
    expressions are always non-zero:
        ((char *) p - (char *) base) % size == 0
        (char *) p >= (char *) base
        (char *) p < (char *) base + nmemb * size
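
The three expressions can be gathered into one check, sketched below
under the name points_to_element (invented for this example). The range
tests are made first so that the subtraction is only evaluated for a
pointer already known to lie within the array. Like the expressions
themselves, it is only meaningful when /p/ points into, or one past the
end of, the array at /base/.

```c
#include <stddef.h>

/* Non-zero if p is the address of one of the nmemb elements, each of
   size bytes, of the array starting at base. */
int points_to_element (const void *p, const void *base,
                       size_t nmemb, size_t size)
{
    const char *cp = p;
    const char *cbase = base;

    return cp >= cbase
        && cp < cbase + nmemb * size
        && (size_t) (cp - cbase) % size == 0;
}
```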


Item 14
[Was N720 DR 063]
-----------------
What is the required precision of floating point calculations ?


Item 15
[Was N720 DR 087]
-----------------
The issue of sequence points, parallel evaluation, and so on still needs
to be faced squarely. It isn't easy [example: x = f (x++)].


Item 16
[Was N739 item 1]
-----------------
The term "access" is not well defined. From context, it sometimes
appears to mean "read the value", and sometimes "read or write the
value". This ambiguity sometimes makes it hard to understand what is
actually meant.

There needs to be a definition in clause 3, and all uses of the term
need to be checked for the read-only / read-write problem. Probably the
best approach is to define it as "read or write", and to find and fix
the places where "read" is meant.

An example of the "read" usage is 6.3.2.3 paragraph 5:

    With one exception, if a member of a union object is accessed after
    a value has been stored in a different member of the object, the
    behaviour is implementation-defined.

where writing is clearly meant to be excluded.

An example of the "read or write" usage is 6.3 paragraph 6:

    ... If a value is stored into an object ... the type of the lvalue
    becomes the effective type of the object for that access and for
    subsequent accesses ...

where writing is clearly meant to be included.

An example where this causes problems with interpreting the Standard is
6.5.3. Paragraph 11 reads:

    A reference to a value means either an access to or a modification
    of the value.

So "access" presumably means read, but not write. But then paragraph 6
reads:

    What constitutes an access to an object that has volatile-qualified
    type is implementation-defined.

So what constitutes a write to a volatile object is *not* implementation-
defined ?

There are other instances; this is the first one that comes to mind.