ISO/IEC JTC1/SC22/WG14 N823

   WG14/N823            C9X Public Comment               WG14/N823
                        ==================


Sponsoring National Body: J11                 Date: 98/05/15
Author: Tom MacDonald (with help from Hugh Redelmeier)
Author Affiliation: Silicon Graphics Inc.
Postal Address: 655F Lone Oak Drive, Eagan, MN 55409 USA
E-mail Address: tam@cray.com
Telephone Number: +1 612 6835818
Fax Number: +1 612 6835307
Number of individual comments: 2



Below is a copy of a message Hugh Redelmeier sent to the committee
over a year ago.  I don't think WG14 ever adequately addressed the
issue, so I am re-submitting the paper for the June 1998 meeting.
I have made a few changes and have tried to mark each one clearly.

Tom MacDonald
tam@cray.com

================================================================

From: hugh@mimosa.com ("D. Hugh Redelmeier")
Date: 	Sat, 1 Feb 1997 04:45:42 -0500
To: sc22wg14@dkuug.dk
Subject: (SC22WG14.3377) DR166 -- lvalue constraints

I promised to write a paper on DR166; I apologize for the delay.  I
have shown an earlier version to larry.jones@sdrc.com,
seebs@solon.com and gwyn@arl.mil, and I have made some changes to
address their comments.  I wish to thank them for their help.  That
does not mean that they would approve of what I say here.

As I see it, the problem is with the wording of 6.2.2.1, in
particular, the first sentence [from c9x-std.txt on the ftp site]:

       [#1] An lvalue is an expression (with an object type  or  an
       incomplete   type   other  than  void)  that  designates  an
       object.#38

This wording makes it look as if whether an expression is an lvalue
(a syntactic property) depends on whether it actually designates an
object when it is evaluated.  In particular, the DR suggests that
this lets the run-time behavior of the lvalue expression affect a
constraint (a compile-time notion).
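A minimal sketch of the situation, assuming a hypothetical helper store_if_valid that is not taken from the DR.  The assignment constraint requires the left operand *p to be a modifiable lvalue, and the translator checks that once, when the function is compiled.  That check cannot depend on whether p designates an object in a particular call; only evaluating *p when p is null would be a problem, and this code never does so.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only (not from DR166).
 * The assignment *p = value requires *p to be a modifiable lvalue.
 * The translator checks that once, when this function is compiled,
 * so the check cannot depend on whether p designates an object in
 * any particular call. */
int store_if_valid(int *p, int value)
{
    if (p == NULL)
        return 0;      /* *p is never evaluated when p is null */
    *p = value;        /* evaluated only when p points to an int */
    return 1;
}
```

Calling store_if_valid(NULL, 7) is harmless: the same lvalue expression *p exists in the code, but it is not evaluated.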

There is a classic bug in English: the substitution of "that" for
"which" and vice versa.  From Fowler's Modern English Usage (alas, not
the brand new edition):

	Which, that, who:
		... (A) of "which" and "that", "which" is appropriate to
		non-defining and "that" to defining clauses. ...

		...(A) "The river, which here is tidal, is dangerous", but
		"The river that flows through London is the Thames."

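The clause "that designates an object" in 6.2.2.1 reads as a defining
clause, so it makes designating an object part of what an lvalue is,
rather than describing what lvalues are for.  I think that the simple
fix is to change the first sentence of 6.2.2.1 to: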

	An _lvalue_ is the form of expression used to designate an
	object.#38 It shall have an object type or an incomplete type
	other than void.

I think that this clearly shows the purpose of an lvalue, without
making the syntactic property depend on the runtime validity.

I have moved the parenthetical remark to its own sentence to simplify
and clarify the prose.  I wonder whether that type requirement belongs
in a Constraints section.

Doug Gwyn suggested that expressing the intent is wimpy:
	"There is no force in the 'intent' that it be used to designate an
	object, except when it doesn't quite, so why bother to mention it?"
He suggests:
	An _lvalue_ is an expression; it shall have an object type
	or an incomplete type other than void.
I see his point, but I think that describing the purpose is useful.
I agree that the wording could be better.

It is important that any runtime restrictions be explicitly stated
somewhere.  I don't think this change shifts that burden: if such
restrictions are missing now, they were already missing (unless the
phrase "that designates an object" was doing the job).

To express the runtime restrictions, we should add something like:

	When an lvalue expression is evaluated, the behavior is undefined
	if the expression does not designate an object.
or
	When an lvalue expression is evaluated, it shall designate an
	object.

It would probably be useful to add a footnote to this effect:

	[Footnote: note that the operand of a sizeof expression is not
	evaluated -- 6.3.3.4]
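A short sketch of lvalues that appear in a program but are never evaluated, which is what the proposed footnote is about.  The function names pointee_size and alloc_doubles are hypothetical.  In both functions *p is the operand of sizeof, so only its type is used: the first works even for a null p, and the second is the familiar allocation idiom in which p is still indeterminate inside its own initializer.

```c
#include <stddef.h>
#include <stdlib.h>

/* The operand of sizeof is not evaluated, so *p below is never
 * evaluated; only its type (double) is used. p may even be null. */
size_t pointee_size(const double *p)
{
    return sizeof *p;
}

/* Common allocation idiom: in its own initializer p is still
 * indeterminate, but sizeof *p does not read it. */
double *alloc_doubles(size_t n)
{
    double *p = malloc(n * sizeof *p);
    return p;
}
```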

Larry asked:
	Can anyone think of a case where we need to require an
	lvalue to designate an object even though it isn't evaluated?
I think not, but the committee should consider this.

================================================================

Note: the following is a separable issue.  I have not prepared
suggested wording changes, so this cannot be considered a proposal.
I am including it in case the committee is interested.

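Many people have been surprised that the behavior of &a[upper_bound],
where upper_bound is the number of elements in the array a (so the
result points one past its last element), is undefined in C89.  It was
and is a common idiom.  I still use it in my code and have never used
an implementation on which it did something unexpected.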

Several comments expressed ambivalence about this.  I think that their
authors would like to support &a[upper_bound] but don't really like
*(a + upper_bound), and the two are pretty hard to separate, since
a[upper_bound] is by definition *(a + upper_bound).

[[...TMacD... I suspect the `*' is a typo - should be just (a + upper_bound)
              or  &(*(a + upper_bound))   ...]]
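A sketch of the idiom in use, with a hypothetical function sum_ints.  It forms the end pointer as a + n, which C89 already permits (a pointer may point one past the last element as long as it is not dereferenced).  Writing the same pointer as &a[n] is the spelling whose status is in question, because a[n] means *(a + n).

```c
#include <stddef.h>

/* Sum the n elements of a, walking up to a one-past-the-end pointer.
 * end is written as a + n, which is well defined in C89. The idiom
 * discussed above writes the same pointer as &a[n]; because a[n]
 * means *(a + n), that spelling formally applies unary * to a
 * pointer that does not point to an element. end is compared
 * against but never dereferenced. */
long sum_ints(const int *a, size_t n)
{
    const int *end = a + n;
    const int *p;
    long total = 0;

    for (p = a; p != end; ++p)
        total += *p;
    return total;
}
```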

If we wish to make this form well-defined in C9x, I think we could do
so here (6.2.2.1), in the description of unary * (6.3.3.2), and in the
description of pointer addition (6.3.6).

We would need to refine the runtime restrictions I suggested adding to
6.2.2.1 above, replacing them with:

	When an lvalue expression that is not the operand of a unary & is
	evaluated, it shall designate an object.

	When an lvalue expression that is the operand of a unary & is
	evaluated, it shall designate an object or one past the last
                                                      ^
                                                      element  [[...TMacD...]]
	element of an array object.

[Perhaps this should be reworded without "shall"; the flavor should
be clear.]

We need to make some changes in 6.3.3.2 (Address and indirection operators).

Here is one paragraph from the current 6.3.3.2 that would need changing:

       [#4] The unary  *  operator  denotes  indirection.   If  the
       operand  points  to  a  function,  the  result is a function
       designator; if it points to an  object,  the  result  is  an
       lvalue  designating  the  object.   If  the operand has type
       ``pointer to type,'' the result has type  ``type.''   If  an
       invalid value has been assigned to the pointer, the behavior
       of the unary * operator is undefined.#49
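A small sketch of the two cases in the paragraph above, using hypothetical functions twice, call_three_ways, and set_via_pointer.  Applied to a pointer to a function, unary * yields a function designator, which is converted straight back to a pointer, so (*fp)(x), (**fp)(x), and fp(x) all make the same call.  Applied to a pointer to an object, it yields an lvalue of the pointed-to type.

```c
/* Hypothetical example function. */
static int twice(int x) { return 2 * x; }

/* If the operand of unary * points to a function, the result is a
 * function designator, which is converted back to a pointer to the
 * function, so all three calls below call twice. */
int call_three_ways(int x)
{
    int (*fp)(int) = twice;
    return (*fp)(x) + (**fp)(x) + fp(x);
}

/* If the operand points to an object, the result is an lvalue of
 * the pointed-to type (here int), so it can be assigned through. */
void set_via_pointer(int *ip, int v)
{
    *ip = v;
}
```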

Here is a paragraph from the current 6.3.6 (Additive operators) that
would need to be adjusted; the relevant text is near the end.

       [#8] When an expression that has integral type is  added  to
       or subtracted from a pointer, the result has the type of the
       pointer operand.   If  the  pointer  operand  points  to  an
       element  of  an array object, and the array is large enough,
       the result points to an element  offset  from  the  original
       element  such  that  the difference of the subscripts of the
       resulting and original array elements  equals  the  integral
       expression.   In  other words, if the expression P points to
       the i-th element of an array object, the  expressions  (P)+N
       (equivalently,  N+(P))  and  (P)-N (where N has the value n)
       point to, respectively, the i+n-th and i-n-th  elements  of
       the  array  object,  provided  they exist.  Moreover, if the
       expression P points to the last element of an array  object,
       the expression (P)+1 points one past the last element of the
       array object, and if the expression Q points  one  past  the
       last element of an array object, the expression (Q)-1 points
       to the last element  of  the  array  object.   If  both  the
       pointer operand and the result point to elements of the same
       array object, or one past the  last  element  of  the  array
       object,  the  evaluation  shall  not  produce  an  overflow;
       otherwise, the  behavior  is  undefined.   Unless  both  the
       pointer operand and the result point to elements of the same
       array object, or the pointer operand  points  one  past  the
       last  element of an array object and the result points to an
       element of the same array object, the behavior is  undefined
       if the result is used as an operand of the unary * operator.

This paragraph seems very fragile.  In fact, I'm not sure that it
works.  For our purpose, I think that the only change would be
to delete the last sentence.  Its function should be achieved by
appropriate words in 6.3.3.2.
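A sketch that checks the rules of the 6.3.6 paragraph quoted above on a five-element array, using a hypothetical function pointer_arithmetic_holds.  It exercises (P)+N, N+(P), (P)-N, a pointer one past the last element, and (Q)-1; the one-past pointer is only compared and subtracted, never dereferenced, so everything here stays within the defined part of that paragraph.

```c
enum { LEN = 5 };

/* Check the additive-operator rules on a 5-element array:
 * P+N and P-N move by N elements, the last element plus 1 points
 * one past the end, and subtracting 1 from that comes back to the
 * last element. The one-past pointer q is never dereferenced. */
int pointer_arithmetic_holds(void)
{
    int a[LEN] = {10, 20, 30, 40, 50};
    int *p = &a[1];              /* P points to element i = 1 */
    int *last = &a[LEN - 1];
    int *q = last + 1;           /* Q points one past the last element */

    return *(p + 2) == 40        /* (P)+N points to element i+n = 3 */
        && *(2 + p) == 40        /* N+(P) is equivalent */
        && *(p - 1) == 10        /* (P)-N points to element i-n = 0 */
        && q - a == LEN          /* q is LEN elements past a[0] */
        && q - 1 == last;        /* (Q)-1 points to the last element */
}
```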

Hugh Redelmeier
hugh@mimosa.com  voice: +1 416 482-8253

=================== TMacD's proposed rewrite of 6.3.3.2 ====================

   6.3.3.2  Address and indirection operators

       Constraints

       [#1] The operand of the unary & operator shall be  either  a
       function designator, the result of a [] or unary * operator,
       or an lvalue that designates an object that is  not  a  bit-
       field  and  is  not declared with the register storage-class
            ^
            , or one element past the last element of an array,

       specifier.

       [#2] The operand of the unary * operator shall have  pointer
       type.

       Semantics

       [#3] The result of the unary & (address-of)  operator  is  a
       pointer to the object or function designated by its operand.
                  ^^^^^^^^^^
                  an object, or one element past the last element of an array,

       If the operand  has  type  ``type'',  the  result  has  type
       ``pointer  to  type''.   If  the  operand is the result of a
       unary * operator, neither that operator nor the  &  operator
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                         Neither operator


       are  evaluated,  and  the  result  shall  be as if both were
       ^^^
       is


       omitted, even if the intermediate  object  does  not  exist,
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                            resulting pointer does not point to
                            an object with an effective type
                            (described in 6.3) that can be accessed
                            through this pointer.

       except that the constraints on the operators still apply and
       ^^^^^^^^^^^
       However,

       the result is not an lvalue.  Similarly, if the  operand  is
       the  result of a [] operator, neither the & operator nor the
       unary * that is implied by the []  are  evaluated,  and  the
       result  shall be as if the & operator was removed and the []
       operator was changed to a + operator.

       [#4] The unary  *  operator  denotes  indirection.   If  the
       operand  points  to  a  function,  the  result is a function
       designator; if it points to an  object,  the  result  is  an
       lvalue  designating  the  object.   If  the operand has type
       ``pointer to type'', the result has type  ``type''.   If  an
       invalid value has been assigned to the pointer, the behavior
       of the unary * operator is undefined.#71
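A sketch of the equivalences described in paragraph [#3] of the rewrite, using a hypothetical function address_rules_hold.  With valid pointers, &*p yields p, and &a[i] yields a + i (the [] becomes a +).  The sketch deliberately uses only indices that designate real elements; the rewrite itself is about the cases where no object is designated.

```c
/* Unary & applied to the result of unary * or [] gives the same
 * pointer as the expression with both operators removed (for []
 * the operator becomes pointer addition). Only valid pointers are
 * used here. */
int address_rules_hold(void)
{
    int a[4] = {0, 1, 2, 3};
    int *p = a;

    return &*p == p              /* &*E behaves as E */
        && &a[2] == a + 2        /* &E1[E2] behaves as E1 + E2 */
        && &p[3] == p + 3;
}
```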


[[... TMacD ...]] Although Hugh suggests a rewrite of para 4 above,
                  I think the current wording works.  The last sentence
                  could be rewritten as:

                     If the pointer does not point to an object, the
                     behavior is undefined.


                  I also don't think these words handle the following cases:

                            &(*p).a    &p->a

                  assuming "a" is a member of a union and "p" points
                  one element past the end of an array.  I am not sure
                  whether this is the intent.
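A short sketch, assuming a hypothetical union num, of what makes the &p->a case delicate: a pointer to a union, suitably converted, points to each of its members, so computing &p->a adds no offset to p at all.  The sketch uses only a pointer to a real union object; whether the same expressions are defined when p points one past the end of an array is the open question above, and the code does not attempt it.

```c
/* Hypothetical union type, for illustration. */
union num {
    int a;
    double d;
};

/* Every member of a union begins at the union's own address, so
 * &p->a, &(*p).a, and &p->d all equal p after conversion to void *.
 * p must point to an actual union object here. */
int member_at_union_address(union num *p)
{
    return (void *)&p->a == (void *)p
        && (void *)&(*p).a == (void *)p
        && (void *)&p->d == (void *)p;
}
```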