N2229: Equality With true

Document Number: N2229
Submitter: Martin Sebor
Submission Date: March 26, 2018
Subject: Equality With true

Summary

Pop quiz question: What is the output of the program below?

	#include <ctype.h>
	#include <stdbool.h>
	#include <stdio.h>

	int main (void)
	{
	    const char *s = "123";
	    int n = 0;
	    while (isdigit (*s) == true)
	    {
	        n = n * 10 + *s - '0';
	        ++s;
	    }
	    printf ("%i\n", n);
	}

As it turns out, the answer is that it depends.

On the system we tested (Fedora Linux 25 with GNU libc 2.24), the output of the program when compiled as C++ is 123.

In other programming languages that provide a function like isdigit (e.g., C#, D, Java, or Python) the output of the equivalent program is also 123.

However, on the same system, the output of the identical C program is 0. Why is that? Does C not consider the characters in "123" to be digits?

The specification of isdigit in §7.4.1, Character classification functions, copied below, makes it sound as though the C function should return true for decimal digits, just as in the other languages:

7.4.1 Character classification functions

-1- The functions in this subclause return nonzero (true) if and only if the value of the argument c conforms to that in the description of the function.

…

7.4.1.5 The isdigit function

-2- The isdigit function tests for any decimal-digit character (as defined in 5.2.1).

What gives? The problem is twofold. First, unlike in the other languages, where isdigit returns a value of a Boolean type (i.e., literally true or false), in C (and in C++) isdigit returns an int. Second, a true result only means that the returned value is nonzero; it doesn't imply that the value equals that of the true macro defined in <stdbool.h> (i.e., 1). This is because C implementations commonly define isdigit and the other character classification functions as macros that expand to a bitwise expression such as the following (taken from the GNU C library):

	#define isdigit(c) \
	  ((*__ctype_b_loc ())[(int) (c)] & (unsigned short) _ISdigit)

and whether the result of the bitwise AND equals 1 depends on the value of the tested bit. In glibc, _ISdigit is an enumerator defined like so:

	_ISdigit = ((3) < 8 ? ((1 << (3)) << 8) : ((1 << (3)) >> 8))

The value of the enumerator (for those not accustomed to shifting bits in their heads while reading) is (1 << 3) << 8, that is, 8 * 256 = 2048, and so the macro evaluates to either 0 or 2048, but never to 1.

C++ doesn't allow library functions to be implemented as macros, so in C++ a call to isdigit invokes a function that tends to (but isn't required to) return 0 or 1 rather than the result of a bitwise expression, and the difference arises less often.

With the mystery solved and the bug fixed by changing the controlling expression of the while loop to drop the redundant comparison with true

	while (isdigit (*s))

the program works as expected in C as well.

But a bigger question remains: is there something C could do to help programmers avoid this mistake, not just for isdigit and the other character classification functions, but in general?

We believe there is. The general problem isn't specific to C; it affects all languages whose Boolean type is convertible to other integer types. From the set above, that's C and C++. (It doesn't affect languages like C# or Java, where such conversions are not permitted.) And as it happens, at least one C++ implementation, Microsoft Visual C++, does detect such mixed-type equality expressions and issues a helpful warning:

	warning C4805: '==': unsafe mix of type 'int' and type 'bool' in operation

Could C compilers use the same approach? Unfortunately, because C requires the true macro to expand to the constant 1, there is no easy way for a C compiler to distinguish its use in this context from an equality comparison with the plain integer constant 1. The Proposed Resolution below suggests a small change that makes diagnosing this construct possible in C as well.

Proposed Resolution

To enable C implementations to easily detect coding mistakes like the one discussed above, we propose the following changes. With true (and, less crucially, false) having their own type that is distinct from other scalar types, C compilers will also be able to detect expressions that involve operands of mixed types and help users prevent the bug above by issuing diagnostics.

7.18 Boolean type and values <stdbool.h>

-1- The header <stdbool.h> defines four macros.
…

-3- The remaining three macros are suitable for use in #if preprocessing directives. They are

	  true

which expands to the integer constant 1 with type _Bool^{new-footnote)},

	  false

which expands to the integer constant 0 with type _Bool, and
…

new-footnote) Definitions that meet this requirement are

	#define true ((_Bool)+1)
	#define false ((_Bool)+0)

The definitions in the footnote are suitable for use in #if directives because there the preprocessor replaces the _Bool token, like any other identifier remaining after macro expansion, with 0, so the true and false expressions evaluate as ((0)+1) and ((0)+0), respectively.

The proposal was prompted by GCC enhancement request 82272. As mentioned in comment 4 on the request, there may be other ways to achieve this effect, but, as the absence of C implementations that detect this problem suggests, they are difficult to implement, or the problem is sufficiently obscure, or both. By adopting the suggested change, the C standard would bring the potential for these bugs to the fore and make them easier for implementations to detect than they are today.