P3374R0
Adding formatter for fpos<mbstate_t>

Published Proposal,

This version:
https://extra-creativity.github.io/public/cpp/proposals/add%20formatter%20for%20fpos.html
Author:
(Peking University)
Audience:
LEWG
Project:
ISO/IEC 14882 Programming Languages — C++, ISO/IEC JTC1/SC22/WG21

Abstract

This paper proposes adding a formatter for std::fpos<std::mbstate_t>, the type widely used by the position indicator APIs of streams, so that positions can be output conveniently and robustly.

1. Introduction

Stream-based I/O has been part of C++ since C++98 and provides a thorough abstraction of I/O. An important part of that abstraction is the position indicator, which is used to tell and seek the current position. The position is most commonly conveyed by char_traits<T>::pos_type, which is fpos<typename char_traits<T>::state_type> by default; the only instantiation used in the standard library is fpos<mbstate_t>. Implementations widely allow it to be output through the stream operator<<; surprisingly, however, it cannot be output directly by print in C++23 because no formatter exists for it. Moreover, the state of the position is usually neglected, so output through operator<< is not robust in some cases. This paper proposes a formatter specialization for fpos<mbstate_t> to solve these problems.

2. Motivation

The following Tony table illustrates the problem directly:

Before
std::ofstream s{"some_file"};
// Do some output...
std::cout << s.tellp();        // ❓Yes on almost all implementations, but not robust
std::println("{}", s.tellp()); // ❌Compile error...
// To make it work the same as operator<<, we have to write...
std::println("{}", (std::streamoff)s.tellp());    // 😢What?
std::println("{}", s.tellp() - std::streampos{}); // 😢No way...
After
std::ofstream s{"some_file"};
// Do some output...
std::cout << s.tellp();           // ❓Yes on almost all implementations, but not robust
std::println("{}", s.tellp());    // ✅Yes and robust!
std::println("{:d}", s.tellp());  // ✅Non-robust way can be controlled by users explicitly.
std::stringstream s2{"ABC"};
std::println("{:d}", s2.tellg()); // ✅Especially for streams that don’t need codecvt.

3. Design Decision

3.1. Core Problem

3.1.1. Not an integer, only convertible

C programmers, and those who have not looked closely at the design of streams, are likely to regard the position as a plain integer, since ftell returns one. In C++, however, it is designed as follows:

template<typename CharT, typename CharTraits = char_traits<CharT>>
class basic_iostream
{
public:
    using pos_type = typename CharTraits::pos_type;
    pos_type tellg();
    pos_type tellp();
};

And the most commonly used types are aliases like:

template<typename CharT, typename CharTraits = char_traits<CharT>>
class basic_fstream : public basic_iostream<CharT, CharTraits> { ... };

using fstream = basic_fstream<char>;
using wfstream = basic_fstream<wchar_t>;

So the return type of tellg/tellp is usually determined by char_traits<CharT>::pos_type, where CharT is char or wchar_t. Both are defined as fpos<typename char_traits<CharT>::state_type>, and the only instantiation used in the standard library is fpos<mbstate_t>. fpos supports only a limited set of integer-like operations, such as subtracting another fpos to get a streamoff, or adding a streamoff offset to get a new fpos. In particular, streamoff is specified to be an alias of a signed integer type, and fpos must be convertible to streamoff so that the expression streamoff(pos) compiles (see [stream.types] and [fpos.operations] in the standard).
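The limited operation set described above can be sketched as follows. This is only an illustration: the helper names distance_between, advanced, to_offset and fpos_arithmetic_demo are hypothetical and appear in neither the standard nor this proposal.

```cpp
#include <cassert>
#include <ios>
#include <type_traits>

// Subtracting two positions yields an offset, not another position.
static_assert(std::is_same<decltype(std::streampos{} - std::streampos{}),
                           std::streamoff>::value,
              "pos - pos has type streamoff");

// Adding an offset to a position yields a new position.
static_assert(std::is_same<decltype(std::streampos{} + std::streamoff{}),
                           std::streampos>::value,
              "pos + off has type fpos");

// Hypothetical helper: distance from `earlier` to `later` as a signed offset.
inline std::streamoff distance_between(std::streampos earlier,
                                       std::streampos later) {
    return later - earlier;
}

// Hypothetical helper: the position `off` characters after `pos`.
inline std::streampos advanced(std::streampos pos, std::streamoff off) {
    return pos + off;
}

// Hypothetical helper: the explicit conversion streamoff(pos), which is the
// only conversion syntax the standard requires to compile.
inline std::streamoff to_offset(std::streampos pos) {
    return std::streamoff(pos);
}

// Small self-check tying the three operations together.
inline void fpos_arithmetic_demo() {
    const std::streampos start(10);
    const std::streampos later = advanced(start, std::streamoff(5));
    assert(to_offset(later) == 15);
    assert(distance_between(start, later) == 5);
}
```

Calling fpos_arithmetic_demo() from main exercises all three operations. Note that none of them touches the state stored in the fpos.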

Such a requirement can be implemented with an explicit conversion operator:

template<typename StateT>
class fpos
{
public:
    explicit operator streamoff() const { ... }
};

In practice, the existing mainstream implementations, such as MS-STL, libstdc++ and libc++, as well as many other implementations that support C++98, such as Apache stdcxx and STLport (stlport/stl/char_traits#92), all choose to make the conversion non-explicit. This tacit convention gives C++ programmers the illusion that "it’s just an integer". Specifically, for output, operator<< has overloads like:

basic_ostream& operator<<(int);
basic_ostream& operator<<(long);
basic_ostream& operator<<(long long);

This allows an implicit conversion from fpos to streamoff to match one of the overloads, so output succeeds. However, this does not work for std::format, since template argument deduction does not consider implicit conversions. The formatter specializations for int, long and long long cannot be used by fpos without an explicit conversion.
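The difference between the two lookup mechanisms can be reproduced without <format>. In the sketch below, print_like_ostream is a hypothetical stand-in for a non-template operator<< overload and deduced_as_integer is a hypothetical stand-in for the template-based formatter lookup. The first assertion assumes the implicit fpos-to-streamoff conversion that mainstream implementations provide but the standard does not require.

```cpp
#include <cassert>
#include <ios>
#include <string>
#include <type_traits>

// Hypothetical stand-in for basic_ostream::operator<<(long long):
// a non-template function, so implicit conversions of the argument apply.
inline std::string print_like_ostream(long long value) {
    return std::to_string(value);
}

// Hypothetical stand-in for the formatter lookup: T is deduced exactly,
// so an fpos argument stays an fpos and is never converted to an integer.
template <class T>
bool deduced_as_integer(const T&) {
    return std::is_integral<T>::value;
}

inline void conversion_demo() {
    const std::streampos pos(3);

    // Assumption: fpos converts to streamoff implicitly, as it does on
    // MS-STL, libstdc++ and libc++.
    assert(print_like_ostream(pos) == "3");

    // Deduction sees fpos<mbstate_t>, which is not an integer type.
    assert(!deduced_as_integer(pos));

    // Only an explicit conversion makes the template see an integer.
    assert(deduced_as_integer(static_cast<std::streamoff>(pos)));
}
```

This mirrors why std::cout << pos compiles on those implementations while std::println("{}", pos) does not.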

3.1.2. Not only the integer, though convertible

Another problem that is usually neglected is that fpos does not merely contain the position integer; it also carries a state whose type is given by the template parameter, most typically mbstate_t. The state records the current state of character conversion, for example for codecvt in locale. An fpos constructed from a streamoff has a value-initialized state, which represents the initial conversion state. However, the actual state is sometimes not initial, which makes restoring a position from the integer alone unsafe and incorrect.

For instance, although this rarely happens, if a class derived from basic_streambuf allows partial conversion when overriding overflow (which is only required to make space for at least one CharT), tellp of its stream may report an fpos with a partial conversion state. In such cases, reporting only the position integer is incomplete.
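A minimal sketch of the state component, assuming only the standard state() accessor and mbsinit; the helper names has_initial_state and state_demo are hypothetical. It shows that a position rebuilt from its offset always carries the initial state, whatever state the original position had.

```cpp
#include <cassert>
#include <cwchar>
#include <ios>

// Hypothetical helper: true if the conversion state stored in `pos`
// is the initial state, as reported by mbsinit.
inline bool has_initial_state(const std::streampos& pos) {
    const std::mbstate_t state = pos.state();
    return std::mbsinit(&state) != 0;
}

inline void state_demo() {
    // An fpos built from an offset has a value-initialized state.
    const std::streampos from_offset(42);
    assert(has_initial_state(from_offset));

    // Round-tripping through streamoff keeps the offset but can only
    // produce the initial state; any partial conversion state is lost.
    const std::streampos restored(static_cast<std::streamoff>(from_offset));
    assert(static_cast<std::streamoff>(restored) == 42);
    assert(has_initial_state(restored));
}
```

Producing a position whose state is not initial requires a stream buffer that performs partial conversions, which is locale- and implementation-dependent, so the sketch does not attempt it.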

3.2. Proposed Solution

To make it both safe and convenient, we propose adding a formatter specialization for fpos<mbstate_t>. Since almost all behaviors of mbstate_t are implementation-defined, it seems meaningless to specify format specifications for it. However, the state should be output in a way that fully conveys its information, so that if [P1729] is accepted, a corresponding scanner can be defined to perform a round trip.

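Specifically, the formatter specialization of fpos<mbstate_t> should behave as follows:
  • If no format specification is given, it outputs (position, mbstate descriptor), where the position is the streamoff value of the fpos.
  • Otherwise, the format specification is handled as for formatter<streamoff>, and only the position is output.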

3.3. Possible Future Evolution

An important aspect to consider is whether the behavior of operator<< should change, for example by making the conversion operator to streamoff explicit or by overloading operator<< for fpos<mbstate_t>. Such a breaking change may or may not be acceptable to existing users.

It is also worth discussing whether the formatter specialization should provide a way to check, via mbsinit, whether the mbstate_t is in its initial state, to improve safety in some cases.

Another open question is whether the formatter should be generalized to fpos<StateT> for any formattable StateT, and whether mbstate_t should have a formatter of its own instead of relying on fpos<mbstate_t>. It may also be worth discussing whether parsing (position, mbstate descriptor) to restore an fpos<mbstate_t> becomes ambiguous if the descriptor is not constrained at all.

4. Standard Wording

We propose to add wording in [fpos.operations]:

2. Stream operations that return a value of type traits::pos_type return P(O(-1)) as an invalid value to signal an error. If this value is used as an argument to any istream, ostream, or streambuf member that accepts a value of type traits::pos_type then the behavior of that function is undefined.
3. Assuming that mbstate_t can be fully restored from a basic_string<charT>, called the mbstate descriptor, the formatter of fpos<mbstate_t> behaves as follows:
namespace std {
 template<class charT>
 struct formatter<fpos<mbstate_t>, charT> {
 private:
  formatter<streamoff, charT> underlying_;    // exposition only
  bool need_state_ = false;                   // exposition only
 public:
  template<class ParseContext>
   constexpr typename ParseContext::iterator
    parse(ParseContext& ctx);
  template<class FormatContext>
   typename FormatContext::iterator
    format(const fpos<mbstate_t>& ref, FormatContext& ctx) const;
 };
}

template<class ParseContext>
 constexpr typename ParseContext::iterator
  parse(ParseContext& ctx);
Effects: Sets need_state_ to true if format-specifier or format-spec ([format.string.general]) is not present; otherwise equivalent to underlying_.parse(ctx).
Returns: An iterator past the end of format-spec.

template<class FormatContext>
 typename FormatContext::iterator
  format(const fpos<mbstate_t>& ref, FormatContext& ctx) const;
Effects: Writes the following into ctx.out():
  • If need_state_ is false, then as if underlying_.format(static_cast<streamoff>(ref), ctx);
  • Otherwise,
    • STATICALLY-WIDEN<charT>("("),
    • the result of writing static_cast<streamoff>(ref) via underlying_,
    • STATICALLY-WIDEN<charT>(", "),
    • the mbstate descriptor of ref.state(),
    • STATICALLY-WIDEN<charT>(")").
Returns: An iterator past the end of the output range.

5. Impact on Existing Code

This is a pure extension to the standard library, so it should not conflict severely with existing code. The only possible conflict arises if existing code already specializes formatter for fpos<mbstate_t>, but no open-source code on GitHub appears to do so.

6. Implementation

Since mbstate_t is implementation-defined and its internal state is undocumented, we can only write pseudo-code that is close to a final implementation. The easiest implementation may make use of inheritance:

template<typename CharT>
struct formatter<fpos<mbstate_t>, CharT> : formatter<streamoff, CharT>
{
private:
    using Base = formatter<streamoff, CharT>;
    bool need_state_ = false;

public:
    // The parse context is taken by non-const reference, as Base::parse requires.
    constexpr auto parse(auto& ctx)
    {
        auto it = ctx.begin();
        if (it == ctx.end() || *it == WIDEN('}'))
        {
            need_state_ = true;
            return it;
        }

        return Base::parse(ctx);
    }

    auto format(const fpos<mbstate_t>& value, auto& ctx) const
    {
        if (need_state_) {
            return format_to(ctx.out(), ctx.locale(), WIDEN("({}, {})"),
                             static_cast<streamoff>(value),
                             GET_MBSTATE_DESCRIPTOR(value.state()));
        }
        return Base::format(static_cast<streamoff>(value), ctx);
    }
};

Note that MS-STL and libstdc++ can still rely on the implicit conversion, so the static_cast in Base::format can be omitted. libc++ checks whether the value is an integer inside the templated Base::format method, so the implicit conversion does not help there.
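The two output shapes can be tried without <format> support. The sketch below is not the proposed formatter: render_fpos and mbstate_descriptor are hypothetical names, and the descriptor is assumed, purely for illustration, to be a hex dump of the object representation of mbstate_t. The paper deliberately leaves the actual descriptor to the implementation.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical descriptor (an assumption of this sketch): the bytes of the
// mbstate_t object written as lowercase hex, two digits per byte.
inline std::string mbstate_descriptor(const std::mbstate_t& state) {
    unsigned char bytes[sizeof(std::mbstate_t)];
    std::memcpy(bytes, &state, sizeof bytes);
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char byte : bytes) {
        out += digits[byte >> 4];
        out += digits[byte & 0x0F];
    }
    return out;
}

// Hypothetical renderer mirroring the two branches of format():
// offset only when need_state is false, "(offset, descriptor)" otherwise.
inline std::string render_fpos(const std::streampos& pos, bool need_state) {
    std::ostringstream os;
    const std::streamoff offset = static_cast<std::streamoff>(pos);
    if (!need_state) {
        os << offset;
        return os.str();
    }
    os << '(' << offset << ", " << mbstate_descriptor(pos.state()) << ')';
    return os.str();
}

inline void render_demo() {
    const std::streampos pos(7);
    assert(render_fpos(pos, false) == "7");

    const std::string full = render_fpos(pos, true);
    assert(full.compare(0, 4, "(7, ") == 0);
    assert(full.back() == ')');
    // "(7, " + two hex digits per byte + ")"
    assert(full.size() == 4 + 2 * sizeof(std::mbstate_t) + 1);
}
```

A byte dump is only meaningful within one implementation and is not portable across platforms; it is used here simply because it conveys the whole state, which is the property the design asks of the descriptor.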

7. Acknowledgement

Thanks to Victor Zverovich, the author of [FMT], and Arthur O’Dwyer for suggestions and discussions on generalization of the conversion and encouragement to post this paper. Thanks to Tom Honermann for advice on formatting the state type and assistance. I’d also like to extend my gratitude to Peking University for giving me a colorful undergraduate life in the last four years.

References

Informative References

[FMT]
Victor Zverovich; et al. The {fmt} library. URL: https://github.com/fmtlib/fmt
[P1729]
Elias Kosunen, Victor Zverovich. Text Parsing. URL: https://wg21.link/p1729