1. Revision history
1.1. Changes since R2
- Return a subrange from scan
- Default CharT to char in scanner, for consistency with formatter
- Add design discussion about thousands separators in § 4.3.5.1 Design discussion: Thousands separator grouping checking and § 4.3.5.2 Design discussion: Separate flag for thousands separators.
- Add design discussion about additional error information in § 4.6.2 Design discussion: Additional information.
- Add clarification about field width calculation in § 4.3.4 Width and precision.
- Add note about scope at the end of § 2 Introduction.
- Fix/clarify error handling in example § 3.5 Alternative error handling.
- Address SG16 feedback:
  - Add definition of "whitespace", and clarify matching of non-whitespace literal characters, in § 4.2 Format strings.
  - Add a section about text encoding (§ 4.11 Encoding), and an example about handling reading code units (§ 4.3.8 Type specifiers: CharT).
  - Add example about using locales in § 4.10 Locales.
  - Add potential future extension: § 6.3 Reading code points (or even grapheme clusters?)
1.2. Changes since R1
- Thoroughly describe the design
- Add examples
- Add specification (synopses only)
- Design changes:
  - Return an expected containing a tuple from std::scan, instead of using output parameters
  - Make std::scan take a range instead of a string_view
  - Remove support for partial successes
2. Introduction
With the introduction of 
According to [CODESEARCH], a C and C++ codesearch engine based on the ACTCD19
dataset, there are 389,848 calls to 
The lack of a general-purpose parsing facility based on format strings has been raised in [P1361] in the context of formatting and parsing of dates and times.
This paper explores the possibility of adding a symmetric parsing facility,
to complement the 
This facility is not a parser per se, as it is probably not sufficient
for parsing something more complicated, e.g. JSON.
This is not a parser combinator library.
This is intended to be an almost-drop-in replacement for 
3. Examples
3.1. Basic example
    if (auto result = std::scan<std::string, int>("answer = 42", "{} = {}")) {
        //                      ~~~~~~~~~~~~~~~~   ~~~~~~~~~~~    ~~~~~~~
        //                        output types        input       format
        //                                                        string

        const auto& [key, value] = result->values();
        //           ~~~~~~~~~~
        //            scanned
        //             values

        // result == true
        // result->range() gives an empty range (result->begin() == result->end())
        // key == "answer"
        // value == 42
    } else {
        // We'll end up here if we had an error
        // Inspect the returned scan_error with result.error()
    }
3.2. Reading multiple values at once
    auto input = "25 54.32E-1 Thompson 56789 0123";
    auto result = std::scan<int, float, string_view, int, float, int>(
        input, "{:d}{:f}{:9}{:2i}{:g}{:o}");

    // result is a std::expected; operator-> has undefined behavior
    // if it doesn't contain a value (use result.value() for a checked access)
    auto [i, x, str, j, y, k] = result->values();

    // i == 25
    // x == 54.32e-1
    // str == "Thompson"
    // j == 56
    // y == 789.0
    // k == 0123
3.3. Reading from an arbitrary range
    std::string input{"123 456"};
    if (auto result = std::scan<int>(std::views::reverse(input), "{}")) {
        // If only a single value is returned, it can be inspected with result->value()
        // result->value() == 654
    }
3.4. Reading multiple values in a loop
    std::vector<int> read_values;
    std::ranges::forward_range auto range = ...;

    auto input = std::ranges::subrange{range};
    while (auto result = std::scan<int>(input, "{}")) {
        read_values.push_back(result->value());
        input = result->range();
    }
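Since std::scan is only proposed, the loop above cannot be compiled today. As a rough stand-in, the same "scan, store, continue from the remainder" pattern can be sketched with std::from_chars. The helper name read_all_ints is hypothetical, it only handles int and ASCII whitespace, and it stops at the first token that fails to parse, mirroring how the while loop above ends on the first error.

```cpp
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical helper (not part of the proposal): reads whitespace-separated
// ints from `input` until the input ends or a token fails to parse.
std::vector<int> read_all_ints(std::string_view input) {
    std::vector<int> values;
    while (true) {
        // Skip preceding ASCII whitespace, as "{}" would for an int.
        const auto start = input.find_first_not_of(" \t\n\r\f\v");
        if (start == std::string_view::npos) break;  // nothing left to read
        input.remove_prefix(start);

        int value{};
        const char* first = input.data();
        const char* last = first + input.size();
        const auto [ptr, ec] = std::from_chars(first, last, value);
        if (ec != std::errc{}) break;  // parse error: stop, like a failed scan

        values.push_back(value);
        // Continue from the unparsed remainder, like `input = result->range()`.
        input.remove_prefix(static_cast<std::size_t>(ptr - first));
    }
    return values;
}
```

For example, read_all_ints("1 2 3") yields the three values, while read_all_ints("10 x 20") stops after the first one.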
3.5. Alternative error handling
    // Since std::scan returns a std::expected,
    // its monadic interface can be used

    auto result = std::scan<int>(..., "{}")
        .transform([](auto result) {
            return result.value();
        });
    if (!result) {
        // handle error
    }
    int num = *result;

    // With [P2561]:
    int num = std::scan<int>(..., "{}").try?.value();
3.6. Scanning a user-defined type
    struct mytype {
        int a{}, b{};
    };

    // Specialize std::scanner to add support for user-defined types
    // Inherit from std::scanner<string> to get format string parsing (scanner::parse()) from it
    template <>
    struct std::scanner<mytype> : std::scanner<std::string> {
        template <typename Context>
        auto scan(mytype& val, Context& ctx) const
            -> std::expected<typename Context::iterator, std::scan_error> {
            return std::scan<int, int>(ctx.range(), "[{}, {}]")
                .transform([&val](const auto& result) {
                    std::tie(val.a, val.b) = result.values();
                    return result.begin();
                });
        }
    };

    auto result = std::scan<mytype>("[123, 456]", "{}");
    // result->value().a == 123
    // result->value().b == 456
4. Design
The new parsing facility is intended to complement the existing C++ I/O streams
library, integrate well with the chrono library, and provide an API similar to 
4.1. Overview
The main user-facing part of the library described in this paper
is the function template std::scan, declared as follows:

    template <class... Args, scannable_range<char> Range>
    auto scan(Range&& range, format_string<Args...> fmt)
        -> expected<scan_result<ranges::borrowed_ssubrange_t<Range>, Args...>, scan_error>;

    template <class... Args, scannable_range<wchar_t> Range>
    auto scan(Range&& range, wformat_string<Args...> fmt)
        -> expected<scan_result<ranges::borrowed_ssubrange_t<Range>, Args...>, scan_error>;
4.2. Format strings
As with 
- 
     Many format specifiers like hh h l j 
- 
     There is no standard way to extend the syntax for user-defined types. 
- 
     Using '%' get_time 
Therefore, we propose a syntax based on 
- 
     An easy-to-parse mini-language focused on the data format rather than conveying the type information 
- 
     Extensibility for user-defined types 
- 
     Positional arguments 
- 
     Support for both locale-specific and locale-independent parsing (see § 4.10 Locales) 
- 
     Consistency with std :: format 
At the same time, most of the specifiers will remain quite similar to the ones
in 
Maintaining similarity with 
In this proposal, "whitespace" is defined to be the Unicode code points with the Pattern_White_Space property, as defined by UAX #31 (UAX31-R3a). Those code points are currently:
- 
     ASCII whitespace characters (U+0009 to U+000D, U+0020) 
- 
     U+0085 (next line) 
- 
     U+200E and U+200F (LEFT-TO-RIGHT MARK and RIGHT-TO-LEFT MARK) 
- 
     U+2028 and U+2029 (LINE SEPARATOR and PARAGRAPH SEPARATOR) 
Unicode defines a lot of different things
in the realm of whitespace, all for different kinds of use cases.
The Pattern_White_Space-property is chosen for its stability (it’s guaranteed to not change),
and because its intended use is for classifying things that should be treated as
whitespace in machine-readable syntaxes. 
    auto r0 = std::scan<char>("abcd", "ab{}d"); // r0->value() == 'c'

    auto r1 = std::scan<string, string>("abc \n def", "{} {}");
    const auto& [s1, s2] = r1->values(); // s1 == "abc", s2 == "def"
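The whitespace definition above is small enough to be written out directly. The following is a minimal sketch of a classifier over the code points listed in this section; the function name is_pattern_white_space is an assumption made for illustration and is not part of the proposed interface.

```cpp
// Hypothetical helper: true if `cp` is one of the Pattern_White_Space
// code points enumerated in this section.
constexpr bool is_pattern_white_space(char32_t cp) {
    return (cp >= 0x0009 && cp <= 0x000D)  // ASCII control whitespace
        || cp == 0x0020                    // SPACE
        || cp == 0x0085                    // NEXT LINE
        || cp == 0x200E || cp == 0x200F    // LEFT-TO-RIGHT / RIGHT-TO-LEFT MARK
        || cp == 0x2028 || cp == 0x2029;   // LINE / PARAGRAPH SEPARATOR
}

static_assert(is_pattern_white_space(U' '));
static_assert(is_pattern_white_space(U'\n'));
static_assert(!is_pattern_white_space(U'a'));
```

Note that code points such as U+00A0 (NO-BREAK SPACE) are whitespace in other Unicode senses, but are not in this set.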
As mentioned above, the format string syntax consists of replacement fields
delimited by curly brackets (
| replacement field syntax | replacement field syntax | 
|---|---|
| 
 | 
 | 
4.3. Format string specifiers
Below is a somewhat detailed description of each of the specifiers
in a 
4.3.1. Manual indexing
    replacement-field ::= '{' [arg-id] [':' format-spec] '}'
Like 
    auto r = std::scan<int, int, int>("0 1 2", "{1} {0} {2}");
    auto [i0, i1, i2] = r->values();
    // i0 == 1, i1 == 0, i2 == 2
4.3.2. Fill and align
    fill-and-align ::= [fill] align
    fill           ::= any character other than '{' or '}'
    align          ::= one of '<' '>' '^'
The fill and align options are valid for all argument types.
The fill character is denoted by the 
If an alignment is specified, the value to be parsed is assumed to be properly aligned with the specified fill character.
If a field width is specified, it will be the maximum number of characters
to be consumed from the input range.
In that case, if no alignment is specified, the default alignment for the type
is considered (see 
For the 
This spec is compatible with 
Note: For format type specifiers other than 
    auto r0 = std::scan<int>(" 42", "{}");        // r0->value() == 42, r0->range() == ""
    auto r1 = std::scan<char>("  x", "{}");       // r1->value() == ' ', r1->range() == " x"
    auto r2 = std::scan<char>("x ", "{}");        // r2->value() == 'x', r2->range() == " "

    auto r3 = std::scan<int>(" 42", "{:6}");      // r3->value() == 42, r3->range() == ""
    auto r4 = std::scan<char>("x ", "{:6}");      // r4->value() == 'x', r4->range() == ""

    auto r5 = std::scan<int>("***42", "{:*>}");   // r5->value() == 42
    auto r6 = std::scan<int>("***42", "{:*>5}");  // r6->value() == 42
    auto r7 = std::scan<int>("***42", "{:*>4}");  // r7->value() == 4
    auto r8 = std::scan<int>("42", "{:*>}");      // r8->value() == 42
    auto r9 = std::scan<int>("42", "{:*>5}");     // ERROR (mismatching field width)

    auto rA = std::scan<int>("42***", "{:*<}");   // rA->value() == 42, rA->range() == ""
    auto rB = std::scan<int>("42***", "{:*<5}");  // rB->value() == 42, rB->range() == ""
    auto rC = std::scan<int>("42***", "{:*<4}");  // rC->value() == 42, rC->range() == "*"
    auto rD = std::scan<int>("42", "{:*<}");      // rD->value() == 42
    auto rE = std::scan<int>("42", "{:*<5}");     // ERROR (mismatching field width)

    auto rF = std::scan<int>("42", "{:*^}");      // rF->value() == 42, rF->range() == ""
    auto rG = std::scan<int>("*42*", "{:*^}");    // rG->value() == 42, rG->range() == ""
    auto rH = std::scan<int>("*42**", "{:*^}");   // rH->value() == 42, rH->range() == "*"
    auto rI = std::scan<int>("**42*", "{:*^}");   // ERROR (not enough fill characters after value)

    auto rJ = std::scan<int>("**42**", "{:*^6}"); // rJ->value() == 42, rJ->range() == ""
    auto rK = std::scan<int>("*42**", "{:*^5}");  // rK->value() == 42, rK->range() == ""
    auto rL = std::scan<int>("**42*", "{:*^6}");  // ERROR (not enough fill characters after value)
    auto rM = std::scan<int>("**42*", "{:*^5}");  // ERROR (not enough fill characters after value)
Note: This behavior, while compatible with 
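To make the right-aligned cases (r5 through r9 above) concrete, here is a minimal sketch of one possible reading of those rules, built on std::from_chars. The helper scan_right_aligned and its "width of 0 means no width" convention are assumptions made for this illustration; the sketch only covers int with the '>' alignment, and does not decide what happens to trailing characters inside the field.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper modelling "{:F>}" (width == 0) and "{:F>N}" (width == N)
// for int, where F is the fill character.
std::optional<int> scan_right_aligned(std::string_view input, char fill,
                                      std::size_t width = 0) {
    if (width != 0) {
        // With an explicit alignment, the field must be exactly `width` wide.
        if (input.size() < width) return std::nullopt;  // mismatching field width
        input = input.substr(0, width);
    }

    // Skip the fill characters that precede a right-aligned value.
    const auto start = input.find_first_not_of(fill);
    if (start == std::string_view::npos) return std::nullopt;  // only fill
    input.remove_prefix(start);

    int value{};
    const auto [ptr, ec] =
        std::from_chars(input.data(), input.data() + input.size(), value);
    if (ec != std::errc{}) return std::nullopt;
    return value;
}
```

With this sketch, scan_right_aligned("***42", '*', 4) only sees "***4" and therefore produces 4, which matches r7.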
4.3.3. Sign, #, and 0
    format-spec ::= ... [sign] ['#'] ['0'] ...
    sign        ::= one of '+' '-' ' '
These flags would have no effect in 
Note: This is incompatible with 
4.3.4. Width and precision
    width     ::= positive-integer OR '{' [arg-id] '}'
    precision ::= '.' nonnegative-integer OR '.' '{' [arg-id] '}'
The width specifier is valid for all argument types.
The meaning of this specifier somewhat deviates from 
    // std::format
    auto str = std::format("{:2}", 123);
    // str == "123"
    // because only the minimum width was set by the format string

    // std::scan
    auto result = std::scan<int>(str, "{:2}");
    // result->value() == 12
    // result->range() == "3"
    // because the maximum width was set to 2 by the format string
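The same asymmetry already exists between printf and scanf, where a width is a minimum when formatting and a maximum when scanning. The following sketch demonstrates that with today's C library, as a stand-in for the proposed behavior; the struct and function names (width_demo, min_vs_max_width) are made up for this example.

```cpp
#include <cstdio>
#include <string>

// Hypothetical aggregate holding the observations of the demo below.
struct width_demo {
    std::string formatted;  // output of formatting 123 with width 2
    int scanned;            // value read back with width 2
    int consumed;           // number of characters consumed while scanning
};

width_demo min_vs_max_width() {
    char buffer[16];
    // Formatting: width 2 is a minimum, so all three digits are written.
    std::snprintf(buffer, sizeof buffer, "%2d", 123);

    int value = 0;
    int consumed = 0;
    // Scanning: width 2 is a maximum, so only the first two digits are read.
    // %n stores how many characters have been consumed so far.
    std::sscanf(buffer, "%2d%n", &value, &consumed);

    return {buffer, value, consumed};
}
```

Here the formatted text is "123", the scanned value is 12, and one character ("3") is left unread.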
For compatibility with 
For a sequence of characters in UTF-8, UTF-16, or UTF-32, an implementation should use as its field width the sum of the field widths of the first code point of each extended grapheme cluster. Extended grapheme clusters are defined by UAX #29 of the Unicode Standard. The following code points have a field width of 2:
any code point with the East_Asian_Width="W" or East_Asian_Width="F" Derived Extracted Property as described by UAX #44 of the Unicode Standard
U+4dc0 – U+4dff (Yijing Hexagram Symbols)
U+1f300 – U+1f5ff (Miscellaneous Symbols and Pictographs)
U+1f900 – U+1f9ff (Supplemental Symbols and Pictographs)
The field width of all other code points is 1.
For a sequence of characters in neither UTF-8, UTF-16, nor UTF-32, the field width is unspecified.
This essentially maps one field width unit to one user-perceived character.
It should be noted that, with this definition, grapheme clusters like emoji have a field width of 2.
This behavior is present in 
std :: format - 
      Plain bytes or code units 
- 
      Unicode code points 
- 
      Unicode (extended) grapheme clusters 
- 
      std :: format 
- 
      Exclusively using UAX #11 (East Asian Width) widths 
Specifying the width with another argument, like in 
4.3.5. Localized (L)
    format-spec ::= ... ['L'] ...
Enables scanning of values in locale-specific forms.
- For integer types, allows for digit group separator characters, equivalent to numpunct::thousands_sep and numpunct::grouping.
- For floating-point types, the same as above. In addition, the locale-specific radix separator character is used, from numpunct::decimal_point.
- For bool, the textual representation uses numpunct::truename and numpunct::falsename.
4.3.5.1. Design discussion: Thousands separator grouping checking
As proposed, when using localized scanning, the grouping of thousands
separators in the input must exactly match the value retrieved from 
    struct custom_numpunct : std::numpunct<char> {
        std::string do_grouping() const override {
            return "\3";
        }

        char do_thousands_sep() const override {
            return ',';
        }
    };
    auto loc = std::locale(std::locale::classic(), new custom_numpunct);

    // As proposed:
    // Check grouping, error if invalid
    auto r0 = std::scan<int>(loc, "123,45", "{:L}");
    // r0.has_value() == false

    // ALTERNATIVE:
    // Do not check grouping, only skip it
    auto r1 = std::scan<int>(loc, "123,45", "{:L}");
    // r1.has_value() == true
    // r1->value() == 12345

    // Current proposed behavior, _somewhat_ consistent with iostreams:
    istringstream iss{"123,45"};
    iss.imbue(locale(locale::classic(), new custom_numpunct));
    int i{};
    iss >> i;
    // i == 12345
    // iss.fail() == !iss == true
This highlights a problem with using 
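The iostreams half of the example above can be run today. The sketch below wraps it in a small function that treats a grouping mismatch as a failure, which is the observable effect of the "check grouping, error if invalid" option. The names grouped_numpunct and parse_grouped are hypothetical; the facet is the same one used in the example.

```cpp
#include <locale>
#include <optional>
#include <sstream>
#include <string>

// Same facet as in the example above: groups of three digits, ',' separator.
struct grouped_numpunct : std::numpunct<char> {
    std::string do_grouping() const override { return "\3"; }
    char do_thousands_sep() const override { return ','; }
};

// Hypothetical helper: parses an int through an imbued istringstream and
// reports failure (including invalid digit grouping) as an empty optional.
std::optional<int> parse_grouped(const std::string& text) {
    std::istringstream iss{text};
    // The locale takes ownership of the facet.
    iss.imbue(std::locale(std::locale::classic(), new grouped_numpunct));

    int value{};
    iss >> value;
    if (iss.fail()) return std::nullopt;  // failbit is set on bad grouping
    return value;
}
```

Note that, as the example above points out, the stream may still have written a value before setting failbit; this wrapper simply refuses to expose it.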
4.3.5.2. Design discussion: Separate flag for thousands separators
It may also be desirable to split up the behavior of skipping and checking
of thousands separators from the realm of localization. For example,
the POSIX-extended version of scanf has the ' format specifier,
which allows opting into the reading of thousands separators.
When a locale isn’t used, a set of options similar to the thousands separator
options used with the 
    // NOT PROPOSED,
    // hypothetical example, with a ' format specifier
    auto r = std::scan<int>("123,456", "{:'}");
    // r->value() == 123456
4.3.6. Type specifiers: strings
| Type | Meaning |
|---|---|
| none, s | Copies from the input until a whitespace character is encountered. |
| ? | Copies an escaped string from the input. |
| c | Copies from the input until the field width is exhausted. Does not skip preceding whitespace. Errors if no field width is provided. |

The s specifier is consistent with std::istream and std::string:

    std::string word;
    std::istringstream{"Hello world"} >> word;
    // word == "Hello"

    auto r = std::scan<string>("Hello world", "{:s}");
    // r->value() == "Hello"
Note: The 
4.3.7. Type specifiers: integers
Integer values are scanned as if by using std::from_chars, with the following differences:
- A positive + sign is allowed.
- Preceding whitespace is skipped.

| Type | Meaning |
|---|---|
| b, B | from_chars with base 2. The base prefix is 0b or 0B. |
| o | from_chars with base 8. For non-zero values, the base prefix is 0. |
| x, X | from_chars with base 16. The base prefix is 0x or 0X. |
| d | from_chars with base 10. No base prefix. |
| u | from_chars with base 10. No base prefix. No - sign allowed. |
| i | Detect base from a possible prefix, default to decimal. |
| c | Copies a character from the input. |
| none | Same as d |
Note: The flags 
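The "detect base from a possible prefix" behavior can be sketched on top of std::from_chars, which itself never recognizes a base prefix. The helper below is an illustration only: the name scan_detect_base is hypothetical, it handles unsigned values without a sign or leading whitespace, and the choice to treat a lone leading 0 as an octal prefix is an assumption modelled on scanf's %i.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: picks the base from a 0x/0X, 0b/0B, or 0 prefix,
// defaulting to decimal, and then defers to std::from_chars.
std::optional<unsigned> scan_detect_base(std::string_view input) {
    int base = 10;
    if (input.size() >= 2 && input[0] == '0') {
        const char marker = input[1];
        if (marker == 'x' || marker == 'X') {
            base = 16;
            input.remove_prefix(2);
        } else if (marker == 'b' || marker == 'B') {
            base = 2;
            input.remove_prefix(2);
        } else {
            base = 8;
            input.remove_prefix(1);
        }
    }

    unsigned value{};
    const auto [ptr, ec] = std::from_chars(
        input.data(), input.data() + input.size(), value, base);
    if (ec != std::errc{}) return std::nullopt;
    return value;
}
```

For instance, "0x1F", "0b101", "017", and "42" are read as 31, 5, 15, and 42 respectively.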
4.3.8. Type specifiers: CharT
| Type | Meaning |
|---|---|
| none, c | Copies a character from the input. |
| b, B, d, i, o, u, x, X | Same as for integers. |
| ? | Copies an escaped character from the input. |

Scanning a CharT with the c specifier (or with no specifier) reads a single code unit of type CharT:

    // As proposed:
    // U+12345 is 0xF0 0x92 0x8D 0x85 in UTF-8
    auto r = std::scan<char, std::string>("\u{12345}", "{}{}");
    auto& [ch, str] = r->values();
    // ch == '\xF0'
    // str == "\x92\x8d\x85" (invalid utf-8)

    // This is the same behavior as with iostreams today
4.3.9. Type specifiers: bool
| Type | Meaning |
|---|---|
| s | Allows for textual representation, i.e. true or false |
| ,,,,,, | Allows for integral representation, i.e. 0 or 1 |
| none | Allows for both textual and integral representation: i.e. true, 1, false, or 0. |
4.3.10. Type specifiers: floating-point types
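Similar to integer types,
floating-point values are scanned as if by using std::from_chars, with the following differences:
- A positive + sign is allowed.
- Preceding whitespace is skipped.

| Type | Meaning |
|---|---|
| a, A | from_chars with chars_format::hex, with 0x/0X-prefix allowed. |
| e, E | from_chars with chars_format::scientific. |
| f, F | from_chars with chars_format::fixed. |
| g, G | from_chars with chars_format::general. |
| none | from_chars with chars_format::general | chars_format::hex, with 0x/0X-prefix allowed. |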
4.4. Ranges
We propose, that 
    template <class Range, class CharT>
    concept scannable_range =
        ranges::forward_range<Range> &&
        same_as<ranges::range_value_t<Range>, CharT>;
For a range to be a 
    scan<int>("42", "{}");   // OK
    scan<int>(L"42", L"{}"); // OK
    scan<int>(L"42", "{}");  // Error: wchar_t[N] is not a scannable_range<char>
It should be noted, that standard range facilities related to iostreams, namely 
To prevent excessive code bloat, implementations are encouraged to type-erase the range
provided to 
It should be noted that if the range is not type-erased, the library internals need to be exposed to the user (in a header), and be instantiated for every kind of range type the user uses.
4.5. Argument passing, and return type of scan
In an earlier revision of this paper, output parameters were used to return the scanned values
from std::scan.
    // R2 and later (current)
    auto result = std::scan<int>(input, "{}");
    auto [i] = result->values();
    // or:
    auto i = result->value();

    // R1 (previous)
    int i;
    auto result = std::scan(input, "{}", i);
The rationale behind this change is as follows:
- It was easy to accidentally use uninitialized values (as is evident from the example above). In this revision, the values can only be accessed when the operation is successful.
- Modern C++ API design principles favor return values over output parameters.
- The earlier design was conceived at a time when C++17 support and usage wasn't as prevalent as it is today. Back then, the only way to use a return-value API was through std::tie
- Previously, there were real performance implications when using complicated tuples, both at compile-time and runtime. These concerns have since been alleviated, as compiler technology has improved.
The return type of 
    template <typename R>
    using borrowed_ssubrange_t = std::conditional_t<
        ranges::borrowed_range<R>,
        ranges::subrange<ranges::iterator_t<R>, ranges::sentinel_t<R>>,
        ranges::dangling>;
Note: The name 
Compare this with 
This is novel in the Ranges space: previously all algorithms have either returned an iterator,
or a subrange of two iterators. We believe that false, 
See this StackOverflow answer by Barry Revzin for more context: [BARRY-SO-ANSWER].
4.5.1. Design alternatives
As proposed, 
An alternative could be returning a 
    // NOT PROPOSED, design alternative
    auto [r, i] = std::scan<int>("42", "{}");
However, there are two possible issues with this design:
- It's easy to accidentally skip checking whether the operation succeeded, and access the scanned values regardless. This could be a potential security issue (even though the values would always be at least value-initialized, not default-initialized). Returning an expected forces checking for success.
- The numbering of the elements in the returned tuple would be off-by-one compared to the indexing used in format strings:

      auto r = std::scan<int>("42", "{0}");
      // std::get<0>(r) refers to the result object
      // std::get<1>(r) refers to {0}
For the same reason as enumerated in 2. above, the 
    // NOT PROPOSED
    auto result = std::scan<int>("42", "{0}");
    // std::get<0>(*result) would refer to the iterator
    // std::get<1>(*result) would refer to {0}
4.6. Error handling
Contrasting with 
    // Not a specification, just exposition
    class scan_error {
    public:
        enum code_type {
            good,

            // EOF:
            // tried to read from an empty range,
            // or the input ended unexpectedly.
            // Naming alternative: end_of_input
            end_of_range,

            invalid_format_string,

            invalid_scanned_value,

            value_out_of_range
        };

        constexpr scan_error() = default;
        constexpr scan_error(code_type, const char*);

        constexpr explicit operator bool() const noexcept;

        constexpr code_type code() const noexcept;
        constexpr const char* msg() const;
    };
4.6.1. Design discussion: Essence of std::scan_error
   The reason why we propose adding the type 
The 
Possible mappings from 
|  |  | 
|---|---|
|  |  | 
|  |  | 
|  | |
|  | |
|  |  | 
There are multiple dimensions of design decisions to be made here:
- Should scan_error be a new type?
  - Yes. (currently proposed, our preference)
  - No, use std::errc
- Should scan_error contain a message?
  - Yes, a const char*
  - Yes, a std::string
  - No. Worse user experience for loss of diagnostic information
4.6.2. Design discussion: Additional information
Only having 
Both 
    // larger than INT32_MAX
    std::string source{"999999999999999999999999999999"};

    {
        std::istringstream iss{source};
        int i{};
        iss >> i;
        // iss.fail() == true
        // i == INT32_MAX
    }

    {
        // (assuming sizeof(long) == 4)
        auto i = std::strtol(source.c_str(), nullptr, 10);
        // i == LONG_MAX
        // errno == ERANGE
    }

    {
        int i{};
        auto [ec, ptr] = std::from_chars(source.data(), source.data() + source.size(), i);
        // ec == std::errc::result_out_of_range
        // i == 0 (!)
    }

    {
        int i{};
        auto r = std::sscanf(source.c_str(), "%d", &i);
        // r == 1 (?)
        // i == -1 (?)
        // errno == ERANGE
    }
This is predicated on an issue with using 
Nevertheless, there’s a simple reason for using 
    int i{};
    std::cin >> i;
    // We would need to check std::cin.operator bool() first,
    // to determine whether i was successfully read:
    // that's very easy to forget

    auto r = std::scan<int>(..., "{}");
    int i = r->value();
    //       ^
    // dereference
    // does not allow for accidentally accessing the value if we had an error
It’s a tradeoff.
Either we allow for an additional avenue for error reporting through the scanned value,
or we use 
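Two of the behaviors compared in this section are fully specified and can be observed directly: std::from_chars leaves its output untouched on overflow, while std::strtoll saturates and sets errno. The sketch below records both for a given input. The names overflow_report and probe_overflow are hypothetical, and strtoll is used instead of strtol so that the result does not depend on sizeof(long).

```cpp
#include <cerrno>
#include <charconv>
#include <climits>
#include <cstdlib>
#include <string>
#include <system_error>

// Hypothetical aggregate describing how each facility reported the overflow.
struct overflow_report {
    std::errc from_chars_ec;  // error code returned by from_chars
    int from_chars_value;     // value of the output variable afterwards
    long long strtoll_value;  // value returned by strtoll
    bool strtoll_erange;      // whether strtoll set errno to ERANGE
};

overflow_report probe_overflow(const std::string& source) {
    overflow_report report{};

    int value = 0;
    const auto result =
        std::from_chars(source.data(), source.data() + source.size(), value);
    report.from_chars_ec = result.ec;
    report.from_chars_value = value;  // unmodified when parsing fails

    errno = 0;
    report.strtoll_value = std::strtoll(source.c_str(), nullptr, 10);
    report.strtoll_erange = (errno == ERANGE);

    return report;
}
```

In other words, with from_chars the caller learns nothing about the direction of the overflow from the value, whereas strtoll encodes it in the saturated result.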
4.7. Binary footprint and type erasure
We propose using a type erasure technique to reduce the per-call binary code size. The scanning function that uses variadic templates can be implemented as a small inline wrapper around its non-variadic counterpart:
    template <scannable_range<char> Range>
    auto vscan(Range&& range, string_view fmt, scan_args_for<Range> args)
        -> expected<ranges::borrowed_ssubrange_t<Range>, scan_error>;

    template <typename... Args, scannable_range<char> SourceRange>
    auto scan(SourceRange&& source, format_string<Args...> format)
        -> expected<scan_result<ranges::borrowed_ssubrange_t<SourceRange>, Args...>, scan_error> {
        auto args = make_scan_args<SourceRange, Args...>();
        // Forward the source range under the name it was declared with
        auto result = vscan(std::forward<SourceRange>(source), format, args);
        return make_scan_result(std::move(result), std::move(args));
    }
As shown in [P0645] this dramatically reduces binary code size, which will make 
Note: This implementation of 
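The general shape of this technique can be shown with a much smaller, self-contained model. In the sketch below, a thin variadic wrapper builds an array of type-erased argument records, and a single non-template function does the actual work. All names (erased_arg, parse_integer, vscan_sketch, scan_sketch) are invented for this illustration, the input is a std::string_view rather than an arbitrary range, there is no format string, and only integer types other than bool and the character types are handled.

```cpp
#include <array>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>
#include <tuple>
#include <type_traits>

// One type-erased argument: where to store the value, and how to parse it.
struct erased_arg {
    void* target;
    bool (*parse)(std::string_view& input, void* target);
};

// Per-type parsing routine; the only code instantiated per argument type.
template <class T>
bool parse_integer(std::string_view& input, void* target) {
    static_assert(std::is_integral_v<T>, "this sketch only handles integers");
    const auto start = input.find_first_not_of(' ');
    if (start == std::string_view::npos) return false;
    input.remove_prefix(start);

    const char* first = input.data();
    const auto [ptr, ec] = std::from_chars(first, first + input.size(),
                                           *static_cast<T*>(target));
    if (ec != std::errc{}) return false;
    input.remove_prefix(static_cast<std::size_t>(ptr - first));
    return true;
}

// Non-template core: compiled once, regardless of the argument types used.
bool vscan_sketch(std::string_view input, const erased_arg* args,
                  std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        if (!args[i].parse(input, args[i].target)) return false;
    }
    return true;
}

// Small inline variadic wrapper around the non-variadic core.
template <class... Args>
std::optional<std::tuple<Args...>> scan_sketch(std::string_view input) {
    std::tuple<Args...> values{};
    const auto args = std::apply(
        [](auto&... v) {
            return std::array<erased_arg, sizeof...(Args)>{erased_arg{
                &v, &parse_integer<std::remove_reference_t<decltype(v)>>}...};
        },
        values);
    if (!vscan_sketch(input, args.data(), args.size())) return std::nullopt;
    return values;
}
```

A call such as scan_sketch<int, long>("42 -7") instantiates only the wrapper and two small parse_integer functions; the loop in vscan_sketch is shared by every call site.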
4.8. Safety
    char s[10];
    std::sscanf(input, "%s", s); // s may overflow.
Specifying the maximum length in the format string above solves the issue but is error-prone, especially since one has to account for the terminating null.
Unlike 
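For comparison, this is what the manually bounded sscanf call looks like. It is a minimal sketch with a hypothetical helper name (read_word_bounded); the point is that the width in the format string (9) has to be kept in sync with the buffer size (10) by hand.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: reads one whitespace-delimited word of at most
// 9 characters from `input`.
std::string read_word_bounded(const char* input) {
    char buffer[10] = {};
    // "%9s" stores at most 9 characters plus the terminating null,
    // which exactly fills the 10-element buffer.
    if (std::sscanf(input, "%9s", buffer) != 1) return {};
    return buffer;
}
```

Longer words are silently truncated: read_word_bounded("abcdefghijklmnop") returns "abcdefghi". Changing the buffer size without updating the format string reintroduces the overflow.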
4.9. Extensibility
We propose an extension API for user-defined types similar to 
    auto r = scan<tm>(input, "Date: {0:%Y-%m-%d}");
This is done by providing a specialization of 
    template <>
    struct scanner<tm> {
        constexpr auto parse(scan_parse_context& ctx)
            -> expected<scan_parse_context::iterator, scan_error>;

        template <class ScanContext>
        auto scan(tm& t, ScanContext& ctx) const
            -> expected<typename ScanContext::iterator, scan_error>;
    };
The 
An implementation of 
4.10. Locales
As pointed out in [N4412]:
There are a number of communications protocol frameworks in use that employ text-based representations of data, for example XML and JSON. The text is machine-generated and machine-read and should not depend on or consider the locales at either end.
To address this, 
    std::locale::global(std::locale::classic());

    // {} uses no locale
    // {:L} uses the global locale
    auto r0 = std::scan<double, double>("1.23 4.56", "{} {:L}");
    // r0->values(): (1.23, 4.56)

    // {} uses no locale
    // {:L} uses the supplied locale
    auto r1 = std::scan<double, double>(std::locale{"fi_FI"}, "1.23 4,56", "{} {:L}");
    // r1->values(): (1.23, 4.56)
4.11. Encoding
In a similar manner as with 
    // Invalid UTF-8
    auto r = std::scan<std::string>("a\xc3 ", "{}");
    // r->value() == "a\xc3"
    // Erroneous behavior?
Other potential options for handling invalid encoding would be:
- treat it as UB
- always sanitize input encoding (potentially very slow when done character-by-character with a forward_range)
- check for encoding when reading code units and strings, while potentially introducing a format specifier for "raw mode", which skips these checks
Note: This topic is under active contention in SG16. See also example in § 4.3.8 Type specifiers: CharT.
4.12. Performance
The API allows efficient implementation that minimizes virtual function calls
and dynamic memory allocations, and avoids unnecessary copies. In particular,
since it doesn’t need to guarantee the lifetime of the input across multiple
function calls, 
We can also avoid unnecessary copies required by 
    auto r = std::scan<std::string_view, int>("answer = 42", "{} = {}");
This has lifetime implications similar to returning match objects in [P1433] and iterators or subranges in the ranges library and can be mitigated in the same way.
It should be noted, that as proposed, this library does not support
checking at compile-time, whether scanning a 
4.13. Integration with chrono
The proposed facility can be integrated with 
Before:
    std::istringstream is("start = 10:30");
    std::string key;
    char sep;
    std::chrono::seconds time;
    is >> key >> sep >> std::chrono::parse("%H:%M", time);
After:
    auto result = std::scan<std::string, std::chrono::seconds>("start = 10:30", "{0} = {1:%H:%M}");
    const auto& [key, time] = result->values();
Note that the 
4.14. Impact on existing code
The proposed API is defined in a new header and should have no impact on existing code.
5. Existing work
[SCNLIB] is a C++ library that, among other things,
provides an interface similar to the one described in this paper.
As of the publication of this paper, the 
[FMT] has a prototype implementation of an earlier version of the proposal.
6. Future extensions
To keep the scope of this paper somewhat manageable, we’ve chosen to only include functionality we consider fundamental. This leaves the design space open for future extensions and other proposals. However, we are not categorically against exploring this design space, if it is deemed critical for v1.
All of the possible future extensions described below are implemented in [SCNLIB].
6.1. Integration with std::istream
Today, in C++, standard I/O is largely done with iostreams, and not with ranges.
The library proposed in this paper doesn’t support that use case well.
The proposed concept of 
Integration with iostreams is needed to enable working with files and 
A possible solution would be a more robust 
6.2. scanf-like [character set] matching
    auto r = scan<string>("abc123", "{:[a-zA-Z]}");
    // r->value() == "abc", r->range() == "123"

    // Compare with:
    char buf[N];
    sscanf("abc123", "%[a-zA-Z]", buf);

    // ...

    auto _ = scan<string>(..., "{:[^\n]}"); // match until newline

It should be noted that while the syntax is quite similar, this is not a regular expression. This syntax is intentionally much more limited, as it is meant for simple character matching.
[SCNLIB] implements this syntax, providing support for matching single characters/code points
(
6.3. Reading code points (or even grapheme clusters?)
[SCNLIB] supports reading Unicode code points with 
6.4. Reading strings and chars of different width
In C++, we have character types other than 
    // Currently supported:
    auto r0 = scan<wchar_t>("abc", "{}");

    // Not supported:
    auto r1 = scan<char>(L"abc", L"{}");
    auto r2 = scan<string, wstring, u8string, u16string, u32string>(
        "abc def ghi jkl mno", "{} {} {} {} {}");
    auto r3 = scan<string, wstring, u8string, u16string, u32string>(
        L"abc def ghi jkl mno", L"{} {} {} {} {}");
6.5. Scanning of ranges
Introduced in [P2286] for 
6.6. Default values for scanned values
Currently, the values returned by 
    string str;
    str.reserve(n);
    auto r0 = scan<string>(..., "{}", {std::move(str)});
    // ...
    r0->value().clear();
    auto r1 = scan<string>(..., "{}", {std::move(r0->value())});
6.7. Assignment suppression / discarding values
7. Specification
At this point, only the synopses are provided.
Note the similarity with [P0645] (
The changes to the wording include additions to the header 
7.1. Modify "Header <ranges> synopsis" [ranges.syn]
    #include <compare>
    #include <initializer_list>
    #include <iterator>

    namespace std::ranges {
      // ...

      template <range R>
        using borrowed_iterator_t = see below;  // freestanding

      template <range R>
        using borrowed_subrange_t = see below;  // freestanding

      template <range R>
        using borrowed_ssubrange_t = see below; // freestanding

      // ...
    }
7.2. Modify "Dangling iterator handling", paragraph 3 [range.dangling]
For a type R that models range:
- if R models borrowed_range, then borrowed_iterator_t<R> denotes iterator_t<R>, borrowed_subrange_t<R> denotes subrange<iterator_t<R>>, and borrowed_ssubrange_t<R> denotes subrange<iterator_t<R>, sentinel_t<R>>
- otherwise, borrowed_iterator_t<R>, borrowed_subrange_t<R>, and borrowed_ssubrange_t<R> all denote dangling
7.3. Header <scan> synopsis
    #include <expected>
    #include <format>
    #include <ranges>

    namespace std {
      class scan_error;

      template <class Range, class... Args>
        class scan_result;

      template <class Range, class CharT>
        concept scannable_range =
          ranges::forward_range<Range> &&
          same_as<ranges::range_value_t<Range>, CharT>;

      template <class Range, class... Args>
        using scan_result_type = expected<
          scan_result<ranges::borrowed_ssubrange_t<Range>, Args...>,
          scan_error>;

      template <class... Args, scannable_range<char> Range>
        scan_result_type<Range, Args...> scan(Range&& range, format_string<Args...> fmt);

      template <class... Args, scannable_range<wchar_t> Range>
        scan_result_type<Range, Args...> scan(Range&& range, wformat_string<Args...> fmt);

      template <class... Args, scannable_range<char> Range>
        scan_result_type<Range, Args...> scan(const locale& loc, Range&& range,
                                              format_string<Args...> fmt);

      template <class... Args, scannable_range<wchar_t> Range>
        scan_result_type<Range, Args...> scan(const locale& loc, Range&& range,
                                              wformat_string<Args...> fmt);

      template <class Range, class CharT>
        class basic_scan_context;

      template <class Context>
        class basic_scan_args;

      template <class Range>
        using scan_args_for = basic_scan_args<
          basic_scan_context<unspecified, ranges::range_value_t<Range>>>;

      template <class Range>
        using vscan_result_type = expected<ranges::borrowed_ssubrange_t<Range>, scan_error>;

      template <scannable_range<char> Range>
        vscan_result_type<Range> vscan(Range&& range, string_view fmt,
                                       scan_args_for<Range> args);

      template <scannable_range<wchar_t> Range>
        vscan_result_type<Range> vscan(Range&& range, wstring_view fmt,
                                       scan_args_for<Range> args);

      template <scannable_range<char> Range>
        vscan_result_type<Range> vscan(const locale& loc, Range&& range, string_view fmt,
                                       scan_args_for<Range> args);

      template <scannable_range<wchar_t> Range>
        vscan_result_type<Range> vscan(const locale& loc, Range&& range, wstring_view fmt,
                                       scan_args_for<Range> args);

      template <class T, class CharT = char>
        struct scanner;

      template <class T, class CharT>
        concept scannable = see below;

      template <class CharT>
        using basic_scan_parse_context = basic_format_parse_context<CharT>;

      using scan_parse_context = basic_scan_parse_context<char>;
      using wscan_parse_context = basic_scan_parse_context<wchar_t>;

      template <class Context>
        class basic_scan_arg;

      template <class Visitor, class Context>
        decltype(auto) visit_scan_arg(Visitor&& vis, basic_scan_arg<Context> arg);

      template <class Context, class... Args>
        class scan-arg-store; // exposition only

      template <class Range, class... Args>
        constexpr see below make_scan_args();

      template <class Range, class Context, class... Args>
        expected<scan_result<Range, Args...>, scan_error>
          make_scan_result(expected<Range, scan_error>&& source,
                           scan-arg-store<Context, Args...>&& args);
    }
7.4. Class scan_error 
namespace std {
  class scan_error {
  public:
    enum code_type {
      good,
      end_of_range,
      invalid_format_string,
      invalid_scanned_value,
      value_out_of_range
    };

    constexpr scan_error() = default;
    constexpr scan_error(code_type error_code, const char* message);

    constexpr explicit operator bool() const noexcept;

    constexpr code_type code() const noexcept;
    constexpr const char* msg() const;

  private:
    code_type code_;        // exposition only
    const char* message_;   // exposition only
  };
}
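The synopsis above leaves the member definitions unspecified. As a minimal sketch of how such a class could behave, the following stand-in lives in a hypothetical namespace `sketch` and is not the proposed `std::scan_error`. It assumes that the `bool` conversion yields `true` exactly when `code() == good`, and that a default-constructed error holds `good` and an empty message; the `describe` helper is purely illustrative and not part of the proposal.

```cpp
#include <cassert>
#include <string_view>

namespace sketch {

// Hypothetical stand-in for the proposed scan_error; not a real library type.
class scan_error {
public:
    enum code_type {
        good,
        end_of_range,
        invalid_format_string,
        invalid_scanned_value,
        value_out_of_range
    };

    constexpr scan_error() = default;
    constexpr scan_error(code_type error_code, const char* message)
        : code_(error_code), message_(message) {}

    // Assumption: converts to true when no error is stored.
    constexpr explicit operator bool() const noexcept { return code_ == good; }

    constexpr code_type code() const noexcept { return code_; }
    constexpr const char* msg() const { return message_; }

private:
    code_type code_ = good;      // assumption: default state is "good"
    const char* message_ = "";   // assumption: default message is empty
};

// Illustrative helper (not in the proposal): maps a code to a short label.
constexpr std::string_view describe(scan_error::code_type c) {
    switch (c) {
        case scan_error::good:                  return "good";
        case scan_error::end_of_range:          return "end of range";
        case scan_error::invalid_format_string: return "invalid format string";
        case scan_error::invalid_scanned_value: return "invalid scanned value";
        case scan_error::value_out_of_range:    return "value out of range";
    }
    return "unknown";
}

}  // namespace sketch
```

Storing a `const char*` rather than an owning string keeps the type trivially copyable and usable in constant expressions, at the cost of requiring the message to outlive the error object (string literals satisfy this).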
7.5. Class template scan_result 
namespace std {
  template<class Range, class... Args>
  class scan_result {
  public:
    using range_type = Range;

    constexpr scan_result() = default;
    constexpr ~scan_result() = default;

    constexpr scan_result(range_type r, tuple<Args...>&& values);
    template<class OtherR, class... OtherArgs>
      constexpr explicit(see below) scan_result(OtherR&& it, tuple<OtherArgs...>&& values);

    constexpr scan_result(const scan_result&) = default;
    template<class OtherR, class... OtherArgs>
      constexpr explicit(see below) scan_result(const scan_result<OtherR, OtherArgs...>& other);

    constexpr scan_result(scan_result&&) = default;
    template<class OtherR, class... OtherArgs>
      constexpr explicit(see below) scan_result(scan_result<OtherR, OtherArgs...>&& other);

    constexpr scan_result& operator=(const scan_result&) = default;
    template<class OtherR, class... OtherArgs>
      constexpr scan_result& operator=(const scan_result<OtherR, OtherArgs...>& other);

    constexpr scan_result& operator=(scan_result&&) = default;
    template<class OtherR, class... OtherArgs>
      constexpr scan_result& operator=(scan_result<OtherR, OtherArgs...>&& other);

    constexpr range_type range() const;

    constexpr see below begin() const;
    constexpr see below end() const;

    template<class Self>
      constexpr auto&& values(this Self&&);

    template<class Self>
      requires (sizeof...(Args) == 1)
      constexpr auto&& value(this Self&&);

  private:
    range_type range_;        // exposition only
    tuple<Args...> values_;   // exposition only
  };
}
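To illustrate how the two halves of `scan_result` (the unparsed remainder of the input and the tuple of scanned values) fit together, here is a reduced C++17 sketch. Everything in namespace `sketch` is hypothetical: the class keeps only a few of the members listed above, uses const accessors in place of the explicit object parameter, and `scan_int` is an illustrative helper built on `std::from_chars`, with `std::optional` standing in for `expected<..., scan_error>`. Unlike the proposed facility, the helper does not skip leading whitespace.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>
#include <tuple>
#include <type_traits>
#include <utility>

namespace sketch {

// Reduced, hypothetical analogue of the proposed scan_result.
template <class Range, class... Args>
class scan_result {
public:
    using range_type = Range;

    constexpr scan_result() = default;
    constexpr scan_result(range_type r, std::tuple<Args...>&& values)
        : range_(std::move(r)), values_(std::move(values)) {}

    // The part of the input that was not consumed.
    constexpr range_type range() const { return range_; }
    constexpr auto begin() const { return range_.begin(); }
    constexpr auto end() const { return range_.end(); }

    // All scanned values, in argument order.
    constexpr const std::tuple<Args...>& values() const { return values_; }

    // Shortcut that is only available when exactly one value was scanned.
    template <std::size_t N = sizeof...(Args), std::enable_if_t<N == 1, int> = 0>
    constexpr const auto& value() const { return std::get<0>(values_); }

private:
    range_type range_{};
    std::tuple<Args...> values_{};
};

// Illustrative helper: reads one decimal int from the front of `input`.
// Returns std::nullopt on failure (stand-in for an error result).
inline std::optional<scan_result<std::string_view, int>>
scan_int(std::string_view input) {
    int value = 0;
    const char* first = input.data();
    const char* last = first + input.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{}) {
        return std::nullopt;
    }
    std::string_view rest(ptr, static_cast<std::size_t>(last - ptr));
    return scan_result<std::string_view, int>(rest, std::tuple<int>(value));
}

}  // namespace sketch
```

Because the leftover range is returned alongside the values, a caller can feed `result->range()` straight into the next scanning call, which is the usage pattern the subrange-returning design is meant to support.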
7.6. Class template basic_scan_context 
namespace std {
  template<class Range, class CharT>
  class basic_scan_context {
  public:
    using char_type = CharT;
    using range_type = Range;
    using iterator = ranges::iterator_t<range_type>;
    using sentinel = ranges::sentinel_t<range_type>;
    template<class T> using scanner_type = scanner<T, char_type>;

    constexpr basic_scan_arg<basic_scan_context> arg(size_t id) const noexcept;
    std::locale locale();

    constexpr iterator current() const;
    constexpr range_type range() const;
    constexpr void advance_to(iterator it);

  private:
    iterator current_;                            // exposition only
    sentinel end_;                                // exposition only
    std::locale locale_;                          // exposition only
    basic_scan_args<basic_scan_context> args_;    // exposition only
  };
}
7.7. Class template basic_scan_args 
namespace std {
  template<class Context>
  class basic_scan_args {
    size_t size_;                     // exposition only
    basic_scan_arg<Context>* data_;   // exposition only

  public:
    basic_scan_args() noexcept;

    template<class... Args>
      basic_scan_args(scan-arg-store<Context, Args...>& store) noexcept;

    basic_scan_arg<Context> get(size_t i) noexcept;
  };

  template<class Context, class... Args>
    basic_scan_args(scan-arg-store<Context, Args...>) -> basic_scan_args<Context>;
}
7.8. Concept scannable 
namespace std {
  template<class T, class Context,
           class Scanner = typename Context::template scanner_type<remove_const_t<T>>>
    concept scannable-with =                      // exposition only
      semiregular<Scanner> &&
      requires(Scanner& s, const Scanner& cs, T& t, Context& ctx,
               basic_format_parse_context<typename Context::char_type>& pctx) {
        { s.parse(pctx) } -> same_as<expected<typename decltype(pctx)::iterator, scan_error>>;
        { cs.scan(t, ctx) } -> same_as<expected<typename Context::iterator, scan_error>>;
      };

  template<class T, class CharT>
    concept scannable =
      scannable-with<remove_reference_t<T>, basic_scan_context<unspecified, CharT>>;
}
7.9. Class template basic_scan_arg 
namespace std {
  template<class Context>
  class basic_scan_arg {
  public:
    class handle;

  private:
    using char_type = typename Context::char_type;    // exposition only

    variant<
      monostate,
      signed char*, short*, int*, long*, long long*,
      unsigned char*, unsigned short*, unsigned int*, unsigned long*, unsigned long long*,
      bool*, char_type*, void**,
      float*, double*, long double*,
      basic_string<char_type>*, basic_string_view<char_type>*,
      handle> value;                                  // exposition only

    template<class T> explicit basic_scan_arg(T& v) noexcept;   // exposition only

  public:
    basic_scan_arg() noexcept;

    explicit operator bool() const noexcept;
  };
}
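The exposition-only `variant` above type-erases each scanning target as a pointer, so that a non-template core can write the parsed value through it. The following is a minimal sketch of that idea, not the proposed implementation: the namespace `sketch`, the reduced alias `scan_arg` (three pointer alternatives instead of the full list), and the function `read_into` are all hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <string_view>
#include <system_error>
#include <type_traits>
#include <variant>

namespace sketch {

// Reduced, hypothetical analogue of the variant held by basic_scan_arg.
using scan_arg = std::variant<std::monostate, int*, char*, std::string*>;

// Writes `token` into whatever object `arg` points to.
// Returns false for an empty argument or if the token does not fit the type.
inline bool read_into(scan_arg arg, std::string_view token) {
    return std::visit(
        [token](auto target) -> bool {
            using T = decltype(target);
            if constexpr (std::is_same_v<T, std::monostate>) {
                return false;  // empty argument: nothing to write to
            } else if constexpr (std::is_same_v<T, int*>) {
                const char* first = token.data();
                const char* last = first + token.size();
                auto [ptr, ec] = std::from_chars(first, last, *target);
                // Require that the whole token was a valid integer.
                return ec == std::errc{} && ptr == last;
            } else if constexpr (std::is_same_v<T, char*>) {
                if (token.size() != 1) {
                    return false;
                }
                *target = token.front();
                return true;
            } else {
                // Remaining alternative: std::string*
                target->assign(token.data(), token.size());
                return true;
            }
        },
        arg);
}

}  // namespace sketch
```

Dispatching on a closed set of pointer types keeps the parsing core out of the variadic front end, in the same way that `vformat` relies on `basic_format_arg`; the `handle` alternative in the synopsis plays the corresponding role for user-defined types.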
7.10. Exposition-only class template scan-arg-store
namespace std {
  template<class Context, class... Args>
  class scan-arg-store {                                      // exposition only
    tuple<Args...> args;                                      // exposition only
    array<basic_scan_arg<Context>, sizeof...(Args)> data;     // exposition only
  };
}