SC22/WG20 N601

Report on W3C I18N activities

September 18, 1998

Arnold,

The following is my report on the W3C I18N WG meeting.

Please present it at our next meeting under the liaison report agenda item, since unfortunately I cannot present it in person.

If there is anything that WG20 would like to say to the W3C I18N WG,
please provide a text and assign me an action item to convey it to the
W3C I18N WG. I will bring it to the next W3C I18N WG meeting
in January 1999. Also, if WG20 has any questions regarding the contribution
below, please let me know. I will try to answer them as far as I can.

Best regards,

Akio Kido (Mgr. of Application SW proj., DBCS Tech. Coordination Office, APTO)

1623-14, Shimotsuruma, Yamato-shi, Kanagawa-ken 242, Japan (LAB-SA1)

E-mail: kido@jp.ibm.com FAX: +81-462-73-7415

Title: Information about W3C I18N activities related to SC22/WG20

Source: Akio Kido (Japan)

Date: September 18, 1998

The W3C I18N WG held its third meeting on 1998-09-14 and 1998-09-15 in the San Francisco Bay Area.

This contribution describes those of its activities that relate to programming languages and may therefore be of interest to SC22/WG20. Within W3C, two work items relate to programming languages.

The first is the Document Object Model (DOM), a set of APIs that provide access to Web objects and will be bound to programming languages. Java and ECMAScript are the expected binding targets, but the DOM may also be bound to other programming languages such as C++. Information about the DOM is available on the W3C Web site at "http://www.w3.org/DOM".

The second is String Identity Matching and String Indexing, the so-called character model for the Web. A working draft of the requirements document is available on the W3C Web site at "http://www.w3.org/TR/WD-charreq".

Because end users of the Web do not want to care about the encoding of Web documents at all, and because character encodings may be converted from one to another implicitly within the Web environment, there is a requirement to access text on the Web transparently, independent of its character encoding. For example, a user may want to search for a text across multiple Web pages that are encoded in different coded character sets. One obstacle to meeting this requirement is that a CC data element can have multiple representations, either across different coded character sets or within a single large coded character set such as ISO/IEC 10646. A CC data element may be represented as a combining sequence or as a precomposed form. In order to remove such differences in the representation of a CC data element, W3C is now considering normalizing character strings before any string manipulation.

The work is expected to be completed in early 1999, and the result will be made available on the W3C Web site. For this purpose, W3C is now investigating the string normalization method proposed by the Unicode Consortium.
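The effect of normalization on string identity matching can be illustrated with a minimal sketch. It assumes the Unicode normalization forms as provided by the `unicodedata` module in the Python standard library and picks the composed form (NFC) purely for illustration; this report does not say which form W3C would adopt. The helper name `same_text` is hypothetical.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to the same form.

    NFC is an assumption made for this sketch, not a W3C decision.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE (precomposed form)
combining = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT

# Without normalization the two representations do not match.
assert precomposed != combining

# After normalization they are treated as the same text.
assert same_text(precomposed, combining)
```

A search across Web pages would apply the same normalization to both the query and the page text before matching.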

The implications of the Web character model for programming language standards could be as follows. All ISO programming languages can handle characters and character strings, but as of 1998 no standard provides the capability to access a CC data element as a unit of information. In other words, no standard has a data type for a CC data element, no API is provided to detect the boundaries of a CC data element within a character string, and no API is provided to normalize the representation of CC data elements in a character string. The requirement to access CC data elements, instead of characters, is now becoming visible. The question then is how programming language standards should deal with this requirement, or whether it should be left to the application domain instead of being handled within the scope of programming language standards. The answer may affect future editions of TR 11017 and TR 10176, as well as the ongoing work on the internationalization API standard.
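To make the missing boundary-detection API concrete, the following is a rough sketch of what such a function might look like. It is only an approximation: it treats a base character together with any following combining marks as one unit, using `unicodedata.combining` from the Python standard library, and it does not cover every kind of multi-character element. The function name `split_elements` is hypothetical and is not taken from any standard.

```python
import unicodedata
from typing import List


def split_elements(text: str) -> List[str]:
    """Split text into units of one base character plus its combining marks.

    Approximation only: a character with a nonzero canonical combining
    class is attached to the preceding unit; anything else starts a new unit.
    """
    elements: List[str] = []
    for ch in text:
        if elements and unicodedata.combining(ch) != 0:
            elements[-1] += ch      # combining mark: extend the current unit
        else:
            elements.append(ch)     # base character: start a new unit
    return elements


text = "e\u0301a"                   # "e" + COMBINING ACUTE ACCENT, then "a"
assert len(text) == 3               # three coded characters
assert split_elements(text) == ["e\u0301", "a"]   # two units
```

Indexing by such units, rather than by coded characters, is the kind of access that the paragraph above says no standard provided as of 1998.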

Since the above activities may have some dependency on the internationalization of ISO programming language standards, the W3C I18N WG discussed possible cooperation with the ISO programming language groups, especially SC22/WG20. The conclusion was that the W3C I18N WG believes information exchange would be beneficial for both groups, but that establishing a formal liaison may not be necessary for this purpose at the moment, since some members of the W3C I18N WG are also involved in the related ISO standardization committees. In the particular case of SC22/WG20, I, Akio Kido, belong to both groups, so the required information exchange will be done through me. This contribution is my first action to inform SC22/WG20 of the W3C I18N WG activities, in order to start communication between the two groups.