SC22/WG20 N780

Language codes: report to ISO/IEC JTC1/SC22/WG20.

John Clews

Overview

There is a certain amount of incompatibility between standards for language coding. I would recommend that JTC1/SC22/WG20 members look at Peter Constable's recent well-argued paper at the International Unicode Conference for further clarification of the issues. Hopefully information on accessing this paper will be passed to the JTC1/SC22/WG20 convenor shortly, for distribution.

In addition, the actual ISO standards process seems unable to deliver the number of codes that many IT vendors will require in a globalised market.

The report below also looks at some areas of incompatibility that might impact on JTC1/SC22/WG20 standards.

1. ISO/TC37/SC2/WG1 (Language Codes)

I attended ISO/TC37/SC2/WG1 (Language Codes) in London, and its parent SC ISO/TC37/SC2 (Coding Systems). Its convenor, and the project leader (i.e. project editor) for ISO 639-1, was extremely apologetic that the time for this meeting was limited to 90 minutes, despite the importance of ISO 639-1. Following national body votes and comments, in a postal ballot and at the ISO/TC37/SC2/WG1 meeting, ISO 639-1 would replace ISO 639 (which is all that RFC 1766 Language Tags refers to normatively).

Voting on the DIS ensures that ISO DIS 639-1 will automatically become a standard. That is, the 2-letter codes in ISO 639-1 will replace the 2-letter codes in ISO 639.

It is ISO 639 that is referred to normatively in some ISO/IEC JTC1/SC22/WG20 standards. ISO/IEC JTC1/SC22/WG20 will need to consider whether its standards need to be updated in this regard. ISO/IEC JTC1/SC22/WG20 will also need to consider whether its standards need to allow the inclusion of 3-letter codes from ISO 639-2.

2. RFC 1766: Language Tags

There is currently a review of RFC 1766: Language Tags. There is too little flow of information between the review group and ISO/TC37/SC2, which is responsible for Language Codes.
This may lead to versioning problems between ISO 639 and its successor, and RFC 1766 and its successor, with a few "loose" ambivalent codes (Hawai'ian is one example) that could impact on IT systems. IT end-users who are not active in ISO/TC37/SC2/WG1 or the ISO 639 Joint Advisory Committee would find it difficult to work out where these ambivalences lie.

ISO/IEC JTC1/SC22/WG20 needs to keep an eye on potential problems here, and if necessary to suggest delays in some aspects of the RFC 1766 development process if there is any danger of versioning.

3. ISO/TC37/SC2 (Coding Systems)

I also attended the parent SC ISO/TC37/SC2 (Coding Systems). Aat Vervoorn, the current SC chair, stepped down.

The good news in relation to ISO/TC37/SC2 is that Gerhard Budin (Austria) is the strongest candidate to replace him. He is very aware of IT issues, and of language codes in IT standards, being involved in various projects funded by the European Commission (notably the current SALT project) and sometimes in CEN/TC304: Information and Communications Technologies: European Localization Requirements.

However, the bad news is that the vagaries of ISO voting could mean that one of the other candidates is elected.

4. ISO 639-3

Gerhard Budin also proposed a NWI provisionally known as "ISO 639 part 3"; ISO CS still has to decide whether it receives that number or another number.

"ISO 639 part 3" goes towards providing codes for more language entities (the lack of which is a criticism that has been leveled against ISO 639 and ISO 639 part 2). It also goes towards providing a more structured mechanism for combining language codes with other codes (country codes from ISO 3166, and potentially from ISO 3166 part 2, and script codes from the draft ISO 15924) than is currently provided by any part of ISO 639 or RFC 1766.

It would aim to overcome the "versioning problem" between ISO 639 and RFC 1766 and their successors.
However, there is (yet again) a chance of ISO 639 developments and RFC 1766 developments (or strictly speaking the development of the successor to RFC 1766) getting out of step with each other through versioning problems if the IETF group sets the successor to RFC 1766 in stone too rigidly. In my view it may be better for that group to hold back on some areas.

5. ISO 639-2 (Language codes) and ISO/TC46/SC4/WG1

For the 3-letter codes in ISO 639-2 (bibliographic codes - not restricted in practice to bibliographic use) there are restrictions on which languages can get codes. A "number of (written) documents" barrier has to be overcome: for each language considered for addition, there is supposed to be proof of 50 documents, even though many languages in the standard itself fail to meet that criterion.

The 3-letter codes are also being considered for inclusion in the successor to RFC 1766. Only 2-letter codes are normatively referred to in RFC 1766 itself, and only 2-letter codes are normatively referred to in standards of ISO/IEC JTC1/SC22/WG20.

6. The ISO 639 Joint Advisory Committee (Language Codes)

The ISO 639 Joint Advisory Committee seems to have replaced in practice the moribund ISO 639 Joint Working Group. The ISO 639 Joint Advisory Committee is also very slow, and fails to meet IT needs.

It did add several languages from the draft ISO 639-1 (ensuring some, but not total, compatibility between ISO 639-1 and ISO 639-2) at its last meeting in February 2000. In addition to that it approved only 6 codes at that meeting, and approved one further code - for Lower German/Lower Saxon (also used in the Netherlands, and distinct from German or any of its dialects) - although it decided on a new code, rather than using the existing 3-letter code which is in use in UK and Swedish bibliographic standards.
The ISO 639 Joint Advisory Committee also ignored requests from the UK for several additional languages: the convenor, despite several UK requests, did not distribute the paper requesting these codes.

There has been some disarray in the ISO 639 Joint Advisory Committee (JAC): ISO procedures have not been followed, some documents submitted by JAC Observers were not distributed, and JAC membership issues have not always been clear. These problems were largely exacerbated by a very public email row/rant between Marion Gunn and Michael Everson, despite both acting as alternates for each other on the ISO 639 Joint Advisory Committee, both representing Ireland on various ISO committees and both being partners in the same company in Dublin.

7. IUC (International Unicode Conference)

At the IUC (International Unicode Conference) Peter Constable of the Summer Institute of Linguistics (SIL) presented a paper which takes a measured view of the various problems outlined above.

SIL has been approached in the past by Unesco, and by IT industry representatives, to use the 3-letter Ethnologue codes (or SIL codes) in various contexts and applications. There have also been recent discussions on the iso639@dkuug.dk list on these issues.

There seems to be more interest from the IT industry in these codes and/or these entities than in the ISO 639 codes. There are some issues of language definition in the SIL codes that need to be addressed, but fewer than in the ISO 639 codes (which name but do not define or identify the language).

It seems likely that work on "ISO 639 part 3" could also relate to this.

Given the IT industry interest, and also the likely convergence of de facto, RFC and ISO codes, it would be preferable in my view for ISO/IEC JTC1/SC22/WG20 (and for IETF) to await further developments in language coding in this area ("ISO 639 part 3" and SIL codes) before updating anything relating to standards of ISO/IEC JTC1/SC22/WG20.
I hope that Peter Constable's IUC paper can be distributed as an ISO/IEC JTC1/SC22/WG20 paper.

John Clews
20 September 2000.

--
John Clews, SESAME Computer Projects, 8 Avenue Rd, Harrogate, HG2 7PG
tel: +44 1423 888 432; fax: +44 1423 889061; Email: Converse@sesame.demon.co.uk

Committee Chair of ISO/TC46/SC2: Conversion of Written Languages;
Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Committee Member of ISO/TC37: Terminology;
Committee Member of the Foundation for Endangered Languages.