2.2.7 Internationalized Domain Name (idn)

NOTE: This charter is a snapshot of the 48th IETF Meeting in Pittsburgh, Pennsylvania. It may now be out-of-date. Last Modified: 17-Jul-00

Chair(s):

James Seng <jseng@pobox.org.sg>
Marc Blanchet <Marc.Blanchet@viagenie.qc.ca>

Internet Area Director(s):

Thomas Narten <narten@raleigh.ibm.com>
Erik Nordmark <nordmark@eng.sun.com>

Internet Area Advisor:

Erik Nordmark <nordmark@eng.sun.com>

Technical Advisor(s):

John Klensin <klensin@jck.com>
Harald Alvestrand <Harald.Alvestrand@maxware.no>

Mailing Lists:

General Discussion: idn@ops.ietf.org
To Subscribe: idn-request@ops.ietf.org
Archive: ftp://ops.ietf.org/pub/lists/idn*

Description of Working Group:

The goal of the group is to specify the requirements for internationalized access to domain names and to specify a standards track protocol based on the requirements.

The scope of the group is to investigate the possible means of doing this and which methods are feasible, given their technical impact on the use of such names by humans and by application programs, and on other users and administrators of the domain name system.

A fundamental requirement in this work is to not disturb the current use and operation of the domain name system, and for the DNS to continue to allow any system anywhere to resolve any domain name.

The group will not address the question of what, if any, body should administer or control usage of names that use this functionality.

The group must identify consequences to the current deployed DNS infrastructure, the protocols and the applications as well as transition scenarios, where applicable.

The WG will actively ensure good communication with interested groups who are studying the problem of internationalized access to domain names.

The Action Item(s) for the Working Group are

1. An Informational RFC specifying the requirements for providing internationalized access to domain names. The document should provide guidance for developing solutions to this problem, taking localized (e.g. writing order) and related operational issues into consideration.

2. An Informational RFC or RFCs documenting the various proposals and implementations of internationalization (i18n) of domain names. The document(s) should also provide a technical evaluation of the proposals by the Working Group.

3. A standards track specification on access to internationalized domain names including specifying any transition issues.

Goals and Milestones:

Feb 00  First draft of the requirements document
Mar 00  Presentation and discussion at IETF-Adelaide
May 00  Second version of the requirements document
Jul 00  Final discussion on the requirements document
Aug 00  Requirements document WG last call
Sep 00  First draft of comparison document
Dec 00  Final discussion of comparison document
Dec 00  Protocol RFC first draft
Jan 01  Comparison document WG last call
Mar 01  Protocol RFC second draft
Mar 01  Transition RFC first draft
Jun 01  Protocol RFC WG last call
Jun 01  Transition RFC second draft
Sep 01  Transition RFC WG last call

Internet-Drafts:

No Request For Comments

Current Meeting Report

Internationalized domain names (idn)
wg meeting notes
IETF Pittsburgh, Aug 2000

Notes taken by David Conrad. Thanks, David.

Agenda bashing -- no changes.

1. Marc Blanchet: WG update
1.1 New revision of the charter since the last meeting. Major changes:

1.2 New working group web site: http://www.i-d-n.net

1.3 RFC 2026 reiteration

2. Requirements Draft (James Seng for Zita Wenzel)

James, as WG co-chair, is no longer editing the requirements draft.

No presentation; will go through the ID and highlight the important bits.

Version 3 removes many of the requirements of version 2, which was felt to have too many (35). Likely no proposal could meet all the requirements in v2. We spent 3 months going through the requirements to see what could be removed, what would be nice to have, etc.

We have come to a consensus that we should use Unicode as the base character set. Any proposal which uses localized encoding will not meet IDN requirements.

New section to clarify difference between hostnames and domain names.

Graphic representation of DNS architecture/infrastructure from Harald included. Focus our energy on the big box in the diagram (forwarding, caching, parent-zone, and root server). Will consider the other boxes, but not the focus.

KM: Most important parts aren't in the picture. If you concentrate on the wire protocol and don't consider users, then the effort will fail. Must consider the wider picture. Thorny issues lie in non-protocol interactions.

JK: Computers don't care. WG is important due to the interaction of people with computers.

JS: We won't ignore the other aspects, but must remain focused on what must be done, not on what is outside of WG scope. IAB has an RFC on internationalization that addresses things the WG should consider. If we can't solve the basics, then we can't go on to the next steps.

DC: There is a standard that we make that isn't over the wire. In constrained circumstances -- business card model -- we must deal with non-protocol stuff. What can go on business cards will affect what we're doing.

Requirements:

AB(?): Which Unicode version? (3.0), bidirectionality? (yes)

KM: Fundamental assumption appears to change the DNS -- I don't see that as appropriate for a requirements document. The interactions you care about are app to app, app to user, and user to app -- none affect the DNS. Stuff that happens at higher layers is much more important than what happens on the wire.

JS: Does the reqs doc give the impression that the DNS is to be changed?

KM: There are implications, yes. The focus is on the DNS protocol but the problem is higher up.

MB: Will your concerns be sent to the mailing list?

KM: Yes

JS: Didn't mean to give the impression that the DNS was going to change.

HA: When thinking of the DNS as a set of services, if we are to keep sane, then we should think of internationalized equivalents as new services that are to be made available, not as changes to existing services. Mapping of name to address should have two services: map as we know it and map as the future may require. We shouldn't expect to convert applications by switching lower layers. The new services might not work exactly the same way as the existing services.

MB: Will you write a draft about this idea?

JK: you can assume a draft will appear

KM: I agree with Harald. I believe there is a whole set of missing requirements for incremental deployment. You have to have the least possible disruption. Changes must be independent of each other.

JS: see requirement 10.

JI: this is a problem we should not be solving. What problem are we trying to solve?

JS: this should go to the mailing list.

3. RACE (Paul Hoffman)

draft-ietf-idn-race

How to do an ASCII-compatible representation of internationalized characters.
This proposal does not specify how it is to be used.

Fully compatible with today's DNS.

3 step process:

Prefix will change.

Each name part must be at most 63 octets to conform with the existing DNS. RACE favors names that are all in one row.

Can get up to 35 characters if single row.

Can get up to 17 characters if two or more rows and one of the rows is non-zero.

Can get 17 to 33 characters if using two or more rows but also using row 0.

RACE is an ACE format (ace-1 in the comparison document). It includes an identifying mechanism for ace-2, namely ace-2.1.1.
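The character counts above follow from simple arithmetic. The sketch below reproduces them under stated assumptions: a 4-character ACE prefix (the RACE draft used "bq--", but the notes say the prefix will change), base32 output carrying 5 bits per label character, and one header octet ahead of the encoded data. It is not the draft's full algorithm; it only computes the budgets and which rows (upper octets of the UTF-16 code units) a name uses. All function names are hypothetical.

```python
from typing import Set

MAX_LABEL_OCTETS = 63   # DNS limit on a single label
PREFIX_LEN = 4          # assumption: length of the ACE prefix, e.g. "bq--"
BITS_PER_CHAR = 5       # base32: each output character carries 5 bits


def octet_budget(prefix_len: int = PREFIX_LEN) -> int:
    """Octets of encoded data that fit in one label after the prefix."""
    return (MAX_LABEL_OCTETS - prefix_len) * BITS_PER_CHAR // 8


def max_chars_single_row(prefix_len: int = PREFIX_LEN) -> int:
    # One header octet names the shared row; each character then costs 1 octet.
    return octet_budget(prefix_len) - 1


def max_chars_uncompressed(prefix_len: int = PREFIX_LEN) -> int:
    # One header octet marks "no compression"; each character costs 2 octets.
    return (octet_budget(prefix_len) - 1) // 2


def rows_used(name: str) -> Set[int]:
    """Upper octets ("rows") of the UTF-16 code units in name."""
    data = name.encode("utf-16-be")
    return {data[i] for i in range(0, len(data), 2)}
```

With these assumptions the budget is 36 octets, giving 35 characters for a single-row name and 17 for an uncompressed one, which matches the figures quoted. A Thai name such as "ไทย" uses only row 0x0E, so it falls in the single-row case.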

HA: Have you considered using UTR-6? Yes; you don't get a lot of advantage, and UTR-6 does a lot of bit shifting, which will be hard to implement.

KM: do you define a canonicalization form? No.

KM: Are there multiple outputs? No. Another reason not to use UTR-6.

AG: strange to propose ways of compressing into 63 ascii since the wire format doesn't care -- the 63 limitation is at the resolver. Yes.

AG: restrictions are likely not per label.

JK: applications are likely to make bogus assumptions.

BS: not using ACE on the wire? Yes.

BS: what is ace expecting to receive? it is expecting unicode code points. input to the compression is utf-16.

4. UDNS (Paul Hoffman for Dan Oscarsson)

draft-idn-udns-00.txt

Attempts to be a full protocol specification: how do you flag IDN awareness in DNS queries so that IDNs can be handed back? If not flagged, you must not give back internationalized names.

How to flag: use the IN bit in the DNS query. Last unused bit in the second word. Arguably safe.

Proposes UCS normalization form C encoded in utf-8 with an ACE for backwards compatibility.
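A minimal Python sketch of the UDNS idea, assuming the "IN" bit is the one remaining Z bit of the DNS header flags word (mask 0x0040, next to the AD and CD bits). The helpers `build_header`, `encode_label`, and `idn_flag_set` are hypothetical, not taken from the draft. Labels are normalized to form C and encoded in UTF-8 as the draft proposes; the ACE fallback for backwards compatibility is not shown.

```python
import struct
import unicodedata

IN_BIT = 0x0040  # assumption: the last unused (Z) bit of the flags word
RD_BIT = 0x0100  # recursion desired


def build_header(query_id: int, idn_aware: bool) -> bytes:
    """12-octet DNS header: ID, flags, QDCOUNT=1, ANCOUNT, NSCOUNT, ARCOUNT."""
    flags = RD_BIT
    if idn_aware:
        flags |= IN_BIT  # tell the server we can handle internationalized names
    return struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)


def idn_flag_set(header: bytes) -> bool:
    """Server side: was the query flagged as IDN-aware?"""
    _, flags = struct.unpack("!HH", header[:4])
    return bool(flags & IN_BIT)


def encode_label(label: str) -> bytes:
    """Length-prefixed wire label: normalization form C, then UTF-8."""
    data = unicodedata.normalize("NFC", label).encode("utf-8")
    if len(data) > 63:
        raise ValueError("label exceeds 63 octets")
    return bytes([len(data)]) + data
```

Because the label is normalized first, a decomposed "e" plus combining acute accent goes on the wire as the same two UTF-8 octets (0xC3 0xA9) as a precomposed "é".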

DC: how does the length limit issue affect idn?

PH: UTF-8 restricts the length of non-English IDNs.

OG: deployment problems due to forwarding or recursive servers -- some servers blindly copy those bits.

PH: Right.

MA: Broken servers are broken servers. Don't try to work with them.

JS: On length issues, Thai names are very very long.

DC: some length limit is a fact of life.

PH: yup.
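PH's point about length can be made concrete. Assuming the 63-octet label limit and plain UTF-8 with no compression, the number of characters that fit depends on how many octets each character needs. The helper `max_chars_utf8` is hypothetical.

```python
MAX_LABEL_OCTETS = 63  # DNS limit on a single label


def max_chars_utf8(sample_char: str) -> int:
    """How many copies of sample_char fit in one label when encoded as UTF-8."""
    return MAX_LABEL_OCTETS // len(sample_char.encode("utf-8"))


for ch in ("a", "é", "ก", "中"):
    print(ch, len(ch.encode("utf-8")), max_chars_utf8(ch))
```

ASCII letters get 63 characters, accented Latin letters (2 octets) get 31, and Thai or Han characters (3 octets each) get only 21, which is why long Thai names are a concern.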

5. ICU (Hyewon Shin)

draft-ietf-idn-icu

Uses IN bit to identify queries
Use UTF-8 as wire format
Case folding/canonicalization before transmission

IN bit indicates whether the query is from an IDNS resolver/server or not, and reduces the overhead of canonicalization

unicode as CCS
utf-8 as CES
all domain names queries should be encoded into Unicode before being
used in resolvers.
resolvers convert the queries into UTF-8

Case folding is locale-independent and done before transmission, as indicated by the IN bit

Valid query formats are identified with the IN bit

JS: change the title of your internet draft -- calling it the architecture of internationalized domain names is misleading.

PH: you talk about case folding, but you don't talk about canonicalization of the more complex stuff (a+umlaut vs. a-umlaut). Will non-canonicalized names get passed to the resolver? canonicalization is not addressed. it would be done at the same place as case folding.

BS: the resolver does the UTF-8 encoding -- what is the application sending to the resolver? we assumed unicode.

DC: proposing the creation of a parallel DNS service? yes.

DC: do you discuss interworking with existing DNS? not yet.
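PH's question about canonicalization can be illustrated with a small Python sketch. Python's `str.casefold()` stands in for the ICU draft's locale-independent case folding (an assumption; the draft's folding table is not given here), and `icu_style_prepare` is a hypothetical helper. Folding alone leaves a precomposed Ü and a U followed by a combining diaeresis distinct; only an extra NFC step, which the draft does not yet specify, makes them match.

```python
import unicodedata


def icu_style_prepare(name: str, normalize: bool = False) -> bytes:
    # Locale-independent case folding before transmission (casefold() is an assumption).
    prepared = name.casefold()
    if normalize:
        # Not part of the ICU draft: NFC canonicalization, the step PH asks about.
        prepared = unicodedata.normalize("NFC", prepared)
    return prepared.encode("utf-8")


composed = "M\u00dcNCHEN"      # Ü as a single code point
decomposed = "MU\u0308NCHEN"   # U followed by a combining diaeresis

print(icu_style_prepare(composed) == icu_style_prepare(decomposed))              # False
print(icu_style_prepare(composed, True) == icu_style_prepare(decomposed, True))  # True
```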

6. Microsoft's approach (Stuart Kwan)

draft-skwan-utf8-dns-04.txt

Microsoft had a requirement to move people off WINS. WINS allowed the use of Unicode names.

KM: who imposed the requirement

SK: WINS didn't scale.

In Win2K client can initiate query with unicode name, resolver converts names to UTF-8 (does not downcase).

On the server side, database load downcases. On query, downcase and do a byte-for-byte comparison.

Very few changes since -00 draft. Win2K implementation hasn't changed.

Biggest flaw: there is no normalization. Not sure what the best approach is.
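The Win2K behavior described above can be modeled with a toy zone, assuming Python's `str.lower()` approximates the server's downcasing (the actual rules are not stated in these notes). The `Zone` class and `client_resolve` function are hypothetical. The second lookup fails because nothing normalizes the name, which is the flaw SK acknowledges.

```python
from typing import Dict, Optional


def client_resolve(name: str) -> bytes:
    # Client side: convert the application's Unicode name to UTF-8; no downcasing.
    return name.encode("utf-8")


class Zone:
    """Toy zone: names are downcased on load and on query, then compared byte-for-byte."""

    def __init__(self) -> None:
        self._records: Dict[bytes, str] = {}

    @staticmethod
    def _downcase(utf8_name: bytes) -> bytes:
        # Assumption: str.lower() approximates the server's downcasing rules.
        return utf8_name.decode("utf-8").lower().encode("utf-8")

    def load(self, utf8_name: bytes, address: str) -> None:
        self._records[self._downcase(utf8_name)] = address

    def query(self, utf8_name: bytes) -> Optional[str]:
        # Exact byte comparison via dict lookup; no Unicode normalization.
        return self._records.get(self._downcase(utf8_name))


zone = Zone()
zone.load("B\u00fcro.example".encode("utf-8"), "192.0.2.1")
print(zone.query(client_resolve("B\u00dcRO.EXAMPLE")))   # 192.0.2.1
print(zone.query(client_resolve("Bu\u0308ro.example")))  # None
```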

Would like to be published as informational.

Michael Patton: make editorial changes to update the draft with what you've learned.

SK: there is a big emphasis to only use these names when absolutely necessary. But we'll update the draft as requested.

PH: You mention WSALookupServiceBegin/Next, but that doesn't appear in the draft.

SK: Application gives us unicode and we turn it to utf-8.

PH: so this sends utf-8 over the net.

SK: yes. this tends to be self-correcting.

PH: needs to be discussed in the document.

SK: OK.

BS: Any experience with existing applications?

SK: userbase is too large to poll, but nobody has complained.

SK: Microsoft will implement the idn standard when ready.

7. IDNE (Marc Blanchet)

Until a month ago, no proposal using EDNS.

Rationale:

Description

strings in labels are pre-processed

using edns,

current maximums:

idne maximum are:

rationale:

idn api

transition and deployment

Enhancements?

Yergeau proposed major and minor revision numbers

Language tagging?

Compression needed?

MA: extending total overall length of a name is problematic.

MB: yes but application must be IDN aware.

JS: language tagging using plane 14?

MB: Yes, since using EDNS gives more space.

Are you using stateful encoding?

MB: No.

OG: Very good first start. Since you use EDNS, you only use modern servers and you can determine if downstream servers can work with EDNS.

DC: Your statement that near term may replace the long term is very insightful. A lot of pressure now to deploy.

MB: Yes.

8. Name Preparation (Paul Hoffman)

draft-ietf-idn-nameprep

Requirements:

current order:

Possible alternative

open issues

KM: locale specific feedback mechanism may imply the DNS is simply unsuitable to do internationalization.

PH: Yes. Currently no documents xxx

Where do we do name preparation?

3 places possible:

There are reasons to do it in each and really good reasons to not do it in each. document is neutral.

TH: It seems hard to get the error conditions out of the 4-step model. Is there any other group who can solve this, since we don't have the expertise to do this?

PH: No one has stood up to this task.

JK: There are a few organizations who have looked at this and run away.

PH: individuals at those organizations have indicated they'd help

DC: dns service will guarantee failure -- there will be enough infrastructure change that it'll take years to deploy. What is the goal? DNS has no semantics on the strings. Probably in the resolver.

JS: reverse logic -- forbid characters by default, permit specific characters.

PH: they are equivalent.

JS: easier to check what you want than what you don't want.
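JS's "forbid by default" approach amounts to an allowlist. The ranges below are hypothetical, chosen only to illustrate the check, and `check_label` is a hypothetical helper: anything outside the ranges, including characters that a future Unicode version adds, is rejected until someone deliberately permits it.

```python
from typing import List, Tuple

# Hypothetical allowlist of permitted code point ranges (inclusive).
ALLOWED_RANGES: List[Tuple[int, int]] = [
    (0x002D, 0x002D),  # hyphen-minus
    (0x0030, 0x0039),  # digits 0-9
    (0x0061, 0x007A),  # lowercase a-z (uppercase assumed folded earlier)
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs block
]


def permitted(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ALLOWED_RANGES)


def check_label(label: str) -> List[str]:
    """Return the characters in label that are not explicitly permitted."""
    return [ch for ch in label if not permitted(ch)]
```

For example, `check_label("a_b")` reports the underscore, and `check_label("中文")` reports nothing.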

=====================================================================

1. Using DNS for Canonicalization data

Items that should be defined as local canonicalization rules:

Advantages:

Disadvantages:

How to provide:
define usable characters as txt rrs
define meta information as txt rr

use idn.arpa domain for table definition

norm-form - early normalization method name
norm-form-version
norm-form-url

"." - a character the same as the label
".^" - a character the same as label but not allowed as the first (e.g., '-')
".$" - same as above but can't be used as the last
"a" is the character itself

how it works:

Issues:

Using DNAME to reduce queries

add DNAME and CNAME for each canonicalized character
use DNAME instead of TXT

Issues for this method:

OG: Both DNAME and CNAME are terminal nodes. Also, CNAME can't be used with anything else but security records. You assume servers don't do case folding, but they do. DNAME can't point up in the hierarchy.

LJL: suppose I want to register a name in .JP. Rules are connected to the TLD and not to the language -- the rules should be connected to the language not the nation. Rules are fixed per name.

Using TXT: won't the resolver get confused? IDN.ARPA being recursive won't work. How does this work with DNSSEC?

YY: TLD defines the rules for registration.

JS: can we move it to the mailing list? The TLD defines what characters are valid.

HA: What is the advantage of this per-character approach? Why not put a POSIX locale definition into a domain?

YY: DNS is not the best method of doing this, but this is used for the DNS, so only the DNS is being used.

HA: revise the proposal without storing character data in the DNS.
Also think about how this works with clients that do not have the code and what the size of the client code will be and how often you expect the client to upgrade. I think this approach has some good things, needs more discussion.

2. Han ideograph for IDN (James Seng)

why I did this draft:

HI are CJK characters composed of radicals, which are made of simple strokes

HI originated from China

HI commonly used in China Japan Korea Taiwan Hong Kong Singapore Malaysia

Case folding:

Z-variants are HI that share the same etymology but whose glyphs vary in some minor way; they should be considered equivalent

Chinese:

there are 2244 SC in last official count and Unicode has 2145

there are multiple TC for one SC. SC-to-TC is almost impossible (needs context information)

TC-to-SC may be workable -- may not be perfect, but it can work.

TH: we should do code point to code point because mapping will make thing far too hard due to the need for contextual information.

JS: please read the draft -- just discussing the issues.

TC and SC aren't usually mixed.

TH: not true.

SC and TC are seldom used in the same phrase. You can solve mapping using CNAME and DNAME.
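Why TC-to-SC folding is workable while SC-to-TC is not can be shown with a handful of real character pairs: simplification is many-to-one, so folding toward SC is a plain per-character function, but inverting it yields several candidates that only context can choose between. The five-entry table is illustrative only, and the helper names are hypothetical; a real table would come from a published mapping.

```python
from collections import defaultdict
from typing import Dict, Set

# Tiny illustrative TC -> SC table (a real one has thousands of entries).
TC_TO_SC: Dict[str, str] = {
    "發": "发",  # emit, issue
    "髮": "发",  # hair
    "國": "国",  # country
    "後": "后",  # after
    "后": "后",  # queen (unchanged by simplification)
}


def fold_to_sc(name: str) -> str:
    """TC-to-SC folding: unmapped characters pass through unchanged."""
    return "".join(TC_TO_SC.get(ch, ch) for ch in name)


def sc_to_tc_candidates() -> Dict[str, Set[str]]:
    """Invert the table: one SC character may come from several TC characters."""
    inverse: Dict[str, Set[str]] = defaultdict(set)
    for tc, sc in TC_TO_SC.items():
        inverse[sc].add(tc)
    return dict(inverse)
```

Folding "發髮" gives "发发" unambiguously, but going back from "发" leaves two candidates, 發 and 髮.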

Korean:

Hangul is more commonly used now instead of Chinese derived characters. Hangul doesn't have meaning like Chinese ideographs.

Korean has its own ideographs with simplified forms.

Japanese:

Kanji, Hiragana and Katakana. Kanji is based on Chinese. Hiragana is a syllabary.

Japanese in written form is a vocal script which maps how it is pronounced fairly accurately. Most verbs and nouns are written in Kanji.

Depending on context, pronunciation may be different. Conversion between hiragana and kanji is not practical.

Japanese has its own ideographs (kokuji) with simplified forms.

Ideograph Description:

The same characters can be constructed in multiple ways.

Mechanism:

HI may or may not be folded for the comparison of domain names.
Folding may occur at

In particular, folding during registration time is critical for operational reasons even if we do not adopt any Han folding.

HA: to summarize, one of the real problems is that when someone presents a domain name, they feel they have the right to all the variants, e.g., a chinese name can be represented with different characters in Korean chinese chars and Japanese chinese chars.

PH: saying people will expect certain things. We shouldn't listen to what will be legal or not -- just focus on languages.

3. DNSII-MDNP (Edmon Chung, David Leung)

written an internet-draft, but missed deadline.

been working on this for more than a year

goal: put all the characters into the internet

dnsii protocol has two parts

Prefer label type 10 over EDNS (01), since the 3rd and 4th bits are kept for future expansion. EDNS reduces the possibilities for future expansion.
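The bit patterns under discussion are the top two bits of a label's first octet in the DNS wire format. RFC 1035 uses 00 for an ordinary label and 11 for a compression pointer; RFC 2671 assigns 01 to extended label types (EDNS). DNSII-MDNP proposes using the remaining pattern, 10. The function below only classifies an octet; it is a sketch, not part of either proposal.

```python
def label_type(first_octet: int) -> str:
    """Classify a DNS wire-format label by the top two bits of its first octet."""
    top = (first_octet >> 6) & 0b11
    return {
        0b00: "normal label (length in low 6 bits)",  # RFC 1035
        0b11: "compression pointer",                   # RFC 1035
        0b01: "extended label type (EDNS)",            # RFC 2671
        0b10: "reserved (proposed for DNSII-MDNP)",
    }[top]
```

The 6 remaining bits of a normal label's first octet carry its length, which is where the 63-octet label limit comes from.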

there should be no ambiguity

from RFC 2277: all protocols must identify for all character data which charset is in use.

compression and edns will still work as expected

should not require any adjustment to dnssec or ipv6

charset encoding:

canonicalization:

Han folding is similar to treating color and colour identically.

we have working code.

this approach is patented

HA: Nameservers everywhere must be able to convert between all 400 character sets, right?

An implementation decision.
HA: What do you do when it encounters a character set it doesn't understand?

The fallback should be UTF-7; if the name still can't be found, return an error.

HA: So the client must know how to convert?
No. Take it to the mailing list.

OG: Two observations. I strongly encourage everyone to use EDNS; label types allow clients to discover capabilities on the server. Don't worry about saving a few bits. Think more about how ideas should be expressed.

DC: One of the considerations of EDNS we worry that the second byte is a second count.

OG: send a note to namedroppers on protocol issues.

PF: Two questions. First, to re-emphasize what Harald says: from the email world, it is best to use as limited a set of characters as possible, and do client-side conversion into simple character sets. Second, what about future character sets? You will have to fall back all the time.

DC: We encourage the use of Unicode.

PF: For this to not become a local solution it should be done as close as possible to the client.

4. Evaluation of proposed Encodings for IDN (Yashuhiro Morishita)

mDNKit -- multilingual domain name evaluation kit
objectives:

Developed by JPNIC, released Jul 13

components:

evaluated drafts:

evaluation points
limitations of usage

RACE and UTF-5 are ASCII-compatible encodings

utf8 has incompatibilities with present DNS

ace strings need identifier to distinguish from normal ascii strings.
-- uses ra--

utf-5 uses zld

charset encoding/conversion tools are essential

RACE is currently the best method suitable for transition
UTF-8 is incompatible with the current DNS
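The incompatibility is easy to see against the host-name rule that current software expects (letters, digits and hyphen per RFC 1123, not starting or ending with a hyphen): raw UTF-8 octets fail it, while any ACE output made only of letters, digits and hyphens passes. The helper `ldh_ok` is hypothetical, and the ACE-style string in the usage is made up for illustration, not real RACE output.

```python
import re

# RFC 1123 host-name label: letters, digits, hyphen; no leading or trailing hyphen.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def ldh_ok(label: bytes) -> bool:
    try:
        text = label.decode("ascii")
    except UnicodeDecodeError:
        return False  # non-ASCII octets, e.g. raw UTF-8
    return LDH_LABEL.fullmatch(text) is not None


print(ldh_ok("中文".encode("utf-8")))  # False
print(ldh_ok(b"bq--abcd"))             # True
```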

todo:

LM: If you evaluate cut and paste, with most of these systems cut and paste doesn't work very well. Did any of these systems have usable cut and paste?

YM: RACE is the best current method for cut and paste since it uses only ASCII.

LM: if you have a display method that show race in ASCII and you cut/paste it how do you have interoperability between application and display?

JS: properly implemented system will use MIME on the paste.

5. NuBIND Implementation (Bill Semich)

original goal was to internationalize BIND
maximum support for internet standards

rfc2277, 2279, iso-10646, Unicode UTR15 and 21

3 components:

JK: intellectual property?

BS: Not submitting this as an IETF submission. The "nubind" name is trademarked.

Current implementation status:

mail servers, others unexpectedly fail
legacy dns servers

security considerations

application problems

client environment problems
dhcp server configuration
host client with an IDN resolver setting
all current Unix resolvers support ASCII only

http server problems

postpone implementation of IDN in the DNS until minimal-impact standards or alternatives are accepted
minimize impact on DNS

Why use the Internet infrastructure to achieve application goals?

JS: will you submit an ID?

BS: Can submit

TN: need to remove copyright notice in presentation.

BS: take it offline.

6. Comparison of IDN proposals (Paul Hoffman)

draft-ietf-idn-compare

wrapup of technical presentations. talking about comparison doc.

Basic idea of doc is to describe significant features that we need to think about. Includes pros and cons and what features are really needed.

sections of docs

arch:

will be updated to add details about what is sent between app<->resolver, resolver<->server, server<->server

names in binary

will be updated to add where the different markings will be used

names in ascii

prohibited characters

BM: had a machine called <ctrl-s>. There is a distinction that needs to be made between hostname and domain names.

PH: Yes.

MA: reference RFC is 952, not 1035

JS: lot of confusion on this issue.

HA: this is in the requirements doc

canonicalization

transitions

The draft will be updated to add specific details on transition and on what needs to be transitioned.

EN: are there drafts that talk about transitions?

PH: No. Drafts on transitions do not need to be associated with a proposal.

PF: need clear distinction between Unicode consortium work and this WG.

PH: part of transition will include how to get groups outside the us to transition with us.

ISO has groups which determine code points in the repertoire. Unicode consortium does not add code points to ISO standards, ISO does. As such, we don't need to liaison with ISO -- just need to be aware of what ISO does in this space.

Root server considerations

JS: RS ops worried about operational implications, e.g., how the RS op will verify data is correct.

Security considerations:

Expected changes:

please specify categories

please talk about this on the mailing list

draft will be updated within a few weeks

BS: might be important to list patent and IP issues.

PH: maybe. will defer to the AD.

EN: we already have a process to do this.

PH: might be worthwhile to list IPR IETF has been notified of.

MB: I can put it on the website

EN: use a generic notice, not a listing of IPR

7. Working Group Next Steps (Marc Blanchet)

Requirements doc

Need minor revisions. Pretty ready to move to last call.

EN: there are some items that need to be resolved.

HA: need to look at Keith's comments. The chairs/author should declare that the comments on the draft should be 'identify problem, old text, new text'. No comments will be accepted that aren't in this form. Have a hard deadline (2 weeks).

MB: document editors agree?

JS: yes.

MB: OK.

Comparison document

JS: should consider transition period.

HA: could discuss the transition properties without a proposal, but mechanisms will depend on proposals.

MB: wg agreement to keep it going.

3 types of solutions

Want one protocol at the end.

Convergence process:

EN: list 3 solutions, but no proposals to converge. should focus on the dns proposal

OG: Good way to proceed. Might be better to 'cherry pick' from all the proposals; maybe use the authors as the design team.

JK (pretending to be Zita): she wants to reiterate taking discussions to the list. Also agrees with Harald's proposal.

HA: in the solution space, trying to converge on the best dns based solution. if we can't come up with a solution that meets the requirements or can't be deployed, then we look at other approach.

BS: a long term solution may be more appropriate to look at than short term.

JS: other solutions not using the DNS may exist, e.g. directory based solution.

Everyone agree to the process? Need to set up the design team -- please send mail to MB.

AB: Aaron Brunner
AG: Andreas Gustafsson
BM: Bill Manning
BS: Bill Semich
DC: Dave Crocker
EN: Erik Nordmark
HA: Harald Alvestrand
KM: Keith Moore
JK: John Klensin
JI: John Ioannidis
JS: James Seng
LJL: Lars-Johan Liman
LM: Larry Masinter
MA: Mark Andrews
MB: Marc Blanchet
OG: Olafur Gudmundsson
PF: Patrick Falstrom
PH: Paul Hoffman
TH: Ted Hardie
TN: Thomas Narten

Slides

Comparison of Internationalized Domain Name Proposals
DNSII-MDNP
Evaluation of Proposed Encodings for IDN by mDNkit
Han Ideograph for IDN
Architecture of Internationalized Domain Name System
IDN Using EDNS (IDNE)
Name Preparation in IDN
WG Next Steps
RACE: Row-based ASCII Compatible Encoding for IDN
(draft-skwan-utf8-dns-04.txt)
Using DNS for Canonicalization Data
Using Universal Character Set Data in the DNS
IDN WG Update