#=======================================================================
# File name: GUJARATI.TXT
#
# Contents: Map (external version) from Mac OS Gujarati
# encoding to Unicode 2.1 and later.
#
# Copyright: (c) 1997-2002, 2005 by Apple Computer, Inc., all rights
# reserved.
#
# Contact: charsets@apple.com
#
# c02 2005-Apr-05 Update header comments. Matches internal xml
#     <c1.1> and Text Encoding Converter 2.0.
# b3,c1 2002-Dec-19 Update URLs. Matches internal utom<b1>.
# b02 1999-Sep-22 Update contact e-mail address. Matches
#     internal utom<b1>, ufrm<b1>, and Text
#     Encoding Converter version 1.5.
# n02 1998-Feb-05 First version; matches internal utom<n4>,
#
# Apple, the Apple logo, and Macintosh are trademarks of Apple
# Computer, Inc., registered in the United States and other countries.
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
# throughout this document, "Macintosh" can be used to refer to
# Macintosh computers and "Unicode" can be used to refer to the
# Unicode standard.
#
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
# either express or implied, with respect to this document and the
# included data, its quality, accuracy, or fitness for a particular
# purpose. In no event will Apple be liable for direct, indirect,
# special, incidental, or consequential damages resulting from any
# defect or inaccuracy in this document or the included data.
#
# These mapping tables and character lists are subject to change.
# The latest tables should be available from the following:
#
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
#
# For general information about Mac OS encodings and these mapping
# tables, see the file "README.TXT".
#
# Three tab-separated columns;
# '#' begins a comment which continues to the end of the line.
# Column #1 is the Mac OS Gujarati code or code sequence
# (in hex as 0xNN or 0xNN+0xNN).
# Column #2 is the corresponding Unicode or Unicode sequence
# (in hex as 0xNNNN or 0xNNNN+0xNNNN).
# Column #3 is a comment containing the Unicode name or sequence
# of names. In some cases an additional comment follows the
# Unicode name.
#
# The entries are in two sections. The first section is for pairs of
# Mac OS Gujarati code points that must be mapped in a special way.
# The second section maps individual code points.
#
# Within each section, the entries are in Mac OS Gujarati code order.
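#
# The following Python sketch illustrates the three-column format
# described above by parsing a single table line. It assumes that
# columns 1 and 2 are separated by tabs, as stated, and that everything
# after the first '#' is the comment. The names MappingEntry,
# parse_hex_sequence, and parse_line are hypothetical and are not part
# of any Apple tool.

```python
from typing import NamedTuple, Optional, Tuple


class MappingEntry(NamedTuple):
    """One data line of the mapping table (names are illustrative)."""
    mac_codes: Tuple[int, ...]  # column 1, e.g. (0xA1, 0xE9)
    unicode: Tuple[int, ...]    # column 2, e.g. (0x0AD0,)
    comment: str                # column 3, without the leading '#'


def parse_hex_sequence(field: str) -> Tuple[int, ...]:
    """Parse '0xNN', '0xNN+0xNN', '0xNNNN' or '0xNNNN+0xNNNN'."""
    # int(..., 16) accepts an optional '0x' prefix.
    return tuple(int(part, 16) for part in field.split("+"))


def parse_line(line: str) -> Optional[MappingEntry]:
    """Return the entry on a data line, or None for blank/comment lines."""
    data, _, comment = line.partition("#")
    fields = [f for f in data.split("\t") if f.strip()]
    if not fields:
        return None  # the whole line is a comment (or empty)
    if len(fields) != 2:
        raise ValueError(f"expected two data columns, got {fields!r}")
    return MappingEntry(
        mac_codes=parse_hex_sequence(fields[0].strip()),
        unicode=parse_hex_sequence(fields[1].strip()),
        comment=comment.strip(),
    )


if __name__ == "__main__":
    # Illustrative line in the documented format (OM = candrabindu + nukta).
    print(parse_line("0xA1+0xE9\t0x0AD0\t# GUJARATI OM"))
```

# Lines that begin with '#' (including the reverse-mapping notes at the
# end of this header) are skipped, because they contain no data columns.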
#
# Control character mappings are not shown in this table, following
# the conventions of the standard UTC mapping tables. However, the
# Mac OS Gujarati character set uses the standard control characters
# at 0x00-0x1F and 0x7F.
#
# Notes on Mac OS Gujarati:
# -------------------------
#
# This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
# environments, it is only supported via transcoding to and from
# Unicode.
#
# Mac OS Gujarati is based on IS 13194:1991 (ISCII-91), with the
# addition of several punctuation and symbol characters. However,
# Mac OS Gujarati does not support the ATR (attribute) mechanism of
# ISCII-91.
#
# 1. ISCII-91 features in Mac OS Gujarati include:
#
# a) Overloading of nukta
#
# In addition to using the nukta (0xE9) like a combining dot below,
# nukta is overloaded to function as a general character modifier.
# In this role, certain code points followed by 0xE9 are treated as
# a two-byte code point representing a character which may be
# quite different from the characters represented by either of
# the code points alone. For example, the character GUJARATI OM
# (U+0AD0) is represented in ISCII-91 as candrabindu + nukta
# (0xA1 + 0xE9).
#
# b) Explicit halant and soft halant
#
# A double halant (0xE8 + 0xE8) constitutes an "explicit halant",
# which will always appear as a halant instead of causing formation
# of a ligature or half-form consonant.
#
# Halant followed by nukta (0xE8 + 0xE9) constitutes a "soft
# halant", which prevents formation of a ligature and instead
# retains the half-form of the first consonant.
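#
# When mapping from Unicode to Mac OS Gujarati, the explicit halant
# and the soft halant arrive as two-character Unicode sequences. They
# must be matched before the plain halant. The Python sketch below
# uses longest-match lookup, a common technique for tables with
# multi-character entries; it is not necessarily what Apple's
# converter does. The table holds only the two reverse mappings listed
# at the end of this header, plus U+0ACD -> 0xE8 for a plain halant,
# which those entries imply and which is marked as assumed. The names
# REVERSE_TABLE and encode_longest_match are hypothetical.

```python
from typing import Dict, Tuple

# Partial, illustrative reverse table: Unicode sequence -> Mac OS Gujarati bytes.
REVERSE_TABLE: Dict[Tuple[int, ...], bytes] = {
    (0x0ACD, 0x200C): b"\xE8\xE8",  # virama + ZWNJ -> explicit halant
    (0x0ACD, 0x200D): b"\xE8\xE9",  # virama + ZWJ  -> soft halant
    (0x0ACD,): b"\xE8",             # virama alone  -> halant (assumed)
}


def encode_longest_match(text: str, table: Dict[Tuple[int, ...], bytes]) -> bytes:
    """Encode text, always consuming the longest Unicode sequence found in table."""
    codepoints = [ord(ch) for ch in text]
    longest = max(len(key) for key in table)
    out = bytearray()
    i = 0
    while i < len(codepoints):
        # Try the longest candidate first so that U+0ACD U+200C is not split.
        for length in range(min(longest, len(codepoints) - i), 0, -1):
            key = tuple(codepoints[i:i + length])
            if key in table:
                out += table[key]
                i += length
                break
        else:
            raise ValueError(f"U+{codepoints[i]:04X} is not in this partial table")
    return bytes(out)
```

# With this table, "\u0ACD\u200C" encodes to 0xE8 0xE8 rather than to
# a halant followed by an unmappable ZERO WIDTH NON-JOINER.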
#
# c) Invisible consonant
#
# The byte 0xD9 (called INV in ISCII-91) is an invisible consonant:
# It behaves like a consonant but has no visible appearance. It is
# intended to be used (often in combination with halant) to display
# dependent forms in isolation, such as the RA forms or consonant
#
# d) Extensions for Vedic, etc.
#
# The byte 0xF0 (called EXT in ISCII-91) followed by any byte in
# the range 0xA1-0xEE constitutes a two-byte code point which can
# be used to represent additional characters for Vedic (or other
# extensions); 0xF0 followed by any other byte value constitutes
# malformed text. Mac OS Gujarati supports this mechanism, but
# does not currently map any of these two-byte code points to
#
# 2. Mac OS Gujarati additions
#
# Mac OS Gujarati adds characters using the code points
# 0x80-0x8A and 0x90.
#
# 3. Unused code points
#
# The following code points are currently unused, and are not shown
# here: 0x8B-0x8F, 0x91-0xA0, 0xAB, 0xAF, 0xC7, 0xCE, 0xD0, 0xD3,
# 0xE0, 0xE4, 0xEB-0xEF, 0xFB-0xFF. In addition, 0xF0 is not shown
# here, but it has a special function as described above.
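#
# The byte ranges in notes 2 and 3 above, together with the control
# characters and the EXT byte, fit in a small lookup. The Python
# sketch below classifies a single byte. It also checks what follows
# 0xF0, using the incomplete/malformed/valid rules given for 0xF0 in
# the mapping notes below. The names ByteKind, ExtStatus,
# classify_byte, and ext_status are hypothetical.

```python
from enum import Enum


class ByteKind(Enum):
    CONTROL = "control"            # 0x00-0x1F and 0x7F
    MAC_ADDITION = "mac addition"  # 0x80-0x8A and 0x90 (note 2)
    UNUSED = "unused"              # listed in note 3
    EXT = "ext"                    # 0xF0, starts a two-byte code point
    TABLE = "table"                # anything else: look it up in the table


class ExtStatus(Enum):
    INCOMPLETE = "incomplete"          # 0xF0 is the last byte
    MALFORMED = "malformed"            # next byte outside 0xA1-0xEE
    VALID_UNMAPPED = "valid unmapped"  # valid, but no Unicode mapping


# Unused code points from note 3 (range() ends are exclusive).
UNUSED = frozenset([
    *range(0x8B, 0x90), *range(0x91, 0xA1),
    0xAB, 0xAF, 0xC7, 0xCE, 0xD0, 0xD3, 0xE0, 0xE4,
    *range(0xEB, 0xF0), *range(0xFB, 0x100),
])

MAC_ADDITIONS = frozenset([*range(0x80, 0x8B), 0x90])


def classify_byte(b: int) -> ByteKind:
    """Return the category of a single Mac OS Gujarati byte."""
    if b < 0x20 or b == 0x7F:
        return ByteKind.CONTROL
    if b in MAC_ADDITIONS:
        return ByteKind.MAC_ADDITION
    if b in UNUSED:
        return ByteKind.UNUSED
    if b == 0xF0:
        return ByteKind.EXT
    return ByteKind.TABLE


def ext_status(data: bytes, i: int) -> ExtStatus:
    """Check the EXT sequence that starts at data[i] (which must be 0xF0)."""
    if data[i] != 0xF0:
        raise ValueError("ext_status expects 0xF0 at position i")
    if i + 1 >= len(data):
        return ExtStatus.INCOMPLETE
    if 0xA1 <= data[i + 1] <= 0xEE:
        return ExtStatus.VALID_UNMAPPED
    return ExtStatus.MALFORMED
```

# A converter built on this would report INCOMPLETE and MALFORMED to
# its caller. For VALID_UNMAPPED it would emit a substitute such as
# QUESTION MARK or REPLACEMENT CHARACTER.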
#
# Unicode mapping issues and notes:
# ---------------------------------
#
# 1. Mapping the byte pairs
#
# If one of the byte values 0xA1, 0xAA, 0xDF, or 0xE8 is encountered
# when mapping Mac OS Gujarati text, the next byte (if there is one)
# should be examined. If the next byte is 0xE9 (or 0xE8, when the
# first byte was 0xE8), the byte pair should be mapped using the
# first section of the mapping table below. Otherwise, each byte
# should be mapped using the second section of the mapping table
# below.
#
# - The Unicode Standard, Version 2.0, specifies how explicit
#   halant and soft halant should be represented in Unicode
#   (halant followed by ZERO WIDTH NON-JOINER and by ZERO WIDTH
#   JOINER, respectively); these mappings are used below.
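#
# The byte-pair rule above can be sketched as a small Python decoder.
# Only the pair entries whose Unicode values appear in this header
# (OM, explicit halant, soft halant) are filled in. The single-byte
# table is a tiny illustrative subset: candrabindu, halant, and nukta
# map to the standard Gujarati signs, INV follows note 2 below, and
# 0x00-0x7F is assumed to map to U+0000-U+007F. EXT (0xF0) handling is
# left out. The names PAIR_TABLE, SINGLE_TABLE, is_special_pair, and
# decode are hypothetical.

```python
from typing import Dict, Tuple

# First section (partial): byte pairs that map specially.
# The 0xAA+0xE9 and 0xDF+0xE9 rows are omitted from this sketch.
PAIR_TABLE: Dict[Tuple[int, int], str] = {
    (0xA1, 0xE9): "\u0AD0",        # candrabindu + nukta -> GUJARATI OM
    (0xE8, 0xE8): "\u0ACD\u200C",  # explicit halant -> virama + ZWNJ
    (0xE8, 0xE9): "\u0ACD\u200D",  # soft halant -> virama + ZWJ
}

# Second section (tiny illustrative subset, assumed values).
SINGLE_TABLE: Dict[int, str] = {
    0xA1: "\u0A81",  # candrabindu -> GUJARATI SIGN CANDRABINDU
    0xD9: "\u200E",  # INV -> LEFT-TO-RIGHT MARK (see note 2)
    0xE8: "\u0ACD",  # halant -> GUJARATI SIGN VIRAMA
    0xE9: "\u0ABC",  # nukta -> GUJARATI SIGN NUKTA
}

PAIR_LEAD_BYTES = frozenset([0xA1, 0xAA, 0xDF, 0xE8])


def is_special_pair(first: int, second: int) -> bool:
    """Lead byte followed by 0xE9, or 0xE8 followed by 0xE8."""
    if first not in PAIR_LEAD_BYTES:
        return False
    return second == 0xE9 or (first == 0xE8 and second == 0xE8)


def decode(data: bytes,
           pairs: Dict[Tuple[int, int], str] = PAIR_TABLE,
           singles: Dict[int, str] = SINGLE_TABLE) -> str:
    """Map bytes to Unicode, checking the first-section pair rule first."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if i + 1 < len(data) and is_special_pair(b, data[i + 1]):
            key = (b, data[i + 1])
            if key not in pairs:
                raise ValueError(f"pair 0x{b:02X}+0x{data[i + 1]:02X} not in this sketch")
            out.append(pairs[key])
            i += 2
            continue
        if b < 0x80:
            out.append(chr(b))  # assumed ASCII-compatible lower half
        elif b in singles:
            out.append(singles[b])
        else:
            raise ValueError(f"byte 0x{b:02X} not in this sketch")
        i += 1
    return "".join(out)
```

# Changing the INV entry to ZERO WIDTH NON-JOINER reproduces the
# roundtrip problem described in note 2 below. With
# singles={**SINGLE_TABLE, 0xD9: "\u200C"}, decoding 0xE8 0xD9 gives
# the same string as decoding 0xE8 0xE8.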
#
# If the byte value 0xF0 is encountered when mapping Mac OS
# Gujarati text, then the next byte should be examined. If there
# is no next byte (e.g. 0xF0 at end of buffer), the mapping
# process should indicate an incomplete character. If there is a
# next byte but it is not in the range 0xA1-0xEE, the mapping
# process should indicate malformed text. Otherwise, the mapping
# process should treat the byte pair as a valid two-byte code point
# with no mapping (e.g. map it to QUESTION MARK, REPLACEMENT CHARACTER,
# etc.).
#
# 2. Mapping the invisible consonant
#
# It has been suggested that INV in ISCII-91 should map to ZERO
# WIDTH NON-JOINER in Unicode. However, this causes problems with
# roundtrip fidelity: because the explicit halant 0xE8+0xE8 already
# maps to U+0ACD+U+200C, the ISCII-91 sequences 0xE8+0xE8 and
# 0xE8+0xD9 would map to the same sequence of Unicode characters.
# We have instead mapped INV to LEFT-TO-RIGHT MARK, which avoids
# these problems.
#
# Details of mapping changes in each version:
# -------------------------------------------
0x0000 - 0x007F = 0x00 -
#0x0ACD+0x200C = 0xE8+0xE8
#0x0ACD+0x200D = 0xE8+0xE9