This chapter describes how to control the way the assembler translates between the EBCDIC, ASCII and Unicode character sets.
There are numerous ways to define which code points (numeric values) are assigned to which characters. Each of these schemes can be called a code page. IBM has assigned numbers to many of the different code pages.
Because the Tachyon assemblers can read and write files containing characters encoded in EBCDIC, ASCII and/or Unicode, the assembler needs to know how to translate between these character sets. The CODEPAGE option tells the assembler which EBCDIC code page and which ASCII code page to assume the characters are encoded in. Any one assembly uses exactly one EBCDIC code page and one ASCII code page.
The assembler supports over 90 different Single Byte Character Set (SBCS) code pages. All of the EBCDIC and ASCII code pages are defined in terms of their translation to and from Unicode. When translating between EBCDIC and ASCII, characters are effectively converted first to Unicode and then to the target code page.
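The two-step conversion described above can be illustrated with a short Python sketch. It uses Python's built-in `cp037` and `latin_1` codecs as stand-ins for an EBCDIC and an ASCII code page. Those codec tables are an assumption made for illustration and are not the assembler's own tables, and the function name is hypothetical.

```python
def ebcdic_to_ascii(data: bytes, ebcdic: str = "cp037",
                    ascii_cp: str = "latin_1") -> bytes:
    """Translate bytes from an EBCDIC code page to an ASCII code page via Unicode."""
    # Step 1: EBCDIC code points -> Unicode characters.
    text = data.decode(ebcdic)
    # Step 2: Unicode characters -> code points of the target code page.
    # A character missing from the target page raises UnicodeEncodeError.
    return text.encode(ascii_cp)


# C8 C5 D3 D3 D6 are the EBCDIC code points for "HELLO".
print(ebcdic_to_ascii(b"\xC8\xC5\xD3\xD3\xD6"))
```

Using Unicode as the pivot means each code page needs only one table (to and from Unicode) rather than one table per pair of code pages.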
Any two code pages normally do not define the same set of 256 characters, so some characters usually cannot be translated between them. The assembler requires that every character in the IBM High Level Assembler's Standard Character Set be translatable between the selected pair of EBCDIC and ASCII code pages. All but three of these characters (the national characters) must translate to their usual code points. The characters that must do so are the uppercase letters A-Z, the lowercase letters a-z and the digits 0-9, as well as the following:
Character | blank | & | ' | ( | ) | * | + | , | - | . | / | : | = | _ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ASCII | 20 | 26 | 27 | 28 | 29 | 2A | 2B | 2C | 2D | 2E | 2F | 3A | 3D | 5F |
EBCDIC | 40 | 50 | 7D | 4D | 5D | 5C | 4E | 6B | 60 | 4B | 61 | 7A | 7E | 6D |
Note: Some EBCDIC code pages, such as 290 (EBCDIC Katakana), 803 (EBCDIC Hebrew) and 1030 (EBCDIC Katakana Extended), do not assign the lowercase letters a-z to their normal code points. These code pages cannot be used by the assembler.
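The requirement above can be sketched as a check over a candidate pair of code pages. The following Python example is illustrative only: it uses Python's built-in codecs (`cp037`, `cp500`, `latin_1`) rather than the assembler's tables, the function names are hypothetical, and it covers only the letters, digits and the special characters from the table above, not the national characters.

```python
import string

# Usual EBCDIC code points for the special characters in the table above.
SPECIALS = {
    " ": 0x40, "&": 0x50, "'": 0x7D, "(": 0x4D, ")": 0x5D, "*": 0x5C, "+": 0x4E,
    ",": 0x6B, "-": 0x60, ".": 0x4B, "/": 0x61, ":": 0x7A, "=": 0x7E, "_": 0x6D,
}


def usual_ebcdic_points() -> dict:
    """Build the character -> usual EBCDIC code point map for the required set."""
    points = dict(SPECIALS)
    # EBCDIC letters occupy three runs of 9, 9 and 8 consecutive code points.
    for letters, starts in ((string.ascii_uppercase, (0xC1, 0xD1, 0xE2)),
                            (string.ascii_lowercase, (0x81, 0x91, 0xA2))):
        runs = (letters[:9], letters[9:18], letters[18:])
        for run, start in zip(runs, starts):
            for offset, ch in enumerate(run):
                points[ch] = start + offset
    for offset, ch in enumerate(string.digits):
        points[ch] = 0xF0 + offset
    return points


def untranslatable(ebcdic: str, ascii_cp: str) -> list:
    """Return the required characters that miss their usual code points."""
    bad = []
    for ch, ebcdic_point in usual_ebcdic_points().items():
        try:
            ok = (ch.encode(ebcdic) == bytes([ebcdic_point])
                  and ch.encode(ascii_cp) == bytes([ord(ch)]))
        except UnicodeEncodeError:
            ok = False  # the character is not defined in one of the code pages
        if not ok:
            bad.append(ch)
    return bad


# An empty list means the pair satisfies the check.
print(untranslatable("cp037", "latin_1"))
```

A code page that places the lowercase letters elsewhere, as the note above describes for code pages 290, 803 and 1030, would show up here as a non-empty list.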
IBM's High Level Assembler uses the CODEPAGE option to define the EBCDIC code page of the source files. It uses this code page information only to translate EBCDIC characters to Unicode in CU constants and literals. The Tachyon assemblers support an extended CODEPAGE option that specifies both an EBCDIC and an ASCII code page.
The CODEPAGE option is specified as CODEPAGE(ebcdic,ascii,list) where ebcdic is an EBCDIC code page number, ascii is an ASCII code page number, and list is either LIST or NOLIST. The code page numbers may be specified as either decimal numbers or their hexadecimal equivalents using the X'hex' notation. When setting the CODEPAGE option, the EBCDIC code page must be specified. If the ASCII code page number is omitted, the default is 819 (ISO-8859-1 Latin-1). If the list option is omitted, the default is NOLIST. If LIST is specified, the resulting translation between the EBCDIC, ASCII and Unicode code points will be displayed in the assembly listing.
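The option syntax and its defaults can be summarized with a small parser. This is a sketch of the rules stated above, not the assembler's actual option processing: the names `CodePageOption` and `parse_codepage` are hypothetical, and the case-insensitive keyword matching and the handling of an empty middle operand are assumptions.

```python
import re
from typing import NamedTuple


class CodePageOption(NamedTuple):
    ebcdic: int      # EBCDIC code page number
    ascii_cp: int    # ASCII code page number
    show_list: bool  # True for LIST, False for NOLIST


def _number(token: str) -> int:
    """Convert a decimal or X'hex' code page number to an int."""
    hex_form = re.fullmatch(r"[Xx]'([0-9A-Fa-f]+)'", token)
    if hex_form:
        return int(hex_form.group(1), 16)
    if re.fullmatch(r"[0-9]+", token):
        return int(token)
    raise ValueError(f"invalid code page number: {token!r}")


def parse_codepage(option: str) -> CodePageOption:
    """Parse CODEPAGE(ebcdic,ascii,list), applying the documented defaults."""
    match = re.fullmatch(r"CODEPAGE\((.*)\)", option.strip(), re.IGNORECASE)
    if not match:
        raise ValueError(f"not a CODEPAGE option: {option!r}")
    fields = [field.strip() for field in match.group(1).split(",")]
    if len(fields) > 3:
        raise ValueError("too many operands")
    if not fields[0]:
        raise ValueError("the EBCDIC code page must be specified")
    fields += [""] * (3 - len(fields))
    ebcdic = _number(fields[0])
    ascii_cp = _number(fields[1]) if fields[1] else 819  # default ASCII page
    keyword = fields[2].upper() or "NOLIST"              # default list setting
    if keyword not in ("LIST", "NOLIST"):
        raise ValueError(f"expected LIST or NOLIST, got {fields[2]!r}")
    return CodePageOption(ebcdic, ascii_cp, keyword == "LIST")


print(parse_codepage("CODEPAGE(X'417')"))
```

Here X'417' is the hexadecimal form of 1047, so the call yields the same values as CODEPAGE(1047,819,NOLIST).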
The default for the CODEPAGE option is CODEPAGE(1047,819,NOLIST). These code pages translate all 256 code points between EBCDIC and ASCII. They are also the default EBCDIC and ASCII code pages for z/OS UNIX System Services, Tachyon File Tools and the Tachyon Operating System. However, the default EBCDIC code page differs from the High Level Assembler's default of CODEPAGE(1148).
The following shows the output of the CODEPAGE(1047,819,LIST) option:
CodePage(1047,819) EBCDIC/ASCII/Unicode Translation
00/00/0000 01/01/0001 02/02/0002 03/03/0003 04/9C/009C 05/09/0009 06/86/0086 07/7F/007F
08/97/0097 09/8D/008D 0A/8E/008E 0B/0B/000B 0C/0C/000C 0D/0D/000D 0E/0E/000E 0F/0F/000F
10/10/0010 11/11/0011 12/12/0012 13/13/0013 14/9D/009D 15/85/0085 16/08/0008 17/87/0087
18/18/0018 19/19/0019 1A/92/0092 1B/8F/008F 1C/1C/001C 1D/1D/001D 1E/1E/001E 1F/1F/001F
20/80/0080 21/81/0081 22/82/0082 23/83/0083 24/84/0084 25/0A/000A 26/17/0017 27/1B/001B
28/88/0088 29/89/0089 2A/8A/008A 2B/8B/008B 2C/8C/008C 2D/05/0005 2E/06/0006 2F/07/0007
30/90/0090 31/91/0091 32/16/0016 33/93/0093 34/94/0094 35/95/0095 36/96/0096 37/04/0004
38/98/0098 39/99/0099 3A/9A/009A 3B/9B/009B 3C/14/0014 3D/15/0015 3E/9E/009E 3F/1A/001A
40/20/0020 41/A0/00A0 42/E2/00E2 43/E4/00E4 44/E0/00E0 45/E1/00E1 46/E3/00E3 47/E5/00E5
48/E7/00E7 49/F1/00F1 4A/A2/00A2 4B/2E/002E 4C/3C/003C 4D/28/0028 4E/2B/002B 4F/7C/007C
50/26/0026 51/E9/00E9 52/EA/00EA 53/EB/00EB 54/E8/00E8 55/ED/00ED 56/EE/00EE 57/EF/00EF
58/EC/00EC 59/DF/00DF 5A/21/0021 5B/24/0024 5C/2A/002A 5D/29/0029 5E/3B/003B 5F/5E/005E
60/2D/002D 61/2F/002F 62/C2/00C2 63/C4/00C4 64/C0/00C0 65/C1/00C1 66/C3/00C3 67/C5/00C5
68/C7/00C7 69/D1/00D1 6A/A6/00A6 6B/2C/002C 6C/25/0025 6D/5F/005F 6E/3E/003E 6F/3F/003F
70/F8/00F8 71/C9/00C9 72/CA/00CA 73/CB/00CB 74/C8/00C8 75/CD/00CD 76/CE/00CE 77/CF/00CF
78/CC/00CC 79/60/0060 7A/3A/003A 7B/23/0023 7C/40/0040 7D/27/0027 7E/3D/003D 7F/22/0022
80/D8/00D8 81/61/0061 82/62/0062 83/63/0063 84/64/0064 85/65/0065 86/66/0066 87/67/0067
88/68/0068 89/69/0069 8A/AB/00AB 8B/BB/00BB 8C/F0/00F0 8D/FD/00FD 8E/FE/00FE 8F/B1/00B1
90/B0/00B0 91/6A/006A 92/6B/006B 93/6C/006C 94/6D/006D 95/6E/006E 96/6F/006F 97/70/0070
98/71/0071 99/72/0072 9A/AA/00AA 9B/BA/00BA 9C/E6/00E6 9D/B8/00B8 9E/C6/00C6 9F/A4/00A4
A0/B5/00B5 A1/7E/007E A2/73/0073 A3/74/0074 A4/75/0075 A5/76/0076 A6/77/0077 A7/78/0078
A8/79/0079 A9/7A/007A AA/A1/00A1 AB/BF/00BF AC/D0/00D0 AD/5B/005B AE/DE/00DE AF/AE/00AE
B0/AC/00AC B1/A3/00A3 B2/A5/00A5 B3/B7/00B7 B4/A9/00A9 B5/A7/00A7 B6/B6/00B6 B7/BC/00BC
B8/BD/00BD B9/BE/00BE BA/DD/00DD BB/A8/00A8 BC/AF/00AF BD/5D/005D BE/B4/00B4 BF/D7/00D7
C0/7B/007B C1/41/0041 C2/42/0042 C3/43/0043 C4/44/0044 C5/45/0045 C6/46/0046 C7/47/0047
C8/48/0048 C9/49/0049 CA/AD/00AD CB/F4/00F4 CC/F6/00F6 CD/F2/00F2 CE/F3/00F3 CF/F5/00F5
D0/7D/007D D1/4A/004A D2/4B/004B D3/4C/004C D4/4D/004D D5/4E/004E D6/4F/004F D7/50/0050
D8/51/0051 D9/52/0052 DA/B9/00B9 DB/FB/00FB DC/FC/00FC DD/F9/00F9 DE/FA/00FA DF/FF/00FF
E0/5C/005C E1/F7/00F7 E2/53/0053 E3/54/0054 E4/55/0055 E5/56/0056 E6/57/0057 E7/58/0058
E8/59/0059 E9/5A/005A EA/B2/00B2 EB/D4/00D4 EC/D6/00D6 ED/D2/00D2 EE/D3/00D3 EF/D5/00D5
F0/30/0030 F1/31/0031 F2/32/0032 F3/33/0033 F4/34/0034 F5/35/0035 F6/36/0036 F7/37/0037
F8/38/0038 F9/39/0039 FA/B3/00B3 FB/DB/00DB FC/DC/00DC FD/D9/00D9 FE/DA/00DA FF/9F/009F

Each translatable EBCDIC character is displayed as a group of three code points. The first code point is for the selected EBCDIC code page, the second is for the selected ASCII code page and the third is the Unicode code point. If the EBCDIC character cannot be translated to ASCII, the ASCII code point will be listed as --.
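The group format lends itself to simple post-processing of a listing. The Python sketch below reads such groups into a lookup table. The function name `parse_listing` is hypothetical, and the sample string, including its untranslatable `9F/--/20AC` group, is made up for illustration rather than copied from real assembler output.

```python
import re

# One group: EBCDIC code point / ASCII code point or -- / Unicode code point.
_GROUP = re.compile(r"([0-9A-F]{2})/([0-9A-F]{2}|--)/([0-9A-F]{4})")


def parse_listing(text: str) -> dict:
    """Map each EBCDIC code point to (ASCII code point or None, Unicode code point)."""
    table = {}
    for token in text.split():
        match = _GROUP.fullmatch(token)
        if match is None:
            continue  # heading words and other non-group text are skipped
        ebcdic, ascii_cp, unicode_cp = match.groups()
        table[int(ebcdic, 16)] = (
            None if ascii_cp == "--" else int(ascii_cp, 16),
            int(unicode_cp, 16),
        )
    return table


# Hypothetical sample: three translatable groups and one untranslatable group.
sample = ("EBCDIC/ASCII/Unicode Translation "
          "4B/2E/002E C1/41/0041 F0/30/0030 9F/--/20AC")
table = parse_listing(sample)
print(table[0xC1])
print(table[0x9F])
```

An ASCII value of `None` marks an EBCDIC character that has no counterpart in the selected ASCII code page.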
The following EBCDIC code pages are supported:

Code Page | Description |
---|---|
00037 | EBCDIC USA, Canada, Australia, New Zealand, Netherlands, Brazil, Portugal |
00264 | EBCDIC Print Train and Text Processing |
00273 | EBCDIC Austria, Germany |
00274 | EBCDIC Belgium |
00275 | EBCDIC Brazil |
00277 | EBCDIC Denmark, Norway |
00278 | EBCDIC Finland, Sweden |
00280 | EBCDIC Italy |
00281 | EBCDIC Japanese English |
00284 | EBCDIC Spanish |
00285 | EBCDIC United Kingdom |
00293 | EBCDIC APL |
00297 | EBCDIC France |
00420 | EBCDIC Arabic |
00423 | EBCDIC Greek |
00424 | EBCDIC Hebrew |
00500 | EBCDIC Latin-1 |
00838 | EBCDIC Thai |
00870 | EBCDIC Latin-2 |
00871 | EBCDIC Iceland |
00875 | EBCDIC Greek |
00880 | EBCDIC Cyrillic |
00924 | EBCDIC Latin-9 |
01005 | EBCDIC Isomorphic Text Communication |
01025 | EBCDIC Russian |
01026 | EBCDIC Turkey |
01027 | EBCDIC Japanese (Latin) Extended |
01031 | EBCDIC Japanese (Latin) Extended |
01047 | EBCDIC Latin-1 |
01122 | EBCDIC Estonia |
01123 | EBCDIC Ukraine |
01130 | EBCDIC Vietnamese |
01140 | EBCDIC USA, Canada, Australia, New Zealand, Netherlands |
01141 | EBCDIC Austria, Germany |
01142 | EBCDIC Denmark, Norway |
01143 | EBCDIC Finland, Sweden |
01144 | EBCDIC Italy |
01145 | EBCDIC Spanish |
01146 | EBCDIC United Kingdom |
01147 | EBCDIC France |
01148 | EBCDIC Latin-1 |
01149 | EBCDIC Iceland |
01153 | EBCDIC Latin-2 |
01154 | EBCDIC Cyrillic |
01155 | EBCDIC Turkey |
01156 | EBCDIC Baltic |
01157 | EBCDIC Estonia |
01158 | EBCDIC Ukraine |
01160 | EBCDIC Thai |
01164 | EBCDIC Vietnamese |
01165 | EBCDIC Latin-2 |

The following ASCII code pages are supported:

Code Page | Description |
---|---|
00367 | US-ASCII-7 |
00437 | DOS USA |
00720 | DOS Arabic |
00737 | DOS Greek |
00775 | DOS Baltic |
00813 | ISO-8859-7 Greek |
00819 | ISO-8859-1 Latin-1 Western European |
00850 | DOS Latin-1 |
00852 | DOS Latin-2 |
00855 | DOS Cyrillic |
00856 | DOS Hebrew |
00857 | DOS Turkish |
00858 | DOS Latin-1 + Euro |
00860 | DOS Portuguese |
00861 | DOS Icelandic |
00862 | DOS Israel |
00863 | DOS French Canadian |
00864 | DOS Arabic |
00865 | DOS Nordic |
00866 | DOS Russian |
00869 | DOS Greek |
00874 | ISO-8859-11 Thai |
00878 | KOI8-R Russian |
00907 | ASCII APL |
00910 | DOS APL |
00912 | ISO-8859-2 Latin-2 Eastern European |
00913 | ISO-8859-3 Latin-3 Southern European |
00914 | ISO-8859-4 Latin-4 Northern European |
00915 | ISO-8859-5 Cyrillic |
00916 | ISO-8859-8 Hebrew |
00919 | ISO-8859-10 Latin-6 Nordic |
00920 | ISO-8859-9 Latin-5 Turkish |
00921 | ISO-8859-13 Latin-7 Baltic |
00923 | ISO-8859-15 Latin-9 |
01006 | DOS Urdu |
01089 | ISO-8859-6 Arabic |
01139 | ASCII Japanese Alphanumeric Katakana |
01250 | Windows Latin-2 |
01251 | Windows Cyrillic |
01252 | Windows Latin-1 |
01253 | Windows Greek |
01254 | Windows Latin-5 Turkish |
01255 | Windows Hebrew |
01256 | Windows Arabic |
01257 | Windows Baltic |
01258 | Windows Vietnamese |