Chapter 7 - Character Set Translation

This chapter describes how to control the way the Tachyon File Tools programs translate between the EBCDIC, ASCII and Unicode character sets.

Code Points and Code Pages

There are numerous ways to define which code points (numeric values) are assigned to which characters. Each of these schemes can be called a code page. IBM has assigned numbers to many of the different code pages.
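To make the idea concrete, the following Python sketch prints the code point that the same character receives under three different code pages. It is only an illustration: Python is not part of Tachyon File Tools, the helper name code_point is made up for this example, and Python's built-in codecs cp037, latin-1 (code page 819) and cp437 are assumed to correspond to the IBM code pages of the same numbers.

```python
def code_point(char, codec):
    """Return the single-byte code point of char under a Python codec."""
    encoded = char.encode(codec)
    if len(encoded) != 1:
        raise ValueError(f"{codec} is not a single-byte encoding of {char!a}")
    return encoded[0]


# The same two characters, looked up in one EBCDIC and two ASCII code pages.
for char in ("A", "\u00e9"):  # Latin capital A, Latin small e with acute
    for codec in ("cp037", "latin-1", "cp437"):
        value = code_point(char, codec)
        print(f"U+{ord(char):04X} in {codec}: X'{value:02X}'")
```

The letter A has the same code point in both ASCII code pages but a different one in EBCDIC, while the accented letter differs in all three.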

Since the Tachyon File Tools programs can read and write files containing characters encoded in EBCDIC and/or ASCII, the programs need to know how to translate between these character sets. Two environment variables tell the programs which EBCDIC and ASCII code pages to assume the characters are encoded in.

Tachyon File Tools supports over 90 different Single Byte Character Set (SBCS) code pages. All of the EBCDIC and ASCII code pages are defined in terms of their translation to and from Unicode. When translating between EBCDIC and ASCII, characters are effectively converted first to Unicode and then to the target code page.
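The two-step translation through Unicode can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Tachyon File Tools implementation: the function name translate is hypothetical, and Python's cp037 and latin-1 codecs stand in for EBCDIC code page 37 and ASCII code page 819. Code page 37 is used rather than the default 1047 because a codec for 1047 is not assumed to be available in Python.

```python
def translate(data, source_codec, target_codec):
    """Translate bytes between two single-byte code pages via Unicode."""
    text = data.decode(source_codec)   # source code page -> Unicode
    return text.encode(target_codec)   # Unicode -> target code page


# "Hello" encoded in EBCDIC code page 37.
ebcdic = b"\xc8\x85\x93\x93\x96"
ascii_bytes = translate(ebcdic, "cp037", "latin-1")
print(ascii_bytes)
```

In this sketch a character that does not exist in the target code page makes the encode step raise UnicodeEncodeError. How Tachyon File Tools itself treats such characters is not stated here.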

Normally any two given code pages will not define the same set of 256 characters, so usually some characters cannot be translated between the code pages. The Tachyon File Tools programs require that the uppercase letters A-Z, the lowercase letters a-z and the digits 0-9 translate to their usual code points. The national characters at EBCDIC code points X'5B', X'7B' and X'7C' must also be translatable to ASCII. In code page 37 (EBCDIC USA), these code points correspond to the $, # and @ characters. If the EBCDIC national character at X'C0' is translatable to ASCII, that character may be used in data set names.

Note: Some EBCDIC code pages, such as 290 (EBCDIC Katakana), 803 (EBCDIC Hebrew) and 1030 (EBCDIC Katakana Extended), do not assign the lowercase letters a-z to their usual code points. These code pages cannot be used with Tachyon File Tools.
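The requirements above can be expressed as a small check. The Python sketch below is one possible reading of them, not the test that Tachyon File Tools performs: it takes "usual code points" to mean the conventional EBCDIC positions (A-I at X'C1', J-R at X'D1', S-Z at X'E2', the lowercase letters X'40' below those, and the digits at X'F0'), the name is_usable is made up, and Python codec names stand in for code page numbers.

```python
import string

# Conventional EBCDIC code points for A-Z, a-z and 0-9 (an assumption
# about what "usual code points" means in this chapter).
USUAL = {}
for letters, starts in (
    (string.ascii_uppercase, (0xC1, 0xD1, 0xE2)),
    (string.ascii_lowercase, (0x81, 0x91, 0xA2)),
):
    groups = (letters[:9], letters[9:18], letters[18:])
    for group, start in zip(groups, starts):
        for offset, char in enumerate(group):
            USUAL[start + offset] = char
for offset, char in enumerate(string.digits):
    USUAL[0xF0 + offset] = char

# National characters that must be translatable to ASCII.
NATIONAL = (0x5B, 0x7B, 0x7C)


def is_usable(ebcdic_codec, ascii_codec):
    """Check a pair of code pages against the requirements listed above."""
    for point, expected in USUAL.items():
        try:
            if bytes([point]).decode(ebcdic_codec) != expected:
                return False
        except UnicodeDecodeError:
            return False
    for point in NATIONAL:
        try:
            bytes([point]).decode(ebcdic_codec).encode(ascii_codec)
        except UnicodeError:  # undefined in source or missing in target
            return False
    return True


print(is_usable("cp037", "latin-1"))
```

A code page that places the lowercase letters elsewhere, as the note describes for 290, 803 and 1030, would fail the first loop of this check.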


Environment Variables

Tachyon File Tools recognizes two environment variables for the purpose of overriding the default ASCII and EBCDIC code pages. The environment variables are:

CODEPAGE_ASCII

The value of this environment variable must be a decimal number that is one of the supported ASCII code pages. If the CODEPAGE_ASCII environment variable is not set, the default is code page 819 (ISO-8859-1 Latin-1).

CODEPAGE_EBCDIC

The value of this environment variable must be a decimal number that is one of the supported EBCDIC code pages. If the CODEPAGE_EBCDIC environment variable is not set, the default is code page 1047 (Latin-1).

The default pair of code pages (ASCII 819 and EBCDIC 1047) provides the same translation between ASCII and EBCDIC as was provided by default in versions 1 and 2 of Tachyon File Tools. These are also the default code pages used by z/OS UNIX System Services, the Tachyon z/Assembler and the Tachyon Operating System.
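The lookup rules for the two variables can be summarized in a short Python sketch. Everything beyond the variable names and the defaults 819 and 1047 is an assumption made for illustration: the function resolve_code_page is hypothetical, the supported sets are abbreviated, and the handling of leading zeros, empty values and invalid values in the real programs is not documented in this chapter.

```python
import os

DEFAULT_ASCII = 819     # ISO-8859-1 Latin-1
DEFAULT_EBCDIC = 1047   # EBCDIC Latin-1

# Abbreviated, illustrative subsets of the supported code pages.
SUPPORTED_ASCII = {367, 437, 819, 850, 1252}
SUPPORTED_EBCDIC = {37, 500, 1047, 1140}


def resolve_code_page(name, default, supported, env=None):
    """Return the code page named by an environment variable, or the default."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default                      # variable not set
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError(f"{name}={raw!r} is not a decimal number")
    page = int(raw)
    if page not in supported:
        raise ValueError(f"{name}={page} is not a supported code page")
    return page


ascii_page = resolve_code_page("CODEPAGE_ASCII", DEFAULT_ASCII, SUPPORTED_ASCII)
ebcdic_page = resolve_code_page("CODEPAGE_EBCDIC", DEFAULT_EBCDIC, SUPPORTED_EBCDIC)
print(ascii_page, ebcdic_page)
```

Passing a dictionary as env makes the function easy to try without changing the real environment, for example resolve_code_page("CODEPAGE_EBCDIC", DEFAULT_EBCDIC, SUPPORTED_EBCDIC, env={"CODEPAGE_EBCDIC": "37"}).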


EBCDIC Code Pages

Code Page  Description
00037      EBCDIC USA, Canada, Australia, New Zealand, Netherlands, Brazil, Portugal
00264      EBCDIC Print Train and Text Processing
00273      EBCDIC Austria, Germany
00274      EBCDIC Belgium
00275      EBCDIC Brazil
00277      EBCDIC Denmark, Norway
00278      EBCDIC Finland, Sweden
00280      EBCDIC Italy
00281      EBCDIC Japanese English
00284      EBCDIC Spanish
00285      EBCDIC United Kingdom
00293      EBCDIC APL
00297      EBCDIC France
00420      EBCDIC Arabic
00423      EBCDIC Greek
00424      EBCDIC Hebrew
00500      EBCDIC Latin-1
00838      EBCDIC Thai
00870      EBCDIC Latin-2
00871      EBCDIC Iceland
00875      EBCDIC Greek
00880      EBCDIC Cyrillic
00924      EBCDIC Latin-9
01005      EBCDIC Isomorphic Text Communication
01025      EBCDIC Russian
01026      EBCDIC Turkey
01027      EBCDIC Japanese (Latin) Extended
01031      EBCDIC Japanese (Latin) Extended
01047      EBCDIC Latin-1
01122      EBCDIC Estonia
01123      EBCDIC Ukraine
01130      EBCDIC Vietnamese
01140      EBCDIC USA, Canada, Australia, New Zealand, Netherlands
01141      EBCDIC Austria, Germany
01142      EBCDIC Denmark, Norway
01143      EBCDIC Finland, Sweden
01144      EBCDIC Italy
01145      EBCDIC Spanish
01146      EBCDIC United Kingdom
01147      EBCDIC France
01148      EBCDIC Latin-1
01149      EBCDIC Iceland
01153      EBCDIC Latin-2
01154      EBCDIC Cyrillic
01155      EBCDIC Turkey
01156      EBCDIC Baltic
01157      EBCDIC Estonia
01158      EBCDIC Ukraine
01160      EBCDIC Thai
01164      EBCDIC Vietnamese
01165      EBCDIC Latin-2


ASCII Code Pages

Code Page  Description
00367      US-ASCII-7
00437      DOS USA
00720      DOS Arabic
00737      DOS Greek
00775      DOS Baltic
00813      ISO-8859-7 Greek
00819      ISO-8859-1 Latin-1 Western European
00850      DOS Latin-1
00852      DOS Latin-2
00855      DOS Cyrillic
00856      DOS Hebrew
00857      DOS Turkish
00858      DOS Latin-1 + Euro
00860      DOS Portuguese
00861      DOS Icelandic
00862      DOS Israel
00863      DOS French Canadian
00864      DOS Arabic
00865      DOS Nordic
00866      DOS Russian
00869      DOS Greek
00874      ISO-8859-11 Thai
00878      KOI8-R Russian
00907      ASCII APL
00910      DOS APL
00912      ISO-8859-2 Latin-2 Eastern European
00913      ISO-8859-3 Latin-3 Southern European
00914      ISO-8859-4 Latin-4 Northern European
00915      ISO-8859-5 Cyrillic
00916      ISO-8859-8 Hebrew
00919      ISO-8859-10 Latin-6 Nordic
00920      ISO-8859-9 Latin-5 Turkish
00921      ISO-8859-13 Latin-7 Baltic
00923      ISO-8859-15 Latin-9
01006      DOS Urdu
01089      ISO-8859-6 Arabic
01139      ASCII Japanese Alphanumeric Katakana
01250      Windows Latin-2
01251      Windows Cyrillic
01252      Windows Latin-1
01253      Windows Greek
01254      Windows Latin-5 Turkish
01255      Windows Hebrew
01256      Windows Arabic
01257      Windows Baltic
01258      Windows Vietnamese


© Copyright 1999-2006, Tachyon Software® LLC.
Last modified on July 30, 2006