Add new codepage charmaps/IBM858 [BZ #21084]

This code page is identical to code page 850 except that X'D5'
has been changed from LI61 (dotless i) to SC20 (euro symbol).

The code points from /x01 to /x1f in the /localedata/charmaps/IBM858
file have the same mapping as those in localedata/charmaps/ANSI_X3.4-1968.
That means they disagree with with

ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP00858.txt

in that range.
For example, localedata/charmaps/IBM858 and localedata/charmaps/ANSI_X3.4-1968 have:

   “<U0001>     /x01         START OF HEADING (SOH)”

whereas CP00858.txt has:

   “01 SS000000        Smiling Face”

That means that CP00858.txt is not really ASCII-compatible and to make
it ASCII-compatible we deviate fro CP00858.txt in the code points from /x01
to /x1f.

	[BZ #21084]
	* benchtests/strcoll-inputs/filelist#en_US.UTF-8: Add IBM858 and ibm858.c.
	* iconvdata/Makefile: Add IBM858.
	* iconvdata/gconv-modules: Add IBM858.
	* iconvdata/ibm858.c: New file.
	* iconvdata/tst-tables.sh: Add IBM858
	* localedata/charmaps/IBM858: New file.
This commit is contained in:
Mike FABIAN 2017-09-07 15:28:28 +02:00
parent fcc82c06dc
commit 799c8d6905
7 changed files with 336 additions and 8 deletions

View file

@ -1,3 +1,13 @@
2017-09-14 Mike FABIAN <mfabian@redhat.com>
[BZ #21084]
* benchtests/strcoll-inputs/filelist#en_US.UTF-8: Add IBM858 and ibm858.c.
* iconvdata/Makefile: Add IBM858.
* iconvdata/gconv-modules: Likewise.
* iconvdata/tst-tables.sh: Likewise.
* iconvdata/ibm858.c: New file.
* localedata/charmaps/IBM858: Likewise.
2017-09-14 Akhilesh Kumar <akhilesh.k@samsung.com>
[BZ #22023]

View file

@ -11232,6 +11232,7 @@ ISO-8859-9E
UTF-8
ISO-8859-2
IBM850
IBM858
EUC-TW
KOI8-U
IBM903
@ -13920,6 +13921,7 @@ ibm12712.c
ibm1145.h
ibm932.c
ibm850.c
ibm858.c
ibm437.c
ibm1399.c
stdio-common

View file

@ -36,9 +36,9 @@ modules := ISO8859-1 ISO8859-2 ISO8859-3 ISO8859-4 ISO8859-5 \
IBM874 CP737 CP775 ISO-2022-KR HP-TURKISH8 HP-THAI8 HP-GREEK8 \
KOI8-R LATIN-GREEK LATIN-GREEK-1 IBM256 IBM273 IBM277 IBM278 \
IBM280 IBM281 IBM284 IBM285 IBM290 IBM297 IBM420 IBM424 \
IBM437 IBM850 IBM851 IBM852 IBM855 IBM857 IBM860 IBM861 \
IBM862 IBM863 IBM864 IBM865 IBM868 IBM869 IBM875 IBM880 \
IBM866 CP1258 IBM922 IBM1124 IBM1129 IBM932 IBM943 \
IBM437 IBM850 IBM851 IBM852 IBM855 IBM857 IBM858 IBM860 \
IBM861 IBM862 IBM863 IBM864 IBM865 IBM868 IBM869 IBM875 \
IBM880 IBM866 CP1258 IBM922 IBM1124 IBM1129 IBM932 IBM943 \
IBM856 IBM930 IBM933 IBM935 IBM937 IBM939 IBM1046 \
IBM1132 IBM1133 IBM1160 IBM1161 IBM1162 IBM1163 IBM1164 \
IBM918 IBM1004 IBM1026 CP1125 CP1250 CP1251 CP1252 CP1253 \
@ -153,11 +153,11 @@ gen-8bit-modules := iso8859-2 iso8859-3 iso8859-4 iso8859-6 iso8859-9 koi-8 \
gen-8bit-gap-modules := koi8-r latin-greek latin-greek-1 ibm256 ibm273 \
ibm277 ibm278 ibm280 ibm281 ibm284 ibm285 ibm290 \
ibm297 ibm420 ibm424 ibm437 ibm850 ibm851 ibm852 \
ibm855 ibm857 ibm860 ibm861 ibm862 ibm863 ibm864 \
ibm865 ibm868 ibm869 ibm875 ibm880 ibm918 ibm1004 \
ibm1026 cp1125 cp1250 cp1251 cp1252 cp1253 cp1254 \
cp1256 cp1257 ibm866 iso8859-5 iso8859-7 iso8859-8 \
iso8859-10 macintosh iec_p27-1 asmo_449 \
ibm855 ibm857 ibm858 ibm860 ibm861 ibm862 ibm863 \
ibm864 ibm865 ibm868 ibm869 ibm875 ibm880 ibm918 \
ibm1004 ibm1026 cp1125 cp1250 cp1251 cp1252 cp1253 \
cp1254 cp1256 cp1257 ibm866 iso8859-5 iso8859-7 \
iso8859-8 iso8859-10 macintosh iec_p27-1 asmo_449 \
csn_369103 cwi dec-mcs ecma-cyrillic gost_19768-74 \
greek-ccitt greek7 greek7-old inis inis-8 \
inis-cyrillic iso_2033 iso_5427 iso_5427-ext \

View file

@ -743,6 +743,13 @@ alias OSF10020352// IBM850//
module IBM850// INTERNAL IBM850 1
module INTERNAL IBM850// IBM850 1
# from to module cost
alias CP858// IBM858//
alias 858// IBM858//
alias CSPC858MULTILINGUAL// IBM858//
module IBM858// INTERNAL IBM858 1
module INTERNAL IBM858// IBM858 1
# from to module cost
alias CP851// IBM851//
alias 851// IBM851//

27
iconvdata/ibm858.c Normal file
View file

@ -0,0 +1,27 @@
/* Conversion from and to IBM858.
Copyright (C) 2017 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
#include <stdint.h>
/* Get the conversion table. */
#define TABLES <ibm858.h>
#define CHARSET_NAME "IBM858//"
#define HAS_HOLES 1 /* Not all 256 character are defined. */
#include <8bit-gap.c>

View file

@ -125,6 +125,7 @@ cat <<EOF |
IBM855
IBM856
IBM857
IBM858
IBM860
IBM861
IBM862

281
localedata/charmaps/IBM858 Normal file
View file

@ -0,0 +1,281 @@
<code_set_name> IBM858
<comment_char> %
<escape_char> /
% version: 1.0
% source: ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP00858.txt, 1998
% source: UNICODE 1.0
% This code page is identical to code page 850 except that X'D5'
% has been changed from LI61 (dotless i) to SC20 (euro symbol).
% The code points from /x01 to /x1f in this file have the same mapping
% as those in ANSI_X3.4-1968. That means they disagree with with CP00858.txt
% in that range. For example, this file and ANSI_X3.4-1968 have:
% “<U0001> /x01 START OF HEADING (SOH)”
% whereas CP00858.txt has:
% “01 SS000000 Smiling Face”
% That means that CP00858.txt is not really ASCII-compatible and to make
% it ASCII-compatible we deviate fro CP00858.txt in the code points from /x01
% to /x1f.
% alias CP858
% alias 858
CHARMAP
<U0000> /x00 NULL (NUL)
<U0001> /x01 START OF HEADING (SOH)
<U0002> /x02 START OF TEXT (STX)
<U0003> /x03 END OF TEXT (ETX)
<U0004> /x04 END OF TRANSMISSION (EOT)
<U0005> /x05 ENQUIRY (ENQ)
<U0006> /x06 ACKNOWLEDGE (ACK)
<U0007> /x07 BELL (BEL)
<U0008> /x08 BACKSPACE (BS)
<U0009> /x09 CHARACTER TABULATION (HT)
<U000A> /x0a LINE FEED (LF)
<U000B> /x0b LINE TABULATION (VT)
<U000C> /x0c FORM FEED (FF)
<U000D> /x0d CARRIAGE RETURN (CR)
<U000E> /x0e SHIFT OUT (SO)
<U000F> /x0f SHIFT IN (SI)
<U0010> /x10 DATALINK ESCAPE (DLE)
<U0011> /x11 DEVICE CONTROL ONE (DC1)
<U0012> /x12 DEVICE CONTROL TWO (DC2)
<U0013> /x13 DEVICE CONTROL THREE (DC3)
<U0014> /x14 DEVICE CONTROL FOUR (DC4)
<U0015> /x15 NEGATIVE ACKNOWLEDGE (NAK)
<U0016> /x16 SYNCHRONOUS IDLE (SYN)
<U0017> /x17 END OF TRANSMISSION BLOCK (ETB)
<U0018> /x18 CANCEL (CAN)
<U0019> /x19 END OF MEDIUM (EM)
<U001A> /x1a SUBSTITUTE (SUB)
<U001B> /x1b ESCAPE (ESC)
<U001C> /x1c FILE SEPARATOR (IS4)
<U001D> /x1d GROUP SEPARATOR (IS3)
<U001E> /x1e RECORD SEPARATOR (IS2)
<U001F> /x1f UNIT SEPARATOR (IS1)
<U0020> /x20 SPACE
<U0021> /x21 EXCLAMATION MARK
<U0022> /x22 QUOTATION MARK
<U0023> /x23 NUMBER SIGN
<U0024> /x24 DOLLAR SIGN
<U0025> /x25 PERCENT SIGN
<U0026> /x26 AMPERSAND
<U0027> /x27 APOSTROPHE
<U0028> /x28 LEFT PARENTHESIS
<U0029> /x29 RIGHT PARENTHESIS
<U002A> /x2a ASTERISK
<U002B> /x2b PLUS SIGN
<U002C> /x2c COMMA
<U002D> /x2d HYPHEN-MINUS
<U002E> /x2e FULL STOP
<U002F> /x2f SOLIDUS
<U0030> /x30 DIGIT ZERO
<U0031> /x31 DIGIT ONE
<U0032> /x32 DIGIT TWO
<U0033> /x33 DIGIT THREE
<U0034> /x34 DIGIT FOUR
<U0035> /x35 DIGIT FIVE
<U0036> /x36 DIGIT SIX
<U0037> /x37 DIGIT SEVEN
<U0038> /x38 DIGIT EIGHT
<U0039> /x39 DIGIT NINE
<U003A> /x3a COLON
<U003B> /x3b SEMICOLON
<U003C> /x3c LESS-THAN SIGN
<U003D> /x3d EQUALS SIGN
<U003E> /x3e GREATER-THAN SIGN
<U003F> /x3f QUESTION MARK
<U0040> /x40 COMMERCIAL AT
<U0041> /x41 LATIN CAPITAL LETTER A
<U0042> /x42 LATIN CAPITAL LETTER B
<U0043> /x43 LATIN CAPITAL LETTER C
<U0044> /x44 LATIN CAPITAL LETTER D
<U0045> /x45 LATIN CAPITAL LETTER E
<U0046> /x46 LATIN CAPITAL LETTER F
<U0047> /x47 LATIN CAPITAL LETTER G
<U0048> /x48 LATIN CAPITAL LETTER H
<U0049> /x49 LATIN CAPITAL LETTER I
<U004A> /x4a LATIN CAPITAL LETTER J
<U004B> /x4b LATIN CAPITAL LETTER K
<U004C> /x4c LATIN CAPITAL LETTER L
<U004D> /x4d LATIN CAPITAL LETTER M
<U004E> /x4e LATIN CAPITAL LETTER N
<U004F> /x4f LATIN CAPITAL LETTER O
<U0050> /x50 LATIN CAPITAL LETTER P
<U0051> /x51 LATIN CAPITAL LETTER Q
<U0052> /x52 LATIN CAPITAL LETTER R
<U0053> /x53 LATIN CAPITAL LETTER S
<U0054> /x54 LATIN CAPITAL LETTER T
<U0055> /x55 LATIN CAPITAL LETTER U
<U0056> /x56 LATIN CAPITAL LETTER V
<U0057> /x57 LATIN CAPITAL LETTER W
<U0058> /x58 LATIN CAPITAL LETTER X
<U0059> /x59 LATIN CAPITAL LETTER Y
<U005A> /x5a LATIN CAPITAL LETTER Z
<U005B> /x5b LEFT SQUARE BRACKET
<U005C> /x5c REVERSE SOLIDUS
<U005D> /x5d RIGHT SQUARE BRACKET
<U005E> /x5e CIRCUMFLEX ACCENT
<U005F> /x5f LOW LINE
<U0060> /x60 GRAVE ACCENT
<U0061> /x61 LATIN SMALL LETTER A
<U0062> /x62 LATIN SMALL LETTER B
<U0063> /x63 LATIN SMALL LETTER C
<U0064> /x64 LATIN SMALL LETTER D
<U0065> /x65 LATIN SMALL LETTER E
<U0066> /x66 LATIN SMALL LETTER F
<U0067> /x67 LATIN SMALL LETTER G
<U0068> /x68 LATIN SMALL LETTER H
<U0069> /x69 LATIN SMALL LETTER I
<U006A> /x6a LATIN SMALL LETTER J
<U006B> /x6b LATIN SMALL LETTER K
<U006C> /x6c LATIN SMALL LETTER L
<U006D> /x6d LATIN SMALL LETTER M
<U006E> /x6e LATIN SMALL LETTER N
<U006F> /x6f LATIN SMALL LETTER O
<U0070> /x70 LATIN SMALL LETTER P
<U0071> /x71 LATIN SMALL LETTER Q
<U0072> /x72 LATIN SMALL LETTER R
<U0073> /x73 LATIN SMALL LETTER S
<U0074> /x74 LATIN SMALL LETTER T
<U0075> /x75 LATIN SMALL LETTER U
<U0076> /x76 LATIN SMALL LETTER V
<U0077> /x77 LATIN SMALL LETTER W
<U0078> /x78 LATIN SMALL LETTER X
<U0079> /x79 LATIN SMALL LETTER Y
<U007A> /x7a LATIN SMALL LETTER Z
<U007B> /x7b LEFT CURLY BRACKET
<U007C> /x7c VERTICAL LINE
<U007D> /x7d RIGHT CURLY BRACKET
<U007E> /x7e TILDE
<U007F> /x7f DELETE (DEL)
<U00C7> /x80 LATIN CAPITAL LETTER C WITH CEDILLA
<U00FC> /x81 LATIN SMALL LETTER U WITH DIAERESIS
<U00E9> /x82 LATIN SMALL LETTER E WITH ACUTE
<U00E2> /x83 LATIN SMALL LETTER A WITH CIRCUMFLEX
<U00E4> /x84 LATIN SMALL LETTER A WITH DIAERESIS
<U00E0> /x85 LATIN SMALL LETTER A WITH GRAVE
<U00E5> /x86 LATIN SMALL LETTER A WITH RING ABOVE
<U00E7> /x87 LATIN SMALL LETTER C WITH CEDILLA
<U00EA> /x88 LATIN SMALL LETTER E WITH CIRCUMFLEX
<U00EB> /x89 LATIN SMALL LETTER E WITH DIAERESIS
<U00E8> /x8a LATIN SMALL LETTER E WITH GRAVE
<U00EF> /x8b LATIN SMALL LETTER I WITH DIAERESIS
<U00EE> /x8c LATIN SMALL LETTER I WITH CIRCUMFLEX
<U00EC> /x8d LATIN SMALL LETTER I WITH GRAVE
<U00C4> /x8e LATIN CAPITAL LETTER A WITH DIAERESIS
<U00C5> /x8f LATIN CAPITAL LETTER A WITH RING ABOVE
<U00C9> /x90 LATIN CAPITAL LETTER E WITH ACUTE
<U00E6> /x91 LATIN SMALL LETTER AE
<U00C6> /x92 LATIN CAPITAL LETTER AE
<U00F4> /x93 LATIN SMALL LETTER O WITH CIRCUMFLEX
<U00F6> /x94 LATIN SMALL LETTER O WITH DIAERESIS
<U00F2> /x95 LATIN SMALL LETTER O WITH GRAVE
<U00FB> /x96 LATIN SMALL LETTER U WITH CIRCUMFLEX
<U00F9> /x97 LATIN SMALL LETTER U WITH GRAVE
<U00FF> /x98 LATIN SMALL LETTER Y WITH DIAERESIS
<U00D6> /x99 LATIN CAPITAL LETTER O WITH DIAERESIS
<U00DC> /x9a LATIN CAPITAL LETTER U WITH DIAERESIS
<U00F8> /x9b LATIN SMALL LETTER O WITH STROKE
<U00A3> /x9c POUND SIGN
<U00D8> /x9d LATIN CAPITAL LETTER O WITH STROKE
<U00D7> /x9e MULTIPLICATION SIGN
<U0192> /x9f LATIN SMALL LETTER F WITH HOOK
<U00E1> /xa0 LATIN SMALL LETTER A WITH ACUTE
<U00ED> /xa1 LATIN SMALL LETTER I WITH ACUTE
<U00F3> /xa2 LATIN SMALL LETTER O WITH ACUTE
<U00FA> /xa3 LATIN SMALL LETTER U WITH ACUTE
<U00F1> /xa4 LATIN SMALL LETTER N WITH TILDE
<U00D1> /xa5 LATIN CAPITAL LETTER N WITH TILDE
<U00AA> /xa6 FEMININE ORDINAL INDICATOR
<U00BA> /xa7 MASCULINE ORDINAL INDICATOR
<U00BF> /xa8 INVERTED QUESTION MARK
<U00AE> /xa9 REGISTERED SIGN
<U00AC> /xaa NOT SIGN
<U00BD> /xab VULGAR FRACTION ONE HALF
<U00BC> /xac VULGAR FRACTION ONE QUARTER
<U00A1> /xad INVERTED EXCLAMATION MARK
<U00AB> /xae LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
<U00BB> /xaf RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
<U2591> /xb0 LIGHT SHADE
<U2592> /xb1 MEDIUM SHADE
<U2593> /xb2 DARK SHADE
<U2502> /xb3 BOX DRAWINGS LIGHT VERTICAL
<U2524> /xb4 BOX DRAWINGS LIGHT VERTICAL AND LEFT
<U00C1> /xb5 LATIN CAPITAL LETTER A WITH ACUTE
<U00C2> /xb6 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
<U00C0> /xb7 LATIN CAPITAL LETTER A WITH GRAVE
<U00A9> /xb8 COPYRIGHT SIGN
<U2563> /xb9 BOX DRAWINGS DOUBLE VERTICAL AND LEFT
<U2551> /xba BOX DRAWINGS DOUBLE VERTICAL
<U2557> /xbb BOX DRAWINGS DOUBLE DOWN AND LEFT
<U255D> /xbc BOX DRAWINGS DOUBLE UP AND LEFT
<U00A2> /xbd CENT SIGN
<U00A5> /xbe YEN SIGN
<U2510> /xbf BOX DRAWINGS LIGHT DOWN AND LEFT
<U2514> /xc0 BOX DRAWINGS LIGHT UP AND RIGHT
<U2534> /xc1 BOX DRAWINGS LIGHT UP AND HORIZONTAL
<U252C> /xc2 BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
<U251C> /xc3 BOX DRAWINGS LIGHT VERTICAL AND RIGHT
<U2500> /xc4 BOX DRAWINGS LIGHT HORIZONTAL
<U253C> /xc5 BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
<U00E3> /xc6 LATIN SMALL LETTER A WITH TILDE
<U00C3> /xc7 LATIN CAPITAL LETTER A WITH TILDE
<U255A> /xc8 BOX DRAWINGS DOUBLE UP AND RIGHT
<U2554> /xc9 BOX DRAWINGS DOUBLE DOWN AND RIGHT
<U2569> /xca BOX DRAWINGS DOUBLE UP AND HORIZONTAL
<U2566> /xcb BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
<U2560> /xcc BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
<U2550> /xcd BOX DRAWINGS DOUBLE HORIZONTAL
<U256C> /xce BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
<U00A4> /xcf CURRENCY SIGN
<U00F0> /xd0 LATIN SMALL LETTER ETH (Icelandic)
<U00D0> /xd1 LATIN CAPITAL LETTER ETH (Icelandic)
<U00CA> /xd2 LATIN CAPITAL LETTER E WITH CIRCUMFLEX
<U00CB> /xd3 LATIN CAPITAL LETTER E WITH DIAERESIS
<U00C8> /xd4 LATIN CAPITAL LETTER E WITH GRAVE
<U20AC> /xd5 EURO SIGN
<U00CD> /xd6 LATIN CAPITAL LETTER I WITH ACUTE
<U00CE> /xd7 LATIN CAPITAL LETTER I WITH CIRCUMFLEX
<U00CF> /xd8 LATIN CAPITAL LETTER I WITH DIAERESIS
<U2518> /xd9 BOX DRAWINGS LIGHT UP AND LEFT
<U250C> /xda BOX DRAWINGS LIGHT DOWN AND RIGHT
<U2588> /xdb FULL BLOCK
<U2584> /xdc LOWER HALF BLOCK
<U00A6> /xdd BROKEN BAR
<U00CC> /xde LATIN CAPITAL LETTER I WITH GRAVE
<U2580> /xdf UPPER HALF BLOCK
<U00D3> /xe0 LATIN CAPITAL LETTER O WITH ACUTE
<U00DF> /xe1 LATIN SMALL LETTER SHARP S (German)
<U00D4> /xe2 LATIN CAPITAL LETTER O WITH CIRCUMFLEX
<U00D2> /xe3 LATIN CAPITAL LETTER O WITH GRAVE
<U00F5> /xe4 LATIN SMALL LETTER O WITH TILDE
<U00D5> /xe5 LATIN CAPITAL LETTER O WITH TILDE
<U00B5> /xe6 MICRO SIGN
<U00FE> /xe7 LATIN SMALL LETTER THORN (Icelandic)
<U00DE> /xe8 LATIN CAPITAL LETTER THORN (Icelandic)
<U00DA> /xe9 LATIN CAPITAL LETTER U WITH ACUTE
<U00DB> /xea LATIN CAPITAL LETTER U WITH CIRCUMFLEX
<U00D9> /xeb LATIN CAPITAL LETTER U WITH GRAVE
<U00FD> /xec LATIN SMALL LETTER Y WITH ACUTE
<U00DD> /xed LATIN CAPITAL LETTER Y WITH ACUTE
<U00AF> /xee MACRON
<U00B4> /xef ACUTE ACCENT
<U00AD> /xf0 SOFT HYPHEN
<U00B1> /xf1 PLUS-MINUS SIGN
<U2017> /xf2 DOUBLE LOW LINE
<U00BE> /xf3 VULGAR FRACTION THREE QUARTERS
<U00B6> /xf4 PILCROW SIGN
<U00A7> /xf5 SECTION SIGN
<U00F7> /xf6 DIVISION SIGN
<U00B8> /xf7 CEDILLA
<U00B0> /xf8 DEGREE SIGN
<U00A8> /xf9 DIAERESIS
<U00B7> /xfa MIDDLE DOT
<U00B9> /xfb SUPERSCRIPT ONE
<U00B3> /xfc SUPERSCRIPT THREE
<U00B2> /xfd SUPERSCRIPT TWO
<U25A0> /xfe BLACK SQUARE
<U00A0> /xff NO-BREAK SPACE
END CHARMAP