From 5536f7440f2f4a12782e8d741cbbba5f1c3cfea8 Mon Sep 17 00:00:00 2001 From: Archaic Date: Mon, 26 Dec 2005 19:00:06 +0000 Subject: Applied Alexander Patrakov's patch which adds UTF-8 capability to the development branch of the LFS Book. git-svn-id: http://svn.linuxfromscratch.org/LFS/trunk/BOOK@7235 4aa44e1e-78dd-0310-a6d2-fbcd4c07a689 --- chapter07/bootscripts.xml | 6 ++ chapter07/console.xml | 261 +++++++++++++++++++++++++++++++++------------- chapter07/profile.xml | 53 +++++++--- 3 files changed, 230 insertions(+), 90 deletions(-) (limited to 'chapter07') diff --git a/chapter07/bootscripts.xml b/chapter07/bootscripts.xml index 775215e7e..6e884ac77 100644 --- a/chapter07/bootscripts.xml +++ b/chapter07/bootscripts.xml @@ -47,6 +47,12 @@ make install + The console script that comes with + LFS-Bootscripts-&lfs-bootscripts-version; doesn't support Unicode. Install + a replacement version: + +install -m755 ../console /etc/rc.d/init.d + diff --git a/chapter07/console.xml b/chapter07/console.xml index 315112366..b0b9417a3 100644 --- a/chapter07/console.xml +++ b/chapter07/console.xml @@ -17,96 +17,207 @@ This section discusses how to configure the console bootscript that sets up the keyboard map and the console font. If non-ASCII - characters (e.g., the British pound sign and Euro character) will not be used - and the keyboard is a U.S. one, skip this section. Without the configuration - file, the console bootscript will do nothing. + characters (e.g., the copyright sign, the British pound sign and Euro symbol) + will not be used and the keyboard is a U.S. one, skip this section. Without + the configuration file, the console bootscript will do + nothing. The console script reads the /etc/sysconfig/console file for configuration information. Decide which keymap and screen font will be used. Various language-specific - HOWTO's can also help with this (see . 
A pre-made - /etc/sysconfig/console file with known settings for several - countries was installed with the LFS-Bootscripts package, so the relevant - section can be uncommented if the country is supported. If still in doubt, look - in the /usr/share/kbd directory for valid - keymaps and screen fonts. Read loadkeys(1) and - setfont(8) to determine the correct arguments for - these programs. Once decided, create the configuration file with the following - command: - -cat >/etc/sysconfig/console <<"EOF" -KEYMAP="[arguments for loadkeys]" -FONT="[arguments for setfont]" + HOWTO's can also help with this, see . If still in + doubt, look in the /usr/share/kbd + directory for valid keymaps and screen fonts. Read + loadkeys(1) and setfont(8) manual + pages to determine the correct arguments for these programs. + + The /etc/sysconfig/console file should contain lines + of the form: VARIABLE="value". The following variables are recognized: + + + + + KEYMAP + + This variable specifies the arguments for the + loadkeys program, typically, the name of keymap + to load, e.g. "es". If this variable is not set, the bootscript will + not run the loadkeys program, and the default kernel + keymap will be used. + + + + + KEYMAP_CORRECTIONS + + This (rarely used) variable + specifies the arguments for the second call to the + loadkeys program. This is useful if the stock keymap + is not completely satisfactory and a small adjustment has to be made. E.g., + to include the Euro sign into a keymap that normally doesn't have it, + set this variable to "euro2". + + + + + FONT + + This variable specifies the arguments for the + setfont program. Typically, this includes the font + name, "-m", and the name of the application character map to load. + E.g., in order to load the "lat1-16" font together with the "8859-1" + application character map, set this variable to "lat1-16 -m 8859-1". 
+ If this variable is not set, the bootscript will not run the + setfont program, and the default VGA font will be + used together with the default application character map. + + + + + UNICODE + + Set this variable to "1", "yes" or "true" in order to put the + console into UTF-8 mode. This is useful in UTF-8 based locales and + harmful otherwise. + + + + + LEGACY_CHARSET + + For many keyboard layouts, there is no stock Unicode keymap in + the Kbd package. The console bootscript will + convert an available keymap to UTF-8 on the fly if this variable is + set to the encoding of the available non-UTF-8 keymap. Note, however, + that dead keys and composing will not work in UTF-8 mode without the + special kernel patch. + + + + + BROKEN_COMPOSE + + Set this to "0" if you are going to apply that kernel patch in + Chapter 8. Note that you also have to add the character set expected + by composition rules in your keymap to the FONT variable after the + "-m" switch. + + + + + + Support for compiling the keymap directly into the kernel has been + removed because there were reports that it leads to incorrect results. + + Some examples: + + + + + For a non-Unicode setup, only the KEYMAP and FONT variables are + generally needed. E.g., for a Polish setup, one would use: + +cat > /etc/sysconfig/console << "EOF" +# Begin /etc/sysconfig/console + +KEYMAP="pl2" +FONT="lat2a-16 -m 8859-2" + +# End /etc/sysconfig/console EOF + - For example, for Spanish users who also want to use the Euro - character (accessible by pressing AltGr+E), the following settings are - correct: + + As mentioned above, it is sometimes necessary to adjust a + stock keymap slightly. 
The following example adds the Euro symbol to the + German keymap: -cat >/etc/sysconfig/console <<"EOF" -KEYMAP="es euro2" -FONT="lat9-16 -u iso01" +cat > /etc/sysconfig/console << "EOF" +# Begin /etc/sysconfig/console + +KEYMAP="de-latin1" +KEYMAP_CORRECTIONS="euro2" +FONT="lat0-16 -m 8859-15" + +# End /etc/sysconfig/console EOF + - - The FONT line above is correct only for the ISO 8859-15 - character set. If using ISO 8859-1 and, therefore, a pound sign - instead of Euro, the correct FONT line would be: + + Here is a Unicode-enabled example for Bulgarian, where a stock + UTF-8 keymap exists and defines no dead keys or composition rules: -FONT="lat1-16" - +cat > /etc/sysconfig/console << "EOF" +# Begin /etc/sysconfig/console + +UNICODE="1" +KEYMAP="bg_bds-utf8" +FONT="LatArCyrHeb-16" - If the KEYMAP or FONT variable is not set, - the console initscript will not run the corresponding - program. - - In some keymaps, the Backspace and Delete keys send characters different - from ones in the default keymap built into the kernel. This confuses some - applications. For example, Emacs displays its help (instead of erasing the - character before the cursor) when Backspace is pressed. To check if the keymap - in use is affected (this works only for i386 keymaps): - -zgrep '\W14\W' [/path/to/your/keymap] - - If the keycode 14 is Backspace instead of Delete, create the - following keymap snippet to fix this issue: - -mkdir -pv /etc/kbd && cat > /etc/kbd/bs-sends-del <<"EOF" - keycode 14 = Delete Delete Delete Delete - alt keycode 14 = Meta_Delete - altgr alt keycode 14 = Meta_Delete - keycode 111 = Remove - altgr control keycode 111 = Boot - control alt keycode 111 = Boot -altgr control alt keycode 111 = Boot +# End /etc/sysconfig/console +EOF + + + + Due to the use of a 512-glyph LatArCyrHeb-16 font in the previous + example, bright colors are no longer available on the Linux console unless + a framebuffer is used. 
If one wants to have bright colors without + framebuffer and can live without characters not belonging to his language, + it is still possible to use a language-specific 256-glyph font, as + illustrated below. This would, however, also break single quotes in manual + pages. + + + +cat > /etc/sysconfig/console << "EOF" +# Begin /etc/sysconfig/console + +UNICODE="1" +KEYMAP="bg_bds-utf8" +FONT="cyr-sun16" + +# End /etc/sysconfig/console EOF + - Tell the console script to load this - snippet after the main keymap: + + The following example illustrates keymap autoconversion from + ISO-8859-15 to UTF-8 and enabling dead keys in Unicode mode: -cat >>/etc/sysconfig/console <<"EOF" -KEYMAP_CORRECTIONS="/etc/kbd/bs-sends-del" +cat > /etc/sysconfig/console << "EOF" +# Begin /etc/sysconfig/console + +UNICODE="1" +KEYMAP="de-latin1" +KEYMAP_CORRECTIONS="euro2" +LEGACY_CHARSET="iso-8859-15" +BROKEN_COMPOSE="0" +FONT="LatArCyrHeb-16 -m 8859-15" + +# End /etc/sysconfig/console EOF + + + + For Chinese, Japanese, Korean and some other languages, the Linux + console cannot be configured to display the needed characters. Users + who need such languages should install the X Window System, fonts that + cover the necessary character ranges, and the proper input method (e.g. + SCIM, it supports a wide variety of languages). + - To compile the keymap directly into the kernel instead of - setting it every time from the console bootscript, - follow the instructions given in - Doing this ensures that the keyboard will always work as expected, - even when booting into maintenance mode (by passing - init=/bin/sh to the kernel), because the - console bootscript will not be run in that - situation. Additionally, the kernel will not set the screen font - automatically. This should not pose many problems because ASCII characters - will be handled correctly, and it is unlikely that a user would need - to rely on non-ASCII characters while in maintenance mode. 
- - Since the kernel will set up the keymap, it is possible to omit - the KEYMAP variable from the - /etc/sysconfig/console configuration file. It can - also be left in place, if desired, without consequence. Keeping it - could be beneficial if running several different kernels where it is - difficult to ensure that the keymap is compiled into every one of - them. + + + + + The /etc/sysconfig/console file only controls + Linux text console localization. It has nothing to do with setting the proper + keyboard layout and terminal fonts in X Window System. + diff --git a/chapter07/profile.xml b/chapter07/profile.xml index dd53a5141..ae7617ba7 100644 --- a/chapter07/profile.xml +++ b/chapter07/profile.xml @@ -69,17 +69,19 @@ for the desired language (e.g., en) and [CC] with the two-letter code for the appropriate country (e.g., GB). [charmap] should - be replaced with the canonical charmap for your chosen locale. + be replaced with the canonical charmap for your chosen locale. Optional + modifiers such as @euro may also be present. The list of all locales supported by Glibc can be obtained by running the following command: locale -a - Locales can have a number of synonyms, e.g. ISO-8859-1 + Charmaps can have a number of aliases, e.g. ISO-8859-1 is also referred to as iso8859-1 and iso88591. - Some applications cannot handle the various synonyms correctly, so it is - safest to choose the canonical name for a particular locale. To determine + Some applications cannot handle the various synonyms correctly (e.g. require + that "UTF-8" is written as "UTF-8", not "utf8"), so it is safest in most + cases to choose the canonical name for a particular locale. To determine the canonical name, run the following command, where [locale name] is the output given by locale -a for your preferred locale (en_GB.iso88591 in our example). @@ -115,6 +117,7 @@ LC_ALL=[locale name] locale int_prefix Further instructions assume that there are no such error messages from Glibc. 
+ Some packages beyond LFS may also lack support for your chosen locale. One example is the X library (part of the X Window System), which outputs the following error message: @@ -139,23 +142,43 @@ LC_ALL=[locale name] locale int_prefix cat > /etc/profile << "EOF" # Begin /etc/profile -export LANG=[ll]_[CC].[charmap] +export LANG=[ll]_[CC].[charmap][@modifiers] export INPUTRC=/etc/inputrc # End /etc/profile EOF + The C (default) and en_US (the recommended + one for United States English users) locales are different. C + uses the US-ASCII 7-bit character set, and treats bytes with the high bit set + as invalid characters. That's why, e.g., the ls command + substitutes them with question marks in that locale. Also, an attempt to send + mail with such characters from Mutt or Pine results in non-RFC-conforming + messages being sent (the charset in the outgoing mail is indicated as "unknown + 8-bit"). So you can use the C locale only if you are sure that + you will never need 8-bit characters. + + UTF-8 based locales are not supported well by many programs. E.g., the + watch program displays only ASCII characters in UTF-8 + locales and has no such restriction in traditional 8-bit locales like en_US. + Without patches and/or installing software beyond BLFS, in UTF-8 based locales + you will not be able to do such basic tasks as printing plain-text files from + the command line, recording Windows-readable CDs with filenames containing + non-ASCII characters, viewing ID3v1 tags in MP3 files and so on. It is also + impossible (without damaging non-ASCII characters) to connect using ssh from + the system using a UTF-8 based locale to a host that still uses a traditional + 8-bit locale, and vice versa. In short, use UTF-8 only if you are going to + use KDE or GNOME and never open the terminal, or if you are going to tolerate + bugs. + + - The C (default) and en_US (the - recommended one for United States English users) locales are different. 
+ Bug reports reproducible only in UTF-8 locales and for which there + is no patch or other fix mentioned in the report, will be closed immediately, + without investigation, with the "WONTFIX" resolution and a "don't use this + program or revert to non-UTF-8 locale" comment. Patches that have ill + effects in non-UTF-8 locales (other than replacement of translated program + messages with English ones) will be rejected. - Setting the keyboard layout, screen font, and locale-related environment - variables are the only internationalization steps needed to support locales - that use ordinary single-byte encodings and left-to-right writing direction. - More complex cases (including UTF-8 based locales) require additional steps - and additional patches because many applications tend to not work properly - under such conditions. These steps and patches are not included in the LFS - book and such locales are not yet supported by LFS. - -- cgit v1.2.3-54-g00ecf