author | Jeremy Huntwork <jhuntwork@linuxfromscratch.org> | 2006-01-06 01:59:08 +0000 |
---|---|---|
committer | Jeremy Huntwork <jhuntwork@linuxfromscratch.org> | 2006-01-06 01:59:08 +0000 |
commit | fa21b3dc894b9964620968dfae6685d69ce89fb9 (patch) | |
tree | 2353b9af8cae48156b98e651873d483e994e352a /chapter07 | |
parent | 60e34b52810dd47567ca18aa2c86fe4cd7c9fd01 (diff) |
Initial support of UTF-8. Thanks Alexander Patrakov.
git-svn-id: http://svn.linuxfromscratch.org/LFS/trunk/BOOK@7245 4aa44e1e-78dd-0310-a6d2-fbcd4c07a689
Diffstat (limited to 'chapter07')
-rw-r--r-- | chapter07/console.xml | 263 |
-rw-r--r-- | chapter07/profile.xml | 48 |
2 files changed, 219 insertions, 92 deletions
diff --git a/chapter07/console.xml b/chapter07/console.xml
index 315112366..ee34edcb9 100644
--- a/chapter07/console.xml
+++ b/chapter07/console.xml
@@ -17,96 +17,209 @@
   <para>This section discusses how to configure the
   <command>console</command> bootscript that sets up the keyboard map
   and the console font. If non-ASCII
-  characters (e.g., the British pound sign and Euro character) will not be used
-  and the keyboard is a U.S. one, skip this section. Without the configuration
-  file, the <command>console</command> bootscript will do nothing.</para>
+  characters (e.g., the copyright sign, the British pound sign and Euro symbol)
+  will not be used and the keyboard is a U.S. one, skip this section. Without
+  the configuration file, the <command>console</command> bootscript will do
+  nothing.</para>
 
   <para>The <command>console</command> script reads the
   <filename>/etc/sysconfig/console</filename> file for configuration information.
   Decide which keymap and screen font will be used. Various language-specific
-  HOWTO's can also help with this (see <ulink
-  url="http://www.tldp.org/HOWTO/HOWTO-INDEX/other-lang.html"/>. A pre-made
-  <filename>/etc/sysconfig/console</filename> file with known settings for several
-  countries was installed with the LFS-Bootscripts package, so the relevant
-  section can be uncommented if the country is supported. If still in doubt, look
-  in the <filename class="directory">/usr/share/kbd</filename> directory for valid
-  keymaps and screen fonts. Read <filename>loadkeys(1)</filename> and
-  <filename>setfont(8)</filename> to determine the correct arguments for
-  these programs. Once decided, create the configuration file with the following
-  command:</para>
-
-<screen><userinput>cat >/etc/sysconfig/console <<"EOF"
-<literal>KEYMAP="<replaceable>[arguments for loadkeys]</replaceable>"
-FONT="<replaceable>[arguments for setfont]</replaceable>"</literal>
+  HOWTOs can also help with this, see <ulink
+  url="http://www.tldp.org/HOWTO/HOWTO-INDEX/other-lang.html"/>. If still in
+  doubt, look in the <filename class="directory">/usr/share/kbd</filename>
+  directory for valid keymaps and screen fonts. Read
+  <filename>loadkeys(1)</filename> and <filename>setfont(8)</filename> manual
+  pages to determine the correct arguments for these programs.</para>
+
+  <para>The <filename>/etc/sysconfig/console</filename> file should contain lines
+  of the form: VARIABLE="value". The following variables are recognized:</para>
+
+  <variablelist>
+
+    <varlistentry>
+      <term>KEYMAP</term>
+      <listitem>
+        <para>This variable specifies the arguments for the
+        <command>loadkeys</command> program, typically, the name of keymap
+        to load, e.g., <quote>es</quote>. If this variable is not set, the
+        bootscript will not run the <command>loadkeys</command> program,
+        and the default kernel keymap will be used.</para>
+      </listitem>
+    </varlistentry>
+
+    <varlistentry>
+      <term>KEYMAP_CORRECTIONS</term>
+      <listitem>
+        <para>This (rarely used) variable
+        specifies the arguments for the second call to the
+        <command>loadkeys</command> program. This is useful if the stock keymap
+        is not completely satisfactory and a small adjustment has to be made. E.g.,
+        to include the Euro sign into a keymap that normally doesn't have it,
+        set this variable to <quote>euro2</quote>.</para>
+      </listitem>
+    </varlistentry>
+
+    <varlistentry>
+      <term>FONT</term>
+      <listitem>
+        <para>This variable specifies the arguments for the
+        <command>setfont</command> program. Typically, this includes the font
+        name, <quote>-m</quote>, and the name of the application character
+        map to load. E.g., in order to load the <quote>lat1-16</quote> font
+        together with the <quote>8859-1</quote> application character map
+        (as it is appropriate in the USA), <!-- because of the copyright sign -->
+        set this variable to <quote>lat1-16 -m 8859-1</quote>.
+        If this variable is not set, the bootscript will not run the
+        <command>setfont</command> program, and the default VGA font will be
+        used together with the default application character map.</para>
+      </listitem>
+    </varlistentry>
+
+    <varlistentry>
+      <term>UNICODE</term>
+      <listitem>
+        <para>Set this variable to <quote>1</quote>, <quote>yes</quote> or
+        <quote>true</quote> in order to put the
+        console into UTF-8 mode. This is useful in UTF-8 based locales and
+        harmful otherwise.</para>
+      </listitem>
+    </varlistentry>
+
+    <varlistentry>
+      <term>LEGACY_CHARSET</term>
+      <listitem>
+        <para>For many keyboard layouts, there is no stock Unicode keymap in
+        the Kbd package. The <command>console</command> bootscript will
+        convert an available keymap to UTF-8 on the fly if this variable is
+        set to the encoding of the available non-UTF-8 keymap. Note, however,
+        that dead keys (i.e., keys that don't produce a character by
+        themselves, but put an accent onto a character procuced by the next
+        key; there are no dead keys on the standard US keyboard) and composing
+        (i.e., pressing Ctrl+. A E in order to produce the Æ character)
+        will not work in UTF-8 mode without the special kernel patch.
+        This variable is useful only in UTF-8 mode.</para>
+      </listitem>
+    </varlistentry>
+
+    <varlistentry>
+      <term>BROKEN_COMPOSE</term>
+      <listitem>
+        <para>Set this to <quote>0</quote> if you are going to apply the kernel patch in
+        Chapter 8. Note that you also have to add the character set expected
+        by composition rules in your keymap to the FONT variable after the
+        <quote>-m</quote> switch. This variable is useful only in UTF-8 mode.</para>
+      </listitem>
+    </varlistentry>
+
+  </variablelist>
+
+  <para>Support for compiling the keymap directly into the kernel has been
+  removed because there were reports that it leads to incorrect results.</para>
+
+  <para>Some examples:</para>
+
+  <itemizedlist>
+
+    <listitem>
+      <para>For a non-Unicode setup, only the KEYMAP and FONT variables are
+      generally needed. E.g., for a Polish setup, one would use:</para>
+
+<screen role="nodump"><userinput>cat > /etc/sysconfig/console << "EOF"
+<literal># Begin /etc/sysconfig/console
+
+KEYMAP="pl2"
+FONT="lat2a-16 -m 8859-2"
+
+# End /etc/sysconfig/console</literal>
 EOF</userinput></screen>
+    </listitem>
 
-  <para>For example, for Spanish users who also want to use the Euro
-  character (accessible by pressing AltGr+E), the following settings are
-  correct:</para>
+    <listitem>
+      <para>As mentioned above, it is sometimes necessary to adjust a
+      stock keymap slightly. The following example adds the Euro symbol to the
+      German keymap:</para>
 
-<screen role="nodump"><userinput>cat >/etc/sysconfig/console <<"EOF"
-<literal>KEYMAP="es euro2"
-FONT="lat9-16 -u iso01"</literal>
+<screen role="nodump"><userinput>cat > /etc/sysconfig/console << "EOF"
+<literal># Begin /etc/sysconfig/console
+
+KEYMAP="de-latin1"
+KEYMAP_CORRECTIONS="euro2"
+FONT="lat0-16 -m 8859-15"
+
+# End /etc/sysconfig/console</literal>
 EOF</userinput></screen>
+    </listitem>
 
-  <note>
-    <para>The <envar>FONT</envar> line above is correct only for the ISO 8859-15
-    character set. If using ISO 8859-1 and, therefore, a pound sign
-    instead of Euro, the correct <envar>FONT</envar> line would be:</para>
+    <listitem>
+      <para>The following is a Unicode-enabled example for Bulgarian, where a stock
+      UTF-8 keymap exists and defines no dead keys or composition rules:</para>
 
-<screen role="nodump"><userinput>FONT="lat1-16"</userinput></screen>
-  </note>
+<screen role="nodump"><userinput>cat > /etc/sysconfig/console << "EOF"
+<literal># Begin /etc/sysconfig/console
+
+UNICODE="1"
+KEYMAP="bg_bds-utf8"
+FONT="LatArCyrHeb-16"
+
+# End /etc/sysconfig/console</literal>
+EOF</userinput></screen>
+    </listitem>
+
+    <listitem>
+      <para>Due to the use of a 512-glyph LatArCyrHeb-16 font in the previous
+      example, bright colors are no longer available on the Linux console unless
+      a framebuffer is used. If one wants to have bright colors without
+      framebuffer and can live without characters not belonging to his language,
+      it is still possible to use a language-specific 256-glyph font, as
+      illustrated below.</para>
 
-  <para>If the <envar>KEYMAP</envar> or <envar>FONT</envar> variable is not set,
-  the <command>console</command> initscript will not run the corresponding
-  program.</para>
-
-  <para>In some keymaps, the Backspace and Delete keys send characters different
-  from ones in the default keymap built into the kernel. This confuses some
-  applications. For example, Emacs displays its help (instead of erasing the
-  character before the cursor) when Backspace is pressed. To check if the keymap
-  in use is affected (this works only for i386 keymaps):</para>
-
-<screen role="nodump"><userinput>zgrep '\W14\W' <replaceable>[/path/to/your/keymap]</replaceable></userinput></screen>
-
-  <para>If the keycode 14 is Backspace instead of Delete, create the
-  following keymap snippet to fix this issue:</para>
-
-<screen role="nodump"><userinput>mkdir -pv /etc/kbd && cat > /etc/kbd/bs-sends-del <<"EOF"
-<literal>         keycode 14 = Delete Delete Delete Delete
-              alt keycode 14 = Meta_Delete
-        altgr alt keycode 14 = Meta_Delete
-                  keycode 111 = Remove
-    altgr control keycode 111 = Boot
-      control alt keycode 111 = Boot
-altgr control alt keycode 111 = Boot</literal>
+<screen role="nodump"><userinput>cat > /etc/sysconfig/console << "EOF"
+<literal># Begin /etc/sysconfig/console
+
+UNICODE="1"
+KEYMAP="bg_bds-utf8"
+FONT="cyr-sun16"
+
+# End /etc/sysconfig/console</literal>
 EOF</userinput></screen>
+    </listitem>
+
+    <listitem>
+      <para>The following example illustrates keymap autoconversion from
+      ISO-8859-15 to UTF-8 and enabling dead keys in Unicode mode:</para>
+
+<screen role="nodump"><userinput>cat > /etc/sysconfig/console << "EOF"
+<literal># Begin /etc/sysconfig/console
 
-  <para>Tell the <command>console</command> script to load this
-  snippet after the main keymap:</para>
+UNICODE="1"
+KEYMAP="de-latin1"
+KEYMAP_CORRECTIONS="euro2"
+LEGACY_CHARSET="iso-8859-15"
+BROKEN_COMPOSE="0"
+FONT="LatArCyrHeb-16 -m 8859-15"
 
-<screen role="nodump"><userinput>cat >>/etc/sysconfig/console <<"EOF"
-<literal>KEYMAP_CORRECTIONS="/etc/kbd/bs-sends-del"</literal>
+# End /etc/sysconfig/console</literal>
 EOF</userinput></screen>
+    </listitem>
 
-  <para>To compile the keymap directly into the kernel instead of
-  setting it every time from the <command>console</command> bootscript,
-  follow the instructions given in <xref linkend="ch-bootable-kernel" role="."/>
-  Doing this ensures that the keyboard will always work as expected,
-  even when booting into maintenance mode (by passing
-  <parameter>init=/bin/sh</parameter> to the kernel), because the
-  <command>console</command> bootscript will not be run in that
-  situation. Additionally, the kernel will not set the screen font
-  automatically. This should not pose many problems because ASCII characters
-  will be handled correctly, and it is unlikely that a user would need
-  to rely on non-ASCII characters while in maintenance mode.</para>
-
-  <para>Since the kernel will set up the keymap, it is possible to omit
-  the <envar>KEYMAP</envar> variable from the
-  <filename>/etc/sysconfig/console</filename> configuration file. It can
-  also be left in place, if desired, without consequence. Keeping it
-  could be beneficial if running several different kernels where it is
-  difficult to ensure that the keymap is compiled into every one of
-  them.</para>
+    <listitem>
+      <para>For Chinese, Japanese, Korean and some other languages, the Linux
+      console cannot be configured to display the needed characters. Users
+      who need such languages should install the X Window System, fonts that
+      cover the necessary character ranges, and the proper input method (e.g.,
+      SCIM, it supports a wide variety of languages).</para>
+    </listitem>
+
+  </itemizedlist>
+
+  <!-- Added because folks keep posting their console file with X questions
+  to blfs-support list -->
+  <note>
+    <para>The <filename>/etc/sysconfig/console</filename> file only controls the
+    Linux text console localization. It has nothing to do with setting the proper
+    keyboard layout and terminal fonts in the X Window System, with ssh sessions
+    or with a serial console.</para>
+  </note>
 
 </sect1>
diff --git a/chapter07/profile.xml b/chapter07/profile.xml
index dd53a5141..e2748d9df 100644
--- a/chapter07/profile.xml
+++ b/chapter07/profile.xml
@@ -69,17 +69,20 @@
   for the desired language (e.g., <quote>en</quote>) and
   <replaceable>[CC]</replaceable> with the two-letter code for the appropriate
   country (e.g., <quote>GB</quote>). <replaceable>[charmap]</replaceable> should
-  be replaced with the canonical charmap for your chosen locale.</para>
+  be replaced with the canonical charmap for your chosen locale. Optional
+  modifiers such as <quote>@euro</quote> may also be present.</para>
 
   <para>The list of all locales supported by Glibc can be obtained by running
   the following command:</para>
 
 <screen role="nodump"><userinput>locale -a</userinput></screen>
 
-  <para>Locales can have a number of synonyms, e.g. <quote>ISO-8859-1</quote>
+  <para>Charmaps can have a number of aliases, e.g., <quote>ISO-8859-1</quote>
   is also referred to as <quote>iso8859-1</quote> and <quote>iso88591</quote>.
-  Some applications cannot handle the various synonyms correctly, so it is
-  safest to choose the canonical name for a particular locale. To determine
+  Some applications cannot handle the various synonyms correctly (e.g., require
+  that <quote>UTF-8</quote> is written as <quote>UTF-8</quote>, not
+  <quote>utf8</quote>), so it is safest in most
+  cases to choose the canonical name for a particular locale. To determine
   the canonical name, run the following command, where <replaceable>[locale
   name]</replaceable> is the output given by <command>locale -a</command> for
   your preferred locale (<quote>en_GB.iso88591</quote> in our example).</para>
@@ -115,6 +118,7 @@ LC_ALL=[locale name] locale int_prefix</userinput></screen>
   Further instructions assume that there are no such error messages from
   Glibc.</para>
 
+  <!-- FIXME: the xlib example will became obsolete real soon -->
   <para>Some packages beyond LFS may also lack support for your chosen locale.
   One example is the X library (part of the X Window System), which outputs the
   following error message:</para>
@@ -139,23 +143,33 @@ LC_ALL=[locale name] locale int_prefix</userinput></screen>
 <screen><userinput>cat > /etc/profile << "EOF"
 <literal># Begin /etc/profile
 
-export LANG=<replaceable>[ll]</replaceable>_<replaceable>[CC]</replaceable>.<replaceable>[charmap]</replaceable>
+export LANG=<replaceable>[ll]</replaceable>_<replaceable>[CC]</replaceable>.<replaceable>[charmap]</replaceable><replaceable>[@modifiers]</replaceable>
 export INPUTRC=/etc/inputrc
 
 # End /etc/profile</literal>
 EOF</userinput></screen>
 
-  <note>
-    <para>The <quote>C</quote> (default) and <quote>en_US</quote> (the
-    recommended one for United States English users) locales are different.</para>
-  </note>
-
-  <para>Setting the keyboard layout, screen font, and locale-related environment
-  variables are the only internationalization steps needed to support locales
-  that use ordinary single-byte encodings and left-to-right writing direction.
-  More complex cases (including UTF-8 based locales) require additional steps
-  and additional patches because many applications tend to not work properly
-  under such conditions. These steps and patches are not included in the LFS
-  book and such locales are not yet supported by LFS.</para>
+  <para>The <quote>C</quote> (default) and <quote>en_US</quote> (the recommended
+  one for United States English users) locales are different. <quote>C</quote>
+  uses the US-ASCII 7-bit character set, and treats bytes with the high bit set
+  as invalid characters. That's why, e.g., the <command>ls</command> command
+  substitutes them with question marks in that locale. Also, an attempt to send
+  mail with such characters from Mutt or Pine results in non-RFC-conforming
+  messages being sent (the charset in the outgoing mail is indicated as <quote>unknown
+  8-bit</quote>). So you can use the <quote>C</quote> locale only if you are sure that
+  you will never need 8-bit characters.</para>
+
+  <para>UTF-8 based locales are not supported well by many programs. E.g., the
+  <command>watch</command> program displays only ASCII characters in UTF-8
+  locales and has no such restriction in traditional 8-bit locales like en_US.
+  Without patches and/or installing software beyond BLFS, in UTF-8 based locales
+  you will not be able to do such basic tasks as printing plain-text files from
+  the command line, recording Windows-readable CDs with filenames containing
+  non-ASCII characters, viewing ID3v1 tags in MP3 files and so on. Work is in
+  progress to document and, if possible, fix such problems, see
+  <ulink url="&blfs-root;view/svn/introduction/locale-issues.html"/>.
+  It is, however, safe to use UTF-8 based locales if you are going to use only
+  KDE or GNOME and never open the terminal.</para>
+  <!-- All abovementioned problems except "watch" have a known fix beyond BLFS -->
 
 </sect1>
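
Reviewer's illustration (not part of the commit): the new console.xml text describes how KEYMAP, KEYMAP_CORRECTIONS, FONT, UNICODE and LEGACY_CHARSET interact. The dry-run sketch below prints the commands a bootscript might issue for a given configuration file. It is not the LFS-Bootscripts `console` script. The function names `is_true` and `console_plan` are hypothetical, and the command ordering, the use of `kbd_mode -u` for UTF-8 mode, and the `dumpkeys -c ... | loadkeys -u` conversion step are assumptions made for illustration. BROKEN_COMPOSE is not modelled.

```shell
#!/bin/sh
# Dry-run sketch: print, but never execute, the commands that a console
# bootscript might run. Hypothetical helper names; the real script may differ.

is_true() {
    # Accept the three spellings the text lists for UNICODE.
    case "$1" in
        1|yes|true) return 0 ;;
        *) return 1 ;;
    esac
}

console_plan() {
    # $1: path (containing a slash) to a file of VARIABLE="value" lines.
    # The file is sourced as shell code, so it must be trusted.
    (
        # Subshell: the sourced variables do not leak into the caller.
        KEYMAP='' KEYMAP_CORRECTIONS='' FONT='' UNICODE='' LEGACY_CHARSET=''
        . "$1" || exit 1

        if is_true "$UNICODE"; then
            echo "kbd_mode -u"
        fi
        if [ -n "$FONT" ]; then
            echo "setfont $FONT"
        fi
        if [ -n "$KEYMAP" ]; then
            echo "loadkeys $KEYMAP"
        fi
        if [ -n "$KEYMAP_CORRECTIONS" ]; then
            # Second loadkeys call, as described for KEYMAP_CORRECTIONS.
            echo "loadkeys $KEYMAP_CORRECTIONS"
        fi
        if is_true "$UNICODE" && [ -n "$LEGACY_CHARSET" ]; then
            # Assumed on-the-fly conversion of a legacy keymap to UTF-8.
            echo "dumpkeys -c $LEGACY_CHARSET | loadkeys -u"
        fi
    )
}

# Usage: plan for the German example from the diff.
cfg=$(mktemp)
cat > "$cfg" << "EOF"
KEYMAP="de-latin1"
KEYMAP_CORRECTIONS="euro2"
FONT="lat0-16 -m 8859-15"
EOF
console_plan "$cfg"
rm -f "$cfg"
```

For the German example this prints a `setfont` line followed by two `loadkeys` lines. An empty configuration file produces no output, which mirrors the statement that unset variables cause the corresponding program not to be run.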