aboutsummaryrefslogtreecommitdiffstats
path: root/chapter05/toolchaintechnotes.xml
diff options
context:
space:
mode:
authorPierre Labastie <pieere@linuxfromscratch.org>2020-05-03 21:02:51 +0000
committerPierre Labastie <pieere@linuxfromscratch.org>2020-05-03 21:02:51 +0000
commitefcb3933433838b71f3a4a53ec1ac6d899aaec0b (patch)
treef0b1fb24d5ac7ebb93cc2deddefbc16938ea49d0 /chapter05/toolchaintechnotes.xml
parent9d719e24c33f9a2ecf8a5582cd811c43a8fa46c2 (diff)
Make the new book
git-svn-id: http://svn.linuxfromscratch.org/LFS/branches/cross-chap5@11831 4aa44e1e-78dd-0310-a6d2-fbcd4c07a689
Diffstat (limited to 'chapter05/toolchaintechnotes.xml')
-rw-r--r--chapter05/toolchaintechnotes.xml445
1 files changed, 307 insertions, 138 deletions
diff --git a/chapter05/toolchaintechnotes.xml b/chapter05/toolchaintechnotes.xml
index e0ab899eb..63c9210e5 100644
--- a/chapter05/toolchaintechnotes.xml
+++ b/chapter05/toolchaintechnotes.xml
@@ -24,143 +24,312 @@
process has been designed to minimize the risks for new readers and to provide
the most educational value at the same time.</para>
- <note>
- <para>Before continuing, be aware of the name of the working platform,
- often referred to as the target triplet. A simple way to determine the
- name of the target triplet is to run the <command>config.guess</command>
- script that comes with the source for many packages. Unpack the Binutils
- sources and run the script: <userinput>./config.guess</userinput> and note
- the output. For example, for a 32-bit Intel processor the
- output will be <emphasis>i686-pc-linux-gnu</emphasis>. On a 64-bit
- system it will be <emphasis>x86_64-pc-linux-gnu</emphasis>.</para>
-
- <para>Also be aware of the name of the platform's dynamic linker, often
- referred to as the dynamic loader (not to be confused with the standard
- linker <command>ld</command> that is part of Binutils). The dynamic linker
- provided by Glibc finds and loads the shared libraries needed by a program,
- prepares the program to run, and then runs it. The name of the dynamic
- linker for a 32-bit Intel machine will be <filename
- class="libraryfile">ld-linux.so.2</filename> (<filename
- class="libraryfile">ld-linux-x86-64.so.2</filename> for 64-bit systems). A
- sure-fire way to determine the name of the dynamic linker is to inspect a
- random binary from the host system by running: <userinput>readelf -l
- &lt;name of binary&gt; | grep interpreter</userinput> and noting the
- output. The authoritative reference covering all platforms is in the
- <filename>shlib-versions</filename> file in the root of the Glibc source
- tree.</para>
- </note>
-
- <para>Some key technical points of how the <xref
- linkend="chapter-temporary-tools"/> build method works:</para>
-
- <itemizedlist>
- <listitem>
- <para>Slightly adjusting the name of the working platform, by changing the
- &quot;vendor&quot; field target triplet by way of the
- <envar>LFS_TGT</envar> variable, ensures that the first build of Binutils
- and GCC produces a compatible cross-linker and cross-compiler. Instead of
- producing binaries for another architecture, the cross-linker and
- cross-compiler will produce binaries compatible with the current
- hardware.</para>
- </listitem>
- <listitem>
- <para> The temporary libraries are cross-compiled. Because a
- cross-compiler by its nature cannot rely on anything from its host
- system, this method removes potential contamination of the target
- system by lessening the chance of headers or libraries from the host
- being incorporated into the new tools. Cross-compilation also allows for
- the possibility of building both 32-bit and 64-bit libraries on 64-bit
- capable hardware.</para>
- </listitem>
- <listitem>
- <para>Careful manipulation of the GCC source tells the compiler which target
- dynamic linker will be used.</para>
- </listitem>
- </itemizedlist>
-
- <para>Binutils is installed first because the <command>configure</command>
- runs of both GCC and Glibc perform various feature tests on the assembler
- and linker to determine which software features to enable or disable. This
- is more important than one might first realize. An incorrectly configured
- GCC or Glibc can result in a subtly broken toolchain, where the impact of
- such breakage might not show up until near the end of the build of an
- entire distribution. A test suite failure will usually highlight this error
- before too much additional work is performed.</para>
-
- <para>Binutils installs its assembler and linker in two locations,
- <filename class="directory">/tools/bin</filename> and <filename
- class="directory">/tools/$LFS_TGT/bin</filename>. The tools in one
- location are hard linked to the other. An important facet of the linker is
- its library search order. Detailed information can be obtained from
- <command>ld</command> by passing it the <parameter>--verbose</parameter>
- flag. For example, an <userinput>ld --verbose | grep SEARCH</userinput>
- will illustrate the current search paths and their order. It shows which
- files are linked by <command>ld</command> by compiling a dummy program and
- passing the <parameter>--verbose</parameter> switch to the linker. For example,
- <userinput>gcc dummy.c -Wl,--verbose 2&gt;&amp;1 | grep succeeded</userinput>
- will show all the files successfully opened during the linking.</para>
-
- <para>The next package installed is GCC. An example of what can be
- seen during its run of <command>configure</command> is:</para>
-
-<screen><computeroutput>checking what assembler to use... /tools/i686-lfs-linux-gnu/bin/as
-checking what linker to use... /tools/i686-lfs-linux-gnu/bin/ld</computeroutput></screen>
-
- <para>This is important for the reasons mentioned above. It also demonstrates
- that GCC's configure script does not search the PATH directories to find which
- tools to use. However, during the actual operation of <command>gcc</command>
- itself, the same search paths are not necessarily used. To find out which
- standard linker <command>gcc</command> will use, run:
- <userinput>gcc -print-prog-name=ld</userinput>.</para>
-
- <para>Detailed information can be obtained from <command>gcc</command> by
- passing it the <parameter>-v</parameter> command line option while compiling
- a dummy program. For example, <userinput>gcc -v dummy.c</userinput> will show
- detailed information about the preprocessor, compilation, and assembly stages,
- including <command>gcc</command>'s included search paths and their order.</para>
-
- <para>Next installed are sanitized Linux API headers. These allow the standard
- C library (Glibc) to interface with features that the Linux kernel will
- provide.</para>
-
- <para>The next package installed is Glibc. The most important considerations
- for building Glibc are the compiler, binary tools, and kernel headers. The
- compiler is generally not an issue since Glibc will always use the compiler
- relating to the <parameter>--host</parameter> parameter passed to its
- configure script; e.g. in our case, the compiler will be
- <command>i686-lfs-linux-gnu-gcc</command>. The binary tools and kernel
- headers can be a bit more complicated. Therefore, take no risks and use the
- available configure switches to enforce the correct selections. After the run
- of <command>configure</command>, check the contents of the
- <filename>config.make</filename> file in the <filename
- class="directory">glibc-build</filename> directory for all important details.
- Note the use of <parameter>CC="i686-lfs-gnu-gcc"</parameter> to control which
- binary tools are used and the use of the <parameter>-nostdinc</parameter> and
- <parameter>-isystem</parameter> flags to control the compiler's include
- search path. These items highlight an important aspect of the Glibc
- package&mdash;it is very self-sufficient in terms of its build machinery and
- generally does not rely on toolchain defaults.</para>
-
- <para>During the second pass of Binutils, we are able to utilize the
- <parameter>--with-lib-path</parameter> configure switch to control
- <command>ld</command>'s library search path.</para>
-
- <para>For the second pass of GCC, its sources also need to be modified to
- tell GCC to use the new dynamic linker. Failure to do so will result in the
- GCC programs themselves having the name of the dynamic linker from the host
- system's <filename class="directory">/lib</filename> directory embedded into
- them, which would defeat the goal of getting away from the host. From this
- point onwards, the core toolchain is self-contained and self-hosted. The
- remainder of the <xref linkend="chapter-temporary-tools"/> packages all build
- against the new Glibc in <filename
- class="directory">/tools</filename>.</para>
-
- <para>Upon entering the chroot environment in <xref
- linkend="chapter-building-system"/>, the first major package to be
- installed is Glibc, due to its self-sufficient nature mentioned above.
- Once this Glibc is installed into <filename
- class="directory">/usr</filename>, we will perform a quick changeover of the
- toolchain defaults, and then proceed in building the rest of the target
- LFS system.</para>
+ <para>The build process is based on the process of
+ <emphasis>cross-compilation</emphasis>. Cross-compilation is normally used
+ for building a compiler and its toolchain for a machine different from
+ the one that is used for the build. This is not strictly needed for LFS,
+ since the machine where the new system will run is the same as the one
+ used for the build. But cross-compilation has the great advantage that
+ anything that is cross-compiled cannot depend on the host environment.</para>
+
+ <sect2 id="cross-compile" xreflabel="About Cross-Compilation">
+
+ <title>About Cross-Compilation</title>
+
+ <para>Cross-compilation involves some concepts that deserve a section on
+ their own. Although this section may be omitted in a first reading, it
+ is strongly suggested to come back to it later in order to get a full
+ grasp of the build process.</para>
+
+ <para>Let us first define some terms used in this context:</para>
+
+ <variablelist>
+ <varlistentry><term>build</term><listitem>
+ <para>is the machine where we build programs. Note that this machine
+ is referred to as the <quote>host</quote> in other
+ sections.</para></listitem>
+ </varlistentry>
+
+ <varlistentry><term>host</term><listitem>
+ <para>is the machine/system where the built programs will run. Note
+ that this use of <quote>host</quote> is not the same as in other
+ sections.</para></listitem>
+ </varlistentry>
+
+ <varlistentry><term>target</term><listitem>
+ <para>is only used for compilers. It is the machine the compiler
+ produces code for. It may be different from both build and
+ host.</para></listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ <para>As an example, let us imagine the following scenario: we may have a
+ compiler on a slow machine only, let's call the machine A, and the compiler
+ ccA. We may have also a fast machine (B), but with no compiler, and we may
+ want to produce code for a another slow machine (C). Then, to build a
+ compiler for machine C, we would have three stages:</para>
+
+ <informaltable align="center">
+ <tgroup cols="5">
+ <colspec colnum="1" align="center"/>
+ <colspec colnum="2" align="center"/>
+ <colspec colnum="3" align="center"/>
+ <colspec colnum="4" align="center"/>
+ <colspec colnum="5" align="left"/>
+ <thead>
+ <row><entry>Stage</entry><entry>Build</entry><entry>Host</entry>
+ <entry>Target</entry><entry>Action</entry></row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>1</entry><entry>A</entry><entry>A</entry><entry>B</entry>
+ <entry>build cross-compiler cc1 using ccA on machine A</entry>
+ </row>
+ <row>
+ <entry>2</entry><entry>A</entry><entry>B</entry><entry>B</entry>
+ <entry>build cross-compiler cc2 using cc1 on machine A</entry>
+ </row>
+ <row>
+ <entry>3</entry><entry>B</entry><entry>C</entry><entry>C</entry>
+ <entry>build compiler ccC using cc2 on machine B</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </informaltable>
+
+ <para>Then, all the other programs needed by machine C can be compiled
+ using cc2 on the fast machine B. Note that unless B can run programs
+ produced for C, there is no way to test the built programs until machine
+ C itself is running. For example, for testing ccC, we may want to add a
+ fourth stage:</para>
+
+ <informaltable align="center">
+ <tgroup cols="5">
+ <colspec colnum="1" align="center"/>
+ <colspec colnum="2" align="center"/>
+ <colspec colnum="3" align="center"/>
+ <colspec colnum="4" align="center"/>
+ <colspec colnum="5" align="left"/>
+ <thead>
+ <row><entry>Stage</entry><entry>Build</entry><entry>Host</entry>
+ <entry>Target</entry><entry>Action</entry></row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>4</entry><entry>C</entry><entry>C</entry><entry>C</entry>
+ <entry>rebuild and test ccC using itself on machine C</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </informaltable>
+
+ <para>In the example above, only cc1 and cc2 are cross-compilers, that is,
+ they produce code for a machine different from the one they are run on.
+ The other compilers ccA and ccC produce code for the machine they are run
+ on. Such compilers are called <emphasis>native</emphasis> compilers.</para>
+
+ </sect2>
+
+ <sect2 id="lfs-cross">
+ <title>Implementation of Cross-Compilation for LFS</title>
+
+ <note>
+ <para>Almost all the build systems use names of the form
+ cpu-vendor-kernel-os referred to as the machine triplet. An astute
+ reader may wonder why a <quote>triplet</quote> refers to a four component
+ name. The reason is history: initially, three component names were enough
+ to designate unambiguously a machine, but with new machines and systems
+ appearing, that proved insufficient. The word <quote>triplet</quote>
+ remained. A simple way to determine your machine triplet is to run
+ the <command>config.guess</command>
+ script that comes with the source for many packages. Unpack the Binutils
+ sources and run the script: <userinput>./config.guess</userinput> and note
+ the output. For example, for a 32-bit Intel processor the
+ output will be <emphasis>i686-pc-linux-gnu</emphasis>. On a 64-bit
+ system it will be <emphasis>x86_64-pc-linux-gnu</emphasis>.</para>
+
+ <para>Also be aware of the name of the platform's dynamic linker, often
+ referred to as the dynamic loader (not to be confused with the standard
+ linker <command>ld</command> that is part of Binutils). The dynamic linker
+ provided by Glibc finds and loads the shared libraries needed by a
+ program, prepares the program to run, and then runs it. The name of the
+ dynamic linker for a 32-bit Intel machine will be <filename
+ class="libraryfile">ld-linux.so.2</filename> (<filename
+ class="libraryfile">ld-linux-x86-64.so.2</filename> for 64-bit systems). A
+ sure-fire way to determine the name of the dynamic linker is to inspect a
+ random binary from the host system by running: <userinput>readelf -l
+ &lt;name of binary&gt; | grep interpreter</userinput> and noting the
+ output. The authoritative reference covering all platforms is in the
+ <filename>shlib-versions</filename> file in the root of the Glibc source
+ tree.</para>
+ </note>
+
+ <para>In order to fake a cross compilation, the name of the host triplet
+ is slightly adjusted by changing the &quot;vendor&quot; field in the
+ <envar>LFS_TGT</envar> variable. We also use the
+ <parameter>--with-sysroot</parameter> when building the cross linker and
+ cross compiler, to tell them where to find the needed host files. This
+ ensures none of the other programs built in <xref
+ linkend="chapter-temporary-tools"/> can link to libraries on the build
+ machine. Only two stages are mandatory, and one more for tests:</para>
+
+ <informaltable align="center">
+ <tgroup cols="5">
+ <colspec colnum="1" align="center"/>
+ <colspec colnum="2" align="center"/>
+ <colspec colnum="3" align="center"/>
+ <colspec colnum="4" align="center"/>
+ <colspec colnum="5" align="left"/>
+ <thead>
+ <row><entry>Stage</entry><entry>Build</entry><entry>Host</entry>
+ <entry>Target</entry><entry>Action</entry></row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>1</entry><entry>pc</entry><entry>pc</entry><entry>lfs</entry>
+ <entry>build cross-compiler cc1 using cc-pc on pc</entry>
+ </row>
+ <row>
+ <entry>2</entry><entry>pc</entry><entry>lfs</entry><entry>lfs</entry>
+ <entry>build compiler cc-lfs using cc1 on pc</entry>
+ </row>
+ <row>
+ <entry>3</entry><entry>lfs</entry><entry>lfs</entry><entry>lfs</entry>
+ <entry>rebuild and test cc-lfs using itself on lfs</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </informaltable>
+
+ <para>In the above table, <quote>on pc</quote> means the commands are run
+ on a machine using the already installed distribution. <quote>On
+ lfs</quote> means the commands are run in a chrooted environment.</para>
+
+ <para>Now, there is more about cross-compiling: the C language is not
+ just a compiler, but also defines a standard library. In this book, the
+ GNU C library, named glibc, is used. This library must
+ be compiled for the lfs machine, that is, using the cross compiler cc1.
+ But the compiler itself uses an internal library implementing complex
+ instructions not available in the assembler instruction set. This
+ internal library is named libgcc, and must be linked to the glibc
+ library to be fully functional! Furthermore, the standard library for
+ C++ (libstdc++) also needs being linked to glibc. The solution
+ to this chicken and egg problem is to first build a degraded cc1+libgcc,
+ lacking some fuctionalities such as threads and exception handling, then
+ build glibc using this degraded compiler (glibc itself is not
+ degraded), then build libstdc++. But this last library will lack the
+ same functionalities as libgcc.</para>
+
+ <para>This is not the end of the story: the conclusion of the preceding
+ paragraph is that cc1 is unable to build a fully functional libstdc++, but
+ this is the only compiler available for building the C/C++ libraries
+ during stage 2! Of course, the compiler built during stage 2, cc-lfs,
+ would be able to build those libraries, but (i) the build system of
+ gcc does not know that it is usable on pc, and (ii) using it on pc
+ would be at risk of linking to the pc libraries, since cc-lfs is a native
+ compiler. So we have to build libstdc++ later, in chroot.</para>
+
+ </sect2>
+
+ <sect2 id="other-details">
+
+ <title>Other procedural details</title>
+
+ <para>The cross-compiler will be installed in a separate <filename
+ class="directory">$LFS/tools</filename> directory, since it will not
+ be part of the final system.</para>
+
+ <para>Binutils is installed first because the <command>configure</command>
+ runs of both GCC and Glibc perform various feature tests on the assembler
+ and linker to determine which software features to enable or disable. This
+ is more important than one might first realize. An incorrectly configured
+ GCC or Glibc can result in a subtly broken toolchain, where the impact of
+ such breakage might not show up until near the end of the build of an
+ entire distribution. A test suite failure will usually highlight this error
+ before too much additional work is performed.</para>
+
+ <para>Binutils installs its assembler and linker in two locations,
+ <filename class="directory">$LFS/tools/bin</filename> and <filename
+ class="directory">$LFS/tools/$LFS_TGT/bin</filename>. The tools in one
+ location are hard linked to the other. An important facet of the linker is
+ its library search order. Detailed information can be obtained from
+ <command>ld</command> by passing it the <parameter>--verbose</parameter>
+ flag. For example, <command>$LFS_TGT-ld --verbose | grep SEARCH</command>
+ will illustrate the current search paths and their order. It shows which
+ files are linked by <command>ld</command> by compiling a dummy program and
+ passing the <parameter>--verbose</parameter> switch to the linker. For
+ example,
+ <command>$LFS_TGT-gcc dummy.c -Wl,--verbose 2&gt;&amp;1 | grep succeeded</command>
+ will show all the files successfully opened during the linking.</para>
+
+ <para>The next package installed is GCC. An example of what can be
+ seen during its run of <command>configure</command> is:</para>
+
+<screen><computeroutput>checking what assembler to use... /mnt/lfs/tools/i686-lfs-linux-gnu/bin/as
+checking what linker to use... /mnt/lfs/tools/i686-lfs-linux-gnu/bin/ld</computeroutput></screen>
+
+ <para>This is important for the reasons mentioned above. It also
+ demonstrates that GCC's configure script does not search the PATH
+ directories to find which tools to use. However, during the actual
+ operation of <command>gcc</command> itself, the same search paths are not
+ necessarily used. To find out which standard linker <command>gcc</command>
+ will use, run: <command>$LFS_TGT-gcc -print-prog-name=ld</command>.</para>
+
+ <para>Detailed information can be obtained from <command>gcc</command> by
+ passing it the <parameter>-v</parameter> command line option while compiling
+ a dummy program. For example, <command>gcc -v dummy.c</command> will show
+ detailed information about the preprocessor, compilation, and assembly
+ stages, including <command>gcc</command>'s included search paths and their
+ order.</para>
+
+ <para>Next installed are sanitized Linux API headers. These allow the
+ standard C library (Glibc) to interface with features that the Linux
+ kernel will provide.</para>
+
+ <para>The next package installed is Glibc. The most important
+ considerations for building Glibc are the compiler, binary tools, and
+ kernel headers. The compiler is generally not an issue since Glibc will
+ always use the compiler relating to the <parameter>--host</parameter>
+ parameter passed to its configure script; e.g. in our case, the compiler
+ will be <command>$LFS_TGT-gcc</command>. The binary tools and kernel
+ headers can be a bit more complicated. Therefore, take no risks and use
+ the available configure switches to enforce the correct selections. After
+ the run of <command>configure</command>, check the contents of the
+ <filename>config.make</filename> file in the <filename
+ class="directory">build</filename> directory for all important details.
+ Note the use of <parameter>CC="$LFS_TGT-gcc"</parameter> (with
+ <envar>$LFS_TGT</envar> expanded) to control which binary tools are used
+ and the use of the <parameter>-nostdinc</parameter> and
+ <parameter>-isystem</parameter> flags to control the compiler's include
+ search path. These items highlight an important aspect of the Glibc
+ package&mdash;it is very self-sufficient in terms of its build machinery
+ and generally does not rely on toolchain defaults.</para>
+
+ <para>As said above, the standard C++ library is compiled next, followed
+ by all the programs that need themselves to be built. The install step
+ uses the <envar>DESTDIR</envar> variable to have the programs land into
+ the LFS filesystem.</para>
+
+ <para>Then the native lfs compiler is built. First Binutils Pass 2, with
+ the same <envar>DESTDIR</envar> install as the other programs, then the
+ second pass of GCC, omitting libstdc++ and other non-important libraries.
+ Due to some weird logic in GCC's configure script,
+ <envar>CC_FOR_TARGET</envar> ends up as <command>cc</command> when host
+ is the same as target, but is different from build. This is why
+ <parameter>CC_FOR_TARGET=$LFS_TGT-gcc</parameter> is put explicitely into
+ the configure options.</para>
+
+ <para>Upon entering the chroot environment in <xref
+ linkend="chapter-building-system"/>, the first task is to install
+ libstdc++. Then temporary installations of programs needed for the proper
+ operation of the toolchain are performed. Programs needed for testing
+ other programs are also built. From this point onwards, the
+ core toolchain is self-contained and self-hosted. In the remainder of
+ the <xref linkend="chapter-building-system"/>, final versions of all the
+ packages needed for a fully functional system are built, tested and
+ installed.</para>
+
+ </sect2>
</sect1>