aboutsummaryrefslogtreecommitdiffstats
path: root/part3intro/toolchaintechnotes.xml
blob: db39452845c0b01a10e4270bc503123285571179 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
  <!ENTITY % general-entities SYSTEM "../general.ent">
  %general-entities;
]>

<sect1 id="ch-tools-toolchaintechnotes" xreflabel="Toolchain Technical Notes">
  <?dbhtml filename="toolchaintechnotes.html"?>

  <title>Toolchain Technical Notes</title>

  <para>This section explains some of the rationale and technical details
  behind the overall build method. Don't try to immediately
  understand everything in this section. Most of this information will be
  clearer after performing an actual build. Come back and re-read this chapter
  at any time during the build process.</para>

  <para>The overall goal of <xref linkend="chapter-cross-tools"/> and <xref
  linkend="chapter-temporary-tools"/> is to produce a temporary area
  containing a set of tools that are known to be good, and that are isolated from the host system.
  By using the <command>chroot</command> command, the compilations in the remaining chapters
  will be isolated within that environment, ensuring a clean, trouble-free
  build of the target LFS system. The build process has been designed to
  minimize the risks for new readers, and to provide the most educational value
  at the same time.</para>

  <para>This build process is based on
  <emphasis>cross-compilation</emphasis>. Cross-compilation is normally used
  to build a compiler and its associated toolchain for a machine different from
  the one that is used for the build. This is not strictly necessary for LFS,
  since the machine where the new system will run is the same as the one
  used for the build. But cross-compilation has one great advantage:
  anything that is cross-compiled cannot depend on the host environment.</para>

  <sect2 id="cross-compile" xreflabel="About Cross-Compilation">

    <title>About Cross-Compilation</title>

    <note>
      <para>
        The LFS book is not (and does not contain) a general tutorial to
        build a cross- (or native) toolchain. Don't use the commands in the
        book for a cross-toolchain for some purpose other
        than building LFS, unless you really understand what you are doing.
      </para>
    </note>

    <para>Cross-compilation involves some concepts that deserve a section of
    their own. Although this section may be omitted on a first reading,
    coming back to it later will help you gain a fuller understanding of
    the process.</para>

    <para>Let us first define some terms used in this context.</para>

    <variablelist>
      <varlistentry><term>The build</term><listitem>
        <para>is the machine where we build programs. Note that this machine
        is also referred to as the <quote>host</quote>.</para></listitem>
      </varlistentry>

      <varlistentry><term>The host</term><listitem>
        <para>is the machine/system where the built programs will run. Note
        that this use of <quote>host</quote> is not the same as in other
        sections.</para></listitem>
      </varlistentry>

      <varlistentry><term>The target</term><listitem>
        <para>is only used for compilers. It is the machine the compiler
        produces code for. It may be different from both the build and
        the host.</para></listitem>
      </varlistentry>

    </variablelist>

    <para>As an example, let us imagine the following scenario (sometimes
    referred to as <quote>Canadian Cross</quote>). We have a
    compiler on a slow machine only, let's call it machine A, and the compiler
    ccA. We also have a fast machine (B), but no compiler for (B), and we
    want to produce code for a third, slow machine (C). We will build a
    compiler for machine C in three stages.</para>

    <informaltable align="center">
      <tgroup cols="5">
        <colspec colnum="1" align="center"/>
        <colspec colnum="2" align="center"/>
        <colspec colnum="3" align="center"/>
        <colspec colnum="4" align="center"/>
        <colspec colnum="5" align="left"/>
        <thead>
          <row><entry>Stage</entry><entry>Build</entry><entry>Host</entry>
               <entry>Target</entry><entry>Action</entry></row>
        </thead>
        <tbody>
          <row>
            <entry>1</entry><entry>A</entry><entry>A</entry><entry>B</entry>
            <entry>Build cross-compiler cc1 using ccA on machine A.</entry>
          </row>
          <row>
            <entry>2</entry><entry>A</entry><entry>B</entry><entry>C</entry>
            <entry>Build cross-compiler cc2 using cc1 on machine A.</entry>
          </row>
          <row>
            <entry>3</entry><entry>B</entry><entry>C</entry><entry>C</entry>
            <entry>Build compiler ccC using cc2 on machine B.</entry>
          </row>
        </tbody>
      </tgroup>
    </informaltable>

    <para>Then, all the programs needed by machine C can be compiled
    using cc2 on the fast machine B. Note that unless B can run programs
    produced for C, there is no way to test the newly built programs until machine
    C itself is running. For example, to run a test suite on ccC, we may want to add a
    fourth stage:</para>

    <informaltable align="center">
      <tgroup cols="5">
        <colspec colnum="1" align="center"/>
        <colspec colnum="2" align="center"/>
        <colspec colnum="3" align="center"/>
        <colspec colnum="4" align="center"/>
        <colspec colnum="5" align="left"/>
        <thead>
          <row><entry>Stage</entry><entry>Build</entry><entry>Host</entry>
               <entry>Target</entry><entry>Action</entry></row>
        </thead>
        <tbody>
          <row>
            <entry>4</entry><entry>C</entry><entry>C</entry><entry>C</entry>
            <entry>Rebuild and test ccC using ccC on machine C.</entry>
          </row>
        </tbody>
      </tgroup>
    </informaltable>

    <para>In the example above, only cc1 and cc2 are cross-compilers, that is,
    they produce code for a machine different from the one they are run on.
    The other compilers ccA and ccC produce code for the machine they are run
    on. Such compilers are called <emphasis>native</emphasis> compilers.</para>

  </sect2>

  <sect2 id="lfs-cross">
    <title>Implementation of Cross-Compilation for LFS</title>

    <note>
      <para>All the cross-compiled packages in this book use an
      autoconf-based building system.  The autoconf-based building system
      accepts system types in the form cpu-vendor-kernel-os,
      referred to as the system triplet.  Since the vendor field is often
      irrelevant, autoconf lets you omit it.</para>
      
      <para>An astute reader may wonder
      why a <quote>triplet</quote> refers to a four component name. The
      kernel field and the os field began as a single
      <quote>system</quote> field.  Such a three-field form is still valid
      today for some systems, for example,
      <literal>x86_64-unknown-freebsd</literal>.  But
      two systems can share the same kernel and still be too different to
      use the same triplet to describe them.  For example, Android running on a
      mobile phone is completely different from Ubuntu running on an ARM64
      server, even though they are both running on the same type of CPU (ARM64) and
      using the same kernel (Linux).</para>
      
      <para>Without an emulation layer, you cannot run an
      executable for a server on a mobile phone or vice versa.  So the
      <quote>system</quote> field has been divided into kernel and os fields, to
      designate these systems unambiguously.  In our example, the Android
      system is designated <literal>aarch64-unknown-linux-android</literal>,
      and the Ubuntu system is designated
      <literal>aarch64-unknown-linux-gnu</literal>.</para>
      
      <para>The word <quote>triplet</quote> remains embedded in the lexicon. A simple way to determine your
      system triplet is to run the <command>config.guess</command>
      script that comes with the source for many packages. Unpack the binutils
      sources, run the script <userinput>./config.guess</userinput>, and note
      the output. For example, for a 32-bit Intel processor the
      output will be <emphasis>i686-pc-linux-gnu</emphasis>. On a 64-bit
      system it will be <emphasis>x86_64-pc-linux-gnu</emphasis>. On most
      Linux systems the even simpler <command>gcc -dumpmachine</command> command
      will give you similar information.</para>

      <para>You should also be aware of the name of the platform's dynamic linker, often
      referred to as the dynamic loader (not to be confused with the standard
      linker <command>ld</command> that is part of binutils). The dynamic linker
      provided by package glibc finds and loads the shared libraries needed by a
      program, prepares the program to run, and then runs it. The name of the
      dynamic linker for a 32-bit Intel machine is <filename
      class="libraryfile">ld-linux.so.2</filename>; it's <filename
      class="libraryfile">ld-linux-x86-64.so.2</filename> on 64-bit systems. A
      sure-fire way to determine the name of the dynamic linker is to inspect a
      random binary from the host system by running: <userinput>readelf -l
      &lt;name of binary&gt; | grep interpreter</userinput> and noting the
      output. The authoritative reference covering all platforms is in the
      <filename>shlib-versions</filename> file in the root of the glibc source
      tree.</para>
    </note>

    <para>In order to fake a cross-compilation in LFS, the name of the host triplet
    is slightly adjusted by changing the &quot;vendor&quot; field in the
    <envar>LFS_TGT</envar> variable so it says &quot;lfs&quot;. We also use the
    <parameter>--with-sysroot</parameter> option when building the cross-linker and
    cross-compiler to tell them where to find the needed host files. This
    ensures that none of the other programs built in <xref
    linkend="chapter-temporary-tools"/> can link to libraries on the build
    machine. Only two stages are mandatory, plus one more for tests.</para>

    <informaltable align="center">
      <tgroup cols="5">
        <colspec colnum="1" align="center"/>
        <colspec colnum="2" align="center"/>
        <colspec colnum="3" align="center"/>
        <colspec colnum="4" align="center"/>
        <colspec colnum="5" align="left"/>
        <thead>
          <row><entry>Stage</entry><entry>Build</entry><entry>Host</entry>
               <entry>Target</entry><entry>Action</entry></row>
        </thead>
        <tbody>
          <row>
            <entry>1</entry><entry>pc</entry><entry>pc</entry><entry>lfs</entry>
            <entry>Build cross-compiler cc1 using cc-pc on pc.</entry>
          </row>
          <row>
            <entry>2</entry><entry>pc</entry><entry>lfs</entry><entry>lfs</entry>
            <entry>Build compiler cc-lfs using cc1 on pc.</entry>
          </row>
          <row>
            <entry>3</entry><entry>lfs</entry><entry>lfs</entry><entry>lfs</entry>
            <entry>Rebuild and test cc-lfs using cc-lfs on lfs.</entry>
          </row>
        </tbody>
      </tgroup>
    </informaltable>

    <para>In the preceding table, <quote>on pc</quote> means the commands are run
    on a machine using the already installed distribution. <quote>On
    lfs</quote> means the commands are run in a chrooted environment.</para>

    <para>Now, there is more about cross-compiling: the C language is not
    just a compiler, but also defines a standard library. In this book, the
    GNU C library, named glibc, is used (there is an alternative, &quot;musl&quot;). This library must
    be compiled for the LFS machine; that is, using the cross-compiler cc1.
    But the compiler itself uses an internal library providing complex
    subroutines for functions not available in the assembler instruction set. This
    internal library is named libgcc, and it must be linked to the glibc
    library to be fully functional. Furthermore, the standard library for
    C++ (libstdc++) must also be linked with glibc. The solution to this
    chicken and egg problem is first to build a degraded cc1-based libgcc,
    lacking some functionalities such as threads and exception handling, and then
    to build glibc using this degraded compiler (glibc itself is not
    degraded), and also to build libstdc++. This last library will lack some of the
    functionality of libgcc.</para>

    <para>The upshot of the preceding
    paragraph is that cc1 is unable to build a fully functional libstdc++, but
    this is the only compiler available for building the C/C++ libraries
    during stage 2. Of course, the compiler built by stage 2, cc-lfs,
    would be able to build those libraries, but:</para>

    <itemizedlist>
      <listitem>
        <para>
          Generally cc-lfs cannot run on pc (the host distro).  Despite the
          triplets of pc and lfs are compatible to each other, an executable
          for lfs will depend on glibc-&glibc-version; while the host distro
          may utilizes a different libc implementation (for example, musl) or
          a previous release of glibc (for example, glibc-2.13).
        </para>
      </listitem>
      <listitem>
        <para>
          Even if cc-lfs happens to run on pc, using it on pc would create
          a risk of linking to the pc libraries, since cc-lfs is a native
          compiler.
        </para>
      </listitem>
    </itemizedlist>

    <para>So we have to re-build libstdc++ later as a part of gcc stage 2.</para>

    <para>In &ch-final; (or <quote>stage 3</quote>), all the packages needed for
    the LFS system are built. Even if a package has already been installed into
    the LFS system in a previous chapter, we still rebuild the package.  The main reason for
    rebuilding these packages is to make them stable: if we reinstall a LFS
    package on a complete LFS system, the installed content of the package
    should be the same as the content of the same package when installed in
    &ch-final;.  The temporary packages installed in &ch-tmp-cross; or
    &ch-tmp-chroot; cannot satisfy this requirement, because some of them
    are built without optional dependencies, and autoconf cannot
    perform some feature checks in &ch-tmp-cross; because of cross-compilation,
    causing the temporary packages to lack optional features,
    or use suboptimal code routines. Additionally, a minor reason for
    rebuilding the packages is to run the test suites.</para>

  </sect2>

  <sect2 id="other-details">

    <title>Other Procedural Details</title>

    <para>The cross-compiler will be installed in a separate <filename
    class="directory">$LFS/tools</filename> directory, since it will not
    be part of the final system.</para>

    <para>Binutils is installed first because the <command>configure</command>
    runs of both gcc and glibc perform various feature tests on the assembler
    and linker to determine which software features to enable or disable. This
    is more important than one might realize at first. An incorrectly configured
    gcc or glibc can result in a subtly broken toolchain, where the impact of
    such breakage might not show up until near the end of the build of an
    entire distribution. A test suite failure will usually highlight this error
    before too much additional work is performed.</para>

    <para>Binutils installs its assembler and linker in two locations,
    <filename class="directory">$LFS/tools/bin</filename> and <filename
    class="directory">$LFS/tools/$LFS_TGT/bin</filename>. The tools in one
    location are hard linked to the other. An important facet of the linker is
    its library search order. Detailed information can be obtained from
    <command>ld</command> by passing it the <parameter>--verbose</parameter>
    flag. For example, <command>$LFS_TGT-ld --verbose | grep SEARCH</command>
    will illustrate the current search paths and their order. (Note that this
    example can be run as shown only while logged in as user
    <systemitem class="username">lfs</systemitem>. If you come back to this
    page later, replace <command>$LFS_TGT-ld</command> with
    <command>ld</command>).</para>

    <para>The next package installed is gcc. An example of what can be
    seen during its run of <command>configure</command> is:</para>

<screen><computeroutput>checking what assembler to use... /mnt/lfs/tools/i686-lfs-linux-gnu/bin/as
checking what linker to use... /mnt/lfs/tools/i686-lfs-linux-gnu/bin/ld</computeroutput></screen>

    <para>This is important for the reasons mentioned above. It also
    demonstrates that gcc's configure script does not search the PATH
    directories to find which tools to use. However, during the actual
    operation of <command>gcc</command> itself, the same search paths are not
    necessarily used. To find out which standard linker <command>gcc</command>
    will use, run: <command>$LFS_TGT-gcc -print-prog-name=ld</command>. (Again,
    remove the <command>$LFS_TGT-</command> prefix if coming back to this
    later.)</para>

    <para>Detailed information can be obtained from <command>gcc</command> by
    passing it the <parameter>-v</parameter> command line option while compiling
    a program. For example, <command>$LFS_TGT-gcc -v
    <replaceable>example.c</replaceable></command> (or without <command>
    $LFS_TGT-</command> if coming back later) will show
    detailed information about the preprocessor, compilation, and assembly
    stages, including <command>gcc</command>'s search paths for included
    headers and their order.</para>

    <para>Next up: sanitized Linux API headers. These allow the
    standard C library (glibc) to interface with features that the Linux
    kernel will provide.</para>

    <para>Next comes glibc. The most important
    considerations for building glibc are the compiler, binary tools, and
    kernel headers. The compiler is generally not an issue since glibc will
    always use the compiler relating to the <parameter>--host</parameter>
    parameter passed to its configure script; e.g., in our case, the compiler
    will be <command>$LFS_TGT-gcc</command>. The binary tools and kernel
    headers can be a bit more complicated. Therefore, we take no risks and use
    the available configure switches to enforce the correct selections. After
    the run of <command>configure</command>, check the contents of the
    <filename>config.make</filename> file in the <filename
    class="directory">build</filename> directory for all important details.
    Note the use of <parameter>CC="$LFS_TGT-gcc"</parameter> (with
    <envar>$LFS_TGT</envar> expanded) to control which binary tools are used
    and the use of the <parameter>-nostdinc</parameter> and
    <parameter>-isystem</parameter> flags to control the compiler's include
    search path. These items highlight an important aspect of the glibc
    package&mdash;it is very self-sufficient in terms of its build machinery,
    and generally does not rely on toolchain defaults.</para>

    <para>As mentioned above, the standard C++ library is compiled next, followed in
    <xref linkend="chapter-temporary-tools"/> by other programs that must
    be cross-compiled to break circular dependencies at build time.
    The install step of all those packages uses the
    <envar>DESTDIR</envar> variable to force installation
    in the LFS filesystem.</para>

    <para>At the end of <xref linkend="chapter-temporary-tools"/> the native
    LFS compiler is installed. First binutils-pass2 is built,
    in the same <envar>DESTDIR</envar> directory as the other programs,
    then the second pass of gcc is constructed, omitting some
    non-critical libraries.  Due to some weird logic in gcc's
    configure script, <envar>CC_FOR_TARGET</envar> ends up as
    <command>cc</command> when the host is the same as the target, but
    different from the build system. This is why
    <parameter>CC_FOR_TARGET=$LFS_TGT-gcc</parameter> is declared explicitly
    as one of the configuration options.</para>

    <para>Upon entering the chroot environment in <xref
    linkend="chapter-chroot-temporary-tools"/>,
    the temporary installations of programs needed for the proper
    operation of the toolchain are performed. From this point onwards, the
    core toolchain is self-contained and self-hosted. In
    <xref linkend="chapter-building-system"/>, final versions of all the
    packages needed for a fully functional system are built, tested, and
    installed.</para>

  </sect2>

</sect1>