ii. Toolchain Technical Notes

This section explains some of the rationale and technical details behind the overall build method. Don't try to immediately understand everything in this section. Most of this information will be clearer after performing an actual build. Come back and re-read this chapter at any time during the build process.

The overall goal of Chapter 5 and Chapter 6 is to produce a temporary area containing a set of tools that are known to be good, and that are isolated from the host system. By using the chroot command, the compilations in the remaining chapters will be isolated within that environment, ensuring a clean, trouble-free build of the target LFS system. The build process has been designed to minimize the risks for new readers, and to provide the most educational value at the same time.

This build process is based on cross-compilation. Cross-compilation is normally used to build a compiler and its associated toolchain for a machine different from the one that is used for the build. This is not strictly necessary for LFS, since the machine where the new system will run is the same as the one used for the build. But cross-compilation has one great advantage: anything that is cross-compiled cannot depend on the host environment.

About Cross-Compilation

[Note]

The LFS book is not (and does not contain) a general tutorial for building a cross (or native) toolchain. Don't use the commands in the book to build a cross-toolchain for some purpose other than building LFS, unless you really understand what you are doing.

Cross-compilation involves some concepts that deserve a section of their own. Although this section may be omitted on a first reading, coming back to it later will help you gain a fuller understanding of the process.

Let us first define some terms used in this context.

The build

is the machine where we build programs. Note that this machine is also referred to as the « host ».

The host

is the machine or system where the built programs will run. Note that the word « host » is used here in a different sense than in other sections.

The target

is only used for compilers. It is the machine the compiler produces code for. It may be different from both the build and the host.

As an example, let us imagine the following scenario (sometimes referred to as « Canadian Cross »). We have a compiler on a slow machine only, let's call it machine A, and the compiler ccA. We also have a fast machine (B), but no compiler for (B), and we want to produce code for a third, slow machine (C). We will build a compiler for machine C in three stages.

Stage   Build   Host   Target   Action
1       A       A      B        Build cross-compiler cc1 using ccA on machine A.
2       A       B      C        Build cross-compiler cc2 using cc1 on machine A.
3       B       C      C        Build compiler ccC using cc2 on machine B.

Then, all the programs needed by machine C can be compiled using cc2 on the fast machine B. Note that unless B can run programs produced for C, there is no way to test the newly built programs until machine C itself is running. For example, to run a test suite on ccC, we may want to add a fourth stage:

Stage   Build   Host   Target   Action
4       C       C      C        Rebuild and test ccC using ccC on machine C.

In the example above, only cc1 and cc2 are cross-compilers, that is, they produce code for a machine different from the one they run on. The other compilers, ccA and ccC, produce code for the machine they run on. Such compilers are called native compilers.

Implementation of Cross-Compilation for LFS

[Note]

All the cross-compiled packages in this book use an autoconf-based building system. The autoconf-based building system accepts system types in the form cpu-vendor-kernel-os, referred to as the system triplet. Since the vendor field is often irrelevant, autoconf lets you omit it.

An astute reader may wonder why a « triplet » refers to a four component name. The kernel field and the os field began as a single « system » field. Such a three-field form is still valid today for some systems, for example, x86_64-unknown-freebsd. But two systems can share the same kernel and still be too different to use the same triplet to describe them. For example, Android running on a mobile phone is completely different from Ubuntu running on an ARM64 server, even though they are both running on the same type of CPU (ARM64) and using the same kernel (Linux).

Without an emulation layer, you cannot run an executable for a server on a mobile phone or vice versa. So the « system » field has been divided into kernel and os fields, to designate these systems unambiguously. In our example, the Android system is designated aarch64-unknown-linux-android, and the Ubuntu system is designated aarch64-unknown-linux-gnu.

The word « triplet » remains embedded in the lexicon. A simple way to determine your system triplet is to run the config.guess script that comes with the source for many packages. Unpack the binutils sources, run the script ./config.guess, and note the output. For example, for a 32-bit Intel processor the output will be i686-pc-linux-gnu. On a 64-bit system it will be x86_64-pc-linux-gnu. On most Linux systems the even simpler gcc -dumpmachine command will give you similar information.
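For illustration, a session on a 64-bit machine might look like the following (the output shown is illustrative; your host may report a different vendor field or omit it entirely):

$ ./config.guess
x86_64-pc-linux-gnu
$ gcc -dumpmachine
x86_64-pc-linux-gnu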

You should also be aware of the name of the platform's dynamic linker, often referred to as the dynamic loader (not to be confused with the standard linker ld that is part of binutils). The dynamic linker provided by package glibc finds and loads the shared libraries needed by a program, prepares the program to run, and then runs it. The name of the dynamic linker for a 32-bit Intel machine is ld-linux.so.2; it's ld-linux-x86-64.so.2 on 64-bit systems. A sure-fire way to determine the name of the dynamic linker is to inspect a random binary from the host system by running: readelf -l <name of binary> | grep interpreter and noting the output. The authoritative reference covering all platforms is in the shlib-versions file in the root of the glibc source tree.
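For example, inspecting /bin/ls on a typical 64-bit glibc system might produce output like this (the binary chosen and the exact output are illustrative):

$ readelf -l /bin/ls | grep interpreter
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]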

In order to fake a cross-compilation in LFS, the name of the host triplet is slightly adjusted by changing the "vendor" field in the LFS_TGT variable so it says "lfs". We also use the --with-sysroot option when building the cross-linker and cross-compiler, to tell them where to find the needed host files. This ensures that none of the other programs built in Chapter 6 can link to libraries on the build machine. Only two stages are mandatory, plus one more for tests.

Stage   Build   Host   Target   Action
1       pc      pc     lfs      Build cross-compiler cc1 using cc-pc on pc.
2       pc      lfs    lfs      Build compiler cc-lfs using cc1 on pc.
3       lfs     lfs    lfs      Rebuild and test cc-lfs using cc-lfs on lfs.

In the preceding table, « on pc » means the commands are run on a machine using the already installed distribution. « On lfs » means the commands are run in a chrooted environment.
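For reference, the adjusted triplet described above is stored in the LFS_TGT variable, which the book sets earlier. A minimal sketch of such an assignment (on an x86_64 host this expands to x86_64-lfs-linux-gnu):

# Sketch: build the target triplet from the host CPU, with "lfs" as the vendor field
export LFS_TGT=$(uname -m)-lfs-linux-gnu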

Now, there is more to cross-compiling: the C language is not just a compiler; it also defines a standard library. In this book, the GNU C library, named glibc, is used (there is an alternative, "musl"). This library must be compiled for the LFS machine; that is, using the cross-compiler cc1. But the compiler itself uses an internal library providing complex subroutines for functions not available in the assembler instruction set. This internal library is named libgcc, and it must be linked to the glibc library to be fully functional. Furthermore, the standard library for C++ (libstdc++) must also be linked with glibc. The solution to this chicken-and-egg problem is first to build a degraded cc1-based libgcc, lacking some functionality such as threads and exception handling, then to build glibc using this degraded compiler (glibc itself is not degraded), and also to build libstdc++. This last library will lack some of the functionality of libgcc.

The upshot of the preceding paragraph is that cc1 is unable to build a fully functional libstdc++ with the degraded libgcc, but cc1 is the only compiler available for building the C/C++ libraries during stage 2. Of course, the compiler built by stage 2, cc-lfs, would be able to build those libraries, but:

  • Generally cc-lfs cannot run on pc (the host distro). Even though the triplets of pc and lfs are compatible with each other, an executable for lfs will depend on glibc-2.36, while the host distro may use a different libc implementation (for example, musl) or an older release of glibc (for example, glibc-2.13).

  • Even if cc-lfs happens to run on pc, using it on pc would create a risk of linking to the pc libraries, since cc-lfs is a native compiler.

So when we build gcc stage 2, we instruct the building system to rebuild libgcc and libstdc++ with cc1, but link libstdc++ to the newly rebuilt libgcc instead of the degraded build. Then the rebuilt libstdc++ will be fully functional.
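To make the « degraded » stage 1 build more concrete, here is a hedged, abridged sketch of the kind of options passed to gcc's configure for that stage (the book's actual command contains additional options):

# Sketch only: stage 1 gcc is built without threads, shared libraries, or libstdc++
../configure --target=$LFS_TGT --prefix=$LFS/tools \
             --with-newlib --without-headers       \
             --disable-shared --disable-threads    \
             --disable-libstdcxx --enable-languages=c,c++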

In Chapter 8 (or « stage 3 »), all the packages needed for the LFS system are built. Even if a package has already been installed into the LFS system in a previous chapter, we still rebuild it. The main reason for rebuilding these packages is to make them stable: if we reinstall an LFS package on a complete LFS system, the installed content of the package should be the same as the content of the same package when installed in Chapter 8. The temporary packages installed in Chapter 6 or Chapter 7 cannot satisfy this requirement, because some of them are built without optional dependencies, and autoconf cannot perform some feature checks in Chapter 6 because of cross-compilation, causing the temporary packages to lack optional features or to use suboptimal code routines. Additionally, a minor reason for rebuilding the packages is to run the test suites.

Other Procedural Details

The cross-compiler will be installed in a separate $LFS/tools directory, since it will not be part of the final system.

Binutils is installed first because the configure runs of both gcc and glibc perform various feature tests on the assembler and linker to determine which software features to enable or disable. This is more important than one might realize at first. An incorrectly configured gcc or glibc can result in a subtly broken toolchain, where the impact of such breakage might not show up until near the end of the build of an entire distribution. A test suite failure will usually highlight this error before too much additional work is performed.

Binutils installs its assembler and linker in two locations, $LFS/tools/bin and $LFS/tools/$LFS_TGT/bin. The tools in one location are hard linked to the other. An important facet of the linker is its library search order. Detailed information can be obtained from ld by passing it the --verbose flag. For example, $LFS_TGT-ld --verbose | grep SEARCH will illustrate the current search paths and their order. (Note that this example can be run as shown only while logged in as user lfs. If you come back to this page later, replace $LFS_TGT-ld with ld).
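On an x86_64 host, the output of that command might resemble the following (the paths shown are illustrative and depend on how binutils was configured):

$ $LFS_TGT-ld --verbose | grep SEARCH
SEARCH_DIR("=/mnt/lfs/tools/x86_64-lfs-linux-gnu/lib64");
SEARCH_DIR("=/mnt/lfs/tools/x86_64-lfs-linux-gnu/lib");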

The next package installed is gcc. An example of what can be seen during its run of configure is:

checking what assembler to use... /mnt/lfs/tools/i686-lfs-linux-gnu/bin/as
checking what linker to use... /mnt/lfs/tools/i686-lfs-linux-gnu/bin/ld

This is important for the reasons mentioned above. It also demonstrates that gcc's configure script does not search the PATH directories to find which tools to use. However, during the actual operation of gcc itself, the same search paths are not necessarily used. To find out which standard linker gcc will use, run: $LFS_TGT-gcc -print-prog-name=ld. (Again, remove the $LFS_TGT- prefix if coming back to this later.)
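A hypothetical run and its result might look like this (the path shown is illustrative):

$ $LFS_TGT-gcc -print-prog-name=ld
/mnt/lfs/tools/x86_64-lfs-linux-gnu/bin/ld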

Detailed information can be obtained from gcc by passing it the -v command line option while compiling a program. For example, $LFS_TGT-gcc -v example.c (or without $LFS_TGT- if coming back later) will show detailed information about the preprocessor, compilation, and assembly stages, including gcc's search paths for included headers and their order.
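Among other things, the -v output includes the compiler's header search list. An illustrative excerpt (version numbers and paths will differ on your system):

#include <...> search starts here:
 /mnt/lfs/tools/lib/gcc/x86_64-lfs-linux-gnu/12.2.0/include
 /mnt/lfs/tools/x86_64-lfs-linux-gnu/include
End of search list.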

Next up: sanitized Linux API headers. These allow the standard C library (glibc) to interface with features that the Linux kernel will provide.

Next comes glibc. The most important considerations for building glibc are the compiler, binary tools, and kernel headers. The compiler is generally not an issue since glibc will always use the compiler relating to the --host parameter passed to its configure script; e.g., in our case, the compiler will be $LFS_TGT-gcc. The binary tools and kernel headers can be a bit more complicated. Therefore, we take no risks and use the available configure switches to enforce the correct selections. After the run of configure, check the contents of the config.make file in the build directory for all important details. Note the use of CC="$LFS_TGT-gcc" (with $LFS_TGT expanded) to control which binary tools are used and the use of the -nostdinc and -isystem flags to control the compiler's include search path. These items highlight an important aspect of the glibc package: it is very self-sufficient in terms of its build machinery, and generally does not rely on toolchain defaults.
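For instance, a quick way to spot-check that setting after configure finishes (both the grep pattern and the output shown here are illustrative):

$ grep '^CC ' config.make
CC = x86_64-lfs-linux-gnu-gcc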

As mentioned above, the standard C++ library is compiled next, followed in Chapter 6 by other programs that must be cross-compiled to break circular dependencies at build time. The install step of all those packages uses the DESTDIR variable to force installation in the LFS filesystem.
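Because these packages all use autoconf-generated makefiles, the staged installation follows the same pattern everywhere. A generic sketch:

# Staged install: files land under $LFS instead of the host's root filesystem
make DESTDIR=$LFS install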

At the end of Chapter 6 the native LFS compiler is installed. First binutils-pass2 is built, in the same DESTDIR directory as the other programs, then the second pass of gcc is constructed, omitting some non-critical libraries. Due to some weird logic in gcc's configure script, CC_FOR_TARGET ends up as cc when the host is the same as the target, but different from the build system. This is why CC_FOR_TARGET=$LFS_TGT-gcc is declared explicitly as one of the configuration options.
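A hedged sketch of how that explicit declaration appears on the configure command line (most options omitted for brevity; the book's actual command differs):

# Sketch only: host equals target, both differing from build
../configure --build=$(../config.guess) \
             --host=$LFS_TGT            \
             --target=$LFS_TGT          \
             CC_FOR_TARGET=$LFS_TGT-gcc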

Upon entering the chroot environment in Chapter 7, the temporary installations of programs needed for the proper operation of the toolchain are performed. From this point onwards, the core toolchain is self-contained and self-hosted. In Chapter 8, final versions of all the packages needed for a fully functional system are built, tested, and installed.