The Executable and Linkable Format (ELF) is a binary file format used to hold executables and shared libraries. It’s also very easy to parse, and gcc, clang and many other compilers can output it. Thus, it’s a great choice for the MosquitOS application binary format!
This is an article I originally wrote back in 2013 (while in high school… yuck) that I was able to recover from backups. It’s up here for posterity (I really have been obsessed with osdev for way too long!) but you’ll likely notice some broken links and other bitrot.
You probably shouldn’t take anything written in this post seriously, and assume it’s even less correct than my new stuff. Comments are disabled, but feel free to contact me if you have corrections for dead links.
With an operating system kernel, the need to load executable data separate from the kernel binary quickly becomes apparent. Creating a new kind of executable format might seem like a logical idea, but with a myriad of binary formats already out there, why re-invent uniform tetrahedral polyhedra?
Not only does a new binary format provide the opportunity for mistakes to appear in code, it also causes complications with compilers, as they need to be modified to support the new format; or, worse yet, an additional tool has to be added to the build process to convert from an already supported file format.
That’s where the Executable and Linkable Format comes in. Originally published as part of the System V Release 4 ABI and shipped in Solaris 2.0 (also known as SunOS 5.0), ELF quickly replaced many other contemporary binary formats, eventually being adopted as the standard executable format for Unix-like systems on x86.
ELF File Format
One of the great things about ELF is that it isn’t dependent on any single platform, architecture or operating system. A file produced on an x86 machine will be readable by a PowerPC machine, or even a Motorola 68k machine.[1]
It’s also standardised to a relatively large extent and easy to parse, which makes it great for use in an operating system. The fact that gcc, clang and many other compilers can output it is a nice bonus.
Header
The header of an ELF file contains a few useful fields that tell us the offset from the start of the file to the program header table (ph_offset) and the section header table (sh_offset), which makes it really easy for code to begin traversing these structures. In addition, the number of structures in each table (ph_entry_count and sh_entry_count) is specified. The System V ABI calls these fields e_phoff, e_shoff, e_phnum and e_shnum.
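To make the header fields a bit more concrete, here is a minimal sketch of the 32-bit ELF header using the System V ABI field names, with the names used in this post noted in comments. The function elf32_header_ok is a hypothetical helper, not MosquitOS code, and it assumes the file uses the same byte order as the machine reading it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EI_NIDENT 16

typedef struct {
    unsigned char e_ident[EI_NIDENT]; /* magic number, class, byte order, ... */
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint32_t e_entry;      /* entry point address */
    uint32_t e_phoff;      /* ph_offset in this post */
    uint32_t e_shoff;      /* sh_offset in this post */
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;  /* size of one program header */
    uint16_t e_phnum;      /* ph_entry_count in this post */
    uint16_t e_shentsize;  /* size of one section header */
    uint16_t e_shnum;      /* sh_entry_count in this post */
    uint16_t e_shstrndx;   /* sh_str_index in this post */
} Elf32_Ehdr;

/* Hypothetical helper: copies the header out of `image` and returns 1 if it
 * looks like a 32-bit ELF header whose two tables lie inside the buffer,
 * 0 otherwise. Assumes file and host byte order match. */
int elf32_header_ok(const unsigned char *image, size_t len, Elf32_Ehdr *out)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len < sizeof(Elf32_Ehdr))
        return 0;
    memcpy(out, image, sizeof *out);   /* avoids unaligned access */

    if (memcmp(out->e_ident, magic, sizeof magic) != 0)
        return 0;
    if (out->e_ident[4] != 1)          /* EI_CLASS: 1 means 32-bit */
        return 0;

    /* Do the bounds arithmetic in 64 bits so it cannot wrap around. */
    uint64_t ph_end = (uint64_t)out->e_phoff +
                      (uint64_t)out->e_phnum * out->e_phentsize;
    uint64_t sh_end = (uint64_t)out->e_shoff +
                      (uint64_t)out->e_shnum * out->e_shentsize;
    return ph_end <= len && sh_end <= len;
}
```

Checking the table bounds up front means the later traversal code can index the tables without re-validating every entry.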
Program Headers
Program headers are essentially a list of commands that tell the binary loader where to load each segment (a run of one or more sections) of the binary in virtual address space. According to the System V ABI, they are defined as follows:[1]
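typedef struct {
    unsigned int p_type;   /* kind of entry: 0 = ignore, 1 = LOAD */
    unsigned int p_offset; /* offset of the segment data in the file */
    unsigned int p_vaddr;  /* virtual address to load the segment at */
    unsigned int p_paddr;  /* physical address, where that is relevant */
    unsigned int p_filesz; /* number of bytes present in the file */
    unsigned int p_memsz;  /* number of bytes occupied in memory */
    unsigned int p_flags;  /* permissions: execute, write, read */
    unsigned int p_align;  /* required alignment in file and memory */
} Elf32_Phdr;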
p_type specifies the type, or action, that is supposed to be performed in response to this header. A value of 0 means that this header should be ignored, while a value of 0x0001 indicates a LOAD command. This means that p_filesz bytes should be loaded from offset p_offset in the file to virtual memory location p_vaddr. If p_memsz is greater than p_filesz, the remaining memory space should be filled with zeros. This is often the case with a .bss section, which contains uninitialized variables and should be cleared.
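The LOAD rule above can be sketched in a few lines of C. This is only an illustration under some assumptions: load_segment is a hypothetical name, and the destination is modelled as a plain buffer `mem` standing in for the memory that starts at virtual address `mem_base`, rather than real page tables.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PT_NULL 0u
#define PT_LOAD 1u

typedef struct {
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
} Elf32_Phdr;

/* Hypothetical helper: handles one program header. Returns 0 on success
 * (including headers that are simply skipped) and -1 if the header points
 * outside the file or outside the destination window. */
int load_segment(const Elf32_Phdr *ph,
                 const unsigned char *file, size_t file_len,
                 unsigned char *mem, uint32_t mem_base, size_t mem_len)
{
    if (ph->p_type != PT_LOAD)
        return 0;                      /* not a LOAD command: ignore it */

    if (ph->p_filesz > ph->p_memsz)
        return -1;
    if ((uint64_t)ph->p_offset + ph->p_filesz > file_len)
        return -1;
    if (ph->p_vaddr < mem_base ||
        (uint64_t)(ph->p_vaddr - mem_base) + ph->p_memsz > mem_len)
        return -1;

    unsigned char *dst = mem + (ph->p_vaddr - mem_base);

    /* Copy the part that exists in the file... */
    memcpy(dst, file + ph->p_offset, ph->p_filesz);
    /* ...and zero the rest, which is where .bss usually ends up. */
    memset(dst + ph->p_filesz, 0, ph->p_memsz - ph->p_filesz);
    return 0;
}
```

A real loader would copy into freshly mapped pages instead of a flat buffer, but the copy-then-zero step stays the same.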
readelf
A handy tool to view the program headers in an ELF file is readelf, a part of GNU binutils (some systems install it as greadelf, the name used here). For example, to view the program headers in an ELF file called TEST.ELF, one would simply type the following:
$ greadelf -l TEST.ELF

Elf file type is EXEC (Executable file)
Entry point 0x8000000
There are 2 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x001000 0x08000000 0x08000000 0x00ae9 0x00ae9 R E 0x1000
  LOAD           0x002000 0x08001000 0x08001000 0x0029c 0x01008 RW  0x1000

 Section to Segment mapping:
  Segment Sections...
   00     .text .init .fini .text.startup
   01     .rodata .rodata.str1.1 .eh_frame .rodata.str1.4 .bss
Section Headers
Much like program headers, section headers are a listing of all available sections in the file, including where they should be loaded, size, offset, and (optionally) a name. Again, the System V ABI tells us that a section header is defined as follows:
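typedef struct {
    unsigned int sh_name;      /* offset of the name in the section name string table */
    unsigned int sh_type;      /* kind of section (PROGBITS, NOBITS, STRTAB, ...) */
    unsigned int sh_flags;     /* attribute bits (write, alloc, execute, ...) */
    unsigned int sh_addr;      /* virtual address when loaded, otherwise 0 */
    unsigned int sh_offset;    /* offset of the section data in the file */
    unsigned int sh_size;      /* size of the section in bytes */
    unsigned int sh_link;      /* index of a related section; depends on type */
    unsigned int sh_info;      /* extra information; depends on type */
    unsigned int sh_addralign; /* required address alignment */
    unsigned int sh_entsize;   /* entry size for table-like sections, otherwise 0 */
} Elf32_Shdr;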
The definitions of the fields are similar to the program header, save for flags, types and the like. However, the interesting field in most cases is sh_name. This field contains an offset into the ELF file’s section name string table, which holds several zero-terminated strings. The table can be located by using the sh_str_index field of the header (e_shstrndx in the System V ABI), or by looking for a section header with the type SHT_STRTAB, though a file may contain more than one section of that type. In the section dump below, the string table can be identified as being section number 13, with the name .shstrtab and of type STRTAB. A hex dump reveals the following:
0x00004a27 00 2e 73 79 6d 74 61 62 00 2e 73 74 72 74 61 62 |..symtab..strtab|
0x00004a37 00 2e 73 68 73 74 72 74 61 62 00 2e 74 65 78 74 |..shstrtab..text|
0x00004a47 00 2e 69 6e 69 74 00 2e 66 69 6e 69 00 2e 74 65 |..init..fini..te|
0x00004a57 78 74 2e 73 74 61 72 74 75 70 00 2e 72 6f 64 61 |xt.startup..roda|
0x00004a67 74 61 00 2e 72 6f 64 61 74 61 2e 73 74 72 31 2e |ta..rodata.str1.|
0x00004a77 31 00 2e 65 68 5f 66 72 61 6d 65 00 2e 72 6f 64 |1..eh_frame..rod|
0x00004a87 61 74 61 2e 73 74 72 31 2e 34 00 2e 62 73 73 00 |ata.str1.4..bss.|
0x00004a97 2e 73 74 61 62 00 2e 63 6f 6d 6d 65 6e 74 00 2e |.stab..comment..|
0x00004aa7 73 74 61 62 73 74 72 00 |stabstr.|
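Looking up a name is then just a matter of indexing into that blob of bytes. Here is a small sketch, assuming the string table has already been read into memory; strtab_get is a hypothetical helper name, and the bounds checks are my own addition to guard against a corrupt sh_name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the zero-terminated string that starts at
 * `offset` inside a string table of `size` bytes, or NULL if the offset is
 * out of range or the string is not terminated inside the table. */
const char *strtab_get(const char *strtab, size_t size, uint32_t offset)
{
    if (offset >= size)
        return NULL;
    if (memchr(strtab + offset, '\0', size - offset) == NULL)
        return NULL;
    return strtab + offset;
}
```

With the first bytes of the dump above, offset 1 yields .symtab, offset 9 yields .strtab and offset 17 yields .shstrtab.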
Most of the time, these section headers won’t be of much importance when just executing a program. They can come in handy, however, to locate debug information and for dynamic linking.
readelf output
As with program headers, the readelf tool can be used to output information about sections. If we continue using the hypothetical TEST.ELF file, the output may look like the following:
$ greadelf -S ~/SquelchenOS/TEST.ELF
There are 16 section headers, starting at offset 0x4ab0:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        08000000 001000 000abb 00  AX  0   0 4096
  [ 2] .init             PROGBITS        08000abb 001abb 000004 00  AX  0   0  1
  [ 3] .fini             PROGBITS        08000abf 001abf 000004 00  AX  0   0  1
  [ 4] .text.startup     PROGBITS        08000ad0 001ad0 000019 00  AX  0   0 16
  [ 5] .rodata           PROGBITS        08001000 002000 000124 00  WA  0   0 4096
  [ 6] .rodata.str1.1    PROGBITS        08001124 002124 000015 01 AMS  0   0  1
  [ 7] .eh_frame         PROGBITS        0800113c 00213c 000110 00   A  0   0  4
  [ 8] .rodata.str1.4    PROGBITS        0800124c 00224c 000050 01 AMS  0   0  4
  [ 9] .bss              NOBITS          08002000 00229c 000008 00  WA  0   0 4096
  [10] .stab             PROGBITS        00000000 00229c 001b60 0c     12   0  4
  [11] .comment          PROGBITS        00000000 003dfc 000011 01  MS  0   0  1
  [12] .stabstr          STRTAB          00000000 003e0d 000c1a 00      0   0  1
  [13] .shstrtab         STRTAB          00000000 004a27 000088 00      0   0  1
  [14] .symtab           SYMTAB          00000000 004d30 0001f0 10     15  21  4
  [15] .strtab           STRTAB          00000000 004f20 00008e 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
Note the Flg column, which indicates whether the section needs to be allocated memory (A), should be executable (X) or writeable (W), and whether its contents may be merged to eliminate duplicates (M).
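Those letters come straight from bits in sh_flags. As a rough sketch, the five flags that appear in the dump could be decoded like this; the bit values are the ones the System V ABI and the GNU extensions define, while section_flags_str is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

#define SHF_WRITE     0x01u  /* W: writable at run time */
#define SHF_ALLOC     0x02u  /* A: occupies memory at run time */
#define SHF_EXECINSTR 0x04u  /* X: contains executable code */
#define SHF_MERGE     0x10u  /* M: contents may be merged */
#define SHF_STRINGS   0x20u  /* S: contains zero-terminated strings */

/* Hypothetical helper: writes the flag letters for `flags` into `out`,
 * which must have room for five letters plus the terminator. Any other
 * bits are ignored in this sketch. */
void section_flags_str(uint32_t flags, char out[6])
{
    size_t n = 0;

    if (flags & SHF_WRITE)     out[n++] = 'W';
    if (flags & SHF_ALLOC)     out[n++] = 'A';
    if (flags & SHF_EXECINSTR) out[n++] = 'X';
    if (flags & SHF_MERGE)     out[n++] = 'M';
    if (flags & SHF_STRINGS)   out[n++] = 'S';
    out[n] = '\0';
}
```

For example, a flags value of 0x6 comes out as AX, which matches the .text row in the table.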
MosquitOS and ELF files
Really, the point of this post is just so I could organise my thoughts and have a nice convenient document to refer to in my later adventures, but of course, it wouldn’t be complete if I didn’t ramble on about how MosquitOS handles ELF files.[2]
ELF Loading
During ELF parsing, sections are enumerated and several key sections are located for later use, including .text, which contains the application’s executable code. Then, the program header table is traversed.
This traversal happens in two passes: The first, which is done in the ELF parser, builds a virtual memory map and associates an in-file offset with it. Any superfluous sections are ignored and any program headers that aren’t LOAD commands are ignored as well.
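The first pass could be sketched roughly as follows. This is not the actual MosquitOS parser: elf_region and build_memory_map are hypothetical names, and the memory map is assumed to be a simple caller-provided array.

```c
#include <stddef.h>
#include <stdint.h>

#define PT_LOAD 1u

typedef struct {
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
} Elf32_Phdr;

/* Hypothetical memory-map entry: a virtual range plus the place in the
 * file that its contents come from. */
typedef struct {
    uint32_t vaddr;
    uint32_t mem_size;
    uint32_t file_offset;
    uint32_t file_size;
    uint32_t flags;
} elf_region;

/* First pass: walk the program headers and record every LOAD command,
 * ignoring everything else. Returns the number of regions written,
 * which never exceeds `max_out`. */
size_t build_memory_map(const Elf32_Phdr *ph, size_t count,
                        elf_region *out, size_t max_out)
{
    size_t n = 0;

    for (size_t i = 0; i < count && n < max_out; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;
        out[n].vaddr       = ph[i].p_vaddr;
        out[n].mem_size    = ph[i].p_memsz;
        out[n].file_offset = ph[i].p_offset;
        out[n].file_size   = ph[i].p_filesz;
        out[n].flags       = ph[i].p_flags;
        n++;
    }
    return n;
}
```

Keeping the file offset next to each virtual range is what lets the second pass decide later whether to map or copy, without touching the program headers again.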
When a task is created from an in-memory parsed ELF, the second pass happens, which creates a page table for the process. If a section is aligned to a page boundary, as it should be when the default linker script is used, the kernel directly maps the page to the ELF file. However, if there’s an unaligned section, or the section has the PF_W flag, it will be copied into a separate kernel memory section that is kept track of in the task structure.
During this process, the system’s security policy can cause an error to be raised when a section that is both writeable and executable is to be created.
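The decision made in the second pass boils down to a small function. The sketch below is an illustration, not the kernel's code: it assumes 4096-byte pages, treats "aligned" as both the virtual address and the file offset being page-aligned, and uses hypothetical names (seg_action, choose_action, forbid_wx).

```c
#include <stdint.h>

#define PF_X 0x1u            /* segment is executable */
#define PF_W 0x2u            /* segment is writeable */
#define PF_R 0x4u            /* segment is readable */
#define ELF_PAGE_SIZE 4096u  /* assumed page size */

typedef enum {
    SEG_MAP_DIRECT,  /* map pages straight onto the in-memory ELF file */
    SEG_COPY,        /* copy into separate memory owned by the task */
    SEG_REJECT       /* refused by the security policy */
} seg_action;

/* Decides what to do with one loadable region. `forbid_wx` stands in for
 * the system security policy: when non-zero, regions that are both
 * writeable and executable are refused. */
seg_action choose_action(uint32_t vaddr, uint32_t offset,
                         uint32_t flags, int forbid_wx)
{
    if (forbid_wx && (flags & PF_W) && (flags & PF_X))
        return SEG_REJECT;

    /* Writeable or unaligned data cannot share pages with the file. */
    if ((flags & PF_W) ||
        vaddr % ELF_PAGE_SIZE != 0 || offset % ELF_PAGE_SIZE != 0)
        return SEG_COPY;

    return SEG_MAP_DIRECT;
}
```

Applied to the two LOAD entries of TEST.ELF, the read-and-execute segment at 0x08000000 would be mapped directly, while the writeable one at 0x08001000 would be copied.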
The ELF file and associated structures are kept in memory while the process is executing, which allows for debugging symbols to be used by the kernel debugger module when a process faults or a breakpoint is reached. When a process is terminated, all associated resources are released so another process can use the memory.
Screenshots!
Because ELF file parsing is so particularly fascinating, have a screenshot of what the MosquitOS parser with debugging code outputs.
[1] See the System V ABI for lots of fun information about the ELF format.