MosquitOS and ELF files

The Executable and Linkable Format (ELF) is a binary file format used to hold information about executables and shared libraries. It’s easy to parse and can be output by many compilers, including gcc and clang, which makes it a great choice for the MosquitOS application binary format!

This is an article I originally wrote back in 2013 (while in high school… yuck) that I was able to recover from backups. It’s up here for posterity (I really have been obsessed with osdev for way too long!) but you’ll likely notice some broken links and other bitrot.

You probably shouldn’t take anything written in this post seriously, and assume it’s even less correct than my new stuff. Comments are disabled, but feel free to contact me if you have corrections for dead links.

With an operating system kernel, the need to load executable data separate from the kernel binary quickly becomes apparent. Creating a new kind of executable format might seem like a logical idea, but with a myriad of different binary formats already out there, why re-invent uniform tetrahedral polyhedra?

Not only does a new binary format provide the opportunity for mistakes to appear in code, it also complicates the toolchain: compilers need to be modified to support the new format or, worse yet, an additional tool has to be added to the build process to convert from an already supported file format.

That’s where the Executable and Linkable Format comes in. Originally appearing in Solaris 2.0 (also known as SunOS 5.0), ELF quickly replaced many other contemporary binary formats, eventually being adopted as the standard executable format for Unix-like systems on x86.

ELF File Format

One of the great things about ELF is that it isn’t dependent on any single platform, architecture or operating system. A file produced on an x86 machine can be read and parsed on a PowerPC machine, or even a Motorola 68k machine [1].

It’s also standardised to a relatively large extent and easy to parse, which makes it great for use in an operating system. The fact that gcc, clang and many other compilers can output it is a nice bonus.

The header of an ELF file contains a few useful fields that tell us the offset from the start of the file to the program header table (ph_offset) and the section header table (sh_offset), which makes it really easy for code to begin traversing these structures. In addition, the number of entries in each table (ph_entry_count and sh_entry_count) is specified. (The System V ABI itself names these fields e_phoff, e_shoff, e_phnum and e_shnum.)
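
To make the header fields concrete, here is a minimal sketch in C of the 32-bit ELF header and a sanity check on it. The struct uses the System V ABI field names, with the names used in this post noted in comments. The helper elf32_check_header is hypothetical (it is not taken from MosquitOS), and the sketch assumes the host has the same byte order as the file.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit ELF file header, laid out as in the System V ABI (52 bytes). */
typedef struct {
	unsigned char e_ident[16]; /* magic number, class, byte order, ... */
	uint16_t e_type;
	uint16_t e_machine;
	uint32_t e_version;
	uint32_t e_entry;     /* virtual address of the entry point */
	uint32_t e_phoff;     /* "ph_offset": offset of the program header table */
	uint32_t e_shoff;     /* "sh_offset": offset of the section header table */
	uint32_t e_flags;
	uint16_t e_ehsize;
	uint16_t e_phentsize; /* size of one program header */
	uint16_t e_phnum;     /* "ph_entry_count" */
	uint16_t e_shentsize; /* size of one section header */
	uint16_t e_shnum;     /* "sh_entry_count" */
	uint16_t e_shstrndx;  /* "sh_str_index": section name string table */
} Elf32_Ehdr;

/* Hypothetical helper: copies the header out of buf and returns 0 if it looks
 * like a 32-bit ELF header whose tables lie inside the file, -1 otherwise. */
int elf32_check_header(const void *buf, size_t len, Elf32_Ehdr *out)
{
	if (len < sizeof(Elf32_Ehdr))
		return -1;
	memcpy(out, buf, sizeof(*out)); /* buf may not be suitably aligned */

	/* The literal is split so that 'E' is not parsed as part of \x7f. */
	if (memcmp(out->e_ident, "\x7f" "ELF", 4) != 0)
		return -1;
	if (out->e_ident[4] != 1) /* EI_CLASS: 1 means ELFCLASS32 */
		return -1;

	/* 64-bit arithmetic so that hostile offsets cannot wrap around. */
	uint64_t ph_end = (uint64_t)out->e_phoff
	                + (uint64_t)out->e_phnum * out->e_phentsize;
	uint64_t sh_end = (uint64_t)out->e_shoff
	                + (uint64_t)out->e_shnum * out->e_shentsize;
	if (ph_end > len || sh_end > len)
		return -1;
	return 0;
}
```

Once the check passes, entry i of the program header table starts at file offset e_phoff + i * e_phentsize, and the section header table is indexed the same way from e_shoff.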

Program Headers

Program headers are essentially a list of commands that tell the binary loader where to load parts of the binary in the virtual address space. According to the System V ABI, they are defined as follows [1]:

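typedef struct {
	unsigned int p_type;   /* kind of entry: 0 = ignore, 1 = LOAD */
	unsigned int p_offset; /* offset of the segment data in the file */
	unsigned int p_vaddr;  /* virtual address to load the segment at */
	unsigned int p_paddr;  /* physical address, where that is relevant */
	unsigned int p_filesz; /* number of bytes stored in the file */
	unsigned int p_memsz;  /* number of bytes occupied in memory */
	unsigned int p_flags;  /* read/write/execute permissions */
	unsigned int p_align;  /* required alignment */
} Elf32_Phdr;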

p_type specifies the type, or action, that is supposed to be performed in response to this header. A value of 0 (PT_NULL) means that this header should be ignored, while a value of 0x0001 (PT_LOAD) indicates a LOAD command. This means that p_filesz bytes should be loaded from offset p_offset in the file to virtual memory location p_vaddr. If p_memsz is greater than p_filesz, the remaining memory should be filled with zeros. This is often the case with a .bss section, which holds uninitialized variables and takes up no space in the file.
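
The LOAD rule translates almost directly into code. Below is a minimal sketch in C, assuming the whole file is already in memory and that dest points at p_memsz writable bytes standing in for the memory at p_vaddr. The helper load_segment is hypothetical, not the MosquitOS loader.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PT_LOAD 1

typedef struct {
	uint32_t p_type, p_offset, p_vaddr, p_paddr;
	uint32_t p_filesz, p_memsz, p_flags, p_align;
} Elf32_Phdr;

/* Hypothetical helper: performs one LOAD command.
 * Returns 0 on success (or if the header is not a LOAD command),
 * -1 if the header is malformed. */
int load_segment(const Elf32_Phdr *ph, const uint8_t *file, size_t file_len,
                 uint8_t *dest)
{
	if (ph->p_type != PT_LOAD)
		return 0; /* nothing to load for this header */

	/* The in-file part can never be larger than the in-memory part. */
	if (ph->p_filesz > ph->p_memsz)
		return -1;
	/* The in-file part must lie entirely inside the file. */
	if (ph->p_offset > file_len || ph->p_filesz > file_len - ph->p_offset)
		return -1;

	/* Copy what the file provides... */
	memcpy(dest, file + ph->p_offset, ph->p_filesz);
	/* ...and zero the rest, which is how .bss ends up cleared. */
	memset(dest + ph->p_filesz, 0, ph->p_memsz - ph->p_filesz);
	return 0;
}
```

The bounds checks matter more than they might seem: the header values come straight from the file, so a loader that trusts them blindly can be made to read or write out of bounds.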

readelf

A handy tool to view the program headers in an ELF file is readelf, a part of GNU binutils (on some systems it is installed as greadelf, as in the examples here). For example, to view the program headers in an ELF file called TEST.ELF, one would simply type the following:

$ greadelf -l TEST.ELF
Elf file type is EXEC (Executable file)
Entry point 0x8000000
There are 2 program headers, starting at offset 52

Program Headers:
	Type		Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
	LOAD		0x001000 0x08000000 0x08000000 0x00ae9 0x00ae9 R E 0x1000
	LOAD		0x002000 0x08001000 0x08001000 0x0029c 0x01008 RW  0x1000

 Section to Segment mapping:
   Segment Sections...
 	00     .text .init .fini .text.startup
 	01     .rodata .rodata.str1.1 .eh_frame .rodata.str1.4 .bss

Section Headers

Much like program headers, section headers are a listing of all available sections in the file, including where each should be loaded, its size, its offset in the file, and (optionally) a name. Again, the System V ABI tells us that a section header is defined as follows:

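typedef struct {
	unsigned int sh_name;      /* offset of the name in the string table */
	unsigned int sh_type;      /* PROGBITS, NOBITS, STRTAB, SYMTAB, ... */
	unsigned int sh_flags;     /* write, alloc, execute, ... */
	unsigned int sh_addr;      /* virtual address, if the section is loaded */
	unsigned int sh_offset;    /* offset of the section data in the file */
	unsigned int sh_size;      /* size of the section in bytes */
	unsigned int sh_link;      /* index of a related section */
	unsigned int sh_info;      /* extra information, depends on the type */
	unsigned int sh_addralign; /* required alignment */
	unsigned int sh_entsize;   /* entry size, if the section holds a table */
} Elf32_Shdr;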

The definitions of the fields are similar to those of the program header, save for flags, types and the like. However, the interesting field in most cases is sh_name. This field contains an offset into the ELF file’s section name string table, which holds several zero-terminated strings. The table can be located by using the sh_str_index field of the ELF header (e_shstrndx in the System V ABI); looking for a section header with the type SHT_STRTAB also works, but a file may contain more than one string table. In the section dump below, the section name string table can be identified as being section number 13, with the name .shstrtab and of type STRTAB. A hex dump of it reveals the following:

0x00004a27  00 2e 73 79 6d 74 61 62  00 2e 73 74 72 74 61 62  |..symtab..strtab|
0x00004a37  00 2e 73 68 73 74 72 74  61 62 00 2e 74 65 78 74  |..shstrtab..text|
0x00004a47  00 2e 69 6e 69 74 00 2e  66 69 6e 69 00 2e 74 65  |..init..fini..te|
0x00004a57  78 74 2e 73 74 61 72 74  75 70 00 2e 72 6f 64 61  |xt.startup..roda|
0x00004a67  74 61 00 2e 72 6f 64 61  74 61 2e 73 74 72 31 2e  |ta..rodata.str1.|
0x00004a77  31 00 2e 65 68 5f 66 72  61 6d 65 00 2e 72 6f 64  |1..eh_frame..rod|
0x00004a87  61 74 61 2e 73 74 72 31  2e 34 00 2e 62 73 73 00  |ata.str1.4..bss.|
0x00004a97  2e 73 74 61 62 00 2e 63  6f 6d 6d 65 6e 74 00 2e  |.stab..comment..|
0x00004aa7  73 74 61 62 73 74 72 00                           |stabstr.|
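
Resolving a name is then little more than pointer arithmetic plus a bounds check. A minimal sketch in C, assuming the string table has already been read into memory; section_name is a hypothetical helper, not code from MosquitOS.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the name stored at offset sh_name in a string
 * table of strtab_size bytes, or NULL if the offset is out of range or the
 * string is not zero-terminated within the table. */
const char *section_name(const char *strtab, size_t strtab_size,
                         uint32_t sh_name)
{
	if (sh_name >= strtab_size)
		return NULL;
	/* Make sure a terminator exists before the end of the table. */
	if (memchr(strtab + sh_name, '\0', strtab_size - sh_name) == NULL)
		return NULL;
	return strtab + sh_name;
}
```

In the dump above, the string .text starts 0x1b (27) bytes into the table, so a section header whose sh_name is 27 belongs to the .text section.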

Most of the time, these section headers won’t be of much importance when just executing a program. They can come in handy, however, to locate debug information and for dynamic linking.

readelf output

As with program headers, the readelf tool can be used to output information about sections. If we continue using the hypothetical TEST.ELF file, the output may look like the following:

$ greadelf -S ~/SquelchenOS/TEST.ELF
There are 16 section headers, starting at offset 0x4ab0:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        08000000 001000 000abb 00  AX  0   0 4096
  [ 2] .init             PROGBITS        08000abb 001abb 000004 00  AX  0   0  1
  [ 3] .fini             PROGBITS        08000abf 001abf 000004 00  AX  0   0  1
  [ 4] .text.startup     PROGBITS        08000ad0 001ad0 000019 00  AX  0   0 16
  [ 5] .rodata           PROGBITS        08001000 002000 000124 00  WA  0   0 4096
  [ 6] .rodata.str1.1    PROGBITS        08001124 002124 000015 01 AMS  0   0  1
  [ 7] .eh_frame         PROGBITS        0800113c 00213c 000110 00   A  0   0  4
  [ 8] .rodata.str1.4    PROGBITS        0800124c 00224c 000050 01 AMS  0   0  4
  [ 9] .bss              NOBITS          08002000 00229c 000008 00  WA  0   0 4096
  [10] .stab             PROGBITS        00000000 00229c 001b60 0c     12   0  4
  [11] .comment          PROGBITS        00000000 003dfc 000011 01  MS  0   0  1
  [12] .stabstr          STRTAB          00000000 003e0d 000c1a 00      0   0  1
  [13] .shstrtab         STRTAB          00000000 004a27 000088 00      0   0  1
  [14] .symtab           SYMTAB          00000000 004d30 0001f0 10     15  21  4
  [15] .strtab           STRTAB          00000000 004f20 00008e 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

Note the Flg column, which indicates whether a section needs to be allocated memory (A), should be executable (X) or writeable (W), and whether its contents may be merged to eliminate duplicates (M).
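
The flag letters are a rendering of individual bits in sh_flags. A minimal sketch in C of that decoding, using the bit values from the System V ABI and emitting the letters in the same order as the output above; section_flags_to_string is a hypothetical helper and only covers the five flags that appear in this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Section flag bits, as defined by the System V ABI. */
#define SHF_WRITE     0x1  /* W: writeable at run time */
#define SHF_ALLOC     0x2  /* A: occupies memory at run time */
#define SHF_EXECINSTR 0x4  /* X: contains executable code */
#define SHF_MERGE     0x10 /* M: contents may be merged */
#define SHF_STRINGS   0x20 /* S: contains zero-terminated strings */

/* Hypothetical helper: writes the flag letters for sh_flags into out,
 * which must have room for five letters plus the terminator. */
void section_flags_to_string(uint32_t sh_flags, char out[6])
{
	size_t n = 0;

	if (sh_flags & SHF_WRITE)
		out[n++] = 'W';
	if (sh_flags & SHF_ALLOC)
		out[n++] = 'A';
	if (sh_flags & SHF_EXECINSTR)
		out[n++] = 'X';
	if (sh_flags & SHF_MERGE)
		out[n++] = 'M';
	if (sh_flags & SHF_STRINGS)
		out[n++] = 'S';
	out[n] = '\0';
}
```

For the .text section above, sh_flags would be SHF_ALLOC | SHF_EXECINSTR (0x6), which this helper renders as AX.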

MosquitOS and ELF files

Really, the point of this post is just so I could organise my thoughts and have a nice convenient document to refer to in my later adventures, but of course, it wouldn’t be complete if I didn’t ramble on about how MosquitOS handles ELF files [2].

ELF Loading

During ELF parsing, sections are enumerated and several key sections are located for later use, including .text, which contains the application’s executable code. Then, the program header table is traversed.

This traversal happens in two passes. The first pass, which is done in the ELF parser, builds a virtual memory map and associates an in-file offset with each entry. Any superfluous sections are ignored, as are any program headers that aren’t LOAD commands.
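
As an illustration of what such a first pass could look like, here is a minimal sketch in C. The MapEntry structure and the build_memory_map function are assumptions made up for this example; the actual MosquitOS parser may organise its data quite differently.

```c
#include <stddef.h>
#include <stdint.h>

#define PT_LOAD 1

typedef struct {
	uint32_t p_type, p_offset, p_vaddr, p_paddr;
	uint32_t p_filesz, p_memsz, p_flags, p_align;
} Elf32_Phdr;

/* Hypothetical memory map entry: a range of virtual memory together with
 * the in-file offset its contents come from. */
typedef struct {
	uint32_t vaddr;       /* where the range starts in virtual memory */
	uint32_t file_offset; /* where its data starts in the file */
	uint32_t file_size;   /* bytes backed by the file */
	uint32_t mem_size;    /* bytes occupied in memory */
	uint32_t flags;       /* permissions copied from p_flags */
} MapEntry;

/* Hypothetical first pass: records every LOAD command in map, skipping
 * everything else. Returns the number of entries written, which never
 * exceeds capacity. */
size_t build_memory_map(const Elf32_Phdr *ph, size_t count,
                        MapEntry *map, size_t capacity)
{
	size_t n = 0;

	for (size_t i = 0; i < count && n < capacity; i++) {
		if (ph[i].p_type != PT_LOAD)
			continue; /* not a LOAD command: ignore it */
		map[n].vaddr = ph[i].p_vaddr;
		map[n].file_offset = ph[i].p_offset;
		map[n].file_size = ph[i].p_filesz;
		map[n].mem_size = ph[i].p_memsz;
		map[n].flags = ph[i].p_flags;
		n++;
	}
	return n;
}
```

Run over the two program headers of TEST.ELF shown earlier, this would produce two entries, one starting at 0x08000000 and one at 0x08001000.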

When a task is created from an in-memory parsed ELF, the second pass happens, which creates a page table for the process. If a section is aligned to a page boundary, as it should be when the default linker script is used, the kernel maps its pages directly to the in-memory ELF file. However, if there’s an unaligned section, or the section is writeable (it has the PF_W flag), it will be copied into a separate kernel memory section that is kept track of in the task structure.

During this process, the system’s security policy can cause an error to be raised when a section that is both writeable and executable is to be created.
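
The decisions made in this second pass can be summarised in a few lines. A minimal sketch in C, assuming 4 KiB pages; the SegmentAction type, the classify_segment function and the enforce_wx switch are hypothetical names for this example, while the PF_* values are the ones defined by the System V ABI.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Segment permission bits, as defined by the System V ABI. */
#define PF_X 0x1
#define PF_W 0x2
#define PF_R 0x4

typedef enum {
	SEG_MAP_DIRECT, /* map pages straight from the in-memory ELF file */
	SEG_COPY,       /* copy into separately allocated memory */
	SEG_REJECT      /* refused by the security policy */
} SegmentAction;

/* Hypothetical helper: decides how one LOAD entry should be handled. */
SegmentAction classify_segment(uint32_t vaddr, uint32_t file_offset,
                               uint32_t flags, int enforce_wx)
{
	/* Policy check: nothing may be both writeable and executable. */
	if (enforce_wx && (flags & PF_W) && (flags & PF_X))
		return SEG_REJECT;

	/* Writeable data must not modify the shared file image, and
	 * unaligned data cannot be mapped page by page, so both are copied. */
	if ((flags & PF_W) != 0
	    || vaddr % PAGE_SIZE != 0
	    || file_offset % PAGE_SIZE != 0)
		return SEG_COPY;

	return SEG_MAP_DIRECT;
}
```

With the TEST.ELF headers shown earlier, the first segment (R E, page aligned) would be mapped directly, while the second (RW) would be copied.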

The ELF file and associated structures are kept in memory while the process is executing, which allows for debugging symbols to be used by the kernel debugger module when a process faults or a breakpoint is reached. When a process is terminated, all associated resources are released so another process can use the memory.

Screenshots!

Because ELF file parsing is so particularly fascinating, have a screenshot of what the MosquitOS parser with debugging code outputs.

Screenshot showing sections in an ELF file, as parsed by the MosquitOS kernel

  1. See the System V ABI for lots of fun information about the ELF format.

  2. See the ELF format parser in MosquitOS here. 

This post is licensed under CC BY 4.0.