Embedded Systems Study Group

Basics of Embedded Programming

Working of C compiler

Preprocessors

The C preprocessor is not part of the compiler proper; it runs as a separate step before compilation. In simple terms, it is a text substitution tool: it rewrites the source text according to preprocessor directives, and the compiler then works on the result.

All preprocessor directives begin with a hash symbol (#), which must be the first nonblank character on the line; for readability, a directive should begin in the first column. The examples below use two of the most common directives, #include and #define. Names introduced with #define are called macros.

#include <stdio.h>

#define AGE 23

int main(int argc, char *argv[])
{
	printf("%d\n", AGE);
}

So, it is as good as writing:

#include <stdio.h>

int main(int argc, char *argv[])
{
	printf("%d\n", 23);
}

The preprocessor replaced AGE with 23 before the compiler ever saw it. Below is an example of a function-like macro, which takes arguments:

#include <stdio.h>

#define SUM(x, y) (x + y)

int main(int argc, char *argv[])
{
	int a = 5;
	int b = 10;
	int sum = SUM(a, b);
	printf("%d\n", sum);
}

So, it is as good as writing:

#include <stdio.h>

int main(int argc, char *argv[])
{
	int a = 5;
	int b = 10;
	int sum = (a + b);
	printf("%d\n", sum);
}

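GCC can also define a macro at compile time through the -D command-line option. Consider the following code, which uses AGE without defining it. Compiling it with gcc -DAGE=23 main.c -o main gives the same result as placing #define AGE 23 at the top of the file: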

#include <stdio.h>

int main(int argc, char *argv[])
{
	printf("%d\n", AGE);
}

Compiling

Compiling is the second step. It takes the output of the preprocessor and generates assembly language, an intermediate human-readable language specific to the target processor. The assembly generated for an ARM processor is therefore different from that generated for an x86 processor.

Assembly files usually have the extension .s or .asm, depending on the toolchain.

Let’s convert our code to assembly using the -S option of gcc. Copy the following code to main.c:

#include <stdio.h>

#define AGE 23

int main(int argc, char *argv[])
{
	printf("%d\n", AGE);
}

Use the following command to compile the C code into assembly: gcc -S main.c -o main_assembly.s

The assembly file will contain output similar to this (the details depend on the compiler version and the target):

	.file	"main.c"
	.text
	.section	.rodata
.LC0:
	.string	"%d\n"
	.text
	.globl	main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc
	endbr64
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$16, %rsp
	movl	%edi, -4(%rbp)
	movq	%rsi, -16(%rbp)
	movl	$23, %esi
	leaq	.LC0(%rip), %rdi
	movl	$0, %eax
	call	printf@PLT
	movl	$0, %eax
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE0:
	.size	main, .-main
	.ident	"GCC: (Ubuntu 9.3.0-10ubuntu2) 9.3.0"
	.section	.note.GNU-stack,"",@progbits
	.section	.note.gnu.property,"a"
	.align 8
	.long	 1f - 0f
	.long	 4f - 1f
	.long	 5
0:
	.string	 "GNU"
1:
	.align 8
	.long	 0xc0000002
	.long	 3f - 2f
2:
	.long	 0x3
3:
	.align 8
4:

Assembly

Assembly is the third step of compilation. The assembler will convert the assembly code into pure binary code or machine code (zeros and ones). This code is also known as object code.

We generated assembly code in the previous step; now we will translate it into binary, the form the CPU understands. The -c option tells gcc to stop after producing an object file from the assembly.

So, run the following command: gcc -c main_assembly.s -o main.o

If you try to run main.o, it won’t run, because the compilation process is not yet complete: the object file still has to be linked.

Linking

Linking is the final step of compilation. The linker merges all the object code from multiple modules into a single one. If we are using a function from libraries, linker will link our code with that library function code.

In static linking, the linker copies all used library functions into the executable file. In dynamic linking, the code is not copied; the linker only records the name of the library in the binary, and the library is loaded when the program runs.

In layman’s terms, a program can consist of thousands of .c files, so gcc compiles each .c file into an object (.o) file and then joins the various .o files to make an executable.

The program above has only a single .c file, so why do we need to link it? Because we used printf, which is declared in stdio.h but defined in the C standard library. To use printf, our object file must be linked with that library. Running gcc on the object file does this and generates a binary executable: gcc main.o -o main

Now we can run the generated binary with ./main, and the output will be 23.

Header files

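Header files are files in which you declare functions (along with types and macros) so that other source files can use them. Your main program includes a header in order to call functions defined in another .c file, which is essential when writing large C programs.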

Object and source files in C: .o and .c

Brief overview of GNU Make and CMake build systems

GNU Make

detailed tutorial: https://opensource.com/article/18/8/what-how-makefile

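A rule has the following form:

target: prerequisites
<TAB> recipe

For example:

say_hello:
	echo "Hello World"

Prefixing a recipe line with @ stops make from printing the command before running it:

say_hello:
	@echo "Hello World"

.PHONY: clean

This declares clean as a phony target, that is, one that does not correspond to a file, so make clean runs even if a file named clean exists.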

main: library.o main.o
		@echo "linking main.o and library.o and generating binary"
		gcc -o library library.o main.o

main is the target name, and because it is the first ordinary target it is the default. library.o and main.o are its prerequisites, so they must be up to date before this recipe runs. Generating the binary needs the .o files, so make first runs the rules that build them from the corresponding .c files. Note that the recipe writes the binary to a file named library.

library.o: library.c
		@echo "compiling library.c into .object file"
		gcc -c library.c

main.o: main.c
		@echo "compiling main.c into .object file"
		gcc -c main.c

The library.o and main.o rules generate the .o files; as we can see, each recipe runs gcc -c on its source file, for example gcc -c main.c.

clean: 
		@echo "cleaning build files"
		rm main.o library.o library

This target is run by make clean and removes all the generated .o and binary files.

The complete Makefile is:

.PHONY: clean

main: library.o main.o
		@echo "linking main.o and library.o and generating binary"
		gcc -o library library.o main.o

library.o: library.c
		@echo "compiling library.c into .object file"
		gcc -c library.c

main.o: main.c
		@echo "compiling main.c into .object file"
		gcc -c main.c

clean: 
		@echo "cleaning build files"
		rm main.o library.o library

Ninja

Ninja is another build tool. It generates executable files according to the rules in a .ninja build file, which can be written by hand or generated by a tool such as CMake. Ninja was created to build large projects faster and to track changes better (it is reported to be much faster than good old Make in many common scenarios). At present, it is used by many popular open source projects such as Google Chrome, LLVM, and Android.

Basic specifications :

Ninja evaluates a graph of dependencies between files, and runs whichever commands are necessary to make your build target up to date as determined by file modification times. Conceptually, build statements describe the dependency graph of your project, while rule statements describe how to generate the files along a given edge of the graph.

Here’s a basic .ninja file that demonstrates most of the syntax.

cflags = -Wall

rule cc
  command = gcc $cflags -c $in -o $out

build foo.o: cc foo.c

Despite the non-goal of being convenient to write by hand, to keep build files readable (debuggable), Ninja supports declaring shorter reusable names for strings. A declaration like the following

cflags = -g

can be used on the right side of an equals sign, dereferencing it with a dollar sign, like this :

rule cc
  command = gcc $cflags -c $in -o $out

Variables can also be referenced using curly braces like ${in}.

Rules declare a short name for a command line. They begin with a line consisting of the rule keyword and a name for the rule. Then follows an indented set of variable = value lines.

The basic example above declares a new rule named cc, along with the command to run. In the context of a rule, the command variable defines the command to run, $in expands to the list of input files (foo.c), and $out to the output files (foo.o) for the command.

Build statements declare a relationship between input and output files. They begin with the build keyword, and have the format build outputs: rulename inputs. Such a declaration says that all of the output files are derived from the input files. When the output files are missing or when the inputs change, Ninja will run the rule to regenerate the outputs.

The basic example above describes how to build foo.o, using the cc rule.

A build statement may be followed by an indented set of key = value pairs, much like a rule. These variables will shadow any variables when evaluating the variables in the command. For example:

cflags = -Wall -Werror
rule cc
  command = gcc $cflags -c $in -o $out

# If left unspecified, builds get the outer $cflags.
build foo.o: cc foo.c

# But you can shadow variables like cflags for a particular build.
build special.o: cc special.c
  cflags = -Wall

# The variable was only shadowed for the scope of special.o;
# Subsequent build lines get the outer (original) cflags.
build bar.o: cc bar.c

Let’s create the .ninja file that generates the executable from the same folder used in the Make example.

rule cc
  command = gcc -c $in -I./include/ -o $out

rule ll
  command = gcc -o $out $in 

rule clean 
  command = rm -rf $in .ninja_deps .ninja_log

We start by defining required rules. Below is a short explanation for each rule :-

  1. cc : Used for generating .o files from .c files. Note that this command is also using the location of header files.

  2. ll : Used for linking .o files

  3. clean : Used for deleting all generated files.

build library.o: cc library.c

build main.o: cc main.c

build library: ll library.o main.o

build clean_files: clean library library.o main.o

Once all basic rules are defined, we move ahead by using them in build statements as per our convenience. Below is a short explanation for each build statement :-

  1. build library.o : Generates library.o from library.c using ‘cc’ rule.

  2. build main.o : Generates main.o from main.c using ‘cc’ rule.

  3. build library : Generates the executable ‘library’ after linking ‘main.o’ and ‘library.o’ using ‘ll’ rule.

  4. build clean_files : Deletes all generated files using ‘clean’ rule.

The complete build.ninja file for generating required executable is :

rule cc
  command = gcc -c $in -I./include/ -o $out

rule ll
  command = gcc -o $out $in 

rule clean 
  command = rm -rf $in .ninja_deps .ninja_log

build library.o: cc library.c

build main.o: cc main.c

build library: ll library.o main.o

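build clean_files: clean library library.o main.o

# Without this line, a plain `ninja` would also run clean_files
# and delete the files it had just built.
default library

For more information, refer to the official Ninja manual.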

Note :

Apart from Make and Ninja, there are many other build tools, such as Boost’s b2 (Boost.Build) and SCons. They share many concepts and will be easier to use once you have learned the basics. For the sake of brevity, we have covered only the most popular build systems, and you are encouraged to explore the others.

CMake

CMake is an extensible, open-source system that manages the build process in an operating system and in a compiler-independent manner.

Content

CMake is a build system generator: rather than building the project itself, it generates the files for a native build tool such as Make or Ninja. It works across platforms, whether Linux, macOS or Windows, and with different compilers such as GCC, Clang and MSVC. CMake is able to do so because it has its own domain-specific language (DSL), which allows us to generate platform-native build systems from the same set of CMake scripts. A project’s CMake script is written in a file named CMakeLists.txt. The CMake software toolset gives developers full control over the whole life cycle of a given project.

Just as GNU Make has a Makefile in which we describe the steps of compilation, CMake needs a CMakeLists.txt. We will use CMake to compile the code from the example above.

Download the following zip, extract it and cd into the folder.

Following is the CMakeLists.txt present in the folder

# Specify the minimum version for CMake
cmake_minimum_required(VERSION 2.8)

# Project's name
project(hello)

# Set the output folder where your program will be created
set(CMAKE_BINARY_DIR ${CMAKE_SOURCE_DIR}/bin)
set(EXECUTABLE_OUTPUT_PATH ${CMAKE_BINARY_DIR})

# The following folder will be included
include_directories("${PROJECT_SOURCE_DIR}/include")

# add the files which are needed to generate the binary and also the name of the binary
add_executable(library ${PROJECT_SOURCE_DIR}/main.c ${PROJECT_SOURCE_DIR}/string_add.c ${PROJECT_SOURCE_DIR}/library.c)

To compile, follow these steps:

Using Make :

mkdir build
cd build
cmake ..
make

Using Ninja :

mkdir build
cd build
cmake -G Ninja .. 
ninja

The project root now contains a bin folder with the generated binary. Run it and see it work.

Memory allocation in C

Static

Static allocation is what happens when you declare a static or global variable. Each static or global variable defines one block of space, of a fixed size. The space is allocated once, when your program is started (part of the exec operation), and is never freed.

#include<stdio.h> 
int fun() 
{ 
  static int count = 0; 
  count++; 
  return count; 
} 
   
int main() 
{ 
  printf("%d ", fun()); 
  printf("%d ", fun()); 
  return 0; 
}

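If you run this program, it prints 1 2. An ordinary local variable would be reset to 0 on every call and the program would print 1 1, but a static variable is allocated and initialized only once, so count keeps its value between calls.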

Automatic

Automatic allocation happens when you declare an automatic variable, such as a function argument or a local variable. The space for an automatic variable is allocated when the compound statement containing the declaration is entered, and is freed when that compound statement is exited.

One example of automatic allocation is an array whose size is chosen at run time (a variable-length array, introduced in C99). In short, the compiler doesn’t know the actual size of the array at compile time.

Note that the size of the array is read (and therefore known) before the declaration of the array:

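#include <stdio.h>

int main(void)
{
    int number_of_elems = 0;
    printf("enter number of elems: ");
    /* Reject bad input: a variable-length array must have a positive size. */
    if (scanf("%d", &number_of_elems) != 1 || number_of_elems <= 0)
        return 1;

    char arr[number_of_elems];

    /* sizeof yields a size_t, which is printed with %zu. */
    printf("size: %zu\n", sizeof(arr));
    return 0;
}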

The compiler inserts code that reserves space on the stack for the array at run time. On a machine whose stack grows downward, this is something like (pseudo assembler):

sub sp, number_of_elems*sizeof_char

Dynamic

Dynamic memory allocation is a technique in which programs determine as they are running where to store some information. You need dynamic allocation when the amount of memory you need, or how long you continue to need it, depends on factors that are not known before the program runs.

For example, you may need a block to store a line read from an input file; since there is no limit to how long a line can be, you must allocate the memory dynamically and make it dynamically larger as you read more of the line.

Or, you may need a block for each record or each definition in the input data; since you can’t know in advance how many there will be, you must allocate a new block for each record or definition as you read it.

When you use dynamic allocation, the allocation of a block of memory is an action that the program requests explicitly. You call a function or macro when you want to allocate space, and specify the size with an argument. If you want to free the space, you do so by calling another function or macro. You can do these things whenever you want, as often as you want.

Dynamic allocation is not supported by C variables; there is no storage class “dynamic”, and there can never be a C variable whose value is stored in dynamically allocated space. The only way to get dynamically allocated memory is via a system call (which is generally via a GNU C Library function call), and the only way to refer to dynamically allocated space is through a pointer. Because it is less convenient, and because the actual process of dynamic allocation requires more computation time, programmers generally use dynamic allocation only when neither static nor automatic allocation will serve.

For example, if you want to allocate dynamically some space to hold a struct foobar, you cannot declare a variable of type struct foobar whose contents are the dynamically allocated space. But you can declare a variable of pointer type struct foobar * and assign it the address of the space. Then you can use the operators ‘*’ and ‘->’ on this pointer variable to refer to the contents of the space:

{
  struct foobar *ptr
     = (struct foobar *) malloc (sizeof (struct foobar));
  ptr->name = x;
  ptr->next = current_foobar;
  current_foobar = ptr;
}

Assignment