First the specifics, then the curiosity

This text was first written in Polish, then machine-translated to English.

Bottom-up thinking - Moving from the visible effect to the mechanisms underneath. People call this bottom-up thinking or mechanical curiosity. For example, I see C++ turning into binary, and then I wonder what's really going on under the hood. There's another way to understand it: bottom-up thinking = proof, top-down thinking = hypothesis.

I was fascinated by what computers could do, and when I watched "Hackers" as a kid, I simply wanted to understand how these machines worked.

I put together a few simple Delphi programs. They worked just fine; some of them are still on my blog. Back then, I was incredibly excited to understand physics, math, and chemistry. I dreamed of a great career, and honestly, I still have some of those dreams.

The first program that actually did anything was Ryzyk Fizyk. It was used to spout out formulas for simple physics problems. I showed it to the school, and the principal immediately sensed some potential. And that's how I ended up with her invitation to the Junior High School Olympiad in Informatics.

I solved a few problems in an ancient language called Object Pascal. The fact that the language was object-oriented didn't hinder my problem-solving. What did was the syntax and lack of wider acceptance in the computer science community.

In 2008, during the summer, I went on vacation with some computer scientists. I was then introduced to C++. I started wondering how code is translated from a high-level language like C++ into binary code, meaning it's understood by a computer.

Then the idea of creating my own compiler was born. There were quite a few people there who wanted to create something like that, or were already involved in it. One idea I really liked was GLI - Goovno Language Interpreter.

Compiler Operation

A compiler, like many other programs, has input and output. A program in a high-level programming language is fed to the program's standard input, and the output is usually machine code. There are also transpilers, programs that translate code from one programming language to another.

Today, we'll delve into how the compilation process works. We'll take a simple program and write a simple compiler that will hopefully spit out a finished executable file. We'll learn how a text file is decomposed into a tree, and then we'll transform that tree into an executable file.

Decomposing a File

The first step in building a compiler is tokenization. Normally, we'd do this using a program like flex, but here we'll simplify things and move straight to building a syntax tree.

AST

The basis of a compiler's operation is the AST (Abstract Syntax Tree). It describes how a source file is constructed. By traversing this tree, we can perform various tasks using algorithms (DFS, checking for well-declared variables, generating assembly code, checking data consistency, etc.).

We'll build a simple syntax tree using the Recursive Descent algorithm. There are more advanced tree building methods (LALR) that allow for the creation of syntax trees in more advanced programming languages, but we'll focus on the simplest variant and implement the simplest non-trivial subset of the C language.

Our compiler will scan the data stream from the beginning of the file to the end. (In the meantime, it will likely use a stack and a few other data structures.)

To be continued...