When you type ilma build hello.ilma, a chain of transformations runs in under a second, taking your friendly English-like source code and producing a native binary ready to run on your machine. This post walks through every stage of that pipeline in detail.
The pipeline at a glance
source.ilma
      │
      ▼
[ Lexer ] ────── tokens
      │
      ▼
[ Parser ] ───── AST (Abstract Syntax Tree)
      │
      ▼
[ Resolver ] ─── scope / name resolution
      │
      ▼
[ Transpiler ] ─ C source code
      │
      ▼
[ GCC / Clang ]  native binary
Stage 1: Lexing
The lexer reads your ILMA source file character by character and groups them into tokens — the atomic units of meaning in the language. A token has a type (keyword, identifier, number, string, operator, newline) and a value.
For example, the line remember age = 12 is tokenised as:
KEYWORD("remember")
IDENTIFIER("age")
OP("=")
NUMBER(12)
NEWLINE
The lexer handles ILMA's multi-word keywords — give back, for each, keep going while, when wrong, comes from — by looking ahead when it sees the first word. This is a deliberate design choice: it means the grammar stays clean and unambiguous even though keywords look like phrases.
Indentation is significant in ILMA, so the lexer emits INDENT and DEDENT tokens (similar to Python) whenever the indentation level changes. This avoids needing curly braces to denote blocks.
Stage 2: Parsing
The parser takes the token stream and builds an Abstract Syntax Tree — a tree structure that represents the logical structure of your program, independent of the exact syntax used.
ILMA's parser is a hand-written recursive descent parser. Each grammar rule has a corresponding function. Here's what the AST looks like for a simple if statement:
if score >= 90:
    say "Excellent!"
otherwise:
    say "Try harder."
{
  "type": "IfStatement",
  "condition": {
    "type": "BinaryExpr",
    "op": ">=",
    "left":  { "type": "Identifier", "name": "score" },
    "right": { "type": "NumberLiteral", "value": 90 }
  },
  "body": [
    { "type": "SayStatement", "expr": { "type": "StringLiteral", "value": "Excellent!" } }
  ],
  "otherwise": [
    { "type": "SayStatement", "expr": { "type": "StringLiteral", "value": "Try harder." } }
  ]
}
Errors caught at the parse stage produce friendly messages that explain the problem in plain language — designed to be understandable by a child learning to code, not just an experienced programmer.
Stage 3: Name Resolution
Before generating code, the compiler walks the AST to resolve all names. It verifies that every variable referenced has been declared, every recipe called exists, and every give back occurs inside a recipe body. This stage catches a large class of common mistakes early with helpful error messages.
ILMA uses lexical scoping: each block (recipe body, loop body, if body) introduces a new scope. Inner scopes can read variables from outer scopes, but inner assignments create local variables by default — consistent with how most modern languages work.
Stage 4: C Transpilation
This is the most distinctive part of ILMA's architecture. Rather than generating machine code directly, the compiler transpiles the AST to C source code. This was a deliberate choice that provides several advantages:
- Portability: Any platform with a C compiler (GCC, Clang, MSVC) can compile ILMA programs.
- Performance: Modern C compilers apply years of optimisation work that would take a new language decades to replicate.
- Correctness: We can rely on the C standard library for memory management, I/O and maths — battle-tested code we don't need to rewrite.
- Transparency: Advanced learners can inspect the generated C code and see the direct translation of their ILMA constructs.
Here's what a simple ILMA recipe transpiles to:
recipe square(n):
    give back n * n

say square(7)
#include <stdio.h>
#include <stdlib.h>
#include "ilma_runtime.h"

IlmaValue ilma_square(IlmaValue n) {
    return ilma_mul(n, n);
}

int main(void) {
    IlmaValue _v0 = ilma_square(ilma_number(7));
    ilma_say(_v0);
    return 0;
}
ILMA values are boxed in a tagged union type IlmaValue that can hold numbers, strings, booleans, lists and dictionaries. The runtime library (ilma_runtime.h and ilma_runtime.c) provides the implementation of all built-in operations.
Stage 5: Compilation with GCC
Once the C file is written to a temporary directory, the compiler shells out to gcc (or clang if GCC is not found) to produce the final binary:
gcc -O2 -o hello /tmp/ilma_build_abc123/hello.c \
    /usr/local/lib/ilma/ilma_runtime.c \
    -lm -I/usr/local/include/ilma
The -O2 flag enables standard optimisations — loop unrolling, constant propagation, inlining. For production builds, ilma build --release uses -O3 -flto for maximum performance.
The REPL
The ILMA REPL (ilma repl) uses a slightly different path: instead of producing a binary, it compiles each entered statement to a shared object (.so) and dynamically loads it. This gives true interactive compilation with sub-millisecond latency for simple expressions, which is essential for a language aimed at learners who expect immediate feedback.
What's next in the compiler
The current compiler is functional but has room to grow. Planned improvements include:
- Type inference — automatically inferring types where possible to produce tighter C code
- Incremental compilation — only recompiling changed modules
- WebAssembly output — so ILMA programs can run in the browser natively (not just in the JS interpreter)
- Debug info — DWARF symbols so gdb and VS Code debuggers show ILMA source lines, not C
If you are interested in contributing to the compiler, the source is on GitHub. The codebase is written in C with no dependencies outside the standard library.
Written by Raihan • March 2026