
Compiler Pipeline

Accepted


Catalyst uses a staged compiler pipeline. The main path is linear. In the diagram below, edge labels name the compiler phase and nodes name the artifact handed across each phase boundary, ending at the consumers of verified IR:

```mermaid
flowchart LR
  %% Node categories (artifact / verified / backend / deferred) are colored
  %% by the site stylesheet so they stay readable in light and dark mode.
  classDef artifact stroke-width:1.5px
  classDef verified stroke-width:1.5px
  classDef backend stroke-width:1.5px
  classDef deferred stroke-width:1.5px

  subgraph Frontend["front end"]
    direction LR
    Source["source files<br/>.ct text"]:::artifact -->|lexer| Tokens["token stream<br/>tokens + trivia"]:::artifact
    Tokens -->|parser| AST["AST<br/>source syntax + spans"]:::artifact
  end

  subgraph SemanticCore["semantic core"]
    direction LR
    SIR["SIR<br/>resolved names + typed expressions"]:::artifact -->|lowering| IR["IR<br/>backend-facing representation"]:::artifact
    IR -->|verifier| VerifiedIR["verified IR"]:::verified
  end

  subgraph Consumers["IR consumers"]
    direction TB
    Interpreter["IR interpreter<br/>runtime tests + comptime mode"]:::backend
    CBackend["C backend"]:::backend
    LLVMBackend["LLVM backend later"]:::deferred
  end

  AST -->|sema| SIR
  VerifiedIR -->|execute| Interpreter
  VerifiedIR -->|emit| CBackend
  VerifiedIR -.->|emit later| LLVMBackend
```

The pipeline is intentionally explicit so each phase can be tested and changed locally. Lexer snapshots (.tokens.snap), parser snapshots (.ast.snap), sema diagnostics (.stderr.snap), and IR snapshots (.ir.snap) are separate contracts. Phase Boundaries owns the AST/SIR/IR seam contract, while Comptime Bootstrapping and Compiler Objects owns the compiler-facing seam between demanded comptime evaluation, compiler-owned semantic handles, public compiler intrinsics, and prelude bootstrap.
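Keeping each phase's output in its own snapshot works best when the artifact renders as stable text. A minimal sketch of the lexer half follows, assuming a toy token set and // line comments (neither is Catalyst's specified lexical grammar) and a made-up line format for .tokens.snap. Trivia is attached to the following token, so the token stream reproduces the source exactly and the formatter loses nothing. The names lex and tokens_snapshot are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical token classes for illustration; Catalyst's real lexical grammar
# (keywords, literals, comment syntax) is not specified on this page.
TOKEN_RE = re.compile(
    r"(?P<trivia>(?:\s+|//[^\n]*)+)"
    r"|(?P<ident>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<int>[0-9]+)"
    r"|(?P<punct>[-+*/=(){};,.])"
)


@dataclass(frozen=True)
class Token:
    kind: str
    text: str
    start: int           # offset of the token text in the source
    leading_trivia: str  # whitespace and comments kept for the formatter


def lex(source: str) -> List[Token]:
    """Split source into tokens; no name resolution or typing happens here."""
    tokens: List[Token] = []
    pos, trivia = 0, ""
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        # Unknown characters become one-character error tokens instead of being dropped.
        kind, text = (m.lastgroup, m.group()) if m else ("error", source[pos])
        if kind == "trivia":
            trivia += text
        else:
            tokens.append(Token(kind, text, pos, trivia))
            trivia = ""
        pos += len(text)
    tokens.append(Token("eof", "", pos, trivia))  # trailing trivia rides on EOF
    return tokens


def tokens_snapshot(tokens: List[Token]) -> str:
    """Render stable, line-oriented text suitable for a .tokens.snap file."""
    return "\n".join(
        f"{t.start:>4} {t.kind:<6} {t.text!r} trivia={t.leading_trivia!r}" for t in tokens
    ) + "\n"


source = "let x = 1 // one\n"
print(tokens_snapshot(lex(source)), end="")
```

Because the snapshot is derived only from the lexer's output, a parser or sema change cannot alter it; that independence is what lets each snapshot kind act as a separate contract.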

Phase Terms

source loading
Finds root .ct files and later declaration-scope module(...) or include(...) dependencies. Loaded files enter the same lexer and parser path; source loading does not define a separate evaluator.
lexer
Converts source text into tokens and preserved trivia for parser, formatter, and token snapshots. Lexing does not resolve names or assign types.
parser
Converts tokens into AST. Parsing owns source syntax, grouping, spans, and syntactic diagnostics, but not name resolution, type checking, or desugaring.
sema
Performs semantic analysis over AST and produces SIR. Sema owns name resolution, type checking, source-level capability checks, demanded comptime evaluation requests, resulting semantic facts, and semantic diagnostics.
lowering
Converts SIR into backend-facing IR. Lowering makes backend-visible control flow, values, ABI/export facts, ownership lowering, dynamic dispatch facts, and required safety checks explicit without encoding a specific backend.
IR verifier
Checks lowered IR before any backend consumes it. The C backend, runtime interpreter, and comptime interpreter consume verified IR only.
backends
Consume verified IR. V1 has the C backend and the shared IR interpreter; LLVM is a later backend that must consume the same verified IR contract.
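The rule that backends consume verified IR only can be made structural rather than a convention. The sketch below assumes a toy three-op IR (const, add, ret) and hypothetical names such as VerifiedIR and emit_c. It wraps the verifier's result in a type that only verify can construct, so a consumer handed raw IR fails immediately instead of re-checking or trusting it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ARITY = {"add": 2, "ret": 1}  # operand counts for ops that take value names


@dataclass(frozen=True)
class Instr:
    op: str                     # toy ops: "const", "add", "ret"
    args: Tuple                 # an int for const, value names for add/ret
    dest: Optional[str] = None  # value defined by this instruction, if any


@dataclass(frozen=True)
class IR:
    instrs: Tuple[Instr, ...]


class VerifyError(Exception):
    pass


_VERIFIER_TOKEN = object()  # private capability that only verify() passes


class VerifiedIR:
    """What backends accept; constructing one requires the verifier's token."""

    def __init__(self, ir: IR, token: object) -> None:
        if token is not _VERIFIER_TOKEN:
            raise TypeError("VerifiedIR must come from verify()")
        self.ir = ir


def verify(ir: IR) -> VerifiedIR:
    if not ir.instrs:
        raise VerifyError("function has no instructions")
    defined = set()
    last = len(ir.instrs) - 1
    for i, ins in enumerate(ir.instrs):
        if ins.op == "const":
            if len(ins.args) != 1 or not isinstance(ins.args[0], int):
                raise VerifyError(f"instr {i}: const takes one integer")
        elif ins.op in ARITY:
            if len(ins.args) != ARITY[ins.op]:
                raise VerifyError(f"instr {i}: {ins.op} takes {ARITY[ins.op]} operands")
            for name in ins.args:
                if name not in defined:
                    raise VerifyError(f"instr {i}: use of undefined value {name!r}")
        else:
            raise VerifyError(f"instr {i}: unknown op {ins.op!r}")
        if (ins.op == "ret") != (i == last):
            raise VerifyError(f"instr {i}: ret must be the last instruction, and only there")
        if (ins.dest is None) != (ins.op == "ret"):
            raise VerifyError(f"instr {i}: const/add define a value, ret does not")
        if ins.dest is not None:
            if ins.dest in defined:
                raise VerifyError(f"instr {i}: value {ins.dest!r} defined twice")
            defined.add(ins.dest)
    return VerifiedIR(ir, _VERIFIER_TOKEN)


def emit_c(unit: VerifiedIR) -> str:
    # Consumers check for the wrapper instead of re-verifying or trusting raw IR.
    if not isinstance(unit, VerifiedIR):
        raise TypeError("backends consume verified IR only")
    return f"/* {len(unit.ir.instrs)} verified instructions */"
```

In a statically typed implementation language, the same idea is usually expressed as a distinct type whose constructor is private to the verifier module, so the mistake is caught at compile time.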

Comptime Demand Loop

Comptime evaluation is demand-driven during compilation unless source explicitly forces it with a comptime expression, block, or comptime fn call. The demand loop is still routed through checked SIR, lowered IR, the IR verifier, and the shared interpreter:

```mermaid
flowchart TB
  %% The loop starts at a semantic demand, builds a verified IR unit, executes it, then resumes the demanding phase.
  %% Node categories are colored by the site stylesheet for light/dark mode.
  classDef start stroke-width:1.5px
  classDef demand stroke-width:1.5px
  classDef artifact stroke-width:1.5px
  classDef execution stroke-width:1.5px
  classDef result stroke-width:1.5px

  Start(("start")):::start --> Need["sema or lowering<br/>needs comptime value"]:::demand
  Need --> Demand["create comptime demand<br/>semantic instance + context"]:::demand

  subgraph BuildUnit["build demanded unit"]
    direction TB
    Check["check demanded unit<br/>AST to SIR as needed"]:::artifact --> Lower["lower demanded unit<br/>SIR to IR"]:::artifact
    Lower --> Verify["verify demanded IR"]:::artifact
  end

  Demand --> Check
  Verify --> Execute["IR interpreter<br/>comptime mode"]:::execution
  Execute --> Result["comptime result<br/>value, handle, diagnostics, facts"]:::result
  Result -.->|resume| Continue["sema or lowering<br/>with produced facts"]:::demand
```

When sema or lowering needs a comptime value, the compiler lowers the required already-checkable unit to IR, verifies that IR, and executes it with the shared IR interpreter in comptime mode. The interpreter must not become a separate AST/SIR evaluator or a second source-level semantics implementation. If a comptime operation needs compiler tables, visibility, semantic identity, target facts, diagnostics, source loading, or phase-owned mutation, that access must be represented by an explicit compiler intrinsic handler rather than by letting the interpreter inspect earlier phases directly.
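The intrinsic-handler rule can be illustrated with an interpreter whose only route to compiler state is an explicit table of handlers. This is a minimal sketch under several assumptions: the IR is a toy list of (dest, op, args) triples that has already been checked, lowered, and verified; types are represented by plain strings; and size_of is a hypothetical intrinsic backed by compiler-owned target facts. None of these names come from the Catalyst specification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# (dest, op, args) triples; the unit is assumed to have passed the IR verifier already.
Instr = Tuple[Optional[str], str, tuple]


@dataclass
class IntrinsicTable:
    """The only channel from comptime execution back into compiler-owned state."""
    handlers: Dict[str, Callable[..., object]] = field(default_factory=dict)

    def register(self, name: str, handler: Callable[..., object]) -> None:
        self.handlers[name] = handler

    def call(self, name: str, *args: object) -> object:
        if name not in self.handlers:
            raise KeyError(f"no compiler intrinsic handler for {name!r}")
        return self.handlers[name](*args)


def interpret(instrs: List[Instr], intrinsics: IntrinsicTable) -> object:
    """Executes verified IR; it never looks at AST, SIR, or symbol tables directly."""
    env: Dict[str, object] = {}
    for dest, op, args in instrs:
        if op == "const":
            env[dest] = args[0]
        elif op == "mul":
            env[dest] = env[args[0]] * env[args[1]]
        elif op == "intrinsic":
            name, *operands = args
            env[dest] = intrinsics.call(name, *(env[a] for a in operands))
        elif op == "ret":
            return env[args[0]]
        else:
            raise ValueError(f"unknown op {op!r}")
    raise ValueError("verified IR should end in ret")


def make_intrinsics(target_sizes: Dict[str, int]) -> IntrinsicTable:
    # Target facts stay owned by the compiler; comptime code reaches them only via this handler.
    table = IntrinsicTable()
    table.register("size_of", lambda type_name: target_sizes[type_name])
    return table


# Demanded unit for something like size_of(i64) * 4, already checked, lowered, and verified.
demanded_unit: List[Instr] = [
    ("t", "const", ("i64",)),
    ("s", "intrinsic", ("size_of", "t")),
    ("k", "const", (4,)),
    ("r", "mul", ("s", "k")),
    (None, "ret", ("r",)),
]
print(interpret(demanded_unit, make_intrinsics({"i32": 4, "i64": 8})))
```

A missing handler is a hard error, which keeps every compiler-facing operation visible in one place rather than scattered across interpreter special cases.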

Within one compilation, a comptime result may be cached only for the same evaluated semantic instance, explicit comptime arguments, target, safety mode, captured scope context, and active semantic environment. V1 does not define cross-run incremental comptime-cache validity; incremental cache keys and invalidation are deferred to CEP-0061: Incremental Comptime Cache Invalidation.
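The caching rule amounts to a key that includes every listed input, plus a cache whose lifetime is a single compilation. In this sketch the ID fields are hypothetical integers standing in for the compiler's typed IDs, and the safety-mode strings are placeholders rather than the names defined in Target and Safety Modes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Tuple


@dataclass(frozen=True)
class ComptimeCacheKey:
    # One field per input named by the caching rule; ID fields are hypothetical typed IDs.
    semantic_instance: int
    comptime_args: Tuple[Hashable, ...]
    target: str
    safety_mode: str
    captured_scope: int
    semantic_env: int


class ComptimeCache:
    """Lives inside one compilation; nothing is written to disk or reused across runs."""

    def __init__(self) -> None:
        self._results: Dict[ComptimeCacheKey, object] = {}
        self.evaluations = 0

    def get_or_evaluate(self, key: ComptimeCacheKey, evaluate: Callable[[], object]) -> object:
        if key not in self._results:
            self.evaluations += 1
            self._results[key] = evaluate()
        return self._results[key]


cache = ComptimeCache()
key = ComptimeCacheKey(7, (3,), "x86_64-linux", "checked", 1, 1)
cache.get_or_evaluate(key, lambda: 3 * 3)  # miss: first evaluation
cache.get_or_evaluate(key, lambda: 3 * 3)  # hit: every key field matches
other_mode = ComptimeCacheKey(7, (3,), "x86_64-linux", "unchecked", 1, 1)
cache.get_or_evaluate(other_mode, lambda: 3 * 3)  # miss: safety mode differs
print(cache.evaluations)
```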

Cycles in comptime dependencies are compile errors. The compiler must report them deterministically instead of relying on evaluation order or repeated global passes.
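Deterministic cycle reporting can be achieved with a depth-first walk that keeps in-progress demands on a stack. Revisiting a demand that is still on the stack identifies the cycle, and visiting roots and dependencies in a fixed order makes the reported path stable across runs. The dependency map, the function name evaluation_order, and the message format below are assumptions for illustration.

```python
from typing import Dict, List, Set


class ComptimeCycleError(Exception):
    pass


def evaluation_order(deps: Dict[str, List[str]]) -> List[str]:
    """Order demands so dependencies come first; raise on the first cycle found."""
    order: List[str] = []
    done: Set[str] = set()
    in_progress: List[str] = []  # demands currently being evaluated, outermost first

    def visit(name: str) -> None:
        if name in done:
            return
        if name in in_progress:
            cycle = in_progress[in_progress.index(name):] + [name]
            raise ComptimeCycleError("comptime dependency cycle: " + " -> ".join(cycle))
        in_progress.append(name)
        for dep in deps.get(name, []):  # list order (source order), never set/hash order
            visit(dep)
        in_progress.pop()
        done.add(name)
        order.append(name)

    for root in sorted(deps):  # fixed root order makes the reported cycle stable
        visit(root)
    return order


print(evaluation_order({"buffer_len": ["word_size"], "word_size": []}))
```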

Current Backend Strategy

The first backend emits C. This keeps generated output inspectable, easy to snapshot, and independent of LLVM setup.
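To show why emitted C is easy to inspect and snapshot, here is a sketch that translates a hypothetical toy IR (const, add, ret over 64-bit integers) into a single C function. The IR shape and emit_c_function are illustrative assumptions, not the Catalyst backend's actual interface.

```python
from typing import List, Optional, Tuple

Instr = Tuple[Optional[str], str, tuple]  # (dest, op, args) in a hypothetical toy IR


def emit_c_function(name: str, instrs: List[Instr]) -> str:
    """Emit one C function from verified IR as plain, diff-friendly text."""
    lines = ["#include <stdint.h>", "", f"int64_t {name}(void) {{"]
    for dest, op, args in instrs:
        if op == "const":
            lines.append(f"  int64_t {dest} = {args[0]};")
        elif op == "add":
            # Overflow checks are assumed to arrive as explicit IR ops from lowering,
            # so the emitter translates instructions one-to-one without adding its own.
            lines.append(f"  int64_t {dest} = {args[0]} + {args[1]};")
        elif op == "ret":
            lines.append(f"  return {args[0]};")
        else:
            raise ValueError(f"C backend: unsupported op {op!r}")
    lines.append("}")
    return "\n".join(lines) + "\n"


ir = [("a", "const", (40,)), ("b", "const", (2,)), ("c", "add", ("a", "b")), (None, "ret", ("c",))]
print(emit_c_function("answer", ir), end="")
```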

The interpreter consumes verified IR directly. It is both an IR backend for deterministic execution tests and the execution engine used by demand-driven and forced comptime evaluation. It must not become a separate AST/SIR evaluator or a second source-level semantics implementation.
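Here is a sketch of one interpreter serving both roles, assuming a toy IR with an explicit check_nonzero safety op and a hypothetical diagnostic format. Execution tests call it directly, and comptime evaluation calls the same function and turns a trap into a compile-time diagnostic. Nothing in the comptime path re-implements the operations.

```python
from typing import List, Optional, Tuple

Instr = Tuple[Optional[str], str, tuple]  # (dest, op, args) in a hypothetical toy IR


class Trap(Exception):
    """Raised when an explicit IR safety check fails."""


def trunc_div(a: int, b: int) -> int:
    # C-style division that truncates toward zero (Python's // floors instead).
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def interpret(instrs: List[Instr]) -> int:
    """One interpreter over verified IR, shared by execution tests and comptime."""
    env = {}
    for dest, op, args in instrs:
        if op == "const":
            env[dest] = args[0]
        elif op == "check_nonzero":
            # Lowering made this safety check explicit; the interpreter only executes it.
            if env[args[0]] == 0:
                raise Trap(args[1])
        elif op == "div":
            env[dest] = trunc_div(env[args[0]], env[args[1]])
        elif op == "ret":
            return env[args[0]]
        else:
            raise ValueError(f"unknown op {op!r}")
    raise ValueError("verified IR should end in ret")


def run_execution_test(instrs: List[Instr], expected: int) -> bool:
    # Backend role: deterministic execution tests compare results directly.
    return interpret(instrs) == expected


def evaluate_comptime(instrs: List[Instr]) -> Tuple[Optional[int], List[str]]:
    # Comptime role: same interpreter; the diagnostic wording here is hypothetical.
    try:
        return interpret(instrs), []
    except Trap as trap:
        return None, [f"error: comptime evaluation trapped: {trap}"]


def divide(a: int, b: int) -> List[Instr]:
    # IR as lowering might produce it for a / b, with the safety check made explicit.
    return [
        ("a", "const", (a,)),
        ("b", "const", (b,)),
        (None, "check_nonzero", ("b", "division by zero")),
        ("q", "div", ("a", "b")),
        (None, "ret", ("q",)),
    ]
```

The two entry points differ only in how they report the outcome; the semantics of every IR operation live in interpret alone.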

LLVM is a later native backend and must consume the same backend-facing IR rather than reaching into AST or SIR.

Target and safety-mode vocabulary is defined in Target and Safety Modes.

Implementation Direction

The implementation should use small phase-local data structures and typed IDs rather than raw pointer graphs.
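A minimal sketch of phase-local storage with typed IDs follows, written in Python for brevity. Nodes live in an arena list and refer to one another by small ID values, and the arena rejects an ID of the wrong kind at lookup. Arena, ExprId, TypeId, and the node classes are hypothetical names; a statically typed implementation would catch the mismatch at compile time instead.

```python
from dataclasses import dataclass
from typing import Generic, List, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class ExprId:
    index: int


@dataclass(frozen=True)
class TypeId:
    index: int


class Arena(Generic[T]):
    """Phase-local storage: nodes live in one list and refer to each other by typed ID."""

    def __init__(self, id_type: type) -> None:
        self._id_type = id_type
        self._items: List[T] = []

    def alloc(self, item: T):
        self._items.append(item)
        return self._id_type(len(self._items) - 1)

    def __getitem__(self, node_id) -> T:
        # Reject an ID minted for a different arena kind (e.g. a TypeId used as an ExprId).
        if type(node_id) is not self._id_type:
            raise TypeError(f"expected {self._id_type.__name__}, got {type(node_id).__name__}")
        return self._items[node_id.index]


@dataclass(frozen=True)
class IntLit:
    value: int


@dataclass(frozen=True)
class Add:
    lhs: ExprId
    rhs: ExprId


def evaluate(exprs: "Arena[Union[IntLit, Add]]", node: ExprId) -> int:
    expr = exprs[node]
    if isinstance(expr, IntLit):
        return expr.value
    return evaluate(exprs, expr.lhs) + evaluate(exprs, expr.rhs)


exprs: "Arena[Union[IntLit, Add]]" = Arena(ExprId)
total = exprs.alloc(Add(exprs.alloc(IntLit(40)), exprs.alloc(IntLit(2))))
print(evaluate(exprs, total))
```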

Global mutable state should not be introduced.
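One way to follow this rule is to gather per-compilation mutable state into a session value that the driver creates and passes to every phase. The Session type, the target strings, and the diagnostic text below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    """Per-compilation mutable state, created by the driver and passed explicitly."""
    target: str
    diagnostics: List[str] = field(default_factory=list)


def check_constant_divisor(session: Session, divisor: int) -> None:
    # Phases receive the session as a parameter instead of reaching for a global.
    if divisor == 0:
        session.diagnostics.append(f"[{session.target}] error: division by constant zero")


first = Session(target="x86_64-linux")
second = Session(target="wasm32")
check_constant_divisor(first, 0)
check_constant_divisor(second, 1)
print(first.diagnostics, second.diagnostics)
```

Because each compilation owns its state, two compilations can run in the same process, for example in a test harness, without interfering.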