Compiler Pipeline
Accepted
Catalyst uses a staged compiler pipeline. The main path is linear, but the schema names both the compiler phase and the artifact handed across the phase boundary:
flowchart LR
%% Node categories (artifact / verified / backend / deferred) are colored
%% by the site stylesheet so they stay readable in light and dark mode.
classDef artifact stroke-width:1.5px
classDef verified stroke-width:1.5px
classDef backend stroke-width:1.5px
classDef deferred stroke-width:1.5px
subgraph Frontend["front end"]
direction LR
Source["source files<br/>.ct text"]:::artifact -->|lexer| Tokens["token stream<br/>tokens + trivia"]:::artifact
Tokens -->|parser| AST["AST<br/>source syntax + spans"]:::artifact
end
subgraph SemanticCore["semantic core"]
direction LR
SIR["SIR<br/>resolved names + typed expressions"]:::artifact -->|lowering| IR["IR<br/>backend-facing representation"]:::artifact
IR -->|verifier| VerifiedIR["verified IR"]:::verified
end
subgraph Consumers["IR consumers"]
direction TB
Interpreter["IR interpreter<br/>runtime tests + comptime mode"]:::backend
CBackend["C backend"]:::backend
LLVMBackend["LLVM backend later"]:::deferred
end
AST -->|sema| SIR
VerifiedIR -->|execute| Interpreter
VerifiedIR -->|emit| CBackend
VerifiedIR -.->|emit later| LLVMBackend
The pipeline is intentionally explicit so each phase can be tested and changed locally. Lexer snapshots (.tokens.snap), parser snapshots (.ast.snap), sema diagnostics (.stderr.snap), and IR snapshots (.ir.snap) are separate contracts. "Phase Boundaries" owns the AST/SIR/IR seam contract, while "Comptime Bootstrapping and Compiler Objects" owns the compiler-facing seam between demanded comptime evaluation, compiler-owned semantic handles, public compiler intrinsics, and prelude bootstrap.
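As a rough illustration of why naming the artifact at each boundary helps local testing, the sketch below models each phase as a function from one artifact type to the next. Every name here (Tokens, Ast, Sir, Ir, lex, parse, sema, lower, snapshot) and the toy typing rule are assumptions made for the example; none of this is Catalyst's actual compiler API or snapshot format.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical artifact types, one per phase boundary.
@dataclass(frozen=True)
class Tokens:
    items: Tuple[str, ...]

@dataclass(frozen=True)
class Ast:
    nodes: Tuple[str, ...]

@dataclass(frozen=True)
class Sir:
    typed: Tuple[Tuple[str, str], ...]  # (text, type name)

@dataclass(frozen=True)
class Ir:
    ops: Tuple[str, ...]

def lex(source: str) -> Tokens:
    # Toy lexer: whitespace-separated tokens, no trivia.
    return Tokens(tuple(source.split()))

def parse(tokens: Tokens) -> Ast:
    # Toy parser: syntax only, no names resolved and no types assigned.
    return Ast(tokens.items)

def sema(ast: Ast) -> Sir:
    # Toy rule: digit strings are "int", everything else is a "name".
    return Sir(tuple((n, "int" if n.isdigit() else "name") for n in ast.nodes))

def lower(sir: Sir) -> Ir:
    return Ir(tuple(f"push.{ty} {text}" for text, ty in sir.typed))

def snapshot(artifact) -> str:
    # Each artifact can be snapshotted on its own, so a change in one
    # phase only invalidates the snapshots downstream of it.
    return repr(artifact)

tokens = lex("x 1")
ir = lower(sema(parse(tokens)))
```

Because each function accepts only the previous phase's artifact, a test can construct an Ast or Sir value by hand and exercise one phase without running the phases before it.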
Phase Terms
- source loading
- Finds root .ct files and later declaration-scope module(...) or include(...) dependencies. Loaded files enter the same lexer and parser path; source loading does not define a separate evaluator.
- lexer
- Converts source text into tokens and preserved trivia for parser, formatter, and token snapshots. Lexing does not resolve names or assign types.
- parser
- Converts tokens into AST. Parsing owns source syntax, grouping, spans, and syntactic diagnostics, but not name resolution, type checking, or desugaring.
- sema
- Performs semantic analysis over AST and produces SIR. Sema owns name resolution, type checking, source-level capability checks, demanded comptime evaluation requests, resulting semantic facts, and semantic diagnostics.
- lowering
- Converts SIR into backend-facing IR. Lowering makes backend-visible control flow, values, ABI/export facts, ownership lowering, dynamic dispatch facts, and required safety checks explicit without encoding a specific backend.
- IR verifier
- Checks lowered IR before any backend consumes it. The C backend, runtime interpreter, and comptime interpreter consume verified IR only.
- backends
- Consume verified IR. V1 has the C backend and the shared IR interpreter; LLVM is a later backend that must consume the same verified IR contract.
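One way to picture the "verified IR only" rule from the terms above is a wrapper type that only the verifier can sensibly produce, with every consumer refusing anything else. The sketch below uses an invented three-opcode stack IR; the opcode set, the VerifiedIr wrapper, and the functions verify, interpret, and emit_c are all hypothetical and stand in for whatever the real IR and verifier look like.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Ir:
    ops: Tuple[tuple, ...]  # e.g. ("push", 2), ("add",), ("ret",)

@dataclass(frozen=True)
class VerifiedIr:
    ir: Ir  # by convention, constructed only by verify()

class VerifyError(Exception):
    pass

def verify(ir: Ir) -> VerifiedIr:
    depth = 0
    last = len(ir.ops) - 1
    for index, op in enumerate(ir.ops):
        kind = op[0]
        if kind == "push":
            depth += 1
        elif kind == "add":
            if depth < 2:
                raise VerifyError(f"op {index}: add needs two operands")
            depth -= 1
        elif kind == "ret":
            if index != last or depth < 1:
                raise VerifyError(f"op {index}: misplaced ret")
        else:
            raise VerifyError(f"op {index}: unknown opcode {kind!r}")
    if last < 0 or ir.ops[last][0] != "ret":
        raise VerifyError("unit must end in ret")
    return VerifiedIr(ir)

def _require_verified(unit) -> None:
    if not isinstance(unit, VerifiedIr):
        raise TypeError("consumers accept verified IR only")

def interpret(unit: VerifiedIr) -> int:
    _require_verified(unit)
    stack = []
    for op in unit.ir.ops:
        if op[0] == "push":
            stack.append(op[1])
        elif op[0] == "add":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
    return stack.pop()  # verify() guarantees the final op is ret

def emit_c(unit: VerifiedIr) -> str:
    _require_verified(unit)
    stack = []
    for op in unit.ir.ops:
        if op[0] == "push":
            stack.append(str(op[1]))
        elif op[0] == "add":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(f"({lhs} + {rhs})")
    return f"int main(void) {{ return {stack.pop()}; }}"

unit = verify(Ir((("push", 2), ("push", 3), ("add",), ("ret",))))
```

Both consumers rely on invariants the verifier established (operand counts, a single trailing ret), so neither needs its own defensive checks. A later backend would take the same VerifiedIr value.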
Comptime Demand Loop
Comptime evaluation is demand-driven during compilation unless source explicitly forces it with a comptime expression, block, or comptime fn call. The demand loop is still routed through checked SIR, lowered IR, the IR verifier, and the shared interpreter:
flowchart TB
%% The loop starts at a semantic demand, builds a verified IR unit, executes it, then resumes the demanding phase.
%% Node categories are colored by the site stylesheet for light/dark mode.
classDef start stroke-width:1.5px
classDef demand stroke-width:1.5px
classDef artifact stroke-width:1.5px
classDef execution stroke-width:1.5px
classDef result stroke-width:1.5px
Start(("start")):::start --> Need["sema or lowering<br/>needs comptime value"]:::demand
Need --> Demand["create comptime demand<br/>semantic instance + context"]:::demand
subgraph BuildUnit["build demanded unit"]
direction TB
Check["check demanded unit<br/>AST to SIR as needed"]:::artifact --> Lower["lower demanded unit<br/>SIR to IR"]:::artifact
Lower --> Verify["verify demanded IR"]:::artifact
end
Demand --> Check
Verify --> Execute["IR interpreter<br/>comptime mode"]:::execution
Execute --> Result["comptime result<br/>value, handle, diagnostics, facts"]:::result
Result -.->|resume| Continue["sema or lowering<br/>with produced facts"]:::demand
When sema or lowering needs a comptime value, the compiler lowers the required already-checkable unit to IR, verifies that IR, and executes it with the shared IR interpreter in comptime mode. The interpreter must not become a separate AST/SIR evaluator or a second source-level semantics implementation. If a comptime operation needs compiler tables, visibility, semantic identity, target facts, diagnostics, source loading, or phase-owned mutation, that access must be represented by an explicit compiler intrinsic handler rather than by letting the interpreter inspect earlier phases directly.
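The following minimal sketch shows the shape of that rule: a demand is checked, lowered, verified, and then executed, and the interpreter's only route to compiler-owned facts is a table of intrinsic handlers. The toy term syntax ("@pointer_bits"), the intrinsic name "target.pointer_bits", and the Compiler and Interpreter classes are assumptions for illustration, not Catalyst's real intrinsics or interfaces.

```python
from typing import Callable, Dict, List

class Interpreter:
    """Executes toy IR; sees compiler state only through intrinsic handlers."""

    def __init__(self, intrinsics: Dict[str, Callable[[], int]]) -> None:
        self._intrinsics = intrinsics

    def run(self, ops: List[tuple]) -> int:
        stack: List[int] = []
        for op in ops:
            if op[0] == "const":
                stack.append(op[1])
            elif op[0] == "intrinsic":
                # Explicit handler call; no direct access to AST or SIR.
                stack.append(self._intrinsics[op[1]]())
            elif op[0] == "add":
                rhs, lhs = stack.pop(), stack.pop()
                stack.append(lhs + rhs)
            elif op[0] == "ret":
                return stack.pop()
        raise RuntimeError("unit did not return")

class Compiler:
    def __init__(self, pointer_bits: int) -> None:
        self.pointer_bits = pointer_bits  # stands in for a target fact
        self.interpreter = Interpreter(
            {"target.pointer_bits": lambda: self.pointer_bits}
        )

    def check(self, terms: List[str]) -> List[tuple]:
        # Stand-in for "AST to SIR as needed": classify each term.
        if not terms:
            raise ValueError("empty comptime expression")
        checked = []
        for term in terms:
            if term.isdigit():
                checked.append(("int", int(term)))
            elif term == "@pointer_bits":
                checked.append(("intrinsic", "target.pointer_bits"))
            else:
                raise ValueError(f"unknown term {term!r}")
        return checked

    def lower(self, sir: List[tuple]) -> List[tuple]:
        # Stand-in for "SIR to IR": sum all terms, then return.
        ops = []
        for index, (kind, payload) in enumerate(sir):
            ops.append(("const" if kind == "int" else "intrinsic", payload))
            if index > 0:
                ops.append(("add",))
        ops.append(("ret",))
        return ops

    def verify(self, ops: List[tuple]) -> List[tuple]:
        if not ops or ops[-1] != ("ret",):
            raise ValueError("demanded unit must end in ret")
        return ops

    def demand(self, terms: List[str]) -> int:
        # check -> lower -> verify -> execute, then hand the value back.
        return self.interpreter.run(self.verify(self.lower(self.check(terms))))

result = Compiler(pointer_bits=64).demand(["8", "@pointer_bits"])
```

Here result is 72. Swapping the target fact changes the answer without the interpreter ever touching the compiler object directly, which is the property the intrinsic-handler boundary is meant to preserve.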
Within one compilation, a comptime result may be cached only for the same evaluated semantic instance, explicit comptime arguments, target, safety mode, captured scope context, and active semantic environment. V1 does not define cross-run incremental comptime-cache validity; incremental cache keys and invalidation are deferred to CEP-0061: Incremental Comptime Cache Invalidation.
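A within-compilation cache that respects those six components could be keyed as sketched below. The field names, their types, and the ComptimeCache class are hypothetical; the sketch only illustrates that a difference in any one component must produce a cache miss.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Tuple

@dataclass(frozen=True)
class ComptimeCacheKey:
    # One field per component listed above; the types are placeholders.
    semantic_instance: int
    comptime_args: Tuple[object, ...]
    target: str
    safety_mode: str
    scope_context: int
    semantic_env: int

class ComptimeCache:
    """Lives for one compilation only; nothing here survives across runs."""

    def __init__(self) -> None:
        self._results: Dict[ComptimeCacheKey, object] = {}
        self.evaluations = 0

    def get_or_evaluate(self, key: ComptimeCacheKey,
                        evaluate: Callable[[], object]) -> object:
        if key not in self._results:
            self.evaluations += 1
            self._results[key] = evaluate()
        return self._results[key]

cache = ComptimeCache()
key = ComptimeCacheKey(
    semantic_instance=7,
    comptime_args=(4,),
    target="x86_64-linux",
    safety_mode="checked",
    scope_context=1,
    semantic_env=1,
)
first = cache.get_or_evaluate(key, lambda: 4 * 4)
second = cache.get_or_evaluate(key, lambda: 4 * 4)  # hit: same key
other = cache.get_or_evaluate(
    replace(key, safety_mode="unchecked"), lambda: 4 * 4
)  # miss: safety mode differs
```

After these three calls, cache.evaluations is 2: the second call reuses the first result, and the third re-evaluates because one key component changed.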
Cycles in comptime dependencies are compile errors. The compiler must report them deterministically instead of relying on evaluation order or repeated global passes.
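Deterministic reporting can be sketched with a depth-first search that visits demands in sorted order and normalizes the reported cycle so it always starts at its smallest member. The string demand names and the find_cycle function are assumptions for the example; the real compiler would presumably key on semantic instances rather than strings.

```python
from typing import Dict, List, Optional

def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle, or None if the graph is acyclic.

    Roots and edges are visited in sorted order, and the cycle is rotated
    to begin at its smallest name, so the result does not depend on dict
    insertion order or on which demand happened to be evaluated first.
    """
    state: Dict[str, str] = {}  # name -> "active" or "done"
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = "active"
        stack.append(node)
        for dep in sorted(deps.get(node, ())):
            if state.get(dep) == "active":
                cycle = stack[stack.index(dep):]
                pivot = cycle.index(min(cycle))
                return cycle[pivot:] + cycle[:pivot]
            if dep not in state:
                found = visit(dep)
                if found is not None:
                    return found
        stack.pop()
        state[node] = "done"
        return None

    for root in sorted(deps):
        if root not in state:
            found = visit(root)
            if found is not None:
                return found
    return None

cycle = find_cycle({"c": ["a"], "a": ["b"], "b": ["c"], "d": []})
```

The same cycle is reported as ["a", "b", "c"] regardless of the order in which the entries were inserted, which is what lets a diagnostic snapshot stay stable.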
Current Backend Strategy
The first backend emits C. This keeps generated output inspectable, easy to snapshot, and independent of LLVM setup.
The interpreter consumes verified IR directly. It is both an IR backend for deterministic execution tests and the execution engine used by demand-driven and forced comptime evaluation. It must not become a separate AST/SIR evaluator or a second source-level semantics implementation.
LLVM is a later native backend and must consume the same backend-facing IR rather than reaching into AST or SIR.
Target and safety-mode vocabulary is defined in Target and Safety Modes.
Implementation Direction
The implementation should use small phase-local data structures and typed IDs rather than raw pointer graphs.
The implementation should not introduce global mutable state.
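A minimal sketch of this direction, assuming index-based IDs into phase-local tables: each ID type wraps an integer, tables own their storage, and code receives the tables it needs as explicit arguments instead of reaching for module-level state. TypeId, ExprId, TypeTable, and ExprTable are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class TypeId:
    index: int

@dataclass(frozen=True)
class ExprId:
    index: int

class TypeTable:
    """Owns type storage; other code refers to types by TypeId only."""

    def __init__(self) -> None:
        self._names: List[str] = []

    def add(self, name: str) -> TypeId:
        self._names.append(name)
        return TypeId(len(self._names) - 1)

    def name(self, type_id: TypeId) -> str:
        if not isinstance(type_id, TypeId):
            raise TypeError("expected a TypeId")
        return self._names[type_id.index]

class ExprTable:
    """Owns expression storage; each entry records its text and TypeId."""

    def __init__(self) -> None:
        self._exprs: List[Tuple[str, TypeId]] = []

    def add(self, text: str, type_id: TypeId) -> ExprId:
        self._exprs.append((text, type_id))
        return ExprId(len(self._exprs) - 1)

    def type_of(self, expr_id: ExprId) -> TypeId:
        if not isinstance(expr_id, ExprId):
            raise TypeError("expected an ExprId")
        return self._exprs[expr_id.index][1]

def describe(types: TypeTable, exprs: ExprTable, expr_id: ExprId) -> str:
    # Tables are passed in explicitly; there is no global registry.
    return types.name(exprs.type_of(expr_id))

types = TypeTable()
exprs = ExprTable()
int_id = types.add("int")
sum_id = exprs.add("1 + 2", int_id)
```

Distinct ID types keep an ExprId from being used where a TypeId is expected even though both wrap the same integer, and plain indices make tables easy to snapshot or serialize compared with a graph of object references.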