Phase Boundaries
Status: Accepted
Accepted for the V1 compiler phase boundary model. SIR/IR schema obligations and representative lowering recipes are documented in SIR, IR, and Lowering.
Catalyst keeps AST, SIR, and IR separate so each phase can be tested and changed locally.
AST
AST is the parser output. It preserves source syntax and source spans.
AST owns:
- declaration and expression syntax
- original names as written
- optional syntax such as omitted return types
- source-level grouping, calls, literals, and operators

AST must not:
- resolve names
- assign semantic types
- desugar syntax
- encode backend details
Parser snapshots (.ast.snap) are the public contract for AST shape and deterministic printing.
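To make the AST contract concrete, here is a minimal Rust sketch. Every type and function name in it (`Span`, `TypeExpr`, `Expr`, `FnDecl`, `print_expr`, `print_fn`) is hypothetical, not Catalyst's actual API. It keeps names as written, stores an omitted return type as `None` instead of inferring one, keeps explicit `Paren` grouping, and prints with output that depends only on the tree, which is what a deterministic snapshot needs.

```rust
/// Byte offsets into the source file (hypothetical representation).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub start: u32,
    pub end: u32,
}

/// A type exactly as written; the AST never resolves it to a semantic type.
#[derive(Debug, Clone, PartialEq)]
pub struct TypeExpr {
    pub name: String,
    pub span: Span,
}

/// Expressions keep source spellings, spans, and explicit grouping.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Name { text: String, span: Span },
    Int { literal: String, span: Span },
    Paren { inner: Box<Expr>, span: Span },
    Binary { op: String, lhs: Box<Expr>, rhs: Box<Expr>, span: Span },
}

#[derive(Debug, Clone, PartialEq)]
pub struct FnDecl {
    pub name: String,
    pub params: Vec<(String, TypeExpr)>,
    /// `None` records an omitted return type; inferring it is sema's job.
    pub ret: Option<TypeExpr>,
    pub body: Expr,
    pub span: Span,
}

/// Print an expression as an s-expression. The output depends only on the
/// tree (no hash-map iteration order or pointer values), so it is stable.
pub fn print_expr(e: &Expr) -> String {
    match e {
        Expr::Name { text, .. } => format!("(name {text})"),
        Expr::Int { literal, .. } => format!("(int {literal})"),
        Expr::Paren { inner, .. } => format!("(paren {})", print_expr(inner)),
        Expr::Binary { op, lhs, rhs, .. } => {
            format!("(binary {op} {} {})", print_expr(lhs), print_expr(rhs))
        }
    }
}

/// Print a function header (with its span) followed by its body.
pub fn print_fn(f: &FnDecl) -> String {
    let params: Vec<String> = f
        .params
        .iter()
        .map(|(name, ty)| format!("{name}: {}", ty.name))
        .collect();
    let ret = match &f.ret {
        Some(ty) => ty.name.clone(),
        None => "<omitted>".to_string(),
    };
    format!(
        "fn {} ({}) -> {} @{}..{}\n  {}",
        f.name,
        params.join(", "),
        ret,
        f.span.start,
        f.span.end,
        print_expr(&f.body)
    )
}
```

For example, a function `inc(a: i32)` spanning bytes 0..20, with body `(a + 1)` and no return type, prints as `fn inc (a: i32) -> <omitted> @0..20` followed by `(paren (binary + (name a) (int 1)))`.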
SIR
SIR stands for Semantic IR and is the output of semantic analysis. It is still language-level, but source names have been resolved and expressions have types.
SIR owns:
- resolved function and parameter references
- checked basic types
- typed expressions and statements
- resolved declaration attributes such as C ABI calling convention, export, import, and link names
- typed array/slice indexing, slicing, and length/member semantics
- demand-driven comptime evaluation requests and the resulting semantic facts
- semantic diagnostics before lowering
SIR may simplify source syntax when the simplification is semantic, not backend-specific. For example, a resolved binary expression may carry the selected operator and operand types.
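A sketch of that example, assuming a tiny hypothetical type set (`Ty`, `BinOp`, `ParamId`, `SirExpr`, and `check_binary` are illustrative names, not Catalyst's real types): sema checks the operand types, selects a typed operator such as `AddI32`, and returns either a typed node or a semantic diagnostic. Parameters are referenced through a resolved `ParamId`, and this sketch has no parenthesis node because grouping is already encoded in the tree shape.

```rust
/// Checked basic types (hypothetical subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ty {
    I32,
    I64,
    Bool,
}

/// Operator selected by sema for the checked operand types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BinOp {
    AddI32,
    AddI64,
    EqI32,
}

/// Resolved parameter reference: an index, not the source spelling.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ParamId(pub u32);

/// Typed SIR expressions; every node carries its checked type.
#[derive(Debug, Clone, PartialEq)]
pub enum SirExpr {
    Param { id: ParamId, ty: Ty },
    IntConst { value: i64, ty: Ty },
    Binary { op: BinOp, lhs: Box<SirExpr>, rhs: Box<SirExpr>, ty: Ty },
}

impl SirExpr {
    pub fn ty(&self) -> Ty {
        match self {
            SirExpr::Param { ty, .. }
            | SirExpr::IntConst { ty, .. }
            | SirExpr::Binary { ty, .. } => *ty,
        }
    }
}

/// Check a source binary operator and select a typed operator, or return a
/// semantic diagnostic. Nothing here models C or LLVM instructions.
pub fn check_binary(op: &str, lhs: SirExpr, rhs: SirExpr) -> Result<SirExpr, String> {
    let (lt, rt) = (lhs.ty(), rhs.ty());
    if lt != rt {
        return Err(format!("mismatched operand types {lt:?} and {rt:?}"));
    }
    let (selected, ty) = match (op, lt) {
        ("+", Ty::I32) => (BinOp::AddI32, Ty::I32),
        ("+", Ty::I64) => (BinOp::AddI64, Ty::I64),
        ("==", Ty::I32) => (BinOp::EqI32, Ty::Bool),
        _ => return Err(format!("operator `{op}` is not defined for {lt:?}")),
    };
    Ok(SirExpr::Binary { op: selected, lhs: Box::new(lhs), rhs: Box::new(rhs), ty })
}
```

Checking `+` on two `I32` operands yields a `Binary` node with `op: AddI32` and type `I32`, `==` on the same operands yields a `Bool`-typed node, and mixing `I32` with `I64` returns an error instead of an untyped expression.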
SIR must not:
- preserve purely syntactic ambiguity
- emit or model C/LLVM concepts
- depend on backend layout decisions
- contain unresolved names
SIR dumps are useful for sema debugging, while sema error snapshots (.stderr.snap) define diagnostic behavior.
IR
IR is the backend-facing compiler representation. Backends consume IR only.
IR owns:
- backend-consumable functions, blocks, values, calls, and returns
- explicit types needed for code generation
- backend-facing ABI/export facts for declarations
- explicit slice descriptors, element access, and bounds-check operations
- verifier-enforced invariants
- target and safety-mode facts needed by lowering, interpretation, and backend emission
- traps with stable trap kinds for checked safety failures
- accepted V1 backend-neutral metadata needed for layout, ABI, dynamic dispatch, and ownership lowering
- optional source provenance, such as spans and debug names, for diagnostics, deterministic dumps, and generated debug-friendly internal symbols

IR must not:
- inspect AST or SIR
- contain unresolved names, semantic contracts, generics, or source-level method calls
- treat source-derived debug names as semantic identity, lookup keys, linkage names, or verifier requirements
- encode C-only or LLVM-only details
- carry speculative optimizer, platform-feature, fast-math, noalias, or realtime/kernel metadata unless a V1 owning item accepts that fact
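The following Rust sketch shows how a checked SIR slice index might lower into explicit IR operations, and how a verifier can enforce an invariant without consulting debug names. The instruction set (`SliceLen`, `BoundsCheck`, `ElemLoad`), `TrapKind::IndexOutOfBounds`, `lower_checked_index`, and the single define-before-use rule are hypothetical stand-ins; the real IR schema and lowering recipes are the ones documented in SIR, IR, and Lowering.

```rust
use std::collections::HashSet;

/// Stable trap kinds for checked safety failures (hypothetical name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TrapKind {
    IndexOutOfBounds,
}

/// Value identity is a number; debug names never participate in it.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ValueId(pub u32);

#[derive(Debug, Clone, PartialEq)]
pub enum Inst {
    /// Read the length field of a slice descriptor.
    SliceLen { dst: ValueId, slice: ValueId },
    /// Trap with `kind` unless `index < len`.
    BoundsCheck { index: ValueId, len: ValueId, kind: TrapKind },
    /// Load the element at `index` through the slice descriptor.
    ElemLoad { dst: ValueId, slice: ValueId, index: ValueId },
}

pub struct Value {
    pub id: ValueId,
    /// Provenance for dumps and diagnostics only.
    pub debug_name: Option<String>,
}

pub struct Func {
    pub params: Vec<Value>,
    pub insts: Vec<Inst>,
    pub next_id: u32,
}

impl Func {
    fn fresh(&mut self) -> ValueId {
        let id = ValueId(self.next_id);
        self.next_id += 1;
        id
    }

    /// Lower a checked SIR `slice[index]`: read length, bounds-check, then load.
    pub fn lower_checked_index(&mut self, slice: ValueId, index: ValueId) -> ValueId {
        let len = self.fresh();
        self.insts.push(Inst::SliceLen { dst: len, slice });
        self.insts.push(Inst::BoundsCheck { index, len, kind: TrapKind::IndexOutOfBounds });
        let dst = self.fresh();
        self.insts.push(Inst::ElemLoad { dst, slice, index });
        dst
    }
}

/// Verifier sketch: every use must follow a unique definition. It looks only
/// at `ValueId`s, so duplicate or missing debug names are never errors.
pub fn verify(f: &Func) -> Result<(), String> {
    let mut defined: HashSet<ValueId> = HashSet::new();
    for p in &f.params {
        if !defined.insert(p.id) {
            return Err(format!("parameter %{} defined twice", p.id.0));
        }
    }
    for (i, inst) in f.insts.iter().enumerate() {
        let (uses, def) = match inst {
            Inst::SliceLen { dst, slice } => (vec![*slice], Some(*dst)),
            Inst::BoundsCheck { index, len, .. } => (vec![*index, *len], None),
            Inst::ElemLoad { dst, slice, index } => (vec![*slice, *index], Some(*dst)),
        };
        for u in &uses {
            if !defined.contains(u) {
                return Err(format!("inst {i}: use of undefined value %{}", u.0));
            }
        }
        if let Some(d) = def {
            if !defined.insert(d) {
                return Err(format!("inst {i}: value %{} defined twice", d.0));
            }
        }
    }
    Ok(())
}
```

Because identity is the numeric `ValueId`, two parameters that both carry the debug name `x` still verify, while a load that uses a never-defined value is rejected.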
IR snapshots (.ir.snap) are the public contract for lowering and verifier-visible structure.
Allowed Information Flow
```mermaid
flowchart LR
    %% Node categories are colored by the site stylesheet for light/dark mode.
    classDef phase stroke-width:1.5px
    Source[source]:::phase --> Tokens[tokens]:::phase
    Tokens --> AST[AST]:::phase
    AST --> SIR[SIR]:::phase
    SIR --> IR[IR]:::phase
    IR --> Backend[backend]:::phase
```
Each arrow is a one-way boundary. Later phases may depend on the previous phase's public data, but must not reach around it. If a backend needs information that IR does not provide, the fix belongs in SIR-to-IR lowering or the IR schema, not in backend access to AST or SIR.
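One way to make these arrows hold mechanically is to give each phase its own module and let each phase function accept only the previous phase's output type. The Rust sketch below does this for a toy language of integer sums such as `1 + 2`; the module names, function names, and textual IR format are all hypothetical. The `backend` module imports only `crate::ir::Ir`, so reaching back into AST or SIR would require a new, clearly visible import.

```rust
mod lexer {
    /// Toy lexer: tokens are whitespace-separated words.
    pub fn lex(src: &str) -> Vec<String> {
        src.split_whitespace().map(str::to_string).collect()
    }
}

mod ast {
    /// Operands of `a + b + ...` exactly as written; nothing is typed yet.
    pub struct Ast {
        pub operands: Vec<String>,
    }

    pub fn parse(tokens: &[String]) -> Result<Ast, String> {
        let mut operands = Vec::new();
        for (i, tok) in tokens.iter().enumerate() {
            if i % 2 == 0 {
                operands.push(tok.clone());
            } else if tok.as_str() != "+" {
                return Err(format!("expected `+`, found `{tok}`"));
            }
        }
        if tokens.len() % 2 == 0 {
            return Err("expected an operand".to_string());
        }
        Ok(Ast { operands })
    }
}

mod sir {
    use crate::ast::Ast;

    /// Operands after semantic checking: each one is an `i64` value.
    pub struct Sir {
        pub operands: Vec<i64>,
    }

    pub fn analyze(ast: &Ast) -> Result<Sir, String> {
        let mut operands = Vec::new();
        for text in &ast.operands {
            let value = text.parse::<i64>().map_err(|_| format!("`{text}` is not an i64 literal"))?;
            operands.push(value);
        }
        Ok(Sir { operands })
    }
}

mod ir {
    use crate::sir::Sir;

    /// Backend-facing instructions in a toy textual form.
    pub struct Ir {
        pub insts: Vec<String>,
    }

    pub fn lower(sir: &Sir) -> Ir {
        let mut insts = Vec::new();
        let mut next = 0u32;
        let mut acc: Option<u32> = None;
        for value in &sir.operands {
            let c = next;
            next += 1;
            insts.push(format!("%{c} = const i64 {value}"));
            acc = Some(match acc {
                None => c,
                Some(prev) => {
                    let sum = next;
                    next += 1;
                    insts.push(format!("%{sum} = add i64 %{prev}, %{c}"));
                    sum
                }
            });
        }
        if let Some(result) = acc {
            insts.push(format!("ret i64 %{result}"));
        }
        Ir { insts }
    }
}

mod backend {
    // Only IR is imported: the backend cannot name AST or SIR types.
    use crate::ir::Ir;

    pub fn emit(ir: &Ir) -> String {
        ir.insts.join("\n")
    }
}

pub fn compile(src: &str) -> Result<String, String> {
    let tokens = lexer::lex(src);
    let tree = ast::parse(&tokens)?;
    let typed = sir::analyze(&tree)?;
    let lowered = ir::lower(&typed);
    Ok(backend::emit(&lowered))
}
```

`compile("1 + 2")` produces four lines: two `const` instructions, an `add`, and `ret i64 %2`. A source such as `1 + x` parses successfully but fails in `sir::analyze`, so the error surfaces in the phase that owns the rule.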
Practical Rule
When adding a feature, update the narrowest phase first:
- Syntax-only change: lexer/parser/AST snapshot.
- Name or type rule: sema/SIR and diagnostic snapshot.
- Codegen fact: IR/lowering/verifier snapshot.
- Target- or safety-mode-sensitive behavior: sema/SIR, IR, backend, interpreter, or comptime snapshot with explicit target and safety context.
- Output formatting: backend snapshot.
Do not combine those changes unless the task is explicitly an integration milestone.
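A hypothetical review helper could flag changes that update snapshots from more than one phase. This sketch uses only the snapshot suffixes named on this page (`.ast.snap`, `.stderr.snap`, `.ir.snap`). Treating every `.stderr.snap` as a sema snapshot is an assumption, and backend, interpreter, and comptime snapshots are left out because their suffixes are not specified here. `Phase`, `phase_of`, and `check_change` are illustrative names.

```rust
/// Phases whose snapshot suffixes are named on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Phase {
    Ast,
    Sema,
    Ir,
}

/// Map a snapshot path to the phase it belongs to (assumed mapping).
pub fn phase_of(path: &str) -> Option<Phase> {
    if path.ends_with(".ast.snap") {
        Some(Phase::Ast)
    } else if path.ends_with(".stderr.snap") {
        Some(Phase::Sema)
    } else if path.ends_with(".ir.snap") {
        Some(Phase::Ir)
    } else {
        None
    }
}

/// Return the distinct phases a change touches, or an error if it spans
/// several phases and is not flagged as an integration milestone.
pub fn check_change(paths: &[&str], integration_milestone: bool) -> Result<Vec<Phase>, String> {
    let mut phases: Vec<Phase> = paths.iter().filter_map(|p| phase_of(p)).collect();
    phases.sort();
    phases.dedup();
    if phases.len() > 1 && !integration_milestone {
        return Err(format!(
            "change updates {phases:?} snapshots; split it or mark it as an integration milestone"
        ));
    }
    Ok(phases)
}
```

A change that only touches `.ast.snap` files passes, one that updates both `.ast.snap` and `.ir.snap` files is rejected unless `integration_milestone` is set, and files that are not snapshots are ignored.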