CEP-0072: Algebraic Data Types¶
Draft
Draft proposal for future nominal payload-bearing variants. V1 has tag-only enums and enum unions, but no algebraic data types, payload variants, or discriminated unions.
Summary¶
Catalyst should eventually support nominal closed variant types whose cases may carry payload data. This is the family of types often called algebraic data types, tagged unions, discriminated unions, or Rust-style enums with data.
An algebraic data type is distinct from a concrete type union from CEP-0001. A concrete type union is an anonymous union over existing member types. An algebraic data type declares named cases as part of one nominal type, and those cases may have no payload, a single payload, or named fields.
Motivation¶
Tag-only enums work well for closed sets of labels:
const Ordering = enum {
less,
equal,
greater,
}
Many real models need a closed set of alternatives where the chosen alternative determines what data is present:
- parse trees and tokens;
- protocol messages;
- state machines with state-specific data;
- result-like values where success and failure carry different payloads;
- reflection facts and diagnostics that should not be represented as a tag plus nullable fields.
Without payload-bearing variants, Catalyst code must model these shapes as a tag-only enum plus a struct containing optional or invalid-when-inactive fields. That makes impossible states representable and pushes exhaustiveness and payload validity checks into convention.
Proposed Direction¶
An algebraic data type should be a closed nominal value type with an explicit list of cases. Each value stores exactly one active case at a time. The active case determines which payload, if any, is initialized and available after narrowing.
Possible future shape:
const Message = enum {
ping,
text([]const u8),
data([]const u8),
close {
code: u16,
reason: []const u8,
},
}
The exact declaration syntax is open. The accepted design may extend enum with payload cases, introduce a separate data or union type expression, or choose another spelling. The semantic model should remain the same: one nominal closed case family with payload-bearing variants.
Case names are scoped by the algebraic data type. Unit cases behave like tag-only enum cases:
const msg: Message = Message.ping
Payload cases expose constructors through the type namespace:
const msg: Message = Message.text("hello")
const closed: Message = Message.close{
.code = 1000,
.reason = "done",
}
Expected-type shorthand may be available when the expected type uniquely determines the case:
const msg: Message = .text("hello")
Payload cases with the same payload type remain distinct because the case identity is part of the nominal type:
const Change = enum {
created(UserId),
deleted(UserId),
}
Matching and Narrowing¶
Payload access depends on a narrowing story. A value of an algebraic data type should not expose payload fields until control flow has proven the active case.
Pattern matching from CEP-0002 is the likely primary access form:
match msg {
.ping => handle_ping(),
.text(text) => write_text(text),
.data(bytes) => write_bytes(bytes),
.close { code, reason } => close_connection(code, reason),
}
The first accepted version should define whether matching borrows, copies, or moves payloads by default. That choice must compose with ownership, resources, and partial moves.
Relationship to Concrete Type Unions¶
Algebraic data types and concrete type unions solve related but different problems.
Concrete type unions from CEP-0001 are useful when existing concrete types are already the alternatives:
const Fact = ImplementsFact | SatisfiesFact | TypeKindFact
Algebraic data types are useful when the cases themselves are the public vocabulary:
const Fact = enum {
implements {
subject: type,
contract: type,
},
satisfies {
subject: type,
shape: type,
},
type_kind(TypeKind),
}
The accepted designs should explain how these forms interact, but neither should be treated as mere sugar for the other. A concrete type union injects member values by member type. An algebraic data type constructs values by case name.
Representation Direction¶
Algebraic data types should have a predictable closed representation:
- a discriminant identifying the active case;
- payload storage large and aligned enough for any payload case;
- no heap allocation implied by the algebraic data type itself;
- no public memory layout guarantee in the first accepted version;
- copyability only when every payload case is copyable;
- cleanup by active case when any payload may own resources.
The first accepted version should not promise C ABI layout. External representation, @repr controls, explicit discriminant representation, and C interop should be designed separately or as a later extension.
Reflection Direction¶
Reflection should expose algebraic data types separately from tag-only enums and enum unions. The shape might include:
T.type.is_algebraic_data_type() TypeDef.Predicate
T.type.algebraic_data_type_cases() []const TypeDef.AdtCase!TypeDef.ReflectionError
Case metadata should be able to report:
- case name;
- source location and docs;
- whether the case is unit-like, single-payload, or named-field payload;
- payload type or payload fields;
- declaration order for diagnostics and documentation.
Reflection should not make payload access dynamic. Ordinary source should still narrow through the language's pattern or case-test syntax.
Open Questions¶
- Should the source syntax extend
enum, or should payload-bearing variants use a distinct keyword? - Are tuple-like payload cases part of the first version, or only named-field payload cases?
- Does construction use function-call syntax, aggregate syntax, or both?
- How do generic algebraic data types spell type parameters if generic syntax is still deferred?
- How do recursive algebraic data types interact with sizedness rules?
- Can unit-only algebraic data types be identical to V1 tag-only enums, or should they remain distinct type forms?
- How are equality, ordering, hashing, and formatting conformances derived or rejected?
- What are the exact move, borrow, and copy rules for matching payloads?
- Should explicit discriminant values or representation modes be part of this proposal or a later representation proposal?
V1 Compatibility¶
V1 remains unchanged:
enum { ... }creates only tag-only enum types;- enum unions compose only tag-only enum types;
- payload-bearing variants and discriminated unions are rejected;
- payload access through pattern matching is unavailable because V1 has no
matchform.
Current V1 code can continue to use tag-only enums plus structs, optionals, or closed error sets where appropriate. This proposal captures the future direction for replacing tag-plus-payload conventions with a first-class closed variant type.