With the introduction of nullable types, we want the
compiler to be smart in cases like
> if (x == null) return;
> // x is int now
or
> if (x == null) x = 0;
> // x is int now
These are called smart casts: the type of a variable
at a particular usage may differ from its declared type.
Implementing smart casts is very challenging. They are based
on building a control-flow graph and handling every AST vertex
with care. Actually, I don't represent the CFG as a "graph with
edges". Instead, it's a "structured DFS" over the AST:
1) at every point of inference, we have the "current flow facts"
2) when we see an `if (...)`, we create two derived contexts
3) after the `if`, we finalize both contexts and unify them
4) if we detect unreachable code, we mark that context as unreachable
In other words, we get the effect of a CFG, but with a more direct
approach. That's enough for AST-level data-flow analysis.
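The four steps above can be sketched in C++ for the two examples at the top. This is a minimal illustration under assumptions: `Nullness`, `FlowContext`, `splitOnNullCheck` and `join` are hypothetical names, not the compiler's actual types, and only the nullness of named locals is tracked.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// What is known about a nullable local (e.g. `x: int?`) at one program point.
enum class Nullness { MaybeNull, NotNull, IsNull };

// "Current flow facts" while walking the AST in structured DFS order.
struct FlowContext {
    std::map<std::string, Nullness> facts;
    bool unreachable = false;  // set after `return`, `throw`, ...

    Nullness get(const std::string& var) const {
        auto it = facts.find(var);
        return it == facts.end() ? Nullness::MaybeNull : it->second;
    }
};

// Step 2: `if (var == null)` derives one context per branch.
std::pair<FlowContext, FlowContext> splitOnNullCheck(const FlowContext& cur,
                                                     const std::string& var) {
    FlowContext whenTrue = cur;
    FlowContext whenFalse = cur;
    whenTrue.facts[var] = Nullness::IsNull;
    whenFalse.facts[var] = Nullness::NotNull;
    return {whenTrue, whenFalse};
}

// Step 3: unify both branches after the `if`. An unreachable branch adds
// nothing; otherwise a fact survives only if both branches agree on it.
FlowContext join(const FlowContext& a, const FlowContext& b) {
    if (a.unreachable) return b;
    if (b.unreachable) return a;
    FlowContext out;
    for (const auto& [var, fact] : a.facts) {
        if (b.get(var) == fact) out.facts[var] = fact;
    }
    return out;
}

// `if (x == null) return;  // x is int now`
Nullness afterEarlyReturn() {
    auto [thenCtx, elseCtx] = splitOnNullCheck(FlowContext{}, "x");
    thenCtx.unreachable = true;  // step 4: `return` ends this branch
    return join(thenCtx, elseCtx).get("x");
}

// `if (x == null) x = 0;  // x is int now`
Nullness afterAssignInBranch() {
    auto [thenCtx, elseCtx] = splitOnNullCheck(FlowContext{}, "x");
    thenCtx.facts["x"] = Nullness::NotNull;  // assigned a non-null int
    return join(thenCtx, elseCtx).get("x");
}
```

When the `then` branch neither returns nor assigns, the two contexts disagree and `x` falls back to `MaybeNull`, i.e. it stays `int?`.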
Smart casts work for local variables and tensor/tuple indices.
Compilation errors have been reworked and are now friendlier.
There are also compilation warnings for always-true/always-false
conditions inside `if`, `assert`, etc.
This commit introduces nullable types `T?` that are
distinct from non-nullable `T`.
Example: `int?` (int or null) and `int` are different now.
Previously, `null` could be assigned to any primitive type.
Now, it can be assigned only to `T?`.
A non-null assertion operator `!` was also introduced,
similar to `!` in TypeScript and `!!` in Kotlin.
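A rough sketch of the assignment rules just described, assuming a hypothetical `Type` struct (a base name plus a nullable flag) and hypothetical helpers `canAssign` and `nonNullAssert`. Only the static effect of `!` on the type is modelled, not any runtime behavior.

```cpp
#include <cassert>
#include <string>

// Hypothetical static type: a base name plus a nullable flag (`T` vs `T?`).
struct Type {
    std::string base;       // "int", "cell", ..., or "null" for the literal
    bool nullable = false;  // true for `T?`
};

const Type kNullLiteral{"null", true};

// `null` is assignable only to `T?`; a plain `T` is assignable to `T` and
// `T?`; a `T?` is not assignable to a plain `T`.
bool canAssign(const Type& from, const Type& to) {
    if (from.base == "null") return to.nullable;
    if (from.base != to.base) return false;
    return to.nullable || !from.nullable;
}

// `v!`: the non-null assertion keeps the base type and drops `?`.
Type nonNullAssert(const Type& t) { return Type{t.base, false}; }
```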
While `int?` still occupies 1 stack slot, `(int,int)?` and
other nullable tensors occupy N+1 slots, the last one for
"null presence". `v == null` actually checks that slot.
Assigning `(int,int)` to `(int,int)?` implicitly creates
a null presence slot. Assigning `null` to `(int,int)?` widens
this null value to 3 slots. This is called "type transitioning".
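The slot layout and both type transitions can be modelled on a toy stack. This sketch assumes TVM null for the absent case and an arbitrary non-null marker (`kPresent = 0`) in the presence slot; the real encoding of that slot may differ, and `Slot`, `widenTensor`, `widenNull` and `isNull` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// One stack slot, simplified: an integer or TVM null (std::nullopt).
using Slot = std::optional<long long>;
using StackValue = std::vector<Slot>;

// A nullable tensor of n slots, like `(int,int)?`, takes n + 1 slots:
// the data slots plus a trailing "null presence" slot.
// (A nullable primitive like `int?` stays 1 slot and is not modelled here.)
std::size_t nullableTensorWidth(std::size_t n) { return n + 1; }

// Assumed encoding for this sketch: any non-null marker means "present".
constexpr long long kPresent = 0;

// `(int,int)` -> `(int,int)?`: append the presence slot.
StackValue widenTensor(const StackValue& tensor) {
    StackValue out = tensor;
    out.push_back(kPresent);
    return out;
}

// `null` -> `(int,int)?`: a single null is widened to n + 1 nulls.
StackValue widenNull(std::size_t n) {
    return StackValue(nullableTensorWidth(n), std::nullopt);
}

// `v == null` looks only at the last (presence) slot.
bool isNull(const StackValue& v) {
    assert(!v.empty());
    return !v.back().has_value();
}
```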
All stdlib function prototypes have been updated to reflect
whether they return/accept a nullable or a strict value.
This commit also contains a refactoring from `const FunctionData*`
to `FunctionPtr` and similar.
FunC's (and Tolk's before this PR) type system is based on Hindley-Milner.
This is a common approach for functional languages, where
types are inferred from usage through unification.
As a result, type declarations are not necessary:
> () f(a,b) { return a+b; }   // a and b are now int, since `+` is (int, int)
While this approach works for now, problems arise with the introduction
of new types like bool, where `!x` must handle both int and bool.
It would also be incompatible with int32 and other strict integers,
clash with structure methods, struggle with proper generics,
and become entirely impractical for union types.
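To make the unification argument concrete, here is a deliberately tiny sketch (a hypothetical `Unifier` with only variable-to-ground-type bindings, far from full Hindley-Milner). Once `a` is bound to `int` by `+`, any later use that demands `bool` fails, which illustrates the kind of conflict described above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy unifier: binds type variables to ground types ("int", "bool").
// Real Hindley-Milner also unifies variables with each other and with
// compound types; that is omitted here.
struct Unifier {
    std::map<std::string, std::string> bound;  // type variable -> ground type

    // Unify a type variable with a ground type; false means a type error.
    bool unify(const std::string& var, const std::string& ground) {
        auto [it, inserted] = bound.emplace(var, ground);
        return inserted || it->second == ground;
    }
};

// `() f(a,b) { return a+b; }`: `+` is (int, int), so `a` and `b` become int.
bool inferPlusBody(Unifier& u) {
    return u.unify("a", "int") && u.unify("b", "int");
}
```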
This PR completely rewrites the type system with the future in mind:
1) the type of any expression is inferred once and never changed
2) this is possible because dependent expressions have already been inferred
3) `forall` is removed completely; generic functions are introduced
(they actually work like template functions, instantiated during inference)
4) instantiation `<...>` syntax, example: `t.tupleAt<int>(0)`
5) `as` keyword, for example `t.tupleAt(0) as int`
6) method binding is done along with type inference, not before
(binding "before", as it worked previously, was always the wrong approach)
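Item 3 compares generic functions to templates. A minimal sketch of that idea, assuming a hypothetical `GenericFunction` record and a naive name-mangling scheme: each distinct list of type arguments, such as the `<int>` in `t.tupleAt<int>(0)`, produces one concrete instantiation, created on first use and reused afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical generic function record: one concrete instantiation per
// distinct list of type arguments, created lazily and then reused.
struct GenericFunction {
    std::string name;  // e.g. "tupleAt"
    std::map<std::vector<std::string>, std::string> instances;

    // Returns the (naively mangled) name of the instantiation for typeArgs,
    // creating it the first time these type arguments are seen.
    const std::string& instantiate(const std::vector<std::string>& typeArgs) {
        auto it = instances.find(typeArgs);
        if (it != instances.end()) return it->second;
        std::string mangled = name + "<";
        for (std::size_t i = 0; i < typeArgs.size(); ++i) {
            if (i > 0) mangled += ",";
            mangled += typeArgs[i];
        }
        mangled += ">";
        return instances.emplace(typeArgs, mangled).first->second;
    }
};
```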
This is a huge refactoring focused on untangling compiler internals
(previously forked from FunC).
The goal is to convert the AST directly to Op (a kind of IR),
doing all code analysis at the AST level.
Notable changes:
- AST-based semantic kernel includes: registering global symbols,
scope handling and resolving local/global identifiers,
lvalue/rvalue calc and check, implicit return detection,
mutability analysis, pure/impure validity checks,
simple constant folding
- values of `const` variables are calculated NOT based on CodeBlob,
but via a newly-introduced AST-based constant evaluator
- AST vertices now inherit from expression/statement/other;
expression vertices have common properties (TypeExpr, lvalue/rvalue)
- the symbol table is rewritten completely, SymDef/SymVal no longer exist;
the lexer no longer needs to register identifiers
- AST vertices have references to symbols, filled at different
stages of pipeline
- the remaining "FunC legacy part" is almost unchanged except for Expr,
which was dropped entirely; AST is converted to Ops (IR) directly
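As an illustration of evaluating `const` values on the AST rather than via CodeBlob, here is a small recursive evaluator with memoization and cycle detection. The node layout (`Node`), the helpers (`lit`, `constRef`, `bin`), the evaluator (`ConstEvaluator`) and the error messages are hypothetical and much simpler than Tolk's vertices.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical AST node for constant initializers.
struct Node {
    enum Kind { IntLit, ConstRef, Add, Mul } kind = IntLit;
    long long value = 0;             // IntLit
    std::string name;                // ConstRef
    std::shared_ptr<Node> lhs, rhs;  // Add, Mul
};
using NodePtr = std::shared_ptr<Node>;

NodePtr lit(long long v) {
    auto n = std::make_shared<Node>();
    n->kind = Node::IntLit;
    n->value = v;
    return n;
}
NodePtr constRef(std::string name) {
    auto n = std::make_shared<Node>();
    n->kind = Node::ConstRef;
    n->name = std::move(name);
    return n;
}
NodePtr bin(Node::Kind kind, NodePtr l, NodePtr r) {
    auto n = std::make_shared<Node>();
    n->kind = kind;
    n->lhs = std::move(l);
    n->rhs = std::move(r);
    return n;
}

// Evaluates `const` declarations directly on the AST, memoizing results and
// rejecting cyclic definitions such as `const X = Y; const Y = X;`.
struct ConstEvaluator {
    std::map<std::string, NodePtr> decls;     // const name -> initializer
    std::map<std::string, long long> values;  // already computed
    std::map<std::string, bool> inProgress;   // currently being evaluated

    long long eval(const NodePtr& n) {
        switch (n->kind) {
            case Node::IntLit:   return n->value;
            case Node::Add:      return eval(n->lhs) + eval(n->rhs);
            case Node::Mul:      return eval(n->lhs) * eval(n->rhs);
            case Node::ConstRef: return evalConst(n->name);
        }
        throw std::logic_error("unknown node kind");
    }

    long long evalConst(const std::string& name) {
        if (auto it = values.find(name); it != values.end()) return it->second;
        if (inProgress[name]) throw std::runtime_error("cyclic const: " + name);
        auto decl = decls.find(name);
        if (decl == decls.end()) throw std::runtime_error("unknown const: " + name);
        inProgress[name] = true;
        long long v = eval(decl->second);
        inProgress[name] = false;
        values[name] = v;
        return v;
    }
};
```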
This is a very big change.
While FunC has `.methods()` and `~methods()`, Tolk has only the dot:
the one and only way to call a `.method()`.
A method may or may not mutate an object.
This is a behavioral and semantic difference from FunC.
- `cs.loadInt(32)` modifies a slice and returns an integer
- `b.storeInt(x, 32)` modifies a builder
- `b = b.storeInt()` also works, since it not only modifies `b` but also returns it
- chained methods also work, they return `self`
- everything works exactly as expected, similar to JS
- no runtime overhead, the exact same Fift instructions
- custom methods are created with ease
- tilde `~` does not exist in Tolk at all
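One way to get mutating methods with no runtime overhead is to lower them the way FunC's `~` methods already worked: the callee returns the updated `self` (plus its own result), and the call site writes `self` back. The C++ below is a sketch under that assumption, with toy bit containers and hypothetical `storeUint`/`loadUint` functions (unsigned, to keep the bit arithmetic simple); it is not Tolk's actual code generation.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Toy stand-ins for TVM builders and slices: plain bit vectors.
struct Builder { std::vector<bool> bits; };
struct Slice { std::vector<bool> bits; std::size_t pos = 0; };

// Assumed lowering of a mutating method: `self` goes in by value and the
// updated `self` comes back out, together with the method's own result.

// b.storeUint(x, len): the result is the new builder, which is also why
// `b = b.storeUint(...)` and chaining work.
Builder storeUint(Builder self, unsigned long long x, int len) {
    for (int i = len - 1; i >= 0; --i) self.bits.push_back(((x >> i) & 1) != 0);
    return self;
}

// cs.loadUint(len): returns the advanced slice and the loaded value.
std::pair<Slice, unsigned long long> loadUint(Slice self, int len) {
    unsigned long long v = 0;
    for (int i = 0; i < len; ++i) v = (v << 1) | (self.bits[self.pos++] ? 1ULL : 0ULL);
    return {self, v};
}

// What the call sites could lower to; `cs` is written back after each load.
unsigned long long roundTrip() {
    Builder b;
    b = storeUint(storeUint(b, 7, 8), 1000, 16);  // b.storeUint(7, 8).storeUint(1000, 16)
    Slice cs{b.bits};
    unsigned long long first = 0, second = 0;
    std::tie(cs, first) = loadUint(cs, 8);    // first = cs.loadUint(8)
    std::tie(cs, second) = loadUint(cs, 16);  // second = cs.loadUint(16)
    return first * 100000 + second;
}
```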
Lots of changes, actually. The most noticeable are:
- traditional `//` comments
- #include -> import
- a rule "import what you use"
- ~ found -> !found (for -1/0)
- null() -> null
- is_null?(v) -> v == null
- throw is a keyword
- catch with swapped arguments
- throw_if, throw_unless -> assert
- do until -> do while
- elseif -> else if
- dropped ifnot, elseifnot
- dropped rarely used operators
A testing framework is also introduced here. All tests existed earlier,
but due to the significant syntax changes, their history is not meaningful.
Since I've implemented an AST, I can now drop forward declarations.
Instead, I traverse the AST of all files and register global symbols
(functions, constants, global vars) as a separate step, in advance.
That's why, while converting the AST to Expr/Op, all available symbols are
already registered.
This greatly simplifies the "intermediate state" of not-yet-known functions
and checking them afterward.
Redeclaration of local variables (inside the same scope)
is now also prohibited.
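The two-pass scheme and the same-scope redeclaration check can be sketched as follows. `GlobalDecl`, `registerAndResolve` and `LocalScopes` are hypothetical; pass 1 registers every global from every file, so pass 2 can resolve a use that appears before the definition. The local check looks only at the innermost scope, matching the wording above.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical global declaration: a name plus the global names its body uses.
struct GlobalDecl {
    std::string name;
    std::vector<std::string> uses;
};

// Pass 1 registers every global (functions, constants, global vars) of every
// file; pass 2 resolves uses. A use may precede its definition, even across
// files, so forward declarations are unnecessary.
std::set<std::string> registerAndResolve(const std::vector<std::vector<GlobalDecl>>& files) {
    std::set<std::string> globals;
    for (const auto& file : files)
        for (const auto& decl : file)
            if (!globals.insert(decl.name).second)
                throw std::runtime_error("redefinition of " + decl.name);
    for (const auto& file : files)
        for (const auto& decl : file)
            for (const auto& use : decl.uses)
                if (globals.count(use) == 0)
                    throw std::runtime_error("undefined symbol " + use);
    return globals;
}

// Local variables: a stack of scopes; declaring a name twice in the same
// (innermost) scope is an error.
struct LocalScopes {
    std::vector<std::set<std::string>> scopes;
    LocalScopes() { scopes.emplace_back(); }  // function body scope
    void enter() { scopes.emplace_back(); }
    void leave() { scopes.pop_back(); }
    void declare(const std::string& name) {
        if (!scopes.back().insert(name).second)
            throw std::runtime_error("redeclaration of local " + name);
    }
};
```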
Now, the whole .tolk file can be loaded as an AST and
then converted to Expr/Op.
This makes it possible to implement AST transformations.
In the future, more and more code analysis will be moved out of the legacy part to the AST level.