Owen specification 0.28.0
The specification is versioned using Semver 2.0. Parser Expression Grammar, extended with Unicode Code Points, is used to define the syntax. Files are encoded using UTF-8 and use the .owen extension.
The design guidelines are:
- Control > language safety.
- Compiler/spec simplicity > programmer convenience unless really, really convenient.
- Compiler/spec simplicity > backwards compatibility.
- If there is no reasonable default from the spec's perspective it is the programmer's decision.
What remains unsaid is mostly left so intentionally, either because it is derivable from stated rules of the language, or because it would unnecessarily restrict the freedom of implementors.
1. Files
file <- whitespace
namespace_directive
use_directive*
declaration*
!.
namespace_directive <- namespace upper_case_identifier
use_directive <- use upper_case_identifier
declaration <- function_declaration
/ external_function_declaration
/ compound_declaration
/ enumeration_declaration
/ version_declaration
upper_case_identifier <- [A-Z][a-z0-9]* (_[A-Z][a-z0-9]*)* whitespace
lower_case_identifier <- !keyword [a-z][a-z0-9]* (_[a-z][a-z0-9]*)* whitespace
keyword <- namespace
/ use
/ public
/ external
/ function
/ end
/ if
/ elif
/ else
/ for
/ while
/ break
/ continue
/ structure
/ enumeration
/ sizeof
/ union
/ return
/ true
/ false
/ null
/ version
/ readonly
/ noalias
namespace <- 'namespace' whitespace
use <- 'use' whitespace
public <- 'public' whitespace
external <- 'external' whitespace
function <- 'function' whitespace
end <- 'end' whitespace
if <- 'if' whitespace
elif <- 'elif' whitespace
else <- 'else' whitespace
for <- 'for' whitespace
while <- 'while' whitespace
break <- 'break' whitespace
continue <- 'continue' whitespace
structure <- 'structure' whitespace
enumeration <- 'enumeration' whitespace
sizeof <- 'sizeof' whitespace
union <- 'union' whitespace
return <- 'return' whitespace
true <- 'true' whitespace
false <- 'false' whitespace
null <- 'null' whitespace
version <- 'version' whitespace
readonly <- 'readonly' whitespace
noalias <- 'noalias' whitespace
whitespace <- (' ' / U+000A / comment)*
comment <- '//' (!(U+000A / invalid_code_point) .)* U+000A?
invalid_code_point <- [U+0000-U+0009]
/ [U+000B-U+001F]
/ U+007F
/ U+0085
/ U+2028
/ U+2029
The namespace_directive specifies that all declarations in the file are in the given namespace. The namespace_directive and use_directive put all the public declarations in the given namespace in scope of the file.
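The identifier rules above can be sketched as regular expressions. This is a minimal illustration in Python, not part of the specification; the function names are hypothetical and trailing whitespace is left out. It assumes that only an exact keyword is rejected as a lower_case_identifier. A literal PEG reading of `!keyword` would also reject identifiers that merely begin with a keyword, since whitespace may be empty.

```python
import re

# Transcriptions of upper_case_identifier and lower_case_identifier,
# without the trailing whitespace rule.
UPPER = re.compile(r"[A-Z][a-z0-9]*(_[A-Z][a-z0-9]*)*")
LOWER = re.compile(r"[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*")

KEYWORDS = {
    "namespace", "use", "public", "external", "function", "end", "if",
    "elif", "else", "for", "while", "break", "continue", "structure",
    "enumeration", "sizeof", "union", "return", "true", "false", "null",
    "version", "readonly", "noalias",
}


def is_upper_case_identifier(text):
    return UPPER.fullmatch(text) is not None


def is_lower_case_identifier(text):
    # Assumption: only an exact keyword match is rejected.
    return text not in KEYWORDS and LOWER.fullmatch(text) is not None
```

Under these rules `Dynamic_Array` and `I32` are upper_case_identifiers, while `HTTP` is not because only the first letter of each underscore-separated part may be upper case.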
2. Declarations
If there are multiple declarations in scope that match, it is an error.
A type declaration matches a primitive with the same name, and matches another type declaration with the same name and the same number of formal_type_parameters.
function_declarations match when they have the same name and their nth formal_parameters' types match. formal_type_parameters differ from formal_parameter types, but for the purpose of overloading any formal_type_parameter matches any other formal_type_parameter. Qualifiers are ignored here.
2.1. Functions
function_declaration <- public? function_signature
statements
end
function_signature <- function
lower_case_identifier
formal_type_parameters?
formal_parameters
return_types?
formal_type_parameters <- left_square_bracket
formal_type_parameter
(comma formal_type_parameter)*
right_square_bracket
formal_type_parameter <- upper_case_identifier
formal_parameters <- left_parenthesis
formal_parameter
(comma formal_parameter)*
right_parenthesis
formal_parameter <- type lower_case_identifier
return_types <- colon types
colon <- ':' whitespace
Declares a function with the function_signature's lower_case_identifier as the name. formal_parameters define a list of values that callers must pass to the function. formal_parameters are in the same scope as the statements; they must have unique names and each must be used in a reference_expression. A formal_parameter's type cannot be or contain a non-public type if the function is declared public.
A function with formal_type_parameters is called a Polymorphic Function; otherwise it is a Monomorphic Function. Any reference to a formal_type_parameter is substituted with the corresponding actual_type_parameter when the function is referenced or called. formal_type_parameters cannot have the same name as any primitive, enumeration or monomorphic compound in scope and they must be unique.
The function_type of a function_declaration is as follows: the nth formal_parameter's type is the nth type in the function_type's types and the function_declaration's return_types are equal to the function_type's return_types.
statements must have a terminating statement if the function has return_types.
2.1.1. The main function
The main function is the entry point of the program and it has one of the function_signatures:
function main() : I32
function main(readonly #[][]U8 arguments) : I32
2.1.2. Foreign function interface
external_function_declaration <- public? external function_signature
utf8_string_literal
utf8_string_literal
Defines a reference to a C function in a library. The first utf8_string_literal is the path to the library. The second utf8_string_literal is the function's name in said library. Foreign functions cannot return multiple values.
2.3. Compounds
compound_declaration <- public? (structure / union) upper_case_identifier formal_type_parameters?
field+
end
field <- type lower_case_identifier
The upper_case_identifier is the name of the compound. A compound with formal_type_parameters is called a Polymorphic Compound otherwise it is a Monomorphic Compound. The nth formal_type_parameter is replaced by the nth type in actual_type_parameters when the polymorphic compound is referenced. actual_type_parameters cannot be qualified in this context.
field names must be unique within the compound_declaration. A field's type cannot be or contain a non-public type if the compound is declared public and cannot be qualified.
structure fields are laid out in memory as they are lexically declared. Padding may be inserted after a field. The size of the structure is the sum of the sizes of its fields and padding.
union fields start at the same address. The size of the union is the size of the largest field.
2.4. Enumerations
enumeration_declaration <- public? enumeration upper_case_identifier colon upper_case_identifier
enumeration_constant*
end
enumeration_constant <- lower_case_identifier (assign expression)?
The first upper_case_identifier is the name of the enumeration. The last upper_case_identifier is the underlying type of the enumeration which must be an IXX or UXX.
Each enumeration_constant must have a unique lower_case_identifier within the declaration. If an enumeration_constant omits the expression then its value is the value of the previous constant + 1. If the first constant omits the expression then its value is 0. The expression must be constant.
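The value rule for enumeration_constants can be sketched as follows. This is an illustrative Python model, not part of the specification; `enumeration_values` is a hypothetical helper and explicit values are assumed to be already evaluated constant expressions.

```python
def enumeration_values(constants):
    """Assign values to constants given as (name, explicit value or None)."""
    values = {}
    previous = None
    for name, explicit in constants:
        if name in values:
            raise ValueError(f"duplicate enumeration_constant: {name}")
        if explicit is not None:
            value = explicit          # explicit constant expression
        elif previous is None:
            value = 0                 # first constant without an expression
        else:
            value = previous + 1      # previous constant + 1
        values[name] = value
        previous = value
    return values
```

For example, the constants `red`, `green = 5`, `blue` would receive the values 0, 5 and 6.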
2.5. Versions
version_declaration <- version upper_case_identifier
declaration*
end
Conditionally compiles declarations if the upper_case_identifier has been declared on the command line. The upper_case_identifier is not part of the program's scope.
3. Statements
statements <- ( declaration_statement
/ assignment_statement
/ expression_statement
/ if_statement
/ for_statement
/ while_statement
/ version_statement)*
( break_statement
/ continue_statement
/ return_statement)?
Statements are executed in lexical order.
A terminating statement is the last statement in statements and it is either a return_statement or an if_statement with an else branch in which all branches have terminating statements.
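The definition of a terminating statement is recursive, which a small Python sketch may make easier to follow. The tuple encoding of statements is an assumption made for this illustration only: `("return",)` is a return_statement, `("if", branches, else_branch)` is an if_statement whose `else_branch` is `None` when absent, and anything else is some other statement.

```python
def has_terminating_statement(statements):
    """Check whether the last statement in a list is a terminating statement."""
    if not statements:
        return False
    last = statements[-1]
    if last[0] == "return":
        return True
    if last[0] == "if":
        _, branches, else_branch = last
        if else_branch is None:
            return False  # without an else branch execution may fall through
        # Every if/elif branch and the else branch must itself terminate.
        return all(has_terminating_statement(branch)
                   for branch in branches + [else_branch])
    return False
```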
3.1. Declaration statements
declaration_statement <- variables (equal expressions)?
equal <- '=' whitespace
variables <- variable (comma variable)*
variable <- type (lower_case_identifier / blank_identifier)
Declares variables in the current scope with the given lower_case_identifier as the name and type. A variable cannot have the same name as anything else in scope. blank_identifier can only be on the left hand side of Tuple assignments and anything assigned to them is thrown away.
variables are evaluated from left to right and each exists in scope only after it has been assigned an expression. Each variable must be used in a reference_expression. If the equal expressions part is omitted then all the variables must be named and are zeroed; otherwise one of the following applies:
A Balanced assignment consists of the same number of expressions on each side of the assignment_operator. The nth expression on the right hand side is assigned to the nth variable on the left hand side.
A Tuple assignment consists of a call_expression on the right hand side that returns multiple values, and the nth variable on the left hand side is assigned the nth returned value.
The nth expression's type must match the nth variable's type or be inferable as the nth variable's type.
3.2. Assignment statements
assignment_statement <- expressions assignment_operator expressions
assignment_operator <- ([+-*/&|^%] / '<<' / '>>')? equal
assignment_statement works the same as declaration_statement except that:
- Each expression on the left hand side must be addressable.
- x op= expression is equivalent to x = x op expression where x evaluates once.
- = is the only operator defined for blank_identifier.
- The nth left hand side expression is evaluated then the nth right hand side expression is evaluated going left to right.
3.3. Expression statements
expression_statement <- expression
The expression must be a call_expression that returns no values.
3.4. If statements
if_statement <- if (declaration_statement semicolon)? expression
statements
(elif (declaration_statement semicolon)? expression
statements)*
(else
statements)?
end
semicolon <- ';' whitespace
Each expression is evaluated in lexical order until one is true. The statements following that expression are then executed. If all expressions are false and the else block is defined then its statements are executed.
Each declaration_statement is executed before the following expression. The declared variables are in the scope of all the following statements in the if_statement.
3.5. For statements
for_statement <- for declaration_statement semicolon expression semicolon assignment_statement
statements
end
The declaration_statement is executed before the first iteration. The declared variables are in the scope of all the following statements. The expression is the condition, which must be true for the statements to be executed. The assignment_statement is executed after each iteration. The break_statement skips execution of the assignment_statement.
3.6. While statements
while_statement <- while (declaration_statement semicolon)? expression
statements
end
The expression must be of type Bool. If the expression is true, then the statements are executed. After the statements have executed, the expression is evaluated again, and if true the statements are executed again. This continues until the expression is false. The declaration_statement is executed before the first iteration and the declared variables are in the same scope as statements.
3.7. Break statements
break_statement <- break
The break_statement stops the execution of the innermost loop in which it is declared. Execution resumes after the innermost loop.
3.8. Continue statements
continue_statement <- continue
The continue_statement begins the next iteration of the innermost loop in which it is declared.
3.9. Return statements
return_statement <- return expressions?
Exits the current function and returns the expressions to the caller. The nth expression's type must either match the nth return type of the current function or be inferable as the nth return type of the current function.
3.10. Version statements
version_statement <- version upper_case_identifier
statements
end
Works the same way as Version Declarations but for statements instead.
4. Expressions
expressions <- expression (comma expression)*
expression <- logical_or_expression
expressions are evaluated from left to right. logical_or_expression has the lowest precedence and postfix_expression has the highest precedence. Operators are left associative unless noted otherwise.
Reading an expression with a Pointerless type turns its type into an unqualified_type.
These expressions are addressable:
- reference_expression
- field_access_expression unless the compound's type is a fixed_array_type
- array_access_expression
- dereference_expression
- blank_identifier
Expressions consisting of the following are considered constant expressions:
- float_literal
- integer_literal
- Binary expressions
- negate_expression
4.1. Logical or expressions
logical_or_expression <- logical_and_expression ('||' whitespace logical_and_expression)*
Both operands must be of type Bool. The type of the logical_or_expression is Bool. The right hand side is not evaluated if the left hand side is true. The logical_or_expression is true if either operand is true.
4.2. Logical and expressions
logical_and_expression <- relational_expression ('&&' whitespace relational_expression)*
Both operands must be of type Bool. The type of the logical_and_expression is Bool. The right hand side is not evaluated if the left hand side is false. The logical_and_expression is true if both operands are true.
4.3. Relational expressions
relational_expression <- additive_expression (('==' / '!=' / '<=' / '>=' / '<' / '>') whitespace additive_expression)*
The comparison results in a Bool.
The operators are == (equal), != (not equal), <= (less than or equal), >= (greater than or equal), < (less than) and > (greater than).
For == and != both operands must be of the same primitive, pointer or enumeration type.
For <=, >=, < and > both operands must be of the same number, pointer or enumeration type.
4.4. Additive expressions
additive_expression <- multiplicative_expression (('+' / '-' / '|' / '^') whitespace multiplicative_expression)*
If both operands are the same number type the type of the expression is the same as the operands.
+ (add) and - (subtract) apply to any number type. | (bitwise or) and ^ (bitwise xor) only apply to IXX and UXX.
If one multiplicative_expression is a pointer type, the other multiplicative_expression is a UXX type and the operator is either + or -, then the UXX value * size of the base type is either added to or subtracted from the pointer. The type of the additive expression is the same as the pointer type.
If both operands are the same enumeration type then | and ^ apply to the underlying value, resulting in the same type as the operands.
If one of the multiplicative_expressions is inferable as the other multiplicative_expression's type then it is inferred as that type.
4.5. Multiplicative expressions
multiplicative_expression <- prefix_expression (('*' / '/' / '%' / '&' / '<<' / '>>') whitespace prefix_expression)*
If both operands are the same number type the type of the expression is the same as the operands.
* (multiply), / (divide), % (modulo), & (bitwise and), << (left shift) and >> (right shift) apply to IXX and UXX. *, / and % also apply to FXX.
If both operands are the same enumeration type then & applies to the underlying value, resulting in the same type as the operands.
If one of the prefix_expressions is inferable as the other prefix_expression's type then it is inferred as that type.
4.6. Prefix expressions
prefix_expression <- not_expression
/ negate_expression
/ address_of_expression
/ dereference_expression
/ postfix_expression
Prefix operators are right associative.
4.6.1. Not expressions
not_expression <- '!' whitespace postfix_expression
Inverts a Bool.
4.6.2. Negate expressions
negate_expression <- '-' whitespace postfix_expression
Negates the value of IXX or FXX. The type of the expression is the same as the negated value.
4.6.3. Address of expressions
address_of_expression <- '#' whitespace postfix_expression
Takes the address of an addressable postfix_expression. The type of the address_of_expression is a pointer_type, with the postfix_expression's qualifiers, where the base type is the same type as the postfix_expression's type, with said qualifiers.
4.6.4. Dereference expressions
dereference_expression <- '@' whitespace postfix_expression
Dereferences a possibly qualified pointer_typed postfix_expression. The type of the dereference_expression is the same as the pointer_type's base type with said qualifiers.
4.7. Postfix expressions
postfix_expression <- primary_expression (field_access_expression / call_expression / array_access_expression)*
4.7.1. Field access expressions
field_access_expression <- dot lower_case_identifier
dot <- '.' whitespace
The accessed expression must be a compound or array type or a pointer to one. The lower_case_identifier must be the name of a field defined by the accessed expression's type. If the accessed type is a pointer then it is dereferenced. The type of the field_access_expression is the same as the field's type.
4.7.2. Call expressions
call_expression <- left_parenthesis actual_parameters right_parenthesis
actual_parameters <- expressions?
If the expression being called is not a reference_expression to a function but has a function type then the nth actual parameter's type must match the nth formal_parameter's type and the type of the call is the return_types of the function type. Otherwise the expression is a reference_expression to a function and the type of the call is the return_types of the resolved function overload. The specific overload is resolved as follows:
With actual_type_parameters the overload resolves to the polymorphic function with the same name and amount of formal_type_parameters as there are actual_type_parameters. The nth formal_type_parameter is substituted with the nth actual_type_parameter and then the nth actual parameter's type matches the nth formal_parameter's type.
Without actual_type_parameters the overload resolves to:
- The monomorphic function with the same name and amount of formal_parameters where the nth actual parameter's type matches the nth formal_parameter's type.
- The polymorphic function with the same name and amount of formal_parameters where the nth expression's type matches the nth formal_parameter's type. The formal_type_parameters are inferred from the formal_parameters and actual_parameters as follows:
- If the actual parameter's type is a primitive, enumeration or a monomorphic compound and the formal_parameter's type is a formal_type_parameter then that formal_type_parameter is inferred to be the actual parameter's type.
- If the actual parameter's type and the formal_parameter's type match the same polymorphic compound then these rules are applied to the nth actual_type_parameter and the nth formal_type_parameter recursively.
The actual_parameters are passed to the called function by value.
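The inference of formal_type_parameters described above can be sketched in Python. The encoding is an assumption made for this illustration: a plain string stands for a primitive, enumeration, monomorphic compound or formal_type_parameter, and a tuple `(name, arguments)` stands for a polymorphic compound. Rejecting conflicting inferences and cases the two rules do not cover is also an assumption of this sketch, as the text does not say how they are handled.

```python
def _infer(formal, actual, type_parameters, inferred):
    if isinstance(formal, str) and formal in type_parameters:
        # Rule 1: the actual type is a primitive, enumeration or
        # monomorphic compound (modelled here as a plain string).
        if not isinstance(actual, str):
            raise TypeError("case not covered by the stated rules")
        if inferred.setdefault(formal, actual) != actual:
            raise TypeError(f"conflicting types inferred for {formal}")
    elif isinstance(formal, tuple) and isinstance(actual, tuple):
        # Rule 2: the same polymorphic compound, so recurse pairwise
        # over the actual_type_parameters.
        (formal_name, formal_args), (actual_name, actual_args) = formal, actual
        if formal_name != actual_name or len(formal_args) != len(actual_args):
            raise TypeError("types do not match")
        for f, a in zip(formal_args, actual_args):
            _infer(f, a, type_parameters, inferred)
    elif formal != actual:
        raise TypeError("types do not match")


def infer_type_parameters(type_parameters, formal_types, actual_types):
    """Return a mapping from formal_type_parameter name to inferred type."""
    inferred = {}
    for formal, actual in zip(formal_types, actual_types):
        _infer(formal, actual, set(type_parameters), inferred)
    return inferred
```

With a hypothetical polymorphic compound `List`, a function taking `List[T]` and `T` that is called with `List[I32]` and `I32` infers `T` as `I32`.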
4.7.3. Array access expressions
array_access_expression <- left_square_bracket expression right_square_bracket
left_square_bracket <- '[' whitespace
right_square_bracket <- ']' whitespace
The indexed expression must be an array type or a pointer to an array type. If the indexed expression is a pointer to an array type it is implicitly dereferenced. The expression is the index into the array. The expression must be integer typed, enumeration typed or an integer_literal whose type is inferred as U32.
The result is the nth element, starting from 0, whose type is the same as the array's base type.
4.8. Primary expressions
primary_expression <- float_literal
/ integer_literal
/ boolean_literal
/ unicode_code_point_literal
/ utf8_string_literal
/ compound_literal
/ array_literal
/ null_literal
/ uninitialized_literal
/ size_of_expression
/ parenthesized_expression
/ reference_expression
/ enumeration_constant_access
/ cast_expression
/ blank_identifier
4.8.1. Floating point literals
float_literal <- [0-9]+ '.' [0-9]+ whitespace
The type of a float_literal is inferred. float_literals are not allowed to overflow their type.
4.8.2. Integer literals
integer_literal <- (binary_integer / hexadecimal_integer / decimal_integer) whitespace
binary_integer <- '0b' [01]+ ('_' [01]+)*
decimal_integer <- [0-9]+ ('_' [0-9]+)*
hexadecimal_integer <- '0x' hexadecimal_digit+ ('_' hexadecimal_digit+)*
hexadecimal_digit <- [0-9A-F]
The type of an integer_literal is inferred. The underscore doesn't change the value of the integer_literal. integer_literals are not allowed to overflow their type.
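As an illustration of the integer_literal rules, the following Python sketch validates a literal and computes its value. `integer_literal_value` is a hypothetical helper; trailing whitespace and the overflow check against the inferred type are left out.

```python
import re

# binary_integer, hexadecimal_integer and decimal_integer without whitespace.
INTEGER_LITERAL = re.compile(
    r"0b[01]+(_[01]+)*"
    r"|0x[0-9A-F]+(_[0-9A-F]+)*"
    r"|[0-9]+(_[0-9]+)*"
)


def integer_literal_value(text):
    if INTEGER_LITERAL.fullmatch(text) is None:
        raise ValueError(f"not an integer_literal: {text!r}")
    digits = text.replace("_", "")  # underscores do not change the value
    if digits.startswith("0b"):
        return int(digits[2:], 2)
    if digits.startswith("0x"):
        return int(digits[2:], 16)
    return int(digits, 10)
```

Note that hexadecimal_digit only allows upper case letters, so `0xFF_FF` is accepted while `0xff` is not.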
4.8.3. Boolean literals
boolean_literal <- true / false
boolean_literals are of type Bool.
4.8.4. Unicode code point literals
unicode_code_point_literal <- "'" (unicode_escape_sequence / !("'" / invalid_code_point) .) "'" whitespace
unicode_escape_sequence <- '\u' hexadecimal_digit
hexadecimal_digit
hexadecimal_digit
hexadecimal_digit
/ '\U' hexadecimal_digit
hexadecimal_digit
hexadecimal_digit
hexadecimal_digit
hexadecimal_digit
hexadecimal_digit
hexadecimal_digit
hexadecimal_digit
The hexadecimal_digits in unicode_escape_sequence represent a valid Unicode Code Point. The unicode_code_point_literal is replaced with an integer_literal of the same value.
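The replacement of a unicode_code_point_literal by its integer value can be sketched in Python. `code_point_literal_value` is a hypothetical helper that takes the text between the single quotes. Treating every value up to U+10FFFF as a valid code point for escape sequences is an assumption of this sketch.

```python
import re

# '\u' with 4 hexadecimal_digits or '\U' with 8 hexadecimal_digits.
ESCAPE = re.compile(r"\\u([0-9A-F]{4})|\\U([0-9A-F]{8})")


def is_invalid_code_point(value):
    # Transcription of the invalid_code_point rule.
    return (value <= 0x09 or 0x0B <= value <= 0x1F
            or value in (0x7F, 0x85, 0x2028, 0x2029))


def code_point_literal_value(body):
    match = ESCAPE.fullmatch(body)
    if match:
        value = int(match.group(1) or match.group(2), 16)
        if value > 0x10FFFF:  # assumption: upper bound of Unicode code points
            raise ValueError("not a valid Unicode Code Point")
        return value
    if len(body) != 1 or body == "'" or is_invalid_code_point(ord(body)):
        raise ValueError(f"not a unicode_code_point_literal body: {body!r}")
    return ord(body)
```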
4.8.5. UTF-8 string literals
utf8_string_literal <- '"' (unicode_escape_sequence / !('"' / invalid_code_point) .)* '"' whitespace
utf8_string_literals are encoded as UTF-8 stored in a readonly #[]U8.
4.8.6. Compound literals
compound_literal <- left_curly_bracket
field_initializer (comma field_initializer)*
right_curly_bracket
field_initializer <- lower_case_identifier equal expression
Initializes the specified fields of a compound type which is inferred. The field_initializer's lower_case_identifier is the name of the field to initialize. The expression is the value of the given field. The type of the expression must match the field's type or the expression must be inferable as the field's type.
Each field can have 0 or 1 field_initializer, and uninitialized fields have zeroed values.
4.8.7. Array literals
array_literal <- left_curly_bracket
element_initializer (comma element_initializer)*
right_curly_bracket
element_initializer <- (left_square_bracket (integer_literal / enumeration_constant_access) right_square_bracket equal)? expression
left_curly_bracket <- '{' whitespace
right_curly_bracket <- '}' whitespace
Initializes the specified elements of a fixed array whose type is inferred.
The integer_literal or enumeration_constant_access is an explicit index of the element, starting from 0, to initialize. The explicit index cannot underflow or overflow the size of the inferred fixed_array_type. The integer_literal is inferred as U32. If there is no explicit index then it is implicitly equal to the previous index + 1 or 0 if this is the first element_initializer. Elements can be initialized once. Elements without an initializer are zeroed.
The expression is the value of the indexed element and its type is inferred as the array's base type.
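The indexing rules for element_initializers can be sketched in Python. `array_literal_elements` is a hypothetical helper; each initializer is modelled as a pair of an explicit index (or `None`) and a value, and `zero` stands for the zero value of the base type.

```python
def array_literal_elements(size, initializers, zero=0):
    """Lay out the elements of an array literal of the given size."""
    elements = [zero] * size      # elements without an initializer are zeroed
    initialized = set()
    index = -1                    # so the first implicit index becomes 0
    for explicit, value in initializers:
        index = explicit if explicit is not None else index + 1
        if not 0 <= index < size:
            raise IndexError(f"index {index} is outside the fixed array")
        if index in initialized:
            raise ValueError(f"element {index} is initialized twice")
        initialized.add(index)
        elements[index] = value
    return elements
```

For a fixed array of 5 elements, the initializers `1`, `[3] = 7`, `8` would produce the elements 1, 0, 0, 7, 8.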
4.8.8. Null literals
null_literal <- null
The value is what C dictates on the given platform. Only pointer typed variables can be null and the type of the null_literal is inferred.
4.8.9. Uninitialized literals
uninitialized_literal <- '---' whitespace
uninitialized_literal can only be used for declaring a variable and it must be the whole expression. It is either assigned to a variable or returned from a function. The actual value is undefined.
4.8.10. Size of expressions
size_of_expression <- sizeof left_parenthesis (type / expression) right_parenthesis
size_of_expression is substituted for an integer_literal of the size of a type or an expression's type in bytes.
4.8.11. Parenthesized expressions
parenthesized_expression <- left_parenthesis expression right_parenthesis
left_parenthesis <- '(' whitespace
right_parenthesis <- ')' whitespace
The type and value of parenthesized_expression are the same as those of the expression.
4.8.12. Reference expressions
reference_expression <- lower_case_identifier actual_type_parameters?
reference_expression refers to a variable, formal_parameter or a function with the same name in scope. If the reference_expression refers to a function the type is inferred.
4.8.13. Enumeration constant access expression
enumeration_constant_access <- upper_case_identifier dot lower_case_identifier
upper_case_identifier is the enumeration type and lower_case_identifier is the enumeration_constant on said enumeration. The type of the expression is the same as the enumeration and the value of the expression is the same as the enumeration_constant.
4.8.14. Cast expressions
cast_expression <- type left_parenthesis expression right_parenthesis
Converts the value of expression to type. enumeration types act as their underlying type.
- When casting an integer type to another integer type
  - If type is smaller than expression's type then the most significant bits of the expression are truncated and the remaining bits are reinterpreted as the type.
  - If type is larger than expression's type then the value is either Sign extended or Zero extended to the same size as type then the bits are reinterpreted as the type.
  - If type is the same size as expression's type then the bits are reinterpreted as the type.
- When casting a FXX to FXX
  - If type is F32 and expression's type is F64 the expression is rounded to the nearest F32 value. If the expression is too small in magnitude to represent as F32 then the result is ±0. If the expression is too large in magnitude to represent as F32 the result is ±infinity. If the expression is NaN then the result is NaN.
  - If type is F64 and expression's type is F32 the result is a F64 representing the same value as the expression.
  - If type is the same as expression's type then the result is expression.
- When casting from a FXX to an integer type the fraction is discarded (truncation towards zero). If the value of the expression is NaN, infinite or value doesn't fit in type the result is undefined.
- When casting from an integer type to a FXX the result is a FXX representing the same value as the expression.
- When the expression is a reference_expression to a function then type must be a Function type which matches an overload in scope. The reference_expression is not allowed to specify actual_type_parameters. The result is a reference to that specific overload.
- When the expression's type must be inferred it is inferred as the type.
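The integer to integer cast rules can be modelled on Python integers. `cast_integer` is a hypothetical helper, and type names such as `I8` or `U32` are passed as strings. The text does not say when Sign extension or Zero extension is chosen; this sketch assumes Sign extension for IXX sources and Zero extension for UXX sources.

```python
def cast_integer(value, source, target):
    """Cast value from the integer type source to the integer type target."""
    source_bits, target_bits = int(source[1:]), int(target[1:])
    # Bit pattern of the value at the source width (two's complement for IXX).
    bits = value & ((1 << source_bits) - 1)
    negative = source[0] == "I" and bits >> (source_bits - 1) == 1
    if target_bits > source_bits and negative:
        # Assumption: signed sources are Sign extended.
        bits |= ((1 << target_bits) - 1) ^ ((1 << source_bits) - 1)
    # Truncate the most significant bits when the target is smaller.
    bits &= (1 << target_bits) - 1
    # Reinterpret the remaining bits as the target type.
    if target[0] == "I" and bits >> (target_bits - 1) == 1:
        return bits - (1 << target_bits)
    return bits
```

For example, casting the I32 value 300 to U8 keeps only the low 8 bits and gives 44, and casting the U8 value 200 to I8 reinterprets the same bits as -56.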
4.8.15. Blank identifiers
blank_identifier <- '_' whitespace
Used in place of reference_expression in declaration_statement and assignment_statement and only on the left hand side of the operator.
5. Types
types <- type (comma type)*
type <- type_qualifiers unqualified_type
unqualified_type <- (dynamic_array_type / fixed_array_type / pointer_type)*
(polymorphic_compound_type / named_type / function_type)
polymorphic_compound_type <- upper_case_identifier actual_type_parameters
named_type <- upper_case_identifier
actual_type_parameters <- left_square_bracket types right_square_bracket
function_type <- 'Function' whitespace
left_parenthesis types? right_parenthesis
return_types?
Number types and Bool are considered primitive types and they are always in scope.
The type following a dynamic_array_type, fixed_array_type or pointer_type is called the base type.
5.1. Zero values
The zero value for:
- Number types is 0.
- Bool is false.
- Pointer types is null.
- enumeration types is its first enumeration_constant if any or the zero value of its underlying type.
- Fields and array elements are zeroed recursively.
5.2. Matching types
Two types match if their type_qualifiers are the same and their unqualified_types match, which they do in the following cases:
- named_types with the same names match.
- polymorphic_compound_types match when they have the same name and their actual_type_parameters match.
- function_types match when their nth parameter types and their nth return_types match.
- dynamic_array_types match when their base types match.
- fixed_array_types match when their size and base types match.
- pointer_types match when their base types match.
5.3. Integer types
Type | Min | Max |
---|---|---|
I8 | -2^7 | 2^7-1 |
I16 | -2^15 | 2^15-1 |
I32 | -2^31 | 2^31-1 |
I64 | -2^63 | 2^63-1 |
U8 | 0 | 2^8-1 |
U16 | 0 | 2^16-1 |
U32 | 0 | 2^32-1 |
U64 | 0 | 2^64-1 |
The number in each type is the number of bits used to store its data. IXX is stored using Two's complement. IXX and UXX expressions besides literals may wrap around.
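The ranges in the table follow directly from the bit width, and wrap around can be modelled as arithmetic modulo 2 to the power of that width. The Python helpers below are hypothetical and only illustrate the wrapping case; the text says expressions may wrap around, so other behaviour is not ruled out by this sketch.

```python
def integer_range(type_name):
    """Return (min, max) for a type name such as 'I8' or 'U64'."""
    bits = int(type_name[1:])
    if type_name[0] == "I":
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def wrap(value, type_name):
    """Wrap an out-of-range result back into the range of the type."""
    low, high = integer_range(type_name)
    return (value - low) % (high - low + 1) + low
```

Under this model, adding 1 to the I8 value 127 wraps to -128 and subtracting 1 from the U8 value 0 wraps to 255.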
5.4. Floating point types
The F32 and F64 types behave as binary32 and binary64 respectively as specified in IEEE 754.
5.5. Boolean types
Booleans are stored as a U8 and have the Bool type. Any non-zero value is true and zero is false.
5.6. Pointer types
pointer_type <- '#' whitespace
5.7. Qualifiers
type_qualifiers <- readonly? noalias?
A type is Pointerless if it doesn't contain a pointer_type or dynamic_array_type and, if present, its polymorphic_compound_type or named_type doesn't refer to a compound with a field that is not Pointerless.
5.7.1. readonly
readonly qualified expressions cannot appear on the left hand side of assignment_operators.
5.7.2. noalias
formal_parameter's types that are not Pointerless can be noalias qualified. It is assumed that data accessed in a noalias qualified formal_parameter is exclusively accessed through that formal_parameter.
5.8. Dynamic array types
dynamic_array_type <- left_square_bracket right_square_bracket
[]T behaves as:
structure Dynamic_Array[Base_Type]
#Base_Type elements
U32 length
U32 capacity
end
but without putting Dynamic_Array[Base_Type] in scope.
5.9. Fixed array types
fixed_array_type <- left_square_bracket integer_literal right_square_bracket
Is a fixed-size array of Base_Type with integer_literal > 0 elements stored contiguously without padding between elements. Its U32 length field is equal to the integer_literal.