Owen specification 0.28.0

The specification is versioned using Semver 2.0. Parsing Expression Grammar, extended with Unicode Code Points, is used to define the syntax. Files are encoded using UTF-8 and use the .owen extension.

The design guidelines are:

What remains unsaid is mostly left so intentionally, either because it is derivable from stated rules of the language, or because it would unnecessarily restrict the freedom of implementors.

1. Files

file <- whitespace
        namespace_directive
        use_directive*
        declaration*
        !.

namespace_directive <- namespace upper_case_identifier
use_directive       <- use       upper_case_identifier

declaration <- function_declaration
             / external_function_declaration
             / compound_declaration
             / enumeration_declaration
             / version_declaration

upper_case_identifier <-          [A-Z][a-z0-9]* ('_' [A-Z][a-z0-9]*)* whitespace
lower_case_identifier <- !keyword [a-z][a-z0-9]* ('_' [a-z][a-z0-9]*)* whitespace
keyword <- namespace
         / use
         / public
         / external
         / function
         / end
         / if
         / elif
         / else
         / for
         / while
         / break
         / continue
         / structure
         / enumeration
         / sizeof
         / union
         / return
         / true
         / false
         / null
         / version
         / readonly
         / noalias
        
namespace   <- 'namespace'   whitespace
use         <- 'use'         whitespace
public      <- 'public'      whitespace
external    <- 'external'    whitespace
function    <- 'function'    whitespace
end         <- 'end'         whitespace
if          <- 'if'          whitespace
elif        <- 'elif'        whitespace
else        <- 'else'        whitespace
for         <- 'for'         whitespace
while       <- 'while'       whitespace
break       <- 'break'       whitespace
continue    <- 'continue'    whitespace
structure   <- 'structure'   whitespace
enumeration <- 'enumeration' whitespace
sizeof      <- 'sizeof'      whitespace
union       <- 'union'       whitespace
return      <- 'return'      whitespace
true        <- 'true'        whitespace
false       <- 'false'       whitespace
null        <- 'null'        whitespace
version     <- 'version'     whitespace
readonly    <- 'readonly'    whitespace
noalias     <- 'noalias'     whitespace

whitespace  <- (' ' / U+000A / comment)*
comment     <- '//' (!(U+000A / invalid_code_point) .)* U+000A?

invalid_code_point <- [U+0000-U+0009]
                    / [U+000B-U+001F]  
                    /  U+007F
                    /  U+0085
                    /  U+2028
                    /  U+2029

The namespace_directive specifies that all declarations in the file are in the given namespace. The namespace_directive and each use_directive put all the public declarations in the given namespace in scope of the file.
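As an illustration only, the identifier rules above can be approximated with regular expressions. The sketch below is in Python rather than Owen, and the names UPPER, LOWER, KEYWORDS and the two helper functions are hypothetical. The trailing whitespace rule is left out, and the keyword check is simplified to a comparison against the whole identifier, which is an assumption about the intent of !keyword.

```python
import re

# Hypothetical helper names; the patterns mirror the PEG rules for identifiers.
UPPER = re.compile(r"[A-Z][a-z0-9]*(_[A-Z][a-z0-9]*)*")
LOWER = re.compile(r"[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*")

KEYWORDS = {
    "namespace", "use", "public", "external", "function", "end", "if", "elif",
    "else", "for", "while", "break", "continue", "structure", "enumeration",
    "sizeof", "union", "return", "true", "false", "null", "version",
    "readonly", "noalias",
}

def is_upper_case_identifier(text):
    return UPPER.fullmatch(text) is not None

def is_lower_case_identifier(text):
    # Assumption: a keyword is rejected only when it is the whole identifier.
    return text not in KEYWORDS and LOWER.fullmatch(text) is not None
```

Under these patterns Dynamic_Array and I32 are upper case identifiers, main and snake_case1 are lower case identifiers, and while is rejected as a keyword.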

2. Declarations

If there are multiple declarations in scope that match, it is an error.

A type declaration matches a primitive with the same name, or another type declaration with the same name and the same number of formal_type_parameters.

function_declarations with the same name and nth formal_parameter's type match. formal_type_parameters differ from formal_parameter types but for the purpose of overloading any formal_type_parameter matches any other formal_type_parameter. Qualifiers are ignored here.

2.1. Functions

function_declaration <- public? function_signature
                            statements
                        end

function_signature <- function
                      lower_case_identifier
                      formal_type_parameters?
                      formal_parameters
                      return_types?

formal_type_parameters <- left_square_bracket
                          formal_type_parameter
                          (comma formal_type_parameter)*
                          right_square_bracket

formal_type_parameter <- upper_case_identifier
                          
formal_parameters <- left_parenthesis
                     formal_parameter
                    (comma formal_parameter)* 
                     right_parenthesis

formal_parameter <- type lower_case_identifier
                     
return_types <- colon types

colon <- ':' whitespace

Declares a function with the function_signature's lower_case_identifier as the name. formal_parameters define a list of values that callers must pass to the function. formal_parameters are in the same scope as the statements. They must have unique names and each must be used in a reference_expression. A formal_parameter's type cannot be or contain a non-public type if the function is declared public.

A function with formal_type_parameters is called a Polymorphic Function, otherwise it is a Monomorphic Function. Any reference to a formal_type_parameter is substituted with the corresponding actual_type_parameter when the function is referenced or called. formal_type_parameters cannot have the same name as any primitive, enumeration or monomorphic compound in scope and they must be unique.

The function_type of a function_declaration is as follows: the nth formal_parameter's type is the nth type in the function_type's types and the function_declaration's return_types are equal to the function_type's return_types.

statements must have a terminating statement if the function has return_types.

2.1.1. The main function

The main function is the entry point of the program and it has one of the function_signatures:

function main() : I32
function main(readonly #[][]U8 arguments) : I32

2.1.2. Foreign function interface

external_function_declaration <- public? external function_signature
                                     utf8_string_literal
                                     utf8_string_literal

Defines a reference to a C function in a library. The first utf8_string_literal is the path to the library. The second utf8_string_literal is the function's name in said library. Foreign functions cannot return multiple values.

2.3. Compounds

compound_declaration <- public? (structure / union) upper_case_identifier formal_type_parameters?
                            field+
                        end

field <- type lower_case_identifier

The upper_case_identifier is the name of the compound. A compound with formal_type_parameters is called a Polymorphic Compound, otherwise it is a Monomorphic Compound. The nth formal_type_parameter is replaced by the nth type in actual_type_parameters when the polymorphic compound is referenced. actual_type_parameters cannot be qualified in this context.

field names must be unique within the compound_declaration. A field's type cannot be qualified, and it cannot be or contain a non-public type if the compound is declared public.

structure fields are laid out in memory in the order they are lexically declared. Padding may be inserted after a field. The size of the structure is the sum of the sizes of its fields and padding.

union fields start at the same address. The size of the union is the size of the largest field.
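The layout rules can be observed with Python's ctypes module, which follows the C layout of the host platform. This is only an analogy: the class names Pair and Either are hypothetical, and the amount of padding shown is whatever the host C ABI chooses, not something this specification mandates.

```python
import ctypes

class Pair(ctypes.Structure):
    # Fields are laid out in declaration order, as with an Owen structure.
    _fields_ = [("tag", ctypes.c_uint8), ("value", ctypes.c_uint32)]

class Either(ctypes.Union):
    # All fields start at offset 0, as with an Owen union.
    _fields_ = [("tag", ctypes.c_uint8), ("value", ctypes.c_uint32)]

field_sizes = [ctypes.sizeof(ctypes.c_uint8), ctypes.sizeof(ctypes.c_uint32)]

# Size of the structure = sizes of the fields + padding.
struct_size = ctypes.sizeof(Pair)
struct_padding = struct_size - sum(field_sizes)

# Size of the union = size of the largest field.
union_size = ctypes.sizeof(Either)
```

Here Pair.tag.offset is 0 and Pair.value.offset is greater, while both fields of Either report offset 0.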

2.4. Enumerations

enumeration_declaration <- public? enumeration upper_case_identifier colon upper_case_identifier
                               enumeration_constant*
                           end

enumeration_constant <- lower_case_identifier (assign expression)?

The first upper_case_identifier is the name of the enumeration. The last upper_case_identifier is the underlying type of the enumeration which must be an IXX or UXX.

Each enumeration_constant must have a unique lower_case_identifier within the declaration. If an enumeration_constant omits the expression then its value is the value of the previous constant + 1. If the first constant omits the expression then its value is 0. The expression must be constant.
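The value rules can be sketched as a small Python function. The function name and the list-of-pairs input are hypothetical, constant expressions are assumed to be already evaluated to integers, and the range of the underlying type is not checked.

```python
def enumeration_values(constants):
    """constants: list of (name, explicit_value_or_None) in declaration order."""
    values = {}
    previous = None
    for name, explicit in constants:
        if name in values:
            raise ValueError(f"duplicate enumeration_constant: {name}")
        if explicit is not None:
            value = explicit
        elif previous is None:
            value = 0  # the first constant omits its expression
        else:
            value = previous + 1  # continue from the previous constant
        values[name] = value
        previous = value
    return values
```

For example, the constants red, green = 5, blue would receive the values 0, 5 and 6.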

2.5. Versions

version_declaration <- version upper_case_identifier
                           declaration*
                       end

Conditionally compiles declarations if the upper_case_identifier has been declared on the command line. The upper_case_identifier is not part of the program's scope.

3. Statements

statements <- ( declaration_statement
              / assignment_statement
              / expression_statement
              / if_statement
              / for_statement
              / while_statement
              / version_statement)*

              ( break_statement
              / continue_statement
              / return_statement)?

Statements are executed in lexical order.

A terminating statement is the last statement in statements. It is either a return_statement or an if_statement with an else branch whose branches all have terminating statements.
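The definition is recursive, which a short Python sketch may make easier to follow. The tuple encoding of statements used here is hypothetical and exists only for this example: ("return",) is a return_statement, ("if", branches, else_branch) is an if_statement whose if and elif bodies are listed in branches, and any other tuple stands for a non-terminating statement.

```python
def is_terminating(statements):
    """Return True if the last statement of `statements` is a terminating statement."""
    if not statements:
        return False
    last = statements[-1]
    if last[0] == "return":
        return True
    if last[0] == "if":
        _, branches, else_branch = last
        if else_branch is None:
            return False  # without an else branch some path falls through
        return (all(is_terminating(branch) for branch in branches)
                and is_terminating(else_branch))
    return False
```

An if_statement without an else branch is never terminating under this reading, even when every present branch returns.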

3.1. Declaration statements

declaration_statement <- variables (equal expressions)?

equal                 <- '=' whitespace
variables             <- variable (comma variable)*
variable              <- type (lower_case_identifier / blank_identifier)

Declares variables in the current scope with the given lower_case_identifier as the name and type. A variable cannot have the same name as anything else in scope. blank_identifier can only be on the left hand side of Tuple assignments and anything assigned to them is thrown away.

variables are evaluated from left to right and each exists in scope only after it has been assigned an expression. Each variable must be used in a reference_expression. If the equal expressions part is omitted then all the variables must be named and are zeroed, otherwise:

A Balanced assignment consists of the same number of expressions on each side of the assignment_operator. The nth expression on the right hand side is assigned to the nth variable on the left hand side.

A Tuple assignment consists of a call_expression on the right hand side that returns multiple values and the nth variable on the left hand side is assigned the nth returned value.

The nth expression's type must match the nth variable's type or be inferable as the nth variable's type.

3.2. Assignment statements

assignment_statement <- expressions assignment_operator expressions
assignment_operator <- ([-+*/&|^%] / '<<' / '>>')? equal

assignment_statement works the same as declaration_statement except that:

3.3. Expression statements

expression_statement <- expression

The expression must be a call_expression that returns no values.

3.4. If statements

if_statement <- if (declaration_statement semicolon)? expression
                    statements
               (elif (declaration_statement semicolon)? expression
                    statements)*
               (else
                    statements)?
                end

semicolon <- ';' whitespace

Each expression is evaluated in lexical order until one is true. The statements following that expression are then executed. If all expressions are false and the else block is defined then its statements are executed.

Each declaration_statement is executed before the following expression. The declared variables are in the scope of all the following statements in the if_statement.

3.5. For statements

for_statement <- for declaration_statement semicolon expression semicolon assignment_statement
                     statements
                 end

The declaration_statement is executed before the first iteration. The declared variables are in the scope of all the following statements. The expression is the condition, which must be true for the statements to be executed. The assignment_statement is executed after each iteration. The break_statement skips execution of the assignment_statement.

3.6. While statements

while_statement <- while (declaration_statement semicolon)? expression
                       statements
                   end

The expression must be of type Bool. If the expression is true, then the statements are executed. After the statements have executed, the expression is evaluated again, and if true the statements are executed again. This continues until the expression is false. The declaration_statement is executed before the first iteration and the declared variables are in the same scope as the statements.

3.7. Break statements

break_statement <- break

The break_statement stops the execution of the innermost loop in which it is declared. Execution resumes after the innermost loop.

3.8. Continue statements

continue_statement <- continue

The continue_statement begins the next iteration of the innermost loop in which it is declared.

3.9. Return statements

return_statement <- return expressions?

Exits the current function and returns the expressions to the caller. The nth expression's type must either match the nth return type of the current function or be inferable as the nth return type of the current function.

3.10. Version statements

version_statement <- version upper_case_identifier
                         statements
                     end

Works the same way as Version Declarations but for statements instead.

4. Expressions

expressions <- expression (comma expression)*
expression  <- logical_or_expression

expressions are evaluated from left to right. logical_or_expression has the lowest precedence and postfix_expression has the highest precedence. Operators are left associative unless noted otherwise.

Reading an expression with a Pointerless type turns its type into an unqualified_type.

These expressions are addressable:

Expressions consisting of the following are considered constant expressions:

4.1. Logical or expressions

logical_or_expression <- logical_and_expression ('||' whitespace logical_and_expression)*

Both operands must be of type Bool. The type of the logical_or_expression is Bool. The right hand side is not executed if the left hand side is true. The logical_or_expression is true if either operand is true.

4.2. Logical and expressions

logical_and_expression <- relational_expression ('&&' whitespace relational_expression)*

Both operands must be of type Bool. The type of the logical_and_expression is Bool. The right hand side is not executed if the left hand side is false. The logical_and_expression is true if both operands are true.
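The short-circuit behaviour of || and && can be illustrated with Python's or and and, which skip the right hand side in the same way when applied to booleans. The helper operand and the list calls are hypothetical and exist only to record which operands were evaluated.

```python
calls = []

def operand(name, value):
    # Records that the operand was evaluated, then yields its Bool value.
    calls.append(name)
    return value

# '||': the right hand side is skipped because the left hand side is true.
or_result = operand("a", True) or operand("b", False)

# '&&': the right hand side is skipped because the left hand side is false.
and_result = operand("c", False) and operand("d", True)
```

After running this, calls holds only "a" and "c", since neither right hand side was evaluated.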

4.3. Relational expressions

relational_expression <- additive_expression (('==' / '!=' / '<=' / '>=' / '<' / '>') whitespace additive_expression)*

The comparison results in a Bool. The operators are == (equal), != (not equal), <= (less than or equal), >= (greater than or equal), < (less than) and > (greater than).

For == and != both operands must be of the same primitive, pointer or enumeration type.

For <=, >=, < and > both operands must be of the same number, pointer or enumeration type.

4.4. Additive expressions

additive_expression <- multiplicative_expression (('+' / '-' / '|' / '^') whitespace multiplicative_expression)*

If both operands are the same number type the type of the expression is the same as the operands. + (add) and - (subtract) apply to any number type. | (bitwise or) and ^ (bitwise xor) apply only to IXX and UXX.

If one multiplicative_expression is a pointer type, the other multiplicative_expression is a UXX type and the operator is either + or -, then the UXX value multiplied by the size of the base type is added to or subtracted from the pointer. The type of the additive expression is the same as the pointer type.
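The pointer arithmetic rule amounts to scaling the UXX operand by the size of the base type. A minimal Python sketch, treating addresses as plain integers and using a hypothetical function name, could look as follows. Address overflow is not modelled.

```python
def pointer_offset(address, count, base_type_size, operator="+"):
    """Return the address after moving `count` elements of `base_type_size` bytes."""
    if count < 0:
        raise ValueError("count models a UXX value and cannot be negative")
    delta = count * base_type_size  # UXX value * size of the base type
    return address + delta if operator == "+" else address - delta
```

For instance, adding 3 to a pointer at address 1000 whose base type is 4 bytes wide yields address 1012.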

If both operands are the same enumeration type then | and ^ apply to the underlying value resulting in the same type as the operands.

If one of the multiplicative_expressions is inferable as the other multiplicative_expression's type then it is inferred as that type.

4.5. Multiplicative expressions

multiplicative_expression <- prefix_expression (('*' / '/' / '%' / '&' / '<<' / '>>') whitespace prefix_expression)*

If both operands are the same number type the type of the expression is the same as the operands. * (multiply), / (divide), % (modulo), & (bitwise and), << (left shift) and >> (right shift) apply to IXX and UXX. *, / and % also apply to FXX.

If both operands are the same enumeration type then & applies to the underlying value resulting in the same type as the operands.

If one of the prefix_expressions is inferable as the other prefix_expression's type then it is inferred as that type.

4.6. Prefix expressions

prefix_expression <- not_expression
                   / negate_expression
                   / address_of_expression
                   / dereference_expression
                   / postfix_expression

Prefix operators are right associative.

4.6.1. Not expressions

not_expression <- '!' whitespace postfix_expression

Inverts a Bool.

4.6.2. Negate expressions

negate_expression <- '-' whitespace postfix_expression

Negates the value of IXX or FXX. The type of the expression is the same as the negated value.

4.6.3. Address of expressions

address_of_expression <- '#' whitespace postfix_expression

Takes the address of an addressable postfix_expression. The type of the address_of_expression is a pointer_type, with the postfix_expression's qualifiers, where the base type is the same type as the postfix_expression's type, with said qualifiers.

4.6.4. Dereference expressions

dereference_expression <- '@' whitespace postfix_expression

Dereferences a possibly qualified pointer_typed postfix_expression. The type of the dereference_expression is the same as the pointer_type's base type with said qualifiers.

4.7. Postfix expressions

postfix_expression <- primary_expression (field_access_expression / call_expression / array_access_expression)*

4.7.1. Field access expressions

field_access_expression <- dot lower_case_identifier
dot <- '.' whitespace

The accessed expression must be a compound or array type or a pointer to one. The lower_case_identifier must be the name of a field defined by the accessed expression's type. If the accessed type is a pointer then it is dereferenced. The type of the field_access_expression is the same as the field's type.

4.7.2. Call expressions

call_expression   <- left_parenthesis actual_parameters right_parenthesis
actual_parameters <- expressions?

If the expression being called is not a reference_expression to a function but has a function type then the nth actual parameter's type must match the nth formal_parameter's type and the type of the call is the return_types of the function type. Otherwise the expression is a reference_expression to a function and the type of the call is the return_types of the resolved function overload. The specific overload is resolved as follows:

With actual_type_parameters the overload resolves to the polymorphic function with the same name and number of formal_type_parameters as there are actual_type_parameters. The nth formal_type_parameter is substituted with the nth actual_type_parameter, after which the nth actual parameter's type must match the nth formal_parameter's type.

Without actual_type_parameters the overload resolves to:

  1. The monomorphic function with the same name and number of formal_parameters where the nth actual parameter's type matches the nth formal_parameter's type.
  2. The polymorphic function with the same name and number of formal_parameters where the nth expression's type matches the nth formal_parameter's type. The formal_type_parameters are inferred from the formal_parameters and actual_parameters as follows:
    • If the actual parameter's type is a primitive, enumeration or a monomorphic compound and the formal_parameter's type is a formal_type_parameter then that formal_type_parameter is inferred to be the actual parameter's type.
    • If the actual parameter's type and the formal_parameter's type match the same polymorphic compound then these rules are applied to the nth actual_type_parameter and the nth formal_type_parameter recursively.
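The two inference rules can be read as a small recursive procedure. The Python sketch below is one possible reading, not a definitive algorithm. Its representation of types is hypothetical: a str is a named type or a formal_type_parameter, and a tuple (compound_name, (argument_types...)) is a polymorphic compound. Pointers, arrays and qualifiers are ignored, and requiring repeated inferences of the same parameter to agree is an assumption that the rules above do not state.

```python
def infer(formal, actual, type_parameters, bindings):
    """Bind names in `type_parameters` by comparing a formal type with an actual type.

    Returns True on success and records the inferred types in `bindings`.
    """
    if isinstance(formal, str) and formal in type_parameters:
        # First rule: the actual type must be a primitive, enumeration
        # or monomorphic compound, all modelled here as plain names.
        if not isinstance(actual, str):
            return False
        # Assumption: a parameter inferred twice must get the same type.
        return bindings.setdefault(formal, actual) == actual
    if isinstance(formal, tuple) and isinstance(actual, tuple):
        # Second rule: same polymorphic compound, so recurse into the
        # nth type parameter of each side.
        (formal_name, formal_args), (actual_name, actual_args) = formal, actual
        if formal_name != actual_name or len(formal_args) != len(actual_args):
            return False
        return all(infer(f, a, type_parameters, bindings)
                   for f, a in zip(formal_args, actual_args))
    # No type parameter involved: the types simply have to be equal.
    return formal == actual
```

Calling infer with the formal type ("List", ("T",)), the actual type ("List", ("U8",)) and the parameter set {"T"} binds T to U8.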

The actual_parameters are passed to the called function by value.

4.7.3. Array access expressions

array_access_expression <- left_square_bracket expression right_square_bracket

left_square_bracket  <- '[' whitespace
right_square_bracket <- ']' whitespace

The indexed expression must be an array type or a pointer to an array type. If the indexed expression is a pointer to an array type it is implicitly dereferenced. The expression is the index into the array. The expression must be integer typed, enumeration typed or an integer_literal whose type is inferred as U32.

The result is the nth element, starting from 0, whose type is the same as the array's base type.

4.8. Primary expressions

primary_expression <- float_literal
                    / integer_literal
                    / boolean_literal
                    / unicode_code_point_literal
                    / utf8_string_literal
                    / compound_literal
                    / array_literal
                    / null_literal
                    / uninitialized_literal
                    / size_of_expression
                    / parenthesized_expression
                    / reference_expression
                    / enumeration_constant_access
                    / cast_expression
                    / blank_identifier

4.8.1. Floating point literals

float_literal <- [0-9]+ '.' [0-9]+ whitespace

The type of a float_literal is inferred. float_literals are not allowed to overflow their type.

4.8.2. Integer literals

integer_literal     <- (binary_integer / hexadecimal_integer / decimal_integer) whitespace

binary_integer      <- '0b' [01]+ ('_' [01]+)*
decimal_integer     <- [0-9]+ ('_' [0-9]+)*
hexadecimal_integer <- '0x' hexadecimal_digit+ ('_' hexadecimal_digit+)*
hexadecimal_digit   <- [0-9A-F]

The type of an integer_literal is inferred. The underscore doesn't change the value of the integer_literal. integer_literals are not allowed to overflow their type.
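As a rough illustration, the three literal forms and the overflow rule can be modelled in Python. The names below are hypothetical, the trailing whitespace rule is omitted, and the type is passed in explicitly because type inference is outside the scope of this sketch.

```python
import re

# Mirrors integer_literal without the trailing whitespace rule.
BINARY = re.compile(r"0b[01]+(_[01]+)*")
DECIMAL = re.compile(r"[0-9]+(_[0-9]+)*")
HEXADECIMAL = re.compile(r"0x[0-9A-F]+(_[0-9A-F]+)*")

# Inclusive ranges of the integer types.
RANGES = {f"I{b}": (-2 ** (b - 1), 2 ** (b - 1) - 1) for b in (8, 16, 32, 64)}
RANGES.update({f"U{b}": (0, 2 ** b - 1) for b in (8, 16, 32, 64)})

def integer_literal_value(text, type_name):
    """Return the value of `text`, or raise ValueError if it is malformed or overflows."""
    digits = text.replace("_", "")  # underscores do not change the value
    if BINARY.fullmatch(text):
        value = int(digits[2:], 2)
    elif HEXADECIMAL.fullmatch(text):
        value = int(digits[2:], 16)
    elif DECIMAL.fullmatch(text):
        value = int(digits, 10)
    else:
        raise ValueError(f"not an integer_literal: {text}")
    low, high = RANGES[type_name]
    if not low <= value <= high:
        raise ValueError(f"{text} overflows {type_name}")
    return value
```

With this sketch 1_000 as a U16 has the value 1000, while 256 as a U8 and the lower case 0xff are both rejected.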

4.8.3. Boolean literals

boolean_literal <- true / false

boolean_literals are of type Bool.

4.8.4. Unicode code point literals

unicode_code_point_literal <- "'" (unicode_escape_sequence / !("'" / invalid_code_point) .) "'" whitespace
                
unicode_escape_sequence    <- '\u' hexadecimal_digit
                                   hexadecimal_digit
                                   hexadecimal_digit
                                   hexadecimal_digit
                                 
                            / '\U' hexadecimal_digit
                                   hexadecimal_digit
                                   hexadecimal_digit
                                   hexadecimal_digit
                                 
                                   hexadecimal_digit
                                   hexadecimal_digit
                                   hexadecimal_digit
                                   hexadecimal_digit

The hexadecimal_digits in unicode_escape_sequence represent a valid Unicode Code Point. The unicode_code_point_literal is replaced with an integer_literal of the same value.
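The replacement of a unicode_code_point_literal by its integer value can be sketched in Python as follows. The function name is hypothetical, the check against invalid_code_point is omitted, and "valid" is taken to mean no greater than U+10FFFF, which is an assumption.

```python
import re

# \u followed by 4 hexadecimal_digits, or \U followed by 8.
ESCAPE = re.compile(r"\\u[0-9A-F]{4}|\\U[0-9A-F]{8}")

def code_point_value(literal):
    """Return the integer value of a single-quoted code point literal."""
    if len(literal) < 3 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("expected a single-quoted literal")
    body = literal[1:-1]
    if ESCAPE.fullmatch(body):
        value = int(body[2:], 16)
        if value > 0x10FFFF:  # assumption: upper bound of Unicode code points
            raise ValueError("escape is not a valid Unicode code point")
        return value
    if len(body) != 1 or body == "'":
        raise ValueError("expected exactly one code point")
    return ord(body)
```

Both the literal 'A' and the escaped form '\u0041' map to the integer 65.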

4.8.5. UTF-8 string literals

utf8_string_literal <- '"' (unicode_escape_sequence / !('"' / invalid_code_point) .)* '"' whitespace

utf8_string_literals are encoded as UTF-8 and stored in a readonly #[]U8.

4.8.6. Compound literals

compound_literal <- left_curly_bracket
                        field_initializer (comma field_initializer)*
                    right_curly_bracket

field_initializer <- lower_case_identifier equal expression

Initializes the specified fields of a compound type which is inferred. The field_initializer's lower_case_identifier is the name of the field to initialize. The expression is the value of the given field. The type of the expression must match the field's type or the expression must be inferable as the field's type.

Each field can have 0 or 1 field_initializers. Uninitialized fields have zeroed values.

4.8.7. Array literals

array_literal       <- left_curly_bracket 
                           element_initializer (comma element_initializer)*
                       right_curly_bracket
                        
element_initializer <- (left_square_bracket (integer_literal / enumeration_constant_access) right_square_bracket equal)? expression

left_curly_bracket  <- '{' whitespace
right_curly_bracket <- '}' whitespace

Initializes the specified elements of a fixed array whose type is inferred.

The integer_literal or enumeration_constant_access is an explicit index of the element, starting from 0, to initialize. The explicit index cannot underflow or overflow the size of the inferred fixed_array_type. The integer_literal is inferred as U32. If there is no explicit index then it is implicitly equal to the previous index + 1 or 0 if this is the first element_initializer. Each element can be initialized at most once. Elements without an initializer are zeroed.
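How explicit and implicit indices interact is easiest to see in a worked sketch. The Python function below is hypothetical: each element_initializer is reduced to its explicit index, or None when it has none, and length stands for the size of the inferred fixed_array_type.

```python
def resolve_indices(initializers, length):
    """initializers: explicit indices or None, in order; returns each resolved index."""
    resolved = []
    seen = set()
    previous = None
    for explicit in initializers:
        if explicit is not None:
            index = explicit
        elif previous is None:
            index = 0  # first element_initializer without an explicit index
        else:
            index = previous + 1
        if not 0 <= index < length:
            raise ValueError(f"index {index} is outside an array of {length} elements")
        if index in seen:
            raise ValueError(f"element {index} is initialized more than once")
        seen.add(index)
        resolved.append(index)
        previous = index
    return resolved
```

Four initializers of which only the third carries the explicit index 5 resolve to the indices 0, 1, 5 and 6 in an array of 8 elements.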

The expression is the value of the indexed element and its type is inferred as the array's base type.

4.8.8. Null literals

null_literal <- null

The value is what C dictates on the given platform. Only pointer typed variables can be null and the type of the null_literal is inferred.

4.8.9. Uninitialized literals

uninitialized_literal <- '---' whitespace

uninitialized_literal can only be used for declaring a variable and it must be the whole expression. It is either assigned to a variable or returned from a function. The actual value is undefined.

4.8.10. Size of expressions

size_of_expression <- sizeof left_parenthesis (type / expression) right_parenthesis

size_of_expression is substituted for an integer_literal of the size of a type or an expression's type in bytes.

4.8.11. Parenthesized expressions

parenthesized_expression <- left_parenthesis expression right_parenthesis

left_parenthesis  <- '(' whitespace
right_parenthesis <- ')' whitespace

The type and value of parenthesized_expression is the same as the expression.

4.8.12. Reference expressions

reference_expression <- lower_case_identifier actual_type_parameters?

reference_expression refers to a variable, formal_parameter or a function with the same name in scope. If the reference_expression refers to a function the type is inferred.

4.8.13. Enumeration constant access expression

enumeration_constant_access <- upper_case_identifier dot lower_case_identifier

upper_case_identifier is the enumeration type and lower_case_identifier is the enumeration_constant on said enumeration. The type of the expression is the same as the enumeration and the value of the expression is the same as the enumeration_constant.

4.8.14. Cast expressions

cast_expression <- type left_parenthesis expression right_parenthesis

Converts the value of expression to type. enumeration types act as their underlying type.

4.8.15. Blank identifiers

blank_identifier <- '_' whitespace

Used in place of reference_expression in declaration_statement and assignment_statement and only on the left hand side of the operator.

5. Types

types            <- type (comma type)*
type             <- type_qualifiers unqualified_type
unqualified_type <- (dynamic_array_type / fixed_array_type / pointer_type)*
                    (polymorphic_compound_type / named_type / function_type)

polymorphic_compound_type <- upper_case_identifier actual_type_parameters
named_type                <- upper_case_identifier

actual_type_parameters <- left_square_bracket types right_square_bracket

function_type <- 'Function' whitespace
                  left_parenthesis types? right_parenthesis
                  return_types?

Number types and Bool are considered primitive types and they are always in scope.

The type following a dynamic_array_type, fixed_array_type or pointer_type is called the base type.

5.1. Zero values

The zero value for:

5.2. Matching types

2 types match if their type_qualifiers are the same and their unqualified_types match which they do in the following cases:

5.3. Integer types

Type  Min     Max
I8    -2^7    2^7 - 1
I16   -2^15   2^15 - 1
I32   -2^31   2^31 - 1
I64   -2^63   2^63 - 1
U8    0       2^8 - 1
U16   0       2^16 - 1
U32   0       2^32 - 1
U64   0       2^64 - 1

The number in each type is the number of bits used to store its data. IXX is stored using Two's complement. IXX and UXX expressions besides literals may wrap around.
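Wrap around can be modelled by keeping only the low bits of a result and, for IXX, reinterpreting them as Two's complement. The Python helper below is a hypothetical sketch of that one behaviour. Since the text only says expressions may wrap around, an implementation is not required to behave exactly like this.

```python
def wrap(value, bits, signed):
    """Reduce `value` to the range of an IXX (signed) or UXX (unsigned) type of `bits` bits."""
    value &= (1 << bits) - 1  # keep the low `bits` bits
    if signed and value >= 1 << (bits - 1):
        value -= 1 << bits  # reinterpret as Two's complement
    return value
```

Under this model 255 + 1 wraps to 0 as a U8 and 127 + 1 wraps to -128 as an I8.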

5.4. Floating point types

The F32 and F64 types behave as binary32 and binary64 respectively as specified in IEEE 754.

5.5. Boolean types

Booleans are stored as a U8 and have the Bool type. Any non-zero value is true and zero is false.

5.6. Pointer types

pointer_type <- '#' whitespace

5.7. Qualifiers

type_qualifiers <- readonly? noalias?

A type is Pointerless if it doesn't have a pointer_type or dynamic_array_type and, if present, its polymorphic_compound_type or named_type doesn't refer to a compound with a field that is not Pointerless.

5.7.1. readonly

readonly qualified expressions cannot appear on the left hand side of assignment_operators.

5.7.2. noalias

formal_parameter types that are not Pointerless can be noalias qualified. It is assumed that data accessed in a noalias qualified formal_parameter is exclusively accessed through that formal_parameter.

5.8. Dynamic array types

dynamic_array_type <- left_square_bracket right_square_bracket

[]T behaves as:

structure Dynamic_Array[Base_Type] 
    #Base_Type elements
    U32 length
    U32 capacity
end

but without putting Dynamic_Array[Base_Type] in scope.

5.9. Fixed array types

fixed_array_type <- left_square_bracket integer_literal right_square_bracket

A fixed-size array of the base type with integer_literal elements, where integer_literal > 0, stored contiguously without padding between elements. Its U32 length field is equal to the integer_literal.