Owen specification 0.22.0

The specification is versioned using Semantic Versioning 2.0.0. The syntax is defined using a Parsing Expression Grammar (PEG). Files are encoded in UTF-8 and use the .owen extension.

The design guidelines are:

What remains unsaid is mostly left so intentionally, either because it is derivable from stated rules of the language, or because it would unnecessarily restrict the freedom of implementors.

1. Files

file <- whitespace
        namespace_directive
        use_directive*
        declaration*

namespace_directive <- namespace upper_case_identifier
use_directive <- use upper_case_identifier

declaration <- function_declaration
             / external_function_declaration
             / proposition_declaration
             / compound_declaration
             / enumeration_declaration
             / version_declaration

upper_case_identifier <- [A-Z][a-z0-9]* (_[A-Z][a-z0-9]*)* whitespace
lower_case_identifier <- !keyword [a-z][a-z0-9]* (_[a-z][a-z0-9]*)* whitespace
keyword <- namespace
         / use
         / public
         / external
         / function
         / input
         / output
         / end
         / if
         / elif
         / else
         / for
         / while
         / break
         / continue
         / structure
         / proposition
         / enumeration
         / of
         / sizeof
         / union
         / return
         / true
         / false
         / assert
         / null
         / generalize
         / version
        
namespace <- 'namespace' whitespace
use <- 'use' whitespace
public <- 'public' whitespace
external <- 'external' whitespace
function <- 'function' whitespace
input <- 'input' whitespace
output <- 'output' whitespace
end <- 'end' whitespace
if <- 'if' whitespace
elif <- 'elif' whitespace
else <- 'else' whitespace
for <- 'for' whitespace
while <- 'while' whitespace
break <- 'break' whitespace
continue <- 'continue' whitespace
structure <- 'structure' whitespace
proposition <- 'proposition' whitespace
enumeration <- 'enumeration' whitespace
of <- 'of' whitespace
sizeof <- 'sizeof' whitespace
union <- 'union' whitespace
return <- 'return' whitespace
true <- 'true' whitespace
false <- 'false' whitespace
assert <- 'assert' whitespace
null <- 'null' whitespace
generalize <- 'generalize' whitespace
version <- 'version' whitespace

whitespace <- (' ' / '\n' / comment)*
comment <- '//' (!'\n' .)* '\n'?

The namespace_directive specifies that all declarations in the file are in the given namespace. The namespace_directive and each use_directive put all the public declarations of the given namespace in scope of the file.
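The identifier rules of this section can be approximated with regular expressions. The following is a minimal Python sketch, not part of the specification. The function names are hypothetical, and the sketch assumes that trailing whitespace has already been stripped and that the keyword exclusion is meant to apply to whole identifiers.

```python
import re

# Keywords listed in the keyword rule of this section.
KEYWORDS = {
    "namespace", "use", "public", "external", "function", "input", "output",
    "end", "if", "elif", "else", "for", "while", "break", "continue",
    "structure", "proposition", "enumeration", "of", "sizeof", "union",
    "return", "true", "false", "assert", "null", "generalize", "version",
}

# Direct transcriptions of the two identifier rules, without the whitespace.
UPPER_CASE_IDENTIFIER = re.compile(r"[A-Z][a-z0-9]*(?:_[A-Z][a-z0-9]*)*")
LOWER_CASE_IDENTIFIER = re.compile(r"[a-z][a-z0-9]*(?:_[a-z][a-z0-9]*)*")


def is_upper_case_identifier(text: str) -> bool:
    return UPPER_CASE_IDENTIFIER.fullmatch(text) is not None


def is_lower_case_identifier(text: str) -> bool:
    # Assumption: a keyword is rejected only when it is the whole identifier.
    return text not in KEYWORDS and LOWER_CASE_IDENTIFIER.fullmatch(text) is not None
```

Under these assumptions Linked_List and I32 are upper_case_identifiers, read_file2 is a lower_case_identifier, and while is rejected because it is a keyword.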

2. Declarations

It is an error if multiple declarations in scope match.

Types match if they have the same name and the same number of generalized types.

Functions match if they have the same name and the same input types in the same order (a generalized type differs from a non-generalized type).
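The matching rule for functions can be pictured as a lookup keyed on the name and the ordered input types. The following Python sketch is an illustration only: find_conflicts is a hypothetical helper, types are represented as plain strings, and generalized types are left out.

```python
from collections import Counter


def find_conflicts(signatures):
    """Return the (name, input types) keys that are declared more than once."""
    counts = Counter((name, tuple(inputs)) for name, inputs in signatures)
    return sorted(key for key, count in counts.items() if count > 1)


declared = [
    ("add", ["I32", "I32"]),
    ("add", ["F32", "F32"]),    # different input types: a valid overload
    ("write", ["I32", "F32"]),
    ("write", ["F32", "I32"]),  # different order: a valid overload
    ("add", ["I32", "I32"]),    # same name and inputs as the first: an error
]

conflicts = find_conflicts(declared)
```

In this sketch only the two add declarations taking I32, I32 match each other, so conflicts holds that single key.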

2.1. Functions

function_declaration <- public? function_signature
                            statements
                        end

function_signature <- function lower_case_identifier
                          (generalize upper_case_identifiers)?
                          (input arguments)?
                          (output types)?

upper_case_identifiers <- upper_case_identifier (comma upper_case_identifier)*
arguments <- type lower_case_identifier (comma type lower_case_identifier)*

Declares a function named lower_case_identifier. input defines a list of arguments that a caller must pass to the function. The arguments are in the same scope as statements and must have unique names. Each input must be used in a reference_expression. Functions can be overloaded with a different order of input types. The output list gives the types of the values that the function returns, in the order they are listed.

Functions defining output must end with a terminating statement. A terminating statement is either a return_statement or an if_statement where the else branch is present and all branches end with terminating statements.
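The definition of a terminating statement is recursive, which a small checker makes explicit. The following Python sketch uses a hypothetical, heavily simplified statement model (Return, If and Other are illustrative names, not part of the specification).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Return:
    """Models a return_statement."""


@dataclass
class Other:
    """Models any statement that is neither a return nor an if statement."""


@dataclass
class If:
    branches: List[list]                # statement lists of the if and elif branches
    else_branch: Optional[list] = None  # None when the else branch is absent


def is_terminating(statement) -> bool:
    if isinstance(statement, Return):
        return True
    if isinstance(statement, If):
        if statement.else_branch is None:
            return False
        blocks = statement.branches + [statement.else_branch]
        return all(ends_with_terminating(block) for block in blocks)
    return False


def ends_with_terminating(statements) -> bool:
    # An empty block has no last statement and therefore does not terminate.
    return bool(statements) and is_terminating(statements[-1])
```

With this model an if_statement without an else branch never terminates, even if its only branch ends with a return_statement.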

2.1.1. The main function

The main function is the entry point of the program and must have one of the following function signatures:

function main
    output I32
    
function main
    input #[][]U8
    output I32

2.1.2. Foreign function interface

external_function_declaration <- public? external function_signature 
                                     utf8_string_literal
                                     utf8_string_literal

Defines a reference to a C function in a library. The first utf8_string_literal is the path to the library. The second utf8_string_literal is the function's name in said library. Foreign functions cannot return multiple values.

2.2. Propositions

proposition_declaration <- proposition
                               statements
                           end

Propositions are nameless functions that return no values. They are run before the main function if they are included.

2.3. Compounds

compound_declaration <- public? (structure / union) upper_case_identifier
                            (generalize upper_case_identifiers)?
                             field+
                        end

field <- type lower_case_identifier

structure fields are laid out in memory in the order they are lexically declared. Padding may be inserted between fields. The size of the structure is the sum of the sizes of its fields and padding.

union fields start at the same address. The size of the union is the size of the largest field.
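These layout rules resemble those of C structures and unions. As an analogy only, the following Python sketch uses ctypes to inspect a C layout on the host platform. It assumes, without the specification saying so, that an implementation would follow a comparable layout; Pair and Either are hypothetical names.

```python
import ctypes


class Pair(ctypes.Structure):
    # Fields are laid out in declaration order; padding may follow "tag".
    _fields_ = [("tag", ctypes.c_uint8), ("value", ctypes.c_uint32)]


class Either(ctypes.Union):
    # Every field starts at the same address.
    _fields_ = [("tag", ctypes.c_uint8), ("value", ctypes.c_uint32)]


field_bytes = ctypes.sizeof(ctypes.c_uint8) + ctypes.sizeof(ctypes.c_uint32)
padding_bytes = ctypes.sizeof(Pair) - field_bytes  # platform dependent

structure_offsets = (Pair.tag.offset, Pair.value.offset)
union_offsets = (Either.tag.offset, Either.value.offset)
```

The amount of padding depends on the platform, which is why the sketch computes it instead of assuming a number.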

field names must be unique within the compound_declaration and cannot be the same as the name of the compound_declaration.

The nth upper_case_identifier in upper_case_identifiers is replaced by the nth type in generalized_types.

2.4. Enumerations

enumeration_declaration <- public? enumeration upper_case_identifier of upper_case_identifier
                               enumeration_constant*
                           end

enumeration_constant <- lower_case_identifier (assign integer_literal)?

The first upper_case_identifier is the name of the enumeration. The last upper_case_identifier is the underlying type of the enumeration which must be an IXX or UXX.

Each enumeration_constant must have a unique lower_case_identifier within the declaration. If an enumeration_constant omits the integer_literal, then its value is the value of the previous constant + 1. If the first constant omits the integer_literal, then its value is 0. The value of every enumeration_constant must fit within the underlying type.
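The value rules can be traced with a short routine. The following Python sketch is illustrative: enumeration_values is a hypothetical helper, and the underlying type is described by its bit width and signedness.

```python
def enumeration_values(constants, bits, signed):
    """Assign values to (name, explicit value or None) pairs in declaration order."""
    low = -(1 << (bits - 1)) if signed else 0
    high = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    values = {}
    previous = -1  # so that a first constant without a literal becomes 0
    for name, explicit in constants:
        if name in values:
            raise ValueError(f"duplicate enumeration_constant {name}")
        value = previous + 1 if explicit is None else explicit
        if not low <= value <= high:
            raise ValueError(f"{name} = {value} does not fit the underlying type")
        values[name] = value
        previous = value
    return values


colors = enumeration_values([("red", None), ("green", 5), ("blue", None)], 8, False)
```

Here red takes 0 because it is first, green takes its explicit 5, and blue continues from green with 6. A constant following 255 in an enumeration of U8 is rejected because 256 does not fit.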

2.5. Versions

version_declaration <- version upper_case_identifier
                           declaration*
                       end

Conditionally compiles declarations if the upper_case_identifier has been declared on the command line. The upper_case_identifier is not part of the program's scope.

3. Statements

statements <- statement* block_ending_statement?
                
statement <- declaration_statement
           / assignment_statement
           / expression_statement
           / if_statement
           / for_statement
           / while_statement
           / assert_statement
           / version_statement
             
block_ending_statement <- break_statement 
                        / continue_statement
                        / return_statement

Statements are executed in lexical order.

3.1. Declaration statements

declaration_statement <- variable (comma variable)* '=' expressions 
variable <- type lower_case_identifier

Declares variables in the current scope with the given lower_case_identifier as the name and type as the type. A variable cannot have the same name as anything else in scope. Variables are evaluated from left to right, and each exists in scope only after it has been assigned an expression. Each variable must be used in a reference_expression.

A balanced assignment consists of the same number of expressions on each side of the assignment_operator. The nth expression on the right hand side is assigned to the nth variable on the left hand side.

A tuple assignment consists of a call_expression on the right hand side that returns multiple values. The nth variable on the left hand side is assigned the nth returned value.

Both operands' types must match, or the left hand side must be a pointer type and the right hand side must be null.

3.2. Assignment statements

assignment_statement <- expressions assignment_operator expressions
expressions <- expression (comma expression)*
assignment_operator <- ([-+*/&|^%] / '<<' / '>>')? '=' whitespace

assignment_statement works the same as declaration_statement except that:

Each expression on the left hand side must be addressable. reference_expression, field_access_expression, array_access_expression and dereference_expression are addressable.

x op= expression is equivalent to x = x op expression, except that x is evaluated only once.
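The single evaluation matters when x has side effects. Python's augmented assignment behaves in a comparable way, so it serves here as an analogy; the snippet below is Python, not Owen, and index is a hypothetical helper that records each time it is called.

```python
evaluations = []


def index():
    """Return the index 0 and record that the target was evaluated."""
    evaluations.append("index")
    return 0


values = [10]
values[index()] += 5                   # compound form: target evaluated once
once = (list(values), len(evaluations))

evaluations.clear()
values = [10]
values[index()] = values[index()] + 5  # naive expansion: evaluated twice
twice = (list(values), len(evaluations))
```

Both forms store the same result, but the compound form calls index only once.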

3.3. Expression statements

expression_statement <- expression

The expression must be a call_expression that returns no values.

3.4. If statements

if_statement <- if (declaration_statement semicolon)? expression
                    statements
               (elif (declaration_statement semicolon)? expression
                    statements)*
               (else
                    statements)?
                end

semicolon <- ';' whitespace

Each expression is evaluated in lexical order until one is true. The statements following that expression are then executed. If all expressions are false and the else block is defined, then its statements are executed.

Each declaration_statement is executed before the expression that follows it. The declared variables are in the scope of all the following statements in the if_statement.

3.5. For statements

for_statement <- for declaration_statement semicolon expression semicolon assignment_statement
                     statements
                 end

The declaration_statement is executed before the first iteration. The declared variables are in the scope of all the following statements. The expression is the condition that must be true for the statements to execute. The assignment_statement is executed after each iteration. The break_statement skips execution of the assignment_statement.
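One way to read these rules is as a rewriting into a while loop. The following Python sketch models that reading; run_for and its callback parameters are hypothetical, the scope is modelled as a dictionary, and continue is left out.

```python
def run_for(declare, condition, step, body):
    scope = declare()               # declaration_statement, executed once
    while condition(scope):         # expression, checked before each iteration
        if body(scope) == "break":  # statements; "break" models a break_statement
            break                   # the assignment_statement is skipped
        step(scope)                 # assignment_statement, after each iteration
    return scope


def declare():
    return {"i": 0, "sum": 0}


def body(scope):
    if scope["i"] == 3:
        return "break"
    scope["sum"] += scope["i"]


result = run_for(
    declare,
    lambda scope: scope["i"] < 10,
    lambda scope: scope.update(i=scope["i"] + 1),
    body,
)
```

The loop adds 0, 1 and 2 and then breaks with i equal to 3. Because the break skips the assignment_statement, i is not incremented to 4.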

3.6. While statements

while_statement <- while (declaration_statement semicolon)? expression
                       statements
                   end

The expression must be of type Bool. If the expression is true, then the statements are executed. After the statements have executed, the expression is evaluated again, and if it is true the statements are executed again. This continues until the expression is false. The declaration_statement is executed before the first iteration, and the declared variables are in the same scope as statements.

3.7. Break statements

break_statement <- break

The break_statement stops the execution of the innermost loop in which it is declared. Execution resumes after the innermost loop.

3.8. Continue statements

continue_statement <- continue

The continue_statement begins the next iteration of the innermost loop in which it is declared.

3.9. Return statements

return_statement <- return expressions?

Returns control to the function that called the one containing the return statement. If the function containing the return statement doesn't specify any output, then the statement cannot specify any expressions to return, and the function may omit the statement entirely, since in that case control is returned to the caller after the last statement. If output is defined, then all code paths must have a return statement where the nth expression's type matches the nth output type.

3.10. Assert statements

assert_statement <- assert expression

The expression must be of type Bool. If the expression is true, then nothing happens. If the expression is false, a description of the failing assertion is given. If the assertion is in a function, the program stops; if the assertion is in a proposition, the current proposition stops and the next one starts executing.
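The behaviour of a failing assertion inside a proposition can be modelled with a small runner. The following Python sketch is an illustration under assumptions: run_propositions and check are hypothetical names, propositions are modelled as plain callables, and the format of the failure description is invented for the example.

```python
def check(condition, description):
    """Models an assert_statement: fails with a description when false."""
    if not condition:
        raise AssertionError(description)


def run_propositions(propositions):
    """Run every proposition; a failed assertion stops only the current one."""
    failures = []
    for number, proposition in enumerate(propositions, start=1):
        try:
            proposition()
        except AssertionError as error:
            failures.append(f"proposition {number}: {error}")
    return failures


log = []


def first():
    check(1 + 1 == 3, "1 + 1 == 3")
    log.append("unreachable")  # skipped: the proposition stopped above


def second():
    check(2 * 2 == 4, "2 * 2 == 4")
    log.append("second ran")


failures = run_propositions([first, second])
```

The first proposition stops at its failing assertion, the failure is reported, and the second proposition still runs.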

3.11. Version statements

version_statement <- version upper_case_identifier
                         statements
                     end

Works the same way as a version_declaration, but for statements instead of declarations.

4. Expressions

expression <- logical_or_expression

logical_or_expression has the lowest precedence and postfix_expression has the highest precedence. Operators are left associative unless noted otherwise.

4.2. Logical or expressions

logical_or_expression <- logical_and_expression ('||' whitespace logical_and_expression)*

Both operands must be of type Bool. The type of the logical_or_expression is Bool. The right hand side is not evaluated if the left hand side is true. The logical_or_expression is true if either operand is true.

4.3. Logical and expressions

logical_and_expression <- relational_expression ('&&' whitespace relational_expression)*

Both operands must be of type Bool. The type of the logical_and_expression is Bool. The right hand side is not evaluated if the left hand side is false. The logical_and_expression is true if both operands are true.
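The short-circuit rules for || and && correspond to the behaviour of or and and in Python, which is used below as an analogy. The helper operand is hypothetical and records which operands were actually evaluated.

```python
evaluated = []


def operand(name, value):
    """Return value and record that this operand was evaluated."""
    evaluated.append(name)
    return value


# Left hand side is true: the right hand side of "or" is skipped.
or_result = operand("left", True) or operand("right", False)
or_trace = list(evaluated)

evaluated.clear()

# Left hand side is false: the right hand side of "and" is skipped.
and_result = operand("left", False) and operand("right", True)
and_trace = list(evaluated)
```

In both cases only the left operand is recorded, because the result is already decided by it.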

4.4. Relational expressions

relational_expression <- additive_expression (('==' / '!=' / '<=' / '>=' / '<' / '>') whitespace additive_expression)*

The comparison results in a Bool. The operators are == (equal), != (not equal), <= (less than or equal), >= (greater than or equal), < (less than) and > (greater than).

For == and != both operands must be of the same primitive, pointer or enumeration type.

For <=, >=, < and > both operands must be of the same number, pointer or enumeration type.

4.5. Additive expressions

additive_expression <- multiplicative_expression (('+' / '-' / '|' / '^') whitespace multiplicative_expression)*

If both operands are the same number type, the type of the expression is the same as the operands. + (add) and - (subtract) apply to any number type. | (bitwise or) and ^ (bitwise xor) apply only to IXX and UXX.

If the left hand expression is a pointer type, the right hand expression is a UXX type and the operator is either + or -, then the value of the right hand expression multiplied by the size of the structure in bytes is added to or subtracted from the pointer. The type of the additive expression is the same as the pointer type.
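The scaling in pointer arithmetic can be written out as plain integer arithmetic. The following Python sketch is illustrative: offset_pointer is a hypothetical helper, addresses are modelled as integers, and element_size stands for the size in bytes that the rule multiplies by.

```python
def offset_pointer(address, count, element_size, operator):
    """Return the address produced by pointer + count or pointer - count."""
    if count < 0:
        raise ValueError("the right hand operand is a UXX and cannot be negative")
    if operator not in ("+", "-"):
        raise ValueError("only + and - apply to pointers")
    delta = count * element_size  # scale the count by the size in bytes
    return address + delta if operator == "+" else address - delta


forward = offset_pointer(1000, 3, 8, "+")
backward = offset_pointer(1000, 3, 8, "-")
```

With a size of 8 bytes, adding 3 moves the address forward by 24 bytes and subtracting 3 moves it back by 24 bytes.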

If both operands are the same enumeration type, then | and ^ apply to the underlying value, resulting in the same type as the operands.

4.6. Multiplicative expressions

multiplicative_expression <- prefix_expression (('*' / '/' / '%' / '&' / '<<' / '>>') whitespace prefix_expression)*

If both operands are the same number type, the type of the expression is the same as the operands. * (multiply), / (divide), % (modulo), & (bitwise and), << (left shift) and >> (right shift) apply to IXX and UXX. *, / and % also apply to FXX.

If both operands are the same enumeration type, then & applies to the underlying value, resulting in the same type as the operands.

4.7. Prefix expressions

prefix_expression <- not_expression 
                   / negate_expression 
                   / address_of_expression 
                   / dereference_expression
                   / postfix_expression

Prefix operators are right associative.

4.7.1. Not expressions

not_expression <- '!' whitespace prefix_expression

Inverts a Bool.

4.7.2. Negate expressions

negate_expression <- '-' whitespace prefix_expression

Negates the value of IXX or FXX. The type of the expression is the same as the negated value.

4.7.3. Address of expressions

address_of_expression <- '#' whitespace prefix_expression

Takes the address of an addressable prefix_expression. The type of the address_of_expression is a pointer type that points to the type of the prefix_expression.

4.7.4. Dereference expressions

dereference_expression <- '@' whitespace prefix_expression

Dereferences the prefix_expression. The type of the dereference_expression is the type that the prefix_expression points to.

4.8. Postfix expressions

postfix_expression <- primary_expression (field_access_expression / call_expression / array_access_expression)*

4.8.1. Field access expressions

field_access_expression <- dot lower_case_identifier
dot <- '.' whitespace

The lower_case_identifier must be the name of a field. The type of the expression is the same as the field's type.

4.8.2. Call expressions

call_expression <- left_parenthesis expressions? right_parenthesis

The primary_expression being called must have a function type. The nth expression's type must match the nth input type.

If the primary_expression is a reference_expression to a function with generalized types that are not used by an argument, then all generalized types must be specified by the reference_expression's generalized_types. The reference_expression may specify generalized_types anyway.

4.8.3. Array access expressions

array_access_expression <- left_square_bracket expression right_square_bracket

left_square_bracket <- '[' whitespace
right_square_bracket <- ']' whitespace

The indexed expression must be an array type. The type of array_access_expression is a pointer to the element type. The expression must be an IXX. The value must be within 0 and the size of the dimension it is used to index into.

4.9. Primary expressions

primary_expression <- float_literal
                    / integer_literal
                    / boolean_literal
                    / unicode_code_point_literal
                    / utf8_string_literal
                    / array_literal
                    / compound_literal
                    / null_literal
                    / uninitialized_literal
                    / size_of_expression
                    / parenthesized_expression
                    / reference_expression
                    / enumeration_constant_access
                    / cast_expression

4.9.1. Floating point literals

float_literal <- '-'? [0-9]+ '.' [0-9]+

The F32 and F64 types behave as binary32 and binary64 respectively, as specified in IEEE 754. The type of a float_literal is inferred.

4.9.2. Integer literals

integer_literal <- '-'? (binary_integer / decimal_integer / hexadecimal_integer)

binary_integer <- '0b' [01]+ ('_' [01]+)*
decimal_integer <- [0-9]+ ('_' [0-9]+)*
hexadecimal_integer <- '0x' hexadecimal_digit+ ('_' hexadecimal_digit+)*
hexadecimal_digit <- [0-9A-F]

The type of an integer_literal is inferred. The underscore doesn't change the value of the integer_literal. IXX is stored using two's complement. IXX and UXX wrap around.

Type  Min    Max
I8    -2^7   2^7-1
I16   -2^15  2^15-1
I32   -2^31  2^31-1
I64   -2^63  2^63-1
U8    0      2^8-1
U16   0      2^16-1
U32   0      2^32-1
U64   0      2^64-1
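The ranges in the table and the wrap-around rule follow directly from the bit width. The following Python sketch shows one way to compute them; integer_range and wrap are hypothetical helper names, and Python's unbounded integers stand in for the mathematical values.

```python
def integer_range(bits, signed):
    """Return (min, max) of an IXX (signed) or UXX (unsigned) type."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def wrap(value, bits, signed):
    """Reduce value to the given width the way two's complement wraps."""
    value &= (1 << bits) - 1  # keep only the low bits
    if signed and value >= 1 << (bits - 1):
        value -= 1 << bits    # reinterpret the top bit as the sign
    return value


i8_range = integer_range(8, True)
u16_range = integer_range(16, False)
wrapped = [wrap(128, 8, True), wrap(256, 8, False), wrap(-1, 8, False)]
```

For example, one past the maximum of I8 wraps to the minimum of I8, and one below the minimum of U8 wraps to the maximum of U8.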

4.9.3. Boolean literals

boolean_literal <- true / false

boolean_literals are of type Bool.

4.9.4. Unicode code point literals

unicode_code_point_literal <- "'" (unicode_escape_sequence / !"'" .) "'"
                
unicode_escape_sequence <- '\\u' hexadecimal_digit hexadecimal_digit hexadecimal_digit hexadecimal_digit
                         / '\\U' hexadecimal_digit hexadecimal_digit hexadecimal_digit hexadecimal_digit
                                 hexadecimal_digit hexadecimal_digit hexadecimal_digit hexadecimal_digit

The hexadecimal_digits in a unicode_escape_sequence represent a Unicode code point stored in a U32.
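Decoding a unicode_escape_sequence amounts to reading its digits as a hexadecimal number. The following Python sketch is a hedged illustration: parse_unicode_escape is a hypothetical helper, and it accepts only the upper case digits that hexadecimal_digit allows.

```python
import re

# Backslash, then u with 4 digits or U with 8 digits, upper case hex only.
ESCAPE = re.compile(r"\\u([0-9A-F]{4})|\\U([0-9A-F]{8})")


def parse_unicode_escape(text):
    """Return the code point written as a unicode_escape_sequence."""
    match = ESCAPE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a unicode_escape_sequence: {text!r}")
    digits = match.group(1) or match.group(2)
    return int(digits, 16)


letter_a = parse_unicode_escape("\\u0041")
emoji = parse_unicode_escape("\\U0001F600")
```

The sketch does not check whether the resulting number is a valid Unicode code point, since the text above does not state such a rule.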

4.9.5. UTF-8 string literals

utf8_string_literal <- '"' (unicode_escape_sequence / !'"' .)* '"' whitespace

utf8_string_literals are encoded as UTF-8 and are syntactic sugar for []U8.

4.9.6. Array literals

array_literal <- type (left_square_bracket right_square_bracket)+
               / dimension+ elements

dimension <- left_square_bracket integer_literal? right_square_bracket
elements <- left_curly_bracket element (comma element)* right_curly_bracket
element <- elements / expressions

left_curly_bracket <- '{' whitespace
right_curly_bracket <- '}' whitespace

Dynamic arrays are defined by type and dimensions. Static arrays are defined by dimensions and elements. Each dimension's expression must fit within U32. Arrays have a length and pointer field.

4.9.7. Compound literals

compound_literal <- left_curly_bracket
                        field_initializer (comma field_initializer)*
                    right_curly_bracket

field_initializer <- lower_case_identifier equal expression

Initializes the specified fields of a compound type, which is inferred. The field_initializer's lower_case_identifier is the name of the field to initialize. The expression is the value of the given field. The type of the expression must match the field's type.

Each field may have at most one field_initializer. Uninitialized fields have undefined values.

4.9.8. Null literals

null_literal <- null

The value is what C dictates on the given platform. Only pointer typed variables can be null.

4.9.9. Uninitialized literals

uninitialized_literal <- '---' whitespace type

uninitialized_literal can only be used for declaring a variable and it must be the whole expression assigned to the variable. The actual value is undefined.

4.9.10. Size of expressions

size_of_expression <- sizeof left_parenthesis (type / expression) right_parenthesis

size_of_expression is substituted by a U32 holding the size in bytes of a type or of an expression's type.

4.9.11. Parenthesized expressions

parenthesized_expression <- left_parenthesis expression right_parenthesis

The type and value of a parenthesized_expression are the same as those of the expression.

4.9.12. Reference expressions

reference_expression <- lower_case_identifier generalized_types?

reference_expression refers to a variable, input or function with the same name in scope. If the reference_expression refers to a function the type is inferred.

4.9.13. Enumeration constant access expression

enumeration_constant_access <- upper_case_identifier dot lower_case_identifier

upper_case_identifier is the enumeration type and lower_case_identifier is the enumeration_constant on said enumeration. The type of the expression is the same as the enumeration, and the value of the expression is the same as the enumeration_constant.

4.9.14. Cast expressions

cast_expression <- type left_parenthesis expression right_parenthesis

Converts the value of expression to type.

When casting an integer (enumeration types act as their underlying type) type to another integer type:

When casting a FXX to FXX:

When casting from a FXX to an integer type, the fraction is discarded (truncation towards zero). If the value of the expression is NaN or infinite, or the truncated value doesn't fit in type, the result is undefined.
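The defined part of this cast is a truncation followed by a range check. The following Python sketch models it; cast_float_to_integer is a hypothetical helper, and it returns None to mark the cases that are left undefined, which is a choice made for the example only.

```python
import math


def cast_float_to_integer(value, bits, signed):
    """Truncate towards zero, or return None where the result is undefined."""
    if math.isnan(value) or math.isinf(value):
        return None
    truncated = math.trunc(value)  # discards the fraction
    low = -(1 << (bits - 1)) if signed else 0
    high = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    return truncated if low <= truncated <= high else None


positive = cast_float_to_integer(3.9, 32, True)
negative = cast_float_to_integer(-3.9, 32, True)
too_large = cast_float_to_integer(300.0, 8, False)
```

Truncation towards zero means that 3.9 becomes 3 and -3.9 becomes -3, not -4. The value 300.0 does not fit in a U8, so the sketch reports it as undefined.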

When casting from an integer type to a FXX the result is a FXX representing the same value as the expression.

When the expression is a reference_expression to a function, then type must be a function type which matches an overload in scope. The reference_expression is not allowed to specify generalized_types. The result is a reference to that specific overload.

5. Types

types <- type (comma type)*
type <- (array_type_of / pointer_type_of)* (upper_case_identifier generalized_types? / function_type)

array_type_of <- left_square_bracket right_square_bracket

pointer_type_of <- '#' whitespace

generalized_types <- left_angle_bracket types right_angle_bracket

function_type <- function
                     (input types)?
                     (output types)?

A type can be inferred from a cast_expression, call_expression, field_initializer, the nth output type, the type of the nth variable in a declaration_statement, the type of the nth expression in an assignment_statement or the other operand of a binary expression.