A New Language

About 8 months ago I was in the middle of writing an API in C#. The API took a Parsing Expression Grammar (PEG) as input and generated every valid string within some limits, such as a maximum number of repetitions specified by the caller. The idea was to use it for testing handwritten parsers. To improve performance, the API analyzed the PEG and used reflection to generate optimized code. That was also where I realized that in real world scenarios the tests would run for a really long time. I had mild success using Parallel.ForEach to squeeze out more performance. In ideal situations execution time was halved.

Out of curiosity I ported the single threaded generated code to C, which was nearly identical anyway. Single thread performance doubled. The reason was that in C# I used an array of characters, changed them as needed and then new'ed up a string from the array. That meant trillions of string constructions, garbage collecting them and so on. In C I just used the same array and poof, solid performance bump. Since then I have been obsessed with performance and efficiency.
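To make the buffer reuse concrete, here is a minimal C sketch of the idea. It is not the code the API generated: the names enumerate_strings, visit_fn and count_visit are made up for illustration, and it simply enumerates every fixed-length string over an alphabet instead of following a real PEG. The point is only that one buffer is overwritten in place, so no string is allocated per result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives the shared buffer for each string. */
typedef void (*visit_fn)(const char *text, void *context);

/* Fills buffer[position..length) with every combination of alphabet
   characters and calls visit once per completed string. The same buffer
   is overwritten in place, so nothing is allocated per string. */
static void enumerate_from(char *buffer, size_t position, size_t length,
                           const char *alphabet, visit_fn visit,
                           void *context)
{
    if (position == length) {
        buffer[length] = '\0';
        visit(buffer, context);
        return;
    }
    for (const char *c = alphabet; *c != '\0'; c++) {
        buffer[position] = *c;
        enumerate_from(buffer, position + 1, length, alphabet, visit,
                       context);
    }
}

/* Visits every string of exactly `length` characters over `alphabet`.
   `buffer` must hold at least length + 1 bytes. */
void enumerate_strings(char *buffer, size_t capacity, size_t length,
                       const char *alphabet, visit_fn visit,
                       void *context)
{
    assert(capacity > length);
    enumerate_from(buffer, 0, length, alphabet, visit, context);
}

/* Example visitor: counts the strings it is handed. */
void count_visit(const char *text, void *context)
{
    (void)text;
    *(size_t *)context += 1;
}
```

A caller would declare something like a char buffer[4], pass it with a length of 3 and an alphabet such as "ab", and hand in a visitor that checks each string against the parser under test. The visitor must not keep the pointer around, since the next string overwrites the same memory.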

I began looking for another programming language to use for my low level stuff. I already knew C... I like the power C has, it's just clunky to program in. C++, a waterfall of no. Then D... you get the point, I could not find something that satisfied my needs. I could not find any language with the syntax, semantics and tooling (compilation process, refactoring etc.) I wanted. Looking back I think it's weird that I had to see Jonathan Blow's video "Ideas about a new programming language for games" before realizing that implementing my own programming language was an option, considering that I had made several compilers from scratch before watching that video.

Kevlin Henney has definitely influenced my language design and my programming in general. His talk "Software Is Details" is excellent IMHO. Inspired by his talks I've tried to improve how I document unit tests and how long my lines are, at least compared to what I'm used to. One thing that has bugged me since day one of unit testing is naming tests: summary_of_test_description or Method_State_Expectation etc. IMHO these names become hard to read and maintain, not to mention that they often stray from the naming convention normally used in the code base. I chose to do what D does, but instead of having the unittest keyword I chose proposition:

// Detailed explanation of the behaviour being tested.
proposition
    // Asserting behaviour...

The proposition keyword avoids the awkward use of unittest for other kinds of tests, such as an integration test that says "Hey! I'm a unit test!" but isn't. A really nice thing about the D way is that unit tests have no names, which removes the need for writing documentation abstracts as function names. There is only a detailed explanation as a comment if needed.

I've also used line length as a guideline for designing Owen. It is often recommended that lines of text do not exceed 60 characters. Function signatures can get long quickly. In many languages you can split them into multiple lines, but that looks weird to me, especially when { comes into play. This is where the input and output keywords come in:

function test
    input i32 a,
          i32 b

    output i32,
           i32

    return a,
           b

In a C like language this would look like:

(i32, i32) test(i32 a, i32 b)
{
    return (a, b);
}

This is a trivial example, but add some generics and longer type names and the C like function signature will have different weather systems at each end of the line. Owen's solution comes at the cost of more lines, and I'm very curious whether that is a problem in practice. Note that input and output are not statements but part of the function signature.

The next big question is how memory is managed. I experimented with Rust since its ownership rules let it figure out at compile time when to allocate and free memory. I started programming a PL/0 compiler to get a sense of how to use it. While I did get the parser going, I could not shake the feeling that there was too much friction getting there. This could very well just have been me fighting the borrow checker, which seems to be common for beginners. Manual memory management it is then. However, it might be useful to have owned and borrowed pointers be distinct types that work the same way. The idea is that APIs can explicitly say who is in charge of the memory, but as in classic C it's your job to free it if you want to.
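Since Owen's pointer syntax isn't settled here, the following is only a rough approximation of the owned/borrowed idea in plain C. The type and function names (OwnedText, BorrowedText, text_copy and friends) are invented for this sketch. Both wrappers hold an ordinary pointer and behave the same at run time; the only difference is that a signature now says who is expected to free the memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper types: same representation, different meaning. */
typedef struct { char *pointer; } OwnedText;          /* holder frees it */
typedef struct { const char *pointer; } BorrowedText; /* holder must not */

/* Returns memory the caller now owns.
   The pointer is NULL if the allocation fails. */
OwnedText text_copy(BorrowedText source)
{
    assert(source.pointer != NULL);
    size_t size = strlen(source.pointer) + 1;
    OwnedText copy = { malloc(size) };
    if (copy.pointer != NULL)
        memcpy(copy.pointer, source.pointer, size);
    return copy;
}

/* Lends the text out without handing over responsibility for it. */
BorrowedText text_borrow(OwnedText text)
{
    BorrowedText borrowed = { text.pointer };
    return borrowed;
}

/* Only reads the text; the signature shows it never takes it over. */
size_t text_length(BorrowedText text)
{
    return strlen(text.pointer);
}

/* Freeing stays manual, as in classic C. */
void text_free(OwnedText text)
{
    free(text.pointer);
}
```

Passing an OwnedText where a BorrowedText is expected is a compile error, so lending has to be spelled out with text_borrow. Nothing stops you from forgetting text_free or from using a borrowed pointer after the owner freed it; the types only document intent, which matches the "it's your job" spirit above.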