osa1

My thoughts on OCaml

April 24, 2023 - Tagged as: en, plt, ocaml.

Since 2013 I’ve had the chance to use OCaml a few times in different jobs, and I got frustrated and disappointed every time I had to use it. I just don’t enjoy writing OCaml.

In this post I want to summarize some of the reasons why I don’t like OCaml and why I wouldn’t choose it for a new project today.

No standard and easy way of implementing interfaces

To me it’s absolutely essential for a language to have some way of defining interfaces, implementing those interfaces for types, and programming against those interfaces.

In Haskell, this is done with typeclasses. Rust has a similar mechanism called traits. In languages with classes this is often done with abstract classes that new classes “implement” (e.g. with the implements keyword in Dart).

In OCaml there’s no such mechanism. I have to pass functions explicitly along with my values: bundled in a product type (such as a record of functions), through a functor, or as an extra argument.

Regardless of how I work around this limitation, it’s extremely inconvenient. Things that should be trivial in any code base, such as converting a value to a string for debugging, become a chore, and are sometimes even impossible.
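To make the workarounds concrete, here is a minimal sketch of the three options for a debugging-style “show” function. The SHOW signature and the Show_* module names are hypothetical, invented for illustration; nothing like them ships in the standard library.

```ocaml
(* A hypothetical "show" interface, encoded as a module signature. *)
module type SHOW = sig
  type t
  val show : t -> string
end

(* Instances are written and named by hand, one module per type. *)
module Show_int : SHOW with type t = int = struct
  type t = int
  let show = string_of_int
end

module Show_bool : SHOW with type t = bool = struct
  type t = bool
  let show = string_of_bool
end

(* Option 1: a functor that builds a list instance from an element instance. *)
module Show_list (S : SHOW) : SHOW with type t = S.t list = struct
  type t = S.t list
  let show xs = "[" ^ String.concat "; " (List.map S.show xs) ^ "]"
end

module Show_int_list = Show_list (Show_int)

(* Option 2: pass the functions explicitly as extra arguments. *)
let show_pair show_a show_b (a, b) = "(" ^ show_a a ^ ", " ^ show_b b ^ ")"

(* Option 3: pass the whole instance as a first-class module. *)
let show_with (type a) (module S : SHOW with type t = a) (x : a) = S.show x

let () =
  print_endline (Show_int_list.show [ 1; 2; 3 ]);
  print_endline (show_pair Show_int.show Show_bool.show (1, true));
  print_endline (show_with (module Show_int) 42)
```

In every variant the caller has to pick the instance and pass it along by hand; nothing selects it based on the type of the value.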

As far as I know, there was at least one attempt at ameliorating this with modular implicits (implicit parameter passing), but I don’t know what has happened to it since 2017. It still doesn’t seem to be part of the language, and the standard library doesn’t use it.

Bad standard library

OCaml’s standard library is just bizarre. It has lots of small issues and a few larger ones, and it’s extremely painful to use.

Some examples of the issues:

- A zoo of printing/debugging and conversion functions, such as string_of_int, string_of_float, print_char, Int64.of_int, …
- Overly polymorphic operators with type 'a -> 'a -> bool, such as = (called “structural equality”; it raises an exception if you pass it a function) and >. Code that uses these operators may not behave as expected on user-defined types.
- Standard types are sometimes persistent and sometimes mutable: List, Map, and Set are persistent, while Stack and Hashtbl are mutable.
- Inconsistent naming:
  - The length function for Map is cardinal, but the length function for Hashtbl is length.
  - The “bytes” type is Bytes.t, but the big int type is Big_int.big_int (instead of Big_int.t). The functions in these modules are also named inconsistently: Big_int functions are suffixed with _big_int, while Bytes functions have no prefix or suffix.
- The regex module (Str) uses global state: string_match runs a regex and stores the match in global state, and matched_string returns the last matched string from that state.
- Missing widely used operations, such as popcount for integer types and Unicode character operations.
- No proper string and character types: string is a sequence of bytes, and char is a single byte.
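A few of these points are easy to check directly. The sketch below (IntSet and the binding names are illustrative) shows = returning false for two sets with the same elements, because it compares the internal balanced trees rather than the elements; = raising on functional values; and String.length counting bytes rather than characters.

```ocaml
module IntSet = Set.Make (Int)

(* Same elements, inserted in a different order. *)
let s1 = IntSet.(empty |> add 1 |> add 2 |> add 3)
let s2 = IntSet.(empty |> add 2 |> add 1 |> add 3)

(* Set.equal compares elements; polymorphic = compares the internal tree
   shapes, which differ here because of the insertion order. *)
let same_elements = IntSet.equal s1 s2
let structurally_equal = s1 = s2

(* Polymorphic = raises Invalid_argument on functional values. *)
let equality_raises_on_functions =
  try
    ignore ((fun (x : int) -> x + 1) = (fun x -> x + 2));
    false
  with Invalid_argument _ -> true

(* A string is a sequence of bytes: "\u{fc}" is a single character,
   but its UTF-8 encoding takes two bytes. *)
let umlaut_length = String.length "\u{fc}"
```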

The poor state of OCaml’s standard library has also fragmented the ecosystem, with two competing alternatives: Core and Batteries.

Syntax problems

OCaml doesn’t have a single-line comment syntax.

The expression syntax has just too many issues. It’s inconsistent in how it uses delimiters: for and while end with done, but let, if, match, and try have no closing delimiter, even though the right-most non-terminal (<expr>) is the same in all of these productions:

expr ::= ...
      | while <expr> do <expr> done
      | for <value-name> = <expr> ( to | downto ) <expr> do <expr> done
      | let <let-binding> in <expr>
      | if <expr> then <expr> [ else <expr> ]
      | match <expr> with (| <pattern> [ when <expr> ] -> <expr>)+
      | try <expr> with (| <pattern> [ when <expr> ] -> <expr>)+
      ...

It has for and while, but no break or continue. So you use exceptions instead: a try inside the loop body for continue, and a try around the loop for break.
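A minimal sketch of that pattern, using two hypothetical exceptions, Break and Continue, in place of the missing keywords. The example function sums the odd elements of an array and stops at the first negative one.

```ocaml
(* Hypothetical exceptions standing in for the missing keywords. *)
exception Break
exception Continue

(* Sum the odd elements of [arr], stopping at the first negative element. *)
let sum_odd_until_negative (arr : int array) : int =
  let total = ref 0 in
  (try
     for i = 0 to Array.length arr - 1 do
       (* The inner try, inside the loop body, plays the role of continue. *)
       try
         if arr.(i) < 0 then raise Break;
         if arr.(i) mod 2 = 0 then raise Continue;
         total := !total + arr.(i)
       with Continue -> ()
     done
   (* The outer try, around the whole loop, plays the role of break. *)
   with Break -> ());
  !total
```

Since OCaml 4.04 the exceptions can also be kept local to the function with let exception Break in ….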

It also has lots of ambiguities, and some of them are resolved in unintuitive ways. Besides making OCaml difficult to parse correctly, this can cause people to misread the code.

The most common example is probably a try expression nested inside a match:

match e0 with
| p1 -> try e1 with p2 -> e2
| p3 -> e3

Here p3 -> e3 is parsed as a second handler of the try expression, not as a branch of the outer match.
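The usual fix is to parenthesize the inner try (or wrap it in begin … end). A small sketch; parse_or_default is a made-up function for illustration:

```ocaml
(* Parse an optional string as an int: -1 if it is not a number, 0 if absent.
   The parentheses end the try, so | None -> 0 is a branch of the match. *)
let parse_or_default (s : string option) : int =
  match s with
  | Some str -> (try int_of_string str with Failure _ -> -1)
  | None -> 0
```

Without the parentheses, | None -> 0 would be read as a handler of the try, and the code would be rejected because None is not an exception constructor.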

Another example is the interaction between the sequencing syntax <expr> ; <expr> and productions that have <expr> as the right-most symbol:

let test1 b =
  if b then
    print_string "1"
  else
    print_string "2"; print_string "3"

Here print_string "3" is not a part of the if expression, so this function always prints “3”.

However, even though match also has <expr> as the right-most symbol, it interacts differently with the semicolon:

let test2 b =
  match b with
  | true -> print_string "1"
  | false -> print_string "2"; print_string "3"

Here print_string "3" is a part of the false -> ... branch.
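To get the other grouping in each case you need explicit delimiters: begin … end (or parentheses) around the else branch, and parentheses around the match. A sketch of both fixes; the functions write to a Buffer.t instead of stdout only so that the result is easy to check, and the names are illustrative.

```ocaml
(* test1 with "3" moved inside the else branch. *)
let test1_fixed buf b =
  if b then Buffer.add_string buf "1"
  else begin
    Buffer.add_string buf "2";
    Buffer.add_string buf "3"
  end

(* test2 with "3" moved out of the false branch, so it always runs. *)
let test2_fixed buf b =
  (match b with
   | true -> Buffer.add_string buf "1"
   | false -> Buffer.add_string buf "2");
  Buffer.add_string buf "3"

(* Run [f] with a fresh buffer and return what it wrote. *)
let capture (f : Buffer.t -> bool -> unit) (b : bool) : string =
  let buf = Buffer.create 8 in
  f buf b;
  Buffer.contents buf
```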

Try to guess how these functions are parsed:

(* Is the last print part of `else` or not? *)
let test3 b =
  if b then
    print_string "1"
  else
    let x = "2" in
    print_string x;
    print_string "3"

(* Is this well-typed? *)
let test4 b =
  if b then
    1, 2
  else
    3, 4

(* Is the type of this `(int * int) array -> unit` or `int array -> unit * int`? *)
let test5 a = a.(0) <- 1, 2

(* What if I replace `,` with `;`? Does this set the element 1 or 2? *)
let test6 a = a.(0) <- 1; 2

When writing OCaml you have to keep these rules in mind.
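For reference, here is how these parse according to the precedence table in the OCaml manual, where the comma binds tighter than <-, which binds tighter than if, which binds tighter than the semicolon, while let extends as far to the right as possible. In test3 the last print is inside the let, and therefore inside the else branch. The other three can be checked by compiling them:

```ocaml
(* Well-typed: "," binds tighter than if/then/else, so each branch is a
   whole tuple and the type is bool -> int * int. *)
let test4 b =
  if b then
    1, 2
  else
    3, 4

(* Parsed as a.(0) <- (1, 2): the type is (int * int) array -> unit. *)
let test5 a = a.(0) <- 1, 2

(* Parsed as (a.(0) <- 1); 2: it stores 1 and returns 2,
   so the type is int array -> int. *)
let test6 a = a.(0) <- 1; 2
```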

It also has the “dangling else” problem:

(* Is `else` part of the inner `if` or the outer? *)
if e1 then if e2 then e3 else e4
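As in most languages with this syntax, OCaml attaches the else to the nearest if. A small sketch; note and observe are made-up helpers that record which branch ran:

```ocaml
(* Records which branch ran, so the parse can be observed. *)
let note (buf : Buffer.t) (s : string) = Buffer.add_string buf s

(* The else belongs to the inner if: when a is false, nothing runs. *)
let dangling buf a b =
  if a then if b then note buf "e3" else note buf "e4"

(* Parentheses attach the else to the outer if instead. *)
let outer buf a b =
  if a then (if b then note buf "e3") else note buf "e4"

(* Run [f] with a fresh buffer and return what it wrote. *)
let observe (f : Buffer.t -> bool -> bool -> unit) (a : bool) (b : bool) : string =
  let buf = Buffer.create 4 in
  f buf a b;
  Buffer.contents buf
```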

Finally, probably the strangest thing about OCaml’s syntax, and something I’m not sure I fully understand (I can’t find anything relevant in the language documentation), is that comments in OCaml are somehow tokenized, and those tokens need to be terminated. They can be terminated inside another comment, or even outside of it. This is a bit difficult to explain, but here’s a simple example:

(* " *)
print_string "hi"

OCaml 5.0.0 rejects this program with this error:

File "./test.ml", line 2, characters 16-17:
2 | print_string "hi"
                    ^
  String literal begins here

From the error message it seems that the " in the comment line actually starts a string literal, which is terminated by the first quote of "hi". The closing double quote of "hi" thus starts another string literal, which is never terminated.

However, that doesn’t explain why this works:

(* " *)
print_string "hi"
(* " *)
print_string "bye"

If my explanation of the previous version were correct, this would fail with an unbound hi variable, but it works and prints “bye”!
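A likely explanation that covers both programs: inside a comment, the lexer recognizes string literals, so that commenting out code containing a string such as "*)" does not end the comment early. In the second program the comment opened on line 1 is therefore still open when hi appears; the string literal started by the closing quote of "hi" ends at the quote on line 3, and the *) right after that quote closes the comment. Only print_string "bye" is left outside. The sketch below relies on the same rule in its intended direction (the binding names are illustrative):

```ocaml
(* The next two lines are commented out. Inside the comment, "*)" is read
   as a string literal, so it does not end the comment early.
let closer = "*)"
let broken = closer ^
*)
let closer = "outside the comment"
```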

Rest of the package is also not that good

I’m not following developments in the OCaml ecosystem too closely, but just two years ago it was common to use Makefiles to build OCaml projects. The language server barely worked on a project of less than 50 kloc. There was no standard way of doing compile-time metaprogramming, and some projects even used the C preprocessor (cpp).

Some of these things probably improved in the meantime, but the overall package is still not good enough compared to the alternatives.

But at least it’s a functional language?

Almost all modern statically typed languages have closures, higher-order functions/methods, lazy streams, and combinators that run efficiently. Persistent/immutable data structures can be implemented even in C.

Also, OCaml doesn’t track side effects in types (as Haskell does), and the language and the standard library have lots of features and functions that rely on mutation, such as the array update syntax, mutable record fields, Hashtbl, and the regex module.

The only thing that makes OCaml more “functional” than, e.g., Dart, Java, or Rust is that it guarantees tail-call elimination. While tail calls are important for functional programming, I would happily give them up if that meant not having the problems listed above.

Also keep in mind that when you mix imperative and functional styles, tail calls become less important. For example, in Dart I don’t have to implement a stream map function with a tail call to map the rest of the stream; I can just use a while or for loop.
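The same trade-off in OCaml terms: a sketch of one computation written once with a tail call and once with a while loop over a mutable reference. Both run in constant stack space even on a million-element list; the function names are illustrative.

```ocaml
(* Tail-recursive: the recursive call to [go] is in tail position,
   so OCaml reuses the stack frame instead of growing the stack. *)
let sum_rec (xs : int list) : int =
  let rec go acc = function
    | [] -> acc
    | x :: rest -> go (acc + x) rest
  in
  go 0 xs

(* The same computation written imperatively, with no recursion at all. *)
let sum_loop (xs : int list) : int =
  let acc = ref 0 in
  let rest = ref xs in
  while !rest <> [] do
    acc := !acc + List.hd !rest;
    rest := List.tl !rest
  done;
  !acc
```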

When should I use it?

In my opinion there is no reason to use OCaml in a new project in 2023. If you think OCaml is the best choice for a new project, please let me know your use case; I’m genuinely curious.
