My first impressions with OCaml
March 12, 2022 on György Kurucz's blog

I currently am on a quest to find a generally useful functional language that I could experiment with on various projects.

While this article specifically is about OCaml, I have plans for trying out other languages as well, so first I will talk in general about my criteria for evaluating functional languages.

How I am evaluating functional languages

I don’t consider this list to be complete in any way, and expect it to be refined and extended as I learn more and try out new languages. Also, a large part of this can be applied to any software platform in general, not just functional programming languages.

Tooling

In general, a single tool for a purpose that’s “blessed” and endorsed by the ecosystem (or is at least a de facto standard) is a plus. Fragmentation of the ecosystem doesn’t really seem useful in this context.

Build system

Incremental compilation and good performance in general are nice to have. The build system also shouldn’t be overly difficult to learn and use.

Formatter

Ever since I tried prettier I have been hooked on the idea of not having to care at all about the formatting of my source code. I think having such a formatter is even more useful for functional languages, as deeply nested expressions can be especially tricky to format by hand.

Editor support

Type systems can be used to provide all kinds of useful editor features like smarter syntax highlighting, type annotations/queries, and context-aware autocompletion. All of these are a plus for me.

Runtime environment

One feature I would value quite a lot, which might sound surprising at first, is the ability to target JavaScript/WASM. As it stands right now, the web is by far the largest and most commonly available software platform, despite all of its shortcomings. Being able to deploy my code onto it can be quite valuable.

For server use I don’t have any requirements, as with today’s container technology you can deploy pretty much anything anywhere.

Learnability

This criterion might even sound too obvious written out like this, but I want the language to be learnable with a reasonable amount of effort. By this I don’t mean that the language shouldn’t be complex, but there should be some resource, or collection of resources, that can give you a comprehensive understanding of the whole language.

Having a language specification is of course the best, but I am willing to compromise with a collection of other resources as well, as long as that’s a reasonable substitute for a specification. (For example, if I have a specific question about how some part of the language works, there should be a way to get an answer to that with a reasonable amount of effort.)

Functional purity

While purity is certainly useful, you eventually do have to interface with the real world, and have some side effects in one way or another.

I already use pure functions as a generally useful tool in basically any language, and have never felt the need for the language to enforce that. Nevertheless, my impression is that the methods for describing side effects in pure languages are quite mature too, and I wouldn’t mind using such a language either.

Type system

I would like a statically typed language. Not that I take any issue with dynamic typing, but I already have my default go-to language for that (Python), and various alternatives as well (JavaScript, Lua, even any kind of LISP).

The issue is that beyond a certain level of complexity, you are bound to make mistakes in a dynamic language, and a type system can effectively alleviate large classes of them. While I agree that static typing can also be a source of mental overhead, and it’s sometimes more convenient to just program without strong types, I want the option to ask the compiler for help with checking my work whenever I feel like I need it. I am just a human after all, and bound to eventually make mistakes.

I am open to trying various type systems, though some features I probably want are algebraic data types, and some method of generalizing over types. But stronger features like higher kinded types, or even dependent types are a plus too.
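To make the two baseline features concrete, here is a minimal sketch in OCaml (the language this article is about) of an algebraic data type that is also generic over its element type. The `tree` type and the helper names are made up purely for illustration.

```ocaml
(* A sum type (algebraic data type) parameterized over its element type. *)
type 'a tree =
  | Leaf
  | Node of 'a tree * 'a * 'a tree

(* Works for any element type: the inferred type is 'a tree -> int. *)
let rec size = function
  | Leaf -> 0
  | Node (left, _, right) -> size left + 1 + size right

(* In-order traversal into a list; inferred as 'a tree -> 'a list. *)
let rec to_list = function
  | Leaf -> []
  | Node (left, x, right) -> to_list left @ (x :: to_list right)

let example = Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Leaf))
```

Neither function carries a type annotation, yet both are checked statically and work for trees of any element type.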

My first impressions with OCaml

So with all that said, I will give you my first impressions of OCaml.

It has a pretty mature community, a long history, and there are large companies (Facebook[1], for instance) using it and helping with the development of the ecosystem.

The package management seems quite good; opam is the de facto standard package manager. I really like that the opam repository is curated, and that the compiler developers actually use it as a giant test suite for the compiler. Submitting a package requires a pull request in opam-repository, so every package publication is reviewed. The compiler, package manager, package repository, and the build system all live under the same GitHub organization.

The formatter, ocamlformat, is a relatively new addition to the ecosystem, but it seems to work very well. I haven’t had any problems with it.

The editor support seems good as well. I only tried the plugin for VS Code for ease of installation, but I assume that it would work well with any LSP capable editor.

As for actually running your programs, OCaml compiles natively to most mainstream processor architectures. There are also multiple solutions for compiling it to JavaScript, namely js_of_ocaml and ReScript. js_of_ocaml is intended for OCaml developers who want to compile their code to JavaScript, and it integrates with the rest of the ecosystem. ReScript is targeted more at JavaScript developers seeking better type safety; it comes with an alternative JavaScript-like syntax for OCaml, but still has support for the regular syntax. (Note that js_of_ocaml can also be used with a JavaScript-like syntax, although as far as I can tell that is Reason, the syntax ReScript grew out of, rather than ReScript’s own.) The ReScript syntax can be useful for using React as it supports JSX.

While I read from multiple sources that the language was lacking good learning materials, I had a very good experience with the Real World OCaml book. There is a sort of language reference available as well, but the exact type checking and inference rules seem to be scattered across various research papers and university course notes. Nevertheless, because of the academic background, I have confidence in the soundness of the type system.

The language is not pure: you can have side effects, mutable values, and global variables. The language makes it explicit which values are mutable, but not which functions do and do not have side effects.
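A small sketch of what this looks like in practice, using only the standard library. All the names here (`counter`, `account`, `add_and_log`, and so on) are invented for the example.

```ocaml
(* Mutability is visible in types: a ref cell and a mutable record field. *)
let counter : int ref = ref 0

type account = { owner : string; mutable balance : int }

(* This function mutates global state and prints, yet its type is simply
   int -> int, the same as a pure function's. *)
let add_and_log (n : int) : int =
  counter := !counter + 1;
  print_endline ("called with " ^ string_of_int n);
  n + 1

(* A pure function with the identical type int -> int. *)
let succ_pure (n : int) : int = n + 1

(* Only fields declared mutable can be assigned to. *)
let deposit (acc : account) (amount : int) : unit =
  acc.balance <- acc.balance + amount
```

The `int ref` and `mutable` annotations show up in the types, but nothing in the signature distinguishes `add_and_log` from `succ_pure`.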

The type system is okay: you get amazing type inference and GADTs[2], but no higher kinded types (though you can supposedly emulate them with modules). The module system is powerful, but also pretty weird: there is a distinction between first-class and non-first-class modules, with explicit conversions between them, an artifact of first-class modules having been added to the language later. One annoying thing is that the language calls parameterized modules functors, which don’t really have anything to do with actual functors[3].
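The following is a rough sketch of these features side by side, written against the standard library only. The `expr` type, the `MAPPABLE` and `SHOW` signatures, and every module name are hypothetical and exist just for this illustration. The signature-plus-functor part is one common way to approximate abstraction over a type constructor, not the only one.

```ocaml
(* A GADT: the type parameter records the type each expression evaluates to. *)
type _ expr =
  | Int : int -> int expr
  | Bool : bool -> bool expr
  | Add : int expr * int expr -> int expr
  | If : bool expr * 'a expr * 'a expr -> 'a expr

(* The locally abstract type lets each branch return a different type. *)
let rec eval : type a. a expr -> a = function
  | Int n -> n
  | Bool b -> b
  | Add (x, y) -> eval x + eval y
  | If (c, t, e) -> if eval c then eval t else eval e

(* Emulating a higher-kinded abstraction: a signature over a type constructor. *)
module type MAPPABLE = sig
  type 'a t
  val map : ('a -> 'b) -> 'a t -> 'b t
end

module ListM : MAPPABLE with type 'a t = 'a list = struct
  type 'a t = 'a list
  let map = List.map
end

module OptionM : MAPPABLE with type 'a t = 'a option = struct
  type 'a t = 'a option
  let map = Option.map
end

(* A functor in the OCaml sense: a module parameterized by another module. *)
module Increment (M : MAPPABLE) = struct
  let incr_all (xs : int M.t) : int M.t = M.map (fun x -> x + 1) xs
end

module IncrList = Increment (ListM)
module IncrOption = Increment (OptionM)

(* First-class modules: pack a module into a value, unpack it explicitly. *)
module type SHOW = sig
  type t
  val show : t -> string
end

let show_int = (module struct
  type t = int
  let show = string_of_int
end : SHOW with type t = int)

let show_with (type a) (m : (module SHOW with type t = a)) (x : a) : string =
  let module S = (val m) in
  S.show x
```

Note the explicit `(module ...)` and `(val ...)` conversions in the last part, and that `Increment` has to be applied to a module once per container type before it can be used.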

Metaprogramming is done with the so-called ppx system, which can apply arbitrary AST level transformations to the source code. This sounds quite powerful, but the system is used for some core functionality like generating comparison functions and pretty printers, and it actually has some pretty major flaws.

First of all, the AST is apparently not very stable between compiler versions, and in practice ppx derivers have to be updated with each compiler version. The barrier to entry to writing them is also very high, requiring understanding of some compiler internals. I also had issues with composing extension nodes (using one inside of another), which is again pretty annoying given that they provide some core functionality.

There supposedly is a debugger for the language, but I could not get it working based on the official docs.

By far the largest issue I have with the language so far is the difficulty of debugging. It’s hard to insert random print statements to inspect arbitrary values, since there is no default pretty printer for values. Setting them up is non-trivial, given that the ppx derivers are non-recursive, so you have to annotate each type definition separately. The issue is exacerbated by the lack of pretty printing definitions in the standard library (well, actually there are multiple standard libraries; I was using base). All of these issues together make printf debugging needlessly difficult.
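To illustrate the amount of ceremony involved, here is a sketch of hand-written printers using only the standard library's Format module, with no ppx involved. The `point` and `shape` types are made up for the example. A deriver would generate functions of roughly this shape, but each type still has to be covered individually.

```ocaml
type point = { x : int; y : int }

type shape =
  | Circle of point * int
  | Polygon of point list

(* Without a deriver, each type needs a hand-written printer. *)
let pp_point fmt { x; y } = Format.fprintf fmt "{ x = %d; y = %d }" x y

(* The printer for shape has to call the printer for point explicitly. *)
let pp_shape fmt = function
  | Circle (center, radius) ->
    Format.fprintf fmt "Circle (%a, %d)" pp_point center radius
  | Polygon points ->
    Format.fprintf fmt "Polygon [%a]"
      (Format.pp_print_list
         ~pp_sep:(fun fmt () -> Format.pp_print_string fmt "; ")
         pp_point)
      points

(* Convenience wrapper that renders a shape to a string. *)
let show_shape (s : shape) : string = Format.asprintf "%a" pp_shape s

(* A printf-debugging call site, writing to stderr. *)
let () = Format.eprintf "debug: %a@." pp_shape (Circle ({ x = 1; y = 2 }, 3))
```

Compare this with a language that has a default printer for every value, where the last line would be the only one needed.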

In summary, my largest complaints with the language are the lack of good debugging methods and the lack of higher kinded types. But as there does not seem to be a fundamental obstacle to solving either of these (as I said earlier, higher kinded types can in theory be emulated with modules), I might consider revisiting OCaml sometime in the future.


  1. Yes it’s “Meta” I know…

  2. Generalized algebraic data types

  3. I of course mean functors from category theory. For some reason programming languages really like naming random things functors; C++ and Prolog are also guilty of this.