blog.robur.coop

The Robur cooperative blog.

A review about ptt

2026-05-08

Perhaps it's time to take stock of our email work! We’ve been working on several email-related projects for over a year now, and it feels like a good moment to step back and offer an overview of everything we’ve built and the results we’ve achieved. The whole effort revolves around a single project: ptt.

Along the way, we also took the opportunity to consolidate a number of building blocks that weren’t strictly required for our goals, but which could pave a better path for unikernels in general. The resulting unikernels confirm that this path is, indeed, very promising.

For that reason, this article is divided into two parts: the first covers emails and what we have developed for them, and the second focuses on all the 'side projects' that ultimately allowed us to bring ptt to completion.

Below is a comprehensive list of the email-related libraries we use, with a short description of each.

Composing!

After seeing this list, you might think: what a nightmare! Too many libraries (with odd names, I admit), and it’s hard to make sense of it all.

This design did not come out of nowhere. Some people prefer a homogeneous ecosystem (built around their own rules) with a strict hierarchy of dependencies (where their own libraries sit at the centre) and authoritarian composition mechanisms that wire everything to everything. We prefer an organisation that is a bit more... erratic, as the saying goes.

Each library listed here has its own purpose and can be used in several contexts. Unikernels first, of course, but the existence of the blaze tool shows that we can take these libraries piece by piece and end up with something quite different from what we originally imagined. The fact that you can use ocaml-base64 in your web application whilst we use it in our email stack illustrates our ability to reach our own goals whilst remaining mindful of what the OCaml community actually does in practice (and yes, everyone uses ocaml-base64!).

Many people complain about this "disorganisation" within the OCaml community. I have always seen it as an opportunity: an opportunity for our cooperative to carve out its own space, and an opportunity for others to do the same, with occasional bridges of interconnection (rather than, as some would like, links of subjugation).

Encoding & decoding emails

I would really like to take the time to explain this part, even if my talk already gives a good overview of just how complex it is to parse emails properly.

If we look at what is being done in other communities, very few solutions actually parse emails correctly according to the standards. The approach taken by most developers is to skip the standards entirely and rely on a vague idea of what an email might look like.

This often leads to entire email stacks built on top of rather scary parsers! Mr.MIME (together with the related libraries prettym and unstrctrd) is designed not only to decode emails but also to encode them, from OCaml values. Taking a step back, the real problem is that very few people have taken the time to truly understand the email format. Most have written their parsers by hand and, as a mirror effect, generate emails by hand as well, without any real awareness of the standards that formally describe what an email is.

On top of that, there is the predatory behaviour of companies that have wanted to extend the format with proprietary attachments to lock their users into their software. The end result is a whole host of emails that exist in the wild but do not comply with the rules, and a long tail of technical debt in implementations that have to handle these special cases that email encoders and decoders have (falsely) treated as legitimate.

As a consequence, there are some parts of Mr.MIME where we have had to extend the definitions given in the standards in order to handle certain real-world emails.

Postel's law

I know that Postel's law ("be conservative in what you send, be liberal in what you accept") is open to debate, but in the specific case of emails it has not really been applied (as far as I can see). That has a very direct impact on how emails can be verified.

Mr.MIME provides a solid foundation that can be trusted when it comes to computing signatures, as is the case with ocaml-dkim and ocaml-arc. We can be confident that what we parse (with all the imaginable and unimaginable effort accumulated over the years) is truly equivalent to what is stored: to this day, we have not observed any discrepancy between the original email and the OCaml value we manipulate. And when a signature is computed down to the last byte, that really matters.

Our archive system, indexing and search engine

This part really was a case of building from scratch: very little was available in the OCaml ecosystem for archiving, indexing and searching emails. But since we enjoy building operating systems from scratch, this felt like just another challenge, an opportunity to invent new solutions and learn a great deal along the way.

Indexing

For quite some time, I had been keeping an eye on notmuch, and more specifically on xapian, with indexing in mind. As mentioned above, the Adaptive Radix Tree, along with work such as ROWEX and P-ART (RECIPE), had also always interested me as a research topic, but it had to be done in OCaml!

After implementing art as best I could (and the performance was already there), bancos became the natural culmination of this work: a persistent index in which values of type int64 can be accessed in parallel. This started before OCaml 5, which forced us to keep the design compatible with unikernels running on a single core.

Storing values of variable size is, of course, still missing, but since composition is always the rule, we leave that question to the user. In our case, the int64 values are positions inside a PACKv2 file.

We have already talked about all of this in some detail in this article. What is left is to grow the project into new applications, such as a database, or a unikernel that reimplements etcd!

Archive

As mentioned earlier, I always had a hunch that the PACKv2 format could be a very good fit for archiving emails. We finally proved it in this article.

When I first started working on ocaml-git, I had to implement the Smart protocol, and in particular the PACKv2 format, which stores multiple Git objects using a binary diff system inherited from libXdiff (this is what later became duff). At the time, I also had to decompress and compress data using the zlib format, which led to decompress. Both projects are still in use today (which makes me feel my age) and show a certain stability of those implementations.

carton, as a library for manipulating and generating PACKv2 files, came much later. There was a bit of a learning curve around the API, how to structure things properly, whether (at the time) I should be using lwt, some experimentation with Ephemerons (which are no longer there!), and so on. As a result, I gained an almost physical understanding of the format, given how little documentation existed on the subject.

Using this format to archive emails was therefore a bet that we could have lost, but our experiments (in particular when compared with public-inbox) confirmed that we were on the right track. The compression ratio is much more favourable, and we can still access individual entries without having to decompress the entire PACK file (which is what would happen with, say, a *.tar.gz).

The final piece was the ability to delete an email from an archive (the right to be forgotten). We now provide a delete command in both our carton tool and our blaze tool, allowing us to actually remove an email from an archive. We therefore have a complete toolbox for manipulating such archives; in particular, we can:

$ blaze pack make -o pack.pack <<EOF
> 0001.eml
> 0002.eml
> ...
> EOF
$ blaze pack index pack.pack
$ blaze okapi pack.idx "Revolt" | head -n1 | cut -d':' -f1
6f56bd209555cc217d0ba3a0f23099c33b9d438e
$ blaze pack get pack.pack 6f56bd209555cc217d0ba3a0f23099c33b9d438e
From: foo@eron.com
Subject: Revolt!

=C3=80=20bas=20la=20hi=C3=A9rarchie!
$ blaze pack list pack.pack
0000000c 6f56bd209555cc217d0ba3a0f23099c33b9d438e
000013be 97e4bcb4d4a18decccb1025ea3eab896da340df4
...
$ blaze pack delete pack.pack 6f56bd209555cc217d0ba3a0f23099c33b9d438e
$ blaze pack list pack.pack
000013be 97e4bcb4d4a18decccb1025ea3eab896da340df4
...
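
The body of the email retrieved above (=C3=80=20bas=20la=20hi=C3=A9rarchie!) is encoded as quoted-printable, where =XX stands for the byte with hexadecimal value XX. As a rough illustration of what a decoder has to do, here is a minimal sketch. The names hex_value and decode_qp are ours and are not part of Mr.MIME, and the sketch deliberately ignores soft line breaks and the other cases a real decoder must handle.

```ocaml
(* Value of a single hexadecimal digit, if it is one. *)
let hex_value c =
  match c with
  | '0' .. '9' -> Some (Char.code c - Char.code '0')
  | 'A' .. 'F' -> Some (Char.code c - Char.code 'A' + 10)
  | 'a' .. 'f' -> Some (Char.code c - Char.code 'a' + 10)
  | _ -> None

(* Replace every "=XX" escape by the byte it denotes and copy everything
   else as-is. Soft line breaks ("=" at the end of a line) are NOT handled. *)
let decode_qp s =
  let len = String.length s in
  let buf = Buffer.create len in
  let rec go i =
    if i < len then
      match s.[i] with
      | '=' when i + 2 < len ->
        (match hex_value s.[i + 1], hex_value s.[i + 2] with
         | Some hi, Some lo ->
           Buffer.add_char buf (Char.chr ((hi * 16) + lo));
           go (i + 3)
         | _ -> Buffer.add_char buf '='; go (i + 1))
      | c -> Buffer.add_char buf c; go (i + 1)
  in
  go 0;
  Buffer.contents buf

(* The body shown above decodes to the UTF-8 bytes of "À bas la hiérarchie!". *)
let () = print_endline (decode_qp "=C3=80=20bas=20la=20hi=C3=A9rarchie!")
```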

Searching

The last piece of the archive puzzle is the ability to actually find something inside it. Once you have hundreds of thousands of emails neatly packed into a PACKv2 file and indexed by bancos, you still need to answer the most basic question: "where is the email that talks about X?".

This is what stem and bm25 are for.

stem is a small library that takes a string of text and turns it into a list of normalised tokens. Concretely, it lowercases the text, strips punctuation, removes the most common ("stop") words, and reduces what is left to its stem, so that running and runs end up as the same token. Without this step, a search engine quickly becomes useless: you would only ever match the exact form of the words a user typed, not the underlying concepts.
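
To make the steps concrete, here is a deliberately naive sketch of such a normaliser. It is not the API of stem: the function names, the tiny stop-word list and the crude suffix stripping are all assumptions made for illustration, and a real stemmer is far more careful about which suffixes it removes.

```ocaml
(* A hypothetical, very short stop-word list. *)
let stop_words = [ "a"; "an"; "and"; "is"; "of"; "the"; "to" ]

(* Lowercase the text and split it on anything that is not an ASCII letter
   or digit, which also gets rid of punctuation. *)
let words text =
  let buf = Buffer.create 16 and acc = ref [] in
  let flush () =
    if Buffer.length buf > 0 then begin
      acc := Buffer.contents buf :: !acc;
      Buffer.clear buf
    end in
  String.iter (fun c ->
    match Char.lowercase_ascii c with
    | ('a' .. 'z' | '0' .. '9') as c -> Buffer.add_char buf c
    | _ -> flush ()) text;
  flush ();
  List.rev !acc

(* Strip the first matching suffix, keeping at least three characters. *)
let strip_suffix word =
  let try_suffix suffix =
    let lw = String.length word and ls = String.length suffix in
    if lw - ls >= 3 && String.sub word (lw - ls) ls = suffix
    then Some (String.sub word 0 (lw - ls))
    else None in
  match List.find_map try_suffix [ "ing"; "ed"; "es"; "s" ] with
  | Some root -> root
  | None -> word

(* Full pipeline: split, drop stop words, reduce to a crude stem. *)
let tokens text =
  words text
  |> List.filter (fun w -> not (List.mem w stop_words))
  |> List.map strip_suffix
```

With this sketch, tokens "Parsing and parsed emails!" yields ["pars"; "pars"; "email"]: both verb forms collapse to the same token, which is all the search engine needs.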

On top of stem, bm25 implements the Okapi BM25 ranking function. BM25 is a well-studied, robust formula that, given a query and a corpus of documents, computes a relevance score for each document. It takes into account how often a term appears in a document (term frequency), how rare that term is across the whole corpus (inverse document frequency), and the length of the document (so that very long emails are not unfairly favoured over short, focused ones). It is the same family of algorithms that powers a number of mainstream search engines, but here we have a small, dependency-light OCaml implementation that we can embed inside a unikernel.
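
For readers who have never seen the formula, here is a small sketch of the textbook Okapi BM25 score. It is not the code of bm25: documents are modelled as plain lists of tokens, the corpus is assumed to be non-empty, and the parameters k1 = 1.2 and b = 0.75 are simply the values commonly found in the literature.

```ocaml
let k1 = 1.2
let b = 0.75

(* Term frequency: how many times [term] occurs in [doc]. *)
let count term doc =
  List.fold_left (fun n t -> if String.equal t term then n + 1 else n) 0 doc

(* Inverse document frequency: rare terms weigh more than common ones. *)
let idf corpus term =
  let n = float_of_int (List.length corpus) in
  let df = float_of_int (List.length (List.filter (List.mem term) corpus)) in
  log (((n -. df +. 0.5) /. (df +. 0.5)) +. 1.0)

(* Score of [doc] for [query]; the length of the document is normalised
   against the average document length of the corpus. *)
let score corpus query doc =
  let total = List.fold_left (fun n d -> n + List.length d) 0 corpus in
  let avgdl = float_of_int total /. float_of_int (List.length corpus) in
  let len = float_of_int (List.length doc) in
  List.fold_left (fun acc term ->
    let tf = float_of_int (count term doc) in
    let norm = tf +. k1 *. (1.0 -. b +. b *. len /. avgdl) in
    acc +. idf corpus term *. tf *. (k1 +. 1.0) /. norm) 0.0 query

(* A made-up corpus of three already-tokenised emails. *)
let corpus =
  [ [ "revolt"; "hierarchy" ];
    [ "meeting"; "agenda"; "notes"; "hierarchy" ];
    [ "revolt"; "revolt"; "now" ] ]
```

On this corpus, the query ["revolt"] gives a score of zero to the second email, and ranks the third (two occurrences) above the first (one occurrence).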

Glued together with bancos for indexing and carton for storage, this gives us a complete, self-contained search stack: parse the email with mrmime, tokenise its body and headers with stem, store each token position in bancos, and rank queries with bm25. The blaze okapi command shown earlier is exactly that pipeline at work:

$ blaze okapi pack.idx "Revolt" | head -n3
6f56bd209555cc217d0ba3a0f23099c33b9d438e:12.84
97e4bcb4d4a18decccb1025ea3eab896da340df4:9.71
...

There is, of course, plenty of room to grow: better tokenisation for non-English content, support for phrase queries, smarter handling of quoted replies, and so on. But the foundations are now in place, in OCaml, and ready to be used inside a unikernel.

Protocols

One of the core skills of our cooperative is implementing protocols in OCaml. The mirage organisation already hosts plenty of them. Very quickly, our implementations started converging on a single idea, one that emerged implicitly from ocaml-tls and is best summarised by the sans-io movement. The same concern applies to schedulers, which is why we (unlike others) can offer solutions that do not depend on Miou.

SMTP for client and server implementations

colombe is a project in which I ran a number of experiments, as mentioned earlier, including using linocaml to implement an SMTP state machine with linear types. Apart from being a great learning exercise, that line of experimentation did not go very far.

What we did keep is the monad that lets us describe an execution flow and combine the sending and receiving of SMTP packets through two primitives, without having to care whether the underlying implementation uses the Unix module or our TCP/IP stack for unikernels.

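let run ... =
  let open Monad in
  (* Wait for the server greeting (220). *)
  recv ctx PP_220 >>= fun _txts ->
  (* Introduce ourselves and keep the 250 reply, which lists the extensions. *)
  let* txts = send ctx Helo domain >>= fun () -> recv ctx PP_250 in
  let has_8bit_mime_transport_extension =
    has_8bit_mime_transport_extension txts in
  (* Authenticate only if credentials were provided. *)
  (match authentication with
    | Some a -> auth ctx a.mechanism (Some (a.username, a.password))
    | None -> return `Anonymous)
  >>= fun status ->
  (* Announce an 8-bit body only when the server supports 8BITMIME. *)
  let parameters =
    if has_8bit_mime_transport_extension
    then [ ("BODY", Some "8BITMIME") ]
    else [] in
  (* Send MAIL FROM and read back the reply code and its text. *)
  let* code, txts =
    send ctx Mail_from (sender, parameters) >>= fun () ->
    recv ctx Code in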
  ...
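
To show why such a monad does not care about the underlying transport, here is a self-contained sketch of the idea. It is not colombe's actual definition: the type t, its constructors and run_scripted are hypothetical names, and packets are reduced to plain strings. The point is only that the protocol is described as data, and that a separate driver decides how bytes actually move.

```ocaml
(* A protocol step either finishes with a value, wants to send a line,
   or wants to receive one. No I/O happens at this level. *)
type 'a t =
  | Return of 'a
  | Send of string * (unit -> 'a t)
  | Recv of (string -> 'a t)

let return x = Return x
let send line = Send (line, fun () -> Return ())
let recv = Recv (fun line -> Return line)

let rec ( >>= ) m f =
  match m with
  | Return x -> f x
  | Send (line, k) -> Send (line, fun () -> k () >>= f)
  | Recv k -> Recv (fun line -> k line >>= f)

let ( let* ) = ( >>= )

(* A toy client: wait for the greeting, say hello, return the reply. *)
let hello domain =
  let* _greeting = recv in
  let* () = send ("EHLO " ^ domain) in
  recv

(* One possible driver: feed scripted server lines and collect what the
   client sends. A Unix or unikernel driver would use a socket instead. *)
let run_scripted script m =
  let rec go sent script = function
    | Return x -> (x, List.rev sent)
    | Send (line, k) -> go (line :: sent) script (k ())
    | Recv k ->
      (match script with
       | line :: rest -> go sent rest (k line)
       | [] -> failwith "script exhausted")
  in
  go [] script m
```

Running hello "client.example" against the script ["220 mx.example ready"; "250 mx.example"] returns the second line and records a single sent line, "EHLO client.example". Swapping the driver is all it takes to move from a test to a real connection, which is also what makes this style convenient to test.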

Then came the question of STARTTLS support, and thanks to the design of ocaml-tls we were able to inject a TLS state on the fly while staying abstract over the underlying TCP implementation. It would be impossible to pull off the same feat with ssl whilst also guaranteeing support for unikernels.

Implementing a client is one thing; implementing a server is quite another. A number of improvements were therefore needed, in particular to support the sending of multiple emails to multiple recipients over a single connection.

The key takeaway is that being able to offer a tool such as letters, to avoid depending on the host system's TCP/IP stack, and to implement a brand-new state machine (this time for the server) on top of the same library, confirms that the organisation of our libraries, however erratic it may look, ultimately makes sense.

Email verification

We will not go into too much detail about ocaml-dkim, ocaml-arc or uspf (the latter already has its own article), but we do want to highlight one thing: it was very difficult to find an "oracle" for these libraries. Signature generation and verification are tricky because they are computed down to the byte, and many things can go wrong, from a misbehaving DNS resolver to a poorly configured SMTP server.
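
To give an idea of what "down to the byte" means, here is a sketch of the "relaxed" header canonicalisation that DKIM (RFC 6376, section 3.4.2) applies to a header field before hashing it. This is not the code of ocaml-dkim: the function names are ours, and the sketch assumes it receives one complete header field, folding included, terminated by CRLF.

```ocaml
let is_wsp c = c = ' ' || c = '\t'

(* Collapse every run of spaces and tabs into a single space. *)
let collapse_wsp s =
  let buf = Buffer.create (String.length s) in
  let in_wsp = ref false in
  String.iter (fun c ->
    if is_wsp c then in_wsp := true
    else begin
      if !in_wsp then Buffer.add_char buf ' ';
      in_wsp := false;
      Buffer.add_char buf c
    end) s;
  if !in_wsp then Buffer.add_char buf ' ';
  Buffer.contents buf

(* Lowercase the field name, unfold the value, collapse its whitespace,
   and remove the whitespace around the colon and at the end of the value. *)
let relaxed_header field =
  match String.index_opt field ':' with
  | None -> invalid_arg "relaxed_header: missing colon"
  | Some i ->
    let name = String.sub field 0 i in
    let value = String.sub field (i + 1) (String.length field - i - 1) in
    let unfolded =
      String.to_seq value
      |> Seq.filter (fun c -> c <> '\r' && c <> '\n')
      |> String.of_seq in
    String.lowercase_ascii (String.trim name)
    ^ ":" ^ String.trim (collapse_wsp unfolded) ^ "\r\n"
```

For instance, relaxed_header "B : Y\t\r\n\tZ  \r\n" gives "b:Y Z\r\n". A parser that silently drops or adds a single space anywhere in this process produces a different hash, and the signature no longer verifies.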

There was also quite a bit of back-and-forth about whether we should functorise these implementations over a DNS implementation, use value-passing (as with ocaml-tls) and/or functorise them via a Mirage_flow.S to abstract over the origin of the email. The final design of the libraries is now robust enough to let us build tools such as blaze and to use the same libraries inside our unikernels:

$ blaze arc verify --newline crlf < new.eml
ptt@mailingl.st -✓-> 01:google.com -✓-> 02:mailingl.st -✓-> 03:data.coop

Finally, some people may believe that you can simply "slop" together some code and end up with something that works for email verification, without taking the time to look at what already exists (which any researcher or engineer would normally do first). But that ignores the lack of an oracle, the fact that real emails do not always follow the standards, and the fact that, even today, some people still enjoy quietly introducing errors into their emails (such as signing a field with the ; character).

What we are offering is not just a translation of several RFCs into OCaml (even if I was, at one point, known as Mister RFC). It involves testing, discussions with other authors, reviewing bug fixes in other projects, uncovering subtle cases legitimised by certain emails, and so on (as I already said). It is far more than collecting credits (even if sometimes it involves buying a few beers), and far more resilient than the inattentive output of the "sloppers" who think they can remake the world with an AI agent.

Unikernels

We have now reached the final stage: unikernels. Shipping a unikernel rather than an executable running on top of a host system has very real benefits. The first is reproducibility, both in how the unikernel is built and in how it behaves. From a security standpoint, the usual argument is reduced attack surface, but we should also mention how much easier it becomes to audit a unikernel, given how self-contained the whole thing is. In short, there are plenty of advantages, and we are trying to consolidate them into real, ready-to-use solutions.

A new way to build unikernels

ptt is a bit of an exception because it represents the culmination of a paradigm shift in how we build unikernels. We have been experimenting with mkernel, dune and mnet for some time now, and ptt is the result of a new workflow for developing unikernels.

For anyone who wants to try it out, we have written a tutorial showing how to build a website that zips files on the fly, packaged as a unikernel.

The essentials are now in place. We are going to significantly upgrade our existing unikernels to adopt this new workflow, and we will also be shipping brand new unikernels built directly with it. The whole point was to make unikernel development fun again, and we have succeeded!

mailingl.st

We can now offer a proper mailing list that runs exclusively on unikernels. It is still a test mailing list, but its main purpose is to let you follow our work on email-related topics.

There is plenty of room for improvement, and we would love to receive feedback, identify and fix bugs little by little, and turn this into a robust service that we could later extend to other mailing lists, such as those for MirageOS or Solo5.

To that end, we have also deployed a unikernel hosting the MirageOS mailing list archive, available at https://blame.mailingl.st.

Our ambitions regarding email

After all this work, nothing is really "finished". Our goal, of course, is to keep improving what we have built. This is maintenance work that we know well, having maintained several pieces of software for more than ten years.

From now on, however (as far as I am concerned), the work is more about running services. So if you would like us to host your own mailing lists, manage your domain name on your behalf (rather than your registrar), or deploy your beautiful static website (built with YOCaml), please do not hesitate to get in touch... by email!

An email server

One last piece is missing: the IMAP protocol. With it, we will be able to build a suite of unikernels (in the same spirit as ptt) so that people can deploy their own email service! This may take some time, but it is the ultimate goal of the project.

blame

blame is our unikernel for serving a website that lets you read email archives. The interface is currently quite basic, but we would like to push it further: sorting emails, displaying threads, and so on. The goal is to turn this from a proof of concept into a genuinely good tool for browsing email archives!

Conclusion

This is a promising project that has finally crossed the Rubicon, and one that we hope will keep growing and improving over time. It also shows that it is possible to take on a problem that looks deceptively simple yet is genuinely fundamental (email), and take the time to design a proper solution for it.

It is also an example of what we would like to see much more of: unikernels! Everywhere! It has been a long journey, but a very satisfying one. If you have been following us since the very beginning of this adventure, we hope that you have, at the very least, learnt a thing or two :)! Many thanks to all of you!