A review about ptt
2026-05-08
Perhaps it's time to take stock of our email work! We’ve been working on
several email-related projects for over a year now, and it feels like a good
moment to step back and offer an overview of everything we’ve built and the
results we’ve achieved. The whole effort revolves around a single project:
ptt.
Along the way, we also took the opportunity to consolidate a number of building blocks that weren’t strictly required for our goals, but which could pave a better path for unikernels in general. The resulting unikernels confirm that this path is, indeed, very promising.
For that reason, this article is divided into two parts: the first covers
emails and what we have developed for them, and the second focuses on all the
'side projects' that ultimately allowed us to bring ptt to completion.
Our ptt project and everything related to it
Below is a comprehensive list of the email-related libraries we use, with a short description of each.
- prettym: a small library similar to OCaml's standard Format module, but it returns a continuation rather than unit. This makes it possible to generate bytes in a memory-bounded way, which is a crucial property when running inside a unikernel.
- unstrctrd: the definitive representation of a value inside an email field. The library is small, but it carefully summarises the various RFCs that describe how those fields are structured.
- mrmime: this is the centrepiece. It lets us decode and encode emails as OCaml values. There is no shortage of email-parsing software out there, but what sets mrmime apart is that it sticks to the standards and has been battle-tested against huge data sets such as caml-list, Enron, LKML, and Hamlet.
- rosetta, uuuu & coin: emails often carry content encoded in ISO-8859 or KOI8. This small bundle of libraries converts such content to UTF-8.
- multipart_form: a library that has also turned out to be very useful for HTTP. The multipart/form-data format is derived from the MIME multipart format originally designed for email attachments, and none of the existing OCaml libraries met our requirements (particularly with regard to the memory constraints of unikernels).
- pecu: decodes and encodes text in quoted-printable format, a standard encoding used in emails (RFC 2045 § 6.7).
- base64: it is also quite common to encode email content (especially attachments) in Base64, though a few subtleties apply, such as the maximum line length permitted by the SMTP protocol (RFC 2045 § 6.8).
- uspf: verifies the origin of an email by issuing the DNS queries required to check its SPF record (RFC 7208).
- dkim (RFC 6376): verifies an email's signature (again by issuing the necessary DNS queries) and can also sign an email. The library is carefully designed to run inside a unikernel: it streams the email rather than relying on a file on disk to compute or verify the signature. There are very few DKIM-related tools out there; hopefully this package can at least replace opendkim for die-hard OCaml enthusiasts.
- dmarc: combines uspf and ocaml-dkim to perform DMARC validation. I can now say with confidence that RFC 7489 is one of the most difficult standards I have ever had to read.
- arc: I often call it the email blockchain! It is a way of signing an email that takes its provenance into account, so that the next server in the chain can trust the previous checks (SPF, DKIM and DMARC) even if we have modified the email along the way (RFC 8617).
- blaze: still experimental, this project is our synthesis of all these libraries put to real-world use. We are not re-implementing an entire email stack just for the sake of it; we want to actively use it. The tests are particularly interesting because they show how we use email in practice. The project is just waiting to be improved further and used more widely.
- carton: originally written for ocaml-git to handle Git's PACKv2 format. I always had a hunch that such a format could be useful for archiving emails, so we extended it accordingly whilst keeping full compatibility with Git.
- colombe & sendmail: people often describe SMTP as a "simple" protocol, but that quickly falls apart once you look at the details. The project has gone through several iterations, including an attempt to encode the state machine using linear types via linocaml. The practical side of our needs eventually led us to use a plain monad, which still allowed us to abstract over the protocol and implement STARTTLS (using tls) in a rather satisfying way. On top of that, the project is actively used by the OCaml community to... send emails, and that is just as satisfying.
- encore: originally designed for ocaml-git, a small project that lets you define a format and, by construction, derive both an encoder and a decoder from it, guaranteeing that the two are isomorphic.
- emile: parsing email addresses is (very) difficult. emile is the library you need to do it correctly.
- art: a small library implementing a data structure (a radix tree on steroids) in OCaml. The benchmarks are very encouraging, and I always had a feeling this could be useful for indexing emails.
- bancos: the natural extension of art. It took a while to build, but it brings together several research papers and gives us a persistent, parallel index in OCaml!
- stem & bm25: finally, a small search engine that we have applied to emails!
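To make the continuation idea behind prettym a little more concrete, here is a minimal sketch using only the standard library. It is not prettym's API: the `state` type, `encode` and `drain` are hypothetical names chosen for illustration. The encoder writes into a buffer of fixed capacity and, whenever the buffer is full, hands the bytes back together with a continuation instead of growing the buffer.

```ocaml
(* Hypothetical sketch, not prettym's API. [capacity] is assumed to be > 0. *)
type state =
  | Flush of string * (unit -> state) (* buffer full: emit it, then resume *)
  | Done of string                    (* last, possibly partial, chunk *)

let encode ~capacity (input : string) : state =
  let buf = Bytes.create capacity in
  (* [src] is the next input index, [pos] the number of bytes used in [buf]. *)
  let rec go src pos =
    if src = String.length input then Done (Bytes.sub_string buf 0 pos)
    else if pos = capacity then
      Flush (Bytes.sub_string buf 0 pos, fun () -> go src 0)
    else begin
      Bytes.set buf pos input.[src];
      go (src + 1) (pos + 1)
    end
  in
  go 0 0

(* The caller decides what to do with each chunk (write it to a socket, for
   instance) before asking for the next one. Here we simply collect them. *)
let rec drain acc = function
  | Flush (chunk, k) -> drain (chunk :: acc) (k ())
  | Done chunk -> List.rev (chunk :: acc)

let chunks = drain [] (encode ~capacity:4 "Subject: hi")
```

The working buffer never exceeds `capacity` bytes, whatever the size of the input, and the caller stays in control of when the next chunk is produced.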
Composing!
After seeing this list, you might think: what a nightmare! Too many libraries (with odd names, I admit), and it’s hard to make sense of it all.
This design did not come out of nowhere. Some people prefer a homogeneous ecosystem (built around their own rules) with a strict hierarchy of dependencies (where their own libraries sit at the centre) and authoritarian composition mechanisms that wire everything to everything. We prefer an organisation that is a bit more... erratic, as the saying goes.
Each library listed here has its own purpose and can be used in several
contexts. Unikernels first, of course, but the existence of the blaze tool
shows that we can take these libraries piece by piece and end up with
something quite different from what we originally imagined. The fact that you
can use ocaml-base64 in your web application whilst we use it in our email
stack illustrates our ability to reach our own goals whilst remaining mindful
of what the OCaml community actually does in practice (and yes, everyone uses
ocaml-base64!).
Many people complain about this "disorganisation" within the OCaml community. I have always seen it as an opportunity: an opportunity for our cooperative to carve out its own space, and an opportunity for others to do the same, with occasional bridges of interconnection (rather than, as some would like, links of subjugation).
Encoding & decoding emails
I would really like to take the time to explain this part, even if my talk already gives a good overview of just how complex it is to parse emails properly.
If we look at what is being done in other communities, very few solutions actually parse emails correctly according to the standards. The approach taken by most developers is to skip the standards entirely and rely on a vague idea of what an email might look like.
This often leads to entire email stacks built on top of rather
scary parsers! Mr.MIME (together with the related
libraries prettym and unstrctrd) is designed not
only to decode emails but also to encode them, from OCaml values. Taking a
step back, the real problem is that very few people have taken the time to
truly understand the email format. Most have written their parsers by hand
and, as a mirror effect, generate emails by hand as well, without any real
awareness of the standards that formally describe what an email is.
On top of that, there is the predatory behaviour of companies that have wanted to extend the format with proprietary attachments to lock their users into their software. The end result is a whole host of emails that exist in the wild but do not comply with the rules, and a long tail of technical debt in implementations that have to handle these special cases that email encoders and decoders have (falsely) treated as legitimate.
As a consequence, there are some parts of Mr.MIME where we have had to extend the definitions given in the standards in order to handle certain real-world emails.
Postel's law
I know that Postel's law is open to debate, but in the specific case of emails it has not really been applied (as far as I can see). That has a very direct impact on how emails can be verified.
Mr.MIME provides a solid foundation that can be trusted when it comes to
computing signatures, as is the case with ocaml-dkim and
ocaml-arc. We can be confident that what we parse (with all
the imaginable and unimaginable effort accumulated over the years) is truly
equivalent to what is stored: to this day, we have not observed any
discrepancy between the original email and the OCaml value we manipulate.
And when a signature is computed down to the last byte, that really matters.
Our archive system, indexing and search engine
This part really was a case of building from scratch: very little was available in the OCaml ecosystem for archiving, indexing and searching emails. But since we enjoy building operating systems from scratch, this felt like just another challenge, an opportunity to invent new solutions and learn a great deal along the way.
Indexing
For quite some time, I had been keeping an eye on notmuch, and
more specifically on xapian, with indexing in mind. As mentioned
above, the Adaptive Radix Tree, along with work such as
ROWEX and P-ART (RECIPE), had also always
interested me as a research topic, but it had to be done in OCaml!
After implementing art as best I could (and the performance was already
there), bancos became the natural culmination of this work:
a persistent index in which values of type int64 can be accessed in
parallel. This started before OCaml 5, which forced us to keep the design
compatible with unikernels running on a single core.
Storing values of variable size is, of course,
still missing, but since composition is always the rule, we
leave that question to the user. In our case, the int64 values are
positions inside a PACKv2 file.
We have already talked about all of this in some detail
in this article. What is left is to grow the project into
new applications, such as a database, or a unikernel that reimplements
etcd!
Archive
As mentioned earlier, I always had a hunch that the PACKv2 format could be a very good fit for archiving emails. We finally proved it in this article.
When I first started working on ocaml-git, I had to implement
the Smart protocol, and in particular the PACKv2 format, which stores
multiple Git objects using a binary diff system inherited from
libXdiff (this is what later became duff). At the
time, I also had to decompress and compress data using the zlib
format, which led to decompress. Both projects are still in
use today (which makes me feel my age), a sign that those implementations
have proved stable.
carton, as a library for manipulating and generating PACKv2
files, came much later. There was a bit of a learning curve around the API,
how to structure things properly, whether (at the time) I should be using
lwt, some experimentation with Ephemerons (which are
no longer there!), and so on. As a result, I gained an almost physical
understanding of the format, given how little documentation existed on the
subject.
Using this format to archive emails was therefore a bet that we could have
lost, but our experiments (in particular when compared with
public-inbox) confirmed that we were on the right track. The
compression ratio is much more favourable, and we can still access
individual entries without having to decompress the entire PACK file (which
is what would happen with, say, a *.tar.gz).
The final piece was the ability to delete an email from an archive (the
right to be forgotten). We now provide a delete command in both our
carton tool and our blaze tool, allowing us to actually remove an email
from an archive. We therefore have a complete toolbox for manipulating such
archives; in particular, we can:
- create an archive
- delete an entry from it (without recomputing the entire PACKv2 file)
- access entries by their offsets or identifiers
- and even 'merge' several PACKv2 files into a single one
$ blaze make -o pack.pack <<EOF
> 0001.eml
> 0002.eml
> ...
> EOF
$ blaze pack index pack.pack
$ blaze okapi pack.idx "Revolt" | head -n1 | cut -d':' -f1
6f56bd209555cc217d0ba3a0f23099c33b9d438e
$ blaze pack get pack.pack 6f56bd209555cc217d0ba3a0f23099c33b9d438e
From: foo@eron.com
Subject: Revolt!
=C3=80=20bas=20la=20hi=C3=A9rarchie!
$ blaze pack list pack.pack
0000000c 6f56bd209555cc217d0ba3a0f23099c33b9d438e
000013be 97e4bcb4d4a18decccb1025ea3eab896da340df4
...
$ blaze pack delete pack.pack 6f56bd209555cc217d0ba3a0f23099c33b9d438e
$ blaze pack list pack.pack
000013be 97e4bcb4d4a18decccb1025ea3eab896da340df4
...
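The body printed by `blaze pack get` above is in quoted-printable form (RFC 2045 § 6.7), the encoding that pecu deals with. As an illustration only, here is a minimal decoder written against the standard library; it is not pecu's API, and `decode_qp` and `hex_value` are hypothetical names. It handles `=XX` escapes and soft line breaks, and passes anything malformed through unchanged.

```ocaml
(* Hypothetical sketch of quoted-printable decoding, not pecu's API. *)
let hex_value c =
  match c with
  | '0' .. '9' -> Some (Char.code c - Char.code '0')
  | 'A' .. 'F' -> Some (Char.code c - Char.code 'A' + 10)
  | 'a' .. 'f' -> Some (Char.code c - Char.code 'a' + 10)
  | _ -> None

let decode_qp (s : string) : string =
  let len = String.length s in
  let out = Buffer.create len in
  let rec go i =
    if i < len then
      if s.[i] <> '=' then (Buffer.add_char out s.[i]; go (i + 1))
      (* "=" followed by a line break is a soft break: drop both. *)
      else if i + 2 < len && s.[i + 1] = '\r' && s.[i + 2] = '\n' then go (i + 3)
      else if i + 1 < len && s.[i + 1] = '\n' then go (i + 2)
      else
        match
          if i + 2 < len then (hex_value s.[i + 1], hex_value s.[i + 2])
          else (None, None)
        with
        | Some hi, Some lo ->
            (* "=XX" encodes the byte with hexadecimal value XX. *)
            Buffer.add_char out (Char.chr ((hi * 16) + lo));
            go (i + 3)
        | _ ->
            (* Malformed escape: keep the "=" as a literal character. *)
            Buffer.add_char out '=';
            go (i + 1)
  in
  go 0;
  Buffer.contents out

let body = decode_qp "=C3=80=20bas=20la=20hi=C3=A9rarchie!"
```

Here `body` holds the UTF-8 bytes of "À bas la hiérarchie!".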
Searching
The last piece of the archive puzzle is the ability to actually find
something inside it. Once you have hundreds of thousands of emails neatly
packed into a PACKv2 file and indexed by bancos, you still need to answer
the most basic question: "where is the email that talks about X?".
This is what stem and bm25 are for.
stem is a small library that takes a string of text and turns it into a
list of normalised tokens. Concretely, it lowercases the text, strips
punctuation, removes the most common ("stop") words, and reduces what is
left to its morphological root, so that running, runs and ran all end
up as the same token. Without this step, a search engine quickly becomes
useless: you would only ever match the exact form of the words a user typed,
not the underlying concepts.
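The steps described above can be sketched with the standard library alone. This is not stem's API: the stop-word list is a tiny hypothetical sample, and the suffix stripping is deliberately naive (ASCII only, English only), whereas a real stemmer applies a far more careful set of rules.

```ocaml
(* Hypothetical sketch of a tokenisation pipeline, not stem's API. *)
let stop_words = [ "a"; "an"; "and"; "is"; "it"; "of"; "the"; "to" ]

(* Lowercase the text and split on anything that is not an ASCII letter or
   digit, which also strips punctuation. *)
let words (s : string) : string list =
  let buf = Buffer.create 16 in
  let acc = ref [] in
  let flush () =
    if Buffer.length buf > 0 then begin
      acc := Buffer.contents buf :: !acc;
      Buffer.clear buf
    end
  in
  String.iter
    (fun c ->
      match Char.lowercase_ascii c with
      | ('a' .. 'z' | '0' .. '9') as c -> Buffer.add_char buf c
      | _ -> flush ())
    s;
  flush ();
  List.rev !acc

(* Collapse a doubled final letter, so that "runn" becomes "run". *)
let undouble w =
  let n = String.length w in
  if n >= 2 && w.[n - 1] = w.[n - 2] then String.sub w 0 (n - 1) else w

(* Naive suffix stripping; at least three letters must remain. *)
let strip_suffix w =
  let ends_with suffix =
    let lw = String.length w and ls = String.length suffix in
    lw > ls + 2 && String.sub w (lw - ls) ls = suffix
  in
  let chop n = String.sub w 0 (String.length w - n) in
  if ends_with "ing" then undouble (chop 3)
  else if ends_with "ed" then undouble (chop 2)
  else if ends_with "s" then chop 1
  else w

let tokens s =
  words s
  |> List.filter (fun w -> not (List.mem w stop_words))
  |> List.map strip_suffix

let example = tokens "The Revolt is running, and it runs!"
```

With this sketch, running and runs both end up as run. An irregular form such as ran is left untouched, which is exactly the kind of case where suffix stripping alone is not enough.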
On top of stem, bm25 implements the
Okapi BM25 ranking function. BM25 is a well-studied, robust
formula that, given a query and a corpus of documents, computes a relevance
score for each document. It takes into account how often a term appears in
a document (term frequency), how rare that term is across the whole corpus
(inverse document frequency), and the length of the document (so that very
long emails are not unfairly favoured over short, focused ones). It is the
same family of algorithms that powers a number of mainstream search
engines, but here we have a small, dependency-light OCaml implementation
that we can embed inside a unikernel.
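As a rough illustration of the formula, here is a small sketch of Okapi BM25 over documents that are already tokenised. It is not the bm25 library's API. The constants `k1 = 1.2` and `b = 0.75` are commonly used defaults and are an assumption here, as is the smoothed variant of the inverse document frequency; the corpus is assumed to be non-empty.

```ocaml
(* Hypothetical sketch of Okapi BM25, not the bm25 library's API. *)
let k1 = 1.2 (* term-frequency saturation, assumed default *)
let b = 0.75 (* strength of the length normalisation, assumed default *)

(* Number of occurrences of [term] in a tokenised document. *)
let count term doc = List.length (List.filter (String.equal term) doc)

(* Inverse document frequency: rare terms weigh more than common ones. *)
let idf corpus term =
  let n = float_of_int (List.length corpus) in
  let df = float_of_int (List.length (List.filter (List.mem term) corpus)) in
  log (((n -. df +. 0.5) /. (df +. 0.5)) +. 1.0)

(* Relevance of [doc] for [query], both given as lists of tokens. *)
let score corpus query doc =
  let total = List.fold_left (fun acc d -> acc + List.length d) 0 corpus in
  let avgdl = float_of_int total /. float_of_int (List.length corpus) in
  let dl = float_of_int (List.length doc) in
  List.fold_left
    (fun acc term ->
      let tf = float_of_int (count term doc) in
      (* Longer-than-average documents get a larger denominator. *)
      let norm = tf +. (k1 *. (1.0 -. b +. (b *. dl /. avgdl))) in
      acc +. (idf corpus term *. tf *. (k1 +. 1.0) /. norm))
    0.0 query

let d1 = [ "revolt"; "now" ]
let d2 = [ "revolt"; "meeting"; "agenda"; "for"; "monday" ]
let d3 = [ "lunch"; "menu" ]
let corpus = [ d1; d2; d3 ]
```

For the query `[ "revolt" ]`, the short document `d1` scores higher than the longer `d2`, even though the term appears once in each, and `d3` scores zero because it does not contain the term.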
Glued together with bancos for indexing and carton for storage, this
gives us a complete, self-contained search stack: parse the email with
mrmime, tokenise its body and headers with stem, store each token
position in bancos, and rank queries with bm25. The blaze okapi
command shown earlier is exactly that pipeline at work:
$ blaze okapi pack.idx "Revolt" | head -n3
6f56bd209555cc217d0ba3a0f23099c33b9d438e:12.84
97e4bcb4d4a18decccb1025ea3eab896da340df4:9.71
...
There is, of course, plenty of room to grow: better tokenisation for non-English content, support for phrase queries, smarter handling of quoted replies, and so on. But the foundations are now in place, in OCaml, and ready to be used inside a unikernel.
Protocols
One of the core skills of our cooperative is implementing protocols in OCaml. The mirage organisation already hosts plenty of them. Very quickly, our implementations started converging on a single idea, one that emerged implicitly from ocaml-tls and is best summarised by the sans-io movement. The same concern applies to schedulers, which is why we (unlike others) can offer solutions that do not depend on Miou.
SMTP for client and server implementations
colombe is a project in which I ran a number of experiments, as
mentioned earlier, including using linocaml to implement an
SMTP state machine with linear types. Apart from being a great learning
exercise, that line of experimentation did not go very far.
What we did keep is the monad that lets us describe an execution flow and
combine the sending and receiving of SMTP packets through two primitives,
without having to care whether the underlying implementation uses the Unix
module or our TCP/IP stack for unikernels.
let run ... =
  let open Monad in
  (* Wait for the server greeting. *)
  recv ctx PP_220 >>= fun _txts ->
  (* Introduce ourselves and read the extensions the server advertises. *)
  let* txts = send ctx Helo domain >>= fun () -> recv ctx PP_250 in
  let has_8bit_mime_transport_extension =
    has_8bit_mime_transport_extension txts in
  (* Authenticate only if credentials were provided. *)
  (match authentication with
   | Some a -> auth ctx a.mechanism (Some (a.username, a.password))
   | None -> return `Anonymous)
  >>= fun status ->
  let parameters =
    if has_8bit_mime_transport_extension
    then [ ("BODY", Some "8BITMIME") ]
    else [] in
  let* code, txts =
    send ctx Mail_from (sender, parameters) >>= fun () ->
    recv ctx Code in
  ...
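To show why such a monad frees the protocol from any particular I/O layer, here is a self-contained sketch of the idea. It is not colombe's actual definition: the type `'a t`, the two primitives `recv` and `send`, the `greet` program and the in-memory interpreter are all hypothetical. It requires OCaml 4.08 or later for the `let*` syntax.

```ocaml
(* Hypothetical sketch of a sans-io protocol monad, not colombe's types. *)
type 'a t =
  | Return of 'a
  | Recv of (string -> 'a t)        (* the protocol wants one line *)
  | Send of string * (unit -> 'a t) (* the protocol wants to emit one line *)

let return x = Return x

let rec bind m f =
  match m with
  | Return x -> f x
  | Recv k -> Recv (fun line -> bind (k line) f)
  | Send (line, k) -> Send (line, fun () -> bind (k ()) f)

let ( let* ) = bind

(* The two primitives: the description never touches a socket. *)
let recv = Recv return
let send line = Send (line, return)

(* A tiny exchange: read the banner, say hello, read the reply. *)
let greet domain =
  let* banner = recv in
  let* () = send ("EHLO " ^ domain) in
  let* reply = recv in
  return (banner, reply)

(* One possible interpreter, driven by a list of incoming lines. Another
   interpreter could use Unix sockets or a unikernel TCP/IP stack without
   changing [greet]. *)
let run_in_memory incoming program =
  let sent = ref [] in
  let rec go incoming = function
    | Return x -> Some x
    | Send (line, k) ->
        sent := line :: !sent;
        go incoming (k ())
    | Recv k -> (
        match incoming with
        | [] -> None (* the peer closed the connection too early *)
        | line :: rest -> go rest (k line))
  in
  let result = go incoming program in
  (result, List.rev !sent)

let result, sent =
  run_in_memory [ "220 example.org ESMTP"; "250 ok" ] (greet "mail.example")
```

Because the protocol is only a description of what to receive and send, the same value can be interpreted in tests, on top of the Unix module, or inside a unikernel.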
Then came the question of STARTTLS support, and thanks to the design of
ocaml-tls we were able to inject a TLS state on the fly while
staying abstract over the underlying TCP implementation. It would be
impossible to pull off the same feat with ssl whilst also
guaranteeing support for unikernels.
Implementing a client is one thing; implementing a server is quite another. A number of improvements were therefore needed, in particular to support the sending of multiple emails to multiple recipients over a single connection.
The key takeaway is that being able to offer a tool such as letters, to avoid depending on the host system's TCP/IP stack, and to implement a brand-new state machine (this time for the server) on top of the same library, confirms that the organisation of our libraries, however erratic it may look, ultimately makes sense.
Email verification
We will not go into too much detail about ocaml-dkim,
ocaml-arc or uspf (the latter already has
its own article), but we do want to highlight one thing: it
was very difficult to find an "oracle" for these libraries. Signature
generation and verification are tricky because they are computed down to the
byte, and many things can go wrong, from a misbehaving DNS resolver to a
poorly configured SMTP server.
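To give an idea of what "down to the byte" means, here is a sketch of the "simple" body canonicalisation described in RFC 6376 § 3.4.3: empty lines at the end of the body are ignored, and the body must end with exactly one CRLF. This is an illustration written against the standard library, not ocaml-dkim's API, and `simple_body` is a hypothetical name.

```ocaml
(* Hypothetical sketch of DKIM "simple" body canonicalisation
   (RFC 6376, section 3.4.3), not ocaml-dkim's API. *)
let ends_with_crlf s =
  let n = String.length s in
  n >= 2 && s.[n - 2] = '\r' && s.[n - 1] = '\n'

(* Remove every trailing CRLF, which drops all empty lines at the end. *)
let rec strip_trailing_crlf s =
  if ends_with_crlf s then
    strip_trailing_crlf (String.sub s 0 (String.length s - 2))
  else s

(* The canonical body always ends with exactly one CRLF; an empty body
   canonicalises to a single CRLF. *)
let simple_body body = strip_trailing_crlf body ^ "\r\n"

let unix_newlines = simple_body "hello\n"
let smtp_newlines = simple_body "hello\r\n"
```

The two values differ (`"hello\n\r\n"` against `"hello\r\n"`), so their hashes differ too: a message whose line endings were rewritten somewhere between the signer and the verifier no longer verifies, even though it looks identical to a human reader.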
There was also quite a bit of back-and-forth about whether we should
functorise these implementations over a DNS implementation, use
value-passing (as with ocaml-tls) and/or functorise them
via a Mirage_flow.S to abstract over the origin of the
email. The final design of the libraries is now robust enough to let us
build tools such as blaze and to use the same libraries inside
our unikernels:
$ blaze arc verify --newline crlf < new.eml
ptt@mailingl.st -✓-> 01:google.com -✓-> 02:mailingl.st -✓-> 03:data.coop
Finally, some people may believe that you can simply "slop" together some
code and end up with something that works for email verification, without
taking the time to look at what already exists (which any researcher or
engineer would normally do first). But that ignores the lack of an oracle,
the fact that real emails do not always follow the standards, and the fact
that, even today, some people still enjoy quietly introducing errors into
their emails (such as signing a field with the ; character).
What we are offering is not just a translation of several RFCs into OCaml (even if I was, at one point, known as Mister RFC). It involves testing, discussions with other authors, reviewing bug fixes in other projects, uncovering subtle cases legitimised by certain emails, and so on (as I already said). It is far more than collecting credits (even if sometimes it involves buying a few beers), and far more resilient than the inattentive output of the "sloppers" who think they can remake the world with an AI agent.
Unikernels
We have now reached the final stage: unikernels. Shipping a unikernel rather than an executable running on top of a host system has very real benefits. The first is reproducibility, both in how the unikernel is built and in how it behaves. From a security standpoint, the usual argument is reduced attack surface, but we should also mention how much easier it becomes to audit a unikernel, given how self-contained the whole thing is. In short, there are plenty of advantages, and we are trying to consolidate them into real, ready-to-use solutions.
A new way to build unikernels
ptt is a bit of an exception because it represents the culmination of a
paradigm shift in how we build unikernels. We have been experimenting with
mkernel, dune and mnet for some time now,
and ptt is the result of a new workflow for developing unikernels.
For anyone who wants to try it out, we have written a tutorial showing how to build a website that zips files on the fly, packaged as a unikernel.
The essentials are now in place. We are going to significantly upgrade our existing unikernels to adopt this new workflow, and we will also be shipping brand new unikernels built directly with it. The whole point was to make unikernel development fun again, and we have succeeded!
mailingl.st
We can now offer a proper mailing list that runs exclusively on unikernels. It is still a test mailing list, but its main purpose is to let you follow our work on email-related topics.
There is plenty of room for improvement, and we would love to receive feedback, identify and fix bugs little by little, and turn this into a robust service that we could later extend to other mailing lists, such as those for MirageOS or Solo5.
To that end, we have also deployed a unikernel hosting the MirageOS mailing list archive, available at https://blame.mailingl.st.
Our ambitions regarding email
After all this work, nothing is really "finished". Our goal, of course, is to keep improving what we have built. This is maintenance work that we know well, having maintained several pieces of software for more than ten years.
From now on, however (as far as I am concerned), the work is more about running services. So if you would like us to host your own mailing lists, manage your domain name on your behalf (rather than your registrar), or deploy your beautiful static website (built with YOCaml), please do not hesitate to get in touch... by email!
An email server
One last piece is missing: the IMAP protocol. With it, we will be able to
build a suite of unikernels (in the same spirit as ptt) so that people can
deploy their own email service! This may take some time, but it is the
ultimate goal of the project.
blame
blame is our unikernel for serving a website that lets you read
email archives. The interface is currently quite basic, but we would like
to push it further: sorting emails, displaying threads, and so on. The goal
is to turn this from a proof of concept into a genuinely good tool for
browsing email archives!
Conclusion
This is a promising project that has finally crossed the Rubicon, and one that we hope will keep growing and improving over time. It also shows that it is possible to take on a problem that looks deceptively simple yet is genuinely fundamental (email), and take the time to design a proper solution for it.
It is also an example of what we would like to see much more of: unikernels! Everywhere! It has been a long journey, but a very satisfying one. If you have been following us since the very beginning of this adventure, we hope that you have, at the very least, learnt a thing or two :)! Many thanks to all of you!