Streaming Binary Data in Mollymawk
2025-08-18
Previously, whether we were uploading a unikernel image or creating a block device with initial data, mollymawk would read the entire binary payload into memory before passing it along to Albatross. While this approach worked for small files, it became problematic with larger binaries. The system would quickly run into performance issues or even crash due to excessive memory consumption.
The Problem: Memory Bottlenecks
Binary files such as unikernel images can be multiple megabytes in size, and block device images can be several gigabytes in size. When these files are read completely into memory, a single upload can exhaust available RAM. In systems running multiple services or hosting many users, the impact is even more severe. The overhead doesn't scale well, and we observed slowdowns, out-of-memory errors, and sometimes dropped requests. It is especially severe as Mollymawk is run as a unikernel - that is, a virtual machine with a fixed allocation of memory and no shared pool of memory that we can take from and give back to the operating system.
With the previous protocol in Albatross, it was not possible to remotely upload a block device bigger than 16 MB (due to a restriction in the TLS handshake, where a certificate may only be up to 16 MB). With these changes, we can now upload (and download) arbitrarily large block devices \o/.
The Solution: Streaming to Albatross
In the latest update, mollymawk has adopted a streaming approach to handle binary uploads. Instead of reading the full contents of an uploaded file into a single string, we now process the file incrementally as a stream and pass each chunk directly to albatross. This change ensures that memory usage remains minimal and predictable, regardless of the file size.
Updates were also made to albatross to support streaming all binary data.
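To illustrate the incremental approach described above, here is a minimal sketch that uses only the OCaml standard library. It is not the code from Mollymawk or Albatross: `chunk_size`, `stream_chunks`, `reader_of_string`, and the `send` callback are hypothetical names chosen for this example, and `send` merely stands in for writing one chunk to albatross. The point is that a single fixed-size buffer is reused, so memory use does not grow with the payload.

```ocaml
(* Hypothetical sketch, not Mollymawk's actual code: forward a payload in
   fixed-size chunks instead of loading it into a single string. *)
let chunk_size = 4096

(* [read buf off len] fills [buf] and returns the number of bytes read,
   or 0 at end of input. [send chunk] stands in for passing one chunk on
   to albatross. Returns the total number of bytes forwarded. *)
let stream_chunks ~(read : bytes -> int -> int -> int) ~(send : string -> unit) =
  let buf = Bytes.create chunk_size in
  let rec loop total =
    match read buf 0 chunk_size with
    | 0 -> total
    | n ->
      (* Only the [n] bytes just read are copied out and forwarded. *)
      send (Bytes.sub_string buf 0 n);
      loop (total + n)
  in
  loop 0

(* An in-memory reader, used only to exercise the sketch. *)
let reader_of_string s =
  let pos = ref 0 in
  fun buf off len ->
    let n = min len (String.length s - !pos) in
    Bytes.blit_string s !pos buf off n;
    pos := !pos + n;
    n
```

With a file, the standard `input` function has the right shape for `read`: `stream_chunks ~read:(input ic) ~send`, where `ic` is an `in_channel` opened with `open_in_bin`.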
This work was achieved in the following PRs:
- albatross supports streaming
- albatross supports streaming of block devices (create, set, dump)
- mollymawk streams unikernel images to albatross
- mollymawk streams binary data when creating block devices
- address code reviews for the streaming
- mollymawk adaptions to the albatross changes using streaming
While working on streaming, we found that using Lwt_stream does not by itself lead to constant memory usage: only a stream created with create_bounded provides this guarantee.
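The difference can be sketched as follows. With `Lwt_stream.create`, the push function never blocks, so a fast producer can queue the whole payload before the consumer reads anything. With `Lwt_stream.create_bounded`, the promise returned by `push#push` stays pending while the stream is full, which holds the producer back. The sketch below assumes the `lwt` and `lwt.unix` libraries are available; `produce`, `consume`, and `transfer` are hypothetical names for this example and are not taken from Mollymawk or Albatross.

```ocaml
(* Sketch only; requires the lwt and lwt.unix libraries. *)
open Lwt.Infix

(* Producer: push each chunk in order. [push#push] resolves only once
   there is room in the stream, so at most [capacity] chunks are queued.
   The stream is closed after the last chunk. *)
let produce push chunks =
  Lwt_list.iter_s (fun chunk -> push#push chunk) chunks >|= fun () ->
  push#close

(* Consumer: stands in for writing chunks to albatross; here it only
   counts the bytes it receives. *)
let consume stream =
  Lwt_stream.fold_s
    (fun chunk acc -> Lwt.return (acc + String.length chunk))
    stream 0

(* Run producer and consumer concurrently over a bounded stream and
   return the number of bytes the consumer saw. *)
let transfer ~capacity chunks =
  let stream, push = Lwt_stream.create_bounded capacity in
  Lwt.both (produce push chunks) (consume stream) >|= fun ((), total) ->
  total
```

For example, `Lwt_main.run (transfer ~capacity:4 chunks)` moves all chunks through the stream while never holding more than four of them in the queue at once.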
Conclusion
Both Mollymawk and Albatross now use streaming interfaces and use much less memory for unikernel deployment and block device operations.
Robur is a cooperative that develops applications and unikernels in OCaml. The aim is to use and promote MirageOS unikernels.
Our work is only partially funded; we cross-fund it through commercial contracts and public (EU) funding. We are part of a non-profit company. You can make a donation, which is tax-deductible in the EU (select "DONATION robur" in the dropdown menu), or sponsor us via the GitHub sponsor button.