A Data Bridge from Rust to Haskell
This post describes a method to serialise a complex Rust data structure and to deserialise it to a corresponding Haskell data structure, without any external dependencies and with only minimal boilerplate.
Problem & Research
Recently, I wanted to process some data (jq programs) in a Haskell program. I had previously written a jq parser in Rust, so my first approach was to write a small jq parser in Haskell as well. However, I quickly found myself bored and annoyed at having to write yet another parser for the same format.
Thus I started to think about using my Rust-written jq parser from my Haskell program. I would somehow need to serialise the output of my parser into some intermediate format, then deserialise from that format to a data structure in Haskell. I had a few constraints: I did not want to use any external dependencies on the Haskell side, because I had some painful experiences with Haskell’s package management years ago. I also did not want to add any new dependencies to my Rust code. Finally, I wanted to write as little code as possible for this (de-)serialisation task.
At first, I found that GHC (Haskell’s main compiler) ships a binary en-/decoding package. That initially looked quite promising, but the library has a function putStringUtf8 to encode a string and no corresponding function getStringUtf8 to decode one. Because strings are among the data that I need to encode most often, and because I could not easily find out how to implement getStringUtf8 myself, I decided against using this library.
Searching a bit more, I found Haskell’s Read class. This allows you to automatically generate a parser for your Haskell data types. Because writing data parsers is generally much harder than writing data printers, I figured that I could simply write data on the Rust side that can then be read by Haskell’s automatically generated parser.
However, it gets even better: I found out that if I stick to a certain subset of data structures on both the Haskell and the Rust side, then I can automatically generate a printer on the Rust side using #[derive(Debug)] and automatically generate a parser on the Haskell side using deriving Read.
I will now give you some example code that shows how to automatically (de-)serialise data from Rust to Haskell.
You can find the full source code in the ujq folder of this repository.
Serialisation
Consider the following Rust code:
#[derive(Debug)]
pub enum Term<S> {
    Id,
    Str(S),
    Arr(Option<Box<Self>>),
    Pipe(Box<Self>, Option<Pattern<S>>, Box<Self>),
}

#[derive(Debug)]
pub enum Pattern<S> {
    Var(S),
    Arr(Vec<Self>),
}
(I use S as a type parameter on the Rust side for unrelated reasons; just pretend that I am using String in place of S everywhere.)
Printing a tm: Term with println!("{tm:?}") yields something like:
Arr(Some(Pipe(Id, Some(Arr([Var("$x"), Var("$y")])), Str("a"))))
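To make this reproducible without the actual parser, here is a minimal sketch that builds the same term by hand and prints it. The function example is a hypothetical helper introduced only for illustration, and &str stands in for S.

```rust
// Same type definitions as above.
#[derive(Debug)]
pub enum Term<S> {
    Id,
    Str(S),
    Arr(Option<Box<Self>>),
    Pipe(Box<Self>, Option<Pattern<S>>, Box<Self>),
}

#[derive(Debug)]
pub enum Pattern<S> {
    Var(S),
    Arr(Vec<Self>),
}

// Hypothetical helper: builds the example term by hand instead of parsing it.
pub fn example() -> Term<&'static str> {
    let pattern = Pattern::Arr(vec![Pattern::Var("$x"), Pattern::Var("$y")]);
    let pipe = Term::Pipe(Box::new(Term::Id), Some(pattern), Box::new(Term::Str("a")));
    Term::Arr(Some(Box::new(pipe)))
}

fn main() {
    let tm = example();
    // Compact Debug output on a single line.
    println!("{tm:?}");
    // Pretty Debug output spans several lines and adds trailing commas.
    println!("{tm:#?}");
}
```

Note that only the compact {:?} form is meant for the Haskell side. The pretty {:#?} form inserts trailing commas, which I would not expect Haskell's derived parser to accept.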
Deserialisation
Now to the Haskell side: How to deserialise a term that was output via Rust’s Debug?
Let us first define data structures in Haskell analogous to the Rust side:
data Term =
    Id
  | Str(String)
  | Arr(Option Term)
  | Pipe(Term, Option Pattern, Term)
  deriving (Read, Show)

data Pattern = Var(String) | Arr([Pattern])
  deriving (Read, Show)
Here, we hit our first problem: In Rust, we can define multiple data types in the same module whose constructors share a name; in our example, we have Term::Arr and Pattern::Arr.
This is not possible in Haskell; therefore, we cannot put Term and Pattern into the same module.
So we have to move Pattern to a different Haskell module and import it from the Term module.
Note that I used some slightly non-idiomatic Haskell here: In Haskell, you would rather write Pipe Term (Option Pattern) Term instead of Pipe(Term, Option Pattern, Term).
The latter declares a constructor with a single tuple argument.
However, when we use the more idiomatic version, the auto-generated parser cannot parse terms of the shape Pipe(a, b, c), only Pipe a b c.
Because Rust outputs terms of the shape Pipe(a, b, c), we use the less idiomatic version here.
Next, you might notice that I used an Option type on the Haskell side, analogous to the Rust Option type, but Haskell does not have Option.
Haskell has Maybe, baby.
No big deal.
I defined an Option type in Haskell the same way as in Rust, and made a helper function to convert from Option to Maybe:
data Option a = None | Some(a)
  deriving (Read, Show)

toMaybe :: Option a -> Maybe a
toMaybe None = Nothing
toMaybe (Some(x)) = Just x
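To see why the Haskell side needs constructors named exactly None and Some, it helps to look at what Rust's Debug prints for Option values. The following is a minimal sketch; the helper show is a hypothetical name introduced here for illustration.

```rust
use std::fmt::Debug;

// Hypothetical helper: returns the compact Debug rendering of a value.
pub fn show<T: Debug>(x: &T) -> String {
    format!("{x:?}")
}

fn main() {
    let some: Option<&str> = Some("a");
    let none: Option<&str> = None;
    // Option values are printed with their Rust constructor names.
    println!("{}", show(&some));
    println!("{}", show(&none));
    // Vectors are printed in square brackets, like Haskell lists.
    println!("{}", show(&vec![Some(1), None]));
}
```

Because these names appear verbatim in the output, the Haskell Option type above has to use the same constructor names for the derived parser to recognise them.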
Putting things together
I made a little shell script that passes its first argument to the Rust parser.
The Rust parser then writes its Debug output to stdout, from where the Haskell program picks it up:
echo "$1" | ./rust-parser | ./haskell-reader
The Rust part reads from stdin, runs the parser, and serialises its output:
fn main() {
    let s = std::io::read_to_string(std::io::stdin()).unwrap();
    let tm = parse(&s).unwrap();
    println!("{tm:?}");
}
The Haskell part captures the serialised Rust data on stdin and parses it to the corresponding Haskell Term with read:
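main :: IO ()
main = do
  -- Read the serialised Rust term from stdin.
  input <- getContents
  -- An expression type annotation needs no language extension.
  let term = read input :: Term
  print term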
And that just works nicely!
Restrictions
On the Rust side, I have something like this:
pub struct Def<S> {
    pub name: S,
    pub args: Vec<S>,
}
I tried to make a Haskell counterpart for this as follows:
data Def = Def {
  name :: String,
  args :: [String]
}
However, Haskell uses a different syntax than Rust to create values of such types:
-- Haskell
Def {name = "f", args = []}
// Rust
Def {name: "f", args: []}
That means that Haskell’s auto-generated Read for Def cannot parse the output of Rust’s auto-generated Debug for Def.
Because I did not want to change the Def type on the Rust side, I simply created a custom Debug implementation for it:
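use std::fmt::{self, Debug};

// Print the fields as a plain tuple instead of a struct with named fields.
impl<S: Debug> Debug for Def<S> {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "({:?}, {:?})", self.name, self.args)
    }
}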
(That was the only change that I made to the Rust code base!) On the Haskell side, I simply introduced:
type Def = (String, [String])
Alternatively, I could have written struct Def<S>(S, Vec<S>); on the Rust side and data Def = Def(String, [String]) on the Haskell side.
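The three variants can be compared side by side in the following sketch. The modules derived and tuple are hypothetical and exist only so that all three types can be called Def in one file; &str stands in for S.

```rust
use std::fmt::{self, Debug};

// Struct with named fields and the custom Debug implementation from above.
pub struct Def<S> {
    pub name: S,
    pub args: Vec<S>,
}

impl<S: Debug> Debug for Def<S> {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "({:?}, {:?})", self.name, self.args)
    }
}

// Hypothetical module: the same struct with a derived Debug, for comparison.
pub mod derived {
    #[derive(Debug)]
    pub struct Def<S> {
        pub name: S,
        pub args: Vec<S>,
    }
}

// Hypothetical module: the tuple-struct alternative with a derived Debug.
pub mod tuple {
    #[derive(Debug)]
    pub struct Def<S>(pub S, pub Vec<S>);
}

fn main() {
    // Custom Debug: a plain tuple.
    println!("{:?}", Def { name: "f", args: vec!["x", "y"] });
    // Derived Debug on named fields: uses `name: value` pairs.
    println!("{:?}", derived::Def { name: "f", args: vec!["x", "y"] });
    // Derived Debug on a tuple struct: constructor name plus tuple.
    println!("{:?}", tuple::Def("f", vec!["x", "y"]));
}
```

Only the first and the third output have a shape that the Haskell types shown above are meant to read; the second is the one that clashes with Haskell's record syntax.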
I noticed that Rust also escapes certain Unicode characters differently from Haskell.
In particular, it prints escape sequences like \u{1}, which Haskell does not understand.
I do not expect such characters in my data, so I do not care about this.
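If such characters did matter, one possible workaround would be to print strings with Haskell-style escapes instead. The following is only a sketch under that assumption and is not part of my code: haskell_string is a hypothetical helper that writes every character outside printable ASCII as a decimal escape, followed by Haskell's empty escape \& so that a following digit is not absorbed into the number.

```rust
// Hypothetical helper (not part of the original code): renders a string as a
// literal that Haskell's `read` should accept, using decimal escapes.
pub fn haskell_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            // Quotes and backslashes are escaped the same way in both languages.
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            // Printable ASCII is copied verbatim.
            c if c == ' ' || c.is_ascii_graphic() => out.push(c),
            // Everything else becomes a decimal escape terminated by `\&`.
            c => out.push_str(&format!("\\{}\\&", c as u32)),
        }
    }
    out.push('"');
    out
}

fn main() {
    let s = "a\u{1}2";
    // Rust's Debug uses the `\u{...}` escape form.
    println!("{s:?}");
    // The helper uses a decimal escape instead.
    println!("{}", haskell_string(s));
}
```

To plug this into the derived printers, one could presumably wrap strings in a newtype whose Debug implementation calls this helper, at the cost of touching the Rust types.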