I have an interesting error design scenario that I've been struggling with while working on the encoding stuff. To set the stage, I am working on decoders, which read bytes and put them into some structures. Decoders are generally more complicated than encoders because there are more failure scenarios. Inefficient data, malformed data, invalid data, protocol violations, resource limits… none of these happen when encoding. And these error variants can be unique to each decoder type, so it is tough to enumerate just a handful.
When decoding, especially in the bitcoin domain, we often compose decoders together. A header decoder is made up of six inner decoders, which are in turn composed of more primitive decoders (e.g. a "4-byte array decoder"). So what is the type of the error returned by that high-level header decoder? Technically, it is a large enumeration of all the inner decoder error types. If there are only two inner decoders, the error type is an enum of the first error and the second error.
Sum Type
```rust
pub enum Either<L, R> {
    First(L),
    Second(R),
}
```
A sum type which enumerates two error conditions.
As the decoder nesting gets deeper, the error type gets more complex.
```rust
type Decoder2Error = Either<ArrayError, NetworkError>;
type Decoder4Error = Either<
    Either<ArrayError, NetworkError>,
    Either<ParseError, ValidationError>,
>;
type Decoder6Error = Either<
    Either<Either<ArrayError, NetworkError>, ParseError>,
    Either<Either<ValidationError, FormatError>, IoError>,
>;
```
Eithers everywhere.
These are a pain to name, but also a pain for a caller to handle.
```rust
match nested {
    Either::First(Either::First(array_err)) => handle_array_error(array_err),
    Either::First(Either::Second(net_err)) => handle_network_error(net_err),
    Either::Second(Either::First(parse_err)) => handle_parse_error(parse_err),
    Either::Second(Either::Second(val_err)) => handle_validation_error(val_err),
}
```
Handling nested error types.
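To see how this nesting arises on its own, here is a minimal sketch. It assumes a simplified, pull-style `Decoder` trait (a hypothetical `decode(&mut self, bytes: &mut &[u8])` method, not rust-bitcoin's actual API), a hypothetical `ArrayDecoder<N>` leaf whose only error is a hypothetical `UnexpectedEof`, and a generic `Decoder2` that sums its two children's errors with `Either`. Nesting four array decoders, sized like the first four fields of a block header, produces the Decoder4Error shape without anyone writing it down.

```rust
/// Sum of two error types: one variant per inner decoder.
#[derive(Debug, PartialEq)]
pub enum Either<L, R> {
    First(L),
    Second(R),
}

/// Hypothetical, simplified decoder trait (not rust-bitcoin's actual API):
/// decode one value from the front of `bytes`, advancing the slice past it.
pub trait Decoder {
    type Output;
    type Error;
    fn decode(&mut self, bytes: &mut &[u8]) -> Result<Self::Output, Self::Error>;
}

/// Hypothetical leaf error: not enough input left.
#[derive(Debug, PartialEq)]
pub struct UnexpectedEof {
    pub needed: usize,
}

/// Hypothetical primitive decoder that reads exactly `N` bytes.
pub struct ArrayDecoder<const N: usize>;

impl<const N: usize> Decoder for ArrayDecoder<N> {
    type Output = [u8; N];
    type Error = UnexpectedEof;

    fn decode(&mut self, bytes: &mut &[u8]) -> Result<[u8; N], UnexpectedEof> {
        let input: &[u8] = *bytes;
        if input.len() < N {
            return Err(UnexpectedEof { needed: N - input.len() });
        }
        let mut out = [0u8; N];
        out.copy_from_slice(&input[..N]);
        *bytes = &input[N..];
        Ok(out)
    }
}

/// Composes two decoders; its error is the sum of the two inner error types.
pub struct Decoder2<A, B> {
    pub first: A,
    pub second: B,
}

impl<A: Decoder, B: Decoder> Decoder for Decoder2<A, B> {
    type Output = (A::Output, B::Output);
    type Error = Either<A::Error, B::Error>;

    fn decode(&mut self, bytes: &mut &[u8]) -> Result<Self::Output, Self::Error> {
        let a = self.first.decode(bytes).map_err(Either::<A::Error, B::Error>::First)?;
        let b = self.second.decode(bytes).map_err(Either::<A::Error, B::Error>::Second)?;
        Ok((a, b))
    }
}

/// Version (4), previous block hash (32), merkle root (32), time (4).
pub type HeaderPrefix = Decoder2<
    Decoder2<ArrayDecoder<4>, ArrayDecoder<32>>,
    Decoder2<ArrayDecoder<32>, ArrayDecoder<4>>,
>;

// Compiles only if the composed error is exactly the nested `Either` shape.
fn _same_shape(
    e: Either<Either<UnexpectedEof, UnexpectedEof>, Either<UnexpectedEof, UnexpectedEof>>,
) -> <HeaderPrefix as Decoder>::Error {
    e
}
```

The `_same_shape` function only type-checks because the composed error really is the nested `Either`, and a truncated merkle root surfaces as `Either::Second(Either::First(..))`, the same kind of match arm the caller has to spell out by hand.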
So are there any other options? The Decoder trait has an associated error type, and this makes sense: it would be kind of weird to make the decoder generic over an error type, Decoder<E>. If that were the case, a decodable type could have multiple decoder implementations, each with its own error. This is clearly not what we want to model. How can we compose decoders but keep the simple Decoder trait?
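A quick sketch of why a trait parameter would be awkward, assuming a hypothetical `GenericDecoder<E>` trait and two made-up error types (`Short`, `TooShort`). Nothing stops a single `U32Decoder` from implementing the trait twice, and then every call site has to say which implementation it means.

```rust
/// Hypothetical alternative: the error type as a trait *parameter*
/// instead of an associated type.
pub trait GenericDecoder<E> {
    fn decode(&mut self, bytes: &mut &[u8]) -> Result<u32, E>;
}

#[derive(Debug, PartialEq)]
pub struct Short;

#[derive(Debug, PartialEq)]
pub struct TooShort {
    pub missing: usize,
}

/// Reads a little-endian u32, advancing `bytes` only on success.
fn read_u32(bytes: &mut &[u8]) -> Option<u32> {
    let input: &[u8] = *bytes;
    if input.len() < 4 {
        return None;
    }
    *bytes = &input[4..];
    Some(u32::from_le_bytes([input[0], input[1], input[2], input[3]]))
}

pub struct U32Decoder;

// Nothing stops one type from implementing the trait once...
impl GenericDecoder<Short> for U32Decoder {
    fn decode(&mut self, bytes: &mut &[u8]) -> Result<u32, Short> {
        read_u32(bytes).ok_or(Short)
    }
}

// ...and again with a different error type.
impl GenericDecoder<TooShort> for U32Decoder {
    fn decode(&mut self, bytes: &mut &[u8]) -> Result<u32, TooShort> {
        let available = bytes.len();
        // Lazily computed: only runs when fewer than 4 bytes were available.
        read_u32(bytes).ok_or_else(|| TooShort { missing: 4 - available })
    }
}
```

Callers disambiguate with a type annotation or `GenericDecoder::<Short>::decode(&mut d, ..)`. With an associated `type Error`, the second impl would be rejected as a conflicting implementation, which is exactly the one-error-per-decoder rule we want.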
Composite Errors
Another option is to just shed information and have composed decoders return some sort of CompositeError. The decoder implementation maps inner errors into it, so the caller only has to deal with one error type. A fixed amount of information could live in the error type, for example the index of the inner decoder that failed and a display string. The caller's interface is simpler, but failures in complex decoders might be harder to debug. The caller might also want to act on the failure, for example if more data or a larger buffer is required. Preferably this is still captured in the type system instead of shoved into a string.
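A minimal sketch of such a flattened error, assuming made-up names (`CompositeError`, `UnexpectedEof`, `decode_version_and_flag`) and a toy format of a 4-byte little-endian version followed by a 0/1 flag byte. The "need more data" case keeps a typed `needed` count the caller can act on, while everything else is reduced to the failing decoder's index and a static message.

```rust
use core::fmt;

/// Hypothetical flat error for a composite decoder: inner errors are mapped
/// into it, keeping only a fixed amount of information.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompositeError {
    /// An inner decoder ran out of input; the caller can act on this.
    NeedMoreData { decoder_index: usize, needed: usize },
    /// Any other inner failure, reduced to an index and a static message.
    Invalid { decoder_index: usize, message: &'static str },
}

impl fmt::Display for CompositeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CompositeError::NeedMoreData { decoder_index, needed } => {
                write!(f, "decoder {} needs {} more bytes", decoder_index, needed)
            }
            CompositeError::Invalid { decoder_index, message } => {
                write!(f, "decoder {} failed: {}", decoder_index, message)
            }
        }
    }
}

/// Hypothetical leaf error from a fixed-size read.
#[derive(Debug)]
pub struct UnexpectedEof {
    pub needed: usize,
}

fn read_array<const N: usize>(bytes: &mut &[u8]) -> Result<[u8; N], UnexpectedEof> {
    let input: &[u8] = *bytes;
    if input.len() < N {
        return Err(UnexpectedEof { needed: N - input.len() });
    }
    let mut out = [0u8; N];
    out.copy_from_slice(&input[..N]);
    *bytes = &input[N..];
    Ok(out)
}

/// Maps a leaf error into the composite, tagging which inner decoder failed.
fn eof_at(decoder_index: usize) -> impl Fn(UnexpectedEof) -> CompositeError {
    move |e: UnexpectedEof| CompositeError::NeedMoreData { decoder_index, needed: e.needed }
}

/// Hypothetical composite: a 4-byte LE version followed by a 0/1 flag byte.
pub fn decode_version_and_flag(bytes: &mut &[u8]) -> Result<(u32, bool), CompositeError> {
    let version = read_array::<4>(bytes).map_err(eof_at(0))?;
    let [flag] = read_array::<1>(bytes).map_err(eof_at(1))?;
    let flag = match flag {
        0 => false,
        1 => true,
        _ => {
            return Err(CompositeError::Invalid {
                decoder_index: 1,
                message: "flag byte must be 0 or 1",
            })
        }
    };
    Ok((u32::from_le_bytes(version), flag))
}
```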
Some information could be brought back into the picture by hanging on to the source errors. If you keep things generic, though, you are back to the super-huge nested type. So it might be a use case for dynamic dispatch, where the composite holds the source behind a Box<dyn Error>. A big upside here is that it hooks into Rust's existing error conventions: implement Error::source to expose the chain of nested errors for debugging. Dynamic dispatch always brings a few corner cases to the table, though. In this case, Box is only available if alloc is enabled, and Error is only available if std is enabled (at least on compilers older than Rust 1.81, which added core::error::Error). This matters for a no_std-compatible crate like rust-bitcoin.
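Here is a sketch of the boxed variant, assuming `std` is available and using hypothetical `CompositeError` and `UnexpectedEof` types. `CompositeError` implements `Error::source`, so standard error reporters can walk the chain, and a caller can still recover the concrete inner error at runtime with `downcast_ref`, just not through a compile-time match.

```rust
// Assumes `std`. In a no_std crate, `Box` needs `alloc`, and `Error` needs
// `std` (or Rust 1.81+, where it is also available as `core::error::Error`).
use std::error::Error;
use std::fmt;

/// Hypothetical leaf error.
#[derive(Debug)]
pub struct UnexpectedEof {
    pub needed: usize,
}

impl fmt::Display for UnexpectedEof {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unexpected end of input: need {} more bytes", self.needed)
    }
}

impl Error for UnexpectedEof {}

/// Hypothetical composite error: one concrete type for every composed decoder,
/// with the failing inner error kept behind dynamic dispatch.
#[derive(Debug)]
pub struct CompositeError {
    pub decoder_index: usize,
    source: Box<dyn Error + Send + Sync + 'static>,
}

impl CompositeError {
    pub fn new<E>(decoder_index: usize, source: E) -> Self
    where
        E: Error + Send + Sync + 'static,
    {
        CompositeError { decoder_index, source: Box::new(source) }
    }
}

impl fmt::Display for CompositeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "inner decoder {} failed", self.decoder_index)
    }
}

impl Error for CompositeError {
    // Hooks into the standard error chain, so reporters can walk `source()`.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        let inner: &(dyn Error + 'static) = &*self.source;
        Some(inner)
    }
}
```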
Maybe to look at it another way: some caller knows they are using decoders which only produce three different error variants. So their composite decoder's error should just be an enum with three variants. I don't think the "level" at which the error occurred is really all that helpful beyond a debug message. The caller could define this enum and some conversions from the inner decoder errors. This becomes a "Composite Decoder which Returns This Error" type. The Composite Decoder is generic, not the Decoder trait. But since the Composite Decoder never actually holds a value of the error type, some PhantomData is required to tag the type.
```rust
use core::marker::PhantomData;

struct CompositeDecoder<E> {
    _phantom: PhantomData<E>,
}

impl<E> Decoder for CompositeDecoder<E>
where
    // Make sure inner errors can be converted into the composite error.
    E: From<ArrayError>,
{
    type Error = E;
    // ...remaining `Decoder` items elided.
}
```
Connecting a PhantomData held generic to the associated type.
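A fuller, compilable version of the PhantomData approach might look like the following. The simplified `Decoder` trait, the `read_array` helper, `InvalidFlag`, and the caller's `MyError` are all assumptions for illustration. The composite decoder only promises that `E` can be built from each inner error; the caller picks `E`, here a flat two-variant enum.

```rust
use core::marker::PhantomData;

/// Hypothetical, simplified decoder trait (not rust-bitcoin's actual API).
pub trait Decoder {
    type Output;
    type Error;
    fn decode(&mut self, bytes: &mut &[u8]) -> Result<Self::Output, Self::Error>;
}

/// Hypothetical inner error: not enough bytes for a fixed-size array.
#[derive(Debug, PartialEq)]
pub struct ArrayError {
    pub needed: usize,
}

/// Hypothetical inner error: a flag byte that is neither 0 nor 1.
#[derive(Debug, PartialEq)]
pub struct InvalidFlag(pub u8);

fn read_array<const N: usize>(bytes: &mut &[u8]) -> Result<[u8; N], ArrayError> {
    let input: &[u8] = *bytes;
    if input.len() < N {
        return Err(ArrayError { needed: N - input.len() });
    }
    let mut out = [0u8; N];
    out.copy_from_slice(&input[..N]);
    *bytes = &input[N..];
    Ok(out)
}

/// Composite decoder for a LE u32 followed by a flag byte. It never stores an
/// `E`, so `PhantomData` ties the caller's error type to the struct.
pub struct CompositeDecoder<E> {
    _phantom: PhantomData<E>,
}

impl<E> CompositeDecoder<E> {
    pub fn new() -> Self {
        CompositeDecoder { _phantom: PhantomData }
    }
}

impl<E> Decoder for CompositeDecoder<E>
where
    // Make sure inner errors can be converted into the composite error.
    E: From<ArrayError> + From<InvalidFlag>,
{
    type Output = (u32, bool);
    type Error = E;

    fn decode(&mut self, bytes: &mut &[u8]) -> Result<(u32, bool), E> {
        // `?` converts each inner error through the `From` bounds above.
        let version = read_array::<4>(bytes)?;
        let [flag] = read_array::<1>(bytes)?;
        let flag = match flag {
            0 => false,
            1 => true,
            other => return Err(InvalidFlag(other).into()),
        };
        Ok((u32::from_le_bytes(version), flag))
    }
}

/// Caller-defined flat error with exactly the variants it cares about.
#[derive(Debug, PartialEq)]
pub enum MyError {
    Eof { needed: usize },
    BadFlag(u8),
}

impl From<ArrayError> for MyError {
    fn from(e: ArrayError) -> Self {
        MyError::Eof { needed: e.needed }
    }
}

impl From<InvalidFlag> for MyError {
    fn from(e: InvalidFlag) -> Self {
        MyError::BadFlag(e.0)
    }
}
```

If you don't want the decoder's `Send`/`Sync` to depend on `E`, `PhantomData<fn() -> E>` is a common alternative marker.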
Per Decoder Functions
Another twist would be to store the error conversion functions at runtime instead of at the type level. But I think this is too complex, despite the extra flexibility of per-error, per-decoder settings.
```rust
pub struct Decoder2<A, B, E>
where
    A: Decoder,
    B: Decoder,
{
    state: Decoder2State<A, B>,
    convert_a_error: fn(A::Error) -> E,
    convert_b_error: fn(B::Error) -> E,
}
```
Store conversion functions for each child decoder's error type to avoid PhantomData.
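A runnable version of the function-pointer idea, assuming a simplified, non-streaming `Decoder` trait (so the original `Decoder2State` is replaced by two plain fields). The hypothetical `PrefixError` shows the extra flexibility: the same leaf `ArrayError` maps to a different variant depending on which child failed.

```rust
/// Hypothetical, simplified decoder trait (not rust-bitcoin's actual API).
pub trait Decoder {
    type Output;
    type Error;
    fn decode(&mut self, bytes: &mut &[u8]) -> Result<Self::Output, Self::Error>;
}

/// Hypothetical leaf error: not enough bytes for a fixed-size array.
#[derive(Debug, PartialEq)]
pub struct ArrayError {
    pub needed: usize,
}

/// Hypothetical primitive decoder that reads exactly `N` bytes.
pub struct ArrayDecoder<const N: usize>;

impl<const N: usize> Decoder for ArrayDecoder<N> {
    type Output = [u8; N];
    type Error = ArrayError;

    fn decode(&mut self, bytes: &mut &[u8]) -> Result<[u8; N], ArrayError> {
        let input: &[u8] = *bytes;
        if input.len() < N {
            return Err(ArrayError { needed: N - input.len() });
        }
        let mut out = [0u8; N];
        out.copy_from_slice(&input[..N]);
        *bytes = &input[N..];
        Ok(out)
    }
}

/// Composes two decoders; the caller supplies one conversion per child error.
pub struct Decoder2<A, B, E>
where
    A: Decoder,
    B: Decoder,
{
    pub first: A,
    pub second: B,
    pub convert_a_error: fn(A::Error) -> E,
    pub convert_b_error: fn(B::Error) -> E,
}

impl<A, B, E> Decoder for Decoder2<A, B, E>
where
    A: Decoder,
    B: Decoder,
{
    type Output = (A::Output, B::Output);
    type Error = E;

    fn decode(&mut self, bytes: &mut &[u8]) -> Result<Self::Output, E> {
        // Each child's error goes through its own stored conversion function.
        let a = self.first.decode(bytes).map_err(self.convert_a_error)?;
        let b = self.second.decode(bytes).map_err(self.convert_b_error)?;
        Ok((a, b))
    }
}

/// Hypothetical caller error: the same `ArrayError` becomes a different
/// variant depending on which child failed.
#[derive(Debug, PartialEq)]
pub enum PrefixError {
    TruncatedVersion,
    TruncatedHash { needed: usize },
}

pub fn prefix_decoder() -> Decoder2<ArrayDecoder<4>, ArrayDecoder<32>, PrefixError> {
    Decoder2 {
        first: ArrayDecoder::<4>,
        second: ArrayDecoder::<32>,
        // Non-capturing closures coerce to plain `fn` pointers.
        convert_a_error: |_: ArrayError| PrefixError::TruncatedVersion,
        convert_b_error: |e: ArrayError| PrefixError::TruncatedHash { needed: e.needed },
    }
}
```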
Collapsing Eithers
I also considered adding some functionality to the Either type to allow the caller to collapse it down to a single "leaf" error. This isn't that flexible since all the leaves have to be of the same type, so not great, but it's still a kinda neat abuse of the type system. It also would have been nice if I could somehow have gotten the base-case identity function as the default for the trait, so that it wouldn't have to be defined for each leaf error type. But no luck. In practice, this doesn't simplify much.
```rust
/// A trait for flattening nested decoder error structures.
pub trait DecoderError<E = Self>: Sized {
    /// Collapse this error to the target error E.
    fn collapse(self) -> E;
}

// Either is extended with a recursive collapse.
impl<A, B, T> DecoderError<T> for Either<A, B>
where
    A: DecoderError<T>,
    B: DecoderError<T>,
{
    fn collapse(self) -> T {
        match self {
            Either::First(a) => a.collapse(),
            Either::Second(b) => b.collapse(),
        }
    }
}

// "Leaf" error types are tagged with a base-case identity function.
impl DecoderError for UnexpectedEof {
    fn collapse(self) -> UnexpectedEof {
        self
    }
}
```
Bringing some recursion to the type system…feels a little wrong.
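A small usage sketch, repeating the trait so it compiles on its own; the `Nested` alias and the `needed` field on `UnexpectedEof` are hypothetical. Annotating the target type is enough for the recursive impl to peel off every `Either` layer. The comment in the code gives one likely reason the identity default is out of reach: a blanket impl for all `T` would overlap with the `Either` impl.

```rust
#[derive(Debug, PartialEq)]
pub enum Either<L, R> {
    First(L),
    Second(R),
}

/// A trait for flattening nested decoder error structures.
pub trait DecoderError<E = Self>: Sized {
    /// Collapse this error to the target error `E`.
    fn collapse(self) -> E;
}

// Either is extended with a recursive collapse.
impl<A, B, T> DecoderError<T> for Either<A, B>
where
    A: DecoderError<T>,
    B: DecoderError<T>,
{
    fn collapse(self) -> T {
        match self {
            Either::First(a) => a.collapse(),
            Either::Second(b) => b.collapse(),
        }
    }
}

/// Hypothetical leaf error.
#[derive(Debug, PartialEq)]
pub struct UnexpectedEof {
    pub needed: usize,
}

// Base case: each leaf needs its own identity impl. A blanket
// `impl<T> DecoderError for T` would overlap with the `Either` impl above
// (at `Self = T = Either<A, B>`), so it is rejected as a conflicting impl.
impl DecoderError for UnexpectedEof {
    fn collapse(self) -> UnexpectedEof {
        self
    }
}

/// Hypothetical error shape of a three-leaf composition.
pub type Nested = Either<Either<UnexpectedEof, UnexpectedEof>, UnexpectedEof>;
```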
Balance
Taking a step back, it appears that satisfying all three of these requirements at once is pretty difficult:
- Composable decoders: need some way to unify the error types without causing coherence issues.
- Preserved error semantics: the caller can match on the actual error.
- Context information: which decoder in the chain failed.
The easy way to get composable decoders does away with preserved error semantics.