James McMurray's Blog

Rust, Linux and other curiosities

My Rust 2021 roadmap: crates, concision, and community


The Rust core team recently released a call for blog posts as part of the 2021 roadmap for Rust. In this post I will detail my own experience with Rust, and areas I'd like to see improved during 2021.

Opinions expressed are solely my own and do not express the views or opinions of my employer.

My background

I'm currently a Data Engineer and over the last few months I have proposed and started implementing the migration of our serverless data ingestion pipelines to Rust. The requirements of these pipelines vary but usually involve receiving some external data (e.g. via email or S3, possibly via SFTP to S3), validating and transforming that data (with AWS Lambda) and loading it to a data warehouse.

I originally proposed switching to Rust because of deployment issues with Python when depending on native modules (like NumPy): possible glibc version mismatches, and the total size of the deployed package.

Rust solves these issues by allowing us to compile a statically linked binary with musl, and feature flags allow us to include useful code from a shared crate without building in the entire crate and its dependencies (for parts we don't need). Rustls also allows us to avoid any OpenSSL-related deployment issues.
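As a rough sketch of the feature-flag approach, optional code can be gated behind a cfg attribute so that it (and whatever dependencies it would pull in) is only compiled when a consumer asks for it. The `s3` feature name and every function name below are hypothetical, not taken from our actual shared crate:

```rust
// Sketch of a shared crate, shown as a single file so it runs on its own.
// The `s3` feature is an assumed name that would be declared in Cargo.toml:
//
//   [features]
//   s3 = []   # a real crate would list its optional AWS dependency here
//
// Every name below is hypothetical.

/// Always compiled: cheap validation that every pipeline needs.
pub fn is_valid_row(row: &str, expected_columns: usize) -> bool {
    row.split(',').count() == expected_columns
}

/// Only compiled when the consumer enables the `s3` feature.
#[cfg(feature = "s3")]
pub mod s3 {
    /// Placeholder for code that would depend on an AWS client crate.
    pub fn object_url(bucket: &str, key: &str) -> String {
        format!("s3://{}/{}", bucket, key)
    }
}

/// Reports which optional parts were compiled in.
pub fn enabled_features() -> Vec<&'static str> {
    let mut features = vec!["core"];
    if cfg!(feature = "s3") {
        features.push("s3");
    }
    features
}

fn main() {
    println!("valid: {}", is_valid_row("a,b,c", 3));
    println!("features: {:?}", enabled_features());
}
```

A Lambda that only parses email attachments would depend on such a crate without the `s3` feature and never compile the S3 code at all.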

Carrying out this proposal has given me direct experience in solving business problems in Rust, working with other team members to introduce Rust to the team, and deployment in an enterprise environment (although the hard parts of deployment are largely handled by other teams).

I have published two personal programs written in Rust: vopono for running specific applications through VPN connections, and s3rename for mass-renaming keys in an S3 bucket. I have also written a few prior blog posts on Rust, the most popular being about data-oriented design in Rust and an introduction to async programming.

To summarise: As a data engineer, I want to be able to easily test and deploy statically linked Rust binaries on various platforms and architectures, and interact with popular services and protocols without friction.

Considerations

When working as an engineer in an agile team, sprint velocity and speed of delivery are key. Engineers want to use the language and tooling that will help them to complete their tasks as quickly as possible, without frustration or the need to dig deep into the implementations of dependencies.

Amongst engineers I have seen two main reasons for push-back against Rust deployment:

  • The language is perceived to be very difficult to write (mostly due to lifetimes), low-level and slow to develop in.

  • The language is perceived to be very new, with an immature and unstable ecosystem that might be lacking critical functionality and could lead to a loss of development time if some dependencies have to be created in-house.

These issues (especially the perception itself) can't necessarily be solved directly, but we can consider how the community can help to turn these perceptions around.

Development speed

Low-level?

The first misconception is that Rust is a low-level language and that the experience might be similar to writing C (i.e. dealing directly with malloc, etc.). I think a lot of this comes from Rust often being compared to C and C++ implementations in benchmarks, and being more strongly adopted in the systems programming community.

I think this will largely be countered by Rust usage becoming more widespread in other domains, such as Web Development with crates like Actix, Rocket, Hyper, etc. on the backend and Seed and Yew on the frontend, and Data Engineering with crates like ndarray, Serde, Polars, Calamine, Amadeus and Ballista.

Difficult to write?

Rust is also often perceived as difficult to write, particularly due to the lifetime notation. Here I think a lot of great work has already been done, with non-lexical lifetimes removing the need for many borrow-checker workarounds, and rust-analyzer providing a fast feedback loop (along with the ongoing efforts to improve error messages).

The main thing to note here for the future, in my opinion, is just how much language and syntax improvements can help new users and further adoption. I hope Rust continues to work on making the code more concise for the developer in the majority of use cases. For Rust 2021 and beyond this might include assignment with destructuring, auto-dereferencing in operators, and lifetime elision in structs.

Slower to implement?

Related to the above point, Rust is often considered slower to develop in (usually when compared to dynamic scripting languages like Python or JavaScript). Here I think it's important to note that hitting lots of error messages from the compiler or borrow checker during development doesn't necessarily mean development is slower - as those errors are probably catching lots of issues that might go unnoticed in other languages (until you hit an issue in production).

In German there is a saying "Wenn schon, denn schon.", which roughly translates to "if you're going to do it, do it properly". This is where I believe Rust excels - whilst you might hit a few lifetime and ownership issues during the development of your program, in the end you are saved from a whole class of bugs and also aren't bound by issues like the Global Interpreter Lock or the lack of type safety as in simpler scripting languages.

But it is important to ensure that the feedback loop is fast for the developer. Here I think the great improvements in rust-analyzer have helped already, and hopefully there will be further improvements to compile speeds to make this even faster in the future.

Ultimately with the great tooling available, I think the development speed can actually be faster in Rust than Python or Ruby when you consider the whole software life cycle including fixing bugs and scaling up the system.

However, development speed is also greatly affected by the availability of relevant libraries.

Crates and ecosystem

Rust is still a very new language, with the first stable release in 2015 and more widespread adoption with the release of Rust 2018. This leads to a common concern that Rust may be lacking some libraries, and so could cost significant developer time compared to languages with a larger and more mature ecosystem.

There are already many excellent crates unique to Rust, such as Serde, which I miss when working with other languages. The great accessibility of cargo, crates.io and docs.rs also makes it easy to discover new crates.

However, there are still some areas for improvement for the ecosystem as a whole. For example, Rusoto is still slightly less user-friendly than boto3 (i.e. having to create _Request structs for each client request).

Another example would be the need to provide root CA certificates (e.g. in a PEM file) to rustls when using it with tokio-postgres-rustls. This is a minor issue, but one that does not arise in more "batteries-included" libraries in other languages (e.g. psycopg2 in Python).

These are small examples but can impact the learning curve for new developers. As a community, I think we should try to create issues for any such "paper-cuts" we come across and contribute pull requests where possible.

Documentation is also relevant to this. Overall, Rust has excellent documentation, with the combination of rustdoc and mdBook being used to provide crate documentation and detailed developer and contributor guides. Throughout 2021 we should aim to expand existing documentation, and make it easy for new developers to contribute to it too - for example, by adding common use cases to the Rust cookbook. I myself struggled greatly with Serde's deserialize_with field attribute until I looked at examples.

Hopefully, once established, the Rust Foundation will eventually be able to provide commercial support to critical projects like Rustls, Serde, and Tokio to ensure the crates are maintained and improved in the long term. I think this would help a lot with the adoption of Rust in enterprise.

It'd also be great if Rust support were added to jsii so Rust could be used with the AWS Cloud Development Kit directly.

Other small issues

There are some other small issues I'd love to see improved during 2021.

Auto Ref in pattern matching

I couldn't find an RFC for this, but it'd be great to be able to use a static &str when matching on an enum containing a String, i.e. so that this would be possible:

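enum MyEnum {
    WithString(String),
    Other,
}

fn main() {
    let myenum = MyEnum::WithString("test".to_string());
    match myenum {
        // Desired syntax: this does not compile today (mismatched types,
        // the pattern is a &str but the field is a String).
        MyEnum::WithString("test") => {println!("Was string")},
        _ => {},
    }
}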

Currently it is necessary to use a match guard (as far as I know). The closest RFC I could find handles auto deref but not ref. Note that that RFC is already implemented.
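For reference, a minimal sketch of the match guard workaround that compiles on stable Rust today. The describe function and its return strings are only illustrative:

```rust
enum MyEnum {
    WithString(String),
    Other,
}

/// Describes the value using a match guard instead of a string literal pattern.
fn describe(value: &MyEnum) -> &'static str {
    match value {
        // Bind the String, then compare it with the literal in a guard.
        MyEnum::WithString(s) if s == "test" => "Was string",
        MyEnum::WithString(_) => "Other string",
        MyEnum::Other => "No string",
    }
}

fn main() {
    let myenum = MyEnum::WithString("test".to_string());
    println!("{}", describe(&myenum));
    println!("{}", describe(&MyEnum::Other));
}
```

The guard works, but the compiler can no longer use the literal for exhaustiveness checking, which is why a catch-all arm for the remaining strings is still needed.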

Blocking Futures without an async runtime

At the moment, if any crate returns a Future, the developer has to use an async runtime even if only to block on the future synchronously (e.g. tokio's block_on).

It'd be nice if this were built in to the standard library (for the blocking, synchronous case only) if possible.
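To illustrate how small the blocking case can be, here is a minimal sketch of a block_on built only from the standard library, using the std::task::Wake trait and thread parking. The ThreadWaker type and the add function are names made up for this example, and this is not how tokio implements it. It also only helps for futures that do not rely on a particular runtime's reactor or timers; those still need their own runtime.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread which is blocked on the future.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Polls the future on the current thread until it completes.
fn block_on<F: Future>(fut: F) -> F::Output {
    // Pin the future on the heap so it can be polled repeatedly.
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            // Sleep until the waker unparks us, then poll again.
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical async function standing in for a crate's async API.
async fn add(a: u32, b: u32) -> u32 {
    a + b
}

fn main() {
    let sum = block_on(add(1, 2));
    println!("sum = {}", sum);
}
```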

Lifetime of trait objects defaults to 'static

I have been caught out a few times by the lifetime of a trait object defaulting to 'static, most recently in this PR to the dialoguer crate.

Specifically the original struct:

pub struct Input<'a, T> {
    prompt: String,
    default: Option<T>,
    show_default: bool,
    initial_text: Option<String>,
    theme: &'a dyn Theme,
    permit_empty: bool,
    validator: Option<Box<dyn Fn(&T) -> Option<String>>>,
}

Requires that the validator closure has a 'static lifetime, not the 'a of the struct. This can be fixed by specifying the 'a lifetime explicitly:

    validator: Option<Box<dyn Fn(&T) -> Option<String> + 'a>>,

However, I found the fact that it defaults to 'static quite unintuitive. This seems to be decided by the default trait object lifetime rules.

It'd be nice if this could be adjusted, similarly to the lifetime elision in structs RFC mentioned above, so that if the containing struct has a specified lifetime then the trait object would default to that lifetime (the developer could always specify 'static if required). Perhaps this is impractical or undesirable due to other consequences though.
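A cut-down sketch of the difference, assuming a simplified Input that keeps only the validator field. The new, with_validator and check methods are hypothetical names for this example, not dialoguer's API:

```rust
/// Simplified stand-in for the struct above, keeping only the validator.
struct Input<'a, T> {
    // Without `+ 'a` the object type defaults to `+ 'static`, and storing a
    // closure that borrows local data is rejected by the compiler.
    validator: Option<Box<dyn Fn(&T) -> Option<String> + 'a>>,
}

impl<'a, T> Input<'a, T> {
    fn new() -> Self {
        Input { validator: None }
    }

    /// Stores a validator that may borrow data living at least as long as 'a.
    fn with_validator<F>(mut self, f: F) -> Self
    where
        F: Fn(&T) -> Option<String> + 'a,
    {
        self.validator = Some(Box::new(f));
        self
    }

    /// Runs the validator, returning an error message if the value is rejected.
    fn check(&self, value: &T) -> Option<String> {
        self.validator.as_ref().and_then(|v| v(value))
    }
}

fn main() {
    let banned = vec![String::from("root"), String::from("admin")];
    // The closure borrows `banned`, so it is not 'static.
    let input = Input::<String>::new().with_validator(|name: &String| {
        if banned.contains(name) {
            Some(format!("'{}' is not allowed", name))
        } else {
            None
        }
    });
    println!("{:?}", input.check(&String::from("root")));
    println!("{:?}", input.check(&String::from("james")));
}
```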

Deref in closure signature

Related to the PR mentioned in the previous section, we have a Validator<T> that must implement the validate method:

use std::fmt::{Debug, Display};

pub trait Validator<T> {
    type Err: Debug + Display;

    fn validate(&self, input: &T) -> Result<(), Self::Err>;
}

However, this means if we want to validate a String we must pass a &String to validate. We cannot pass an &str. If T were an &str itself then we could use AsRef to accept a String in the signature, but the other way around doesn't seem possible even though in the end the &str and &String are equivalent for our purposes.

That is, we'd like to accept any type which T could ref in to, not any type which could ref in to T (which is what AsRef<T> provides us). Perhaps there are reasons this isn't viable and the type arguments used above should change, but it was a frustrating issue to hit as a developer.
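One possible direction, sketched here under the assumption that the trait itself could be changed (this is not what dialoguer does), is to relax the type parameter to `T: ?Sized` so that a validator can be written against str. A &String then reaches it through ordinary deref coercion. The NonEmpty validator and the check helper are hypothetical names for this example:

```rust
use std::fmt::{Debug, Display};

/// Variant of the trait above with `T: ?Sized`, so `T` may be `str`.
pub trait Validator<T: ?Sized> {
    type Err: Debug + Display;

    fn validate(&self, input: &T) -> Result<(), Self::Err>;
}

/// Hypothetical validator that rejects blank input.
struct NonEmpty;

impl Validator<str> for NonEmpty {
    type Err = String;

    fn validate(&self, input: &str) -> Result<(), Self::Err> {
        if input.trim().is_empty() {
            Err("input must not be empty".to_string())
        } else {
            Ok(())
        }
    }
}

/// Helper taking `&str`, so callers can pass `&String` or `&str`.
fn check<V: Validator<str>>(validator: &V, input: &str) -> Result<(), V::Err> {
    validator.validate(input)
}

fn main() {
    let owned = String::from("hello");
    // &String is deref-coerced to &str at the call site.
    println!("{:?}", check(&NonEmpty, &owned));
    println!("{:?}", check(&NonEmpty, ""));
}
```

This only moves the choice of str to the person writing the validator. It does not give the general "anything T can ref into" bound described above.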

As a side note related to AsRef, it'd be nice to introduce the syntax mentioned here.

Summary

I've covered a lot of different issues here, but I really wanted to emphasise my own experience in introducing Rust to a team and the common concerns that come up.

In summary, during 2021 I'd like to see:

  • Rust expand to more domains outside of systems programming (e.g. web development and data engineering).
  • Further language syntax improvements to keep Rust concise.
  • Continued improvements in IDE support (especially outside of VS Code).
  • Expanded documentation of common crates and use case examples (e.g. the Rust cookbook).

In my opinion, a major part of making this possible is making it easier for newer Rust developers to contribute to Rust itself (and major crates). For example, I have written about some issues I've had above and linked to some related RFCs - but I would have no idea where to start with testing implementations to solve any of those issues.

The Rust Forge is a good start for this, but I'd also love to see some worked examples of implementing features and bug fixes. For example, Jon Gjengset's Crust of Rust series has been excellent for covering the basics of the implementations of some common Rust concepts, and it'd be great to see more blog posts focussed on making improvements to Rust itself (or major crates).

Other possibilities are to host specific training sessions for new contributors like the Veloren project has done, or to host community bug-squashing days aimed at new contributors like Arch Linux and KDE have done.

Having a larger pool of contributors would help to alleviate the workload on current maintainers, and bring a more diverse range of perspectives from different problem domains and platforms.

Overall, Rust is already in an excellent position with the most welcoming community and most comprehensive documentation that I have seen in any ecosystem. I look forward to what 2021 will bring!