Introduction
Welcome to the Rust ABI Wiki! This is a wiki about Rust's ABI and plans to stabilize it. This wiki is very much a work in progress, and we're trying to flesh it out. If you think of anything, open an issue or pull request to discuss it further! Here is this Wiki's GitHub Repository.
Overview
This wiki is currently organized into four sections. The Introduction, which is what you're reading now, outlines the reasoning behind, history of, and terms used throughout the rest of this Wiki.
The Discussion pages summarize various points of discussion and link out to other resources for consideration. After reading this, you should have a pretty solid idea of the challenges we face and goals we have.
The Rust Compiler section outlines how the Rust compiler currently works with regard to the ABI. This section should highlight all relevant parts of the compiler, and discuss how the ideas proposed in the discussion may be implemented.
With the Rust compiler infrastructure in mind, the Towards an RFC topic intends to form a precise proposal and actionable plan as to what needs to be done next. It's important that we understand the scope of what's possible:
You can focus exclusively on that aspect if you want, but in that case you still need to think very carefully to avoid over-stating what use cases your proposal will actually enable.
— @hanna-kruppe, IRLO
So, tossing around the term "ABI" hides a world of work. And you need to consider ABIs you're not working on to preclude conflicts with the ones you are.
— @vomlehn, IRLO
Although this Wiki serves as a general-purpose knowledge base and proposal silo, the implications of any final proposal must be fully understood.
What Even Is An ABI?
ABIs, FFIs, ABI APIs, what's the difference? This document serves as an FAQ on what an ABI is, why ABIs are important, and so on.
If you think of a question, add it. If you feel an answer is incomplete or lacking, raise an issue or, even better, a PR.
So, what is an ABI anyway?
An ABI, or Application Binary Interface, is the binary-level interface of a compiled program or library: it determines how other compiled code can call into it.
Wikipedia explains this more precisely:
In computer software, an application binary interface (ABI) is an interface between two binary program modules; often, one of these modules is a library or operating system facility, and the other is a program that is being run by a user.
— Wikipedia
ABIs describe two main facilities:
- How data is laid out in memory
- How functions are called.
The C ABI is among the oldest and most widely supported ABIs, dating back to C itself (strictly speaking, each platform defines its own C ABI). The C ABI can be used from Rust through the use of extern "C". The C ABI, however, is fairly limited, as it only provides basic datatypes and is rather inflexible.
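As a rough illustration of the two facilities listed above, the sketch below fixes a struct's layout with repr(C) and gives a function the C calling convention with extern "C". The names Sample and sum_fields are made up for this example, and the sizes in the comments assume a typical target where u32 is 4-byte aligned.

```rust
use std::mem::{align_of, size_of};

// Facility 1: data layout. repr(C) keeps fields in declaration order and
// inserts padding the way the platform's C compiler would.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Sample {
    pub tag: u8,    // offset 0, followed by 3 bytes of padding
    pub value: u32, // offset 4 (assuming u32 is 4-byte aligned)
}

// Facility 2: calling convention. extern "C" makes this function use the
// platform's C calling convention rather than Rust's unspecified one.
pub extern "C" fn sum_fields(s: Sample) -> u32 {
    s.tag as u32 + s.value
}

fn main() {
    println!(
        "size = {}, align = {}",
        size_of::<Sample>(),
        align_of::<Sample>()
    );
    println!("sum = {}", sum_fields(Sample { tag: 1, value: 41 }));
}
```

Code written in C (or in any language that speaks the platform's C ABI) could agree on both facts without knowing anything else about Rust.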
However, it's important to note that 'ABI' in and of itself is a bit of an overloaded term:
More concretely: there are broadly two clusters of meanings for "stable ABI", the first referring to how language concepts are mapped machine code (e.g. data structure layout, calling conventions) and the latter to the ability to change (upgrade) a software component without rebuilding all the software that interacts with this component. The second is ultimately the responsibility of programmers (e.g. deleting a function from a library breaks both source compatibility and ABI compatibility), but since it's required for many of the benefits ascribed to "stable ABI" and requires compiler support too, anyone pondering this subject should either explicitly acknowledge that aspect (and everything that it enables) as out of scope, or think about what it entails -- which goes far beyond "freezing" data structure layouts, calling conventions, etc.
— @hanna-kruppe, IRLO
What does it mean for an ABI to be stable?
A stable ABI means that a compiler will consistently produce the same ABI for a given piece of code across compiler versions.
Right now, the Rust compiler makes no such guarantee: it may perform memory-layout optimizations (such as field reordering or niche optimizations) that differ across versions.
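To make the missing guarantee concrete, here is a small sketch comparing the default layout with repr(C). The struct names are hypothetical. Only the repr(C) size is something code may rely on; the default layout's size is deliberately just printed, since the compiler is free to change it between releases.

```rust
use std::mem::size_of;

// Default representation: the compiler may reorder fields and may pick a
// different layout in another release.
#[allow(dead_code)]
struct DefaultLayout {
    a: u8,
    b: u32,
    c: u8,
}

// C representation: declaration order with C-style padding, so the layout
// is the same for every compiler version on a given target.
#[allow(dead_code)]
#[repr(C)]
struct CLayout {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    // Whatever this prints today is an implementation detail.
    println!("default: {} bytes", size_of::<DefaultLayout>());
    // 1 + 3 (padding) + 4 + 1 + 3 (padding), assuming u32 is 4-byte aligned.
    println!("repr(C): {} bytes", size_of::<CLayout>());
}
```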
Why is a stable ABI a good thing?
To quote from the initial proposal:
There are many benefits a standardized ABI would bring to Rust. A stable ABI enables dynamic linking between Rust crates, which would allow for Rust programs to support dynamically loaded plugins (a feature common in C/C++). Dynamic linking would result in shorter compile-times and lower disk-space use for projects, as multiple projects could link to the same dylib. For example, imagine having multiple CLIs all link to the same core library crate.
TODO: more recent examples.
What would a stable modular ABI look like for Rust?
- Fully specified - There should be no ambiguous cases, either in how to map Rust to the ABI, or how the ABI maps back to Rust objects.
- Versioned - There should be a mechanism to determine which version of the ABI a chunk of code is using by querying the code itself. The version should follow semver conventions, and should be both machine and human parseable, so that linkers and loaders can decide if two pieces of code are compatible (from an ABI standpoint). Under some circumstances, it may be possible for a linker or loader to generate code to bind chunks of code together that have different ABI versions.
- Designed for introspection - There are tools for different operating systems to analyze blobs of code to determine what they do. For Linux, there are nm, objdump, and a number of other utilities. Unfortunately, the tools are limited by the knowledge encoded in the executable formats; in most cases you can't read the docs for a library by reading the library itself. A useful addition would be the ability to store the documentation for a library in one of the segments of the library, with tools that are able to find and display that information.
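As one possible reading of the "Versioned" point, a loader could compare a version record embedded in each binary. The sketch below is only an assumption about how that might look: AbiVersion, parse, and satisfies are hypothetical names, and the rule shown ignores semver's special treatment of 0.x versions.

```rust
/// Hypothetical version record an ABI-aware toolchain could embed in a binary.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AbiVersion {
    pub major: u32,
    pub minor: u32,
    pub patch: u32,
}

impl AbiVersion {
    /// Parses "MAJOR.MINOR.PATCH"; returns None for anything else.
    pub fn parse(text: &str) -> Option<AbiVersion> {
        let mut parts = text.split('.');
        let major = parts.next()?.parse().ok()?;
        let minor = parts.next()?.parse().ok()?;
        let patch = parts.next()?.parse().ok()?;
        if parts.next().is_some() {
            return None; // more than three components
        }
        Some(AbiVersion { major, minor, patch })
    }

    /// Semver-style rule: a provider at `self` can serve a consumer built
    /// against `required` if the major versions match and the provider is
    /// at least as new.
    pub fn satisfies(&self, required: &AbiVersion) -> bool {
        self.major == required.major
            && (self.minor, self.patch) >= (required.minor, required.patch)
    }
}

fn main() {
    let provided = AbiVersion::parse("1.4.0").expect("valid version");
    let required = AbiVersion::parse("1.2.3").expect("valid version");
    println!("compatible: {}", provided.satisfies(&required));
}
```

Keeping the record as plain text in the binary would satisfy the "machine and human parseable" requirement, at the cost of a parsing step in the linker or loader.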
Additional useful traits
While the ABI we're developing is primarily for the Rust language, it need not be solely for the Rust language. Here are some additional traits that the ABI could have that might be useful.
- Useful outside of Rust - There are numerous languages with numerous ABI specifications. In some cases, those ABI specifications have been adopted by other languages, even when the ABI is not the best fit for the other languages.
What's the difference between an ABI and an API/FFI?
We can go on all day talking about how nice an ABI would be, but after you say (or rather write) the word a couple hundred times, it loses its meaning and comes to stand for a set of rather lofty ideals.
This shouldn't be the case.
An ABI is something quite simple on the surface level: the memory layout and calling conventions of an executable. It can also be used in a broader sense, but it's best to be specific.
The FFI is the user-code interface the language provides to interact with other executables, which may have a different ABI. The FFI forms the transition layer between different ABIs. Currently, in Rust, the FFI is most closely associated with extern "C" and repr(C).
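To show what "transition layer" means in practice, the sketch below converts a Rust string into the NUL-terminated form the C ABI expects, calls a C-ABI function, and converts the result back. byte_len and len_via_c_abi are made-up names; for simplicity the C-ABI function is defined in Rust here rather than in a real foreign library.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// A function with the C ABI. In a real library it would also be exported
// under a stable symbol name; here it is only called from Rust.
//
// Safety: `ptr` must point to a valid NUL-terminated string.
pub unsafe extern "C" fn byte_len(ptr: *const c_char) -> usize {
    unsafe { CStr::from_ptr(ptr) }.to_bytes().len()
}

// The FFI layer: translate a Rust &str into the representation the C ABI
// expects, make the call, and hand a plain Rust value back.
pub fn len_via_c_abi(text: &str) -> Option<usize> {
    // CString::new fails if the text contains an interior NUL byte.
    let c_text = CString::new(text).ok()?;
    // Safety: c_text is NUL-terminated and outlives the call.
    Some(unsafe { byte_len(c_text.as_ptr()) })
}

fn main() {
    println!("{:?}", len_via_c_abi("hello"));
}
```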
History
Before we get started, I think it's important we highlight the loose history of the proposal. This will make it easier for new people to get up to speed on the conversation we've had thus far.
For those unfamiliar with Application Binary Interfaces, please read the FAQ.
Pre IRLO discussion
So far, there have been three primary stages in the history of this proposal. The idea for a stable ABI arose in a discussion on the Rust Users Forum about programming languages that pair well with Rust. Inevitably, the topic of FFIs came up, which led to a constructive debate about Rust's ABI, from which the initial proposal arose.
IRLO discussion
On May 14th 2020, the Initial Proposal was posted to IRLO, where it received quite a lot of attention. There was so much discussion surrounding the proposal that it became very deep and very general at the same time, as everyone attacked the problem from a different angle. After about three months the conversation subsided and a general consensus was reached, but the length and complexity of the intertwining narratives made it hard to follow, to say the least. Now that the idea and interest were there, what was needed next was a bazaar-style, collected and directed effort towards enumerating design decisions and painting a clearer picture of what needed to be done. Hence, the Wiki.
Rust ABI Wiki
On October 23rd, a new effort began to summarize the discussion and develop a solid, technical RFC. This effort is divided into three parts: first, the discussion on IRLO needs to be summarized and woven together into a coherent narrative; second, using the outline of the discussion, the relevant parts of the current Rust compiler infrastructure need to be identified and understood; finally, terms must be defined and a solid RFC must be created.
The Path Ahead
Once this RFC has been created, this wiki will have fulfilled its purpose, and we can move towards following the RFC to create an actionable implementation. This does not mean that no experimentation should occur in the interim - au contraire - rather, a solid actionable plan is needed before any extensive implementation can be developed.
Summary of the Initial Proposal
The title of the initial post was, quite succinctly, 'A Stable Modular ABI for Rust'. Let's go over each of these terms in a bit more detail to get a better picture of what we're dealing with.
'Stable'
Currently, Rust's ABI is only stable within a single version of the compiler. This means that binaries built with different versions cannot be dynamically linked, as the compiler does not guarantee that executables built with different versions will share a common interface.
This proposal calls for the stability of such an ABI to be guaranteed, at the very least allowing the compiler to link against dynamic libraries built with older versions of the same compiler.
'Modular'
A stable ABI is no good if it is not resilient to future change.
'ABI'
'for Rust'
Line of Reasoning
Here's the line of reasoning behind the proposal, quoted:
- A stable ABI would be really nice for Rust
- But it would also be difficult to do
- We propose a modular ABI, where compiler-time macros can determine the layout of datastructures so that they are ABI-compliant.
- We discuss caveats
- We ask for feedback.
— @isaac, IRLO
Difference Between the Initial Proposal and the General Consensus Now
After the initial proposal, quite a lot has changed. The initial proposal was very focused on using compile-time macros to determine memory layout. Since this proposal, it's become apparent that more than just memory layout is at stake, and new features must be introduced for FFI creation, binary linking, calling convention, niche expression, etc.
What follows is the initial proposal that led to the subsequent discussion on IRLO. This document is a bit outdated, and is largely here for archival purposes.
Proposing a stable modularizable ABI interface for Rust
Based on the points from the discussion here.
Introduction
Rust is a powerful systems programming language with strong memory guarantees. Rust allows for concise expression at a high level, while still producing fast low-level code. However, Rust does not guarantee the calling conventions and layout of structures in memory, which makes it difficult to write external applications that interface with Rust; Rust lacks a standardized ABI. Standardizing Rust's ABI has been brought up before, but has usually gone nowhere due to the difficulty of the task. In this post, we outline the benefits and stumbling blocks of a stable ABI, as well as suggest a semi-novel technique as to how such an ABI could be implemented.
Benefits
There are many benefits a standardized ABI would bring to Rust. A stable ABI enables dynamic linking between Rust crates, which would allow for Rust programs to support dynamically loaded plugins (a feature common in C/C++). Dynamic linking would result in shorter compile-times and lower disk-space use for projects, as multiple projects could link to the same dylib. For example, imagine having multiple CLIs all link to the same core library crate.
Although this use case is already rather well covered by abi-stable-crates, there are still many more benefits beyond linking crates dynamically. A stable ABI would allow Rust libraries to be loaded by other languages (such as Swift), and would allow Rust to interop with libraries defined in other programming languages. Non-Rust crates could be integrated with Rust toolchains; providing an ABI would also allow outside code to rely on Rust for performance-intensive tasks. Cross-language compatibility would increase the diversity of Rust's package ecosystem.
Quote: Imho one of the biggest mistakes C++ ever made was not stabilizing its abi; swift just stabilized theirs and is already reaping the benefits, swift system libraries, the swift runtime, swift UI libraries, all dynamically linked and backwards abi compatible.
Stabilizing Rust's ABI would allow for cross-language interop and dynamic linking. "extern "C" as the lowest common denominator is too low for Rust" (Quote).
Recently, the Fuchsia OS team at Google decided to ban Rust for use in the Fuchsia microkernel, citing C as an alternative because of its stable ABI. Not providing a stable ABI ultimately hurts Rust when getting down to the metal. Given that similar languages like C and Swift have a stable ABI, I see no reason why a stable ABI would not be implementable for Rust. As discussed here, some ABIs/FFIs have already been written using proc macros and the like.
Potential Issues
However, a stable ABI is not all peaches and roses. Having to standardize the memory layout of data can limit the number of optimizations the compiler can perform. There has been a lot of work on laying out fields in structs in reliable and ABI-compliant ways. There is a large class of optimizations that can be done in compliance with an ABI; since an ABI solidifies the layout of data, more reliable bit-twiddling and the like can occur.
While discussing the matter, a point was brought up that the ABI could be modularized. A modularized ABI would be optional while compiling. This modular ABI could be published as a versioned crate. If the ABI ever needs a backward-compatibility breaking change, the change could be made within Semver. Alternatively, a new ABI-compliant compiler backend could be developed, or the current compiler backend could be extended to support an ABI feature flag that would toggle ABI compliant builds.
However:
Quote: Depending on the implementation, if we want to make ABI plugins, to avoid stabilizing the compiler's built-in ABI, we might run into another problem because we have to stabilize the plugin interface, which could be another can of worms.
Standardizing the ABI would take a lot of work. A poorly designed ABI is worse than not having an ABI at all. And as we all know, the right solution is often the hardest one.
Another downside is that allowing ABI crates might not stabilize Rust's ABI; there'd just be ABI fragmentation. Although this is a genuine concern, a 'master' ABI crate with Rust's 'official' ABI could be developed. This would standardize Rust's ABI, while still allowing crates for other ABIs, like Swift's, to be written for interop. Additionally, because modular ABIs are opt-in, ABIs would be used only where explicitly necessary.
Implementation Proposal
So, what might this modularized ABI look like? Roughly speaking, an ABI would be defined by a series of macros in a crate which specify the layout and calling conventions of data structures according to that ABI. During compilation, while determining the layout of the data, the layout information provided by the ABI macros would be used. The end goal would be for something like #[repr(RustABI)] or $ cargo build --release --abi rust-abi to be plausible.
Let's get into more detail. Right now, the closest analogue to a stable Rust ABI is the abi_stable crate. abi_stable uses #[repr(C)] to create ABI-compatible data structures. This is a step in the right direction, but every ABI-compliant type has to pass through abi_stable's mechanisms. These data structures are also less expressive. For example, every abi_stable ABI struct has to contain ABI-compatible fields, and some Rust types, like Result, aren't compatible at all.
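To illustrate the Result limitation, here is a minimal sketch of the usual workaround: a hand-written repr(C) enum that mirrors one specific Result type, with conversions at the boundary. FfiResult and checked_half are hypothetical names for this example and are not part of abi_stable's API.

```rust
// A C-compatible stand-in for Result<i32, u32>. repr(C) gives this enum a
// defined tag-plus-payload layout, which the default representation lacks.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum FfiResult {
    Ok(i32),
    Err(u32),
}

impl From<Result<i32, u32>> for FfiResult {
    fn from(r: Result<i32, u32>) -> Self {
        match r {
            Ok(v) => FfiResult::Ok(v),
            Err(e) => FfiResult::Err(e),
        }
    }
}

impl From<FfiResult> for Result<i32, u32> {
    fn from(r: FfiResult) -> Self {
        match r {
            FfiResult::Ok(v) => Ok(v),
            FfiResult::Err(e) => Err(e),
        }
    }
}

// Rust-side logic keeps using the ordinary Result type.
fn checked_half(n: i32) -> Result<i32, u32> {
    if n % 2 == 0 { Ok(n / 2) } else { Err(1) }
}

// Only the boundary function converts to the fixed-layout type.
pub extern "C" fn checked_half_ffi(n: i32) -> FfiResult {
    checked_half(n).into()
}

fn main() {
    println!("{:?} {:?}", checked_half_ffi(10), checked_half_ffi(3));
    let back: Result<i32, u32> = checked_half_ffi(3).into();
    println!("{:?}", back);
}
```

The cost is visible even in this tiny case: one mirror type and two conversions per concrete Result instantiation, which is the kind of boilerplate the proposal hopes to avoid.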
A modular ABI could solve this issue. An "ABI" Rust crate is a proc-macro-like crate that determines exactly how each byte of a data structure should be laid out in memory. To do this, the "ABI" crate should provide a macro for each standard Rust data structure (struct, enum, tuple, etc.). When a data structure is marked as ABI-compliant (either through a #[repr(ABI)] proc macro or a compiler flag), the compiler calls out to the "ABI" crate, which recursively lays out said data structure in an ABI-compliant manner.
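No such compiler hook exists today, so the following is purely a thought experiment about what an "ABI" crate might compute. It models the layout decision as a trait with two interchangeable strategies; Field, Layout, AbiLayout, DeclarationOrder, and LargestAlignFirst are all invented for this sketch.

```rust
/// Size and alignment of one field, in bytes.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Field {
    pub size: usize,
    pub align: usize,
}

/// Computed layout: offsets[i] is the byte offset of fields[i].
#[derive(Debug, PartialEq)]
pub struct Layout {
    pub offsets: Vec<usize>,
    pub size: usize,
    pub align: usize,
}

fn round_up(n: usize, align: usize) -> usize {
    (n + align - 1) / align * align
}

// Places fields in the given order, padding each one up to its alignment.
fn place(fields: &[Field], order: &[usize]) -> Layout {
    let mut offsets = vec![0; fields.len()];
    let (mut cursor, mut align) = (0, 1);
    for &i in order {
        cursor = round_up(cursor, fields[i].align);
        offsets[i] = cursor;
        cursor += fields[i].size;
        align = align.max(fields[i].align);
    }
    Layout { offsets, size: round_up(cursor, align), align }
}

/// What a pluggable ABI would have to answer for every struct.
pub trait AbiLayout {
    fn layout_struct(&self, fields: &[Field]) -> Layout;
}

/// C-like rule: keep declaration order.
pub struct DeclarationOrder;
/// Alternative rule: most-aligned fields first, which reduces padding.
pub struct LargestAlignFirst;

impl AbiLayout for DeclarationOrder {
    fn layout_struct(&self, fields: &[Field]) -> Layout {
        let order: Vec<usize> = (0..fields.len()).collect();
        place(fields, &order)
    }
}

impl AbiLayout for LargestAlignFirst {
    fn layout_struct(&self, fields: &[Field]) -> Layout {
        let mut order: Vec<usize> = (0..fields.len()).collect();
        // Stable sort, so ties keep declaration order and stay deterministic.
        order.sort_by(|&a, &b| fields[b].align.cmp(&fields[a].align));
        place(fields, &order)
    }
}

fn main() {
    let fields = [
        Field { size: 1, align: 1 },
        Field { size: 4, align: 4 },
        Field { size: 1, align: 1 },
    ];
    println!("{:?}", DeclarationOrder.layout_struct(&fields));
    println!("{:?}", LargestAlignFirst.layout_struct(&fields));
}
```

Either strategy is "stable" as long as its rule is written down and never changes; the trade-off between them is compatibility with C versus tighter packing.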
There are a few issues that still need to be addressed. How do pointers and memory management work across FFI boundaries? We propose that when ABI-compliant data is transferred across an FFI boundary, it should be either copied or moved. Once some data has moved across an FFI boundary, the only way to reference that data is to use the copy, or have the program the data was transferred to transfer it back. This copy/move borrowing technique is merely a suggestion, as there is probably a better way to do it (semi-related post).
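The copy-or-move idea can be approximated with tools Rust already has. In this sketch (all names hypothetical), a small plain-data struct is copied across the boundary by value, while a heap value is moved: the Rust side gives up its Box, and the only way to free the data is to transfer the pointer back.

```rust
// Copy: small plain-data values are simply duplicated across the boundary.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

pub extern "C" fn mirror(p: Point) -> Point {
    Point { x: -p.x, y: -p.y }
}

// Move: ownership of the heap allocation leaves Rust as a raw pointer.
pub extern "C" fn counter_new(start: u64) -> *mut u64 {
    Box::into_raw(Box::new(start))
}

// Safety: `ptr` must come from counter_new and not have been freed.
pub unsafe extern "C" fn counter_bump(ptr: *mut u64) -> u64 {
    unsafe {
        *ptr += 1;
        *ptr
    }
}

// Transferring the data back: Rust regains ownership and frees it.
// Safety: `ptr` must come from counter_new and must not be used afterwards.
pub unsafe extern "C" fn counter_free(ptr: *mut u64) -> u64 {
    let boxed = unsafe { Box::from_raw(ptr) };
    *boxed
}

fn main() {
    println!("{:?}", mirror(Point { x: 1, y: -2 }));
    let counter = counter_new(41);
    // Safety: counter came from counter_new and is freed exactly once.
    let bumped = unsafe { counter_bump(counter) };
    let last = unsafe { counter_free(counter) };
    println!("{} {}", bumped, last);
}
```

Note that nothing here stops the other side from keeping a stale pointer; enforcing that statically is exactly the open question the paragraph above raises.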
To determine the layout of data for Rust's own ABI, a minimum API would have to be found. Rust currently provides many niche optimizations and field ordering techniques to increase performance, and a stable ABI might interrupt or prevent some of this. However, as mentioned in the Potential Issues section, there are ways to work around this. Different calling conventions could be supported through a proxy assembly stub or the like, but the devil's always in the details.
Multiple ABI crates could be defined. For example, there could be an abi_swift crate for interop with Swift's ABI, and Rust itself could have its own ABI in an ABI crate titled abi_rust or the like.
Quote: The potential to have different ABIs (e.g., abi_rust, abi_swift) that are used concurrently in the same compilation would permit Rust programs to act as the "glue" between external components that use incompatible ABIs.
Closing Thoughts
We hope that this outline of a very rough specification will provide a launching point for the ultimate development of a stable modularized ABI interface for Rust. Such an ABI would expand the number of applications that Rust could be used for. A stable ABI would standardize dynamic linking between Rust crates, reduce the time and disk space used during compilation, allow for cross-compatibility between Rust and other programming languages, and increase the plausibility of Rust as a kernel-level language. Something like this takes hard work and good communication, so if you have any questions, comments, concerns, feedback, or other ideas, please don't hesitate to share.
Summary of the Discussion
repr(C) as the Lowest Common Denominator
Headers
In the C world, there is a relatively simple relation between these two aspects: everything you put into ("public") headers is part of the ABI, and if you want the ability to e.g. change the layout of a type without breaking ABI compatibility, then you need to make that struct an opaque type in the headers and e.g. expose getter/setter functions for fields whose existence you want to guarantee.
— @hanna-kruppe, IRLO
A Clarification of Terms
Generics
The way Rust compiles generics, for example, necessarily inlines lots of "internal" library code (full of hard-coded field offsets, type sizes, etc.) into consumers of the library.
Addendum: when I wrote this sentence I also fell into the trap of focusing just on the "ABI" in the narrow sense of choices made while mapping Rust to machine code. Just as important is the fact that the library's code is copied into the consumer at all. So if a bug is fixed in the library, even if you can recompile that library and re-link everything (because you went through great effort to make type sizes and calling conventions and so on stable), the bug fix still won't reach the applications that use the library. Or will reach it inconsistently depending on where a fresh copy of the generic code was instantiated and where an existing copy of the code was reused (see: -Zshare-generics flag).
— @hanna-kruppe, IRLO
Generics are not a problem as long as all types are known in the interface exposed through the ABI. So it wouldn't be a problem to return Result<i32, String> or even Result<(), Box<dyn Error>> (assuming trait objects have defined ABI).
They're only an issue for functions like pub fn generic<T, E>() -> Result<T, E> that can't be monomorphized on the library side. For these functions the options are:
- Just forbid them. Library interfaces would have to use dyn Trait or concrete types instead. IMHO this is quite sensible limitation, especially for an MVP.
- Require defining ahead of time which parameters can be used, and compile monomorphic versions just for these types (similar to template instantiations in C++)
- Do what Swift does and compile a universal version of the generic function that uses run time type information to support arbitrary types. It's a very clever approach from ABI perspective, but it's equivalent of changing everything into dyn Trait, so it may be a poor fit for Rust.
— @kornel, IRLO
Just forbid them. Library interfaces would have to use dyn Trait or concrete types instead. IMHO this is quite sensible limitation, especially for an MVP
I think doing that is very reasonable. Then the library developer could still get the second and third behaviors in many cases by doing something like:
    // Monomorphic wrapper for a type chosen ahead of time.
    pub fn int_generic() -> Result<i32, Box<dyn Error>> {
        generic::<i32, Box<dyn Error>>()
    }

    // Type-erased wrapper that works through trait objects.
    pub fn dyn_generic() -> Result<Box<dyn Any>, Box<dyn Error>> {
        generic::<Box<dyn Any>, Box<dyn Error>>()
    }
— @tmccombs, IRLO
- Just forbid them. Library interfaces would have to use dyn Trait or concrete types instead. IMHO this is quite sensible limitation, especially for an MVP.
- Do what Swift does and compile a universal version of the generic function that uses run time type information to support arbitrary types. It's a very clever approach from ABI perspective, but it's equivalent of changing everything into dyn Trait , so it may be a poor fit for Rust.
Both these options make sense; generally speaking, I think Rust is leaving a lot of potential on the table with its current dyn-safe rules. Finding a way to make more traits dyn-safe (for instance, by adding workarounds to the "no associated constants" rule) would help with both of the above options.
Also, implementing the second one would probably help a lot with compile times in debug mode, since it removes the need for monomorphizing most template functions.
— @PoignardAzur, IRLO
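The options discussed in the quotes above can be put side by side in a few lines. This is only an illustrative sketch with made-up function names: one generic function, one dyn Trait version, and one pre-chosen monomorphic wrapper.

```rust
use std::fmt::Display;

// Generic: the compiler emits a separate copy for every T that callers use,
// so the code ends up in the consumer rather than in the library.
pub fn describe_generic<T: Display>(value: T) -> String {
    format!("value = {}", value)
}

// dyn Trait: a single compiled function that dispatches through a vtable,
// so it can sit behind a fixed library boundary.
pub fn describe_dyn(value: &dyn Display) -> String {
    format!("value = {}", value)
}

// Pre-chosen instantiation: the library decides ahead of time which types
// are supported and exposes a monomorphic wrapper for each.
pub fn describe_i32(value: i32) -> String {
    describe_generic(value)
}

fn main() {
    println!("{}", describe_generic(1.5));
    println!("{}", describe_dyn(&"text"));
    println!("{}", describe_i32(7));
}
```

Only the last two have a signature that is fully known on the library side, which is why they are the ones a stable ABI could cover first.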
How about defining a stable ABI subset for "easy" parts of Rust, such as non-inline functions, non-generic structs?
Then std could be split internally into std-abi-stable and std-hard-parts. Crates could link to std-abi-stable dynamically to reduce bloat, while linking std-hard-parts statically. I imagine over time, as more ABI and std features are set in stone, more things would be moved to std-abi-stable part.
For example, path functions in libstd are generic for P: AsRef<Path>, but std already uses a pattern of fn join<P>(&self, p: P) { self._join(p.as_ref()) } to reduce generics bloat and defer implementation to non-generic _join method. That non-generic implementation could be moved to an ABI-stable library without need to support generics in the ABI.
This is similar to how C libraries use macros and inline functions in the .h files. The header files are the API, but not everything is in the ABI. The ABI is only for whatever these macros expand to.
— @kornel, IRLO
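The shim pattern described in the quote above looks roughly like this when written out. Root and join_inner are hypothetical names standing in for the std internals; the point is only the split between a thin generic front and a non-generic body.

```rust
use std::path::{Path, PathBuf};

pub struct Root {
    base: PathBuf,
}

impl Root {
    pub fn new<P: AsRef<Path>>(base: P) -> Root {
        Root { base: base.as_ref().to_path_buf() }
    }

    // Thin generic shim: instantiated in each caller, but it only converts
    // the argument and forwards it.
    pub fn join<P: AsRef<Path>>(&self, p: P) -> PathBuf {
        self.join_inner(p.as_ref())
    }

    // Non-generic body: compiled exactly once, so this is the part that
    // could live in an ABI-stable library.
    fn join_inner(&self, p: &Path) -> PathBuf {
        self.base.join(p)
    }
}

fn main() {
    let root = Root::new("usr");
    println!("{}", root.join("lib").display());
    println!("{}", root.join(String::from("bin")).display());
}
```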
@kornel I do want to see a "safe ABI" subset, which is bigger than C and smaller than fully general Rust.
However, having a shared library for cases like _join would require making the internal _join interface stable, which would increase our stability surface area. And in practice, I don't think most people who want shared library support specifically want it to save disk storage space. They do sometimes want it to save RAM, and a shared library might help a little with that when multiple Rust programs are running simultaneously. But mostly, shared libraries make distribution maintenance easier: you can upgrade a library without rebuilding the world. This wouldn't necessarily solve that problem. And I think it would in practice make distribution of Rust binaries harder for many people, because then they'd need to supply the matching library.
— @josh, IRLO
The Swift ABI?
A Plugin-Style Architecture
Laying out Memory
TODO: flesh out Include this article
Memory Management Across FFI
An issue with an FFI is that any memory transferred across boundaries is, in essence, being managed by two programs at once. This is a problem because Rust, as a language, provides strong memory guarantees, and the point of a clean FFI API is to make it as simple as possible to use; an unsafe block should not need to wrap every call to and from an external ABI.
With that in mind, what is the solution to this? How can data be transferred across programs without violating data lifetimes?
Once some data has been moved across an FFI boundary, the program that passed the data on no longer retains ownership of that data. This is similar to move semantics in closures, etc.
What about pointers?
The only consistent answer to that is that the calling convention of the function pointer (extern "Rust", extern "C", extern "Swift") is part of the function type, the same as layout (#[repr(Rust)], #[repr(C)], #[repr(Swift)]) is part of the struct type.
— @CAD97, IRLO
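Rust already treats the calling convention as part of a function pointer's type for the ABIs it supports today, which gives a feel for the point in the quote above. In this small sketch (names invented for illustration), two functions with the same signature but different ABIs have different pointer types, and a wrapper is needed to pass one where the other is expected.

```rust
// Same signature, different calling conventions: two distinct types.
type RustFn = fn(i32) -> i32;
type CFn = extern "C" fn(i32) -> i32;

fn double_rust(x: i32) -> i32 {
    x * 2
}

extern "C" fn double_c(x: i32) -> i32 {
    x * 2
}

// Adapter: re-exposes the Rust-ABI function under the C calling convention.
extern "C" fn double_rust_as_c(x: i32) -> i32 {
    double_rust(x)
}

// Code on the far side of a C boundary can only accept the C-ABI pointer.
fn call_c(f: CFn, x: i32) -> i32 {
    f(x)
}

fn main() {
    let r: RustFn = double_rust;
    let c: CFn = double_c;
    // let wrong: CFn = double_rust; // rejected: the ABI is part of the type
    println!("{} {} {}", r(2), c(2), call_c(double_rust_as_c, 2));
}
```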
TODO: reread through discussion and expand on this point.
Calling Conventions
ABI Selection
OneRing and ABI Boundaries
It seems like developing language-independent ABI which significantly improves over C, without being Rust specific, is possible. Slices, tagged unions, utf8 strings, borrowed closures are features which immediately come to mind and have obvious-ish implementations.
— @matklad, IRLO
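As one guess at what an "obvious-ish implementation" of a language-independent string slice might be, the sketch below describes borrowed UTF-8 text as a pointer plus a byte length. StrView, count_words, and count_words_in are hypothetical names, and the layout shown is an assumption rather than anything the discussion settled on.

```rust
use std::{slice, str};

// A borrowed UTF-8 string in a form any language can read:
// a pointer to the first byte plus a length in bytes.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct StrView {
    pub ptr: *const u8,
    pub len: usize,
}

impl StrView {
    pub fn borrow(text: &str) -> StrView {
        StrView { ptr: text.as_ptr(), len: text.len() }
    }
}

// Safety: view.ptr must point to view.len readable bytes for the whole call.
pub unsafe extern "C" fn count_words(view: StrView) -> usize {
    let bytes = unsafe { slice::from_raw_parts(view.ptr, view.len) };
    // Validate rather than trust the other side to have sent UTF-8.
    match str::from_utf8(bytes) {
        Ok(text) => text.split_whitespace().count(),
        Err(_) => 0,
    }
}

// Safe Rust-side wrapper around the C-ABI entry point.
pub fn count_words_in(text: &str) -> usize {
    // Safety: the view borrows `text`, which outlives the call.
    unsafe { count_words(StrView::borrow(text)) }
}

fn main() {
    println!("{}", count_words_in("a stable modular abi"));
}
```

Unlike a C string, this form needs no terminator and can point into the middle of a larger buffer, which is part of what "improves over C" could mean here.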
Niches
API/FFI, or, Code as Glue
It is important to have a robust API/FFI, making it easy to glue C and Rust together, and pass around safe data structures like counted buffers without writing a lot of repetitive unsafe glue code.
— @josh, IRLO
A Final Picture
Rust's Compiler Infrastructure
Now that we've gotten a solid idea of what an ABI is and how it functions, let's talk about Rust, specifically Rust's compiler. Rust's binaries do have an ABI, but what does that ABI look like? Where is it determined? What's preventing it from being stable?
These are some questions we hope to answer in the following chapters.
Towards an RFC
Now that we have a solid understanding of both ABIs and how they relate to Rust's compiler, I think it's time we consolidate what we've learned so far and outline the general topics for a (Pre-)RFC.
A Minimum Viable Subset
Many people and organizations have spent significant effort over the years developing ABIs. This page is a list of useful references and resources that show what others have done that may be useful to the project.
- Linkers and Loaders by John R. Levine. Explains what linkers and loaders do, touching on ABIs and object file formats. Copyrighted in 2000, so some of the information on object formats is a little old, but the basic ideas and issues haven't changed too much, so a good place to start.
- x86 calling conventions
- Microsoft® x64 calling convention
- x32 ABI
- System V Application Binary Interface
- System V ABI at osdev.org