Ensuring Memory Safety for Trusted Applications through Secure Compilation

Author: Alberto Tacchella, Post-doctoral at Uni. Trento

A Trusted Application (TA) is an application running in a Trusted Execution Environment (TEE) that implements a security-critical functionality. Especially on low-end IoT devices, TAs are typically written in unsafe languages like C or C++, and are thus prone to all the pitfalls of manual memory management. Moreover, TEEs typically do not provide isolation guarantees inside the address space of single TAs, but only between different TAs. This means that memory corruption events between different TAs are prevented, but invalid memory accesses inside the address space of a single TA are not avoided, potentially allowing an attacker to exploit them.

Ensuring the absence of memory errors in the execution of TAs is thus an essential step in the road towards a more secure IoT ecosystem. In this blogpost we explain how the CROSSCON Project is tackling this problem.

Memory errors, and how to avoid them

It is not common for a technical term to be first defined in a blogpost, but this is exactly what happened with the term “memory safety”. In July 2014, while preparing a MOOC on software security, the researcher Michael Hicks scanned the literature for a precise definition of this notion, only to find that most available sources (from published Computer Science papers to the relevant wikipedia article) defined memory safety in a purely enumerative way, namely, by compiling a (rather long) list of possible memory errors and declaring a program memory safe when no error from that list ever happens during its execution. However, what one usually expects from a scientific definition is not simply a list of all the ways in which the defined property can fail, but rather a general condition which is systematically violated in all the problematic cases (and is not violated during correct executions).

Taking inspiration from previous efforts like CCured, DieHard and SoftBound, Hicks then proposed a new definition based on the following two tenets:

a distinction between defined (allocated) and undefined (never allocated or freed) memory regions, where the latter should not be accessed in any case;
a notion of pointers as capabilities, in which each region of defined memory can be accessed only by the holder(s) of a pointer which has been obtained through “regular” means (e.g., by taking the address of a variable or by calling a standard memory allocation primitive).

Then a program is deemed to be memory safe when every memory access it performs respects the above rules, that is, if it targets a defined memory region and the capability associated to the pointer used for the access is valid.

Secure compilation

A compiler is a program that translates code written in one language, called the source language, into code written in another language, called the target language. In most cases the source language is a (relatively) high-level language, like C, C++ or Rust, and the target language is a machine language, that is, a sequence of coded instructions that can be directly feeded to a CPU for execution.

A secure compiler is a compiler that, in addition to performing the source‐to‐target translation described above, ensures that the process preserves the security properties of the source programs they take as input in the target programs they produce as output. This goal is complementary to more traditional properties of compilation like correctness (defined as the preservation of all the possible behaviours from source code to target code), and has started to attract the interest of Computer Science researchers only at the beginning of this century.

In the last 20 years, several techniques have been developed to ensure that various kinds of source-level security properties are preserved at the target level. Unfortunately, existing approaches typically do not scale well to realistic programming languages (even “small” ones like C), and ensuring the efficiency of the compiled code is also a challenge.

The research goals of the CROSSCON Project in this context are:

to design a secure compilation mechanism for (a relevant subset of) the C programming language which is capable of guaranteeing memory safety for compiled applications, and
to address the problem of TA interoperability by extending the scope of secure compilation to secure cross-compilation, in order to target at least the ARM and RISC-V architectures simultaneously.

Hardware-assisted memory safety

Over the past decades many different approaches have been devised to address the problem of memory errors for programs written in the C language. Broadly speaking, we can divide existing solution in two classes. In the first class we have software-based approaches, in which the absence of memory errors is enforced by altering the default (unsafe) semantics of the C language, either by augmenting the source language with new annotations (as for instance in CheckedC) or by adding runtime mechanisms to enforce spatial and temporal bounds on pointers (as in SoftBound). The drawback of the first kind of approach is that applying them to existing codebases requires refactoring and code conversion, which can incur a considerable cost. Approaches of the second kind do not require changes to source code, but are typically very expensive on commodity hardware, with reported overheads falling in the 50–100% range.

For these reasons, in recent times memory error mitigation systems are shifting towards the second class of solutions, which involve some form of hardware-assisted memory protection. Memory protection units (MPUs) and memory mapping units (MMUs), widely deployed on current hardware, can be seen as a very basic form of hardware assistance for memory safety; however, their main purpose is to provide memory isolation guarantees between different applications running on the same machine, and are not suitable for providing memory access control at a finer (i.e., single process) level.

A more relevant tool in this regard is provided by memory tagging, also known as memory coloring. In this scheme, each memory location is paired with an identifier, called a tag or color, that cannot be read or modified outside of kernel-level code. These tags function as a key-lock mechanism: every time a memory region is allocated, a new tag for it is selected, and a pointer to that region equipped with the same tag is returned. At every subsequent memory access, the underlying hardware ensures that each (tagged) pointer can only access a location equipped with the same tag.

In principle, this ensures that each pointer can only access locations inside the memory region from which it was originally derived, thereby implementing the idea of pointers as capabilities that was described above. Unfortunately, this scheme only achieves perfect security when no pair of defined memory regions ever share their tag, which in general requires an unbounded number of distinct tags. In practice, the limited number of tags available on each specific hardware platform makes this approach suitable for guaranteeing a weaker, probabilistic form of memory safety.

Secure cross-compilation based on memory tagging

The secure compiler proposal that we are developing for the CROSSCON Project is based on a two-pronged approach. On the theoretical side, we are building a formal model of a C-like intermediate language that uses an (abstract) memory tagging scheme to enforce a memory-safe semantics for pointer accesses. This memory-safe intermediate language is then compiled into a formal model of an assembly-like language, which abstracts away the architecture-dependent parts of a typical RISC instruction set. The preservation of the memory safety property can then be proved by formal means, thereby providing a solid guarantee for the security of the (cross-)compilation scheme.

On the implementation side, we aim for a prototype compiler based on the well‐known LLVM compiler toolchain. The LLVM infrastructure is rich in tools and features, among which there is an expressive Intermediate Representation (IR) that can be used for analysis and instrumentation purposes. Our implementation will be based on the abstract memory tagging scheme described above, which can then be specialized to cover existing or future realizations of the concept in commodity hardware. In this connection we would like to mention specifically the Memory Tagging Extension for the ARM v8.5-A architecture, which is already implemented in some high-end devices currently on the market and is expected to be widely adopted in the future, and various RISC‐V memory tagging extensions that are currently in the draft phase.

Conclusions

Developing memory safe applications that can be executed on widely different platforms is a huge challenge. Hardware-assisted memory protection primitives are particularly compelling in this regard, since they allow wide-scale deployment of hardened applications at no additional cost. In the CROSSCON Project we are targeting one of these primitives, the memory tagging extensions for ARM and RISC-V, to provide an architecture-independent toolchain for safely deploying Trusted Applications in the ever-evolving IoT world.