Now that we know what Bitcode actually does, let's review the security implications.
The correctness of compiler optimizations is not a new concern, and of course it also applies to programs not distributed with Bitcode. But usually the developer of the application has complete control over which optimizations are performed and which files they are applied to. Bitcode doesn't provide any fine-grained control over this. Apple does not provide any documentation or sample configuration of what optimizations are applied to the LLVM IR. It acts like a black box, and as far as I know you can't reproduce the output locally.
This problem has been studied for a long time. A recent paper by Vijay D'Silva, Mathias Payer and Dawn Song, presented at LangSec 2015, analyzes optimizations from a security perspective. Here are a few examples of what could go wrong:
A compiler optimization that manipulates security critical code may cause information about a secure computation to persist in memory thereby introducing a persistent state violation.
Examples: Dead Store Elimination, Function Call Inlining, Code Motion
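To make the dead store elimination case concrete, here is a minimal sketch in C. The function names and the toy "key" computation are hypothetical and only for illustration; they are not taken from the paper. The first function wipes its key with a plain memset that an optimizer may remove because the buffer is never read again. The second uses a volatile-qualified pointer, which compilers in practice do not optimize away.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption, not from the paper): wipe a buffer
 * through a volatile-qualified pointer. Compilers treat volatile stores
 * as observable, so in practice they are not removed as dead stores. */
void secure_zero(void *buf, size_t len)
{
    volatile unsigned char *p = (volatile unsigned char *)buf;
    while (len--) {
        *p++ = 0;
    }
}

/* Toy stand-in for a secret computation: fills a key buffer from a seed
 * and returns the sum of the key bytes. */
unsigned use_key_naive(unsigned seed)
{
    unsigned char key[16];
    unsigned sum = 0;
    for (size_t i = 0; i < sizeof key; i++) {
        key[i] = (unsigned char)(seed + i);
        sum += key[i];
    }
    /* key is never read after this point, so dead store elimination may
     * legally delete this memset and leave the key bytes on the stack. */
    memset(key, 0, sizeof key);
    return sum;
}

/* Same computation, but the wipe goes through secure_zero. */
unsigned use_key_wiped(unsigned seed)
{
    unsigned char key[16];
    unsigned sum = 0;
    for (size_t i = 0; i < sizeof key; i++) {
        key[i] = (unsigned char)(seed + i);
        sum += key[i];
    }
    secure_zero(key, sizeof key);
    return sum;
}
```

Both functions return the same value; the difference only shows up in what is left in memory afterwards, which is exactly why this class of bug is easy to miss. Where available, memset_s (C11 Annex K, optional) or explicit_bzero serve the same purpose. With Bitcode, the developer cannot check which of these patterns survive Apple's optimization passes.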
The term undefined behavior refers to situations in which the behavior of a program is not specified. Language specifications deliberately underspecify the semantics of some operations for various reasons such as to allow working around hardware restrictions or to create optimization opportunities.
Examples of undefined behavior in C are using an uninitialized variable, dividing by zero, or operations that trigger overflows.
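As an illustration of how undefined behavior interacts with optimizations, consider an overflow check on signed integers. The sketch below uses hypothetical function names. The naive version relies on the addition wrapping around, which the C standard leaves undefined, so an optimizer is allowed to assume the overflow never happens and drop the check. The second version stays within defined behavior for all inputs.

```c
#include <limits.h>

/* Naive check, intended for b > 0: relies on a + b wrapping around.
 * Signed overflow is undefined, so an optimizer may assume that
 * a + b < a is always false here and remove the check entirely. */
int add_overflows_naive(int a, int b)
{
    return a + b < a;
}

/* Defined for all inputs: compares against the limits before adding,
 * so no overflowing operation is ever evaluated. */
int add_overflows(int a, int b)
{
    if (b > 0) {
        return a > INT_MAX - b;
    }
    if (b < 0) {
        return a < INT_MIN - b;
    }
    return 0;
}
```

Whether the naive check survives depends on the compiler and the optimization level, which is precisely the information a developer no longer has when the final build happens on Apple's side.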
A side channel leaks information about the state of the system. Side channel attacks allow an attacker external to the system to observe the internal state of a computation without direct access to the internal state. We say that a side channel violation occurs if optimized code is susceptible to a side channel attack but the source code is not.
Examples: Common Subexpression Elimination, Strength Reduction, Peephole Optimizations
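A classic instance is comparing a secret value, such as a MAC, against attacker-supplied input. The sketch below uses hypothetical function names. The first comparison returns at the first mismatch, so its running time depends on how many leading bytes match. The second is written to touch every byte regardless of the data.

```c
#include <stddef.h>

/* Leaky comparison: returns at the first mismatching byte, so the
 * running time reveals how long the matching prefix is. */
int leaky_equal(const unsigned char *a, const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i]) {
            return 0;
        }
    }
    return 1;
}

/* Comparison written to be constant-time at the source level: it
 * accumulates the XOR of every byte pair and only inspects the result
 * once at the end. */
int ct_equal(const unsigned char *a, const unsigned char *b, size_t n)
{
    unsigned char diff = 0;
    for (size_t i = 0; i < n; i++) {
        diff |= (unsigned char)(a[i] ^ b[i]);
    }
    return diff == 0;
}
```

Nothing in the C language obliges the compiler to preserve the timing of ct_equal. Both functions compute the same result, and an optimizer that only cares about results is free to turn the second into something closer to the first. This is why such routines are usually checked at the assembly level, which is not possible when you never see the final binary.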
All of these bugs are bad, and for cryptographic implementations each of them is a serious vulnerability. From key leakage to timing attacks to fault attacks, a lot of things can go wrong.
It is quite common for performance- or security-sensitive applications to include inline assembly. Cryptographic implementations are often written directly in assembly to prevent unwanted compiler optimizations and to tune operations to the algorithm's requirements.
With Bitcode, developers can't use inline assembly anymore. This doesn't only affect cryptographic applications: VLC media player, for example, uses inline assembly to speed up some graphical operations.
In his 1984 Turing Award Lecture, "Reflections on Trusting Trust", Ken Thompson describes how compilers can be used to inject trojans.
Extract from Ken Thompson's Turing Award Lecture
An interesting example of the inherent trust issue in compilers has been making the rounds recently with the detection of XcodeGhost, a modified version of Xcode that reportedly infected millions of users. It is already difficult to trust compilers. LLVM being open source and reviewable makes it easier to trust than a black box. But that doesn't entirely solve the problem, since you still need a compiler binary to compile your compiler.
The centralization of the building and signing process is what worries me: an adversary could find a vulnerability in the LLVM backend, obtain remote code execution on Apple's Bitcode compilation infrastructure, and inject a compiler trojan that would affect every single app on the App Store that was submitted with Bitcode.
Compromising Apple's infrastructure isn't the only way to backdoor iOS apps in that setting. Since Apple is going to centralize the process of building application binaries, it will become a very appealing target for governments that want lawful access to application content, circumventing any kind of encryption, no matter how strong it is. An article published today by the Washington Post explains how an Obama administration working group has been working on ways to compel phone manufacturers or App Store services to provide a way to insert a backdoor on the device.
The second approach would exploit companies' automatic software updates. Under a court order, the company could insert spyware onto targeted customers' phones or tablets, essentially hacking the device.
Adding backdoors to Bitcode applications becomes easier than ever. Apple has your application's IR (which is easier to decompile than a binary) with all the associated symbols, and will recompile your app.
This is not a new capability; of course, it can be done on current binaries too. The difference is that Apple will compile your code, and you don't get to see the resulting binary.
There is currently no easy way of verifying (short of using a jailbroken iPhone) that an application on the App Store is the one you uploaded as a developer. And I'm not even talking about reproducible builds, which consist of reproducing the same binary from a given source code. The biggest pain point up to now for verifiable and reproducible builds was FairPlay, Apple's DRM applied to applications. App Store binaries are encrypted (with a key that even the app's developer doesn't have). Hence, they have to be decrypted before they can be analyzed with binary analysis tools such as Hopper or IDA.
FairPlay encryption already made it really difficult for developers to make sure that Apple distributes the build they think it is distributing. But at least, with a jailbroken iPhone, I'm currently able to get some idea of what the binary I'm distributing looks like and compare it to the one I submitted to Apple.
With Bitcode builds, this becomes significantly harder. Because I don't know what optimizations Apple performs, I can't reproduce the target binary, and so I can't diff the two to find possibly altered parts. I have to analyze the entire binary to make sure that nothing nasty is going on. It became so bad that some developers weren't even able to symbolicate the crash logs of their own applications, since binaries were built on the server and Apple was failing to provide the symbolication files (dSYM).