TL;DR: After two years of experimentation, I finally managed to run D code inside my browser. The conversion chain is D-to-LLVM-to-C-to-ASMJS, and uses ldc, Emscripten, and an unexpectedly useful tool from the Julia community.
Demo here: http://code.alaiwan.org/dscripten/full.html
First I had to get it to build. LLVM APIs are incompatible between major releases, and the ldc build system relies on the version number reported by llvm-config. However, since the emscripten-fastcomp API sits halfway between two LLVM major releases, this version number makes almost no sense, and I had to modify ldc to get it to build.
After months of trial and error, I finally gave up, and, with a heavy heart, went back to C++. I made two toy projects using the SDL and vanilla Emscripten.
Writing these two projects was an incredibly instructive experience. It allowed me to fully realize the inherent limitations of running inside a browser. This was also the occasion to learn the OpenGL core profile, which was, at the time, the only way to do 3D graphics from Emscripten.
The source code for these two projects is available at the following Bazaar repositories:
$ bzr checkout http://code.alaiwan.org/bzr/games/shooter
$ bzr checkout http://code.alaiwan.org/bzr/games/deeep
It turns out LLVM currently has a “C++ backend”. However, it produces C++ code that makes LLVM API calls, and executing that code constructs the LLVM data structures representing your program. Obviously, this isn’t what we want here.
LLVM used to have a real C backend, translating LLVM bitcode to standalone C code. This is exactly what we need. Alas, this backend was unmaintained, and the LLVM developers discontinued it in mid-2012, with the release of LLVM 3.1.
The LLVM C backend seems to have been resurrected by Julia developers in the summer of 2014. The project is named llvm-cbe and is available on GitHub (https://github.com/JuliaComputing). And it’s great, because it means I can now translate a D program back to C code … and then feed it to Emscripten!
Well, in theory. In practice, things didn’t go very smoothly.
Firstly, the LLVM C backend in its current state requires exactly LLVM 3.7. And of course, you can’t compile it against fastcomp (the Emscripten LLVM fork). So we’re gonna have to deal with two LLVM toolchains here; let’s hope everything will be compatible.
Secondly, the LLVM C backend sometimes generates invalid C code, i.e. code which doesn’t compile. Once again, it seems that its authors restricted themselves to the bitcode subset targeted by clang (although, with a lot of effort, it’s possible to craft a C++ file whose compilation with clang will produce evil bitcode, and this evil bitcode will lead llvm-cbe into producing invalid C code: https://github.com/JuliaComputing/llvm-cbe/issues/2#issuecomment-236424508 ).
I had to fork llvm-cbe and fix some issues myself, hoping that my pull requests will be accepted.
At this point, I’m able to translate some D programs to standalone-gcc-compilable C code. Which means the only thing left is to feed it to Emscripten.
So, to summarize the whole working compilation pipeline:
Now comes the fun part: writing some D code which uses this toolchain. What I ended up with is a minimalistic real-time game using the SDL.
You can play the demo at: http://code.alaiwan.org/dscripten/full.html
The source code for the demo, the build scripts, and the toolchain deployment scripts are available at:
Although most D programs make heavy use of the runtime, with some effort, programs that don’t require it can be written. Just tell your linker to omit any default libraries: any call to the runtime then shows up as an unresolved symbol at link time, so a program that links successfully is guaranteed not to make any calls to the runtime.
This is what I’m aiming for at this point: having the runtime feature-level of C, which means no GC, no dynamic arrays, no runtime type information, no “new” operator, and no exceptions.
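To give an idea of what this restricted style looks like, here is a minimal sketch. It is not taken from the demo's source: the names `Particle`, `ParticlePool` and the `pool*` functions are made up for illustration, and I'm assuming the C library's `malloc` and `free` are available on the target. Memory is managed by hand, errors are reported through return values, and raw pointers are indexed directly instead of going through bounds-checked array indexing.

```d
// Hypothetical example: D written at the "runtime feature-level of C".
// Assumption: libc's malloc/free exist on the target platform.
import core.stdc.stdlib : malloc, free;

struct Particle
{
    float x = 0, y = 0;   // position
    float vx = 0, vy = 0; // velocity
}

// Fixed-capacity pool: no GC, no `new`, no growing dynamic array.
struct ParticlePool
{
    Particle* items;
    size_t count;
    size_t capacity;
}

// Everything below is checked by the compiler to neither throw
// nor allocate from the GC.
@nogc nothrow:

bool poolInit(ref ParticlePool pool, size_t capacity)
{
    pool.items = cast(Particle*) malloc(capacity * Particle.sizeof);
    pool.count = 0;
    pool.capacity = (pool.items is null) ? 0 : capacity;
    return pool.items !is null;
}

bool poolAdd(ref ParticlePool pool, Particle p)
{
    // Failure is signalled by the return value, not by an exception.
    if (pool.count == pool.capacity)
        return false;
    pool.items[pool.count++] = p;
    return true;
}

void poolStep(ref ParticlePool pool, float dt)
{
    // Pointer indexing, like in C: no array bounds check is emitted.
    foreach (i; 0 .. pool.count)
    {
        pool.items[i].x += pool.items[i].vx * dt;
        pool.items[i].y += pool.items[i].vy * dt;
    }
}

void poolFree(ref ParticlePool pool)
{
    free(pool.items);
    pool = ParticlePool.init;
}
```

The `@nogc nothrow:` attributes don't remove the runtime by themselves, but they let the compiler reject accidental GC allocations and `throw` statements early, before the linker gets a chance to complain.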
Due to an error in llvm-cbe (related to global variable declaration order), I don’t have support for classes yet.
On the other hand, I still benefit from some great parts of D that happen at compile time: mixins, templates, traits, and of course CTFE.
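Here is a small sketch of how these compile-time features can be combined while staying free of the runtime. All the names (`makeTriangleWave`, `wave`, `clampTo`, `declareKey`, `KEY_SPACE`) are hypothetical and chosen for this example only; they don't come from the demo. The work shown here is done by the compiler, so what reaches the C backend is essentially a constant table, a constant integer, and one plain function per template instantiation.

```d
// Hypothetical example of compile-time D features.

// CTFE: an ordinary function, evaluated by the compiler because its
// result is used to initialize a static immutable table below.
int[N] makeTriangleWave(size_t N)(int amplitude)
{
    int[N] table;
    enum half = N / 2;
    foreach (i; 0 .. N)
    {
        // Rises during the first half of the period, falls during the second.
        const pos = i < half ? i : N - i;
        table[i] = amplitude * cast(int) pos / cast(int) half;
    }
    return table;
}

// The table is computed at compile time and stored as constant data.
static immutable int[8] wave = makeTriangleWave!8(100);
static assert(wave[0] == 0 && wave[2] == 50 && wave[4] == 100);

// Template + traits: one generic function, restricted at compile time
// to arithmetic types.
T clampTo(T)(T value, T lo, T hi)
if (__traits(isArithmetic, T))
{
    return value < lo ? lo : (value > hi ? hi : value);
}

static assert(__traits(compiles, clampTo(1, 0, 2)));
static assert(!__traits(compiles, clampTo("a", "b", "c")));

// String mixin: the declaration's source text is assembled at compile
// time, then compiled as if it had been written by hand.
enum declareKey(string name, string code) =
    "enum int KEY_" ~ name ~ " = " ~ code ~ ";";

mixin(declareKey!("SPACE", "32"));
static assert(KEY_SPACE == 32);
```

Note that the string concatenation in `declareKey` only ever happens inside the compiler. The same `~` operator applied to run-time values would need the GC, which is exactly what the restricted style above gives up.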
As you may have seen if you played with the deployment scripts, the toolchain is not very pretty. It requires three directories to be added to your PATH, including two different versions of the LLVM toolchain.
An obvious enhancement would be to use the WebAssembly backend of upstream LLVM, and then to use Emscripten+Binaryen to convert the generated WebAssembly to asm.js. This would allow me to ditch llvm-cbe, emscripten-fastcomp and emscripten-fastcomp-clang, resulting in a lighter and more consistent toolchain.
At the moment, I have only given this idea a quick (one weekend) try, but the incompatible version requirements of the different tools are driving me crazy. llvm-cbe requires LLVM version 3.7 and doesn’t provide a CMakeLists.txt. However, the autoconf-based build system of LLVM is deprecated in favour of CMake, and recent versions of LLVM (3.9 at the time of writing) don’t provide a ./configure script anymore.
However, I kind of like the idea of being able to convert anything back to C code. C compilers are going to be around for a long time, so C as a target platform is a far safer bet than any other format, including LLVM bitcode.
My goal is to continue writing my programs in D, in the most portable way, without fear of potential future platform restrictions. The technique I just described is still too fragile at this point: it works around many incompatibilities and limitations. However, I believe this proof-of-concept paves the way for non-hacky enhancements.
If you have any ideas on how to improve this technique, please let me know. Don’t hesitate to fork the project on GitHub, to make suggestions, to ask me questions, or even to suggest enhancements to the current flappy-bird-frustrating gameplay of the demo!