The last time Hackerfall tried to access this page, it returned a not found error. A cached version of the page is below, or clickhereto continue anyway

Rethinking Compiler Design




rethink of optimizing compiler design


It is worth noting that several features also imply properties of language design.

Coded as a caching interpreter:

Reversal on the


compilation strategy that provides simplicity and dynamism associated with scripting languages. A


operates on exactly one


(see below). In at least one readily accessible shell-mode or interpreter-mode, one should have access to an REP loop allowing one to execute commands, inspect values, and essentially extend a page. Taking advantage of pre-parsed and precompiled units is accepted (e.g. to reduce latency), but should be hidden behind the scenes (no explicit preparation of build dependencies). Likely Effects:


Instead of a sharp break between the


and assembler, the transition will be smooth. An application program will act as an extension to the language; Functions add to the API; Syntax rules add to the grammar. This might be achievable via some sort of language neutrality with several languages parsing to a common set of core semantics (akin to .Net or JavaVM). Or it might be achieved all in one language. Either way, one can essentially


from one programming environment.

Optimizations factored from compiler:

Optimizations will be externalized. This is analogous to having external libraries that provide <stdio> or <maths>, except with an


flavor. It may help to have an official form of 'hot comment' and some ability to manipulate the post-processor in order to select and configure optimizations at a sub-program level and extend the language with new optimizations.

Compiler internals cleanly exposed:

This has far reaching consequences. You can get at lists of all symbols. You can call a UniversalIterator


that will walk through any data that has a defined structure, whether the data was defined by you or the compiler, without having to write the iterator yourself. You can add new types on exactly the same footing as internal types. You have precise control over placement of data, if you want it. Any kludgy design within the inner recesses of the compiler is exposed

(thereby encouraging the designer to clean it up :-) )

. (



Abstract Compiler Definition:

Much of the compiler would be written in an abstract high-level way as a




compiler (related




). It would be also useful to have a data-driven compiler that separates various policies and data (e.g. machine services, costs, optimization policies, resource management policies, and security policies) and somehow combines these for the target machine. However, such features may be better reached through


while favoring a separation of concerns.

Bells and Whistles

Constancy as an attribute:

Whether a variable is variable or constant can change at 'run time'.

See discussion below under ConstancyAsAnAttribute?.

Stack/calling-format modifiers:

Makes coding of co-routines easy (use multiple stacks). Makes non-standard calling conventions easy - e.g. modifier indicates pass data in registers, in floating point coprocessor, or (for booleans) by calling alternative instances of the same code.

Op-Codes first class types:

Type system extended beyond that of C so that opcodes can be defined as structured types, using union/struct/bitfield etc. For example, the type system will accommodate self-relative pointers, that is pointers that give an offset from their location rather than an absolute address. One consequence of op-codes becoming first class types is that the development environment's built in system for displaying data also serves as a disassembler.

Cache Specification Layer/Library:

Custom caches can be generated from the function specs, to encourage much wider use of cached results.

Machine Architecture using ArchitectureDiscoveryTool?:

Rather than the error prone and laborious process of entering machine architectural details by hand, these could be generated by an ArchitectureDiscoveryTool



). Particularly valuable if multiple architectures are being targeted. A middle-man database of architecture details would also be a smart idea, as it would allow hand-entry of details that such a tool fails to obtain and would be better applied towards cross-compiles.


Debugging could be provided on multiple abstraction levels of the transformed code by conforming to a



Implementation Strategy (Take 1)

So where to start? There needs to be something more regular than a typical processor instruction set or typical high-level language at the conceptual core.



approach (proposed by


) shows that the starting point is only a concrete initial representation of a more abstract approach. A decision to use a particular data representation in the initial implementation does not lock you in to that long term. The compiler/interpreter's functions will in the course of development gain sufficient power that you can introspect the internal workings of the compiler and change, for example, the way that trees are represented internally

without a radical re-write


The suggested implementation route is to implement an interpreter and add optimizations to it that make it behave like a compiler. The observation is that a constant-folding interpreter can cache its results and make them available for later. Rather than writing both an interpreter and a compiler we instead get the interpreter to

interpret an instance of itself interpreting the code being compiled

. To the top level interpreter, the second level interpreter and the code being compiled are all constants. Constant folding optimizations optimize away the calculations of the second level interpreter, optimizing away the interpretation overheads and leaving just the useful instructions, or that's the theory anyway.


A considerable body of code is needed before a useful system emerges.

This sounds like the SecondFutamuraProjection


, which is sort of explored in



It is. But in


you cannot go down to the metal without an external tool (like in Tak 2 below). So you have neither of the advantages. The idea is to have a primitive DynamicCodeGeneration


facility, call it StructuredSelfModifyingCode


. It's to simple self-modifying code as loops are to goto.

Implementation Strategy (Take 2)

An alternative implementation strategy is to initially forget about native object code generation! Instead focus on the introspection and the high-level side of optimizations. In this guise the games compiler becomes a code generator. Let it parse C to its internal representation, process that with customizable re-write rules, and then generate C once again rather than object code. Much of the work on re-write rules and on universal iterators can be done and debugged, and the ideas debugged, without going near object code.

Some uses for the code generator:

  • Auto-generate efficient C code for working with different formats of bitmaps, from a high-level description.
  • Add profiling code.
  • Add serialization code.
  • Provide data-relocation code.
  • Provide data packing/unpacking/endian-conversion code.

This kind of thing is sufficiently useful in itself that the new 'compiler' code could start being used for non-toy projects.


  • Full control over the lowest parts of the compiler (i.e. CodeGeneration) will not initially be possible with this approach. Therefore the necessary high-performance code might never be generated.
  • To actually compile anything an external program (here: a CeeCompiler?) is required.

  • Although the BreakEven requires more code than the first strategy, it may be quicker and easier to reach that BreakEven, because of the maturity of the existing C development tools (better debugger and IDE). But see MetaDebugInterface.

The page


is discussing an implementation strategy that is close in spirit to 'implementation strategy 1'. The general consensus seems to be that a lot of up-front design work, or at the very least extensive domain knowledge, is needed if the initial take is to have a hope of evolving smoothly into an industrial strength (gcc or better) optimizing compiler. The strongest argument backing up that opinion is around differences in datastructures, but to this reader's eyes amounts to no more than moving from one specific structure to another which has the first one as a special case.

It is correct, that simple AST transformations alone will not suffice to do e.g. interprocedural ConstantFolding


and multi-variate specialization. Further data-structures/algorithms will be needed to support the basic GraphRewritingCompiler



  • StaticSingleAssignmentForm and ControlFlowGraphs to capture control and data dependencies between expressions.
  • a LoopNestingForrest?, that can handle irreducible ControlFlowGraphs? to perform loop fusion and other loop optimizations efficiently. This is often used to guide source-to-source transformations in C and Fortran code followed by compilation with code-motion optimizations deactivated. Therefor it use results in AST-transformations for our approach.
  • an efficient algorithm for RegisterAllocation (the actual code generation can be done by term-rewriting as approaches like BEG show - see

These (or like ones) are needed for an industrial strength compiler no doubt. In the first step one would use simplified algorithms and data-structures and keep the information in attributes attached to the AST nodes. In the long run one


consider a hybrid structure, where the AST nodes are updated automatically with dependency information etc. --


Related On Wiki


: Perhaps this section can all move to


? Or just be deleted?

Look for contribs on compilers and optimization by:

Or go straight to


describes some of the issues facing games developers.

For definitions and some off-site links, try:

Other sources

Possibilities for borrowing and stealing ideas from other systems?

Q: Why


, and why the original title "Games

Compiler" [now renamed]??? It looks more like MetaCompilerLanguage


or SelfCompilerLanguage


or even CompilerOptimizedLanguage


. --


A: You are correct. However there seems to be more active interest in control over optimization amongst game developers than elsewhere, and more understanding of what is required there than elsewhere. My own interest comes from DNA sequence analysis, where some analysis programs can run for months. --


How to break the project in to small enough stages:

There are too many tasks that cannot easily be broken down into term rewriting tasks. E.g. the low-level parts of code-generation like register-assignment. Many of these tasks can be fitted into fixpoint iteration over equations on vectors over program points of lattice elements representing knowledge over the points (i.e. the 'attributes' calculated above). But I'm not yet clear, how these equations can interact cleanly with the term rewrite rules. My idea was to use a work-list approach, just evaluate what it knows at each point and postpone everything else - inefficient initially, but ripe for higher-level optimization later. Converting the well-knowns equations for typical problems like constant propagation into the framework should be straight forward, but other points of the integration like memory-management and OS-calls are still unclear (as of Feb 2005).

The basic goal is to reach the


(see there) as early as possible. Would like to ensure that even if a large body of code is required to get a reasonable compiler, that code will be in the target language nearly from the beginning - though that's not an absolute requirement - converting later/semi-automatically is a workable option.








This discussion section may stay quite short, as I'd like to,

or for you to

, refactor discussion into


rapidly. --


We want something more regular than a typical processor instruction set or typical high-level language, as the conceptual core. Bootstrapping is a problem with this objective, because efficient compiled code is a strong requirement.

The following ingredients are needed:

  • A simple and regular parser for expressions with a rich set of operators and precedences (MinimalParsing is easy).
  • A TermRewriting or graph-rewriting system, that can synthesize rule-valued attributes (to allow reflection and model types) and can delay rewriting (lazy execution or worklist).
  • An initial set of primitive rewrite rules for creating rules and for executing expressions as machine code.
  • Rules/Expressions describing formation of machine code of the target machine.

These can be implemented in plain C code. I already have a parser and parts of the rewriting engine as well as an idea what the rewrite rules should look like. This simple approach cannot guarantee convergence or confluence of the rewriting (reflection is obviously '?' too strong for that), but it should be enough to bootstrap a higher-level version of the engine (e.g. one with backtracking capabilities and explicit state).



Q: Why is this important? -- NickShaffner?

A: Because of constant folding. If parameters or a 3D model is read in from an external file, and then remains unchanged thereafter, certain 'constant folding' optimizations at run time become possible.

I think that this way may lead to consistency problems. It is not 'well-defined', in the sense that it cannot be checked or enforced. I'd rather prefer to do


(or in this context: lazy compilation) on the expression using the 'constancy' in question. This is the way used in e.g.




. And it would be quite easy to do with a compiler, that is already able to compile itself. --


A cleaner solution, semantically, is to distinguish immutable values from mutable state at a language level. One loads the 3D model as a value which is 'constant' immediately. Values tend to be big (e.g. full collections, records, tuples, and trees) and variables contain a reference to the value. This approach is very good for many optimizations as well as various orthogonal properties (transparent distribution, transparent persistence, SoftwareTransactionalMemory, versioning and FirstClassUndo, etc.). For PartialEvaluation, it also helps if functions are distinguished from procedures (i.e. reject 'impure' functional programming; favor interleaved procedural/pure-functional instead. See notes in IdealProgrammingLanguage).

My notation for this (in the context of


) has been to use two prefix operators ('\expr' and '#expr') to specify late compilation. '\expr' delays compilation of 'expr' until run-time (run-time of the containing expression, which may very well be during compilation). '#expr' specifies evaluation of 'expr' during compilation of the containing expression.

Example: 'for (i=1; i<10000; i++)\{a += #sin(b)}' This will replace 'sin(b)' with the actual value during the loop. Of course, this reduction of strength might be automated by a state-of-the-art compiler,


this explicit way has some advantages:

  • Can be used even if 'sin' has side effects.
  • Can be incorporated as primitives in our GraphReWritingCompiler and thus be used to implement StrengthReduction.
  • It is a cleaner mechanism than '#include', '#define' etc., but can be used in the same way: '#(a=1)' defines a constant, '#System.print("hallo world")' prints during compilation.



I wrote a related page called


. I'll note that I disfavor your exact approach for a number of reasons.

  • First, the ability in a language to semantically delay the compilation of an expression is very harmful to cross-compilers and embedded systems programming (especially reuse of library code). This is because the program will essentially need to have a compiler installed to work with off-the-shelf libraries. Somehow controlling order of evaluation in an expression would be a better solution... for example, in the above example you could have the #sin(b) essentially mean replace this with the value of executing sin(b) the moment 'sin' and 'b' are fully resolved. This would avoid the need for delayed compilation. One could still have it be lazily executed (#\sin(b)).
  • Second, the need to delay compilation of a larger expression in order to express the desire to compile part of it earlier is likely counter-productive.
  • Third, and importantly, I feel that a CompileTimeResolution technique needs to be conservative, such that it doesn't occur for dead code. One solution to this is similar to how Makefiles operate: each resolution returns a value, and the CompileTimeResolution occurs only if that value is deemed to be necessary. The conservative solution has better characteristics for code reuse and modularity because dead-code that contains a resolution will simply not execute it, and thus programmers can reuse code while automatically avoiding unnecessary side-effects.

Q: Would the resulting language be one of the



A: Not necessarily. If compilation occurs at the AST level, then it is likely to be Homoiconic. If it occurs after semantics are applied, then it needn't be Homoiconic but it does require

FirstClass decomposable

functions and types and such. If occurs prior to the AST, then it needn't be homoiconic either. That said, since we


s on


tend to like


, being homoiconic is probable. It simply isn't necessary.

Comment:homoiconicity is a consilience between a general data structure and the syntax of a language. A tree is a general phrase data structure, and also capable of storing an AST specifically. People are used to the idea that homoiconic languages are syntacticalyally rather simple, but that need not be the case.

Admirable. How do you plan to implement non-local optimizations - lets say the old boring register allocation? If the intermediate representation is written in a continuum of 'languages', how can an optimization understand enough context to be useful? -- cristipp

My plan is, that this is not done directly on the symbolic level (even though it might be possible). The idea is to provide a directed flow equation framework, that supports fixpoint calculation in the presence of incremental updates. This framework is connected to the nodes of the symbolic term graph in the following way:

  • flow equations are specified for nodes, that match rewrite rules (see GraphReWritingCompiler) like this: `reg(^a)`->`reg(^a, ^solve(live(a))` where "solve" would be the primitive function for the incremental solver and "live" be the property to solve (defined elsewhere).
  • when the term is rewritten, instead of replacing terms directly (after function application), the special solve function is used to update the corresponding flow equations incrementally and return the result when it is stable (i.e. delaying if needed until other nodes provide sufficient data for the solver).



See also


CategoryCompilers CategoryOptimization CategoryGameProgramming

Continue reading on