Code doesn’t live in a vacuum, and documentation shouldn’t either. Today we’re releasing the largest update to our indexing system since launch: a nearly complete rewrite of our code analysis pipeline that lets us accurately infer cross-references and perform symbol resolution that no other Ruby tool can match.
RDoc and YARD only consider the code within a single gem, so to cross-reference documentation for all gems, we need to build the full object graph linking all Ruby code. To build the graph, we parse the classes, methods, and modules of every Ruby gem along with all their relationships, and serialize that data into a normalized, compressed, and clustered 25GB table. It takes a few thousand machine hours to parse all the Ruby code, so we built a cluster to handle the job.
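To make the parsing step concrete, here is a minimal sketch of pulling class, module, and method definitions out of a source string. It uses Ruby's bundled Ripper parser purely as a stand-in (which parser the real pipeline uses is not stated here), and the helper names `const_name` and `collect_definitions` are hypothetical. It ignores singleton methods, reopened classes, and everything else a real indexer has to handle.

```ruby
require "ripper"

# Joins every constant token inside a node, so A::B::C comes back as "A::B::C".
def const_name(node)
  return nil unless node.is_a?(Array)
  return node[1] if node[0] == :@const

  parts = node.select { |child| child.is_a?(Array) }
              .map { |child| const_name(child) }
              .compact
  parts.empty? ? nil : parts.join("::")
end

# Walks the s-expression and records definitions with fully qualified names.
def collect_definitions(node, namespace = [], out = [])
  return out unless node.is_a?(Array)

  case node[0]
  when :class
    scope = namespace + [const_name(node[1])]
    out << { kind: :class, name: scope.join("::"), superclass: const_name(node[2]) }
    collect_definitions(node[3], scope, out)
  when :module
    scope = namespace + [const_name(node[1])]
    out << { kind: :module, name: scope.join("::") }
    collect_definitions(node[2], scope, out)
  when :def
    # node[1] is the method-name token, e.g. [:@ident, "title", [line, col]]
    out << { kind: :method, name: "#{namespace.join('::')}##{node[1][1]}" }
  else
    node.each { |child| collect_definitions(child, namespace, out) }
  end
  out
end

SOURCE = <<~RUBY
  module Blog
    class Post < ActiveRecord::Base
      def title; end
    end
  end
RUBY

definitions = collect_definitions(Ripper.sexp(SOURCE))
definitions.each { |definition| puts definition.inspect }
```

The `superclass` entry is the kind of relationship that becomes an edge in the graph: it names a constant that may live in an entirely different gem.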
Concurrently building all the docs gets more complex when you add shared dependency resolution. For example, if two workers are building ActiveModel and ActionPack, which both depend on ActiveSupport (like 10,298 other gems), one will build the dependency while the other requeues its job and finds a new gem to work on in the meantime. In addition, there’s the usual subtlety of avoiding deadlocks and races, since the winner of a race winds up with broken links. There’s also fun with circular dependencies, and then you have to work out whether a pathological gem is ever going to finish parsing, so you can decide what to do with its dependencies.
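The claim-and-requeue behavior can be sketched in a few lines. This is a simplified, single-process illustration under our own assumptions, not the cluster's actual scheduler: `DocBuildScheduler` and its methods are hypothetical names, the queue lives in memory, and a gem only becomes claimable once all of its dependencies are finished.

```ruby
require "set"

# Hypothetical in-memory scheduler; a real cluster would use a shared queue,
# but the claim/requeue logic is the part being illustrated.
class DocBuildScheduler
  attr_reader :done

  def initialize(dependencies)
    @dependencies = dependencies   # gem name => names of gems it depends on
    @queue = dependencies.keys     # gems still waiting for a worker
    @in_progress = Set.new
    @done = Set.new
    @lock = Mutex.new              # one claim at a time, so workers never race
  end

  # Hands a worker the next gem whose dependencies are all built, or nil.
  def claim
    @lock.synchronize do
      @queue.size.times do
        gem_name = @queue.shift
        if ready?(gem_name)
          @in_progress << gem_name
          return gem_name
        end
        @queue.push(gem_name)      # dependency missing or mid-build: requeue
      end
      nil
    end
  end

  def finish(gem_name)
    @lock.synchronize do
      @in_progress.delete(gem_name)
      @done << gem_name
    end
  end

  # True when nothing is running and nothing can start, e.g. a dependency cycle.
  def stuck?
    @lock.synchronize do
      @in_progress.empty? && !@queue.empty? && @queue.none? { |g| ready?(g) }
    end
  end

  private

  def ready?(gem_name)
    @dependencies.fetch(gem_name, []).all? { |dep| @done.include?(dep) }
  end
end

scheduler = DocBuildScheduler.new(
  "activemodel"   => ["activesupport"],
  "actionpack"    => ["activesupport"],
  "activesupport" => []
)
first = scheduler.claim     # "activesupport": the only gem with nothing to wait on
second = scheduler.claim    # nil: both remaining gems were requeued
scheduler.finish(first)
third = scheduler.claim     # "activemodel": its dependency is now built
```

The `stuck?` check is one crude way to notice circular dependencies: if no job is running and none can start, waiting longer will not help, and something has to decide what to do with the gems left in the queue.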
Ruby has a relatively complex grammar, so traversing the object graph efficiently makes for an interesting relational algebra exercise. Instead, we deserialize the relevant piece of the graph for any given gem and traverse it in memory to check whether each word in the documentation is in fact a symbol reference (within that specific piece of documentation’s local context, of course).
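Here is a rough sketch of what resolving words in their local context might look like. It assumes the deserialized graph can be reduced to a set of fully qualified names, and it only considers capitalized, constant-like words; the function names and the sample symbol set are hypothetical. Lookup mimics Ruby's lexical scoping by trying the innermost namespace first and working outward.

```ruby
require "set"

# Tries Temple::HTML::Word, then Temple::Word, then Word for context "Temple::HTML".
def resolve(word, context, symbols)
  scope = context.split("::")
  scope.length.downto(0) do |depth|
    candidate = (scope.first(depth) + [word]).join("::")
    return candidate if symbols.include?(candidate)
  end
  nil
end

# Maps each constant-like word in a doc string to the symbol it resolves to.
def cross_references(doc, context, symbols)
  words = doc.scan(/[A-Z]\w*(?:::[A-Z]\w*)*/).uniq
  pairs = words.filter_map do |word|
    target = resolve(word, context, symbols)
    [word, target] if target
  end
  pairs.to_h
end

# Hypothetical slice of the graph for one gem and its dependencies.
symbols = Set["Temple::Filter", "Temple::HTML::Filter", "Object"]
doc = "Returns a Filter wrapping the Interpolation step. See Object."

links = cross_references(doc, "Temple::HTML", symbols)
puts links.inspect
```

In this example "Filter" resolves to `Temple::HTML::Filter` rather than `Temple::Filter` because the innermost scope wins, while ordinary words such as "Returns" and "See" find no match and are left unlinked.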
As our CS professors used to rant, there’s no such thing as a compiled or interpreted language, only implementations. It sounds strange to say, but we now have a (limited) implementation of a distributed Ruby compiler and linker.
We’ve started sifting through the full web of Ruby code to find signals for the most important pieces, and wanted to share a few of the early insights we’ll be rolling into better search quality.
It looks like we’re not the only ones who’ve gotten used to all of activesupport’s goodies regardless of which project we’re working on. It’s being used 20% more than rails. It’s also interesting to see that activerecord 3.0 is still the most popular dependency of other gems, ahead of 3.1, 3.2, or 4.0. It’s also pretty clear that Ruby works well for data transformation projects: good libraries like json and nokogiri make it so much less of a chore than it used to be.
I loved DataMapper, and it’s still a surprisingly popular ORM. It’s been EOL for a few years now, so that may have helped concentrate all dependencies on the last version available. This list is also dominated by web dev staples like XML parsing and networking.
It goes without saying that
Object comes out way ahead, but it’s interesting to
Exception classes in Ruby are getting heavy usage. Also, just about every
major version of
ActiveRecord::Base shows up on this list sooner rather than
later, but overall usage is broadly distributed among the different versions.
Ruby is popular for its dynamic programming, so it’s no surprise that
method_missing is #1 and
respond_to? makes an appearance on this list.
It’s also good to see people writing plenty of tests for the insanity that too
much metaprogramming can create. When tests fail, it looks like plenty of
people fall back to printf debugging with
Thanks to this work, we’ve added some great new stuff to Omniref: for starters,
you’ll now see the full ancestry of a class listed in the heading (e.g. check out
Slim::Interpolation to see that it eventually inherits from
Temple::HTML::Filter in the temple gem,
Object defined in the standard library.)
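For a sense of what a class ancestry listing contains, Ruby itself exposes the chain through `Module#ancestors`. The classes below are hypothetical stand-ins rather than the real Slim or Temple definitions, and the sketch uses runtime reflection instead of our static graph, so treat it only as an illustration of the shape of the result.

```ruby
# Hypothetical three-level hierarchy standing in for a real gem's classes.
class Filter; end
class HtmlFilter < Filter; end
class Interpolation < HtmlFilter; end

# Keeps only classes, dropping mixed-in modules such as Kernel.
def class_ancestry(klass)
  klass.ancestors.grep(Class)
end

puts class_ancestry(Interpolation).join(" < ")
# prints "Interpolation < HtmlFilter < Filter < Object < BasicObject"
```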
We can also take things a step further, and now inline documentation from one
library into others where it’s relevant. For example,
modules from its sister gem,
ActiveModel::Conversion, which are defined in a different gem,
but you don’t need to worry about that: we’ve rendered the full public API on a
single page so you won’t have to hunt for the relevant docs.