Summary: Your website is slow, but the backend is fast. How do you diagnose performance issues on the frontend of your site? We'll discuss everything involved in constructing a webpage and how to profile it at sub-millisecond resolution with Chrome Timeline, Google's flamegraph-for-the-browser. (4428 words/22 minutes)
Server response times, while easy to track and instrument, are ultimately a meaningless performance metric from an end-user perspective. End-users don't care how fast your super-turbocharged bare-metal Node.js server is - they care about the page being completely loaded as fast as possible. Your boss is breathing down your neck about the site being slow - but your Elixir-based microservices architecture has average server response times of 10 nanoseconds! What's going on?
So what's a good, performance-minded full stack developer to do? How can we take our page loads from slow to ludicrous speed?
But, rather than just tell you that XYZ technique is faster than another, I'm going to show you how and why. Rather than take my word for it, you can test different frontend optimizations for yourself. To do that, we're going to need a profiling tool.
Note that most of Google's documentation on Chrome Timeline is severely out of date and shows a "waterfall" view that no longer exists in Chrome as of October 2015 (Chrome 45). This post is up-to-date as of that time.
To open Chrome Timeline, open up Chrome Developer Tools (Cmd + Alt + I on Mac) and click on the Timeline tab. You'll see a blank timeline with millisecond markings. For now, uncheck the "causes", "paint" and "memory" checkboxes at the top, and disable the FPS counter by clicking the bar graph icon. These tools are mostly useful for people profiling client-side JS apps, which I won't get into here.
The Chrome Timeline records page interactions a lot like a VCR. You can click the little circular icon (the record button) at any time to turn on Timeline recording, and then click it again to stop recording. If the Timeline is open during a refresh, it will automatically record until the page has loaded.
Let's try it on http://todomvc-turbolinks.herokuapp.com/. This is a TodoMVC implementation I did for a previous blog post on Turbolinks. While the Timeline is open, you can trigger a full page load with Cmd + Shift + R, and Chrome will automatically record the page load for you in Timeline. Be sure you're doing a hard refresh here; otherwise you may not redownload any assets.
Note that browser extensions will show up on Chrome Timeline. Any extension that alters the page may show up and make your timelines confusing. Do yourself a favor and disable all of your extensions while profiling with Chrome Timeline.
We're going to start with a walkthrough of a typical HTML page load in Timeline, and then we're going to identify what this performance profile says about our application and how we can speed it up.
Here's what my Timeline looked like:
254 ms from refresh to done - not bad for an old Rails app, eh?
The first thing you'll notice is that big chunk of idle time at the beginning. Almost nothing is happening until about 67ms after I hard-refreshed. An idle browser is the devil's workshop. What's going on there? It's a combination of server response time (on this particular app, I know it hovers around 20ms), and network latency (depending on how far you are from the US East Coast, anywhere from 10-300ms).
Even though we live in an age of widespread cable and fiber optic internet, our HTTP requests still take a lot of time to travel from place to place. Even at the theoretical maximum speed of an HTTP request (the speed of light), it would take a user in Singapore about 70ms to reach a server in the US. And HTTP doesn't travel at the speed of light - cable internet moves data at about half that speed. In addition, packets make as many as a dozen intermediate stops along the Internet backbone. You can see these stops using traceroute. You can also get the approximate network latency to a given server by simply using ping (that's what it was designed for!).
For example, I live in New York City. Pinging a NIST time server in Oregon, I usually see network latency of about 100ms. That's a pretty substantial increase over the time we'd expect if the packets were traveling at the speed of light (~26ms). By comparison, my average network latency to a time server in Pennsylvania is just 20ms. And Indonesia? Packets take a whopping 364ms to make the round trip. For websites that are trying to keep page load times under 1 second, this highlights the importance of geographically distributed CDNs and mirrors.
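To put rough numbers on this, here's a quick back-of-the-envelope sketch. The distances used are my own approximate great-circle estimates, not measurements from this article:

```javascript
// Theoretical minimum round-trip time for a packet traveling at the
// speed of light in a vacuum. Real packets over fiber move at roughly
// half this speed and make many intermediate hops along the way.
const SPEED_OF_LIGHT_KM_PER_MS = 299792.458 / 1000; // ~299.8 km per ms

function lightSpeedRttMs(distanceKm) {
  // Round trip = twice the one-way distance.
  return (2 * distanceKm) / SPEED_OF_LIGHT_KM_PER_MS;
}

// Approximate distances, assumed for illustration:
console.log(lightSpeedRttMs(3940).toFixed(1));  // NYC -> Oregon: ~26.3 ms
console.log(lightSpeedRttMs(15340).toFixed(1)); // NYC -> Singapore: ~102.3 ms
```

Measured ping times will always sit well above these theoretical floors, which is exactly the gap described above.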
Let's zoom in on the first event on the timeline. It seems to happen in the middle of this big idle period. You can use the mouse wheel to zoom.
The first event on the Timeline is "Receive Response". A few milliseconds later, you'll see a (tiny) "Receive Data" event. You might see one or two more miscellaneous events related to page unloading, another "Receive Data" event, and finally a "Finish Loading" event. What's going on here?
The server has started responding to your request when you see that first "Receive Response" event. You'll see several "Receive Data" events as bytes come down over the wire, completing with the "Finish Loading" event. This pattern of events will occur for any resource the page needs - images, CSS, JS, whatever. Once we've finished downloading the document, we can move on to parsing it.
"Parsing HTML" sounds like a pretty simple process, but Chrome (and any browser) actually has a lot of work to do. The browser reads the bytes of HTML off the network (or disk, if you're viewing a page on your computer) and converts those bytes into UTF-8 or whatever document encoding you've specified. Then the browser has to "tokenize" - basically taking the long text string of the HTML and picking out each tag, like <a>. Imagine that the browser converts the ~100kb string of HTML into an array of many smaller strings. Then it "lexes" these tokens (basically converts them into fancy objects) and finally constructs a DOM out of them. On complicated pages, these steps add up - on my machine, The Verge takes over 200ms just to parse the HTML. Yow.
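As a toy illustration of the tokenize step - real browsers use an incremental state machine defined by the HTML5 spec, not a regex, so this only shows the shape of the output:

```javascript
// Split a markup string into tag tokens and text tokens. This naive
// sketch ignores comments, CDATA, attributes containing angle brackets,
// and all the other edge cases a real tokenizer must handle.
function tokenize(html) {
  return html.match(/<[^>]+>|[^<]+/g) || [];
}

console.log(tokenize("<ul><li>Buy milk</li></ul>"));
// [ '<ul>', '<li>', 'Buy milk', '</li>', '</ul>' ]
```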
Partway through parsing the document, the browser encounters this script tag:

<script src="/assets/application-0b54454ea478523c05eca86602b42d6542063387c4ee7e6d28b0ce20f5e2c86c.js" async="async" data-turbolinks-track="true"></script>

Because this script tag was marked with the async attribute, the browser fired off a request for the script and kept right on parsing, without waiting for the script to download or execute.
Browsers will not wait on external CSS before continuing past this step. If you think about it, this makes sense. CSS cannot modify the DOM, it can only style it and make it pretty. In order to even apply the CSS, we need to have the DOM constructed first. So the browser, smartly, simply sends the request for the CSS and moves on to the next step.
Note that this "Parse HTML" step will reoccur every time the browser has to read new HTML - for example, from an AJAX request.
The next major event you're going to see is the purple "Recalculate Styles". Unfortunately, this event covers a lot of things that actually happen during page construction. The first is the construction of the CSSOM.
As HTML is to the DOM, so CSS is to the CSSOM. Your CSS, after it's downloaded, has to be converted -> tokenized -> lexed -> constructed, just like the HTML was. This process is usually the cause of any "Recalculate Styles" bars you see at the beginning of the page load.
"Recalculate Styles" can also mean a lot of other confusing things are happening with your CSS, like "recursive calculation of computed styles", or whatever that means. The gist is that if you're seeing a lot of time in "Recalculate Styles", your CSS is too complicated. Try to eliminate unused or unnecessary style rules.
Why are we seeing Recalculate Styles events when the CSS hasn't even been downloaded yet? At this stage, the browser is applying its default "user agent" CSS to the document, and it may also be applying any style attributes present in the HTML markup itself (display: none being a common one, present on this page).
You will probably see more purple events (Recalculate Styles and its cousin, Layout) later on in the timeline. Again, your browser does not wait for CSS to finish downloading - it's already calculating styles and layouts based on just your HTML markup and the browser defaults right now. The rendering events you see later on occur once the CSS is finished downloading.
Slightly after your first Recalculate Styles event, you should see a purple "Layout" event. Basically, at this point, your browser has all of the DOM and CSSOM in memory and needs to turn it into pixels on the screen.
The browser traverses the visible elements of the DOM (actually the render tree), and figures out each node's visibility, applicable CSS styles, and relative geometry (50% width of its parent and so on). Complicated CSS will obviously make this step longer, but so will complicated HTML.
In summary: in the "Layout" step, the browser is just calculating what's visible, what isn't, and where it should go on the page.
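A drastically simplified sketch of that calculation - resolving percentage widths top-down against each parent's computed width. Real layout engines also handle floats, flexbox, margins, text wrapping, and much more:

```javascript
// Each node may declare widthPercent; otherwise it fills its parent.
// Layout walks the tree top-down, so a child's geometry depends on its
// parent's already-computed geometry - which is why deep, complicated
// HTML makes this step slower.
function layout(node, parentWidth) {
  const width = node.widthPercent
    ? (node.widthPercent / 100) * parentWidth
    : parentWidth;
  return {
    width,
    children: (node.children || []).map((child) => layout(child, width)),
  };
}

const tree = layout(
  { children: [{ widthPercent: 50, children: [{ widthPercent: 50 }] }] },
  960 // viewport width in px
);
console.log(tree.children[0].width);             // 480
console.log(tree.children[0].children[0].width); // 240
```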
It's generally at this point that you'll see the blue bar in Timeline - this is the DOMContentLoaded event. DOMContentLoaded fires once the browser has finished parsing the document and constructing the DOM - it doesn't wait for stylesheets, images, or scripts marked async. Most browsers have not painted anything to the screen by this point.
To speed up DOMContentLoaded, you can do a few things:

- Mark your script tags async where possible. Moving script tags to the end of the document doesn't help speed up DOMContentLoaded - the browser still has to download and evaluate those scripts before the event fires, no matter where they sit in the document. Using real async tags is generally cleaner and more effective than using so-called "async" script injection.
- Be skeptical of advice to inline your "critical path" CSS to speed up the load event. This may be true, but you certainly don't want to inline all of your CSS. Also, figuring out what CSS rules you need for the above-the-fold content in this age of CSS frameworks and Bootstrap sounds like a lot of work to me. How much CSS do you need to render above-the-fold? All of it. As a rule of thumb, don't consider inlining all of your CSS unless you've got about 50kb or less of it. Once HTTP/2 becomes more common and we can download CSS, HTML and JS over the same connection, this optimization will no longer be needed.
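For illustration, inlining a stylesheet just means moving the rules into the document itself - the style rules below are made-up placeholders:

```html
<head>
  <!-- Inlined: no extra network round-trip, but re-sent with every page -->
  <style>
    body { margin: 0; font-family: sans-serif; }
    .header { height: 60px; }
  </style>
  <!-- Instead of the usual external stylesheet, which costs a request: -->
  <!-- <link rel="stylesheet" href="/assets/application.css"> -->
</head>
```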
As we move along the timeline to the right, you should start seeing some green bars in the flamegraph. These are Paint related events. There's a whole lot that can go on in these events (and Chrome even provides profiling tools just for these painting events), but I'm not going to go too deep on them here. All you need to know is that paint events happen when the browser is done rendering (the purple bars - the process of turning your CSS and HTML into a layout) and needs to turn the layout into pixels on a screen.
The green bar in the timeline is the first paint - the first time anything is rendered to the screen. Optimizing first paint is largely a matter of optimizing DOMContentLoaded and getting the stylesheet to the client as fast as possible. Any stylesheet that doesn't specify a media attribute (like media="print") will block rendering until it has been downloaded and parsed.
Keep scrolling to the right on the Timeline. Wow - see how much longer it took to get to this part? In my case, it took almost 40ms of just waiting around to download the whole stylesheet - and this app's stylesheet isn't even that big! To be exact, we sent the request for the stylesheet at about 65ms, and it didn't come back until 101ms. In reality, this is actually extremely fast (in a real app, you would expect that to be more like 200-350ms at least), and we can't really optimize it much further. I'm in NYC and Heroku is in Virginia, so most of that time is network latency anyway.
Once the stylesheet is downloaded, it's parsed. You'll see another cycle of purple events (as the CSSOM is re-calculated, we re-render the layout) and green events (now that the layout is updated, we render the result to the screen).
The stylesheet for this app is extremely simple, and my app appears to be wasting about 30ms waiting for the CSS to download. It may be worth investigating the performance impact of inlining the entire stylesheet in the HEAD of this page. Most sites won't benefit from this optimization (see my note about this above), but because this app spends that time idling while the styles download, we may want to eliminate that network round-trip.
Finally, you should see the load event fire. Once all of the callbacks attached to load have completed, you'll see the red bar, which signifies the end of the load event. This is generally when the page is "ready" and finished loading. Finally!
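For reference, here's a minimal example of hooking into both events - DOMContentLoaded for DOM-dependent work, load for everything that can wait:

```html
<script>
  // Fires once the DOM is parsed; stylesheets and images may still be loading.
  document.addEventListener("DOMContentLoaded", function () {
    // Safe to query and modify the DOM here.
  });

  // Fires once every subresource (CSS, images, async scripts) has finished.
  window.addEventListener("load", function () {
    // Defer non-critical work (analytics beacons, etc.) to here.
  });
</script>
```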
So, you've got a site that takes 5-10 seconds to get to the load event. How can you use Timeline to profile it and find the performance hotspots?

- Record a full page load and look at where the time goes, from the refresh all the way to load. Get an idea of where you spend most of your time - is it in parsing (loading), scripting, or rendering and painting? Is it idle time?
- Pay attention to third-party scripts. Add async tags to these scripts to get them off the rendering critical path, even if the vendor proudly claims the script is already "async!". Try to look at the call stacks and figure out where you're spending most of your time.
- Be aware that some scripts can't safely be made async. Putting Optimizely in an async script tag, for example, may cause a "flash of unstyled content", where part of the page looks one way and then suddenly flashes into a different styling. That isn't acceptable, so we're stuck with Optimizely's slow and painful re-layouts here.
Do these things to make your pages load faster:

- Mark external scripts async wherever possible. Note that async only works on scripts with a src attribute, so you may need to drop inline snippets like Mixpanel's into a remote file you host yourself (in Rails, you might put it into application.js, for example) and then make sure that remote script tag has an async attribute. Using async on external scripts takes them off the blocking render path, so the page will render without waiting for these scripts to finish evaluating.
- If a script can't have an async tag, put your external CSS before it. External CSS doesn't block further processing of the page, unlike external JS. We want to send off all of our requests before we wait on remote JS to load.
- Look at the flamegraph when the load event fires - if it's long and deep, you need to investigate how you can tie fewer events to the document being ready. Can you attach those handlers to an earlier event, or drop some of them entirely?
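Putting the checklist together, a HEAD that follows these rules might look like this - the file names are hypothetical:

```html
<head>
  <!-- External CSS first: it doesn't block parsing, and the request
       goes out as early as possible. -->
  <link rel="stylesheet" href="/assets/application.css">

  <!-- External JS marked async: off the blocking render path.
       Remember that async only works on scripts with a src attribute. -->
  <script src="/assets/application.js" async></script>
  <script src="/assets/vendor-analytics.js" async></script>
</head>
```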
I'm Nate Berkopec (@nateberkopec). I write online about web performance from a full-stack developer's perspective. I primarily write about frontend performance and Ruby backends. If you liked this article and want to hear about the next one, click below. I don't spam - you'll receive about 1 email per week. It's all low-key, straight from me.
I've authored an in-depth course on making Ruby and Rails applications faster. The Complete Guide to Rails Performance is a full-stack course that gives you the tools to make Ruby on Rails applications faster, more scalable, and simpler to maintain. It includes a 361-page PDF and over 15 hours of video content. Learn more at railsspeed.com.