The long development cycle for v0.12 (nine months and counting, the longest one to date) has given the core team and contributors ample opportunity to introduce a number of performance optimizations. This blog post aims to cover the most notable ones.
Writable streams now support a “corked” mode, similar to the TCP_CORK and TCP_NOPUSH socket options from Linux and FreeBSD respectively.
When corked, data written to the stream is queued up until the stream is uncorked again. This lets Node.js combine smaller writes into larger ones, resulting in fewer system calls and TCP round trips.
The http module has been updated to use corked mode transparently when sending a chunked request or response body. If you look at strace(1) output often, you will notice more writev(2) and fewer write(2) system calls.
The tls module has been considerably reworked in Node.js v0.12.
In Node.js v0.10, the tls module sits on top of the net module as a transform stream that transparently encrypts and decrypts network traffic. Such layering is desirable from an engineering perspective, but it introduces overhead – more moving around of memory and many more calls in and out of the V8 VM than are strictly necessary – and gets in the way of optimizations.
That is why in Node.js v0.12, the tls module has been rewritten to use libuv directly. It now pulls incoming network traffic directly off the wire and decrypts it without going through intermediate layers.
Non-scientific benchmarks using a null cipher suggest that TLS is now generally 10% faster while consuming less memory. (I should note that the reduced memory footprint may in part be the result of the reworked memory manager, another v0.12 optimization.)
(And, in case you’re wondering, a null cipher is a cipher that doesn’t encrypt the payload; such ciphers are useful for measuring infrastructure and protocol overhead.)
Several cryptographic algorithms should now be faster, sometimes *much* faster. A little background:
Cryptography in Node.js is implemented using the OpenSSL library. Algorithms in OpenSSL have portable reference implementations written in C, with hand-rolled assembly versions for specific platforms and architectures.
Node.js v0.10 already uses assembly versions for some things and v0.12 greatly expands that. What’s more, AES-NI is now used when the CPU supports it, as most x86 processors produced in the last three or four years do.
On Linux systems, if grep ^flags /proc/cpuinfo | grep -w aes finds any matches, then your system supports AES-NI. Note that hypervisors like VMware or VirtualBox may hide CPU capabilities from the guest operating system, including AES-NI.
An amusing result of enabling AES-NI is that an industrial-strength cipher such as AES128-GCM-SHA256 is now faster than a no-encryption cipher like NULL-MD5!
A side effect of the multi-context refactoring is that it greatly reduces the number of persistent handles in Node.js core.
A persistent handle is a strong reference to an object on the V8 heap that prevents the object from being reclaimed by the garbage collector until the reference is removed again. (In GC speak, it’s an artificial GC root.)
Node.js uses persistent handles to cache often-used values, like strings or object prototypes. However, persistent handles need a special post-processing step in the garbage collector and as such have an overhead that scales linearly with the number of handles.
As part of the multi-context cleanup work, a great many persistent handles have been eliminated or switched over to a more lightweight mechanism (called ‘eternal handles’; what’s in a name?).
The net effect is that your application spends less time inside the garbage collector and more time doing useful work. Now v8::internal::GlobalHandles::PostGarbageCollectionProcessing() should show up a great deal less in node --prof output.
The cluster module in Node.js v0.10 depends on the operating system to distribute incoming connections evenly among the worker processes.
It turns out that on Solaris and Linux, some workloads cause very unbalanced distributions among the workers. To mitigate that, Node.js v0.12 has switched to round-robin scheduling by default. See this blog post for more details.
setTimeout() and friends now use a time source that is both faster and immune to clock skew. This optimization is enabled on all platforms, but on Linux we take it one step further and read the current time directly from the VDSO, thereby greatly reducing the number of gettimeofday(2) and clock_gettime(2) system calls.
setImmediate() and process.nextTick() also saw performance tweaks that add fast paths for dispatch in the general case. Said functions were already pretty fast, but now they’re faster still.
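If you want to see the dispatch rate on your own machine, a quick-and-dirty check like the one below does the job. The 100 ms window and the loop shape are arbitrary; the resulting number is machine-dependent and only good for before/after comparisons.

```javascript
// Count how many setImmediate() callbacks fire in ~100 ms.
// Each callback schedules the next, so this measures pure
// event-loop dispatch overhead rather than callback work.
var count = 0;
var deadline = Date.now() + 100;

function tick() {
  count++;
  if (Date.now() < deadline)
    setImmediate(tick);
  else
    console.log(count + ' setImmediate callbacks in ~100 ms');
}

setImmediate(tick);
```

Swapping setImmediate(tick) for process.nextTick(tick) is not an apples-to-apples comparison – nextTick callbacks run before the event loop continues – so measure the two separately rather than against each other.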