Welcome to the third post in the Windows Command-Line series. In this post, we'll start to dig into the internals of the Windows Console and Command-Line, what it is, what it does ... and what it doesn't do.
[Updated 2018-07-20 to improve readability and clarify some Unicode/UTF-x details]
During the initial development of Windows NT, circa 1989, there was no GUI and no desktop: there was ONLY a full-screen command-line that visually resembled MS-DOS more than it did the future. When Windows' GUI implementation started to arrive, the team needed a Console GUI app and thus, the Windows Console was born. Windows Console is one of the first Windows NT GUI apps, and is certainly one of the oldest Windows apps still in general use.
The Windows Console code-base is currently (July 2018) almost 30 years old ... older, in fact, than the developers who now work on it!
As we learned in our previous posts, a Terminal's job is relatively simple:
However, the Windows Console does things a little differently:
Windows Console is a traditional Win32 executable and, though it was originally written in 'C', much of the code is being migrated to modern C++ as the team modernizes and modularizes Console's codebase.
For those who care about such things: Many have asked whether Windows is written in C or C++. The answer is that - despite NT's Object-Based design - like most OSes, Windows is almost entirely written in 'C'. Why? C++ introduces a cost in terms of memory footprint and code execution overhead. Even today, the hidden costs of code written in C++ can be surprising, but back in the late 1990's, when memory cost ~$60/MB (yes $60 per MEGABYTE!), the hidden memory cost of vtables etc. was significant. In addition, the cost of virtual-method call indirection and object-dereferencing could result in very significant performance & scale penalties for C++ code at that time. While one still needs to be careful, the performance overhead of modern C++ on modern computers is much less of a concern, and is often an acceptable trade-off considering its security, readability, and maintainability benefits ... which is why we're steadily upgrading the Console's code to modern C++.
Before Windows 7, Windows Console instances were hosted in the crucial Client Server Runtime Subsystem (CSRSS). In Windows 7, however, Console was extracted from CSRSS for security and reliability reasons, and given a new home in the following binaries:
A high-level view of Console's current internal architecture looks like this:
The core components of the Console consist of the following (from the bottom-up):
As can be seen in the Console architecture above, unlike *NIX terminals, the Console sends/receives API calls and/or data serialized into IO Control (IOCTL) messages, not serialized text. Even ANSI/VT sequences embedded in text received from (primarily Linux) Command-Line apps are extracted, parsed and converted into API calls. This difference exposes the fundamental philosophical difference between *NIX and Windows: In *NIX, "everything is a file", whereas, in Windows, "everything is an object".
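To make the "text in, API calls out" idea a little more concrete, here is a minimal Python sketch. It is emphatically not the Console's real parser: the operation names (write_text, set_text_attributes, and so on) are hypothetical, and it only recognizes simple CSI sequences of the form ESC [ parameters letter.

```python
import re

# Matches a simple CSI sequence: ESC [, optional numeric parameters
# separated by ';', and a single final letter.
CSI_PATTERN = re.compile(r"\x1b\[([0-9;]*)([A-Za-z])")

# Hypothetical operation names, for illustration only.
# These are NOT the Console's real API.
FINAL_BYTE_TO_CALL = {
    "m": "set_text_attributes",
    "H": "set_cursor_position",
    "J": "erase_display",
}


def text_to_calls(text):
    """Split text into a list of (call_name, args) tuples."""
    calls = []
    position = 0
    for match in CSI_PATTERN.finditer(text):
        # Plain text before the sequence becomes a write call.
        if match.start() > position:
            calls.append(("write_text", (text[position:match.start()],)))
        params = tuple(int(p) for p in match.group(1).split(";") if p)
        name = FINAL_BYTE_TO_CALL.get(match.group(2), "unsupported_sequence")
        calls.append((name, params))
        position = match.end()
    # Any trailing plain text after the last sequence.
    if position < len(text):
        calls.append(("write_text", (text[position:],)))
    return calls


calls = text_to_calls("\x1b[1;31mError\x1b[0m: disk full")
for call in calls:
    print(call)
```

The sample string carries two SGR ("m") sequences around the word "Error", so the sketch yields an attribute call, a write, another attribute call, and a final write. A real VT parser is a full state machine that handles many more sequence types.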
There are pros and cons to both approaches, which we'll outline, but avoid debating at length here. Just remember that this key difference in philosophy is fundamental to many of the differences between Windows and *NIX!
When Unix was first implemented in the late 1960's and early 1970's, one of the core tenets was that (wherever possible) everything should be abstracted as a file stream. One of the key goals was to simplify the code required to access devices and peripherals: If all devices presented themselves to the OS as file-systems, then existing code could access those devices more easily. This philosophy runs deep: One can even navigate and interrogate a great deal of a *NIX-based OS & machine configuration by navigating pseudo/virtual file-systems which expose what appear to be "files" and folders, but actually represent machine configuration, and hardware. For example, in Linux, one can explore a machine's processors' properties by examining the contents of the
The simplicity and consistency of this model can, however, come at a cost: Extracting/interrogating specific information from text in pseudo files, and returned from executing commands often requires tools, e.g. sed, awk, perl, python, etc. These tools are used to write commands and scripts to parse the text content, looking for specific patterns, fields, and values. Some of these scripts can get quite complex, are often difficult to maintain, and can be fragile - if the structure, layout, and/or format of the text changes, many scripts will likely need to be updated.
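As a rough illustration of that cost, here is a small Python sketch that pulls a value out of "key : value" text of the kind a pseudo file or command might return. The sample text and its field names are invented for this example, so treat the layout as an assumption rather than a real file format.

```python
# Hypothetical sample text in a "key : value" layout.
# The field names below are invented for illustration.
SAMPLE = """\
processor\t: 0
model name\t: Example CPU @ 2.00GHz
cpu cores\t: 4
"""


def parse_key_value_text(text):
    """Turn 'key : value' lines into a dict, skipping lines without a colon."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


fields = parse_key_value_text(SAMPLE)
core_count = int(fields["cpu cores"])
print(core_count)
```

Note how much the script assumes: the separator, the exact spelling of "cpu cores", and that the value is a bare integer. If any of those change, the lookup or the int() conversion fails at run time, which is the fragility described above.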
When Windows NT was being designed & built, "Objects" were seen as the future of software design: "Object Oriented" languages were emerging faster than rabbits from a burrow - Simula and Smalltalk were already established, and C++ was becoming popular. Other Object-Oriented languages like Python, Eiffel, Objective-C, ObjectPascal/Delphi, Java, C#, and many others followed in rapid succession.
Inevitably, having been forged during those heady, Object-Oriented days (circa 1989), Windows NT was designed with a philosophy that "everything is an object". In fact, one of the most important parts of the NT Kernel is the "Object Manager"!
Developers use Windows' Win32 API to access and manipulate objects and structures that provide access to similar information provided by *NIX pseudo files and tools. And because parsers, compilers, and analyzers understand the structure of objects, many coding errors can often be caught earlier, helping verify that the programmer's intent is syntactically and logically correct. This can also result in less breakage, volatility, and "churn" over time.
So, coming back to our central discussion about Windows Console: The NT team decided to build a "Console" which differentiated itself from a traditional *NIX terminal in a couple of key areas:
While the Console's API has proven very popular in the world of Windows Command-Line tools and services, the API-centric model presents some challenges for Command-Line scenarios:
Many Windows Command-Line tools and apps make extensive use of the Console API.
The problem? These APIs only work on Windows. Thus, combined with other differentiating factors (e.g. process lifecycle differences, etc.), Windows Command-Line apps are not always easily-portable to *NIX, and vice-versa.
Because of this, the Windows ecosystem has developed its own, often similar, but usually different Command-Line tools and apps. This means that users have to learn one set of Command-Line apps and tools, shells, scripting languages, etc. when using Windows, and another when using *NIX.
There is no simple quick-fix for this issue: The Windows Console and Command-Line cannot simply be thrown away and replaced by bash and iTerm2 because there are hundreds of millions of apps, scripts, and tools that depend upon the Windows Console and Cmd/PowerShell shells, many of which are launched billions of times a day on Windows PCs and Servers around the globe.
So, what's the solution here? How do developers run command-line tools, compilers, platforms, etc. originally built primarily on/for *NIX based platforms?
3rd party tools like MinGW/MSYS and Cygwin do a great job of porting many of the core GNU tools and compatibility libraries to Windows, but they are not able to run un-ported, unmodified Linux binaries. This turns out to be an essential requirement, because many Ruby, Python, Node, etc. packages and modules depend upon Linux behaviors and/or "wrap" Linux binaries.
These reasons led Microsoft to enable genuine, unmodified Linux binaries and tools to run natively on Windows' Subsystem for Linux (WSL).
Using WSL, users can now download and install one or more genuine Linux distros side-by-side on the same machine, and use each distro's or tool's package manager (e.g. apt, zypper, npm, gem, etc.) to install and run the vast majority of Linux Command-Line tools, packages, and modules alongside their favorite Windows apps and tools. To learn more about WSL, visit the WSL Learning Page, or the official WSL documentation.
Also, there are still some things that Console offers that haven't been adopted by non-Microsoft terminals: Specifically, the Windows Console provides command-history and command-alias services, which aimed to eliminate the need for every command-line shell (in particular) to re-re-re-implement the same functionality. We'll return to this subject in the future.
As we discussed in the Command-Line Backgrounder post, Terminals were originally separate from the computer to which they were attached. Fast-forward to today, this design remains: Most modern terminals and Command-Line apps/shells/etc. are separated by processes and/or machine boundaries.
On *NIX-based platforms, the notion that terminals and command-line applications are separate and simply exchange characters has resulted in *NIX Command-Lines being easy to access and operate from a remote computer/device: As long as a terminal and a Command-Line application can exchange streams of characters via some type of ordered serial communications infrastructure (TTY/PTY/etc.), it is pretty trivial to remotely operate a *NIX machine's Command-Line.
On Windows, however, many Command-Line applications depend on calling Console APIs, and assume that they're running on the same machine as the Console itself. This makes it difficult to remotely operate Windows Command-Line shells/tools/etc.: How does a Command-Line application running on a remote machine call APIs on the user's local machine's Console? And worse, how does the remote Command-Line app call Console APIs if it's being accessed via a terminal on a Mac or Linux box?!
Sorry to tease, but we'll return to this subject in much more detail in a future post!
Generally, on *NIX based systems, when a user wants to launch a Command-Line tool, they first launch a Terminal. The Terminal then starts a default shell, or can be configured to launch a specific app/tool. The Terminal and Command-Line app communicate by exchanging streams of characters via a Pseudo TTY (PTY) until one or both are terminated.
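For the curious, the *NIX side of this can be sketched in a few lines of Python using the standard pty module (available on *NIX only, not on Windows). This is a minimal sketch, not how any particular Terminal is implemented: the helper name run_attached_to_pty is made up for this example, and the script plays the Terminal's role only to the extent of collecting the child's output.

```python
import os
import pty
import subprocess
import sys


def run_attached_to_pty(argv):
    """Run argv with its standard streams attached to a new PTY.

    Returns the raw bytes the child wrote, as seen from the
    terminal ("master") side of the PTY.
    """
    master_fd, slave_fd = pty.openpty()
    try:
        process = subprocess.Popen(
            argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd
        )
    finally:
        os.close(slave_fd)  # the child keeps its own copy
    chunks = []
    while True:
        try:
            data = os.read(master_fd, 1024)
        except OSError:
            break  # Linux reports an I/O error once the child side is closed
        if not data:
            break  # other platforms report end-of-file instead
        chunks.append(data)
    process.wait()
    os.close(master_fd)
    return b"".join(chunks)


output = run_attached_to_pty([sys.executable, "-c", "print('hello')"])
print(output)
```

Everything the two sides exchange here is just a stream of bytes, which is exactly why this model is so easy to stretch across process and machine boundaries.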
On Windows, however, things work differently: Windows users never launch the Console (conhost.exe) itself: Users launch Command-Line shells and apps, not the Console itself!
Yes, in Windows, users launch the Command-Line app, NOT the Console itself. If a user launches a Command-Line app from an existing Command-Line shell, Windows will (usually) attach the newly launched Command-Line .exe to the current Console. Otherwise, Windows will spin up a new Console instance and attach it to the newly launched app.
Because users run Cmd.exe or PowerShell.exe and see a Console window appear, they labor under the common misunderstanding that Cmd and PowerShell are, themselves, "Consoles" ... they're not! Cmd.exe and PowerShell.exe are "headless" Command-Line applications that need to be attached to a Console (conhost.exe) instance from which they receive user input and to which they emit text output to be displayed to the user.
Also, many people say "Command-Line apps run in the Console". This is misleading and contributes additional confusion about how Consoles and Command-Line apps actually work!
Please help correct this misconception if you hear it by pointing out that "Command-Line tools/apps run connected to a Console" (or similar). Thanks!
Okay, so, Windows Command-Line apps run in their own processes, connected to a Console instance running in a separate process. This is just like in *NIX where Command-Line applications run connected to Terminal apps. Sounds good, right? Well ... no; there are some problems here because Console does things a little differently:
These are significant limitations, especially the latter point. Why? What if you wanted to create an alternate Console app for Windows? How would you send keyboard/mouse/pen/etc. user actions to the Command-Line app if you couldn't access the communications "pipes" connecting your new Console to the Command-Line app?
Alas, the story here is not a good one: There ARE some great 3rd party Consoles (and server apps) for Windows (e.g. ConEmu/Cmder, Console2/ConsoleZ, Hyper, Visual Studio Code, OpenSSH, etc.), but they have to jump through extraordinary hoops to act like a normal Console would.
For example, 3rd party Consoles have to launch a Command-Line app off-screen at, for example, (-32000,-32000). They then have to send keystrokes to the off-screen Console, and screen-scrape the off-screen Console's text contents and re-draw them on their own UI! I know, crazy, right?! It's a testament to the ingenuity and determination of the creators of these apps that they even work at all.
This is clearly a situation we are keen to remedy. Stay tuned for more info on this part of the story too - there's some good news on the way.
As discussed above, Windows Console provides a rich API. Using the Console API, Command-Line apps and tools write text, change text colors, move the cursor, etc. And, because of the Console API, Windows Console had little need to support ANSI/VT sequences that provide very similar functionality on other platforms. In fact, until Windows 10, Windows Console only implemented the bare minimum support for ANSI/VT sequences:
This all started to change in 2014, when Microsoft formed a new Windows Console team dedicated to untangling and improving the Console & Windows' Command-Line infrastructure.
One of the new Console team's highest priorities was to implement comprehensive support for ANSI/VT sequences in order to render the output of *NIX applications running on Windows Subsystem for Linux (WSL), and on remote *NIX machines. You can read a little more about this story in the previous post in this series.
The Console team added comprehensive support for ANSI/VT sequences to Windows 10's Console, enabling users to use and enjoy a huge array of Windows and Linux Command-Line tools and apps. The team continues to improve and refine Console's VT support with each OS release, and is grateful for any issues you file on our GitHub issues tracker.
A quick Unicode refresher: Unicode or ISO/IEC 10646 is an international standard defining every character/glyph used in almost every writing system on Earth, plus many non-script symbols and character-sized images (e.g. emoji) in use today. At present (July 2018), Unicode 11 defines 137439 characters, across 146 modern and historic scripts! Unicode also defines several character encodings, including UTF-8, UTF-16, and UTF-32:
The most popular encoding today, thanks to its efficient storage requirements, and widespread use in HTML pages, is UTF-8. UTF-16/UCS-2 are both common, though decreasingly so in stored documents (e.g. web pages, code, etc.). UTF-32 is rarely used due to its inefficient and considerable storage requirements. Great, so we have effective and efficient ways to represent and store Unicode characters!
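If you'd like to see those storage differences for yourself, here is a small Python sketch. The four sample characters are arbitrary picks for illustration, and the little-endian codec variants are used only so that no byte-order mark is added to the counts.

```python
# Arbitrary sample characters, from ASCII through to an emoji.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, é, €, 😀


def encoded_sizes(character):
    """Return the number of bytes needed to store one character per encoding."""
    return {
        "utf-8": len(character.encode("utf-8")),
        "utf-16": len(character.encode("utf-16-le")),
        "utf-32": len(character.encode("utf-32-le")),
    }


for character in SAMPLES:
    print(f"U+{ord(character):04X}", encoded_sizes(character))
```

UTF-8 grows from 1 to 4 bytes as the code point gets larger, UTF-16 needs 2 bytes for the first three samples and 4 for the emoji, and UTF-32 always needs 4.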
Alas, the Windows Console and its API were created before Unicode was created. The Windows Console stores text (that is subsequently drawn on the screen) as UCS-2 characters requiring 2-bytes per cell. Command-Line apps write text to the Console using the Console API. Many Console APIs come in two flavors - functions with an A suffix handle single-byte/character strings, and functions with a W suffix handle 2-byte (wchar)/character strings: For example, the WriteConsoleOutputCharacter() function compiles down to WriteConsoleOutputCharacterA() for ASCII projects, or WriteConsoleOutputCharacterW() for Unicode projects. Code can specifically call ...W suffixed functions directly if specific handling is required.
However, while all W APIs support UCS-2, and some were updated to also support UTF-16, not all W APIs fully support UTF-16.
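Why does the UCS-2 vs. UTF-16 distinction matter? A minimal Python sketch, assuming nothing about the Console's internals: UCS-2 is limited to one 16-bit code unit per character, whereas UTF-16 encodes characters beyond U+FFFF as a pair of 16-bit "surrogate" code units. The helper names below are made up for this example.

```python
import struct


def utf16_code_units(text):
    """Return the 16-bit code units UTF-16 uses to encode text."""
    encoded = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(encoded) // 2), encoded))


def fits_in_one_code_unit(character):
    """True if the character fits in a single 16-bit unit, as UCS-2 requires."""
    return ord(character) <= 0xFFFF


euro = "\u20ac"       # €, inside the 16-bit range
grin = "\U0001F600"   # 😀, outside the 16-bit range

print([hex(unit) for unit in utf16_code_units(euro)])
print([hex(unit) for unit in utf16_code_units(grin)])
```

The euro sign comes back as a single code unit, but the emoji comes back as two, so code that treats each 16-bit unit as one complete character will split it in half.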
Also, Console doesn't support some newer Unicode features including Zero Width Joiners (ZWJ) which are used to combine otherwise separate characters in, for example, Arabic and Indic scripts, and are even used to combine several emoji characters into one visual glyph like the "people" emoji, and ninjacats.
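To show what a ZWJ sequence looks like underneath, here is a short Python sketch that takes apart one "family" emoji (man + woman + girl, chosen purely as an example) into the code points it is built from.

```python
ZERO_WIDTH_JOINER = "\u200d"

# One visual glyph built from three emoji glued together with ZWJs:
# man (U+1F468), ZWJ, woman (U+1F469), ZWJ, girl (U+1F467).
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"

code_points = [f"U+{ord(character):04X}" for character in family]
parts = family.split(ZERO_WIDTH_JOINER)
utf16_unit_count = len(family.encode("utf-16-le")) // 2

print(code_points)
print(len(parts), "emoji joined into one glyph")
print(utf16_unit_count, "UTF-16 code units")
```

A single on-screen glyph here is really five code points, or eight UTF-16 code units, which is a lot to ask of a text buffer that stores 2 bytes per cell.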
Worse still, the Console's current text renderer can't even draw these complex glyphs, even if the buffer could store them: Console currently uses GDI for text rendering, but GDI doesn't adequately support font-fallback - a mechanism to dynamically find and load an alternative font that contains a glyph missing from the current font. Font-fallback is well supported by more modern text rendering engines like DirectWrite.
So what happens if you wanted to write complex and conjoined glyphs onto the Console? Sadly, you can't ... yet, but this too is a post for another time.
Once again, dear reader, if you've read everything above, thank you, and congratulations - you now know more about the Windows Console than most of your friends, and likely more than even you wanted to! Lucky you!
We've covered a lot of ground in this post:
In the next few posts in this series, we'll delve further into the Console, and discuss how we're addressing these issues ... and more! As always, stay tuned!
[Many thanks to my colleagues on the Console team for helping keep this post accurate and balanced - Michael, Mike, Dustin and Austin - y'all rock!]