(This is a guest post by xorhash.)
I’ve been on a trip down memory lane lately, digging around old manuals of the UNIX operating system from before BSD. In doing so, I’ve come across the sources for the 7th Edition manuals. I wanted to show one part of volume 2A to other people, but didn’t want to make them download the entire 336 pages of volume 2A for the part in question. The part I wanted to extract was LEARN Computer-Aided Instruction on UNIX (starting at p. 107 in the volume 2A PDF file).
A normal person would, I presume, try to split the PDF file. That is straightforward and produces the expected results. I believe I needn’t state that you wouldn’t be reading this if I solved this problem like any sane person would. Instead, I opted to rebuild the PDF from the troff sources provided at the link above.
I am not a very clever man, and thus I completely disregarded the generation procedure that was already spelled out. However, it wasn’t exactly specific anyway, so I didn’t miss out on much.
So I knew what I needed to do: get the troff sources. I asked that the Heavens have mercy on my poor soul if this required a lot of adjustment for 2017 text processing tools. However, a man must do what a man must do. The file in question was called vol2/learn.bun. I had no idea what a bun file was, hoped it wasn’t related to steamed buns, and clicked it. As it turns out, it’s just what we would call a self-extracting archive today. The shell commands are not very weird, so the extraction process actually worked out just fine. Now I had a set of files running up to p7. Except what happened to p1, the world will never know.
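For readers who have never met this kind of archive, here is a minimal sketch of the idea, not the actual contents of learn.bun. It assumes the bundle is plain shell commands with here-documents; demo.bun, part_a and part_b are made-up names for illustration only.

```shell
# Sketch: build and unpack a tiny shell bundle in a scratch directory.
# demo.bun, part_a and part_b are hypothetical stand-ins, not the real files.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# A bundle of this general shape is just shell commands plus here-documents.
cat > demo.bun <<'BUNDLE'
echo x - part_a
cat > part_a <<'END_OF_PART_A'
.TL
Demo title
END_OF_PART_A
echo x - part_b
cat > part_b <<'END_OF_PART_B'
.PP
Demo body
END_OF_PART_B
BUNDLE

# A bundle is arbitrary shell code, so look at it before running it.
sed -n '1,5p' demo.bun

# "Extraction" is nothing more than feeding the file to the shell.
sh demo.bun
ls
```

Running it in a throwaway directory keeps the extracted files from landing wherever the shell happens to be.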
I’ve dabbled in man pages before, but that was mostly mandoc, not actual troff.
Accordingly, the first attempt at getting something going was as naive as it could get:
$ groff -Tpdf p* | zathura -
It led to, shall we say, varying results.
[Image: really butchered rendering attempt]
Clearly, I was doing something very fundamentally wrong. Conveniently, volume 2A also had a lot of troff documentation. Apparently I was supposed to pass -ms and first run tbl(1) over the troff source before actually giving it to groff. That sounded like a good idea, but the results were still somewhat off:
[Image: not very butchered rendering attempt]
Allow me to express my doubts that this text was written in 2017. If you compare the output with the known-good PDF, you’ll also notice that, somehow, Bell Laboratories, Murray Hill, New Jersey 07974 turned into CAI. Unfortunate.
Continuing to read the page I got learn.bun from, I also spied a section called Macros and References. That sounded relevant to my interests. tmac.s, which after studying groff(1) seems to be what would get used with -ms, references some files in /usr/lib/tmac. I was not in the mood to let this flood over into my system, so I had to make minor adjustments and turn those into relative paths. I also renamed tmac.s to tmac.os to avoid colliding with the one provided by groff, making the new invocation:
$ tbl p* | groff -M./macros -mos -Tpdf | zathura -
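The path adjustment and rename could be done along these lines. This is only a sketch: the .so line in the sample file is a hypothetical stand-in for whatever the real tmac.s references, and it assumes groff is later run from the directory that contains macros/, since the rewritten paths are relative.

```shell
# Sketch: rewrite absolute macro paths and store the result as tmac.os.
set -eu
workdir=$(mktemp -d)
cd "$workdir"
mkdir macros

# Hypothetical stand-in for the original tmac.s; the real .so lines may differ.
cat > tmac.s <<'EOF'
.so /usr/lib/tmac/tmac.example
.de XX
..
EOF

# Point the includes at the local directory and save under the new name,
# so that "groff -M./macros -mos" loads this file instead of groff's own tmac.s.
sed 's|/usr/lib/tmac/|./macros/|g' tmac.s > macros/tmac.os

cat macros/tmac.os
```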
Now we’re getting somewhere:
[Image: almost not butchered rendering attempt]
It’s better than the previous attempts. But there are also some warnings and problems that need cleaning up:
tmac.os:806: warning: numeric expression expected (got `\')
When the .UX macro was requested, I got:
warning: macro `ev1' not defined (possibly missing space after `ev')
environment stack underflow
Point 1 was easy to address; it’s a simple text change. Point 2 was caused by spurious dots in front of a call to .ND. However, the actual volume 2A PDF gave a different date than the file, so I adjusted that to match (June 18, 1976 to January 30, 1979).
As for points 3 and 4… Let’s just say groff/troff macros are definitely not meant to be written or read by humans, and it’s a feat comparable to magic that someone wrote this set of troff macros. Line 806 is .ch FO \\n(YYu. Supposedly, that changes the position of the page trap that invokes the given macro. The second argument is meant to be a distance, which explains why groff is complaining. I tried to check what groff does and was left none the wiser. FO seems related to the page footer; I seemed to get away with just deleting that line, though.
Finally, point 4. Apparently, .ev1 was used multiple times in tmac.os. This looked like it should’ve been .ev 1 instead. After changing those, lo and behold, .UX stopped behaving funky for the most part. Yet for some reason, I’d still get multiple footnotes about the ownership of the UNIX trademark.
tmac.os sets a troff register (GA) when the .UX macro is first encountered so that the footnote is only made once. The footnote is being made twice. Something does not add up here. .AI (author’s institution) resets GA, but the first .UX comes after .AI, so that’s not the problem. Removing the .AE macros from page 1 caused only one footnote to be made. Thus, I infer it’s actually intended behavior that the footnote is made once for the abstract and once for the main body. Checking with the volume 2A PDF again, I realized that point 4 was, in fact, fixed just by the ev1 changes and I was just chasing a bug that does not exist. I really should’ve checked the PDF twice.
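The ev1 change can be applied mechanically. Old troff limited request names to two characters, so .ev1 was read as .ev with the argument 1, whereas groff accepts longer names and goes looking for a macro called ev1. The following is a sketch on a made-up excerpt, not the real tmac.os; it only handles lines where the control character is immediately followed by ev and a digit, so anything more exotic would still need a manual look.

```shell
# Sketch: insert the missing space in ".ev1"-style requests.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical excerpt; the real tmac.os lines may look different.
cat > tmac.os <<'EOF'
.ev1
.ps 8
.ev
'ev2
.evil
EOF

# Only touch lines that start with a control character (. or '),
# followed directly by "ev" and a digit.
sed "s/^\([.']\)ev\([0-9]\)/\1ev \2/" tmac.os > tmac.os.fixed

# Show what changed; diff exits non-zero when the files differ.
diff tmac.os tmac.os.fixed || true
```

Writing to a separate file and diffing first makes it easy to review the changes before replacing the original.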
The abstract finally looks okay.
[Image: good rendering attempt]
Okay, we’re done, we can go home, right? Almost. One last thing to do: on the last page, there’s something really important missing, namely the bibliography. Instead, there’s just $LIST$ there. We can’t just turn Brian W. Kernighan and Michael E. Lesk into plagiarists!
Back to the troff documentation in volume 2A, there’s a match for $LIST$ on p. 183. Apparently I need a reference file and have to preprocess the source with refer(1). That sounds simple enough. Fortunately, I got the reference file along with the macros above, so I didn’t have to look for that separately.
$ refer -pRv7man -e p* | tbl | groff -M./macros -mos -Tpdf | zathura -
[Image: half of the references are blank]
Of course. Why would it work? That’d have been too much to ask for. At least I get some nice hints:
refer:p2:148: no matches for `skinner teaching 1961'
refer:p3:114: no matches for `kernighan editor tutorial 1974'
The troff documentation conveniently explains the format for the reference file, so I could just add these two entries to Rv7man and be done with it. Thankfully, the pre-compiled PDF of the volume 2A manual had the information necessary to compile the bibliography entries.
%T Why We Need Teaching Machines
%A B. F. Skinner
%J Harvard Educational Review
%V 31
%P 377-398
%D 1961

%T A Tutorial Introduction to the Unix Editor ed
%A B. W. Kernighan
%D 1974
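Appending the entries could look roughly like this. It is a sketch: the placeholder record stands in for the existing contents of Rv7man and is entirely made up. The one thing worth showing is the blank line, which is what separates one record from the next in a refer database.

```shell
# Sketch: append two records to a refer database and count the records.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the existing reference file; this entry is hypothetical.
printf '%s\n' '%T Placeholder Entry' '%A A. N. Author' '%D 1970' > Rv7man

# The leading blank line keeps the new records apart from the previous one.
cat >> Rv7man <<'EOF'

%T Why We Need Teaching Machines
%A B. F. Skinner
%J Harvard Educational Review
%V 31
%P 377-398
%D 1961

%T A Tutorial Introduction to the Unix Editor ed
%A B. W. Kernighan
%D 1974
EOF

# In awk's paragraph mode (RS = ""), each blank-line-separated block is one record.
awk 'BEGIN { RS = "" } END { print NR }' Rv7man
```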
[Image: now that’s what I call a bibliography]
And of course, here is the product of this whole ordeal.
The Heavens were feeling somewhat merciful, but only just enough that I could waste no more than a day on this project. They really wanted me to spend that day on it, though.
On a side note, the missing learn references aren’t available from the link that was provided. http://cm.bell-labs.com/cm/cs/who/bwk/learn.tar.gz is now down, though the web archive still has it. Needless to say, I didn’t read that.
I will never, ever touch troff/groff again. mandoc is good at what it does and I’ll stick to mandoc for writing man pages. But if I ever need to get something typeset nicely from plain text?
LaTeX is the answer. Not troff. Never troff. Not even once.
UNIX is a registered trademark of The Open Group.