
learn: Ancient troff sources vs. modern-day groff | Virtually Fun

(This is a guest post by xorhash.)

Introduction

I’ve been taking a trip down memory lane lately, digging around in old UNIX manuals from before BSD. In doing so, I came across the sources for the 7th Edition manuals. I wanted to show one part of volume 2A to other people, but didn’t want to make them download all 336 pages of volume 2A for the part in question. The part I wanted to extract was “LEARN Computer-Aided Instruction on UNIX”, starting on p. 107 of the volume 2A PDF file.

A normal person would, I presume, try to split the PDF file. That is straightforward and produces the expected results. I believe I needn’t state that you wouldn’t be reading this if I had solved this problem like any sane person would. Instead, I opted to rebuild the PDF from the troff sources provided at the link above.

I am not a very clever man, and thus I completely disregarded the generation procedure that was already spelled out. However, it wasn’t exactly specific anyway, so I didn’t miss out on much.

Getting the sources

So I knew what I needed to do: get the troff sources. I asked the Heavens to have mercy on my poor soul in case this required a lot of adjustment for 2017 text processing tools. However, a man must do what a man must do. The file in question was called vol2/learn.bun. I had no idea what a .bun file was, hoped it wasn’t related to steamed buns, and clicked it. As it turns out, it’s just what we would call a self-extracting archive today: a shell script that recreates the files bundled inside it. The shell commands are not very weird, so the extraction process actually worked out just fine. Now I had files p0 through p7, except for p1; what happened to it, the world will never know.
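Since a .bun file is just a shell script, unpacking it amounts to running it with sh, ideally after reading it and inside a scratch directory. The sketch below assumes the bundle is a plain Bourne-shell script that writes its member files relative to the current directory; the function name unbundle is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: unpack a self-extracting shell bundle such as
# learn.bun into a directory of its own. Assumes the bundle is a plain
# sh script that writes its files relative to the current directory.
# It runs arbitrary commands, so read it before running it.

# unbundle ARCHIVE DIR
unbundle() {
    archive=$1
    dir=$2
    # Make the archive path absolute before changing directory.
    case $archive in
        /*) ;;
        *) archive=$PWD/$archive ;;
    esac
    mkdir -p "$dir" || return 1
    # Subshell, so the caller's working directory stays where it was.
    (cd "$dir" && sh "$archive")
}
```

Something like `unbundle learn.bun learn` should then leave the extracted files (here p0 and p2 through p7) in learn/.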

First Steps

I’ve dabbled in man pages before, but that was mostly mandoc, not actual troff. Accordingly, the first attempt at getting something going was as naive as it could get:

$ groff -Tpdf p* | zathura -

It led to, shall we say, varying results.

(Screenshot: really butchered rendering attempt)

Clearly, I was doing something very fundamentally wrong. Conveniently, volume 2A also had a lot of troff documentation. Apparently I was supposed to pass -ms and first run tbl(1) over the troff source before actually giving it to groff. That sounded like a good idea, but the results were still somewhat off:

(Screenshot: not very butchered rendering attempt)

Allow me to express my doubts that this text was written in 2017. If you compare the output with the known-good PDF, you’ll also notice that, somehow, Bell Laboratories, Murray Hill, New Jersey 07974 turned into CAI. Unfortunate.

Back to Square One and Pick Up the Breadcrumbs

Continuing to read the page I got learn.bun from, I also spied a section called Macros and References. That sounded relevant to my interests. tmac.s, which (judging from groff(1)) is the file that -ms would load, references some files in /usr/lib/tmac. I was not in the mood to let this spill over into my system, so I made minor adjustments to turn those into relative paths. I also renamed tmac.s to tmac.os to avoid colliding with the one provided by groff, making the new invocation:

$ tbl p* | groff -M./macros -mos -Tpdf | zathura -
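Preparing the macros/ directory that this invocation expects could look roughly like the sketch below. It assumes tmac.s refers to its helper files through absolute paths beginning with /usr/lib/tmac/ (the exact helper files may differ), and localize_tmac is a made-up name. With -M./macros -mos, groff looks for os.tmac or tmac.os in ./macros, which is why the renamed copy goes there.

```shell
#!/bin/sh
# Hypothetical helper: make a private copy of the V7 -ms macros that
# groff can load with -M./macros -mos, without touching /usr/lib/tmac.
# Assumes tmac.s refers to its helpers via paths starting /usr/lib/tmac/.

# localize_tmac SRC DESTDIR
#   Writes DESTDIR/tmac.os: a copy of SRC with every /usr/lib/tmac/
#   prefix replaced by DESTDIR/. The new name avoids colliding with
#   groff's own -ms package.
localize_tmac() {
    src=$1
    destdir=$2
    mkdir -p "$destdir" || return 1
    # '|' is the sed delimiter, so the slashes need no escaping
    # (DESTDIR itself must not contain '|' or '&').
    sed "s|/usr/lib/tmac/|$destdir/|g" "$src" > "$destdir/tmac.os"
}
```

Running `localize_tmac tmac.s ./macros` from the directory holding the p* files turns the references into ./macros/... paths; the helper files themselves still have to be copied into macros/ by hand.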

Now we’re getting somewhere:

(Screenshot: almost not butchered rendering attempt)

It’s better than the previous attempts. But there are also some warnings and problems that need cleaning up:

  1. There’s a note that Bell Laboratories holds the UNIX trademark, which is no longer true.
  2. Now, this most certainly was not written on December 21, 19117, either.
  3. tmac.os:806: warning: numeric expression expected (got `\')
  4. Every time the .UX macro was called, I got warning: macro `ev1' not defined (possibly missing space after `ev'), followed by environment stack underflow

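Point 1 was easy to address; it’s a simple text change. Point 2 was caused by spurious dots in front of a call to .ND, so the date given in the source never took effect. (The year 19117 is what you get from printing “19” followed by troff’s yr register, which holds the current year minus 1900.) However, the actual volume 2A PDF shows a different date from the one in the file, so I changed it to match (from June 18, 1976 to January 30, 1979).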
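The .ND repair can be sketched with sed. This assumes the broken call sits on a line of its own and looks something like `..ND "June 18, 1976"` (extra leading dots); the exact form in the source may differ, and fix_nd is a made-up name. Quoting the date keeps it a single macro argument.

```shell
#!/bin/sh
# Hypothetical helper: rewrite any .ND line, with however many leading
# dots, into a single well-formed .ND request carrying DATE.
# Assumes the .ND call is on its own line. DATE must not contain '/'
# or '&', which are special to this sed expression.

# fix_nd FILE DATE   (prints the fixed file on stdout)
fix_nd() {
    # ^\.\.*ND matches ".ND", "..ND", "...ND", and so on.
    sed "s/^\.\.*ND.*/.ND \"$2\"/" "$1"
}
```

For example, `fix_nd p0 "January 30, 1979" > p0.new && mv p0.new p0`.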

And Down the Slippery Slope

As for points 3 and 4… Let’s just say groff/troff macros are definitely not meant to be written or read by humans, and it’s a feat comparable to magic that someone wrote this set of troff macros at all. Line 806 is .ch FO \\n(YYu. Supposedly, that moves the page trap that calls the given macro (here FO) to a new position. The second argument is meant to be a distance, which explains why groff is complaining. I tried to check what groff does with it and came away none the wiser. FO seems to be related to the page footer, but I seemed to get away with just deleting that line.
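Deleting that line can be done by matching its content rather than its line number, so the edit still works if earlier changes shift line 806. A sketch (drop_ch_fo is a made-up name); it only mirrors the workaround above and does not restore whatever the trap was supposed to do.

```shell
#!/bin/sh
# Hypothetical helper: remove the ".ch FO \\n(YYu" line from the macros.

# drop_ch_fo FILE   (prints the result on stdout)
drop_ch_fo() {
    # In a basic regular expression '\\' matches one literal backslash,
    # so '\\\\' matches the two backslashes written in the file.
    sed '/^\.ch FO \\\\n(YYu$/d' "$1"
}
```

For example, `drop_ch_fo macros/tmac.os > tmac.os.new && mv tmac.os.new macros/tmac.os`.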

Finally, point 4. Apparently, .ev1 was used multiple times in tmac.os. This looked like it should’ve been .ev 1 instead. After changing those, lo and behold, .UX stopped behaving funky for the most part.

Yet for some reason, I’d still get multiple footnotes about who owns the UNIX trademark. tmac.os sets a troff register (GA) when the .UX macro is first encountered so that the footnote is only made once, but the footnote was being made twice. Something did not add up here. .AI (author’s institution) resets GA, but the first .UX comes after .AI, so that’s not the problem. Removing the .AB/.AE macros from page 1 caused only one footnote to be made. Thus, I infer it’s actually intended behavior that the footnote is made once for the abstract and once for the main body. Checking with the volume 2A PDF again, I realized that point 4 was, in fact, fixed just by the ev1 changes and I had been chasing a bug that does not exist. I really should’ve checked the PDF twice.
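As far as I can tell, .ev1 once worked because classic troff request and macro names were at most two characters long, so .ev1 was read as .ev with the argument 1; groff accepts longer names and looks for a macro called ev1 instead. The edit can be sketched with sed, assuming every occurrence starts a line with the . control character (fix_ev is a made-up name):

```shell
#!/bin/sh
# Hypothetical helper: insert the missing space in ".ev<digit>" so groff
# sees the ev request with a numeric argument, not a macro named ev1.

# fix_ev FILE   (prints the result on stdout)
fix_ev() {
    # \( \) captures the digit so it can be re-emitted after the space.
    sed 's/^\.ev\([0-9]\)/.ev \1/' "$1"
}
```

Lines that already read `.ev 1` or a bare `.ev` (pop the environment) are left alone.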

The abstract finally looks okay.

(Screenshot: good rendering attempt)

Done! Wait, No, Almost

Okay, we’re done, we can go home, right? Almost. There’s one last thing to do: on the last page, something really important is missing, namely the bibliography. Instead, there’s just $LIST$ there. We can’t just turn Brian W. Kernighan and Michael E. Lesk into plagiarists!

Back in the troff documentation in volume 2A, there’s a match for $LIST$ on p. 183. Apparently I need a reference file and have to preprocess the sources with refer(1). That sounds simple enough. Fortunately, I got the reference file along with the macros above, so I didn’t have to look for it separately.

$ refer -pRv7man -e p* | tbl | groff -M./macros -mos -Tpdf | zathura -

(Screenshot: half of the references are blank)

Of course. Why would it work? That’d have been too much to ask for. At least I get some nice hints:

refer:p2:148: no matches for `skinner teaching 1961'
refer:p3:114: no matches for `kernighan editor tutorial 1974'

The troff documentation conveniently explains the format of the reference file, so I could just add these two entries to Rv7man and be done with it. Thankfully, the pre-compiled PDF of the volume 2A manual had the information needed to write the bibliography entries:

%T Why We Need Teaching Machines
%A B. F. Skinner
%J Harvard Educational Review
%V 31
%P 377-398
%D 1961

%T A Tutorial Introduction to the Unix Editor ed
%A B. W. Kernighan
%D 1974
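Appending those records can be sketched so that running it twice doesn't create duplicates. This assumes refer’s database layout of %-field records separated by blank lines and uses the %T line as the duplicate check; add_ref is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: append a refer(1) record to a bibliography file
# unless a record with the same %T (title) line is already there.
# Records are separated by blank lines, so one goes before each record.

# add_ref DB RECORD   (RECORD is the full multi-line %-field text)
add_ref() {
    db=$1
    record=$2
    # The title line serves as the duplicate check.
    title=$(printf '%s\n' "$record" | grep '^%T ')
    [ -n "$title" ] || return 1   # refuse records without a %T line
    if [ -f "$db" ] && grep -qxF "$title" "$db"; then
        return 0                  # already present, nothing to do
    fi
    printf '\n%s\n' "$record" >> "$db"
}
```

For example, put the Skinner record above into a shell variable and run `add_ref Rv7man "$skinner"`, then do the same for the Kernighan record.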

(Screenshot: now that’s what I call a bibliography)

And of course, here is the product of this whole ordeal.
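For anyone who wants to repeat the ordeal, the final refer | tbl | groff pipeline can be wrapped in a small script. This is a sketch, assuming it runs in the directory holding p0 through p7, Rv7man and macros/; missing_tools and build_learn are made-up names, and unlike the invocations above it writes a PDF file instead of piping into zathura.

```shell
#!/bin/sh
# Hypothetical wrapper around the final pipeline: refer fills in the
# citations from Rv7man, tbl expands the tables, and groff typesets with
# the local copy of the V7 -ms macros (macros/tmac.os).

# missing_tools NAME...   prints each NAME that is not on $PATH
missing_tools() {
    for t in "$@"; do
        command -v "$t" >/dev/null 2>&1 || printf '%s\n' "$t"
    done
}

# build_learn [OUTPUT]   writes the PDF to OUTPUT (default learn.pdf)
build_learn() {
    out=${1:-learn.pdf}
    missing=$(missing_tools refer tbl groff)
    if [ -n "$missing" ]; then
        printf 'missing: %s\n' $missing >&2
        return 1
    fi
    # -p names the reference database; -e collects the references and
    # emits them where the $LIST$ marker appears.
    refer -pRv7man -e p* | tbl | groff -M./macros -mos -Tpdf > "$out"
}
```

Calling `build_learn` then leaves learn.pdf next to the sources.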

Closing Remarks

The Heavens were feeling somewhat merciful, but only just enough that I could waste no more than a day on this project. They really wanted me to spend that day on it, though.

On a side note, the missing learn references aren’t available from the link that was provided: http://cm.bell-labs.com/cm/cs/who/bwk/learn.tar.gz is now down, though the web archive still has it. Needless to say, I didn’t read that.

I will never, ever touch troff/groff again. mandoc is good at what it does and I’ll stick to mandoc for writing man pages. But if I ever need to get something typeset nicely from plain text?

LaTeX is the answer. Not troff. Never troff. Not even once.

UNIX is a registered trademark of The Open Group.
