Type Book
Date 2007
Pages 1009
Tags computer science, compilers

Compilers: Principles, Techniques, and Tools, Second Edition

Chapter 1: Introduction

1.1: Language Processors

  • Exercise 1.1.1: What is the difference between a compiler and an interpreter?
    • A compiler translates a source language into a target language, perhaps executable machine code. An interpreter executes the source language directly, without a separate compilation step (though there may be a kind of compilation happening under the hood).
  • Exercise 1.1.2: What are the advantages of (a) a compiler over an interpreter (b) an interpreter over a compiler?
    • (a) A compiler may produce a faster program.
    • (b) An interpreter may produce better error messages.
  • Exercise 1.1.3: What advantages are there to a language-processing system in which the compiler produces assembly language rather than machine language?
    • Assembly code is easier to produce and debug.
  • Exercise 1.1.4: A compiler that translates a high-level language into another high-level language is called a source-to-source translator. What advantages are there to using C as a target language for a compiler?
    • C has mature tooling that can produce optimized code and facilitate debugging, which may not exist for the source language.
  • Exercise 1.1.5: Describe some of the basic tasks that an assembler needs to perform.
    • At the most basic level, the assembler must translate mnemonics into the appropriate machine-dependent representation. Additionally, it must expand assembler macros, resolve labels, and other such tasks.

1.2: The Structure of a Compiler

A model of the phases of compilation:

  1. character stream -> Lexical Analyzer -> token stream
  2. token stream -> Syntax Analyzer -> syntax tree
  3. syntax tree -> Semantic Analyzer -> syntax tree
  4. syntax tree -> Intermediate Code Generator -> intermediate representation
  5. intermediate representation -> Machine-Independent Code Optimizer -> intermediate representation
  6. intermediate representation -> Code Generator -> target-machine code
  7. target-machine code -> Machine-Dependent Code Optimizer -> target-machine code

In the lexical analysis or scanning phase, the compiler translates the stream of input characters into a sequence of lexemes. For each lexeme, the lexer produces a token like (token-name, attribute-value).

In the syntax analysis or parsing phase, the compiler translates the stream of tokens into a syntax tree.

The semantic analyzer checks the program for semantic consistency. This includes type checking.

Compilers often generate a machine-language-like intermediate representation after semantic analysis.

1.5: Applications of Compiler Technology

Parallelism

Parallelism refers to the computer doing more than one thing simultaneously.

In instruction-level parallelism, the processor executes multiple operations simultaneously. This may be accomplished by the hardware determining that certain operations do not depend on one another, and therefore may be executed out of order without changing the result. From the programmer's point of view, all instructions are executed in sequence.

Very Long Instruction Word (VLIW) architectures such as Intel IA64 include explicit instruction that operate in parallel. For example, they might have instructions that operate on an entire vector of data simultaneously.

At the processor level, multiple threads of the same application may run on different processors. Such multithreading may be written explicitly by the programmer or else generated by the compiler.

Memory hierarchies

Memory hierarchies refers to the multiple different types of memory available, each with its own size and speed. For example, registers, processor caches, physical memory, and finally secondary storage such as hard drives. Optimizing memory accesses involves satisfying as many memory accesses as possible from the fastest type of memory.

RISC and CISC

The Reduced Instruction-Set Computer (RISC) architectures supports fewer, simpler instructions, which are easier for compilers to optimize for. By contrast, Complex Instruction-Set Computer (CISC) architectures have more complex instructions, but it may be more efficient for compilers to decompose complex actions into sequences of simple instructions than to use the more complex ones. Modern architectures are all influenced by RISC.

1.6: Programming Language Basics

Scope

If something in a language may be decided at compile time, then it is said to be static. If it can only be decided at runtime, then it is said to be dynamic.

A language uses static scope or lexical scope if the scope of a declaration can be decided by looking at the program code. Otherwise, the language uses dynamic scope, and a variable might at runtime refer to one of several declarations of the variable.

Dynamic scope resolution is needed for polymorphic procedures, unless the language can determine the type of arguments at compile time.

Terminology: name, variable, identifier

In this book, name and variable are distinguished: variable refers to the run-time location denoted by compile-time names. An identifier like x or y is a string that may refer to some entity. A qualified name like x.y is a name, but not an identifier.

1.6.6: Parameter Passing Mechanisms

The actual parameters of a procedure are those actually used in the call of the procedure. By contrast, the formal parameters of a procedure are those used in the definition of the procedure.

In call-by-value, the callee receives a copy of the value of its parameter, and typically cannot modify the source of that value. However, if a pointer is passed, the callee may dereference the pointer and modify the value in this way. This is also the case in C, C++, or Java when an array name is passed to a procedure: a pointer to the beginning of the array is the actual value passed.

In call-by-reference, callees receive a pointer to the value of the parameter.

In call-by-name, used in Algol 60, the callee acts as though the text of parameter value were used in place of the parameter's usage, as though the parameter were a macro substituting its value in. This is not used today.

Chapter 2: A Simple Syntax-Directed Translator

Name Role
Alfred V. Aho Author
Jeffrey D. Ullman Author
Monica S. Lam Author
Pearson Education, Inc. Publisher
Ravi Sethi Author