Compiler Design Glossary






Compiler Design Glossary



Compiler Design Glossary


General Concepts

Compiler Design
The study of how to build a compiler, which is a program that translates high-level programming languages into machine code. The focus is on the translation process, ensuring correctness and code efficiency.
Compiler
A program that translates high-level programming languages (e.g., Python, C++, or Java) into machine code that a computer’s hardware can execute directly.
Phases of a Compiler
The distinct stages of the compilation process. The main phases are: Lexical Analysis, Syntax Analysis, Semantic Analysis, Intermediate Code Generation, Code Optimization, and Code Generation.

Back to top


Lexical Analysis

Lexical Analysis (Scanning)
The first phase of a compiler that breaks the source code into small, meaningful units called tokens. It involves reading the source program character by character and organizing them into tokens.
Token
A meaningful sequence of characters that can be treated as a single unit in the grammar of the programming language. Examples include keywords, identifiers, and operators.
Lexeme
An actual string of characters from the source code that matches a pattern and generates a token. For example, in the statement float x;, “float” is a lexeme for the “keyword” token, and “x” is a lexeme for the “identifier” token.
Keywords
Reserved words with specific meanings in the language. They cannot be used as variable names or identifiers.
Identifiers
Names given to variables, functions, arrays, or other user-defined items.
Literals (Constants)
Fixed values in the code that cannot change during a program’s execution.
Operators
Symbols used to perform operations.
Punctuation (Delimiters)
Symbols that structure the program, such as commas, parentheses, and curly braces.
Symbol Table
A data structure used by the compiler to store information about variables, functions, and other identifiers.

Back to top


Syntax Analysis and Parsing

Syntax Analysis (Parsing)
The second phase of a compiler that checks whether the tokens follow the rules of the programming language’s grammar.
Parser
A component of the compiler that performs syntax analysis by checking if the input tokens form a valid structure according to the language’s grammar. It takes a stream of tokens as input and produces a parse tree or syntax errors as output.
Parse Tree
A hierarchical data structure that represents the syntactic structure of a program according to the grammar rules. It contains all non-terminals and terminals.
Abstract Syntax Tree (AST)
A simplified, more abstract version of a parse tree that represents the semantic structure of the code and focuses on essential elements.
Context-Free Grammar (CFG)
A formal grammar used to define the syntax rules of programming languages.
Ambiguous Grammar
A grammar in which a single string can have more than one parse tree.
Top-Down Parsers
Parsers that build the parse tree from the root (start symbol) down to the leaves (terminals). They use a leftmost derivation.
LL(1) Parser
A type of top-down parser that reads input from left to right, constructs a leftmost derivation, and uses one “lookahead” token to decide parsing actions.
Bottom-Up Parsers
Parsers that build the parse tree from the leaves (terminals) up to the root (start symbol). They construct the rightmost derivation in reverse.
LR Parser
A type of bottom-up parser that reads input from left to right and constructs a rightmost derivation in reverse.
Shift-Reduce Actions
The two primary actions in bottom-up parsing. Shift moves the next input symbol onto a stack, and reduce replaces a sequence of symbols on the stack with a non-terminal.

Back to top


Intermediate Code and Optimization

Three-Address Code (3AC)
A code representation where each statement has at most three operands. This is an intermediate representation used for optimization and portability.
Static Single Assignment (SSA)
A code representation where every variable in the code has a single assignment. This simplifies optimization.
Control Flow Graph (CFG)
A representation of a program using nodes (basic blocks) and edges (control flow).
Basic Block
A sequence of instructions with one entry point and one exit point.
Code Optimization
The process of improving the intermediate code to reduce execution time and memory usage.
Constant Folding
Evaluating constant expressions at compile time.
Dead Code Elimination
Removing code that does not affect the program’s output.
Loop Optimization
A category of optimizations that includes techniques like Code Motion (moving loop-invariant code outside the loop) and Loop Unrolling (reducing loop overhead by executing multiple iterations at once).
Peephole Optimization
Analyzing a small sequence of instructions (a “peephole”) and replacing it with a faster alternative.

Back to top


Runtime Environment

Runtime Environment
The state of the target machine, including software libraries and environment variables, that provides services to a running program.
Activation
Each execution of a procedure.
Activation Tree
A tree that shows the way control enters and leaves procedure activations, with each node representing an activation. The root of the tree is the main function.
Activation Record (Frame)
A data structure that manages the information needed for a single execution of a procedure. It is pushed onto the Control Stack when a procedure is called and popped when it returns.
Linking
The process of combining multiple object files and resolving symbolic references to create a single executable file.
Loading
The process of placing the executable file into memory, resolving runtime addresses, and preparing it for execution by the CPU.

Back to top


Syntax Directed Translation

Syntax Directed Translation (SDT)
A method that combines a context-free grammar with semantic rules to assign meaning or perform actions during parsing. It is used to generate code, check types, and build abstract syntax trees.
Attributes
Values attached to grammar symbols that are computed by semantic rules.
Synthesized Attributes
Attributes computed from the attributes of a node’s children in the parse tree (a bottom-up flow).
Inherited Attributes
Attributes computed from the attributes of a node’s parent or siblings (a top-down or lateral flow).
S-Attributed Grammar
A grammar that only uses synthesized attributes.
L-Attributed Grammar
A grammar that uses both synthesized attributes and a restricted form of inherited attributes (from the parent or left siblings).

Back to top


TECH TALENTS
Average rating:  
 0 reviews
Social Share Buttons and Icons powered by Ultimatelysocial
Scan the code