Syntax Analysis¶
Syntax Analysis is performed in dsyntax.c
and dsyntax.h
.
The aim of Syntax Analysis is to determine whether the user wrote the source code correctly in terms of the structure of the program. This is done by generating a syntax tree, which not only helps us later with analysing the code but also helps us find any syntax errors in the code.
The syntax tree has a root node, which represents the entire program. The children of the root node would then be the statements in the program, then those statements would have the name, the arguments, and the outputs as their children, etc.
Since the syntax of the language doesn’t allow for nested statements like in other languages, we use a top-down method of creating the syntax tree rather than using grammars and bottom-up methods which can handle nested statements. Therefore in the compiler, we use the recursive-descent method to convert the lexical stream from Lexical Analysis to a syntax tree.
Structures¶
SyntaxDefinition
An enumerator representing the definitions of syntax statements, for example, there will be one for line identifiers, another for arguments, another for statements as a whole, etc.
Note
There is one special value in SyntaxDefinition
which is
STX_TOKEN = 0
, which represents a lexical token.
SyntaxContext
A structure that, during Syntax Analysis, keeps track of data like where we are in the lexical stream, what the current
LexToken
is, etc.SytaxNode
A node in the syntax tree. It contains a
SyntaxDefinition
, and a reference to a lexical token (LexToken*
) to preserve the info gathered by Lexical Analysis - however, it only stores the said reference is theSyntaxDefinition
isSTX_TOKEN = 0
.Each node also stores 2 pointers to other syntax nodes (
SyntaxNode*
), which make up the structure of the tree. One is the direct child of the node, and the other is the next sibling of the node. The reason for this structure is so that we don’t have to store a dynamic list of children, which in C is a pain.Therefore, the syntax tree as a whole is represented as a pointer (
SyntaxNode*
) to the root node.
Functions¶
Meta-variables¶
Each SyntaxDefinition
has an associated function called a
meta-variable in dsyntax.c
, which returns if the section of lexical
tokens we’re looking at fits the syntax rule of the meta-variable, and may
potentially throw an error if it specifically breaks the rules of that
meta-variable.
Here is an example of a meta-variable for a line identifier, which is a line
symbol (#
), followed by an integer literal:
/* <lineIdentifier> ::= <Line><IntegerLiteral> */
static SyntaxResult lineIdentifier(SyntaxContext *context) {
SyntaxResult out;
out.node = d_syntax_create_node(STX_lineIdentifier, NULL, context->lineNum);
out.success = true;
VERBOSE(5, "ENTER\tlineIdentifier\tWITH\t%i\n",
context->currentToken->type);
if (context->currentToken->type == TK_LINE) {
nextToken(context);
if (context->currentToken->type == TK_INTEGERLITERAL) {
SyntaxNode *literal = d_syntax_create_node(
STX_TOKEN, context->currentToken, context->lineNum);
d_syntax_add_child(out.node, literal);
nextToken(context);
} else {
syntax_error(
"Expected integer literal to follow the line symbol (#)",
context);
fail_definition(&out);
}
} else {
syntax_error(
"Expected line identifier to start with the line symbol (#)",
context);
fail_definition(&out);
}
return out;
}
Syntax functions¶
In dsyntax.h
, there are a few functions to help you traverse and
manipulate a syntax tree, for instance:
-
void
d_syntax_add_child
(SyntaxNode *parent, SyntaxNode *child) Add a child node to a parent node.
- Parameters
parent
: The parent node to attach the child onto.child
: The child node to add onto the parent.
-
size_t
d_syntax_get_num_children
(SyntaxNode *parent) Get how many children a node has.
- Return
The number of children
parent
has.- Parameters
parent
: The node to query.
There is also a mechanism to help you find all of the syntax nodes with a
given SyntaxDefinition
:
-
SyntaxSearchResult
d_syntax_get_all_nodes_with
(SyntaxNode *root, SyntaxDefinition definition, bool traverseChildrenOfFound) Return all occurances of a tree with a given definition as a malloc’d SyntaxSearchResult.
- Return
A malloc’d SyntaxSearchResult.
- Parameters
root
: The root node to search from.definition
: The definition we want our found nodes to have.traverseChildrenOfFound
: If we find a node that we want, should we also traverse the children of that found node?
It returns a list of SyntaxNode*
which have the same SyntaxDefinition
as definition
. It also gives you the choice of whether you want to continue
traversing the children of a satisfactory node to try and find more, although
this will mostly be false as certain syntax definitions will only appear in
certain layers of the tree.
Because C is our bestest friend in the world, we need to free the results once we’re done with them:
-
void
d_syntax_free_results
(SyntaxSearchResult results) Free search results from memory.
- Parameters
results
: The results to free.
However, the most important function is:
-
SyntaxResult
d_syntax_parse
(struct _lexStream stream, const char *filePath) Parse a lexical stream, and generate a syntax tree.
- Return
The malloc’d root node of the syntax tree, and whether the parsing was successful or not.
- Parameters
stream
: The stream to parse from.filePath
: In case we error, say what the file path was.
It returns the root node of a syntax tree, given the stream
from
Lexical Analysis. In the event there is an error, it blames
filePath
.
-
void
d_syntax_free_tree
(SyntaxNode *root) Free all the nodes from a syntax tree.
- Parameters
root
: The root node of the tree.
This frees the syntax tree after we are done with it.
Like in Lexical Analysis, there is a debugging method:
-
void
d_syntax_dump_tree
(SyntaxNode *root) Dump the contents of a tree recursively.
- Parameters
root
: The root node to start from.
It dumps the syntax tree with indentation depending on the depth of the node.
Note
The syntax gets dumped automatically whenever you compile source code with
a VERBOSE_LEVEL
of 4 or higher.