Semantic Analysis¶
Semantic Analysis is performed in dsemantic.c
and dsemantic.h
.
This stage checks to see whether what the user has written is correct,
in terms of the nodes, their connections, and their definitions. During this
stage, we build up a representation of the sheet in memory (Sheet
, defined
in dsheet.h
), and check for errors using that structure.
Property Scanning¶
Firstly, we scan the syntax tree we made in Syntax Analysis, and extract the property statements that were found. We use the property statements to manipulate a sheet’s data.
-
void
d_semantic_scan_properties
(Sheet *sheet, SyntaxNode *root, Sheet **priors, bool debugIncluded) Sets the properties of the sheet, given the syntax tree.
- Parameters
sheet
: A pointer to the sheet where we want to set the properties.root
: The root node of the syntax tree.priors
: A NULL-terminated list of sheets that, if included, will produce an error.debugIncluded
: If true, compile included sheets in debug mode.
Takes a reference to a Sheet
, and depending on what property statements
we find in the syntax tree root
, we manipulate sheet
. We first check
to see if its a property name we expect to find. Then, the necessary
information is passed onto a function called add_property_PropertyName
in dsemantic.c
, where PropertyName
is the name of the property, e.g.
add_property_Variable
.
Node Scanning¶
Secondly, we scan the syntax tree to find the general statements, and we find out:
What the name of the node is, and if it is a defined name.
What the inputs and outputs are. If they are literals, are they of the correct data type? If they are names, are they defined?
Note
When we say literals we mean data explicitly inserted into the source
code, e.g. In Print(#1, "Hello, world!")
, "Hello, world!"
is the
literal.
Note
If an integer literal was given to a socket that is exclusively for floats, then the compiler should automatically convert the integer literal to a float literal for convenience.
Afterwards, we take the input and output sockets and connect them. While connecting them, we check to see if the data types of both sockets match, and we check that an output execution socket doesn’t have more than one connection, etc.
-
void
d_semantic_scan_nodes
(Sheet *sheet, SyntaxNode *root) Sets the nodes of the sheet, given the syntax tree.
NOTE: This function also sets the connections between the nodes.
- Parameters
sheet
: A pointer to the sheet where we want to set the properties.root
: The root node of the syntax tree.
Takes sheet
and creates a list of Node
in the sheet’s graph, as well as
a list of Wire
to connect the sockets of the nodes.
When we want to find out where the name of something is defined, we can use:
-
AllNameDefinitions
d_get_name_definitions
(struct _sheet *sheet, const char *name) Get all of the places where a name is defined, and what the name’s type is.
We will also check recursively up the includes of sheets.
- Return
An array of NameDefinition.
- Parameters
sheet
: The sheet to start looking from.name
: The name to query.
This gets all the places a name is defined. Ideally, there should only be one definition of a name. If there is more than one, then we error saying that we can’t pick a definition, and if there is none, then we error saying the name is not defined.
If we know there is only one definition of a name, then we can go straight to the name’s definition with this function:
-
const NodeDefinition *
d_get_definition
(struct _sheet *sheet, const char *name, size_t lineNum, const char *funcName, NameDefinition *nameDef) Get a node’s definition from it’s name.
- Return
The node’s definition.
- Parameters
sheet
: The sheet the node is a part of.name
: The name of the node.lineNum
: In case we error, say where we errored from.funcName
: If the name is Define or Return, we need the function name so we can get the correct sockets.nameDef
: A pointer that is set to the node’s name definition. If the node definition returns NULL, do not trust this value.
Note
Nodes store a NameDefinition
as a cache for Code Generation,
but what they store depends on the node itself. If the node is the getter
or setter of a variable, the definition stored references the variable.
If the node is a Define
or Return
node, the definition stores a
reference to the function being defined or returned. Otherwise, the
definition stores a reference to the node’s functionality itself.
Remember that the sheets can include other sheets, so we need to check included sheets recursively to see if there are names that are defined that we can use.
You can free an AllNameDefinitions
struct with:
-
void
d_free_name_definitions
(AllNameDefinitions *definitions) Free an
AllNameDefinitions
struct. It should not be used after it has been freed.- Parameters
definitions
: The structure to free.
The act of checking the data type matching of connections is done when we add a connection to a socket with the function:
-
bool
d_graph_add_wire
(Graph *graph, Wire wire, const char *filePath) Add a wire to a sheet, connecting two sockets.
- Return
If the operation was successful.
- Parameters
graph
: The graph to add the wire to. Both nodes have to belong to this graph.wire
: The wire to add to the sheet.filePath
: In case we error, say where we errored from.
which is defined in dgraph.h
.
Reducing Data Types¶
Wait, reducing types? What does that mean???
I’ve been hiding a dark truth from you this whole time… data types are sometimes not what they seem. Sometimes… a socket can have more than one data type! (insert shocked Pikachu face)
So this is a thing that exists purely so that we don’t have multiple functions
for different data types. For example, we can use the Add
node without
having to worry about whether to call AddInt
or AddFloat
. We say
sockets have vague data types if they can allow more than one data type.
Note
Only core functions defined in dcore.c
have vague data types.
The reason we need to reduce down vague data types to single data types is for when we come to Code Generation, the compiler needs to know exactly what data types it is dealing with. As a wise woman once said:
“Ain’t nobody got time for ambiguity!”
-
void
d_semantic_reduce_types
(Sheet *sheet) Take the connections of a sheet which may have “vague” connections, and reduce them to unique types with the information we have.
e.g. If
Multiply
has at least oneFloat
input, the output must be aFloat
.- Parameters
sheet
: The sheet to reduce the types on.
Takes the connections of a sheet and reduces down vague connections.
It does this by completing passes until all of the sockets are reduced. During each pass, it will use the information it has to reduce down a connection. Here is an example:
[Variable(i, Integer, 10)]
i~#1
Multiply(#1, 2)~#2
Multiply(#2, 3.14)~#3
Start~#4
Print(#4, #3)
Since
i
is an Integer, line#1
begins as an Integer as well.When the first
Multiply
is reached, it checks it’s input data types, which are both integers. Thus, it’s output is set from Integer | Float to just an Integer.When the second
Multiply
is reached, like the first, it checks it’s input data types. The first is an integer since we’ve just reduced it, but the second is a Float literal. Thus, the output is set from Integer | Float to just a Float.When the
Print
is reached, it checks its non-execution connection, and we’ve just reduced it down to a Float. Thus, the socket is set from the union of all data types (sincePrint
should be able to print anything except for Execution types) to just a Float.
We are lucky in this example that everything got fully reduced in one pass since the order of the nodes was optimal. But if we change the order in which the nodes are reduced, we will need to do multiple passes before every node is reduced.
What’s that, I hear you ask? How do you represent combinations of multiple data types, and how do we make sure that two vague data types are compatible? Well, I’m glad you asked!
Data types are represented as separate bits in an integer, for example, in
the integer 00000110 represents Integer | Float, since both the Integer bit
(TYPE_INT = 2
) and the Float bit (TYPE_FLOAT = 4
) are active.
Therefore, if two data types are compatible, then at least one base data type
has to be present in both data types - so we can bitwise AND both data
types, and hope to god we don’t get 0, i.e.
(type1 & type2) != 0
Here is what the DType
enum looks like:
-
enum
_dType
An enum for the data types in Decision.
The values go up in powers of 2. This is so that we can combine data types to make vague data types, e.g. a number is
TYPE_INTEGER | TYPE_FLOAT
.Note that there are several macros for common vague data types, e.g.
TYPE_NUMBER
,TYPE_BITWISE
,TYPE_VAR_ANY
, …Values:
-
TYPE_NONE
= 0
-
TYPE_EXECUTION
= 1
-
TYPE_INT
= 2
-
TYPE_FLOAT
= 4
-
TYPE_STRING
= 8
-
TYPE_BOOL
= 16
-
TYPE_NAME
= 32
-
Detecting Loops¶
If you abstract sheets down far enough, they will just turn into directed graphs. Get your graph theory textbook of choice at the ready, and you’ll see why this matters!
Consider this example:
i~#1
Add(#1, #3)~#3
This is clearly erroneous since we can never get a value for #3
. In fact,
all loops are erroneous, since we can’t get input values from nodes that
depend on the input value indirectly.
So fourthly, we need to check that the directed graph we have is a tree, i.e. a connected graph with no cycles. The connected bit we’ve already dealt with, since unconnected nodes can’t be accessed anyway, but the no cycles bit is tricky…
We also want to take the time to let the user know if there are any nodes that have no connections, i.e. redundant nodes.
-
void
d_semantic_detect_loops
(Sheet *sheet) After a sheet has been connected in
d_semantic_scan_nodes
, go through the sheet and see if there are any loops.“Loops are bad, and should be given coal at christmas.”
In retrospect, this was a bad quote. Instead, give it something that won’t cause climate change, like a really cheap sticker.
While we are here, we also check to see if there are any redundant nodes.
- Parameters
sheet
: The connected sheet to check for loops.
This function goes through all of the nodes and finds nodes with no input
sockets (except for names). These nodes are the start of a path of execution.
We each of these nodes, we enter a function with the node with no input sockets
start
, we get the outputs of the node and recursively call the function
again, and again. We are essentially checking every possible path through the
graph from the starting node using depth-first search. As we are checking each
path, we are adding the latest node into pathArray
, which acts as a stack,
and represents the current path we’ve taken. Thus, if the node we’ve just added
is already in pathArray
, we have a cycle!
Checking Subroutine Returns¶
Last but not least, once we have verified that there are no loops in the graph, we need to check that every subroutine in the sheet has a Return node at the end of every possible execution path.
The reason we need to check this is because execution paths can branch, e.g. with an IfThenElse node. If an execution path doesn’t end with a Return, then we don’t know what return values the subroutine should return!
Functions don’t have this problem, as they have one and only one Return node.
This check is done with the function:
-
void
d_semantic_check_subroutine_returns
(Sheet *sheet) After a sheet has been checked for loops with
d_semantic_detect_loops
, check to see if all execution paths from the Define of a subroutine end with a Return.- Parameters
sheet
: The sheet to check.
It calls a recursive function that checks every possible execution path with
depth-first search, similar to how d_semantic_detect_loops
works, but this
time it only traverses execution wires.
Conclusion¶
So we have a bunch of things that we need to check the sheet for - surely a good samaritan would have put all of the above and wrapped it up in a nice, cosy function for everyone to use?
-
void
d_semantic_scan
(Sheet *sheet, SyntaxNode *root, Sheet **priors, bool debugIncluded) Perform Semantic Analysis on a syntax tree.
- Parameters
sheet
: The sheet to put everything into.root
: The valid syntax tree to scan everything from.priors
: A NULL-terminated list of sheets that, if included, will produce an error. This is to prevent circular includes.debugIncluded
: If true, compile any included sheets in debug mode.