Code Generation¶
Code Generation is perfomed in dcodegen.c
and dcodegen.h
.
After we’ve made a model of the sheet in memory, and after we’ve checked that everything the user has made is syntactically and semantically correct, we now need to generate the bytecode for The Virtual Machine to run.
So far, the difference between execution and non-execution nodes has only been that execution nodes have execution sockets - the main difference will be explained in this section.
Definition Nodes¶
For definition nodes, the first thing we do is generate a return instruction before generating anything else. This essentially acts as a “bookmark” for the function in the bytecode, which the linker can later point to.
Subroutines¶
For subroutines like Start
, the code generator will recurse through the
execution wires and generate the bytecode for each execution node in sequence.
Functions¶
For functions, the code generator will try and find the Return
node for
the function, and recurse backwards to generate the bytecode to calculate
the return values. The reason for this is because we don’t have an order of
execution as we do for execution nodes.
Execution Nodes¶
When an execution node is reached, it will first generate the bytecode to get the inputs. If the inputs are literals, then this is trivial. But if there is a connection to the input’s socket, then we need to follow the connection and recursively generate the bytecode for the node that starts the connection.
Secondly, bytecode is generated depending on the node itself, using the inputs. If the node is a core function, then we can directly insert the bytecode in. Otherwise, we make use of the calling abilities of The Virtual Machine.
Lastly, it will check to see if the last output execution socket has a
connection. If it does, then there is something more to do after this node
is finished. We then recursively generate the bytecode for the next execution
node and add it on to the bytecode we’ve already generated. Otherwise, we may
want to put a return instruction at the end of the sequence to say that we
are done. In some cases, however, this is not needed, like inside For
loops.
When getting output from an execution node, the generator will copy the output onto the top of the stack in order to preserve the original value.
Non-Execution Nodes¶
When a non-execution node is reached, it will first generate the bytecode to get the inputs, like with execution nodes.
Next, bytecode is generated depending on the node itself, using the inputs. This process is almost exactly the same as with execution nodes, including if the function is not a core function.
However, unlike execution nodes, it does not generate any more bytecode past the current node. This is because non-execution nodes are purely meant to be called upon from execution nodes.
When getting output from a non-execution node, it will be used by the node getting the input. This ensures that if values change over time, then the calculations will produce different results over time as well. Non-execution nodes are purely meant to just manipulate the inputs to execution nodes.
Calling Convention¶
If we see a node with a name that isn’t a core function, it needs to be called. How we call functions is important and should be as efficient as possible since there could be lots of calls during execution, especially if the call is inside a loop.
Arguments and return values are easy - arguments are pushed onto the stack in order before the call, and return values are pushed onto the stack in reverse order before returning.
Note
The calling and returning instructions in The Virtual Machine have a byte immediate operand for how many arguments or return values there are, meaning you can only have at most up to 256 arguments and 256 returns.
Structures¶
BCode
Essentially a list of
char
. Its purpose is to store bytecode and be a convenient way to build up the bytecode. It also stores alongside the code a list of instruction indexes and what they should be linked to, as well as a list of indexes to say where functions start pushing return values onto the stack.
There are 3 convenient functions for BCode
:
-
BCode
d_malloc_bytecode
(size_t size) Create a malloc’d BCode object, with a set number of bytes.
- Return
The BCode object with malloc’d elements.
- Parameters
size
: The number of bytes.
Automatically allocates memory to store bytecode for you.
Likewise, there is a function to free the memory:
-
void
d_free_bytecode
(BCode *bcode) Free malloc’d elements of bytecode.
- Parameters
bcode
: The bytecode to free.
Arguably the most useful function however is:
-
void
d_concat_bytecode
(BCode *base, BCode *after) Append bytecode to the end of another set of bytecode.
- Parameters
base
: The bytecode to be added to.after
: The bytecode to append. Not changed.
Concatenates after
onto the end of base
, which makes building up the
bytecode simple.
BuildContext
Contains useful information used by the code generator to help make the most efficient bytecode possible. Only one instance is made at the beginning of Bytecode Generation, and it is passed around by reference throughout.
It also stores a section of memory for data like variables and string literals, which will be stored in the sheet, and also keeps track of links to those items.
One thing we want to guarantee when creating the bytecode is that The Virtual Machine stack is as small as possible throughout the execution of the program. This is done by keeping track, at any one time, of what the index of the top of the stack is.
Sockets¶
One important thing to note is how a node knows which input corresponds to which element in the stack.
Inside the Node
structure defined in dgraph.h
, there is a property
called _stackPositions
. Only Code Generation uses this property.
If the socket is connected, the socket’s entry will be set to the index of the
socket’s value in the stack at that time. If there is no connection, then an
index may or may not be assigned for the literal value, depending on if we want
to use the literal value as an immediate or not.
Note
See The Virtual Machine to see what I mean by an immediate value.
This way, indexes are passed along from socket to socket, so that the next node knows in what index the socket’s value is in.
Note
Most of the bytecode generation functions guarantee that the output is on the top of the stack, so usually we take advantage of this and don’t bother to read the socket’s stack index.
Linking Functions¶
The whole process of linking is explained in Linking, but in order for it to work, we need to do 2 things:
We need to know what things we need to link to, like variables or functions.
We need to say which instructions need to point to which link.
-
void
d_add_link_to_ins
(BuildContext *context, BCode *bcode, size_t insIndex, LinkMeta linkMeta, size_t *indexInList, bool *wasDuplicate) Add a future link to an instruction in bytecode.
- Parameters
context
: The context needed to store the link.bcode
: The bytecode containing the instruction to link.insIndex
: The index of the instruction to edit when linking is taking place.linkMeta
: The link metadata.indexInList
: Stores in the reference the index of the new metadata in the list.wasDuplicate
: Sets the reference to true if a matching linkMeta was already found in the array, false otherwise.
This function puts information into context
to say which instruction in
bcode
(index insIndex
) needs to point to some link linkMeta
. If
linkMeta
is a duplicate of a previously linked linkMeta
, then the
instruction points to the original version, and the new version is ignored.
indexInList
is replaced with the index of linkMeta
in the list of
LinkMeta
in context
, and wasDuplicate
is replaced with a boolean
representing if linkMeta
was already found in context
.
-
char *
d_allocate_from_data_section
(BuildContext *context, size_t size, size_t *index) Allocate a number of bytes from the data section for some data. That data could be a string literal, variable, etc.
- Return
A pointer to the start of the new allocation. Returns NULL if size is 0.
- Parameters
context
: The context that contains the built-up data section.size
: The size of the allocation in bytes.index
: Overwrites with the index of the start of the allocation from the start of the data section.
Allocates a chunk of memory for the data section in context
. size
is
the size of the allocation in bytes, and index
is replaced with the index
of the start of the allocation, which is useful when creating links. The
function returns a pointer to the start of the allocation, rather than the
index. This makes copying data into the allocation easy.
-
size_t
d_allocate_string_literal_in_data
(BuildContext *context, BCode *linkCode, size_t insIndex, char *stringLiteral) Given a string literal, allocate memory from the data section to store the literal.
If there is a duplicate string literal found, the links to string literal passed are pointed to the literal already stored in the data section. NOTE: If the string was a duplicate, it is freed!
- Return
The index of the data section where the string literal starts.
- Parameters
context
: The context containing the built-up data section.linkCode
: The bytecode to link to the string literal in the data section.insIndex
: The index of the LOADUI instruction to replace when linking is taking place.stringLiteral
: The string literal to place in the data section.
Allocated memory for the data section of context
in order to fit the
length of stringLiteral
. This function automatically links an instruction
from linkCode
(index insIndex
) to the string literal’s new location.
-
void
d_allocate_variable
(BuildContext *context, SheetVariable *variable, size_t size, size_t indexInLinkMeta) Allocate memory from the data section to store a variable.
- Parameters
context
: The context containing the built-up data section.variable
: The variable data.size
: How many bytes to allocate for the variable.indexInLinkMeta
: The index of the variable’s LinkMeta entry in the build context.
Allocates memory for the data section of context
, in order to fit the
content of a variable variable
. You can specify a size, which is usually
sizeof(dint)
, all you need to provide is the index of the variable’s entry
in the LinkMeta
list in context
.
Generation Functions¶
-
BCode
d_push_literal
(BuildContext *context, NodeSocket socket, bool cvtFloat) Generate bytecode to push a literal onto the stack.
- Return
Bytecode to push the socket’s literal onto the stack.
- Parameters
context
: The context needed to build the bytecode.socket
: The socket of the literal to push onto the stack.cvtFloat
: Converts the literal to a float if possible.
This function takes a literal value and pushes it onto the top of the stack.
-
BCode
d_push_variable
(BuildContext *context, size_t nodeIndex) Given a node that is the getter of a variable, generate bytecode to push the value of the variable onto the stack.
- Return
Bytecode to push the variable’s value onto the stack.
- Parameters
context
: The context needed to build the bytecode.nodeIndex
: The index of the node that is the getter of a variable.
This function takes a variable’s value and pushes it onto the top of the stack.
-
BCode
d_push_input
(BuildContext *context, NodeSocket socket, bool forceFloat) Given an input socket, generate bytecode to push the value of the input to the top of the stack.
- Return
Bytecode to push the input’s value onto the stack.
- Parameters
context
: The context needed to build the bytecode.socket
: The input socket to get the value for.forceFloat
: Force integers to be converted to floats.
This function takes an input socket and pushes it’s value onto the top of the stack. However, that is not as simple as it sounds, as it may generate the entire bytecode nessesary to get the value.
-
BCode
d_push_node_inputs
(BuildContext *context, size_t nodeIndex, bool order, bool ignoreLiterals, bool forceFloat) Given a node, generate bytecode to push the values of the inputs to the top of the stack.
- Return
Bytecode to push all input’s values onto the stack.
- Parameters
context
: The context needed to generate the bytecode.nodeIndex
: The index of the node whose input sockets to generate bytecode for.order
: If true, the inputs are pushed in order, such that the last input is at the top. If false, the inputs are pushed in reverse order, such that the first input is at the top.ignoreLiterals
: Do not generate bytecode for non-float literal inputs.forceFloat
: Force integers to be converted to floats.
This function takes the variable inputs of a node, and pushes them onto the top of the stack in an order of your choosing.
-
BCode
d_generate_operator
(BuildContext *context, size_t nodeIndex, DIns opcode, DIns fopcode, DIns fiopcode, bool forceFloat) Given an operator node, generate the bytecode for it.
- Return
Bytecode to get the output of an operator.
- Parameters
context
: The context needed to generate the bytecode.nodeIndex
: The index of the operator node to get the result for.opcode
: The operator instruction.fopcode
: The float variant of the instruction.fiopcode
: The full immediate variant of the instruction.forceFloat
: Should the output always be a float?
This function generates the bytecode for a generic operator (e.g. Add
).
-
BCode
d_generate_comparator
(BuildContext *context, size_t nodeIndex, DIns opcode, DIns fopcode, fimmediate_t strCmpArg, bool notAfter) Given a comparator node, generate the bytecode for it.
- Return
Bytecode to get the output of a comparator.
- Parameters
context
: The context needed to generate the bytecode.nodeIndex
: The index of the comparator node to get the result for.opcode
: The comparator instruction.fopcode
: The float variant of the instruction.strCmpArg
: The SYS_STRCMP argument to use to compare strings.notAfter
: Do we invert the answer at the end?
This function generates the bytecode for a generic comparator (e.g. Equal
).
-
BCode
d_generate_call
(BuildContext *context, size_t nodeIndex) Given a node that calls a function or subroutine, generate the bytecode to call it.
- Return
Bytecode to call the function or subroutine.
- Parameters
context
: The context needed to generate the bytecode.nodeIndex
: The index of the node to generate the bytecode for.
This function generates the bytecode to call a function or subroutine. It abides to the calling convention and pushes the arguments in the correct order.
-
BCode
d_push_argument
(BuildContext *context, NodeSocket socket) Given an output socket that is a function/subroutine argument, generate bytecode to push the value of the argument to the top of the stack.
- Return
Bytecode to push the argument.
- Parameters
context
: The context needed to generate the bytecode.socket
: The output socket representing the function argument.
If you are in a function or subroutine, and you want an argument on the top of the stack, this is the function for that.
-
BCode
d_generate_return
(BuildContext *context, size_t returnNodeIndex) Given a Return node, generate the bytecode to return from the function/subroutine with the return values.
- Return
Bytecode to return from the function/subroutine.
- Parameters
context
: The context needed to generate the bytecode.returnNodeIndex
: The Return node to return with.
This function generates the bytecode to return from the function or subroutine. It abides to the calling convention and pushes the return values in reverse order.
-
BCode
d_generate_nonexecution_node
(BuildContext *context, size_t nodeIndex) Given a non-execution node, generate the bytecode to get the output.
- Return
Bytecode to run the nonexecution node’s function.
- Parameters
context
: The context needed to generate the bytecode.nodeIndex
: The index of the non-execution node.
This function generates the bytecode for a generic non-execution node.
-
BCode
d_generate_execution_node
(BuildContext *context, size_t nodeIndex, bool retAtEnd) Given an execution node, generate the bytecode to get the output.
- Return
Bytecode to run the execution node’s subroutine.
- Parameters
context
: The context needed to generate the bytecode.nodeIndex
: The execution node.retAtEnd
: Should the bytecode return at the end?
This function generates the bytecode for a generic execution node.
-
BCode
d_generate_start
(BuildContext *context, size_t startNodeIndex) Given a Start node, generate the bytecode for the sequence starting from this node.
- Return
The bytecode generated for the Start function.
- Parameters
context
: The context needed to generate the bytecode.startNodeIndex
: The Start node index.
This function generates the bytecode for the main Start function.
-
BCode
d_generate_function
(BuildContext *context, SheetFunction func) Given a function, generate the bytecode for it.
- Return
The bytecode generated for the function.
- Parameters
context
: The context needed to generate the bytecode.func
: The function to generate the bytecode for.
This function generates the bytecode for a function or subroutine defined in the sheet.
Conclusion¶
In order to do all of the above, all you need is the function:
-
void
d_codegen_compile
(Sheet *sheet, bool debug) Given that Semantic Analysis has taken place, generate the bytecode for a given sheet.
- Parameters
sheet
: The sheet to generate the bytecode for.debug
: If true, we include debug information as well.
Which takes a sheet, generates the bytecode, and inserts it into the sheet once it’s done.
Output¶
By the end of compilation, if everything succeeds, then you should have a
Sheet
with the contents of an object file. An object file is made up
of multiple sections that contain different data.
The functionality to save, load and dump Decision object files can be found
in dasm.h
.
Object Sections¶
.text
: A list of instructions for The Virtual Machine to execute..main
: An integer which says which instruction in.text
should be the one to execute first if this sheet is where we start executing. It essentially points to the compiledStart
..data
: An allocated section of memory with items like variables and string literals contained inside..lmeta
: Essentially a list ofListMeta
, which contains data on what and where something is..link
: A table of what instructions point to which index in the.lmeta
section..func
: Essentially a list ofSheetFunction
..var
: Essentially a list ofSheetVariable
..incl
: A list of paths to sheets that this sheet includes.