Code Generation

Code Generation is perfomed in dcodegen.c and dcodegen.h.

After we’ve made a model of the sheet in memory, and after we’ve checked that everything the user has made is syntactically and semantically correct, we now need to generate the bytecode for The Virtual Machine to run.

So far, the difference between execution and non-execution nodes has only been that execution nodes have execution sockets - the main difference will be explained in this section.

Definition Nodes

For definition nodes, the first thing we do is generate a return instruction before generating anything else. This essentially acts as a “bookmark” for the function in the bytecode, which the linker can later point to.

Subroutines

For subroutines like Start, the code generator will recurse through the execution wires and generate the bytecode for each execution node in sequence.

Functions

For functions, the code generator will try and find the Return node for the function, and recurse backwards to generate the bytecode to calculate the return values. The reason for this is because we don’t have an order of execution as we do for execution nodes.

Execution Nodes

When an execution node is reached, it will first generate the bytecode to get the inputs. If the inputs are literals, then this is trivial. But if there is a connection to the input’s socket, then we need to follow the connection and recursively generate the bytecode for the node that starts the connection.

Secondly, bytecode is generated depending on the node itself, using the inputs. If the node is a core function, then we can directly insert the bytecode in. Otherwise, we make use of the calling abilities of The Virtual Machine.

Lastly, it will check to see if the last output execution socket has a connection. If it does, then there is something more to do after this node is finished. We then recursively generate the bytecode for the next execution node and add it on to the bytecode we’ve already generated. Otherwise, we may want to put a return instruction at the end of the sequence to say that we are done. In some cases, however, this is not needed, like inside For loops.

When getting output from an execution node, the generator will copy the output onto the top of the stack in order to preserve the original value.

Non-Execution Nodes

When a non-execution node is reached, it will first generate the bytecode to get the inputs, like with execution nodes.

Next, bytecode is generated depending on the node itself, using the inputs. This process is almost exactly the same as with execution nodes, including if the function is not a core function.

However, unlike execution nodes, it does not generate any more bytecode past the current node. This is because non-execution nodes are purely meant to be called upon from execution nodes.

When getting output from a non-execution node, it will be used by the node getting the input. This ensures that if values change over time, then the calculations will produce different results over time as well. Non-execution nodes are purely meant to just manipulate the inputs to execution nodes.

Calling Convention

If we see a node with a name that isn’t a core function, it needs to be called. How we call functions is important and should be as efficient as possible since there could be lots of calls during execution, especially if the call is inside a loop.

Arguments and return values are easy - arguments are pushed onto the stack in order before the call, and return values are pushed onto the stack in reverse order before returning.

Note

The calling and returning instructions in The Virtual Machine have a byte immediate operand for how many arguments or return values there are, meaning you can only have at most up to 256 arguments and 256 returns.

Structures

BCode

Essentially a list of char. Its purpose is to store bytecode and be a convenient way to build up the bytecode. It also stores alongside the code a list of instruction indexes and what they should be linked to, as well as a list of indexes to say where functions start pushing return values onto the stack.

There are 3 convenient functions for BCode:

BCode d_malloc_bytecode(size_t size)

Create a malloc’d BCode object, with a set number of bytes.

Return

The BCode object with malloc’d elements.

Parameters
  • size: The number of bytes.

Automatically allocates memory to store bytecode for you.

Likewise, there is a function to free the memory:

void d_free_bytecode(BCode *bcode)

Free malloc’d elements of bytecode.

Parameters
  • bcode: The bytecode to free.

Arguably the most useful function however is:

void d_concat_bytecode(BCode *base, BCode *after)

Append bytecode to the end of another set of bytecode.

Parameters
  • base: The bytecode to be added to.

  • after: The bytecode to append. Not changed.

Concatenates after onto the end of base, which makes building up the bytecode simple.

BuildContext

Contains useful information used by the code generator to help make the most efficient bytecode possible. Only one instance is made at the beginning of Bytecode Generation, and it is passed around by reference throughout.

It also stores a section of memory for data like variables and string literals, which will be stored in the sheet, and also keeps track of links to those items.

One thing we want to guarantee when creating the bytecode is that The Virtual Machine stack is as small as possible throughout the execution of the program. This is done by keeping track, at any one time, of what the index of the top of the stack is.

Sockets

One important thing to note is how a node knows which input corresponds to which element in the stack.

Inside the Node structure defined in dgraph.h, there is a property called _stackPositions. Only Code Generation uses this property. If the socket is connected, the socket’s entry will be set to the index of the socket’s value in the stack at that time. If there is no connection, then an index may or may not be assigned for the literal value, depending on if we want to use the literal value as an immediate or not.

Note

See The Virtual Machine to see what I mean by an immediate value.

This way, indexes are passed along from socket to socket, so that the next node knows in what index the socket’s value is in.

Note

Most of the bytecode generation functions guarantee that the output is on the top of the stack, so usually we take advantage of this and don’t bother to read the socket’s stack index.

Linking Functions

The whole process of linking is explained in Linking, but in order for it to work, we need to do 2 things:

  1. We need to know what things we need to link to, like variables or functions.

  2. We need to say which instructions need to point to which link.

void d_add_link_to_ins(BuildContext *context, BCode *bcode, size_t insIndex, LinkMeta linkMeta, size_t *indexInList, bool *wasDuplicate)

Add a future link to an instruction in bytecode.

Parameters
  • context: The context needed to store the link.

  • bcode: The bytecode containing the instruction to link.

  • insIndex: The index of the instruction to edit when linking is taking place.

  • linkMeta: The link metadata.

  • indexInList: Stores in the reference the index of the new metadata in the list.

  • wasDuplicate: Sets the reference to true if a matching linkMeta was already found in the array, false otherwise.

This function puts information into context to say which instruction in bcode (index insIndex) needs to point to some link linkMeta. If linkMeta is a duplicate of a previously linked linkMeta, then the instruction points to the original version, and the new version is ignored.

indexInList is replaced with the index of linkMeta in the list of LinkMeta in context, and wasDuplicate is replaced with a boolean representing if linkMeta was already found in context.

char *d_allocate_from_data_section(BuildContext *context, size_t size, size_t *index)

Allocate a number of bytes from the data section for some data. That data could be a string literal, variable, etc.

Return

A pointer to the start of the new allocation. Returns NULL if size is 0.

Parameters
  • context: The context that contains the built-up data section.

  • size: The size of the allocation in bytes.

  • index: Overwrites with the index of the start of the allocation from the start of the data section.

Allocates a chunk of memory for the data section in context. size is the size of the allocation in bytes, and index is replaced with the index of the start of the allocation, which is useful when creating links. The function returns a pointer to the start of the allocation, rather than the index. This makes copying data into the allocation easy.

size_t d_allocate_string_literal_in_data(BuildContext *context, BCode *linkCode, size_t insIndex, char *stringLiteral)

Given a string literal, allocate memory from the data section to store the literal.

If there is a duplicate string literal found, the links to string literal passed are pointed to the literal already stored in the data section. NOTE: If the string was a duplicate, it is freed!

Return

The index of the data section where the string literal starts.

Parameters
  • context: The context containing the built-up data section.

  • linkCode: The bytecode to link to the string literal in the data section.

  • insIndex: The index of the LOADUI instruction to replace when linking is taking place.

  • stringLiteral: The string literal to place in the data section.

Allocated memory for the data section of context in order to fit the length of stringLiteral. This function automatically links an instruction from linkCode (index insIndex) to the string literal’s new location.

void d_allocate_variable(BuildContext *context, SheetVariable *variable, size_t size, size_t indexInLinkMeta)

Allocate memory from the data section to store a variable.

Parameters
  • context: The context containing the built-up data section.

  • variable: The variable data.

  • size: How many bytes to allocate for the variable.

  • indexInLinkMeta: The index of the variable’s LinkMeta entry in the build context.

Allocates memory for the data section of context, in order to fit the content of a variable variable. You can specify a size, which is usually sizeof(dint), all you need to provide is the index of the variable’s entry in the LinkMeta list in context.

Generation Functions

BCode d_push_literal(BuildContext *context, NodeSocket socket, bool cvtFloat)

Generate bytecode to push a literal onto the stack.

Return

Bytecode to push the socket’s literal onto the stack.

Parameters
  • context: The context needed to build the bytecode.

  • socket: The socket of the literal to push onto the stack.

  • cvtFloat: Converts the literal to a float if possible.

This function takes a literal value and pushes it onto the top of the stack.

BCode d_push_variable(BuildContext *context, size_t nodeIndex)

Given a node that is the getter of a variable, generate bytecode to push the value of the variable onto the stack.

Return

Bytecode to push the variable’s value onto the stack.

Parameters
  • context: The context needed to build the bytecode.

  • nodeIndex: The index of the node that is the getter of a variable.

This function takes a variable’s value and pushes it onto the top of the stack.

BCode d_push_input(BuildContext *context, NodeSocket socket, bool forceFloat)

Given an input socket, generate bytecode to push the value of the input to the top of the stack.

Return

Bytecode to push the input’s value onto the stack.

Parameters
  • context: The context needed to build the bytecode.

  • socket: The input socket to get the value for.

  • forceFloat: Force integers to be converted to floats.

This function takes an input socket and pushes it’s value onto the top of the stack. However, that is not as simple as it sounds, as it may generate the entire bytecode nessesary to get the value.

BCode d_push_node_inputs(BuildContext *context, size_t nodeIndex, bool order, bool ignoreLiterals, bool forceFloat)

Given a node, generate bytecode to push the values of the inputs to the top of the stack.

Return

Bytecode to push all input’s values onto the stack.

Parameters
  • context: The context needed to generate the bytecode.

  • nodeIndex: The index of the node whose input sockets to generate bytecode for.

  • order: If true, the inputs are pushed in order, such that the last input is at the top. If false, the inputs are pushed in reverse order, such that the first input is at the top.

  • ignoreLiterals: Do not generate bytecode for non-float literal inputs.

  • forceFloat: Force integers to be converted to floats.

This function takes the variable inputs of a node, and pushes them onto the top of the stack in an order of your choosing.

BCode d_generate_operator(BuildContext *context, size_t nodeIndex, DIns opcode, DIns fopcode, DIns fiopcode, bool forceFloat)

Given an operator node, generate the bytecode for it.

Return

Bytecode to get the output of an operator.

Parameters
  • context: The context needed to generate the bytecode.

  • nodeIndex: The index of the operator node to get the result for.

  • opcode: The operator instruction.

  • fopcode: The float variant of the instruction.

  • fiopcode: The full immediate variant of the instruction.

  • forceFloat: Should the output always be a float?

This function generates the bytecode for a generic operator (e.g. Add).

BCode d_generate_comparator(BuildContext *context, size_t nodeIndex, DIns opcode, DIns fopcode, fimmediate_t strCmpArg, bool notAfter)

Given a comparator node, generate the bytecode for it.

Return

Bytecode to get the output of a comparator.

Parameters
  • context: The context needed to generate the bytecode.

  • nodeIndex: The index of the comparator node to get the result for.

  • opcode: The comparator instruction.

  • fopcode: The float variant of the instruction.

  • strCmpArg: The SYS_STRCMP argument to use to compare strings.

  • notAfter: Do we invert the answer at the end?

This function generates the bytecode for a generic comparator (e.g. Equal).

BCode d_generate_call(BuildContext *context, size_t nodeIndex)

Given a node that calls a function or subroutine, generate the bytecode to call it.

Return

Bytecode to call the function or subroutine.

Parameters
  • context: The context needed to generate the bytecode.

  • nodeIndex: The index of the node to generate the bytecode for.

This function generates the bytecode to call a function or subroutine. It abides to the calling convention and pushes the arguments in the correct order.

BCode d_push_argument(BuildContext *context, NodeSocket socket)

Given an output socket that is a function/subroutine argument, generate bytecode to push the value of the argument to the top of the stack.

Return

Bytecode to push the argument.

Parameters
  • context: The context needed to generate the bytecode.

  • socket: The output socket representing the function argument.

If you are in a function or subroutine, and you want an argument on the top of the stack, this is the function for that.

BCode d_generate_return(BuildContext *context, size_t returnNodeIndex)

Given a Return node, generate the bytecode to return from the function/subroutine with the return values.

Return

Bytecode to return from the function/subroutine.

Parameters
  • context: The context needed to generate the bytecode.

  • returnNodeIndex: The Return node to return with.

This function generates the bytecode to return from the function or subroutine. It abides to the calling convention and pushes the return values in reverse order.

BCode d_generate_nonexecution_node(BuildContext *context, size_t nodeIndex)

Given a non-execution node, generate the bytecode to get the output.

Return

Bytecode to run the nonexecution node’s function.

Parameters
  • context: The context needed to generate the bytecode.

  • nodeIndex: The index of the non-execution node.

This function generates the bytecode for a generic non-execution node.

BCode d_generate_execution_node(BuildContext *context, size_t nodeIndex, bool retAtEnd)

Given an execution node, generate the bytecode to get the output.

Return

Bytecode to run the execution node’s subroutine.

Parameters
  • context: The context needed to generate the bytecode.

  • nodeIndex: The execution node.

  • retAtEnd: Should the bytecode return at the end?

This function generates the bytecode for a generic execution node.

BCode d_generate_start(BuildContext *context, size_t startNodeIndex)

Given a Start node, generate the bytecode for the sequence starting from this node.

Return

The bytecode generated for the Start function.

Parameters
  • context: The context needed to generate the bytecode.

  • startNodeIndex: The Start node index.

This function generates the bytecode for the main Start function.

BCode d_generate_function(BuildContext *context, SheetFunction func)

Given a function, generate the bytecode for it.

Return

The bytecode generated for the function.

Parameters
  • context: The context needed to generate the bytecode.

  • func: The function to generate the bytecode for.

This function generates the bytecode for a function or subroutine defined in the sheet.

Conclusion

In order to do all of the above, all you need is the function:

void d_codegen_compile(Sheet *sheet, bool debug)

Given that Semantic Analysis has taken place, generate the bytecode for a given sheet.

Parameters
  • sheet: The sheet to generate the bytecode for.

  • debug: If true, we include debug information as well.

Which takes a sheet, generates the bytecode, and inserts it into the sheet once it’s done.

Output

By the end of compilation, if everything succeeds, then you should have a Sheet with the contents of an object file. An object file is made up of multiple sections that contain different data.

The functionality to save, load and dump Decision object files can be found in dasm.h.

Object Sections

  • .text: A list of instructions for The Virtual Machine to execute.

  • .main: An integer which says which instruction in .text should be the one to execute first if this sheet is where we start executing. It essentially points to the compiled Start.

  • .data: An allocated section of memory with items like variables and string literals contained inside.

  • .lmeta: Essentially a list of ListMeta, which contains data on what and where something is.

  • .link: A table of what instructions point to which index in the .lmeta section.

  • .func: Essentially a list of SheetFunction.

  • .var: Essentially a list of SheetVariable.

  • .incl: A list of paths to sheets that this sheet includes.