Lexical Analysis¶
Lexical Analysis is performed in dlex.c and dlex.h.
The aim of Lexical Analysis is to change the way the source code is represented in memory, such that the later compilation stages have an easier job understanding what they are reading. Lexical Analysis essentially abstracts the source code (by ignoring comments and whitespace) and categorises it such that we can identify what a particular section of the source code represents in one comparison (Is it an integer? Is it the name of something? Is it a plane?).
Structures¶
LexType
An enum representing the type of data a token represents. The values range from the very basic (TK_COMMA) to those representing a longer run of characters (TK_INTEGERLITERAL).

LexData
A union of the most basic data types, which are strings (char*), integers (dint), floats (dfloat), and booleans (bool). This union is used by a token to store important information it needs to save for later, for example, the name of a node, or the value of a literal.

Note
dint and dfloat are defined in dcfg.h and represent different sizes depending on whether the pre-processor macro DECISION_32 is defined.

LexToken
A struct that contains LexType and LexData elements.

LexStream
A list of LexToken.
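To make the relationships concrete, the sketch below shows roughly what these declarations could look like. The member names, the exact set of LexType values, and the typedef choices for dint and dfloat are assumptions made for illustration, not copies of dlex.h or dcfg.h:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Rough sketch only - value names other than the two mentioned above
   (TK_COMMA, TK_INTEGERLITERAL) are made up. */
typedef enum {
    TK_COMMA,
    TK_INTEGERLITERAL,
    /* ... one value per kind of token ... */
} LexType;

/* dint and dfloat actually come from dcfg.h; here we assume 32-bit types,
   as if DECISION_32 were defined. */
typedef int32_t dint;
typedef float dfloat;

/* One member per basic data type a token may need to keep. */
typedef union {
    char *stringValue;
    dint integerValue;
    dfloat floatValue;
    bool booleanValue;
} LexData;

/* A single token: what it is, plus any data it needs to save for later. */
typedef struct {
    LexType type;
    LexData data;
} LexToken;

/* A dynamic list of tokens. */
typedef struct {
    LexToken *tokenArray;
    size_t numTokens;
} LexStream;

The real headers may use different names and widths; the point is only that a token pairs a LexType with a LexData, and a LexStream owns an array of those pairs.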
Functions¶
Most of the functions declared in dlex.h are there to help identify characters, or to help group characters together into, say, a string literal.
These functions are ultimately used by:
- LexStream d_lex_create_stream(const char *source, const char *filePath)
  Create a stream of lexical tokens from some source text that you can extract tokens from.
  NOTE: The source code needs to have a newline (\n) as its last non-NULL character!
  - Return
    A new malloc’d LexStream. If there was an error, numTokens = 0 and tokenArray = NULL.
  - Parameters
    source: The source text. Can be text read from a source file.
    filePath: In case we error, say what the file path was.
to create a LexStream from some source code source. It is essentially a giant for loop that iterates over the characters of the source string and builds up a LexToken each iteration to add to the LexStream.

filePath is used in producing compilation errors, if any should occur.
Warning
For safety reasons, the source string NEEDS a newline character (\n) as the last non-null character for the function to work without producing any errors.
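As a quick usage sketch (the source text and the file path here are made up for illustration; the numTokens and tokenArray members are the ones mentioned in the return description above):

#include <stdio.h>

#include "dlex.h"

int main(void) {
    /* Illustrative source text only - note the trailing newline, which
       d_lex_create_stream requires as the last non-NULL character. */
    const char *source = "Start~#1;\n";

    /* The file path is only used when reporting compilation errors. */
    LexStream stream = d_lex_create_stream(source, "example.dc");

    /* On error, the stream comes back with numTokens = 0 and
       tokenArray = NULL. */
    if (stream.numTokens == 0 && stream.tokenArray == NULL) {
        fprintf(stderr, "Lexical analysis failed.\n");
        return 1;
    }

    /* ... use the tokens here ... */

    /* The stream is malloc'd, so it has to be freed - see
       d_lex_free_stream below. */
    d_lex_free_stream(stream);

    return 0;
}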
Since LexStream
is a dynamic list, and this is C, once you are done with
the list, you will need to free it from memory with this function, because
hell on Earth actually exists:
- void d_lex_free_stream(LexStream stream)
  Frees the token array from a stream. The stream should not be used after this function is called on it.
  - Parameters
    stream: The stream to free from memory.
Note
d_lex_free_stream frees the list of tokens, but not anything that may have been allocated inside the tokens themselves, i.e. string names and literals inside of LexData. These will need to be freed later on.
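What that later clean-up might look like is sketched below; the member names (type, data, data.stringValue) and the string-carrying LexType values (TK_NAME, TK_STRINGLITERAL) are assumptions for illustration, not the actual definitions from dlex.h:

#include <stdlib.h>

#include "dlex.h"

/* Free any heap-allocated strings held inside the tokens, then free the
   token array itself. Member and enum names here are assumptions. */
void free_stream_fully(LexStream stream) {
    for (size_t i = 0; i < stream.numTokens; i++) {
        LexToken *token = &stream.tokenArray[i];

        /* Only string-typed tokens own malloc'd data. */
        if (token->type == TK_NAME || token->type == TK_STRINGLITERAL) {
            free(token->data.stringValue);
        }
    }

    d_lex_free_stream(stream);
}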
If you want to dump the contents of a LexStream into stdout, you can use the function:
- void d_lex_dump_stream(LexStream stream)
  Print the contents of a LexStream.
  - Parameters
    stream: The stream to print to the screen.
to do so. The output of this function for a section of source code that prints “Hello, world!” to the screen may look like this:
i type data
...
17 0 Start (00000127F23739A0)
18 11
19 12
20 6 1
21 15
22 0 Print (00000127F2376700)
23 17
24 12
25 6 1
26 13
27 8 Hello, world! (00000127F2376720)
28 20
29 15
...
Note
This table gets printed whenever you compile source code with a
VERBOSE_LEVEL
of 4 or higher.
The first column states the index of the LexToken in the LexStream, the second column states the LexType value (you can look up which type each value corresponds to in dlex.h), and the third column states the value in LexData (for specific types). Strings in the third column are also given alongside their pointer values, which can help with debugging.
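Putting the three functions together, a complete pass through the lexer might look like the sketch below; the source text and the file path are again made up for illustration:

#include <stdio.h>

#include "dlex.h"

int main(void) {
    /* Illustrative source text only - remember the trailing newline. */
    const char *source = "Start~#1;\n";

    LexStream stream = d_lex_create_stream(source, "hello.dc");

    if (stream.numTokens == 0 && stream.tokenArray == NULL) {
        fprintf(stderr, "Failed to lex the source.\n");
        return 1;
    }

    /* Prints the i / type / data table shown above to stdout. */
    d_lex_dump_stream(stream);

    d_lex_free_stream(stream);
    return 0;
}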