Writing a simple Compiler on my own - Generating Code for Assignments (part 1)
[Custom Thumbnail]
All the Code of the series can be found at the Github repository:
https://github.com/drifter1/compiler
Introduction
Hello it's a me again @drifter1! Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison. In this article we will generate code for variable and pointer assignment statements.Requirements:
Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:- What Compiler Design is (mainly the steps)
- For which exact Language the Compiler is build for (Tokens and Grammar)
- How to use Flex and Bison
- How to implement a lexer and parser for the language using those tools
- What the Symbol Table is and how we implement it
- How we combine Flex and Bison together
- How we can pass information from the Lexer to the Parser
- How we define operator priorities, precedencies and associativity
- What Semantic Analysis is (Attributes, SDT etc.)
- How we do the so called "Scope Resolution"
- How we declare types and check the type inter-compatibility for different cases that include function parameters and assignments
- How we check function calls later on using a "Revisit queue", as function declarations mostly happen after functions get used (in our Language)
- Intermediate Code Representations, including AST's
- How we implement an AST (structure and management)
- Action Rules for AST nodes and more
- The target system's architecture and its assembly language (MIPS)
- Register Allocation & Assignment and how we implement it
- How we generate Assembly code for various cases
Difficulty:
Talking about the series in general this series can be rated:- Intermediate to Advanced
- Medium
Actual Tutorial Content
Assignment Statements
Assignment Statements can be summarized by the following diagram:As you can see variables (and arrays) are assigned to expressions. Constant values are a special type of expression that have to be generated differently. In generic expressions we basically just have to move the register in which the result (assign_val) is stored to the register in which the assign-entry is stored in. For constants we use "LI" or "LI.D (GP or FP) to directly assign that value to the corresponding assignment-register. Of course we will also have to manage type-conversion between integers and doubles.
In the case of arrays we don't store the index right now, and so we will manage those in the next part to not make things complicated today!
Talking about pointers, we either update the pointing location or the value the pointer is pointing to. We don't store the "*" before the pointer ID to know if the address or the value is updated (which is something that we will take care of in the next part), but we can still check if the right-part is a reference or not, to update the pointer's location.
generate_assignment() Function
The function that will be used for assignment generation is based on a switch-case of the symbol table entry's "st_type" variable.We have the following cases:
switch (node->entry->st_type){
case INT_TYPE:
case CHAR_TYPE:
...
break;
case REAL_TYPE:
...
break;
case POINTER_TYPE:
...
break;
case ARRAY_TYPE:
...
break;
default:
fprintf(stderr, "Error in case selection!\n");
exit(1);
}
Variable Assignment
In simple variable-assignments, we basically have:- Integer-to-Integer: MOVE instruction
- Constant-to-Integer: LI instruction
- Double-to-Integer: MOVE after conversion using CVT.W.D and MFC1
- Double-to-Double: MOV.D instruction
- Constant-to-Double: LI.D instruction
- Integer-to-Double: MOV.D after conversion using MTC1 and CVT.D.W
In Code this looks like this:
switch (node->entry->st_type){
case INT_TYPE:
case CHAR_TYPE:
if(expression_data_type(node->assign_val) == INT_TYPE){
/* constant */
if(node->assign_val->type == CONST_NODE){
temp_const = (AST_Node_Const *) node->assign_val;
fprintf(fp, "LI %s, %d\n",
GetRegisterName(node->entry->g_index, 0), temp_const->val);
}
/* variable */
else{
fprintf(fp, "MOVE %s, %s\n",
GetRegisterName(node->entry->g_index, 0),
GetRegisterName(getGraphIndex(node->assign_val), 0));
}
}
else{
if(node->assign_val->type == CONST_NODE){
temp_const = (AST_Node_Const *) node->assign_val;
fprintf(fp, "LI %s, %d\n",
GetRegisterName(node->entry->g_index, 0), temp_const->val);
}
/* variable */
else{
fprintf(fp, "CVT.W.D %s, %s\n",
GetRegisterName(getGraphIndex(node->assign_val), 1),
GetRegisterName(getGraphIndex(node->assign_val), 1));
fprintf(fp, "MFC1 %s, %s\n",
GetRegisterName(getGraphIndex(node->assign_val), 0),
GetRegisterName(getGraphIndex(node->assign_val), 1));
fprintf(fp, "MOVE %s, %s\n",
GetRegisterName(node->entry->g_index, 0),
GetRegisterName(getGraphIndex(node->assign_val), 0));
}
}
break;
case REAL_TYPE:
if(expression_data_type(node->assign_val) == REAL_TYPE){
/* constant */
if(node->assign_val->type == CONST_NODE){
temp_const = (AST_Node_Const *) node->assign_val;
fprintf(fp, "LI.D %s, %.2f\n",
GetRegisterName(node->entry->g_index, 1), temp_const->val);
}
/* variable*/
else{
fprintf(fp, "MOV.D %s, %s\n",
GetRegisterName(node->entry->g_index, 1),
GetRegisterName(getGraphIndex(node->assign_val), 1));
}
}
else{
/* constant */
if(node->assign_val->type == CONST_NODE){
temp_const = (AST_Node_Const *) node->assign_val;
fprintf(fp, "LI.D %s, %d.0\n",
GetRegisterName(node->entry->g_index, 1), temp_const->val);
}
/* variable*/
else{
fprintf(fp, "MTC1 %s, %s\n",
GetRegisterName(getGraphIndex(node->assign_val), 0),
GetRegisterName(getGraphIndex(node->assign_val), 1));
fprintf(fp, "CVT.D.W %s, %s\n",
GetRegisterName(getGraphIndex(node->assign_val), 1),
GetRegisterName(getGraphIndex(node->assign_val), 1));
fprintf(fp, "MOVE %s, %s\n",
GetRegisterName(node->entry->g_index, 1),
GetRegisterName(getGraphIndex(node->assign_val), 1));
}
}
break;
...
Pointer Assignment
For now, in the case of pointers we can check if the "assign_val" is a reference which tells us that we are updating the pointer's pointing address. Something like "p = p + 1" can't be managed right now, as the value "p+1" would be stored in the address pointer 'p' points to. Either way, right now we can write the following Code to check for References:case POINTER_TYPE:
if(node->assign_val->type == REF_NODE){
AST_Node_Ref *temp_ref = (AST_Node_Ref *) node->assign_val;
/* not a reference */
if(temp_ref->ref == 0){
/* integer */
if(expression_data_type(node->assign_val) == INT_TYPE){
fprintf(fp, "SW %s, 0(%s)\n",
GetRegisterName(getGraphIndex(node->assign_val), 0),
GetRegisterName(node->entry->g_index, 0));
}
/* floating point */
else{
fprintf(fp, "S.D %s, 0(%s)\n",
GetRegisterName(getGraphIndex(node->assign_val), 1),
GetRegisterName(node->entry->g_index, 0));
}
}
/* reference */
else{
fprintf(fp, "SW %s, %s($0)\n",
GetRegisterName(getGraphIndex(node->assign_val), 0),
node->entry->st_name);
fprintf(fp, "MOVE %s, %s\n",
GetRegisterName(node->entry->g_index, 0),
GetRegisterName(getGraphIndex(node->assign_val), 0));
}
}
else{
/* integer */
if(expression_data_type(node->assign_val) == INT_TYPE){
fprintf(fp, "SW %s, 0(%s)\n",
GetRegisterName(getGraphIndex(node->assign_val), 0),
GetRegisterName(node->entry->g_index, 0));
}
/* floating point */
else{
fprintf(fp, "S.D %s, 0(%s)\n",
GetRegisterName(getGraphIndex(node->assign_val), 1),
GetRegisterName(node->entry->g_index, 0));
}
}
break;
...
For the case of Array Assignments I will for now just print out "Array Assignment".
Compiler Output
Lastly, let's run our compiler for the three examples that we use through-out the series...Simple Example 1
Running for example.c we get the following output file:.data
# variables
i: .word 0
val: .double 0.000000
res: .double 0.000000
# messages
.text
main:
LI.D $f2, 2.50
LI.D $f28, 1.0
ADD.D $f6, $f2 $f28
MOV.D $f4, $f6
By also looking at the console output:
we can see that "res" gets the value "val + 1" or "2.5 + 1" or 3.5 as it should!
Simple Example 2
Running for example.c we get the following output file:.data
# variables
i: .word 0
val: .double 2.500000
res: .space 80
# messages
.text
main:
LI $s0, 0
LW $s0, i($0)
BGE $s0, 10, Temp_Label
L.D $f2, val
LW $s0, i($0)
LI.D $f6, 0.0
Array Assignment
LW $s2, res($0)
MOV.D $f2, $f4
LW $s2, res($0)
ADDI $s0, $s0, 1
Here we can see that an array assignment is missing!
Full Example
Running for full_example.c we get the following output file:.data
# variables
c: .byte 'c'
i: .word 0
p: .word 0
val: .double 2.500000
res: .double 0.500000, 1.500000, 2.500000, 3.500000, 4.500000, 5.500000
# messages
msg1: .asciiz "\n"
msg2: .asciiz "\n"
msg3: .asciiz "iteration: 3\n"
msg4: .asciiz " "
msg5: .asciiz "\n"
.text
main:
LA $s3, res($0)
SW $s3, p($0)
MOVE $s4, $s3
LI $s0, 0
LW $s0, i($0)
BGE $s0, 10, Temp_Label
LW $s0, i($0)
BGT $s0, 5, Temp_Label
J Temp_Label
LW $s0, i($0)
BNE $s0, 5, Temp_Label
LW $s0, i($0)
MULI $s5, $s0, 2
MOVE $s0, $s5
LI $s6, 0
MTC1 $s6, $f12
CVT.D.W $f12, $f12
MOVE $f4, $f12
L.D $f4, val
LW $s0, i($0)
LI.D $f14, 0.0
S.D $f14, 0($s4)
LW $s3, res($0)
J Temp_Label
L.D $f4, val
LW $s0, i($0)
LI.D $f16, 0.0
S.D $f16, 0($s4)
LW $s3, res($0)
MOV.D $f4, $f6
LW $s3, res($0)
LW $s4, p($0)
ADDI $t1, $s4, 1
SW $t1, 0($s4)
LW $s0, i($0)
BNE $s0, 2, Temp_Label
L.D $f4, val
LI.D $f28, 4.50
C.EQ.D $f4 $f28
BC1F Temp_Label
AND $s0, $s0, $s0
ADDI $s0, $s0, 1
LW $s0, i($0)
BGE $s0, 12, Temp_Label
LW $s0, i($0)
LW $s1, c($0)
ADDI $s0, $s0, 1
The only problem that we have right now is in the "bold" code. As you can see we update the pointer's location storing "p+1" in $t1, but that value gets then stored in the address that p is pointing to, not updating p itself as it should! The rest is OK.
RESOURCES
References:
No references, just using code that I implemented in my previous articles.Images:
All of the images are custom-made!Previous parts of the series
General Knowledge and Lexical Analysis
- Introduction
- A simple C Language
- Lexical Analysis using Flex
- Symbol Table (basic structure)
- Using Symbol Table in the Lexer
Syntax Analysis
- Syntax Analysis Theory
- Bison basics
- Creating a grammar for our Language
- Combine Flex and Bison
- Passing information from Lexer to Parser
- Finishing Off The Grammar/Parser: [part 1] [part 2]
Semantic Analysis (1)
- Semantic Analysis Theory
- Semantics Examples
- Scope Resolution using the Symbol Table
- Type Declaration and Checking
- Function Semantics: [part 1] [part 2]
Intermediate Code Generation (AST)
- Abstract Syntax Tree Principle
- Abstract Syntax Tree Structure
- Abstract Syntax Tree Management
- Action Rules for Declarations and Initializations
- Action Rules for Expressions
- Action Rules for Assignments and Simple Statements
- Action Rules for If-Else Statements
- Action Rules for Loop Statements and some Fixes
- Action Rules for Function Declarations: [part 1][part 2]
- Action Rules for Function Calls
Semantic Analysis (2)
- Datatype attribute for Expressions
- Type Checking for Assignments
- Revisit Queue and Parameter Checking: [part 1][part 2][part 3][part 4]
- Revisit Queue and Assignment Checking : [part 1][part 2][part 3]
Machine Code Generation
- Machine Code Generation Principles
- MIPS Instruction Set
- Simple Examples in MIPS Assembly
- full_example.c in MIPS Assembly: [part 1][part 2]
- Generating Code for Declarations and Initializations
- Generating Code for Array Initializations and String Messages
- Register Allocation & Assignment Theory
- Implementing Register Allocation: [part 1][part 2][part 3][part 4]
- Generating Code for Expressions: [part 1][part 2][part 3]
- Generating Code for Simple Statements
Final words | Next up on the project
And this is actually it for today's post! In the next article we will generate the code for array assignments.Next up on this series, in general, are:
- Machine Code generation (MIPS Assembly)
- Various Optimizations in the Compiler's Code
Which are all topics for which we will need more than one article to complete them. Also, note that we might also get into AST Code Optimizations later on, or that we could even extend the Language by adding complex datatypes (structs and unions), more rules etc.
So, see ya next time!
GitHub Account:
https://github.com/drifter1Keep on drifting! ;)
Thank you for your contribution @drifter1.
We have been analyzing your tutorial and we suggest the following points:
Again an excellent job in developing the tutorial.
Nice work on the explanations of your code, although adding a bit more comments to the code can be helpful as well.
Thanks for following our suggestions!
Looking forward to your upcoming tutorials.
Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.
To view those questions and the relevant answers related to your post, click here.
Need help? Chat with us on Discord.
[utopian-moderator]
Thank you for your review, @portugalcoin! Keep up the good work!
Congratulations @drifter1! You have completed the following achievement on the Steem blockchain and have been rewarded with new badge(s) :
You can view your badges on your Steem Board and compare to others on the Steem Ranking
If you no longer want to receive notifications, reply to this comment with the word
STOP
Hey, @drifter1!
Thanks for contributing on Utopian.
We’re already looking forward to your next contribution!
Get higher incentives and support Utopian.io!
Simply set @utopian.pay as a 5% (or higher) payout beneficiary on your contribution post (via SteemPlus or Steeditor).
Want to chat? Join us on Discord https://discord.gg/h52nFrV.
Vote for Utopian Witness!
Hi @drifter1!
Your post was upvoted by @steem-ua, new Steem dApp, using UserAuthority for algorithmic post curation!
Your UA account score is currently 3.496 which ranks you at #6929 across all Steem accounts.
Your rank has dropped 3 places in the last three days (old rank 6926).
In our last Algorithmic Curation Round, consisting of 198 contributions, your post is ranked at #134.
Evaluation of your UA score:
Feel free to join our @steem-ua Discord server
#utopian-io has been a gift. We would like to keep its curation-efforts alive here on 'marlians.com'. We want to curate and reward 'contributions to open-source projects' with MARLIANS on the the marlians.com tribe, a SCOT-enabled steem condenser. Contributions can include suggestions, graphics, bug-finds, code etc. You can simply add in #marlians to your #utopian-io posts and it will appear on https://www.marlians.com/created/utopian enabling you to earn some MARLIANS along with steem/sbd. You can also post directly to steem via 'marlians.com'. We have some overseers who curate and who can render you help too. You can find them enlisted on https://www.marlians.com/created/utopian