Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean #40

Merged
merged 15 commits into from
Dec 24, 2024
2 changes: 1 addition & 1 deletion benchmarks/main.c
Original file line number Diff line number Diff line change
Expand Up @@ -123,7 +123,7 @@ void main(int argc, char* argv[]) {
endTimer(i, 0);
startTimer();

struct LexerResult result = runLexer(buff, size);
LEXER_RESULT result = runLexer(buff, size);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Memory leak detected in lexer implementation
The code review has revealed two memory leaks:

  1. result.tokens is allocated with malloc in runLexer but never freed after use
  2. For each token, a buff is allocated with malloc but never freed

Specific locations:

  • src/lexer/lexer.c: result.tokens = malloc(sizeof(TOKEN) * 1024)
  • src/lexer/lexer.c: char* buff = malloc(buffLen) inside the loop
  • benchmarks/main.c:126: The result from runLexer is used but its memory is not freed
  • tests/lexer.c: The test also doesn't free the allocated memory
🔗 Analysis chain

Confirm memory deallocation.
After using LEXER_RESULT result, confirm that any dynamically allocated fields within result are freed, if relevant. This ensures we avoid memory leaks in multiple runs.


Let me gather more information about the LEXER_RESULT structure and its memory management.


Let me try a different approach to find the structure and memory management patterns.


Let me check the lexer implementation files to understand the memory management better.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Searching for usage of result tokens disposal or free calls for the LEXER_RESULT structure
rg -A 5 $'free\\(result\\.tokens\\)' benchmarks/ src/ tests/

Length of output: 58


Script:

#!/bin/bash
# First, let's find the definition of LEXER_RESULT structure
ast-grep --pattern 'typedef struct $_$ LEXER_RESULT;'

# Also search for the runLexer function implementation
ast-grep --pattern 'LEXER_RESULT runLexer($$$)'

# Let's also look for any cleanup or free functions related to lexer results
rg -i "free.*lexer|lexer.*free" -A 5

Length of output: 144


Script:

#!/bin/bash
# Look for struct definitions that might be related to lexer results
rg -A 10 "struct.*[Ll]exer.*{|typedef.*[Ll]exer"

# Search for any memory management patterns in the lexer implementation
rg -A 5 "free\(|malloc\(" src/lexer

# Look for the implementation file containing runLexer
fd -e c -e h "lexer"

Length of output: 735


Script:

#!/bin/bash
# Check the content of lexer files
cat src/lexer/lexer.h
cat src/lexer/lexer.c

# Also check the test file for any cleanup patterns
cat tests/lexer.c

Length of output: 3969


free(buff);

Expand Down
7 changes: 7 additions & 0 deletions examples/assembly.qf
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
func main() {
myAssemblyFunction()
}

asmf myAssemblyFunction() {
0xB8 0x4C 0x00 0x00 0x00 0xC3
}
6 changes: 0 additions & 6 deletions examples/helloWorld.qf

This file was deleted.

12 changes: 3 additions & 9 deletions src/cli/main.c
Original file line number Diff line number Diff line change
Expand Up @@ -25,10 +25,6 @@
// Version
#define VERSION "0.1.0"

uint8_t code_section[6] = {
0xB8, 0x4C, 0x00, 0x00, 0x00, 0xC3
};

void showCommandEntry(char* commandName, char* description, int argumentCount, char* argumentNames[], char* argumentDescriptions[]) {
printf("\n > %s\n\n %s%sDescription%s: %s\n", commandName, STYLE_BOLD, STYLE_UNDERLINE, RESET, description);

Expand Down Expand Up @@ -117,8 +113,8 @@ int main(int argc, char* argv[]) {
buff[size] = '\0';
fclose(fptr);

struct LexerResult result = runLexer(buff, size);
struct ASTNode* root = parseNodes(result, 0, AST_ROOT);
LEXER_RESULT result = runLexer(buff, size);
AST_NODE* root = parseNodes(result, 0, AST_ROOT);
Comment on lines +116 to +117
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Potential null pointer and memory safety issues detected

The code has several safety issues that need to be addressed:

  1. runLexer allocates memory for tokens but doesn't check if the allocation succeeded:

    result.tokens = malloc(sizeof(TOKEN) * 1024);  // No NULL check
  2. parseNodes is called without validating if result.tokens is valid, which could lead to a null pointer dereference.

  3. The fixed size allocation of 1024 tokens could be insufficient for larger inputs, potentially leading to buffer overflows.

Recommendations:

  • Add a NULL check after the malloc in runLexer and return an error state if allocation fails
  • Validate result.tokens before calling parseNodes
  • Consider using dynamic allocation or at least add size validation
🔗 Analysis chain

Check potential null pointer usage.
If runLexer(buff, size) fails or returns an invalid structure, we must ensure parseNodes can handle it gracefully. Consider validating the result before proceeding with parseNodes.

You can confirm correct handling by searching for additional checks of result.tokens:


Let me gather more information about the LEXER_RESULT structure and error handling in runLexer.


Let me check how parseNodes handles the tokens array and any potential error cases.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Searching for any checks on the result structure or parseNodes usage 
# to ensure safety when runLexer fails or returns invalid data.
rg -A 5 $'parseNodes' 

Length of output: 1983


Script:

#!/bin/bash
# Check the LEXER_RESULT structure definition and runLexer implementation
ast-grep --pattern 'typedef struct $_ {
  $$$
} LEXER_RESULT;'

rg -A 5 'runLexer\(' 

Length of output: 1780


Script:

#!/bin/bash
# Check parseNodes implementation and token access
rg -A 10 'for\(.+; index <= result\.size;' 

# Check if there's any error handling in lexer.c
rg -A 5 'malloc.*TOKEN.*1024' src/lexer/lexer.c

Length of output: 186


IR_CTX* ctx = makeContext(root);

Expand All @@ -129,9 +125,7 @@ int main(int argc, char* argv[]) {

fptr = fopen(outputFile, "w");
compile(ctx, fptr);

//compilePE(fptr, code_section, sizeof(code_section));


break;
case 'v':
if(strlen(argv[1]) > 1 && strcmp(argv[1], "version") != 0) {
Expand Down
2 changes: 1 addition & 1 deletion src/compiler/compiler.c
Original file line number Diff line number Diff line change
Expand Up @@ -135,7 +135,7 @@ IR_CTX* makeContext(AST_NODE* tree) {
/**
* Compiles the Context tree to an executable named the provided file name.
* @param ctx the IR context.
* @param char the output file name.
* @param out the output file.
*/
void compile(IR_CTX* ctx, FILE* out) {

Expand Down
26 changes: 13 additions & 13 deletions src/compiler/ir.h
Original file line number Diff line number Diff line change
Expand Up @@ -9,19 +9,20 @@
#ifndef IR_H
#define IR_H

enum IRNodeType {
typedef enum {
IR_FUNCTION,

IR_ASM_FUNCTION,

IR_VARIABLE,
IR_FUNCTION_ARGUMENT,
IR_FUNCTION_BODY_VARIABLE
};
} IR_TYPE;

typedef enum IRNodeType IR_TYPE;

struct IRNode {
/**
* An IR Node.
*/
typedef struct IR_NODE {

IR_TYPE nodeType;

Expand All @@ -35,27 +36,26 @@ struct IRNode {
int valueSize;

// Function Properties
struct IRNode** variables;
struct IR_NODE** variables;
int variableIndex;

struct Hashmap* variableMap;

AST_NODE* tree;
};

typedef struct IRNode IR_NODE;
} IR_NODE;

struct IRContext {
/**
* The overall IR context.
*/
typedef struct {
IR_NODE** nodes;
int nodeIndex;

struct Hashmap* nodeMap;

IR_NODE* mainFunc;

};

typedef struct IRContext IR_CTX;
} IR_CTX;

/**
* Creates an IR node based on the type and the name given.
Expand Down
2 changes: 0 additions & 2 deletions src/compiler/pe/format.h
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,6 @@
* Quickfall PE Format Defintion.
*/

#include "../units.h"

#ifndef PE_FORMAT
#define PE_FORMAT

Expand Down
11 changes: 0 additions & 11 deletions src/compiler/units.h

This file was deleted.

93 changes: 0 additions & 93 deletions src/compiler/utils.c

This file was deleted.

8 changes: 4 additions & 4 deletions src/lexer/lexer.c
Original file line number Diff line number Diff line change
Expand Up @@ -15,16 +15,16 @@
/**
* Sets the token type of the currently selected token in the LexerResult with the provided token type.
*/
void pushToken(struct LexerResult* result, enum TokenType type) {
void pushToken(LEXER_RESULT* result, TOKEN_TYPE type) {
result->tokens[result->size].type = type;
result->size++;
}

struct LexerResult runLexer(char* string, int size) {
struct LexerResult result;
LEXER_RESULT runLexer(char* string, int size) {
LEXER_RESULT result;
result.size = 0;

result.tokens = malloc(sizeof(struct Token) * 1024);
result.tokens = malloc(sizeof(TOKEN) * 1024);

char c;

Expand Down
10 changes: 5 additions & 5 deletions src/lexer/lexer.h
Original file line number Diff line number Diff line change
Expand Up @@ -7,22 +7,22 @@
/**
* Contains the results of lexical analysis
*/
struct LexerResult {
typedef struct {
int size;
struct Token* tokens;
};
TOKEN* tokens;
} LEXER_RESULT;

/**
* Performs lexical analysis on an input string
* Returns a LexerResult containing the tokens
*/
struct LexerResult runLexer(char* input, int size);
LEXER_RESULT runLexer(char* input, int size);

/**
* Adds a token to the LexerResult
* @param result The LexerResult to add the token to
* @param type The type of the token to add
*/
void pushToken(struct LexerResult* result, enum TokenType type);
void pushToken(LEXER_RESULT* result, TOKEN_TYPE type);

#endif
10 changes: 5 additions & 5 deletions src/lexer/tokens.h
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
#ifndef TOKENS_H
#define TOKENS_H

enum TokenType {
typedef enum {
ASM_FUNCTION,
FUNCTION,
RETURN,
Expand All @@ -23,15 +23,15 @@ enum TokenType {
USE,
NONE,
MATH_OP
};
} TOKEN_TYPE;

/**
* An lexer token generated by the Lexer.
*/
struct Token {
enum TokenType type;
typedef struct {
TOKEN_TYPE type;
char* value;
};
} TOKEN;
Comment on lines +31 to +34
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Memory management issues confirmed in token handling

The code review reveals several critical memory management issues:

  1. Memory leaks in token value handling:

    • malloc is used to allocate buffers for token values (char* buff = malloc(buffLen)) but these are never freed
    • The token array itself (result.tokens = malloc(sizeof(TOKEN) * 1024)) is never freed
    • No cleanup function exists for LEXER_RESULT or TOKEN structures
  2. Buffer overflow risks:

    • Fixed buffer size of 32 bytes for token values without bounds checking
    • Hard-coded token array size of 1024 without overflow protection
  3. String handling issues:

    • String tokenization doesn't properly handle escape sequences
    • No NULL checks after malloc calls
    • Potential buffer overrun in string literal parsing loop

Note: The codebase needs a proper memory management strategy with:

  • A cleanup function to free token values and arrays
  • Dynamic buffer resizing or proper size checks
  • Safe string handling with bounds checking
  • Clear ownership semantics for token values
🔗 Analysis chain

Consider adding safeguards for string value management

The value field being a raw char* raises some concerns:

  1. No indication of string ownership
  2. No size field to prevent buffer overflows
  3. No distinction between NULL and empty string

Let's check how the value field is managed in the codebase:


Let's search for more specific patterns in the implementation to understand how tokens are handled.


Let's examine the lexer implementation to understand how tokens are created and managed.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for TOKEN value field allocations and deallocations
echo "Searching for value field memory management..."
rg -A 2 "token\.(value|->value)\s*=\s*" 
rg -A 2 "free\s*\(\s*token\.(value|->value)\s*\)"

Length of output: 196


Script:

#!/bin/bash
# Let's try a different approach to find TOKEN usage patterns
echo "Searching for TOKEN struct usage..."
rg -A 5 "TOKEN\s+\w+\s*=" 

echo -e "\nSearching for malloc/strdup with TOKEN..."
rg -A 3 "(malloc|strdup).*TOKEN"

echo -e "\nSearching for free with TOKEN..."
rg -B 2 -A 2 "free.*TOKEN"

echo -e "\nLooking for lexer implementation files..."
fd "lexer.*\.(c|h)$"

Length of output: 2440


Script:

#!/bin/bash
# Let's examine the lexer implementation
echo "Checking lexer.c content..."
cat src/lexer/lexer.c

echo -e "\nChecking lexer.h content..."
cat src/lexer/lexer.h

echo -e "\nLooking for token cleanup/destruction..."
rg -B 3 -A 3 "cleanup|destroy|free" src/lexer/

Length of output: 3845



#endif
2 changes: 1 addition & 1 deletion src/parser/ast.c
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
* Creates a new AST Node.
* @param type the AST type of the node.
*/
AST_NODE* createASTNode(enum ASTNodeType type) {
AST_NODE* createASTNode(AST_NODE_TYPE type) {
AST_NODE* node = malloc(sizeof(AST_NODE));

node->valueSize = 0;
Expand Down
24 changes: 11 additions & 13 deletions src/parser/ast.h
Original file line number Diff line number Diff line change
Expand Up @@ -2,13 +2,13 @@
* The header file of AST nodes in Quickfall.
*/

#ifndef AST_2_H
#define AST_2_H
#ifndef AST_H
#define AST_H

/**
* The type of AST Node(s).
*/
enum ASTNodeType {
typedef enum {

AST_ROOT, // A root of an AST tree, can either represent the first node of the tree or a function body.

Expand All @@ -33,31 +33,29 @@ enum ASTNodeType {
AST_MATH_OP_HEADER,

AST_PARAMETER // A function parameter AST Node, used in function declaration.
};
} AST_NODE_TYPE;

/**
* An AST Node. Has a tree-ish structure.
*/
struct ASTNode {
typedef struct AST_NODE {

struct ASTNode* left;
struct ASTNode* right;
struct ASTNode* next;
struct AST_NODE* left;
struct AST_NODE* right;
struct AST_NODE* next;

enum ASTNodeType type;
AST_NODE_TYPE type;

int valueSize;
char* value;
int endingIndex; // The index which the parsing ended

};

typedef struct ASTNode AST_NODE;
} AST_NODE;

/**
* Creates a new AST Node.
* @param type the AST type of the node.
*/
AST_NODE* createASTNode(enum ASTNodeType type);
AST_NODE* createASTNode(AST_NODE_TYPE type);

#endif
Loading
Loading