
Amila Senadheera

Tech enthusiast

Making WinZigC programs debuggable


Published on March 24, 2024

After compiling a WinZigC program into machine code, you can execute it and observe its standard output in the terminal, provided the program contains at least one output statement. Without output statements spread throughout the source program, however, it is hard to see what is happening during execution. This limitation is why debuggers exist for almost every programming language.

This post is a continuation of the post "The journey of implementing the WinZigC programming language". For more context, feel free to read that post first.

What does a Debugger expect to have in a compiled binary?

When you set a breakpoint in your editor/IDE, the debugger reaches that point in the machine code binary and pauses execution before proceeding to execute that line. To enable this functionality, the debugger requires certain information:

  • Type Information of Variables and Functions: The debugger requires type information for declared variables and functions. This information is necessary to display the function call stack and provide access to variables within the current scope.
  • Debugging Table: To facilitate mapping between machine instructions and source code locations, a debugging table is used. This table specifies the scope of each instruction and provides mappings to the corresponding lines and columns in the source program.

By providing this essential information, the debugger can effectively analyze program execution and provide developers with insights into the program's behavior and state.
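
To make the idea of a debugging table concrete, here is a minimal Python sketch of the two lookups a debugger performs on it. This is not how LLDB or DWARF actually store the table (the real DWARF line table is a compactly encoded program), and the addresses, the LineEntry type, and both function names are made up for illustration.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class LineEntry(NamedTuple):
    address: int  # start address of a run of machine instructions (hypothetical)
    line: int     # source line the run was generated from
    column: int   # source column the run was generated from


# Hypothetical table, sorted by address.
LINE_TABLE: List[LineEntry] = [
    LineEntry(0x1000, 13, 6),
    LineEntry(0x1008, 14, 6),
    LineEntry(0x1014, 15, 4),
    LineEntry(0x1020, 17, 4),
]


def source_location(pc: int, table: List[LineEntry] = LINE_TABLE) -> Optional[LineEntry]:
    """Map a program counter to the last entry whose address is <= pc."""
    addresses = [entry.address for entry in table]
    index = bisect_right(addresses, pc)
    return table[index - 1] if index > 0 else None


def breakpoint_address(line: int, table: List[LineEntry] = LINE_TABLE) -> Optional[int]:
    """Map a source line to the lowest address generated for it."""
    candidates = [entry.address for entry in table if entry.line == line]
    return min(candidates) if candidates else None


if __name__ == "__main__":
    print(source_location(0x1010))       # entry for line 14
    print(hex(breakpoint_address(17)))   # address where a breakpoint would go
```

The first function answers "where in the source am I?" when execution pauses, and the second answers "where do I stop?" when a breakpoint is set on a line.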

DWARF Debugging Format

"DWARF is a debugging information file format used by many compilers and debuggers to support source level debugging. It addresses the requirements of a number of procedural languages, such as C, C++, and Fortran, and is designed to be extensible to other languages. DWARF is architecture independent and applicable to any processor or operating system. It is widely used on Unix, Linux and other operating systems, as well as in stand-alone environments."

Emitting debugging information

To add support for debug information, I followed Chapter 9 of the LLVM Kaleidoscope tutorial, which offers practical guidance on emitting debug information in the DWARF format.

When a program is optimized with LLVM's optimization passes, the debugger can jump to unexpected positions in the source file, because the optimized instructions may no longer map cleanly back to their original source locations. To avoid this, I compiled the programs with the WinZigC compiler without the -opt flag. I also introduced a new flag, -dbg, to control the emission of debug information.

Since the lexer already provided source line and column locations for all the different lexemes, my task was to properly integrate this information into the AST nodes. For example, the source location of a binary/unary operation opcode token was utilized as the source location for that instruction.
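
The idea can be sketched in a few lines of Python. This is only an illustration under assumed names (SourceLocation, Token, Node, make_leaf and make_binary are hypothetical, and the columns are invented); it is not the WinZigC compiler's actual AST code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SourceLocation:
    line: int
    column: int


@dataclass(frozen=True)
class Token:
    text: str
    loc: SourceLocation  # filled in by the lexer


@dataclass(frozen=True)
class Node:
    label: str
    loc: SourceLocation  # later attached to the generated instruction
    children: Tuple["Node", ...] = ()


def make_leaf(token: Token) -> Node:
    """An identifier or literal keeps the location of its own token."""
    return Node(token.text, token.loc)


def make_binary(op: Token, lhs: Node, rhs: Node) -> Node:
    """A binary operation takes the location of its operator token."""
    return Node(op.text, op.loc, (lhs, rhs))


if __name__ == "__main__":
    # Tokens for the expression "x + y" on line 3 (columns are made up).
    x = Token("x", SourceLocation(3, 4))
    plus = Token("+", SourceLocation(3, 6))
    y = Token("y", SourceLocation(3, 8))
    tree = make_binary(plus, make_leaf(x), make_leaf(y))
    print(tree.loc)
```

During code generation, each node's loc is what would be handed to the debug info builder when the instruction for that node is emitted.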

Below are a few examples illustrating the debug information for different constructs:

Debug information for the WinZigC program (compile unit)

The debugger needs to know this is for a C-family language (DW_LANG_C):

!2 = distinct !DICompileUnit(language: DW_LANG_C, file: !3, producer: "WinZigC Compiler", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !4, globals: !5)

The debugger needs to locate the source program file:

!3 = !DIFile(filename: "winzig_zz.c", directory: "/<path>/<to>/<winzig_zz.c>/<directory>")

Debug information for Global Variables and Types

Consider the following global variable declaration:

var
    a, b : integer;

The global variable a is defined in the scope of the compile unit given above (scope: !2), and its type refers to the debug type information for integer (type: !8):

!7 = distinct !DIGlobalVariable(name: "a", linkageName: "a", scope: !2, file: !3, line: 1, type: !8, isLocal: false, isDefinition: true)
!8 = !DIBasicType(name: "integer", size: 32, encoding: DW_ATE_signed)

Debug information for Instructions

Consider the following WinZigC statement, which performs a subtraction followed by an assignment:

n := n - m;

The generated LLVM IR instructions are tagged with debug information as follows:

%n7 = load i32, i32* %n, align 4, !dbg !28
%m8 = load i32, i32* %m, align 4, !dbg !29
%subtmp9 = sub i32 %n7, %m8, !dbg !29
store i32 %subtmp9, i32* %n, align 4, !dbg !30

These tags correspond to the following debug locations, which in turn refer to positions in the source program:

!28 = !DILocation(line: 17, column: 9, scope: !13)
!29 = !DILocation(line: 17, column: 13, scope: !13)
!30 = !DILocation(line: 17, column: 4, scope: !13)
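
To see the mapping at a glance, the tags can be resolved with a few lines of Python. This is a rough sketch that works on the text of the two snippets above with regular expressions; it only understands !DILocation entries that carry both a line and a column, and the function names are my own, not part of any LLVM tool.

```python
import re
from typing import Dict, List, Tuple

IR = """
%n7 = load i32, i32* %n, align 4, !dbg !28
%m8 = load i32, i32* %m, align 4, !dbg !29
%subtmp9 = sub i32 %n7, %m8, !dbg !29
store i32 %subtmp9, i32* %n, align 4, !dbg !30
"""

METADATA = """
!28 = !DILocation(line: 17, column: 9, scope: !13)
!29 = !DILocation(line: 17, column: 13, scope: !13)
!30 = !DILocation(line: 17, column: 4, scope: !13)
"""

LOCATION_RE = re.compile(r"^!(\d+) = !DILocation\(line: (\d+), column: (\d+), scope: !\d+\)$")
DBG_TAG_RE = re.compile(r"!dbg !(\d+)$")


def parse_locations(metadata: str) -> Dict[int, Tuple[int, int]]:
    """Build {metadata id: (line, column)} from !DILocation lines."""
    locations = {}
    for raw in metadata.splitlines():
        match = LOCATION_RE.match(raw.strip())
        if match:
            ident, line, column = (int(g) for g in match.groups())
            locations[ident] = (line, column)
    return locations


def resolve(ir: str, locations: Dict[int, Tuple[int, int]]) -> List[Tuple[str, Tuple[int, int]]]:
    """Pair the first word of each tagged instruction with its (line, column)."""
    resolved = []
    for raw in ir.splitlines():
        text = raw.strip()
        match = DBG_TAG_RE.search(text)
        if match:
            resolved.append((text.split()[0], locations[int(match.group(1))]))
    return resolved


if __name__ == "__main__":
    for name, (line, column) in resolve(IR, parse_locations(METADATA)):
        print(f"{name}: line {line}, column {column}")
```

All four instructions resolve to line 17, which is why stepping over this statement in the debugger advances by exactly one source line.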

Debug information for Functions and Local Variables

Consider this WinZigC function declaration called GCD_Recursive with two integer type arguments m and n:

function GCD_Recursive ( m, n : integer ) : integer;

This is how the debug information tag appears for the function declaration:

define i32 @GCD_Recursive(i32 %0, i32 %1) !dbg !32 {
    ...
}

Observe how the debug information of each local variable refers back to the function through its scope (scope: !32) and to the integer type through its type (type: !8):

!32 = distinct !DISubprogram(name: "GCD_Recursive", scope: !3, file: !3, type: !14, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !2, retainedNodes: !33)
!33 = !{!34, !35}
!34 = !DILocalVariable(name: "m", arg: 1, scope: !32, file: !3, type: !8)
!35 = !DILocalVariable(name: "n", arg: 2, scope: !32, file: !3, type: !8)
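
These references form a small graph that a debugger walks when it lists the variables in the current frame. The Python sketch below models the nodes above as plain dictionaries and follows the links; the dictionary layout and the function names are assumptions made for illustration, not an LLVM or LLDB API.

```python
from typing import Dict, List

# Hand-written model of the metadata nodes shown above, keyed by their ids.
METADATA: Dict[int, object] = {
    8: {"kind": "DIBasicType", "name": "integer", "size": 32},
    32: {"kind": "DISubprogram", "name": "GCD_Recursive", "retainedNodes": 33},
    33: [34, 35],
    34: {"kind": "DILocalVariable", "name": "m", "arg": 1, "scope": 32, "type": 8},
    35: {"kind": "DILocalVariable", "name": "n", "arg": 2, "scope": 32, "type": 8},
}


def describe_local(node_id: int, metadata: Dict[int, object] = METADATA) -> str:
    """Follow a variable's type and scope links to build a readable summary."""
    variable = metadata[node_id]
    type_name = metadata[variable["type"]]["name"]
    function_name = metadata[variable["scope"]]["name"]
    return f"{variable['name']}: {type_name} (argument {variable['arg']} of {function_name})"


def locals_of(function_id: int, metadata: Dict[int, object] = METADATA) -> List[str]:
    """List every variable retained by a function, in declaration order."""
    retained = metadata[metadata[function_id]["retainedNodes"]]
    return [describe_local(node_id, metadata) for node_id in retained]


if __name__ == "__main__":
    for summary in locals_of(32):
        print(summary)
```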

When the LLVM IR is compiled into a binary with the -g flag, as below, the debug information discussed above is emitted into the binary in the DWARF format.

clang -x ir winzig_zz.c.ll -o winzig_zz.c_binary -g

LLVM IR with complete debug information for the gcd program

; ModuleID = 'gcd'
source_filename = "gcd"

@d = internal global i32 0, !dbg !0
@a = internal global i32 0, !dbg !6
@b = internal global i32 0, !dbg !9
@0 = private unnamed_addr constant [8 x i8] c"%d %d \0A\00", align 1

declare i32 @printf(i8*, ...)

declare i32 @scanf(i8*, ...)

define i32 @GCD_Iterative(i32 %0, i32 %1) !dbg !13 {
entry:
  %GCD_Iterative = alloca i32, align 4
  store i32 0, i32* %GCD_Iterative, align 4
  %m = alloca i32, align 4
  call void @llvm.dbg.declare(metadata i32* %m, metadata !17, metadata !DIExpression()), !dbg !19
  store i32 %0, i32* %m, align 4
  %n = alloca i32, align 4
  call void @llvm.dbg.declare(metadata i32* %n, metadata !18, metadata !DIExpression()), !dbg !19
  store i32 %1, i32* %n, align 4
  br label %while_cond, !dbg !20

while_cond:                                       ; preds = %ifcont, %entry
  %m1 = load i32, i32* %m, align 4, !dbg !21
  %n2 = load i32, i32* %n, align 4, !dbg !22
  %netmp = icmp ne i32 %m1, %n2, !dbg !22
  br i1 %netmp, label %while_body, label %while_exit, !dbg !22

while_body:                                       ; preds = %while_cond
  %m3 = load i32, i32* %m, align 4, !dbg !23
  %n4 = load i32, i32* %n, align 4, !dbg !24
  %gttmp = icmp sgt i32 %m3, %n4, !dbg !24
  br i1 %gttmp, label %then, label %else, !dbg !24

then:                                             ; preds = %while_body
  %m5 = load i32, i32* %m, align 4, !dbg !25
  %n6 = load i32, i32* %n, align 4, !dbg !26
  %subtmp = sub i32 %m5, %n6, !dbg !26
  store i32 %subtmp, i32* %m, align 4, !dbg !27
  br label %ifcont, !dbg !27

else:                                             ; preds = %while_body
  %n7 = load i32, i32* %n, align 4, !dbg !28
  %m8 = load i32, i32* %m, align 4, !dbg !29
  %subtmp9 = sub i32 %n7, %m8, !dbg !29
  store i32 %subtmp9, i32* %n, align 4, !dbg !30
  br label %ifcont, !dbg !30

ifcont:                                           ; preds = %else, %then
  br label %while_cond, !dbg !30

while_exit:                                       ; preds = %while_cond
  %m10 = load i32, i32* %m, align 4, !dbg !31
  store i32 %m10, i32* %GCD_Iterative, align 4, !dbg !31
  br label %exit, !dbg !31

exit:                                             ; preds = %while_exit
  %2 = load i32, i32* %GCD_Iterative, align 4, !dbg !31
  ret i32 %2, !dbg !31
}

define i32 @GCD_Recursive(i32 %0, i32 %1) !dbg !32 {
entry:
  %GCD_Recursive = alloca i32, align 4
  store i32 0, i32* %GCD_Recursive, align 4
  %m = alloca i32, align 4
  call void @llvm.dbg.declare(metadata i32* %m, metadata !34, metadata !DIExpression()), !dbg !36
  store i32 %0, i32* %m, align 4
  %n = alloca i32, align 4
  call void @llvm.dbg.declare(metadata i32* %n, metadata !35, metadata !DIExpression()), !dbg !36
  store i32 %1, i32* %n, align 4
  %n1 = load i32, i32* %n, align 4, !dbg !37
  %eqtmp = icmp eq i32 %n1, 0, !dbg !38
  br i1 %eqtmp, label %then, label %else, !dbg !38

then:                                             ; preds = %entry
  %m2 = load i32, i32* %m, align 4, !dbg !39
  store i32 %m2, i32* %GCD_Recursive, align 4, !dbg !39
  br label %exit, !dbg !39

else:                                             ; preds = %entry
  %n3 = load i32, i32* %n, align 4, !dbg !40
  %m4 = load i32, i32* %m, align 4, !dbg !41
  %n5 = load i32, i32* %n, align 4, !dbg !42
  %modtmp = srem i32 %m4, %n5, !dbg !42
  %2 = call i32 @GCD_Recursive(i32 %n3, i32 %modtmp), !dbg !42
  store i32 %2, i32* %GCD_Recursive, align 4, !dbg !42
  br label %exit, !dbg !42

exit:                                             ; preds = %else, %then
  %3 = load i32, i32* %GCD_Recursive, align 4, !dbg !42
  ret i32 %3, !dbg !42
}

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare void @llvm.dbg.declare(metadata, metadata, metadata) #0

define i32 @main() !dbg !43 {
entry:
  store i32 1080, i32* @a, align 4, !dbg !46
  store i32 192, i32* @b, align 4, !dbg !47
  %a = load i32, i32* @a, align 4, !dbg !48
  %gttmp = icmp sgt i32 %a, 0, !dbg !49
  %b = load i32, i32* @b, align 4, !dbg !50
  %gttmp1 = icmp sgt i32 %b, 0, !dbg !51
  %andtmp = and i1 %gttmp, %gttmp1, !dbg !51
  br i1 %andtmp, label %then, label %else, !dbg !51

then:                                             ; preds = %entry
  %a2 = load i32, i32* @a, align 4, !dbg !52
  %b3 = load i32, i32* @b, align 4, !dbg !53
  %0 = call i32 @GCD_Recursive(i32 %a2, i32 %b3), !dbg !53
  %a4 = load i32, i32* @a, align 4, !dbg !54
  %b5 = load i32, i32* @b, align 4, !dbg !55
  %1 = call i32 @GCD_Iterative(i32 %a4, i32 %b5), !dbg !55
  %2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([8 x i8], [8 x i8]* @0, i32 0, i32 0), i32 %0, i32 %1), !dbg !55
  br label %ifcont, !dbg !55

else:                                             ; preds = %entry
  br label %ifcont, !dbg !55

ifcont:                                           ; preds = %else, %then
  ret i32 0, !dbg !55
}

attributes #0 = { nofree nosync nounwind readnone speculatable willreturn }

!llvm.module.flags = !{!11, !12}
!llvm.dbg.cu = !{!2}

!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
!1 = distinct !DIGlobalVariable(name: "d", linkageName: "d", scope: !2, file: !3, line: 1, type: !8, isLocal: false, isDefinition: true)
!2 = distinct !DICompileUnit(language: DW_LANG_C, file: !3, producer: "WinZigC Compiler", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !4, globals: !5)
!3 = !DIFile(filename: "winzig_zz.c", directory: "/<path>/<to>/<winzig_zz.c>/<directory>")
!4 = !{}
!5 = !{!0, !6, !9}
!6 = !DIGlobalVariableExpression(var: !7, expr: !DIExpression())
!7 = distinct !DIGlobalVariable(name: "a", linkageName: "a", scope: !2, file: !3, line: 1, type: !8, isLocal: false, isDefinition: true)
!8 = !DIBasicType(name: "integer", size: 32, encoding: DW_ATE_signed)
!9 = !DIGlobalVariableExpression(var: !10, expr: !DIExpression())
!10 = distinct !DIGlobalVariable(name: "b", linkageName: "b", scope: !2, file: !3, line: 1, type: !8, isLocal: false, isDefinition: true)
!11 = !{i32 2, !"Debug Info Version", i32 3}
!12 = !{i32 2, !"Dwarf Version", i32 2}
!13 = distinct !DISubprogram(name: "GCD_Iterative", scope: !3, file: !3, type: !14, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !2, retainedNodes: !16)
!14 = !DISubroutineType(types: !15)
!15 = !{!8, !8, !8}
!16 = !{!17, !18}
!17 = !DILocalVariable(name: "m", arg: 1, scope: !13, file: !3, type: !8)
!18 = !DILocalVariable(name: "n", arg: 2, scope: !13, file: !3, type: !8)
!19 = !DILocation(line: 0, scope: !13)
!20 = !DILocation(line: 13, column: 6, scope: !13)
!21 = !DILocation(line: 13, column: 8, scope: !13)
!22 = !DILocation(line: 13, column: 13, scope: !13)
!23 = !DILocation(line: 14, column: 6, scope: !13)
!24 = !DILocation(line: 14, column: 10, scope: !13)
!25 = !DILocation(line: 15, column: 9, scope: !13)
!26 = !DILocation(line: 15, column: 13, scope: !13)
!27 = !DILocation(line: 15, column: 4, scope: !13)
!28 = !DILocation(line: 17, column: 9, scope: !13)
!29 = !DILocation(line: 17, column: 13, scope: !13)
!30 = !DILocation(line: 17, column: 4, scope: !13)
!31 = !DILocation(line: 18, column: 9, scope: !13)
!32 = distinct !DISubprogram(name: "GCD_Recursive", scope: !3, file: !3, type: !14, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !2, retainedNodes: !33)
!33 = !{!34, !35}
!34 = !DILocalVariable(name: "m", arg: 1, scope: !32, file: !3, type: !8)
!35 = !DILocalVariable(name: "n", arg: 2, scope: !32, file: !3, type: !8)
!36 = !DILocation(line: 0, scope: !32)
!37 = !DILocation(line: 23, column: 5, scope: !32)
!38 = !DILocation(line: 23, column: 9, scope: !32)
!39 = !DILocation(line: 24, column: 10, scope: !32)
!40 = !DILocation(line: 26, column: 24, scope: !32)
!41 = !DILocation(line: 26, column: 27, scope: !32)
!42 = !DILocation(line: 26, column: 33, scope: !32)
!43 = distinct !DISubprogram(name: "main", scope: !3, file: !3, line: 30, type: !44, scopeLine: 30, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !2, retainedNodes: !4)
!44 = !DISubroutineType(types: !45)
!45 = !{!8}
!46 = !DILocation(line: 30, column: 2, scope: !43)
!47 = !DILocation(line: 31, column: 2, scope: !43)
!48 = !DILocation(line: 32, column: 5, scope: !43)
!49 = !DILocation(line: 32, column: 9, scope: !43)
!50 = !DILocation(line: 32, column: 15, scope: !43)
!51 = !DILocation(line: 32, column: 19, scope: !43)
!52 = !DILocation(line: 33, column: 24, scope: !43)
!53 = !DILocation(line: 33, column: 26, scope: !43)
!54 = !DILocation(line: 33, column: 44, scope: !43)
!55 = !DILocation(line: 33, column: 46, scope: !43)

LLDB debug launcher configuration

To let the VSCode editor place breakpoints, you currently have to give your WinZigC program a .c extension. For example, rename the winzig_zz file to winzig_zz.c. The debug launch configuration then looks like this:

{
  "name": "WinZigC Program Debug",
  "type": "cppdbg",
  "request": "launch",
  "program": "${workspaceFolder}/example-programs/winzig_zz.c_binary",
  "stopAtEntry": false,
  "cwd": "${workspaceFolder}",
  "externalConsole": false,
  "MIMode": "lldb",
  "sourceFileMap": {
    ".": "${workspaceFolder}/example-programs"
  }
}

That's it for this post. I hope you've gained some valuable insights into how debugging works. Stay tuned for more posts!

Happy Learning!
