Low Level Hacking: String reversing

So I’ve been toying with this example for the past week off and on and as frustrating as it was, I learned a lot.

One of my tiny/side projects I had on my notes was to write a string reverser both in C and in assembly. Sounds completely stupid and pointless, but I did it anyway.

The C version was pretty straight forward:

#include <stdio.h>

#define BUFF_SIZE 6

int main(void)
{

    int i = 0;
    char buff[BUFF_SIZE];

   fgets (buff, BUFF_SIZE, stdin);


    for(i=0; i<BUFF_SIZE; i++)
    {
        printf("%c", buff[BUFF_SIZE-i-1]);

    }
    printf("\n");
    return 0;
}

 


I wrote up a few prototypes on paper prior but wanted to see how accurate I was when compared to the assembly generated from GCC (gcc -S srev.c).

       	.section       	__TEXT,__text,regular,pure_instructions
       	.macosx_version_min 10, 12
       	.globl 	_main
       	.align 	4, 0x90
_main:                                  ## @main
       	.cfi_startproc
## BB#0:
       	pushq  	%rbp
Ltmp0:
       	.cfi_def_cfa_offset 16
Ltmp1:
       	.cfi_offset %rbp, -16
       	movq   	%rsp, %rbp
Ltmp2:
       	.cfi_def_cfa_register %rbp
       	subq   	$32, %rsp
       	movl   	$6, %esi
       	movq   	___stdinp@GOTPCREL(%rip), %rax
       	leaq   	-14(%rbp), %rdi
       	movl   	$0, -4(%rbp)
       	movl   	$0, -8(%rbp)
       	movq   	(%rax), %rdx
       	callq  	_fgets
       	movl   	$0, -8(%rbp)
       	movq   	%rax, -24(%rbp)         ## 8-byte Spill
LBB0_1:                                 ## =>This Inner Loop Header: Depth=1
       	cmpl   	$6, -8(%rbp)
       	jge    	LBB0_4
## BB#2:                                ##   in Loop: Header=BB0_1 Depth=1
       	leaq   	L_.str(%rip), %rdi
       	movl   	$6, %eax
       	subl   	-8(%rbp), %eax
       	subl   	$1, %eax
       	movslq 	%eax, %rcx
       	movsbl 	-14(%rbp,%rcx), %esi
       	movb   	$0, %al
       	callq  	_printf
       	movl   	%eax, -28(%rbp)         ## 4-byte Spill
## BB#3:                                ##   in Loop: Header=BB0_1 Depth=1
       	movl   	-8(%rbp), %eax
       	addl   	$1, %eax
       	movl   	%eax, -8(%rbp)
       	jmp    	LBB0_1
LBB0_4:
       	leaq   	L_.str.1(%rip), %rdi
       	movb   	$0, %al
       	callq  	_printf
       	xorl   	%ecx, %ecx
       	movl   	%eax, -32(%rbp)         ## 4-byte Spill
       	movl   	%ecx, %eax
       	addq   	$32, %rsp
       	popq   	%rbp
       	retq
       	.cfi_endproc

       	.section       	__TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
       	.asciz 	"%c"

L_.str.1:                               ## @.str.1
       	.asciz 	"\n"


.subsections_via_symbols

I noticed that in this example, it was using the stack. My semi-final version on paper was just manipulating the address of the string to get individual characters (That’s how it works in C, right?).

.section __DATA,__data

msg:
    .ascii "Hello"

.section __TEXT,__text

.globl _main

_main:

setup:
    movq $0, %rcx
    movq msg@GOTPCREL(%rip), %rsi
    addq $10, %rsi

print:
    cmpq $10, %rcx
    je end
    movq $0x2000004, %rax
    movq $1, %rdi
    pushq %rcx
    syscall
    popq %rcx
    incq %rcx
    subq %rcx, %rsi
    jmp print

end:
    movq $0x2000001, %rax
    movq $0, %rbx
    syscall

Once I got it assembled and working, I noticed that nothing was happening, or more precisely, nothing was being printed on the screen. I decided to compile the actual file (gcc -g srev.s -o srev) and take it through GDB to see what was going on.

gdb srev
Temporary breakpoint 1, setup () at movsb2.s:11
11     	    xorq %rbx, %rbx
(gdb) watch $rsi
Watchpoint 2: $rsi
(gdb) step
12     	    movq $0, %rbx
(gdb) step
13     	    movq msg@GOTPCREL(%rip), %rsi
(gdb) step

Watchpoint 2: $rsi

Old value = 140737488350136
New value = 6293656
setup () at movsb2.s:14
14     	    addq $4, %rsi
(gdb) x /s $rsi
0x600898:      	"Hello"
(gdb) step

Watchpoint 2: $rsi

Old value = 6293656
New value = 6293660
print () at movsb2.s:17
17     	    cmpq $4, %rbx
(gdb) x /s $rsi
0x60089c:      	"o"

Hmmm, this looks like it’s working as I expected, but it’s not.. let’s dive further.

(gdb) step
18     	    je end
(gdb) step
19     	    movq $4, %rax
(gdb) step
20     	    movq $4, %rdi
(gdb) step
21     	    ;pushq %rcx
(gdb) step
print () at movsb2.s:22
22     	    syscall
(gdb) step
23     	    ;popq %rcx
(gdb) step
print () at movsb2.s:24
24     	    incq %rbx
(gdb) step
25     	    decq %rsi
(gdb) step

Watchpoint 2: $rsi

Old value = 6293660
New value = 6293659
print () at movsb2.s:26
26     	    jmp print
(gdb) x /s $rsi
0x60089b:      	"lo"

Ahhh that’s the issue! It appears that when manipulating the address for the string one byte backwards, it essentially becomes the new ‘base address’ pointing to the entire string data.

With that said, that further explains why the stack is being used. When you push data to the stack, you essentially reserve a certain number of bytes to hold your data starting from the higher addresses to lower. This way, you can actually access individual bytes of data at a specific offset.

Low Level Hacking: How structs look in memory.

I wanted to verify something so I did a proof of concept example. I know in C arrays are a contiguous bytes of memory and there really is no boundaries at that level, just pure memory. I was pretty sure structs looked the same way but I wanted to test to be sure.. I also found something interesting along the way.

The examples below were compiled on MacOS Sierra 64-bit using gcc/as.

struct.c

#include <stdio.h>

int main(void)
{
	struct myStruct
	{
		int id;
		char *name;
	};

	struct myStruct test;
	test.id = 4;
	test.name = "bacon";

	return 0;

}

Then to get the assembly output, we use gcc -S struct.c

       .section        __TEXT,__text,regular,pure_instructions
        .macosx_version_min 10, 11
        .globl  _main
        .align  4, 0x90
_main:                                  ## @main
        .cfi_startproc
## BB#0:
        pushq   %rbp
Ltmp0:
        .cfi_def_cfa_offset 16
Ltmp1:
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
Ltmp2:
        .cfi_def_cfa_register %rbp
        xorl    %eax, %eax
        leaq    L_.str(%rip), %rcx
        movl    $0, -4(%rbp)
        movl    $4, -24(%rbp)
        movq    %rcx, -16(%rbp)
        popq    %rbp
        retq
        .cfi_endproc

        .section        __TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
        .asciz  "bacon"


.subsections_via_symbols

This confirms what I thought in the beginning (that structs in memory are just a vast amount of memory), but it made me question why we’re using the offset of -16(%rbp) for the 6-byte string and -24(%rbp) for a 4-byte signed integer with the value of 4. (Note 6-bytes because we need to include the null byte in a C string. I had to go back and correct this.. easy to forget!)

Since this is a 64-bit system, the pointer are 8-bytes and the integer is also 8-bytes, hence why the offsets on the stack are 8-bytes apart. Please note that your hardware may have a different pointer size so be sure to reference that if in question.

64-bit version

leaq    L_.str(%rip), %rcx  ; Load address of string "bacon"
movl    $0, -4(%rbp)  ;; Return Code for function, 4 bytes padded
movl    $4, -24(%rbp) ;; 8-bytes 
movq    %rcx, -16(%rbp) ;;8-bytes for the pointer which points to a string containing 6 characters (including '\0').

32-bit version

** Using gcc -m32 -S struct.c

 
 leal L_.str-L0$pb(%eax), %eax ; load address of string "bacon"
 movl $0, -4(%ebp)    ; return value , 4bytes
 movl $4, -16(%ebp)   ; test.id = 4, 4 bytes
 movl %eax, -12(%ebp) ; test.name = "bacon"

Phd. Status – On hold

So it looks like I need 3 undergraduate classes to get admitted to the PhD program. It’s kinda crazy but I wasn’t even remotely upset, in fact I was expecting a lot worse. I’m actually excited to take these courses, really really excited.  The classes are pretty easy and stuff I can already do, but it’s just not on paper as a checked box yet.

With that said, I will be hopefully taking at least one course next year starting in Feb, 2017. I also need to start working on renewing my CCIE and studying for the written again.. challenging but doable. Soon enough I’ll get my 10 years in and can be an emeritus and no longer have to worry about it.

I’ve been pondering this for a while, but I think I’m going to start progress on a book. I’m not going to reveal what it is, it’s a bit of a surprise but I will say it combines many of my talents and knowledge areas. My expectation is it to be an evergreen book so it can be used for a while and not just end up as a paper weight. I think at this point I will self publish but I’m not 100% sure yet

Lastly I wanted to thank John Sonmez at Simple Programmer for being the inspiration for me to start progress on my book!