Low Level Hacking: String reversing

So I’ve been toying with this example off and on for the past week, and as frustrating as it was, I learned a lot.

One of the tiny side projects I had in my notes was to write a string reverser in both C and assembly. It sounds completely stupid and pointless, but I did it anyway.

The C version was pretty straightforward:

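#include <stdio.h>

/* 5 characters plus the terminating NUL that fgets appends. */
#define BUFF_SIZE 6

int main(void)
{
    int i = 0;
    char buff[BUFF_SIZE];

    /* Reads at most BUFF_SIZE-1 characters from stdin. */
    fgets(buff, BUFF_SIZE, stdin);

    /* Walks the whole buffer from the last byte to the first, so this
       assumes exactly 5 characters of input: buff[5] (the NUL) is printed
       first, and shorter input leaves the trailing bytes uninitialized. */
    for (i = 0; i < BUFF_SIZE; i++)
    {
        printf("%c", buff[BUFF_SIZE-i-1]);
    }
    printf("\n");
    return 0;
}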

 


I had written up a few prototypes on paper beforehand, but I wanted to see how accurate they were compared to the assembly GCC generates (gcc -S srev.c).

       	.section       	__TEXT,__text,regular,pure_instructions
       	.macosx_version_min 10, 12
       	.globl 	_main
       	.align 	4, 0x90
_main:                                  ## @main
       	.cfi_startproc
## BB#0:
       	pushq  	%rbp
Ltmp0:
       	.cfi_def_cfa_offset 16
Ltmp1:
       	.cfi_offset %rbp, -16
       	movq   	%rsp, %rbp
Ltmp2:
       	.cfi_def_cfa_register %rbp
       	subq   	$32, %rsp
       	movl   	$6, %esi
       	movq   	___stdinp@GOTPCREL(%rip), %rax
       	leaq   	-14(%rbp), %rdi
       	movl   	$0, -4(%rbp)
       	movl   	$0, -8(%rbp)
       	movq   	(%rax), %rdx
       	callq  	_fgets
       	movl   	$0, -8(%rbp)
       	movq   	%rax, -24(%rbp)         ## 8-byte Spill
LBB0_1:                                 ## =>This Inner Loop Header: Depth=1
       	cmpl   	$6, -8(%rbp)
       	jge    	LBB0_4
## BB#2:                                ##   in Loop: Header=BB0_1 Depth=1
       	leaq   	L_.str(%rip), %rdi
       	movl   	$6, %eax
       	subl   	-8(%rbp), %eax
       	subl   	$1, %eax
       	movslq 	%eax, %rcx
       	movsbl 	-14(%rbp,%rcx), %esi
       	movb   	$0, %al
       	callq  	_printf
       	movl   	%eax, -28(%rbp)         ## 4-byte Spill
## BB#3:                                ##   in Loop: Header=BB0_1 Depth=1
       	movl   	-8(%rbp), %eax
       	addl   	$1, %eax
       	movl   	%eax, -8(%rbp)
       	jmp    	LBB0_1
LBB0_4:
       	leaq   	L_.str.1(%rip), %rdi
       	movb   	$0, %al
       	callq  	_printf
       	xorl   	%ecx, %ecx
       	movl   	%eax, -32(%rbp)         ## 4-byte Spill
       	movl   	%ecx, %eax
       	addq   	$32, %rsp
       	popq   	%rbp
       	retq
       	.cfi_endproc

       	.section       	__TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
       	.asciz 	"%c"

L_.str.1:                               ## @.str.1
       	.asciz 	"\n"


.subsections_via_symbols

I noticed that the compiler's version keeps the buffer on the stack. My semi-final version on paper just manipulated the address of the string to get at individual characters (that’s how it works in C, right?).

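.section __DATA,__data

msg:
    .ascii "Hello"

.section __TEXT,__text

.globl _main

_main:

setup:
    movq $0, %rcx                    # loop counter
    movq msg@GOTPCREL(%rip), %rsi    # address of msg
    addq $10, %rsi                   # move the pointer 10 bytes forward

print:
    cmpq $10, %rcx                   # done after 10 iterations
    je end
    movq $0x2000004, %rax            # macOS write syscall
    movq $1, %rdi                    # fd 1 = stdout
    pushq %rcx                       # syscall clobbers %rcx, so save it
    syscall
    popq %rcx
    incq %rcx
    subq %rcx, %rsi                  # move the pointer back by the counter
    jmp print

end:
    movq $0x2000001, %rax            # macOS exit syscall
    movq $0, %rdi                    # exit status 0
    syscall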

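Once I got it to assemble and run, I noticed that nothing was happening, or more precisely, nothing was being printed on the screen. I decided to build it with debug info (gcc -g srev.s -o srev) and take it through GDB to see what was going on. (The session below is from a slightly different revision, movsb2.s, which uses %rbx as the counter and steps the pointer with decq.)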

gdb srev
Temporary breakpoint 1, setup () at movsb2.s:11
11     	    xorq %rbx, %rbx
(gdb) watch $rsi
Watchpoint 2: $rsi
(gdb) step
12     	    movq $0, %rbx
(gdb) step
13     	    movq msg@GOTPCREL(%rip), %rsi
(gdb) step

Watchpoint 2: $rsi

Old value = 140737488350136
New value = 6293656
setup () at movsb2.s:14
14     	    addq $4, %rsi
(gdb) x /s $rsi
0x600898:      	"Hello"
(gdb) step

Watchpoint 2: $rsi

Old value = 6293656
New value = 6293660
print () at movsb2.s:17
17     	    cmpq $4, %rbx
(gdb) x /s $rsi
0x60089c:      	"o"

Hmmm, this looks like it’s working as I expected, but it’s not. Let’s dive further.

(gdb) step
18     	    je end
(gdb) step
19     	    movq $4, %rax
(gdb) step
20     	    movq $4, %rdi
(gdb) step
21     	    ;pushq %rcx
(gdb) step
print () at movsb2.s:22
22     	    syscall
(gdb) step
23     	    ;popq %rcx
(gdb) step
print () at movsb2.s:24
24     	    incq %rbx
(gdb) step
25     	    decq %rsi
(gdb) step

Watchpoint 2: $rsi

Old value = 6293660
New value = 6293659
print () at movsb2.s:26
26     	    jmp print
(gdb) x /s $rsi
0x60089b:      	"lo"

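Ahhh, that’s what was throwing me off! After moving the address back one byte, x /s shows "lo" rather than just "l". At first it looked like the decremented address had become a new ‘base address’ for the rest of the string, but that is simply how x /s works: it prints everything from the given address up to the next zero byte. The pointer itself is fine, and %rsi really does point at the single byte 'l'.

What decides how many bytes get printed is the length passed to write in %rdx, and none of the code shown here ever sets it. In the revision I was stepping through, %rax and %rdi are also both set to 4, which does not select a write to stdout (fd 1). As far as I can tell, those are more likely reasons for the missing output than the pointer arithmetic.

That also puts the stack in a different light. The compiler keeps buff on the stack simply because it is a local variable, and it reads a single byte at base plus offset (movsbl -14(%rbp,%rcx), %esi). The same base-plus-offset addressing works just as well on a string in the data section.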
