Assignment 3: Attack Lab (due on Sat, Oct 26, 2024 at 11:59pm)
Introduction
This assignment asks you to run buffer overflow attacks using two
strategies: (1) loading your binary code on the stack and starting its
execution by overwriting the return address, or (2) a return-oriented
attack, where return addresses are used to jump to one or more
“gadgets” (short sequences of instructions ending with ret).
Through this assignment:
- You will get a better understanding of how to write programs that are more secure (i.e., using explicit checks on buffer sizes, or through features provided by compilers and operating systems).
- You will gain a deeper understanding of the stack and parameter-passing mechanisms of x86-64 machine code.
- You will learn debugging tools such as gdb and objdump even better!
Note: In this lab, you will gain firsthand experience with methods used to exploit security weaknesses in operating systems and network servers. Our purpose is to help you learn about the runtime operation of programs and to understand the nature of these security weaknesses so that you can avoid them when you write system code. We do not condone the use of any other form of attack to gain unauthorized access to any system resources.
You will want to study Sections 3.10.3 and 3.10.4 of the CS:APP3e book as reference material for this lab.
Instructions
A new repository will be created for you on GitHub, including the following files:
- ctarget: a program vulnerable to code injection attacks;
- rtarget: a program vulnerable to return-oriented programming attacks;
- grade: a script to check your current grade;
- hex2raw: a program to convert text files with hex sequences into their raw binary values (e.g., convert the text 48 65 6c 6c 6f to the binary sequence encoding the ASCII string Hello).
Both ctarget and rtarget read a string from stdin and store it inside the buffer array buf. They do so using the vulnerable getbuf function:
unsigned getbuf() {
    char buf[BUFFER_SIZE];
    Gets(buf);
    return 1;
}
The function Gets is similar to the standard library function gets: it reads bytes from stdin until it finds \n or EOF and stores them inside the input array buf, followed by a null terminator \0.
If the string is short, nothing interesting happens:
$ ./ctarget
Cookie: 0x1a7dd803
Type string: Keep it short!
No exploit. Getbuf returned 0x1
When the string typed by the user (or sourced from a text file with ctarget < attack.raw) is longer than the space allocated on the stack by the compiler, Gets will overwrite the return address of getbuf. Most likely, this will cause a segmentation fault:
$ ./ctarget
Cookie: 0x1a7dd803
Type string: This is not a very interesting string, but it has the property ...
Ouch!: You caused a segmentation fault!
Better luck next time
(Note that the magic cookie shown will differ from yours.)
Your goal is to craft attack strings that trigger the execution of functions target_f1/target_f2/target_f3 inside ctarget and inside rtarget, by “properly” overwriting return addresses.
If you enter the correct solution, the target program will save it in a text file named sol1.txt for level 1, sol2.txt for level 2, and so on. You must commit these files and push them to your repository:
$ git add sol1.txt
$ git commit -m "Solved level 1"
$ git push
You can solve multiple levels and commit/push these text files together, at the end. The push time of your solution files will be used to count late days.
Your exploit strings will typically contain byte values that do not correspond to the ASCII values for printing characters. The program hex2raw will enable you to generate these raw strings.
- hex2raw expects two-digit hex values separated by one or more white spaces. So if you want to create a byte with a hex value of 0, you need to write it as 00.
- To create the 4-byte word 0xdeadbeef (an int) you should pass ef be ad de to hex2raw (note the reversal required for little-endian byte ordering).
- Your exploit string must not contain byte value 0x0a at any intermediate position, since this is the ASCII code for newline \n. If Gets encounters this byte, it will assume you intended to terminate the string.
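The byte-order and newline rules above can be checked mechanically. The following Python sketch is not part of the handout; the helper names le_bytes and has_newline_before_end are made up for illustration. It prints a value in the little-endian, two-digit form that hex2raw expects and flags a payload that contains 0x0a before its last byte.

```python
def le_bytes(value, size=8):
    """Return `value` as `size` little-endian bytes in hex2raw text form."""
    return " ".join(f"{b:02x}" for b in value.to_bytes(size, "little"))


def has_newline_before_end(payload):
    """True if byte 0x0a appears anywhere except as the final byte."""
    return b"\n" in payload[:-1]


# The 4-byte example from the text: 0xdeadbeef is written low byte first.
print(le_bytes(0xDEADBEEF, 4))                    # ef be ad de
# A payload with an embedded newline would be cut short by Gets.
print(has_newline_before_end(b"\x41\x0a\x42"))    # True
```

Note that to_bytes raises OverflowError if the value does not fit in the requested size, which is a useful guard against typing an address with too many digits.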
Evaluation
You must complete the assignment using the class VM. Your virtual machine must be connected to the internet, as the program will connect to our server when you complete an attack.
Every attempt you make will be logged by the automated grading server. As in the Bomb Lab, run ./grade to view your current progress. Unlike the previous project, there is no penalty for making mistakes in this lab. Attempts to break or overload the server, however, are not allowed, and will be considered cheating.
Your solutions may not use attacks to circumvent the validation code in the programs. Specifically, any address you incorporate into an attack string for use by a ret instruction should be to one of the following destinations:
- The addresses for functions target_f1, target_f2, or target_f3.
- The address of your injected code.
- The address of one of your gadgets from the gadget farm.
You may only construct gadgets from file rtarget with addresses ranging between those for functions start_farm and end_farm.
There are 6 attacks to complete:
1. target_f1 in ctarget (10 points).
2. target_f2 in ctarget (20 points).
3. target_f3 in ctarget (30 points).
4. target_f1 in rtarget (5 points).
5. target_f2 in rtarget (15 points).
6. target_f3 in rtarget (20 points).
The first three involve code-injection (CI) attacks on ctarget, while the last three involve return-oriented-programming (ROP) attacks on rtarget.
Logistics
This is an individual project. All handins are electronic. Clarifications and corrections will be posted on the course Piazza page.
Remember that:
- You are not allowed to search for help online!
- You are not allowed to ask other students for help, show them your solution, or discuss its specifics.
- Reconsideration requests must be made within one week of our release of grades for the assignment.
Be aware that you may be asked to explain your solution to a member of our course staff.
Handout Instructions
As in the previous assignment, we will create a private GitHub repository for this assignment and share it with you. Be sure to clone the GitHub repository inside the class VM.
Hand-In Instructions
You must commit and push your solution files sol1.txt through sol6.txt to GitHub. Assignment collection will be automatic: after the assignment deadline, our grading system will fetch the most recent commit on the master branch of your repository.
Be sure to run ./grade and verify that we recorded your solutions. You can also check your solution files before submitting them with cat sol1.txt | ./hex2raw | ./ctarget and so on.
If you want to use late days, make a push after the deadline: we will use the push date (not the commit date) to determine your late days.
Attack Instructions: Code Injection
For the first three phases, your exploit strings will attack ctarget. This program is set up in a way that the stack positions will be consistent from one run to the next and so that data on the stack can be treated as executable code. These features make the program vulnerable to attacks where the exploit strings contain the byte encodings of executable code.
Level 1: target_f1 in ctarget (10 points)
In the first attack, you will not inject new code. Instead, your exploit string will redirect the program to execute an existing procedure. Function getbuf is called within ctarget by the function test having the following C code:
void test() {
    int val = getbuf();
    printf("No exploit. Getbuf returned 0x%x\n", val);
    fail();
}
When getbuf executes its return statement, the program ordinarily resumes execution within function test (with a call to fail). We want to change this behavior! Within the file ctarget, there is code for a function target_f1 having the following C representation:
void target_f1() {
    level = 1;
    printf("SUCCESS: You called target_f1()\n");
    validate();
}
Your task is to get ctarget to execute the code for target_f1 when getbuf executes its return statement, rather than returning to test. Note that your exploit string may also corrupt parts of the stack not directly related to this stage, but this will not cause a problem, since validate() causes the program to exit directly.
Some advice:
- All the information you need to devise your exploit string for this level can be determined by examining a disassembled version of ctarget. Use objdump -d to get this disassembled version (layout asm inside gdb also works).
- The idea is to position a byte representation of the starting address for target_f1 so that the ret instruction at the end of the code for getbuf will transfer control to target_f1.
- Be careful about byte ordering (Intel CPUs are little-endian).
- Use gdb to step the program through the last few instructions of getbuf to make sure it is doing the right thing.
- The placement of buf within the stack frame for getbuf depends on the value of compile-time constant BUFFER_SIZE, as well as the allocation strategy used by GCC. You will need to examine the disassembled code to determine its position.
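As a rough illustration of the layout described in the advice above, the Python sketch below builds filler bytes followed by an 8-byte little-endian return address and prints them in hex2raw form. The offset 24 and the address 0x401234 are made-up placeholders, not values from any real ctarget; you still have to read both from your own disassembly. The function name overflow_payload is likewise hypothetical.

```python
def overflow_payload(pad_len, ret_addr):
    """Filler up to the saved return address, then the new address (little-endian)."""
    padding = b"\x41" * pad_len              # any filler byte other than 0x0a works
    return padding + ret_addr.to_bytes(8, "little")


def to_hex_text(data, per_line=8):
    """Format bytes as two-digit hex values, `per_line` values per line."""
    rows = [data[i:i + per_line] for i in range(0, len(data), per_line)]
    return "\n".join(" ".join(f"{b:02x}" for b in row) for row in rows)


# Placeholder numbers only: 24 filler bytes, hypothetical target 0x401234.
print(to_hex_text(overflow_payload(24, 0x401234)))
```

The printed text can be saved to a file and passed through hex2raw like any hand-written attack string.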
Level 2: target_f2 in ctarget (20 points)
Level 2 involves injecting a small amount of code as part of your exploit string (see the section Generating Binary Instructions on how to generate the code to inject). Within the file ctarget there is code for a function target_f2 having the following C representation:
void target_f2(unsigned val) {
    level = 2;
    if (val == TARGET_ID) {
        printf("SUCCESS: You called target_f2(0x%.8x)\n", val);
        validate();
    } else {
        printf("Misfire: You called target_f2(0x%.8x)\n", val);
        fail();
    }
}
Your task is to get ctarget to execute the code for target_f2 rather than returning to test. In this case, however, you must make it appear to target_f2 as if you have passed your magic cookie number as the first argument.
Some advice:
- You will want to position a byte representation of the address of your injected code in such a way that the ret instruction at the end of the code for getbuf will transfer control to it.
- Recall that the first argument to a function is passed in register %rdi.
- Your injected code should set this register to your magic cookie number, and then use a ret instruction to transfer control to the first instruction in target_f2.
- Do not attempt to use jmp or call instructions in your exploit code. The encodings of destination addresses for these instructions are difficult to formulate. Use ret instructions for all transfers of control.
- See the discussion at the end of this page on how to use tools to generate the byte-level representations of instructions.
Level 3: target_f3 in ctarget (30 points)
Level 3 also involves a code injection attack, but this time you pass a string as the argument. Within the file ctarget there is code for functions hexmatch and target_f3 having the following C representations:
int hexmatch(unsigned val, char *sval) {
    char cbuf[110];
    /* Make position of check string unpredictable */
    char *s = cbuf + random() % 100;
    sprintf(s, "%.8x", val);
    return strncmp(sval, s, 9) == 0;
}

void target_f3(char *sval) {
    level = 3;
    if (hexmatch(TARGET_ID, sval)) {
        printf("SUCCESS: You called target_f3(\"%s\")\n", sval);
        validate();
    } else {
        printf("Misfire: You called target_f3(\"%s\")\n", sval);
        fail();
    }
}
Your task is to get ctarget to execute the code for target_f3 rather than returning to test. You must make it appear to target_f3 as if you have passed a string representation of your magic value as its first argument.
Some advice:
- You will need to include a string representation of your cookie in your exploit string. The string should consist of the eight hexadecimal digits (ordered from most to least significant) without a leading 0x and lowercase (e.g., if your cookie value is 0x1A7DD803 in hexadecimal, the string should be “1a7dd803”).
- Recall that a string is represented in C as a sequence of bytes followed by a byte with value 0. Type man ascii on any Linux machine to see the byte representations of the characters you need.
- Your injected code should set register %rdi to the address of this string representation of your magic number.
- When functions hexmatch and strncmp are called, they push data onto the stack, overwriting portions of memory that held the buffer used by getbuf. As a result, you will need to be careful about the placement of the string representation of your magic cookie.
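Instead of looking up each character in man ascii, the string bytes can be computed. This Python sketch, with the hypothetical helper name cookie_string_bytes, formats a cookie as eight lowercase hex digits and appends the terminating zero byte, using the sample cookie 0x1a7dd803 shown earlier on this page (yours will differ).

```python
def cookie_string_bytes(cookie):
    """ASCII bytes of the 8-digit lowercase hex string, plus a null terminator."""
    return f"{cookie:08x}".encode("ascii") + b"\x00"


def to_hex_text(data):
    """Format bytes as space-separated two-digit hex values for hex2raw."""
    return " ".join(f"{b:02x}" for b in data)


# Sample cookie from the handout; substitute your own value.
print(to_hex_text(cookie_string_bytes(0x1A7DD803)))
# 31 61 37 64 64 38 30 33 00
```

Unlike an integer, the string is not byte-reversed: characters are stored in reading order, most significant digit first.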
Attack Instructions: Return-Oriented Programming
Performing code-injection attacks on program rtarget is much more difficult than it is for ctarget, because it uses two techniques to thwart such attacks:
- It uses randomization so that the stack positions differ from one run to another. This makes it impossible to determine where your injected code will be located.
- It marks the section of memory holding the stack as nonexecutable, so even if you could set the program counter to the start of your injected code, the program would fail with a segmentation fault.
Fortunately, clever people have devised strategies for getting useful things done in a program by executing existing code, rather than injecting new code. The most general form of this is referred to as return-oriented programming (ROP). The strategy of ROP is to identify byte sequences within an existing program that consist of one or more instructions followed by the instruction ret. Such a segment is called a gadget. The following figure illustrates how the stack can be set up to execute a sequence of $n$ gadgets.
- The stack contains a sequence of gadget addresses.
- Each gadget consists of a series of instruction bytes, with the final one being 0xc3 (encoding the ret instruction).
- When the program executes a ret instruction starting with this configuration, it will initiate a chain of gadget executions, with the ret instruction at the end of each gadget causing the program to jump to the beginning of the next.
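The chaining behavior can be modeled in a few lines. The Python sketch below is a toy model, not a real emulator: the stack is a list of 8-byte slots, each ret pops the next slot and "jumps" to it, and gadgets are just labels. The address 0x400f30 and its "nop" gadget are invented for illustration.

```python
def run_ret_chain(stack, gadgets):
    """Follow a chain of ret instructions through a list of stack slots.

    stack   -- list of addresses, index 0 being the slot the first ret pops
    gadgets -- dict mapping a gadget's start address to a label
    """
    executed = []
    rsp = 0                                  # index of the slot the next ret pops
    while rsp < len(stack) and stack[rsp] in gadgets:
        target = stack[rsp]                  # ret pops the next 8-byte slot ...
        rsp += 1                             # ... which advances the stack pointer
        executed.append(gadgets[target])     # the gadget body runs, then hits its own ret
    return executed


GADGETS = {0x400F18: "movq %rax,%rdi", 0x400F30: "nop"}   # 0x400f30 is hypothetical
print(run_ret_chain([0x400F30, 0x400F18, 0x400F30], GADGETS))
# ['nop', 'movq %rax,%rdi', 'nop']
```

The model stops at the first slot that is not a known gadget address, which in a real attack would be the address of the function you ultimately want to reach.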
A gadget can make use of code corresponding to assembly-language statements generated by the compiler, especially ones at the ends of functions. In practice, there may be some useful gadgets of this form, but not enough to implement many important operations. For example, it is highly unlikely that a compiled function would have popq %rdi as its last instruction before ret. Fortunately, with a byte-oriented instruction set such as x86-64, a gadget can often be found by extracting patterns from other parts of the instruction byte sequence.
For example, one version of rtarget contains code generated for the following C function:
void setval_210(unsigned *p) {
    *p = 3347663060U;
}
The chances of this function being useful for attacking a system seem pretty slim. But, the disassembled machine code for this function shows an interesting byte sequence:
0000000000400f15 <setval_210>:
400f15: c7 07 d4 48 89 c7 movl $0xc78948d4,(%rdi)
400f1b: c3 retq
The byte sequence 48 89 c7 (at the end of the binary encoding of movl $0xc78948d4,(%rdi)) encodes the instruction movq %rax,%rdi. This sequence is followed by the byte value c3, which encodes the ret instruction. The function starts at address 0x400f15, and the sequence starts on the fourth byte of the function. Thus, this code contains a gadget, having a starting address of 0x400f18, that will copy the 64-bit value in register %rax to register %rdi.
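The address arithmetic in this example can be reproduced with a short search. The Python sketch below takes the seven bytes of setval_210 from the disassembly above and looks for the pattern 48 89 c7 c3; the helper name find_gadget is made up for illustration.

```python
def find_gadget(code, base, pattern):
    """Return the address of every occurrence of `pattern` inside `code`."""
    addrs = []
    start = code.find(pattern)
    while start != -1:
        addrs.append(base + start)           # function start + byte offset
        start = code.find(pattern, start + 1)
    return addrs


# Bytes of setval_210 as shown in the disassembly, starting at 0x400f15.
SETVAL_210 = bytes.fromhex("c7 07 d4 48 89 c7 c3")
MOVQ_RAX_RDI_RET = bytes.fromhex("48 89 c7 c3")

print([hex(a) for a in find_gadget(SETVAL_210, 0x400F15, MOVQ_RAX_RDI_RET)])
# ['0x400f18']
```

The pattern sits at byte offset 3 (the fourth byte), so the gadget address is 0x400f15 + 3 = 0x400f18, matching the text.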
Your code for rtarget contains a number of functions similar to the setval_210 function shown above in a region we refer to as the gadget farm. Your job will be to identify useful gadgets in the gadget farm and use these to perform attacks similar to those you did in Levels 2 and 3.
Important: The gadget farm is demarcated by functions start_farm and end_farm in your copy of rtarget. Do not attempt to construct gadgets from other portions of the program code.
Level 4: target_f1 in rtarget (5 points)
For Level 4, you will repeat an attack similar to Level 1: you only need to overwrite the return address to move control to target_f1 inside rtarget.
Level 5: target_f2 in rtarget (15 points)
For Level 5, you will repeat the attack of Level 2 to target_f2, but in the program rtarget using gadgets from your gadget farm.
You can construct your solution using gadgets consisting of the following instruction types, and using only the first eight x86-64 registers (%rax through %rdi).
- movq: The codes for these are shown below.
- popq: The codes for these are shown below.
- ret: This instruction is encoded by the single byte 0xc3.
- nop: This instruction (pronounced “no op,” which is short for “no operation”) is encoded by the single byte 0x90. Its only effect is to cause the program counter to be incremented by 1.
Some advice:
- All the gadgets you need can be found in the region of the code for rtarget demarcated by the functions start_farm and mid_farm.
- You can complete this attack with just two gadgets.
Level 6: target_f3 in rtarget (20 points)
Before starting Level 6, pause to consider what you have accomplished so far! In Levels 2 and 3, you caused a program to execute machine code of your own design. If ctarget had been a network server, you could have injected your own code into a distant machine. In Level 5, you circumvented two of the main protections used by modern systems to thwart buffer overflow attacks. Although you did not inject your own code, you were able to inject a type of program that operates by stitching together sequences of existing code.
Level 6 requires you to do an ROP attack on rtarget to invoke target_f3 with a pointer to a string representation of your magic cookie number.
That may not seem significantly more difficult than using an ROP attack to invoke target_f2, except that we have made it more difficult through address space randomization.
- To solve Level 6, you can use gadgets in the region of the code in rtarget demarcated by functions start_farm and end_farm. In addition to the gadgets used in Level 5, this expanded farm includes encodings of movl instructions shown below.
- The byte sequences in this part of the farm also contain 2-byte instructions that serve as functional nops, i.e., they do not change any register or memory values. These instructions, shown below, operate on the low-order bytes of some of the registers but do not change their values.
Some advice:
- You’ll want to review the effect of movl on the upper 4 bytes of a register (page 183 of the textbook).
- The official solution requires eight gadgets (not all of which are unique).
- Remember address space randomization means you probably can’t hard-code a pointer.
- There are some gadgets that might be useful to you as is (i.e., no hidden instructions… just call that function).
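As a reminder of the movl behavior mentioned in the first piece of advice: on x86-64, an instruction that writes a 32-bit register clears the upper 4 bytes of the corresponding 64-bit register, whereas a 1-byte write leaves the remaining bytes alone. The Python sketch below only models that register update rule with plain integers; the function names are made up for illustration.

```python
MASK64 = (1 << 64) - 1


def movl(dst_old, src):
    """Model a 32-bit register write: low 32 bits of src, upper 32 bits zeroed."""
    return src & 0xFFFFFFFF                  # dst_old is discarded entirely


def movb(dst_old, src):
    """Model a 1-byte register write: only the low byte of dst changes."""
    return (dst_old & ~0xFF & MASK64) | (src & 0xFF)


old = 0xFFFFFFFFFFFFFFFF
print(hex(movl(old, 0x1122334455667788)))    # 0x55667788
print(hex(movb(old, 0x00)))                  # 0xffffffffffffff00
```

This matters when a value passes through a movl gadget: anything stored in the upper half of the source register does not survive the copy.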
Using hex2raw
The program hex2raw takes as input a hex-formatted string. In this format, each byte value is represented by two hex digits. For example, the string “012345” could be entered in hex format as “30 31 32 33 34 35 00.” (Recall that the ASCII code for decimal digit x is 0x3x, and that the end of a string is indicated by a null byte.)
The hex characters you pass to hex2raw should be separated by whitespace (blanks or newlines). We recommend separating different parts of your attack string with newlines while you’re working on it. hex2raw supports C-style block comments, so you can mark off sections of your attack string. For example:
48 c7 c1 f0 11 40 00 /* mov $0x4011f0,%rcx */
Be sure to leave space around both the starting and ending comment delimiters (/* and */), so that the comments are ignored.
If you generate a hex-formatted attack string in the file attack.txt, you can apply the raw string to ctarget or rtarget in several different ways:
- You can set up a series of pipes to pass the string through hex2raw:
  $ cat attack.txt | ./hex2raw | ./ctarget
- You can store the raw string in a file and use I/O redirection:
  $ ./hex2raw < attack.txt > attack.raw
  $ ./ctarget < attack.raw
  This approach can also be used when running from within GDB:
  $ gdb ctarget
  (gdb) run < attack.raw
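To make the input format concrete, here is a minimal Python sketch of the conversion described in this section. It is an approximation written for illustration, not the actual hex2raw source, and the real tool may treat malformed input differently. It removes C-style block comments, splits on whitespace, and turns each two-digit token into one byte.

```python
import re


def hex_to_raw(text):
    """Convert hex-formatted text (with optional /* ... */ comments) to bytes."""
    without_comments = re.sub(r"/\*.*?\*/", " ", text, flags=re.DOTALL)
    out = bytearray()
    for token in without_comments.split():
        if len(token) != 2:
            raise ValueError(f"expected a two-digit hex value, got {token!r}")
        out.append(int(token, 16))           # raises ValueError on non-hex digits
    return bytes(out)


print(hex_to_raw("48 65 6c 6c 6f"))                               # b'Hello'
print(hex_to_raw("30 31 32 /* digits */\n33 34 35 00"))           # b'012345\x00'
```

A check like this is handy for confirming the length of an attack string before running it against a target.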
Generating Binary Instructions
For code-injection attacks, you need to save binary instructions on the stack. But how can you figure out the binary encoding of your attack instructions?
Using gcc as an assembler and objdump as a disassembler makes it convenient to generate the byte codes for instruction sequences. For example, suppose you write a file example.s containing the following assembly code:
pushq $0xabcdef # Push value onto stack
addq $17,%rax # Add 17 to %rax
movl %eax,%edx # Copy lower 32 bits to %edx
The code can contain a mixture of instructions and data. Anything to the right of a “#” character is a comment. You can now assemble and disassemble this file:
$ gcc -c example.s
$ objdump -d example.o > example.d
The generated file example.d contains the following:
$ objdump -d example.o
example.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <.text>:
0: 68 ef cd ab 00 pushq $0xabcdef
5: 48 83 c0 11 add $0x11,%rax
9: 89 c2 mov %eax,%edx
The lines at the bottom show the machine code generated from the assembly language instructions. Each line has a hexadecimal number on the left indicating the instruction’s starting address (starting with 0), while the hex digits after the “:” character indicate the byte codes for the instruction. Thus, we can see that the instruction pushq $0xabcdef has hex-formatted byte code 68 ef cd ab 00.
From this file, you can get the byte sequence for the code: 68 ef cd ab 00 48 83 c0 11 89 c2.
This string can then be passed through hex2raw to generate an input string for the target programs. Alternatively, you can edit example.d to omit extraneous values and to contain C-style comments for readability, yielding:
68 ef cd ab 00 /* pushq $0xabcdef */
48 83 c0 11 /* add $0x11,%rax */
89 c2 /* mov %eax,%edx */
This is also a valid input you can pass through hex2raw before sending to one of the target programs.
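The manual editing of example.d can also be scripted. The Python sketch below is one possible approach, assuming the listing looks like the example above (an address, a colon, byte values, then the instruction text); the function name objdump_to_hex is hypothetical, and unusual objdump output may need a different pattern.

```python
import re

# address ':' whitespace, then two-digit byte values, then the instruction text
LINE = re.compile(r"^\s*[0-9a-f]+:\s+((?:[0-9a-f]{2}(?:\s+|$))+)(.*)$")

LISTING = """\
0000000000000000 <.text>:
0: 68 ef cd ab 00 pushq $0xabcdef
5: 48 83 c0 11 add $0x11,%rax
9: 89 c2 mov %eax,%edx
"""


def objdump_to_hex(listing):
    """Turn objdump -d instruction lines into commented hex2raw input lines."""
    out = []
    for line in listing.splitlines():
        match = LINE.match(line)
        if not match:
            continue                         # skip headers and blank lines
        byte_text = " ".join(match.group(1).split())
        insn = " ".join(match.group(2).split())
        out.append(f"{byte_text} /* {insn} */" if insn else byte_text)
    return out


print("\n".join(objdump_to_hex(LISTING)))
```

For the sample listing this prints the same three commented lines shown above, with a space on each side of the comment delimiters as hex2raw requires.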
Acknowledgements. This lab was developed by the authors of the course textbook and their staff. It has been customized for use by this course.