.PLT and .GOT – the key to code sharing and dynamic libraries

This text was found here: http://www.technovelty.org/linux/pltgot.html

The shared library is an integral part of a modern system, but often the mechanisms behind the implementation are less well understood. There are, of course, many guides to this sort of thing. Hopefully this adds another perspective that resonates with someone.

Let’s start at the beginning: relocations are entries in binaries that are left to be filled in later, either at link time by the toolchain linker or at runtime by the dynamic linker. A relocation in a binary is a descriptor which essentially says “determine the value of X, and put that value into the binary at offset Y”. Each relocation has a specific type, defined in the ABI documentation, which describes exactly how “determine the value of” is actually determined.

Here’s the simplest example:

$ cat a.c
extern int foo;

int function(void) {
    return foo;
}

$ gcc -c a.c 
$ readelf --relocs ./a.o  

Relocation section '.rel.text' at offset 0x2dc contains 1 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000004  00000801 R_386_32          00000000   foo
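To make the “determine the value of X, and put it at offset Y” idea concrete, here is a minimal sketch in C of what applying a relocation like the R_386_32 entry above amounts to. The function name `apply_abs32` and the in-memory buffer are hypothetical, invented for illustration; a real linker does far more bookkeeping. It assumes the i386 REL convention, where the addend is already stored in the field being patched and the result is symbol address plus addend, and it uses the host byte order for simplicity.

```c
#include <stdint.h>
#include <string.h>

/*
 * Apply one R_386_32-style relocation (illustrative only).
 * The 32-bit field at `offset` already holds the addend A;
 * the patched value is S + A, where S is the symbol's address.
 */
void apply_abs32(uint8_t *section, size_t offset, uint32_t sym_addr)
{
    uint32_t addend;
    uint32_t value;

    memcpy(&addend, section + offset, sizeof addend);  /* read A in place */
    value = sym_addr + addend;                         /* compute S + A   */
    memcpy(section + offset, &value, sizeof value);    /* patch offset Y  */
}
```

With the readelf output above, offset would be 0x4 and the symbol would be foo, whose final address is only known once the linker has laid out the program.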


Understanding Linux ELF RTLD internals

This text was found here: http://s.eresi-project.org/inc/articles/elf-rtld.txt

/*
Last update Sun Dec 22 06:55:39 2002 mayhem

- Version 0.1 May 2001
- Version 0.2 .::. 2002 (WIP) : 
  - Added stuff about rtld relocation .
  - Added stuff about rtld symbol resolution .
  - Various fixes and some links added .

This draft remained unreleased for one year, most of it is based on the 
glibc-2.2.3 implementation, information about the subject has been
disclosed on bugtraq and phrack in beg 2002 :

http://online.securityfocus.com/archive/1/274283/2002-05-29/2002-06-04/2


http://www.phrack.org/phrack/59/p59-0x08.txt

However, it still contains some kewl info, I'll try to keep it updated, 
hope this will help . I am also adding/clearing/correcting stuffs (and
giving credits) on demand, so dont hesitate to send comments, etc .
 
/mM [mayhem at devhell dot org]
*/

		Understanding Linux ELF RTLD internals
		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Most of this paper was written from a security perspective; your
comments are always welcome.

There is plenty of ELF documentation available by now, but most of it
is related to virus coding or backdooring. To be honest, I never found
any documentation on the dynamic linking sources, and that's why I wrote
this one. At times it reads more like an internal ld.so reference, or
a commented review of the ELF dynamic linking implementation in ld-linux.so.

It should be useful, since dynamic linking is one of the worst
documented parts of the Linux operating system. I also decided to write
a (tiny) chapter on the ELF kernel handling code, because it is
really necessary to know some kernel level details (like the stack
initialisation) to understand the whole interpreting process.

You can find the latest glibc sources on the GNU FTP server:

ftp://ftp.gnu.org/pub/gnu/glibc/

If you do not know anything about ELF, you should read the reference first:

http://x86.ddj.com/ftp/manuals/tools/elf.pdf

Want to know more ? Go on !


     O] Prologue
		    A) Kernel handling code 
		    B) Introducing glibc macros
     1] Dynamic linker implementation
		    A) Sources graphics
		    B) The link_map structure explained
		    C) Relocating the interpreter
		    D) Runtime GOT relocation
		    E) Symbol resolution
     2] FAQ, thanks and references


TODO	:
		    X) Stack information gathering 
		    X) SHT_DYNAMIC information gathering
		    X) PHT interpreting 
		    X) Loading shared libraries 
		    X) Shared libraries relocation 



An Emulator Writer’s HOWTO for Static Binary Translation

This is a very interesting article that I found at: http://www.gtoal.com/sbt/. It is a practical article showing how to craft a simple static binary translator and emulator.

There is a lot of Computer Science literature on binary translation, both of the sexy dynamic variety and the slightly duller (from the CS point of view) static variety. However most of what you’ll find when you research the subject is rather dry and somewhat academic. I don’t know of any accessible primers on static translation that someone from the emulation world can pick up and use in a practical project.
So… the aim of this HOWTO document is to give you a very practical method that you can adapt to your own area of expertise, which should pretty much guarantee that if you have already written or worked on a reasonable emulator, you will be able to write a translator which will take a specific program and generate a binary from it that runs on a different architecture, and does so faster than any emulator that you’re used to using.

And that’s why we’re doing this: you have a favorite old video game that you want to run on some system and the emulator for that system works but it just isn’t fast enough. Or perhaps you want to port an emulator which works OK on a 300MHz Pentium to a 16MHz ARM in your PDA. A straight port runs your favourite game at 2FPS and you don’t think the hardware is likely to get 25 times faster in the near future! Play your cards right and you may get that factor of 25 out of a static binary translator.

This document tries to explain things simply – perhaps too simply at times, and there are a lot of examples which differ from the previous example in small details. This makes the document rather long, but that’s deliberate; I don’t want to skip too many stages and risk having anyone lose track of the process. Apologies in advance to those of you who think I’m taking it too slowly or have made this document too long.


The C++ ‘const’ Declaration: Why & How

This text was found here: http://duramecho.com/ComputerInformation/WhyHowCppConst.html

The ‘const’ system is one of the really messy features of C++.

It is simple in concept: variables declared with ‘const’ added become constants and cannot be altered by the program. However it is also used to bodge in a substitute for one of the missing features of C++ and there it gets horridly complicated and sometimes frustratingly restrictive. The following attempts to explain how ‘const’ is used and why it exists.

Simple Use of ‘const’

The simplest use is to declare a named constant. This was available in the ancestor of C++, C.

To do this, one declares a constant as if it were a variable but adds ‘const’ before it. One has to initialise it immediately in the declaration because, of course, one cannot set the value later as that would be altering it. For example,

const int Constant1=96;

will create an integer constant, unimaginatively called ‘Constant1’, with the value 96.

Such constants are useful for parameters which are used in the program but do not need to be changed after the program is compiled. A constant has an advantage for programmers over the C preprocessor ‘#define’ command in that it is understood & used by the compiler itself, not just substituted into the program text by the preprocessor before reaching the main compiler, so error messages are much more helpful.

It also works with pointers but one has to be careful where ‘const’ is put as that determines whether the pointer or what it points to is constant. For example,

const int * Constant2

declares that Constant2 is a variable pointer to a constant integer and

int const * Constant2

is an alternative syntax which does the same, whereas

int * const Constant3

declares that Constant3 is a constant pointer to a variable integer and

int const * const Constant4

declares that Constant4 is a constant pointer to a constant integer. Basically ‘const’ applies to whatever is on its immediate left (unless there is nothing there, in which case it applies to whatever is on its immediate right).
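The following is a small sketch of what the compiler will and will not accept for each of these pointer declarations. The function name `const_pointer_demo` and the variable names are made up for illustration, and the lines that would be rejected are left commented out. The code is valid as both C and C++.

```c
/* Illustrative only: shows which operations each declaration permits. */
int const_pointer_demo(void)
{
    int a = 96, b = 7;

    const int *p1 = &a;        /* variable pointer to constant integer  */
    p1 = &b;                   /* OK: the pointer itself may be changed */
    /* *p1 = 1; */             /* rejected: the integer is const via p1 */

    int *const p3 = &a;        /* constant pointer to variable integer  */
    *p3 = 100;                 /* OK: the integer may be changed        */
    /* p3 = &b; */             /* rejected: the pointer is const        */

    const int *const p4 = &b;  /* constant pointer to constant integer  */
    /* neither p4 nor *p4 may be assigned to */

    return *p1 + *p3 + *p4;    /* 7 + 100 + 7 */
}
```

Uncommenting either of the rejected lines should produce a compile-time error, which is the whole point of the declaration.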


Linkers and Loaders

This is an excellent (!!!) article describing in general terms how the process of linking (static and dynamic) and loading ELF programs on Linux works. This is a very valuable article.

The original is found here: http://www.linuxjournal.com/article/6463?page=0,0

Discussing how compilers, linkers and loaders work and the benefits of shared libraries.
Linking is the process of combining various pieces of code and data together to form a single executable that can be loaded in memory. Linking can be done at compile time, at load time (by loaders) and also at run time (by application programs). The process of linking dates back to the late 1940s, when it was done manually. Now, we have linkers that support complex features, such as dynamically linked shared libraries. This article is a succinct discussion of all aspects of linking, ranging from relocation and symbol resolution to supporting position-independent shared libraries. To keep things simple and understandable, I target all my discussions to ELF (executable and linking format) executables on the x86 architecture (Linux) and use the GNU compiler (GCC) and linker (ld). However, the basic concepts of linking remain the same, regardless of the operating system, processor architecture or object file format being used.

Yet another oprofile tutorial

This text was found here: http://ssvb.github.com/2011/08/23/yet-another-oprofile-tutorial.html

Recently it came as a surprise to me that many people don’t know how to use oprofile efficiently when working on performance optimizations. I’m not going to duplicate the oprofile manual here in detail, but will at least try to explain some basic usage.

A bit of theory

Oprofile does its magic by using statistical sampling. The processor gets interrupted at regular intervals (the interrupts happen after a certain amount of time has elapsed, or after some hardware performance counter has accumulated a certain number of events) and the oprofile driver identifies which code had control at that moment. The part of the code which was ‘lucky’ enough to be interrupted by oprofile gets an oprofile sample attributed to it. The parts of the code which take a lot of execution time are naturally more likely to accumulate many oprofile samples. In fact, the number of oprofile samples collected for a function tends to be directly proportional to the execution time taken by that function. All of this is somewhat similar to the Monte Carlo method.

The collection of samples done by oprofile for each individual function is a Poisson process. The standard deviation of a Poisson distribution is the square root of its mean, which we estimate by the number of samples. So the more samples are collected, the lower the relative error. The following diagram shows the confidence intervals for a normal distribution (because a Poisson distribution is approximately normal for a large number of samples):

Standard_deviation_diagram.svg from wikipedia, created by Petter Strandmark and licensed under CC BY 2.5

Using the 3-sigma rule, we can be fairly confident that the actual time spent in each function (measured in oprofile samples) is within an interval of plus or minus 3*sqrt(N) around the reported value, where N is the number of samples reported by oprofile for that function.
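As a worked example of the 3-sigma estimate, here is a small sketch in C. The function names are hypothetical and not part of oprofile. A simple truncating integer square root is used so that the code does not depend on the math library. It computes the half-width of the interval in samples and the same quantity as a percentage of N.

```c
/* Integer square root: the largest r such that r*r <= n. */
unsigned long isqrt(unsigned long n)
{
    unsigned long r = 0;

    while ((r + 1) * (r + 1) <= n)
        r++;
    return r;
}

/* Half-width of the 3-sigma interval, in samples, for n reported samples. */
unsigned long three_sigma(unsigned long n)
{
    return 3 * isqrt(n);
}

/* The same half-width expressed as a whole percentage of n (0 if n is 0). */
unsigned long three_sigma_percent(unsigned long n)
{
    return n ? (100 * three_sigma(n)) / n : 0;
}
```

For a function with 10000 samples this gives an interval of plus or minus 300 samples, or about 3%. With only 100 samples the interval is plus or minus 30 samples, or 30%, so comparisons based on such small counts deserve suspicion.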


IA32 Machine Language

This text was found here: http://brokenthorn.com/Resources/OSDevX86.html

Introduction

This chapter covers IA32 machine language programming. The information provided here is for information purposes only and is not needed for the development of a basic operating system or executive software. Understanding the instruction format for the IA32 (and IA64) instructions can help debugging improperly assembled instructions, v86 monitors that are required for supporting v8086 mode, emulating instructions (which is required for emulating certain FPU instructions or when developing assemblers, emulators, virtual machines, and some other types of software), and when developing certain system software like debuggers and compilers.

This chapter is also for testing a new editor being used to write the new chapters that should help improve formatting and resolve spelling errors. If this test is successful, all of the new and earlier chapters will be updated to reflect the new format. Please send any feedback if you encounter any errors.

Machine Language Overview

Machine language, also known as machine code, native code, and byte code, is the set of raw instructions and data that can be executed by a central processing unit (CPU). It allows a CPU to interpret a certain set of byte sequences as an “instruction” to perform a task. These tasks are very small, such as copying small amounts of data or performing arithmetic. The act of building a byte sequence that represents a CPU instruction is known as coding. The definition of coding has evolved as programming languages evolved. Originally the term referred to the actual coding of the byte sequence for an instruction; today it applies to many forms of programming in second, third, and fourth generation programming languages. Computer programs, also known as software, are collections of machine code and data that perform a complicated task, such as word processing or playing Halo®. Machine language is often described by popular media as a “series of 1’s and 0’s”. This is an accurate description, to an extent.
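To illustrate what “coding a byte sequence” means in practice, here is a minimal sketch in C that builds the five bytes of one IA32 instruction, `mov eax, imm32`. The helper name `encode_mov_eax_imm32` is invented for this example. The encoding itself (opcode 0xB8 followed by the 32-bit immediate in little-endian order) is the standard one for this instruction.

```c
#include <stdint.h>

/*
 * Encode "mov eax, imm32" into out[0..4]:
 *   byte 0     opcode 0xB8 (mov r32, imm32 with eax as the register)
 *   bytes 1-4  the immediate value, least significant byte first
 */
void encode_mov_eax_imm32(uint32_t imm, uint8_t out[5])
{
    out[0] = 0xB8;
    out[1] = (uint8_t)(imm & 0xFF);
    out[2] = (uint8_t)((imm >> 8) & 0xFF);
    out[3] = (uint8_t)((imm >> 16) & 0xFF);
    out[4] = (uint8_t)((imm >> 24) & 0xFF);
}
```

Encoding the value 4 this way yields the bytes B8 04 00 00 00, which is what an assembler would emit for `mov eax, 4`.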


How debugger works

This text was found here http://www.alexonlinux.com/how-debugger-works

Introduction

In this article, I’d like to tell you how a real debugger works: what happens under the hood and why it happens. We’ll even write our own small debugger and see it in action.

I will talk about Linux, although the same principles apply to other operating systems. Also, we’ll talk about the x86 architecture, because it is the most common architecture today. On the other hand, even if you’re working with another architecture, you will find this article useful because, again, the same principles work everywhere.

Kernel support

Actual debugging requires operating system kernel support and here’s why. Think about it. We’re living in a world where one process reading memory belonging to another process is a serious security vulnerability. Yet, when debugging a program, we would like to access memory that is part of the debugged process’s (debuggie’s) memory space from the debugger process. It is a bit of a problem, isn’t it? We could, of course, try somehow to use the same memory space for both debugger and debuggie, but then what if the debuggie itself creates processes? This really complicates things.

Debugger support has to be part of the operating system kernel. The kernel is able to read and write memory that belongs to each and every process in the system. Furthermore, as long as a process is not running, the kernel can see the values of its registers, and the debugger has to be able to know the values of the debuggie’s registers. Otherwise it won’t be able to tell you where the debuggie has stopped (when we press CTRL-C in gdb, for instance).

In describing where debugger support starts, we have already mentioned several of the features that we need in order to have debugging support in the operating system. We don’t want just any process to be able to debug other processes. Someone has to monitor debuggers and debuggies. Hence the debugger has to tell the kernel that it is going to debug a certain process, and the kernel has to either permit or deny this request. Therefore, we need the ability to tell the kernel that a certain process is a debugger and that it is about to debug another process. We also need the ability to query and set values in the debuggie’s memory space. And we need the ability to query and set the values of the debuggie’s registers when it stops.

And the operating system lets us do all this. Each operating system does it in its own manner, of course. Linux provides a single system call named ptrace() (declared in sys/ptrace.h), which allows us to do all these operations and much more.

ptrace()

ptrace() accepts four arguments. The first is one of the values from enum __ptrace_request, which is defined in sys/ptrace.h. This argument specifies what operation we would like to do, whether it is reading the debuggie’s registers or altering values in its memory. The second argument specifies the pid of the debuggie process. It’s not very obvious, but a single process can debug several other processes. Thus we have to tell exactly which process we’re referring to. The last two arguments are optional arguments for the call.

Starting to debug

One of the first things a debugger does to start debugging a certain process is attaching to it or running it. There is a ptrace() operation for each of these cases.

The first, called PTRACE_TRACEME, tells the kernel that the calling process wants its parent to debug it. I.e. me calling ptrace( PTRACE_TRACEME ) means I want my dad to debug me. This comes in handy when you want the debugger process to spawn the debuggie. In this case you call fork() to create a new process, then ptrace( PTRACE_TRACEME ) in the child, and then exec() or execve().

The second operation is called PTRACE_ATTACH. It tells the kernel that the calling process should become the debugging parent of the process named in the call. Debugging parent means debugger and a parent process.

Debugger-debuggie synchronization

Alright. Now we have told the operating system that we are going to debug a certain process. The operating system made it our child process. Good. This is a great time for us to have the debuggie stopped while we do our preparations before we actually start to debug. We may want to, for instance, analyze the executable that we run and place breakpoints before we actually start debugging. So, how do we stop the debuggie and let the debugger do its thing?

The operating system does that for us using signals. Actually, the operating system notifies us, the debugger, about all kinds of events that occur in the debuggie, and it does all that with signals. This includes the “debuggie is ready to shoot” signal. In particular, if we attach to an existing process it receives SIGSTOP and we receive SIGCHLD once it actually stops. If we spawn a new process and it did ptrace( PTRACE_TRACEME ), it will receive a SIGTRAP signal once it attempts to exec() or execve(). We will be notified with SIGCHLD about this, of course.
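The fork(), PTRACE_TRACEME, exec() sequence and the resulting stop can be sketched as follows. This is only a minimal illustration, not the debugger the article goes on to build. The function name `trace_first_stop` is hypothetical, error handling is reduced to the bare minimum, and it assumes a Linux system where the process is permitted to use ptrace().

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Spawn `path` as a traced child and return the signal that stopped it
 * at its first exec (expected to be SIGTRAP), or -1 on any failure.
 */
int trace_first_stop(const char *path)
{
    int status = 0;
    int sig;
    pid_t child = fork();

    if (child < 0)
        return -1;

    if (child == 0) {
        /* Debuggie: ask to be traced by the parent, then exec. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(127);
        execl(path, path, (char *)NULL);
        _exit(127);                     /* reached only if exec failed */
    }

    /* Debugger: block until the kernel reports a change in the child. */
    if (waitpid(child, &status, 0) != child)
        return -1;
    if (!WIFSTOPPED(status))
        return -1;                      /* child exited instead of stopping */
    sig = WSTOPSIG(status);

    /* Resume the child and reap it; stop looping once it has exited. */
    while (ptrace(PTRACE_CONT, child, NULL, NULL) == 0 &&
           waitpid(child, &status, 0) == child &&
           WIFSTOPPED(status))
        ;
    return sig;
}
```

Here waitpid() is used to collect the notification synchronously instead of installing a SIGCHLD handler; a real debugger would do its preparations (for example placing breakpoints) between the first waitpid() and the PTRACE_CONT.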


Linux Assembler Tutorial – by Robin Miyagi

========================================================================
			LINUX ASSEMBLER TUTORIAL

				   by

			      Robin Miyagi
			      
				   @

http://www.geocities.com/SiliconValley/Ridge/2544/

========================================================================
	   
start@: Thu Feb 03 02:14:37 UTC 2000

update: Fri Jul 30 23:52:23 UTC 2000

update: Fri Sep 15 22:39:17 UTC 2000 :

    - This tutorial  now explains  Linux assembler in  terms of  the GNU
      assembler `as'.

    - Information  about  Binutils programs  such  as  Objdump, and  ld.
      Discussion     on     Debugging     and    `gdb'     is     added.

update: Thu Jan 11 20:13:06 UTC 2001 :
      
========================================================================

* Introduction
------------------------------------------------------------------------

    When programming in  assembler for Linux (or any  other Unix variant
    for  that matter),  it  is important  to  remember that  Linux is  a
    protected mode  operating system  (on i386 machines,  Linux operates
    the  CPU in  protected mode).   This means  that ordinary  user mode
    processes are not allowed to  do certain things, such as access DMA,
    or access IO ports.  Linux kernel modules, on the other hand
    (which operate in kernel mode), are allowed to access hardware
    directly (read the Assembler-HOWTO on my assembler page for more
    information on this issue).  User mode processes may access hardware
    using  device files.   Device files  actually access  kernel modules
    which  access hardware directly.   This file  will be  restricted to
    user mode operation.  See my pages on kernel module programming.

    Please email me comments  and suggestions regarding this tutorial at
    penguin@dccnet.com .

* System Calls
------------------------------------------------------------------------

    In  programming  in assembler  for  DOS  you  probably made  use  of
    software interrupts,  especially the  int 0x21 functions  which were
    the DOS system calls.  In Linux, system calls are made via int 0x80.
    The system call number is passed via register EAX, and the parameters
    to the system call are passed via the remaining registers.  This
    discussion only applies if there are no more than five parameters
    passed to the system call.  If there are more than five parameters,
    they must be located in memory (e.g. on the stack), and EBX must
    contain the address of the beginning of the parameters.

    If you  would like a  list of the  system call numbers, look  at the
    contents   of   /usr/include/asm/unistd.h.    If  you   would   like
    information about a specific system  call (e.g. write ()), type `man
    2 write'  at the prompt.   Section 2 of  the linux man  pages covers
    system calls.

    If you  look at the contents of  /usr/include/asm/unistd.h, you will
    see the following line near the top of the file;

    #define __NR_write		4

    This indicates that  register EAX must be set to 4  in order to call
    the  write  () system  call.   Now,  if  you execute  the  following
    command;

    $ man 2 write

    you  get  the following  function  description  (under the  SYNOPSIS
    heading).

    ssize_t write(int fd, const void *buf, size_t count);

    This indicates that ebx is equal  to the file descriptor of the file
    you want  to write to, ecx  is a pointer  of the string you  want to
    write, and edx  contains the length of the string.   If there were 2
    more parameters  to this system call,  they would be  placed in esi,
    and edi respectively.
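    For readers more comfortable with C, the same idea (a system call
    number plus its parameters, with no libc write () wrapper in
    between) can be sketched with the generic syscall () function that
    glibc provides.  The name `raw_write' is made up for this example.
    Note that the constant SYS_write is 4 on i386, as in the tutorial,
    but has a different value on other architectures, which is why the
    symbolic name is used instead of the number.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Invoke the write system call directly by number.
 * On i386 the kernel would see: eax = SYS_write, ebx = fd,
 * ecx = buf, edx = count.
 */
long raw_write(int fd, const void *buf, size_t count)
{
    return syscall(SYS_write, fd, buf, count);
}
```

    Calling raw_write (1, "Hello\n", 6) writes the string to stdout and
    returns the number of bytes written, just as write () would.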

    How do I know that the file descriptor for stdout is 1?  If you look
    at your /dev directory, you will notice that /dev/stdout is a
    symbolic link that points to /proc/self/fd/1.  Therefore stdout is
    file descriptor 1.

    I leave looking up the _exit system call as an exercise.
    
    In linux, system calls are processed by the kernel.

* GNU Assembler
------------------------------------------------------------------------

    On  most Linux systems,  you will  usually find  the GNU  C compiler
    (gcc).  This compiler  uses an assembler called `as'  as a back-end.
    This means that the C compiler translates the C code into assembler,
    which in turn is assembled by `as' to an object file (*.o).

    `As'  uses  the AT&T  syntax.   Experienced  intel syntax  assembler
    programmers find  AT&T `really weird'.  It  is really no  more or no
    less difficult than intel syntax.  I switched over to `as' because
    there is less ambiguity, and because it works better with the
    standard GNU/Linux programs such as gdb (which supports the gstabs
    format) and objdump (which disassembles code in `as' syntax).  In
    short, it is a standard
    component of a GNU Linux system with programming tools installed.  I
    will explain debugging and objdump later in this tutorial.

    If  you would  like more  information about  `as' look  in  the info
    documentation under  as (e.g. type  `info as' at the  shell prompt).
    Also look  in the info  documentation on the Binutils  package (this
    package contains such programming tools as objdump, ld, etc.).

    
** GNU assembler v.s. Intel Syntax
------------------------------------------------------------------------

    Since most assembler documentation  for the i386 platform is written
    using  intel syntax,  some comparison  between the  2 formats  is in
    order.  Here is a summarized list of the differences;

      - In `as' the source comes before the destination, opposite to
        the intel syntax.

      - The opcodes are suffixed with a letter indicating the size of
        the operands (e.g. `l' for dword, `w' for word, `b' for byte).

      - Immediate values must be prefixed with a `$', and registers must
        be prefixed with a `%'.

      - Effective      addresses     use     the      General     syntax
        DISP(BASE,INDEX,SCALE).  A concrete example would be;

	    movl mem_location(%ebx,%ecx,4), %eax

	Which is equivalent to the following in intel syntax;

	    mov eax, [ebx + ecx*4 + mem_location]
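    The address arithmetic behind DISP(BASE,INDEX,SCALE) can be written
    out as a small C sketch.  The function name `effective_address' is
    made up for this example; the CPU performs this computation in
    hardware, with 32 bit wrap-around, before the memory access.

```c
#include <stdint.h>

/*
 * Compute the address referenced by DISP(BASE,INDEX,SCALE):
 *   address = DISP + BASE + INDEX * SCALE
 * SCALE may only be 1, 2, 4 or 8 in a real instruction.
 */
uint32_t effective_address(uint32_t disp, uint32_t base,
                           uint32_t index, uint32_t scale)
{
    return disp + base + index * scale;
}
```

    For mem_location(%ebx,%ecx,4) with mem_location at 0x1000, %ebx
    holding 0x20 and %ecx holding 3, the address is 0x1000 + 0x20 + 3*4,
    which is 0x102c.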

    Now  for an example  illustrating the  difference (intel  version in
    comments);

	movl %eax, %ebx		# mov ebx, eax
	movw $0x3c4a, %ax	# mov ax, 0x3c4a
    
    Now for our little program;
------------------------------------------------------------------------   
	## hello-world.s

	## by Robin Miyagi
	## http://www.geocities.com/SiliconValley/Ridge/2544/

	## Compile Instructions:
	## -------------------------------------------------------------
	## as -o hello-world.o hello-world.s
	## ld -o hello-world -O0 hello-world.o

	## This  file is  a basic  demonstration of  the  GNU assembler,
	## `as'.
	
	## This program  displays a friendly string on  the screen using
	## the write () system call
########################################################################
	.section .data
hello:	
	.ascii 	"Hello, world!\n"
hello_len:
	.long 	. - hello
########################################################################
	.section .text
	.globl _start
	
_start:
	## display string using write () system call
	movl $4, %eax		# write () system call
	xorl %ebx, %ebx		# %ebx = 0
	incl %ebx		# %ebx = 1, fd = stdout
	leal hello, %ecx	# %ecx ---> hello
	movl hello_len, %edx	# %edx = count
	int $0x80		# execute write () system call
	
	## terminate program via _exit () system call 
	xorl %eax, %eax		# %eax = 0
	incl %eax		# %eax = 1 system call _exit ()
	xorl %ebx, %ebx		# %ebx = 0 normal program return code
	int $0x80		# execute system call _exit ()

------------------------------------------------------------------------

    In the above program, notice the use of `#' to start comments.  `As'
    also supports the `/* C comment */' syntax.  If you use the C comment
    syntax, it works exactly the same as for C (multiple lines, as well
    as inline commenting).  I always use the `#' comment syntax, as this
    works better with emacs' asm-mode.  The double `##' is allowed but
    not necessary (this is only because of a quirk of emacs asm-mode).

    Notice the names of the sections .text, and .data.  These are used
    in ELF files to tell the linker where the code and data segments
    are.  There is also the .bss section to store uninitialized data.
    It is only these sections that occupy memory during program
    execution.


* Accessing Command Line Arguments and Environment Variables
------------------------------------------------------------------------

    When an  ELF executable starts  running, the command  line arguments
    and environment variables are  available on the stack.  In assembler
    this means that  you may access these via the  pointer stored in ESP
    when  the program  starts execution.   See the  documentation  on my
    assembler programming page relating to the ELF binary format.

    So how is this data arranged on the stack?  Quite simple really.
    The number of command line arguments (including the name of the
    program) is stored as an integer at [esp].  Then, at [esp+4], a
    pointer to the first command line argument (which is the name of the
    program) is stored.  If there were any additional command line
    parameters, their pointers would be stored in [esp+8], [esp+12],
    etc.  After all the command line argument pointers comes a NULL
    pointer.  After the NULL pointer are all the pointers to the
    environment variables, and then finally a NULL pointer to indicate
    that the end of the environment variables has been reached.

    A summary of the initial ELF stack is shown below;

    (%esp)	 argc, count of arguments (integer)
    4(%esp)	 char *argv (pointer to first command line argument)
       ...	 pointers to the rest of the command line arguments
    ?(%esp)	 NULL pointer
       ...	 pointers to environment variables
    ??(%esp)	 NULL pointer
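    The layout above can also be expressed as a short C sketch that
    walks such a block of pointer-sized slots.  The structure and
    function names are hypothetical, and the sketch operates on an
    array handed to it, since a normal C program never sees the raw
    initial stack pointer (the C startup code does this work before
    main () is called).

```c
#include <stdint.h>

/* The three things found on the initial stack, in order. */
struct initial_stack {
    long   argc;    /* count of arguments                */
    char **argv;    /* argv[0] .. argv[argc-1], NULL     */
    char **envp;    /* environment pointers, then NULL   */
};

/* `sp' points at the slot holding argc, i.e. what %esp holds at _start. */
struct initial_stack parse_initial_stack(char **sp)
{
    struct initial_stack s;

    s.argc = (long)(intptr_t)sp[0];   /* (%esp): argc stored as an integer */
    s.argv = sp + 1;                  /* 4(%esp): first argument pointer   */
    s.envp = s.argv + s.argc + 1;     /* skip the NULL that ends argv      */
    return s;
}
```

    The environment pointers start argc + 1 slots after argv[0], which
    is why the assembler loop has to step over the first NULL pointer to
    reach them.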

    Now for our little program;
------------------------------------------------------------------------
	## stack-param.s ###############################################

	## Robin Miyagi ################################################
	## http://www.geocities.com/SiliconValley/Ridge/2544/ ##########

	## This file  shows how one  can access command  line parameters
	## via the stack at process  start up.  This behavior is defined
	## in the ELF specification.

	## Compile Instructions:
	## -------------------------------------------------------------
	## as -o stack-param.o stack-param.s
	## ld -O0 -o stack-param stack-param.o
########################################################################
	.section .data

new_line_char:
	.byte 0x0a
########################################################################
	.section .text

	.globl _start

	.align 4
_start:
	movl %esp, %ebp		# store %esp in %ebp
again:
	addl $4, %esp		# %esp ---> next parameter on stack
	movl (%esp), %eax	# move next parameter into %eax
	testl %eax, %eax	# %eax (parameter) == NULL pointer?
	jz end_again		# get out of loop if yes
	call putstring		# output parameter to stdout.
	jmp again		# repeat loop
end_again:
	xorl %eax, %eax		# %eax = 0
	incl %eax		# %eax = 1, system call _exit ()
	xorl %ebx, %ebx		# %ebx = 0, normal program exit.
	int $0x80		# execute _exit () system call

	## prints string to stdout
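	.type putstring, @function
putstring:
	pushl %ebp		# save caller's frame pointer
	movl %esp, %ebp		# set up our own frame
	movl 8(%ebp), %ecx	# %ecx ---> string to print
	xorl %edx, %edx		# %edx = 0, character count
count_chars:
	movb (%ecx,%edx,1), %al	# %al = next character
	testb %al, %al		# end of string (NUL byte)?
	jz done_count_chars	# yes, stop counting
	incl %edx		# no, count this character
	jmp count_chars
done_count_chars:
	movl $4, %eax		# write () system call
	xorl %ebx, %ebx		# %ebx = 0
	incl %ebx		# %ebx = 1, fd = stdout
	int $0x80		# write string (%ecx, %edx already set)
	movl $4, %eax		# write () system call
	leal new_line_char, %ecx	# %ecx ---> newline character
	xorl %edx, %edx		# %edx = 0
	incl %edx		# %edx = 1, count
	int $0x80		# write the newline
	movl %ebp, %esp		# restore stack pointer
	popl %ebp		# restore caller's frame pointer
	ret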
		
------------------------------------------------------------------------

* The Binutils Package
------------------------------------------------------------------------

    Binutils stands  for binary utilities,  and includes a lot  of tools
    useful to programmers, especially during debugging.

    I will now address some of these utilities.

** Objdump
------------------------------------------------------------------------

    Objdump displays information about one or more object files.  For
    example, to see information about stack-param, type the following
    command at the shell prompt (be sure the working directory contains
    stack-param);

        objdump -x stack-param | less

    Since the information is likely to span more than one screen, the
    output of objdump is piped to the standard input of the paging
    command `less'.  The option `-x' tells objdump to display all
    available header information (the file, program, and section
    headers, plus the symbol table).  Here is the output of the above
    command;

        ----------------------------------------------------------------
	stack-param:     file format elf32-i386
	stack-param
	architecture: i386, flags 0x00000112:
	EXEC_P, HAS_SYMS, D_PAGED
	start address 0x08048074
	
	Program Header:
	    LOAD off    0x00000000 vaddr 0x08048000 paddr 0x08048000 align 2**12
		 filesz 0x000000be memsz 0x000000be flags r-x
	    LOAD off    0x000000c0 vaddr 0x080490c0 paddr 0x080490c0 align 2**12
		 filesz 0x00000001 memsz 0x00000004 flags rw-
	
	Sections:
	Idx Name          Size      VMA       LMA       File off  Algn
	  0 .text         0000004a  08048074  08048074  00000074  2**2
			  CONTENTS, ALLOC, LOAD, READONLY, CODE
	  1 .data         00000001  080490c0  080490c0  000000c0  2**2
			  CONTENTS, ALLOC, LOAD, DATA
	  2 .bss          00000000  080490c4  080490c4  000000c4  2**2
			  ALLOC
	SYMBOL TABLE:
	08048074 l    d  .text	00000000 
	080490c0 l    d  .data	00000000 
	080490c4 l    d  .bss	00000000 
	00000000 l    d  *ABS*	00000000 
	00000000 l    d  *ABS*	00000000 
	00000000 l    d  *ABS*	00000000 
	080490c0 l       .data	00000000 new_line_char
	08048076 l       .text	00000000 again
	08048087 l       .text	00000000 end_again
	0804808e l       .text	00000000 putstring
	08048096 l       .text	00000000 count_chars
	080480a0 l       .text	00000000 done_count_chars
	00000000       F *UND*	00000000 
	080480be g     O *ABS*	00000000 _etext
	08048074 g       .text	00000000 _start
	080490c1 g     O *ABS*	00000000 __bss_start
	080490c1 g     O *ABS*	00000000 _edata
	080490c4 g     O *ABS*	00000000 _end

	----------------------------------------------------------------

    Notice the information provided by the program header (ELF files
    carry header information near the beginning of the file that tells
    the kernel how to load the file into memory).

    ELF files also contain information about the sections (held in the
    section header table).  Notice that the .text section contains 0x4a
    bytes of information, is located 0x74 bytes into the file, is
    aligned at a 4 byte boundary (4 == 2 ** 2), has memory allocated to
    it (ALLOC), is read-only, and contains code (the operating system
    maps it into the process as readable and executable, matching the
    `r-x' flags of the first LOAD entry in the program header).
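    The numbers in the section table can be cross-checked against each
    other.  The following is only a small sketch in Python (not part of
    the original tutorial); the variable names are made up, and the
    values are copied by hand from the listing above.

```python
# Values copied from the `objdump -x stack-param' listing above.
text_size = 0x4a        # Size column of .text
text_vma = 0x08048074   # VMA column of .text
text_file_off = 0x74    # File off column of .text
text_align = 2 ** 2     # Algn column of .text
etext = 0x080480be      # value of the _etext symbol


def is_aligned(address, alignment):
    """Return True when address is a multiple of alignment."""
    return address % alignment == 0


# The first byte after .text, computed from its start and size.
text_end = text_vma + text_size

print(text_size)                         # size of .text in decimal
print(is_aligned(text_vma, text_align))  # is .text on a 4 byte boundary?
print(hex(text_end))                     # compare with _etext
```

    The end address computed this way is the value that the symbol
    table lists for _etext, and the file offset plus the size (0xbe)
    is the filesz of the first LOAD entry.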

    Information  about   the  symbols   is  also  provided.    All  this
    information  is used  by debuggers  and other  programming  tools to
    examine binary files.

    Objdump can also be used to disassemble binary executables.  Typing
    the following command will disassemble the file to standard output
    (this does nothing to the actual file, as objdump only reads from
    the file);

        objdump -d stack-param | less

    Here is the output of the above command;

        ----------------------------------------------------------------
	stack-param:     file format elf32-i386

	Disassembly of section .text:
	
	08048074 <_start>:
	 8048074:	89 e5                	movl   %esp,%ebp
	
	08048076 <again>:
	 8048076:	83 c4 04             	addl   $0x4,%esp
	 8048079:	8b 04 24             	movl   (%esp,1),%eax
	 804807c:	85 c0                	testl  %eax,%eax
	 804807e:	74 07                	je     8048087 <end_again>
	 8048080:	e8 09 00 00 00       	call   804808e <putstring>
	 8048085:	eb ef                	jmp    8048076 <again>
	
	08048087 <end_again>:
	 8048087:	31 c0                	xorl   %eax,%eax
	 8048089:	40                   	incl   %eax
	 804808a:	31 db                	xorl   %ebx,%ebx
	 804808c:	cd 80                	int    $0x80
	
	0804808e <putstring>:
	 804808e:	55                   	pushl  %ebp
	 804808f:	89 e5                	movl   %esp,%ebp
	 8048091:	8b 4d 08             	movl   0x8(%ebp),%ecx
	 8048094:	31 d2                	xorl   %edx,%edx
	
	08048096 <count_chars>:
	 8048096:	8a 04 11             	movb   (%ecx,%edx,1),%al
	 8048099:	84 c0                	testb  %al,%al
	 804809b:	74 03                	je     80480a0 <done_count_chars>
	 804809d:	42                   	incl   %edx
	 804809e:	eb f6                	jmp    8048096 <count_chars>
	
	080480a0 <done_count_chars>:
	 80480a0:	b8 04 00 00 00       	movl   $0x4,%eax
	 80480a5:	31 db                	xorl   %ebx,%ebx
	 80480a7:	43                   	incl   %ebx
	 80480a8:	cd 80                	int    $0x80
	 80480aa:	b8 04 00 00 00       	movl   $0x4,%eax
	 80480af:	8d 0d c0 90 04 08    	leal   0x80490c0,%ecx
	 80480b5:	31 d2                	xorl   %edx,%edx
	 80480b7:	42                   	incl   %edx
	 80480b8:	cd 80                	int    $0x80
	 80480ba:	89 ec                	movl   %ebp,%esp
	 80480bc:	5d                   	popl   %ebp
	 80480bd:	c3                   	ret    
        ----------------------------------------------------------------

    The `-d' option tells objdump to disassemble sections that are
    expected to contain code (usually the .text section).  Using the
    `-D' option will disassemble all sections.  Objdump was able to give
    the names of labels in the code because of the information contained
    in the symbol table.

    The first column displays the virtual memory address of each
    instruction.  The second column displays the machine code bytes of
    that instruction, and the third column contains the corresponding
    line of assembler code.
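    The first two columns are enough to work out where a relative jump
    or call goes.  Here is a rough sketch in Python (the function name
    branch_target is made up for this example), assuming an instruction
    with a one byte opcode followed by a signed displacement, which is
    the case for the `je', `jmp' and `call' lines in the listing above.

```python
def branch_target(address, machine_code):
    """Compute the destination of a relative jmp/je/call.

    address      -- virtual address of the instruction (first column)
    machine_code -- the instruction bytes (second column), as hex text

    Assumes a one byte opcode followed by the displacement.
    """
    raw = bytes.fromhex(machine_code)
    operand = raw[1:]
    # The operand is a little-endian signed displacement, counted from
    # the address of the *next* instruction.
    displacement = int.from_bytes(operand, "little", signed=True)
    return address + len(raw) + displacement


# The `call' at 8048080 and the backwards `jmp' at 8048085.
print(hex(branch_target(0x8048080, "e8 09 00 00 00")))
print(hex(branch_target(0x8048085, "eb ef")))
```

    The two results are the addresses of putstring and again, which is
    what objdump prints after the mnemonic on those lines.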

    For more information look in the info documentation system.

** Getting the amount of memory used with size
------------------------------------------------------------------------

    If you do an `ls -l stack-param' you get the following

        -rwxrwxr-x    1 robin    robin         932 Sep 15 18:21 stack-param

    This tells you that the file is 932 bytes long.  However, this file
    also contains header tables, section tables, symbol tables etc.  The
    amount of memory that this program will use during run time will be
    less than this.  To find out actual memory use, type the following;

        size stack-param

    The above will result in the following output;

   	text	   data	    bss	    dec	    hex	filename
   	  74	      1	      0	     75	     4b	stack-param

    This tells you that .text  occupies 74 bytes, and .data occupies one
    byte, for a total of 75 bytes memory use.
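    The `dec' and `hex' columns are simply the sum of the first three
    columns, in decimal and in hexadecimal.  A minimal sketch in Python
    that checks this, assuming the two line output format shown above
    (the helper name parse_size_output is made up):

```python
def parse_size_output(text):
    """Turn the two line output of `size' into a dict of strings."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    values = lines[1].split()
    return dict(zip(header, values))


# The output shown above, copied by hand.
sample = (
    "   text\t   data\t    bss\t    dec\t    hex\tfilename\n"
    "     74\t      1\t      0\t     75\t     4b\tstack-param\n"
)

row = parse_size_output(sample)
total = int(row["text"]) + int(row["data"]) + int(row["bss"])

print(total == int(row["dec"]))        # sum matches the dec column
print(total == int(row["hex"], 16))    # and the hex column
```

    Note that the 74 bytes of text are the 0x4a bytes that objdump
    reported as the size of the .text section.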

** Getting rid of symbol information with strip
------------------------------------------------------------------------

    The strip command can be used to get rid of the symbol information.
    With no options, GNU strip removes all symbol information, including
    the symbols used for debugging (this is the same as giving the
    `--strip-all' option).  With the `--strip-debug' option it removes
    only the debugging symbols.  I recommend not stripping all symbols,
    as this makes the files harder to analyse with the standard
    programming tools.  This command is used only if file size is of
    paramount importance.

* Debugging and gdb
------------------------------------------------------------------------

    Perhaps  the  most difficult  aspect  of  programming is  debugging.
    Quite  often  the  error   that  caused  the  program  to  terminate
    abnormally  is not  at the  line where  the program  terminated (the
    example later on will show this).

    Program that terminates with SIGSEGV
------------------------------------------------------------------------
	## stack-param-error.s #########################################

	## Robin Miyagi ################################################
	## http://www.geocities.com/SiliconValley/Ridge/2544/ ##########

	## This file  shows how one  can access command  line parameters
	## via the stack at process  start up.  This behavior is defined
	## in the ELF specification.

	## Compile Instructions:
	## -------------------------------------------------------------
	## as --gstabs -o stack-param-error.o stack-param-error.s
	## ld -O0 -o stack-param-error stack-param-error.o
########################################################################
	.section .data

new_line_char:
	.byte 0x0a
########################################################################
	.section .text

	.globl _start

	.align 4
_start:
	movl %esp, %ebp		# store %esp in %ebp
again:
	addl $4, %esp		# %esp ---> next parameter on stack
	leal (%esp), %eax	# move next parameter into %eax
	testl %eax, %eax	# %eax (parameter) == NULL pointer?
	jz end_again		# get out of loop if yes
	call putstring		# output parameter to stdout.
	jmp again		# repeat loop
end_again:
	xorl %eax, %eax		# %eax = 0
	incl %eax		# %eax = 1, system call _exit ()
	xorl %ebx, %ebx		# %ebx = 0, normal program exit.
	int $0x80		# execute _exit () system call

	## prints string to stdout
putstring:	.type @function
	pushl %ebp
	movl %esp, %ebp
	movl 8(%ebp), %ecx
	xorl %edx, %edx
count_chars:
	movb (%ecx,%edx,$1), %al
	testb %al, %al
	jz done_count_chars
	incl %edx
	jmp count_chars
done_count_chars:
	movl $4, %eax
	xorl %ebx, %ebx
	incl %ebx
	int $0x80
	movl $4, %eax
	leal new_line_char, %ecx
	xorl %edx, %edx
	incl %edx
	int $0x80 
	movl %ebp, %esp
	popl %ebp
	ret
------------------------------------------------------------------------

    Notice that the above program is assembled with the `--gstabs'
    option of `as'.  This makes `as' put debugging information in the
    output file, such as the name of the original source file, debugging
    symbols etc.  Using `objdump -x stack-param-error | less' will show
    you the inclusion of debugging symbols.

    Now to find out where our error occurred type the following command;

        gdb stack-param-error

    This will get you to the gdb prompt `(gdb)';

        (gdb) run eat my shorts
	/home/robin/programming/asm-tut/stack-param-error
	eat
	my
	shorts
	Program received signal SIGSEGV, Segmentation fault.
	count_chars () at stack-param-error.s:47

	47 	movb (%ecx,%edx,$1), %al
	Current language: auto; currently asm
	(gdb) q
	[~]$ _

	(gdb will output more than this, I just wanted to highlight what
	is important).

    This tells us that the segmentation fault occurred at line 47 of
    stack-param-error.s.  However, the problem was caused in line 29,
    which reads `leal (%esp), %eax'.  If you look at line 29 of
    stack-param.s, you will see that it reads `movl (%esp), %eax'
    instead.  The lea instruction only computes an address: it copies
    the value of %esp itself into %eax rather than the pointer stored
    at that address.  EAX therefore never held 0 when the NULL pointer
    that ends the parameter list was reached, so the loop in _start
    never stopped normally (the condition for breaking out of the loop
    is eax being 0).  putstring was then called with the NULL pointer
    as its argument, which caused line 47 to access an area of memory
    not available to this process (hence the segmentation fault).
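    The difference between loading a value and loading an address can
    be modelled outside of assembler.  The following Python sketch is a
    toy model only: the addresses and the names STACK, STRINGS, movl,
    leal and run are all made up for illustration, and it ignores the
    return address that `call' pushes.

```python
# Toy model of the stack at process start up.  Each 4 byte slot holds
# argc, a pointer to a parameter string, or the terminating NULL.
STRINGS = {0x1000: "stack-param", 0x1010: "eat",
           0x1020: "my", 0x1030: "shorts"}
STACK = {
    0xbf00: 4,       # argc
    0xbf04: 0x1000,  # pointer to program name
    0xbf08: 0x1010,  # pointer to "eat"
    0xbf0c: 0x1020,  # pointer to "my"
    0xbf10: 0x1030,  # pointer to "shorts"
    0xbf14: 0,       # NULL pointer ends the list
}


def movl(esp):
    """Model of `movl (%esp), %eax': the value stored at esp."""
    return STACK[esp]


def leal(esp):
    """Model of `leal (%esp), %eax': the address esp itself."""
    return esp


def run(load):
    """Model of the loop in _start, using `load' for line 29."""
    esp = 0xbf00
    printed = []
    while True:
        esp += 4                 # addl $4, %esp
        eax = load(esp)          # line 29
        if eax == 0:             # testl %eax, %eax / jz end_again
            return printed, "exit"
        pointer = STACK[esp]     # putstring reads 8(%ebp)
        if pointer == 0:         # dereferencing NULL at line 47
            return printed, "SIGSEGV"
        printed.append(STRINGS[pointer])


print(run(movl))
print(run(leal))
```

    Both runs print the same four strings, but only the movl version
    ever sees 0 in eax and leaves the loop through end_again.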

    Debugging is an art that  comes with practice.  For more information
    about gdb, look  in the info pages (e.g. `info  gdb').  You can also
    type `help' at the (gdb) prompt.

    The only reason gdb was able to tell you at what line number in the
    source code the error occurred is that the debugging symbols and
    line number information were included in the output file (recall
    that we used the `--gstabs' option).

    --------------------------------------------------------------------
    Comments and suggestions 

========================================================================

You are free to make verbatim copies of this file, providing that this
notice is preserved.