John Tortugo

Life, programming, math and much else ;-)

Building a better computer bug finder: Researchers improve the process of finding vulnerabilities by intentionally adding swarms of bugs to source code

Detecting bugs in computer programs is an expensive task, and there is no way of measuring a bug finder’s efficacy without knowing exactly how many bugs go unnoticed. To tackle this problem, researchers have created LAVA (Large-Scale Automated Vulnerability Addition), a cost-effective technique of intentionally adding vulnerabilities to a program’s source code to test the limits of bug-finding tools and ultimately help developers improve them.

Source: Building a better computer bug finder: Researchers improve the process of finding vulnerabilities by intentionally adding swarms of bugs to source code

The Ada Programming Language

Introduction

This is the first of a series (I hope ;-) of posts about characteristics of different programming languages. The structure of the text is not settled yet; it may (and probably will) change in later posts. In this first post, I will write about the Ada programming language. You should not consider this text a tutorial (e.g., I will not cover the usual topics like conditional or looping structures), but rather a collection of notes that I gathered about the language as I wrote a fair amount of code in it (implementing several algorithms). You can find a collection of examples here: https://github.com/JohnTortugo/Babel

Overview

Ada as a language permits a great deal of freedom and, sincerely, it surprised me with its versatility and comprehensive list of features. Ada supports not only very structured code blocks (like Pascal) but also object-oriented programming, exception handling, and even generics. If you are interested in learning a little more about Countess Ada Lovelace, or about some of the amazing things that were done using the Ada programming language, take a look at this page: http://celebratingada.com/

To make long numbers easier to read, underscores are permitted inside a numeric literal. For example, “1_000_000” is legal. This is similar to the way commas are used in the United States and periods are used in Europe. Underscores aren’t allowed to be consecutive, numbers may not end in an underscore, and underscores don’t change the value of a number.

A useful Ada capability is its ability to write out literals in other bases from 2 to 16. These are called, reasonably enough, based literals. To create a based literal, write out the desired base, a “#” sign, the number in the requested base, and another “#” sign. For example, “2#1001_1000#” is a base 2 number equal to 128+16+8 = 152.
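For instance, here is a small throwaway program (the names are mine) showing both underscores and based literals in action:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Literal_Demo is
   Million : constant := 1_000_000;     -- underscores improve readability
   Mask    : constant := 2#1001_1000#;  -- base 2 literal, equals 152
   Color   : constant := 16#FF#;        -- base 16 literal, equals 255
begin
   Put_Line (Integer'Image (Million));
   Put_Line (Integer'Image (Mask));
   Put_Line (Integer'Image (Color));
end Literal_Demo;
```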

Ada guarantees that an Integer can store numbers between -32767 and 32767 (inclusive); in practice, an Integer is likely to have a wider range. In other words, the Integer type must use at least 16 bits, but the actual number of bits depends on the compiler and machine.

A key difference between Ada and some other languages (such as C and C++) is what happens when an evaluation cannot be completed. If a division by zero is attempted, or an expression result is too large, Ada will normally raise an exception. Exceptions can be handled, but if they aren’t, the program will halt (with some debugging output to help identify the kind and location of the problem). This means that instead of silently giving wrong answers, Ada programs normally will halt when a computation cannot be completed. This simplifies debugging.
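A minimal sketch of this behavior (the procedure name is mine): dividing by zero raises Constraint_Error, which can be caught in an exception handler instead of halting the program:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Exc_Demo is
   A : Integer := 10;
   B : Integer := 0;
begin
   Put_Line (Integer'Image (A / B));  -- raises Constraint_Error
exception
   when Constraint_Error =>
      Put_Line ("Computation could not be completed.");
end Exc_Demo;
```

Without the handler, the program would halt with a message identifying the exception and where it was raised.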

Unlike C or C++, but like Pascal and many other languages, Integers are not treated as Booleans. Ada insists that types be correct in operations, and there are no predefined operations mixing Boolean, Integer and Float with +, -, *, or /. Thus, if you’re using an Integer and a Float together, wrap the Integer in a type conversion, written `Float(...)’, to convert it to a floating-point value. This makes it clear when such conversions take place, which is sometimes important in understanding what a program is doing. Also, whenever you assign a constant to a Float, the constant must be written with a period in it, or the compiler will complain.
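For example (a made-up snippet; the names are mine):

```ada
procedure Conv_Demo is
   Count : Integer := 4;
   Total : Float   := 10.0;  -- note the period in the literal
   Mean  : Float;
begin
   Mean := Total / Float (Count);  -- explicit conversion of the Integer
end Conv_Demo;
```

Writing `Total / Count` directly would be rejected by the compiler.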

Normally Ada will evaluate these expressions in whatever order is most efficient for the machine. If it’s important to evaluate them in a certain order and to stop evaluating them when the answer is known, there are versions of `and’ and `or’ that are called `short-circuit operations’. These operations will execute strictly left-to-right and will not execute anything if they don’t have to. C’s && and || operations work this way. The short-circuit version of `and’ is `and then’; the short-circuit version of `or’ is `or else’.
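For example, `and then’ lets you guard a division against a zero denominator (the names are mine):

```ada
procedure Short_Demo is
   Numerator   : Integer := 10;
   Denominator : Integer := 0;
begin
   -- With a plain "and", both operands could be evaluated, dividing
   -- by zero; "and then" only evaluates the right side when needed.
   if Denominator /= 0 and then Numerator / Denominator > 2 then
      null;
   end if;
end Short_Demo;
```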

One very important difference between Ada and some other languages is that Ada considers types different even if they happen to be implemented the same way at a particular time.

Ada array indices are not required to start at zero or one. Array indices can begin (and end) with any discrete value – whatever makes the most sense for the data. This means that you can start arrays at -5 (if that makes sense), and you can use enumerated values as indices as well. Ada programs usually use a starting index of 1 if there’s no particularly natural starting point;
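A small illustration (the types and names are mine) of a negative starting index and an enumeration used as an index:

```ada
procedure Index_Demo is
   type Day is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);
   Hours   : array (Day) of Natural     := (others => 0);  -- enum indices
   Offsets : array (-5 .. 5) of Integer := (others => 0);  -- starts at -5
begin
   Hours (Sat)  := 8;
   Offsets (-5) := 42;
end Index_Demo;
```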

Ada 95 provides a number of different “string” types, each best for a certain purpose.

String: This is the basic Ada String type, also called a “fixed length string”. This type (String) is simply an array of Characters.
Bounded_String: Values of this type can vary in length up to a maximum length (which you supply).
Unbounded_String: Values of this type can vary in length up to the largest value of type `Natural’ (usually that’s over 2 billion characters).

Ada also includes some types that represent strings from other languages, namely C, COBOL, and Fortran.
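A quick sketch (the names are mine) contrasting a fixed-length String with an Unbounded_String, which can grow:

```ada
with Ada.Text_IO;           use Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Str_Demo is
   Fixed : String (1 .. 5)  := "Hello";  -- length is part of the type
   Dyn   : Unbounded_String := To_Unbounded_String ("Hello");
begin
   Append (Dyn, ", world");              -- Fixed could not grow like this
   Put_Line (Fixed & " / " & To_String (Dyn));
end Str_Demo;
```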

The main difference between a procedure and a function is that a function returns a value, while a procedure does not (though a procedure can change the values of parameters passed to it). Another interesting point is that function/procedure parameters receive annotations indicating whether they will be used for reading (in) and/or writing (in out) data. Therefore a procedure definition may look something like:

	procedure Insert(root : in out TreeNode_ptr; key : in Integer) is
	begin
	-- Comment example... a body needs at least one statement:
	null;
	end Insert;

Ada programs consist of packages, functions, procedures, variables, etc. You can think of a package as a collection of related functions, procedures, etc. A package can have a specification section, which can be stored in a file specific for this purpose; that file must have the extension “.ads”. Think of a specification as a C/C++ header file. An “.adb” file is like a C++ “.cpp” file and contains the definitions of the functions/procedures declared in the specification file.

The “main” procedure of an Ada program must be a library unit procedure with no parameters. You put most of your code in packages, in files like this_package.ads, this_package.adb, that_package.ads, that_package.adb. Then you put a main procedure (one that is not inside any package) in an “.adb” file. This procedure “with’s” (i.e., imports) one or more packages and calls things in them.

Compiling and Running Ada Programs in Linux

I will describe the gnatmake tool first; it automatically determines the set of sources needed by an Ada compilation unit (i.e., an “.adb” file) and executes the necessary (re)compilations, binding and linking. I will also explain how to use each tool individually: the compiler gcc, the binder gnatbind, and the linker gnatlink, to build executable programs.

“gnatmake” automatically takes care of compiling, rebinding and linking all object files related to changed source files. In other words, it determines which sources need to be compiled, compiles them, and binds and links the resulting object files producing the executable file.

The usual form of the gnatmake command is:

$ gnatmake [switches] <file_name> [<file_name>] [<switches>]

The only required argument is one “file_name”, which specifies a compilation unit. If you are using standard file extensions (.adb and .ads), then the extension may be omitted from the file_name arguments.

Using GCC:

The first step in creating an executable program is to compile the units of the program using the gcc command. In case of Ada programming this means compiling “.adb” files to object files “.o”. The basic command for compiling a file containing an Ada unit is:

$ gcc -c [switches] <file_name>

After producing the object files for all compilation units, you need to bind the object files of the project. The GNAT binder, gnatbind, is used to bind compiled objects. You can invoke gnatbind using the following command:

$ gnatbind [<switches>] <mainprog>[.ali] [<switches>]

The “mainprog.ali” file is an ‘Ada Library Information’ file created in the previous step. It contains information that the binder uses to check the consistency of the program and to create a linking plan for the object files. As usual, the last step is to link all object files together; for that you can use gnatlink:

$ gnatlink [<switches>] <mainprog>[.ali] [<switches>]

This will produce a “mainprog” executable file.

The Type System

The primitive types are: Integer (with subtypes Natural and Positive), Float, Duration, Character, String, Boolean and others. The user can create new types and subtypes, as well as records (similar to structs in C/C++).

Four principles govern the type system:

Strong typing: types are incompatible with one another, so it is not possible to mix apples and oranges. There are, however, ways to convert between types.

Static typing: types are checked at compile time, which allows type errors to be found earlier.

Abstraction: types represent the real world or the problem at hand; not how the computer represents the data internally. There are ways to specify exactly how a type must be represented at the bit level, but we will defer that discussion to another chapter.

Name equivalence, as opposed to structural equivalence used in most other languages. Two types are compatible if and only if they have the same name; not if they just happen to have the same size or bit representation. You can thus declare two integer types with the same ranges that are totally incompatible, or two record types with exactly the same components, but which are incompatible.

Types are incompatible with one another. However, each type can have any number of subtypes, which are compatible with their base type and may be compatible with one another. You can define a new type with the following syntax:

type T is...

type Integer_1 is range 1 .. 10;
type Integer_2 is range 1 .. 10;

Note how the new type also specifies the range of values that a variable of that type can assume.

A derived type is a new, full-blown type created from an existing one. Like any other type, it is incompatible with its parent; however, it inherits the primitive operations defined for the parent type.

type Integer_2 is new Integer_1 range 2 .. 8;
A : Integer_1 := 8;
B : Integer_2 := A; -- illegal!
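An explicit conversion between a derived type and its parent, however, is allowed. A small sketch (the procedure name is mine):

```ada
procedure Derived_Demo is
   type Integer_1 is range 1 .. 10;
   type Integer_2 is new Integer_1 range 2 .. 8;
   A : Integer_1 := 8;
   B : Integer_2;
begin
   B := Integer_2 (A);  -- explicit conversion is legal
end Derived_Demo;
```

Note that the conversion still checks the target range: converting a value of 10 here would raise Constraint_Error at run time.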

Ada also has something similar to pointers and supports dynamic memory allocation. To create a pointer to a variable of type T1, you declare a new type T2 that is an access (“pointer”) to T1. The declaration of T2 may specify the access permissions (i.e., reading, writing, etc.) of the pointers. For instance:

-- declare existence of type TreeNode
type TreeNode;

-- pointer to TreeNode
type TreeNode_ptr is access TreeNode;

-- complete definition of TreeNode
type TreeNode is
record
Value : Integer;
Left : TreeNode_ptr;
Right : TreeNode_ptr;
end record;

creates a new type named “TreeNode_ptr” that can point to objects of type “TreeNode”. To create a new object of type TreeNode and assign its address to a pointer named “ptr”, you can use something like:

ptr : TreeNode_ptr := null;
ptr := new TreeNode'(Value => 0, Left => null, Right => null);

“new” takes a free block of memory from a storage pool of available memory (often referred to as a heap) and reserves it for use as a TreeNode variable. A reference to its location is then assigned to the variable ptr so that we then have some way of accessing it. An initial value is specified for the new node by means of named components in the type “constructor”.

Apart from assigning a value generated by new to ptr, you can assign the special value null to ptr to indicate that it doesn’t point to anything (a ‘null pointer’). Access variables are automatically set to null when they are declared unless new is used to initialise them. Attempting to access the value that a null pointer points to will generate a constraint error.

Having set ptr to point to a dynamically allocated TreeNode variable, you can then use ‘ptr.all’ to access the node itself. You can then select components of the node in the usual way:

ptr.all.Value := 2016; -- you can also omit the ".all." part...
if (key < ptr.all.Value) then ...

Be careful not to confuse ‘ptr’ and ‘ptr.all’; ‘ptr’ on its own is the name of the access variable, but ‘ptr.all’ is the value that ptr points to:

ptr.all := ptr2.all; -- copy one node into another
ptr := ptr2; -- set ptr to point to the same thing as ptr2

Assuming that ptr and ptr2 point to different nodes, the first assignment will copy the contents of one node into the other so that you end up with two identical nodes. In the second case ptr and ptr2 will both end up pointing to the same node, and the node that ptr pointed to before is now inaccessible unless there’s another access variable which points to it. After the first assignment, you can alter ptr.Value and it won’t affect ptr2.Value since ptr and ptr2 point to different nodes, but after the second assignment ptr.Value and ptr2.Value both refer to the same thing, so any change to ptr.Value will also be a change to ptr2.Value.

A Small Example

I decided to include a small example program to illustrate several points of the language’s syntax/semantics. The example is simply a package that implements a Binary Search Tree and a driver procedure that uses this package. I added comments in the code in places where I think the language differs most from C/C++. The first file below is the one that contains the main entry point to the program: BSTDriver.

-- Ada.Text_IO: This will enable us to print texts to standard output.
-- BinSearchTree: This will include the BinSearchTree package in the scope of
-- this compilation unit.
with Ada.Text_IO, BinSearchTree;
use Ada.Text_IO, BinSearchTree;

-- This is the "main" procedure of the project. The execution
-- of the program will start here.
procedure BSTDriver is
	-- Local variables usually are declared before the "begin" keyword.
	-- Note that TreeNode_ptr is a pointer to TreeNode.
	root : TreeNode_ptr := null;
begin

	-- The usual syntax of function call is the same as other imperative
	-- languages like C, Java, etc.
	Insert(root, 1);	
	Insert(root, 10);	
	Insert(root, 5);	
	Insert(root, 8);	

	Remove(root, 1);
	Remove(root, 5);
	Remove(root, 8);
	Remove(root, 10);

	Insert(root, 50);

	-- Note the use of the "then" and "end if" keywords, similar to pascal
	-- and also the comparison operator. Ada uses the ":=" for assignment
	-- and "=" for comparison.
	if Search(root, 50) = True then
		Put_Line("Found.");
	else
		Put_Line("Not found.");
	end if;

	Insert(root, 1);	
	Insert(root, 10);	
	Insert(root, 5);	
	Insert(root, 8);	

	Put("InOrder: ");
	PrintInOrder(root);
	-- This is one of the ways to print new lines in Ada. The parameter tells
	-- how many new lines should be printed.
	New_Line(1);

	Put("PreOrder: ");
	PrintPreOrder(root);
	New_Line(1);

	Put("PosOrder: ");
	PrintPosOrder(root);
	New_Line(1);

end BSTDriver;

The next file is “binsearchtree.ads”, the specification for the package BinSearchTree:

-- This is the specification file ".ads" for the BinSearchTree
-- package. Here we don't include the body of the functions, we
-- just declare their prototype.

package BinSearchTree is
	-- Declare the existence of a TreeNode type
	type TreeNode;

	-- This is how we declare a type "pointer to TreeNode"
	type TreeNode_ptr is access TreeNode;

	-- Complete definition of the TreeNode type.
	-- This is like a struct in C/C++.
	type TreeNode is
	record
		Value		: Integer;
		Left		: TreeNode_ptr;
		Right		: TreeNode_ptr;
	end record;

	-- Note that we specify how the parameters will be used inside the
	-- function: if they are input or output parameters, or both.
	-- Note also that to separate the parameters you use a semicolon instead of a comma.
	procedure Insert(root : in out TreeNode_ptr; key : in Integer);
	procedure Remove(root : in out TreeNode_ptr; key : in Integer);

	-- The return type of a function is the last part of the prototype,
	-- and only functions can have return types.
	function Search(root : in TreeNode_ptr; key : in Integer) return Boolean;

	procedure PrintInOrder(root : in TreeNode_ptr);
	procedure PrintPosOrder(root : in TreeNode_ptr);
	procedure PrintPreOrder(root : in TreeNode_ptr);
end BinSearchTree;

The last file, “binsearchtree.adb”, is the one containing the body (implementation) of the package BinSearchTree:

with Ada.Text_IO, Ada.Integer_Text_IO;
use Ada.Text_IO, Ada.Integer_Text_IO;

package body BinSearchTree is
	function SearchRightmost(root : in TreeNode_ptr) return Integer is
	begin
		-- The way we access the fields of the struct is the same as in 
		-- C/C++.
		if (root.Right = null) then
			return root.Value;
		else 
			return SearchRightmost(root.Right);
		end if;
	end;

	procedure Insert(root : in out TreeNode_ptr; key : in Integer) is
	begin
		if root = null then
			-- You use the keyword "new" to allocate memory in the heap.
			-- The apostrophe after the type name makes this a "qualified
			-- expression". A nice thing is that you can initialize the
			-- record components by their names. The syntax for assignment
			-- here (=>) is a little different from the one in other places.
			root := new TreeNode'(Value => key, Left => null, Right => null);
		else
			if key < root.Value then
				Insert(root.Left, key);
			else
				Insert(root.Right, key);
			end if;
		end if;
	end Insert;

	procedure Remove(root : in out TreeNode_ptr; key : in Integer) is
		-- Local variables are declared in this region.
		rightmostValue : Integer := 0;
	begin
		-- The inequality operator (/=) is a little different of the commonly
		-- used nowadays (!=). But it is more similar to the mathematical symbol.
		if (root /= null) then
			if (root.Value = key) then
				if (root.Left = null and root.Right = null) then
					root := null;
				-- note that the word is "elsif" (like Bash) not "elseif"
				elsif (root.Left /= null) then
					root := root.Left;
				elsif (root.Right /= null) then
					root := root.Right;
				else 
					rightmostValue := SearchRightmost(root.Left);

					Remove(root.Left, rightmostValue);

					root.Value := rightmostValue;
				end if;
			elsif (key < root.Value) then
				Remove(root.Left, key);
			else
				Remove(root.Right, key);
			end if;
		end if;
	end Remove;

	function Search(root : in TreeNode_ptr; key : in Integer) return Boolean is
	begin
		if (root = null) then
			return False;
		else
			if (root.Value = key) then
				return True;
			elsif (key < root.Value) then
				return Search(root.Left, key);
			else
				return Search(root.Right, key);
			end if;
		end if;
	end Search;

	procedure PrintInOrder(root : in TreeNode_ptr) is
	begin
		if (root /= null) then
			PrintInOrder(root.Left);

			Put("(");
			Put(root.Value, 2, 10);
			Put(")");

			PrintInOrder(root.Right);
		end if;
	end PrintInOrder;

	procedure PrintPreOrder(root : in TreeNode_ptr) is
	begin
		if (root /= null) then
			Put("(");
			Put(root.Value, 2, 10);
			Put(")");

			PrintPreOrder(root.Left);

			PrintPreOrder(root.Right);
		end if;
	end PrintPreOrder;

	procedure PrintPosOrder(root : in TreeNode_ptr) is
	begin
		if (root /= null) then
			PrintPosOrder(root.Left);

			PrintPosOrder(root.Right);

			Put("(");
			Put(root.Value, 2, 10);
			Put(")");
		end if;
	end PrintPosOrder;

end BinSearchTree;

 

References

  1. Types:
    https://en.wikibooks.org/wiki/Ada_Programming/Type_System
  2. Access Types and Storage Allocation:
    http://www.adaic.org/resources/add_content/docs/craft/html/ch11.htm
  3. GNAT Users Guide:
    https://docs.adacore.com/gnat_ugn-docs/html/gnat_ugn/gnat_ugn.html
  4. Running a simple Ada program:
    https://gcc.gnu.org/onlinedocs/gnat_ugn/Running-a-Simple-Ada-Program.html

.PLT and .GOT – the key to code sharing and dynamic libraries

This text was found here: http://www.technovelty.org/linux/pltgot.html

The shared library is an integral part of a modern system, but often the mechanisms behind the implementation are less well understood. There are, of course, many guides to this sort of thing. Hopefully this adds another perspective that resonates with someone.

Let’s start at the beginning — relocations are entries in binaries that are left to be filled in later — at link time by the toolchain linker or at runtime by the dynamic linker. A relocation in a binary is a descriptor which essentially says “determine the value of X, and put that value into the binary at offset Y” — each relocation has a specific type, defined in the ABI documentation, which describes exactly how “determine the value of” is actually determined.

Here’s the simplest example:

$ cat a.c
extern int foo;

int function(void) {
    return foo;
}

$ gcc -c a.c 
$ readelf --relocs ./a.o  

Relocation section '.rel.text' at offset 0x2dc contains 1 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000004  00000801 R_386_32          00000000   foo

Read the rest of this entry »

Understanding Linux ELF RTLD internals

This text was found here: http://s.eresi-project.org/inc/articles/elf-rtld.txt

/*
Last update Sun Dec 22 06:55:39 2002 mayhem

- Version 0.1 May 2001
- Version 0.2 .::. 2002 (WIP) : 
  - Added stuff about rtld relocation .
  - Added stuff about rtld symbol resolution .
  - Various fixes and some links added .

This draft remained unreleased for one year, most of it is based on the 
glibc-2.2.3 implementation, information about the subject has been
disclosed on bugtraq and phrack in beg 2002 :

http://online.securityfocus.com/archive/1/274283/2002-05-29/2002-06-04/2
http://www.phrack.org/phrack/59/p59-0x08.txt

However, it still contains some kewl info, I'll try to keep it updated, 
hope this will help . I am also adding/clearing/correcting stuffs (and
giving credits) on demand, so dont hesitate to send comments, etc .
 
/mM [mayhem at devhell dot org]
*/

		Understanding Linux ELF RTLD internals
		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Most of the paper has been developed in a security perspective, your
comments are always welcomed .

Actually there's many ELF documentation at this time, most of them
are virii coding or backdooring related . To be honest, I never found
any documentation on the dynamic linking sources, and thats why I wrote
this one . Sometimes it looks more like an internal ld.so reference or
a comments review on the ELF dynamic linking implementation in ld-linux.so .

It's not that unuseful since the dynamic linking is one of the worse
documented part of the Linux operating system . I also decided to write
a (tiny) chapter on ELF kernel handling code, because it is
really necessary to know some kernel level stuffs (like the stack 
initialisation) to understand the whole interpreting . 

You can find the last glibc sources on the GNU's FTP server :

ftp://ftp.gnu.org/pub/gnu/glibc/

If you dont know anything in ELF, you should read the reference before :

http://x86.ddj.com/ftp/manuals/tools/elf.pdf


Want to know more ? Go on !


     O] Prologue
		    A) Kernel handling code 
		    B) Introducting glibc macros 
     1] Dynamic linker implementation
		    A) Sources graphics
		    B) The link_map structure explained
		    C) Relocating the interpretor
		    D) Runtime GOT relocation
		    E) Symbol resolution
     2] FAQ, thanks and references


TODO	:
		    X) Stack information gathering 
		    X) SHT_DYNAMIC information gathering
		    X) PHT interpreting 
		    X) Loading shared libraries 
		    X) Shared libraries relocation 


 Read the rest of this entry »

An Emulator Writer’s HOWTO for Static Binary Translation

This is a very interesting article that I found at: http://www.gtoal.com/sbt/. It is a practical article showing how to craft a simple static binary translator and emulator.

There is a lot of Computer Science literature on binary translation, both of the sexy dynamic variety and the slightly duller (from the CS point of view) static variety. However most of what you’ll find when you research the subject is rather dry and somewhat academic. I don’t know of any accessible primers on static translation that someone from the emulation world can pick up and use in a practical project.
So… the aim of this HOWTO document is to give you a very practical method that you can adapt to your own area of expertise, which should pretty much guarantee that if you have already written or worked on a reasonable emulator, you will be able to write a translator which will take a specific program and generate a binary from it that runs on a different architecture, and does so faster than any emulator that you’re used to using.

And that’s why we’re doing this: you have a favorite old video game that you want to run on some system and the emulator for that system works but it just isn’t fast enough. Or perhaps you want to port an emulator which works OK on a 300MHz Pentium to a 16MHz ARM in your PDA. A straight port runs your favourite game at 2FPS and you don’t think the hardware is likely to get 25 times faster in the near future! Play your cards right and you may get that factor of 25 out of a static binary translator.

This document tries to explain things simply – perhaps too simply at times, and there are a lot of examples which differ from the previous example in small details. This makes the document rather long, but that’s deliberate; I don’t want to skip too many stages and risk having anyone lose track of the process. Apologies in advance to those of you who think I’m taking it too slowly or have made this document too long.

Read the rest of this entry »

A very kind introduction to GDB: customize it the way you want

You can find it here: http://haifux.org/lectures/211/index.html =D

Linkers and Loaders

This is an excellent (!!!) article describing in general terms how the process of linking (static and dynamic) and loading ELF programs on Linux works. This is a very valuable article.

The original is found here: http://www.linuxjournal.com/article/6463?page=0,0

Discussing how compilers, linkers and loaders work, and the benefits of shared libraries.
Linking is the process of combining various pieces of code and data together to form a single executable that can be loaded in memory. Linking can be done at compile time, at load time (by loaders) and also at run time (by application programs). The process of linking dates back to late 1940s, when it was done manually. Now, we have linkers that support complex features, such as dynamically linked shared libraries. This article is a succinct discussion of all aspects of linking, ranging from relocation and symbol resolution to supporting position-independent shared libraries. To keep things simple and understandable, I target all my discussions to ELF (executable and linking format) executables on the x86 architecture (Linux) and use the GNU compiler (GCC) and linker (ld). However, the basic concepts of linking remain the same, regardless of the operating system, processor architecture or object file format being used.

How debugger works

This text was found here http://www.alexonlinux.com/how-debugger-works

Introduction

In this article, I’d like to tell you how a real debugger works: what happens under the hood and why it happens. We’ll even write our own small debugger and see it in action.

I will talk about Linux, although the same principles apply to other operating systems. Also, we’ll talk about the x86 architecture, because it is the most common architecture today. On the other hand, even if you’re working with another architecture, you will find this article useful because, again, the same principles work everywhere.

Kernel support

Actual debugging requires operating system kernel support, and here’s why. Think about it: we’re living in a world where one process reading memory belonging to another process is a serious security vulnerability. Yet, when debugging a program, we would like to access memory that is part of the debugged process’s (the debuggie’s) memory space from the debugger process. That is a bit of a problem, isn’t it? We could, of course, try somehow to use the same memory space for both debugger and debuggie, but then what if the debuggie itself creates processes? This really complicates things.

Debugger support has to be part of the operating system kernel. The kernel is able to read and write memory that belongs to each and every process in the system. Furthermore, as long as a process is not running, the kernel can see the values of its registers, and the debugger has to be able to know the values of the debuggie’s registers. Otherwise it won’t be able to tell you where the debuggie has stopped (when we press CTRL-C in gdb, for instance).

In speaking about where debugger support starts, we already mentioned several of the features that we need in order to have debugging support in the operating system. We don’t want just any process to be able to debug other processes; someone has to monitor debuggers and debuggies. Hence the debugger has to tell the kernel that it is going to debug a certain process, and the kernel has to either permit or deny this request. Therefore, we need the ability to tell the kernel that a certain process is a debugger and that it is about to debug another process. We also need the ability to query and set values in the debuggie’s memory space, and the ability to query and set the values of the debuggie’s registers when it stops.

And the operating system lets us do all this. Each operating system does it in its own manner, of course. Linux provides a single system call named ptrace() (defined in sys/ptrace.h), which allows us to do all these operations and much more.

ptrace()

ptrace() accepts four arguments. The first is one of the values from enum __ptrace_request, defined in sys/ptrace.h. This argument specifies what operation we would like to do, whether it is reading the debuggie’s registers or altering values in its memory. The second argument specifies the pid of the debuggie process. It’s not very obvious, but a single process can debug several other processes, so we have to tell ptrace() exactly what process we’re referring to. The last two arguments are optional arguments for the call.

Starting to debug

One of the first things a debugger does to start debugging a certain process is to attach to it or to run it. There is a ptrace() operation for each of these cases.

The first, called PTRACE_TRACEME, tells the kernel that the calling process wants its parent to debug it. I.e., me calling ptrace( PTRACE_TRACEME ) means I want my dad to debug me. This comes in handy when you want the debugger process to spawn the debuggie. In this case you do fork(), creating a new process, then ptrace( PTRACE_TRACEME ), and then you call exec() or execve().

The second operation is called PTRACE_ATTACH. It tells the kernel that the calling process should become the debugging parent of the target process. Debugging parent means both debugger and parent process.

Debugger-debuggie synchronization

Alright. Now we have told the operating system that we are going to debug a certain process, and the operating system has made it our child process. Good. This is a great time to have the debuggie stopped while we do some preparation before we actually start to debug. We may want, for instance, to analyze the executable and place breakpoints before we actually start debugging. So, how do we stop the debuggie and let the debugger do its thing?

The operating system does that for us using signals. In fact, the operating system notifies us, the debugger, about all kinds of events that occur in the debuggee, and it does all of that with signals. This includes the "debuggee is ready to shoot" signal. In particular, if we attach to an existing process, it receives SIGSTOP and we receive SIGCHLD once it actually stops. If we spawn a new process and it called ptrace( PTRACE_TRACEME ), it will receive a SIGTRAP signal once it attempts to exec() or execve(). We will be notified with SIGCHLD about this, of course.


Where did the 0x08048000 ELF address come from?

While reading some ELF documents [1] and Linux memory-management papers [2], I noticed the use of the address 0x08048000 as the start of the linear address space, but no one explained why this address was chosen. So far I haven't found a reasonable explanation; below are some links where people were reading and commenting about this mysterious number:

http://flint.cs.yale.edu/cs422/doc/ELF_Format.pdf
http://duartes.org/gustavo/blog/post/anatomy-of-a-program-in-memory
http://stackoverflow.com/questions/7187981/whats-the-memory-before-0x08048000-used-for-in-32-bit-machine
http://forum.osdev.org/viewtopic.php?f=13&t=24474

Writing a compiler with SableCC

The original post of this text is http://www.brainycreatures.org/compiler/sablecc.asp

To create a compiler with SableCC, we follow these steps:

  1. We create a SableCC specification file containing the lexical definitions and the grammar for the language being designed.
  2. After creating the SableCC specification file, we generate the framework by launching SableCC on the specification file.
  3. At this stage, we generate the working classes. This is the only code we have to write in Java. It is in this step that we write the semantic analyser, the code generator, and possibly the code optimizer. In the case of an interpreter, we write a single class. These working classes may be subclasses of one of the classes from the analysis subfolder generated by SableCC in the previous step.
  4. It is during this step that we create the driver class of the compiler. It is used to activate the lexer, parser and working classes.
  5. Finally, in this step, we compile the compiler with a Java compiler.

To illustrate this, we are going to implement a subset of Pascal by following the steps outlined above. We are going to implement a subset instead of the full language, because we want to give the reader an idea of how SableCC works.

We are now going to start the implementation

SableCC specification file

Before we start implementing the SableCC specification file for the subset of Pascal, we are going to describe the syntax of SableCC using BNF. I am assuming that you have read the introduction to regular expressions and the context–free grammars.

The syntax of SableCC is as follows:

grammar = [ package declaration ] [ helper declarations ]
          [ states declarations ] [ token declarations ]
          [ ignored tokens ] [ productions ]

As we can see from the grammar, a SableCC specification file may be an empty file. Remember the meaning of anything between [ and ]? It means that whatever is between the brackets can occur either zero times or exactly once.

From the allowed sections in a SableCC specification file, we will not take into account the states declarations, because we will not use them. But if the reader is interested in knowing how they work, he should refer to [1]. We will now describe the syntax of the other sections. The package declaration is used to name the destination root, that is, the directory where all the generated files will be placed. We declare the name of the package as follows:

Package uk.co.brainycreatures.smallpascal;

In this example, the root directory will be uk\co\brainycreatures\smallpascal, meaning that all the generated subfolders will be placed under this directory. A package name may be any sequence of identifiers separated by dots, each starting with a letter followed by zero or more letters or digits. The helper declarations work like constants in Pascal. As the name implies, a helper is used to help define something else, in this case the token declarations (more about this next). To see how a helper is useful, let's examine the following regular expression:

id = ['a' .. 'z'] (['a' .. 'z'] | ['0' .. '9'])*

In this example we could simplify this regular expression by defining digit = ['0' .. '9'] and letter = ['a' .. 'z']; our regular expression id then becomes

id = letter (letter | digit)*

Letter and digit are our helpers. Here’s how we declare helpers in SableCC:

Helpers
letter = ['a' .. 'z'];
digit = ['0' .. '9'];

After the helpers section comes the tokens section, which is where we define our terminals, or tokens. Let's use the example above to show how to declare tokens in SableCC:

Tokens
id = letter (letter | digit)*;

This is very similar to the regular expressions described in the Regular Expressions tutorial. A token defined by a string of characters is declared between quotes, e.g. program = 'program', and every declaration ends with a semicolon. Following the tokens section is the ignored tokens section, in other words, the section where we declare the tokens to be ignored by the parser: for example, comments, blanks, carriage returns, etc.

Here’s an example:

Helpers
any_character = [0x0 .. 0xfffff];
nl = '\n';
Tokens
comment = '//' any_character* nl;
blank = 10 | 10 13 | 9;
Ignored Tokens
comment,
blank;

In this case, comment and blank will be ignored by the parser.

Finally, in the productions section, we declare the grammar productions for the language. The productions are defined in BNF or EBNF (Extended Backus-Naur Form).

Here’s an example of how to declare productions:

Tokens
identifier = letter (letter | digit)*;
number = digit+;
plus = '+';
minus = '-';
Productions
expression = identifier plus identifier minus number ;

As we can see, it is similar to the context-free grammars presented in the grammars tutorial. The only difference is that in SableCC we don't surround a nonterminal with < and >, and we replace the → with =. In the productions section, it is sometimes required to precede a token with T. and a production with P. This happens when we have a token and a production with the same name. For example, if we have a production program and a token program, then in the production below we must precede program with T., because otherwise SableCC will not know whether we mean the token program or the production program.

Tokens
program = 'program';
semicolon = ';' ;
Productions
program = T.program identifier semicolon ;

Now that we are familiar with the syntax of SableCC, let’s implement our subset.

The root of our package will be uk.co.brainycreatures.smallpascal. Hence in the package declaration we will have:

Package uk.co.brainycreatures.smallpascal;

After defining the package, we are going to declare our helpers, which are shown below:

Helpers
a = 'a' | 'A' ;
b = 'b' | 'B' ;
d = 'd' | 'D' ;
e = 'e' | 'E' ;
g = 'g' | 'G' ;
i = 'i' | 'I' ;
l = 'l' | 'L' ;
m = 'm' | 'M' ;
n = 'n' | 'N' ;
o = 'o' | 'O' ;
p = 'p' | 'P' ;
r = 'r' | 'R' ;
t = 't' | 'T' ;
v = 'v' | 'V' ;
w = 'w' | 'W' ;
cr = 13 ; // carriage return
lf = 10 ; // line feed
tab = 9 ; // tab char
ascii_char = [32 .. 127] ;
blank = ' ' ;
digit = ['0' .. '9'] ;
letter = [['a' .. 'z'] + ['A' .. 'Z']] ;
l_brace = '{' ;
r_brace = '}' ;
l_paren_star = '(*' ;
r_paren_star = '*)' ;
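Using the letter helpers above, the Tokens section for this subset could look like the sketch below (the token names here are assumptions, chosen to match the grammar that follows, and a helper is assumed for every letter of the keywords). Keywords are spelled out letter by letter so that they match case-insensitively:

```
Tokens
program = p r o g r a m ;
var = v a r ;
begin = b e g i n ;
end = e n d ;
integer = i n t e g e r ;
writeln = w r i t e l n ;
div = d i v ;
plus = '+' ;
minus = '-' ;
mult = '*' ;
assignop = ':=' ;
colon = ':' ;
semicolon = ';' ;
comma = ',' ;
dot = '.' ;
l_paren = '(' ;
r_paren = ')' ;
identifier = letter (letter | digit)* ;
integer_literal = digit+ ;
blanks = (blank | tab | cr | lf)+ ;
Ignored Tokens
blanks ;
```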

Note the space between the letters in the definition of the keywords. In this case we need them, because each letter is a helper token. So, in defining our tokens, we have a sequence of helpers separated by a space.

Finally, we will describe the grammar for the language. Here it is:

Productions
program = program_heading global_declaration_part begin main_statement_part end dot;
program_heading = {non_empty} program identifier semicolon | {empty} ;
global_declaration_part = var_declaration;
var_declaration = var var_decl+;
var_decl = identifier_list colon type;
identifier_list = {single} identifier | {sequence} identifier_list comma identifier;
type = integer;
main_statement_part = statement_sequence;
statement_sequence = {single} statement | {sequence} statement_sequence semicolon statement ;
statement = {assign} identifier assignop expression | {writeln} writeln_stmt ;
writeln_stmt = {simple} writeln | {arguments} writeln l_paren expression_list r_paren ;
expression_list = {single} expression | {sequence} expression_list comma expression ;
expression = {term} term | {plus} expression plus term | {minus} expression minus term ;
term = {factor} factor | {mult} term mult factor | {div} term div factor;
factor = {identifier} identifier | {integer} integer;

Generating the working classes

In this section we will implement the semantic analyser and the code generator for the language. We start by implementing the semantic analyser below.

The semantic analyser

The semantic analysis phase completes the static analysis (parsing) phase of the source program, by ensuring that the grammatically correct statements passed to it by the parser ‘make sense’. For instance, the syntax of assignment statements in Pascal does not require that the identifiers have been declared, that they are variables, that they are of similar type, and that the operations can be performed on those types. All these restrictions are defined by the static semantics of Pascal, and the semantic analyser performs the corresponding checks. To perform this checking, the semantic analyser completely processes the declarations and creates equivalent information known as property information. This information is stored in a data structure known as symbol table. The symbol table is used so that given any identifier the associated information may be found.

For this small subset, the only checking that needs to be performed is that identifiers are declared before use, and that they are declared only once. There is no need for type checking, because all identifiers have the same type. The semantic analysis is performed by a depth-first traversal of the abstract syntax tree, storing information about identifiers in the symbol table after visiting the subtrees of the variable-declaration production. This information is then fetched as we encounter identifiers in the factor production and in assignment statements.

Now that we have the required information, we will describe how to implement the semantic analyser. Returning to the grammar above, let us explain the effect of the names between { and } in the alternatives of each production. The name of each alternative is prefixed with A and concatenated with the name of its production to produce a type name for the alternative. For example, for the production below

factor = {identifier} identifier | {integer} integer_literal | … ;

the types AIdentifierFactor and AIntegerFactor will be produced.

Given the information above, we design our semantic analyser by defining methods for the alternatives that include identifiers; that is, alternatives ASingleIdentifierList, ASequenceIdentifierList, AAssignStatement and AIdentifierFactor. We also need to process alternative AVarDecl. Note that the alternative of var_decl does not have a name. This is allowed in SableCC when a production is defined by a single alternative, and the resulting type name is the name of the production prefixed by A.

Here is the implementation of our semantic analyser:

package uk.co.brainycreatures.smallpascal.semantic;

import uk.co.brainycreatures.smallpascal.node.*;
import uk.co.brainycreatures.smallpascal.analysis.*;
import java.util.*; // for the Hashtable

public class SemanticAnalyser extends DepthFirstAdapter {
    // stores the identifiers being defined
    Hashtable symbol_table = new Hashtable(); // create a new table

    // check if the identifier is already in the table and report an error
    // if it is
    public void outASingleIdentifierList(ASingleIdentifierList node) {
        // identifier to be stored in the symbol table
        TIdentifier ident = node.getIdentifier();
    
        // name of the identifier to be stored in the table
        String key = ident.getText().toUpperCase(); //is the identifier in the table?

        if (symbol_table.containsKey(key)) { // report an error
            System.out.println("Identifier already defined.");
            System.exit(0);
        }
        else {
            symbol_table.put(key, key);
        }
    }

    // same as above, but for the sequence alternative
    public void outASequenceIdentifierList(ASequenceIdentifierList node) {
        // identifier to be stored in the symbol table
        TIdentifier ident = node.getIdentifier();

        // name of the identifier to be stored in the table
        String key = ident.getText().toUpperCase(); // is the identifier in the table?

        if (symbol_table.containsKey(key)) { // report an error
            System.out.println("Error: [" + ident.getLine() + "," + ident.getPos() + "] Identifier already defined.");
            System.exit(0);
        }
        else {
            symbol_table.put(key, key);
        }
    }

    // checks if the identifier in the assignment statement was previously
    // declared, and report an error if it wasn’t
    public void outAAssignStatement(AAssignStatement node) {
        TIdentifier ident = node.getIdentifier();
        String key = ident.getText().toUpperCase();

        // Is the identifier in the table?
        // if not report error
        if (!symbol_table.containsKey(key)) {
            System.out.println("Error: [" + ident.getLine() + "," + ident.getPos() + "] Unknown identifier.");
            System.exit(0);
        }
    }

    // same as above
    public void outAIdentifierFactor(AIdentifierFactor node) {
        TIdentifier ident = node.getIdentifier();
        String key = ident.getText().toUpperCase();

        // Is the identifier in the table?
        // if not report error
        if (!symbol_table.containsKey(key)) {
            System.out.println("Error: [" + ident.getLine() + "," + ident.getPos() + "] Unknown identifier.");
            System.exit(0);
        }
    }
}

We define the class SemanticAnalyser as extending class DepthFirstAdapter.

This automatically provides a depth-first traversal of the AST. To check the declarations of identifiers we just need methods outASingleIdentifierList and outASequenceIdentifierList. To check that identifiers are declared before use we just need methods outAAssignStatement and outAIdentifierFactor. For more about how SableCC generates these methods, refer to chapters 5 and 6 of [1]. We are now done with the semantic analyser. For code generation we do the same thing, except that we also generate code as we process the identifiers in the symbol table.

The Class Generator

The ClassGenerator is also a subclass of class DepthFirstAdapter. To implement the code generator we need to do a depth first traversal of the tree, just as we did with the semantic analyser. But for the code generator we will use alternatives from other productions as well. For example, we will use the program_heading to define the name of the class to be generated. That is, we will use the name of the identifier that follows the program keyword, if the program heading is present in the program. If the program heading is not present in the program, the name of the class will be a.out, just as the executables generated by a C compiler in a Unix based system. We will now start the description of the design of our code generator. For example, from the production

program_heading =
{non_empty} T.program identifier semicolon |
{empty} ;

we will define methods outANonEmptyProgramHeading and outAEmptyProgramHeading. To generate a class file using the Jas API we need to create an instance of class ClassEnv first. Then we set the class attributes: its name, its super class, the source from which it is being compiled and its access modifiers. This is all done in the methods outlined above. We also need to create a default constructor. Although we don't realise it, the Java compiler generates a default constructor for us. This is how it looks in Java assembly:

.method public <init>()V
aload_0
invokespecial java/lang/Object/<init>()V
return
.end method

To define this method, we can either do it in production program_heading, or even before production program traverses its subtrees; more precisely, in the definition of its inAProgram method. It is up to the programmer's preference. In our case, we are going to define the constructor in method inAProgram. To define a method we need to create an instance of class CodeAttr and then add the instructions with its addInsn method. We also need to create a new instance of CodeAttr for the object main_code. Here is how we implement it:

public void inAProgram(AProgram node) {
    // create an new instance of CodeAttr
    init = new CodeAttr();
    main_code = new CodeAttr(); // create a new object for main program body

    try {
        // define the method
        init.addInsn(new Insn(opc_aload_0));
        init.addInsn(new Insn(opc_invokespecial, new MethodCP("java/lang/Object", "<init>", "()V")));
        init.addInsn(new Insn(opc_return));
    }
    catch (Exception e) {
        System.out.println(e);
    }
}

After defining the method, we must add it to the class. This will be done in either method outAEmptyProgramHeading or outANonEmptyProgramHeading. Here is its implementation:

public void outANonEmptyProgramHeading(ANonEmptyProgramHeading node) {
    // name of the class to be generated
    class_name = node.getIdentifier().getText();

    try {
        // set class attributes
        main_class.setClass(new ClassCP(class_name));
        main_class.setSuper(new ClassCP("java/lang/Object"));

        // source from which it is compiled
        main_class.setSource(new SourceAttr(source_file_name));

        // make class public
        main_class.setClassAccess((short) ACC_PUBLIC);

        // add the constructor method to the class
        main_class.addMethod((short) ACC_PUBLIC, "<init>", "()V", init, null);
    }
    catch (Exception e) {
        System.out.println(e);
    }
}

The code is self explanatory, so we won’t go into more details.

The code for method outAEmptyProgramHeading is similar to the code above, except for the name of the class. It sets class_name to “a.out”. Proceeding with the grammar, we are now going to define the code that processes identifiers; that is, the code that defines the identifiers. Pascal variables will be defined as Java fields. To define fields in the Jas API, we use method addField of class ClassEnv. The name of the generated variables will be in lowercase. We don’t need a symbol table in order to fetch the signature of a field. Rather, we use the name of the class to which it belongs. Here is the implementation:

public void outASingleIdentifierList(ASingleIdentifierList node) {
    // get the name of the variable in lower case
    String var_name = node.getIdentifier().getText().toLowerCase();

    try {
        // add the field to the class
        main_class.addField(new Var((short) (ACC_STATIC | ACC_PUBLIC), new AsciiCP(var_name), new AsciiCP("I"), new ConstAttr(new IntegerCP(0))));
    }
    catch (Exception e) {
        System.out.println(e);
    }
}

The code above is equivalent to the following Java declaration:

public static int <var_name> = 0;

where <var_name> is the name of the variable being declared.

The code for method outASequenceIdentifierList is the same as the code for outASingleIdentifierList above, so there is no need to describe it again.

Let's now describe how we implement the statements. Before going any further, let's analyse how we translate Pascal statements into Java assembly. Because the Java Virtual Machine is stack based, we need to think in terms of stack operations only. So, to translate an assignment statement, we push all the factors on the right-hand side onto the stack and then pop the result into the variable on the left-hand side. For example, the assignment statement

a := b + c

is translated to the following Java assembly code:

getstatic <class name>/b I
getstatic <class name>/c I
iadd
putstatic <class name>/a I

where <class name> is the name of the class to which the fields a, b and c belong. In Java we use '.', but in Java assembly we use '/' to access methods or fields. The I following the name of the variable means that the variable is of type int. The instruction getstatic is used to get the value of a static field of a class, and the instruction putstatic is used to store a value into a static field of a class. The other arithmetic instructions that will be used in the code generation of the compiler for this subset are:

isub – used to subtract integers. This instruction expects to find two numbers on top of the stack; it pops them, subtracts them, and pushes the result back onto the stack.
imul – the same as isub, but it multiplies instead.
idiv – used to divide integers.

In order to generate the appropriate code, we need to traverse the subtrees of the corresponding operator, and after visiting them we generate the opcode corresponding to this same operator. We need to push numbers and identifiers as we encounter them in production factor, and then generate the opcode for the arithmetic instruction in the alternatives of productions expression and term. Here are the implementations:

public void caseAIdentifierFactor(AIdentifierFactor node) {
    String var_name = node.getIdentifier().getText().toLowerCase();
    
    try {
        // getstatic <class_name>⁄<var_name> I
        code.addInsn(new Insn(opc_getstatic, new FieldCP(class_name, var_name, "I")));
    }
    catch (Exception e) {
        System.out.println(e);
    }
}

Code for integer factors:

public void caseAIntegerFactor(AIntegerFactor node) {
    // get the string value of the literal
    String num_image = node.getInteger().getText();

    int value = Integer.parseInt(num_image);

    try {
        switch (value) {
            case -1:
                code.addInsn(new Insn(opc_iconst_m1)); break;
            case 0:
                code.addInsn(new Insn(opc_iconst_0)); break;
            case 1:
                code.addInsn(new Insn(opc_iconst_1)); break;
            case 2:
                code.addInsn(new Insn(opc_iconst_2)); break;
            case 3:
                code.addInsn(new Insn(opc_iconst_3)); break;
            case 4:
                code.addInsn(new Insn(opc_iconst_4)); break;
            case 5:
                code.addInsn(new Insn(opc_iconst_5)); break;
            default:
                if (value >= -128 && value <= 127) {
                    code.addInsn(new Insn(opc_bipush, value));
                }
                else if (value >= -32768 && value <= 32767) {
                    code.addInsn(new Insn(opc_sipush, value));
                }
                else {
                    code.addInsn(new Insn(opc_ldc, value));
                }
                break;
        }
    }
    catch (Exception e) {
        System.out.println(e);
    }
}

Code for addition operator:

public void outAPlusExpression(APlusExpression node) {
    try {
        code.addInsn(new Insn(opc_iadd));
    }
    catch (Exception e) {
        System.out.println(e);
    }
}

Code for subtraction operator:

public void outAMinusExpression(AMinusExpression node) {
    try {
        code.addInsn(new Insn(opc_isub));
    }
    catch (Exception e) {
        System.out.println(e);
    }
}

Code for multiplication operator:

public void outAMultTerm(AMultTerm node) {
    try {
        code.addInsn(new Insn(opc_imul));
    }
    catch (Exception e) {
        System.out.println(e);
    }
}

Code for division operator:

public void outADivTerm(ADivTerm node) {
    try {
        code.addInsn(new Insn(opc_idiv));
    }
    catch (Exception e) {
        System.out.println(e);
    }
}

Code for assignment statement:

public void outAAssignStatement(AAssignStatement node) {
    String var_name = node.getIdentifier().getText().toLowerCase();
    
    try {
        code.addInsn(new Insn(opc_putstatic, new FieldCP(class_name, var_name, "I")));
    }
    catch (Exception e) {
        System.out.println(e);
    }
}

Code for writeln statement:

public void inAWritelnStatement(AWritelnStatement node) {
    try {
        code.addInsn(new Insn(opc_getstatic, new FieldCP("java/lang/System", "out", "Ljava/io/PrintStream;")));
    }
    catch (Exception e) {
        System.out.println(e);
    }
}

public void outAWritelnStatement(AWritelnStatement node) {
    try {
        code.addInsn(new Insn(opc_invokevirtual, new MethodCP("java/io/PrintStream", "println", "(I)V")));
    }
    catch (Exception e) {
        System.out.println(e);
    }
}

After generating all the code, we add an extra return instruction, add the method to the class, and dump the class file. This is done in method outAProgram, that is, at the exit of the program. The code is as follows:

public void outAProgram(AProgram node) {
    try {
        // add return instruction to the end of the method
        code.addInsn(new Insn(opc_return));

        // add main method to main_class
        main_class.addMethod((short) (ACC_PUBLIC | ACC_STATIC), "main", "([Ljava/lang/String;)V", code, null);

        // generate class file
        main_class.write(new DataOutputStream(new FileOutputStream(class_name + ".class")));

        // output status
        System.out.println("Wrote " + class_name + ".class");
    }
    catch (Exception e) {
        System.out.println(e);
    }
}

The Main class

Now that we have implemented the working classes, it is time to implement the Main class. This class simply activates the lexer, parser, semantic analyser and the class generator. This is always the same for every compiler or interpreter implemented with SableCC. The code for the Main class is shown below:

package uk.co.brainycreatures.smallpascal;

import uk.co.brainycreatures.smallpascal.semantic.SemanticAnalyser;
import uk.co.brainycreatures.smallpascal.code.ClassGenerator;
import uk.co.brainycreatures.smallpascal.parser.*;
import uk.co.brainycreatures.smallpascal.lexer.*;
import uk.co.brainycreatures.smallpascal.node.*;
import java.io.*;

public class Main {
    public static void main(String[] args) {
        long start_time, stop_time; // times compilation

        if (args.length < 1) {
            System.out.println("Usage:");
            System.out.println(" java uk.co.brainycreatures.smallpascal.Main <filename>");
            return;
        }

        try {
            start_time = System.currentTimeMillis();

            // create lexer
            Lexer lexer = new Lexer(new PushbackReader(new BufferedReader(new FileReader(args[0])), 1024));

            // parse program
            Parser parser = new Parser(lexer);
            Start ast = parser.parse();

            // check program semantics
            ast.apply(new SemanticAnalyser());

            // generate class file
            ast.apply(new ClassGenerator());
        }
        catch (Exception e) {
            System.out.println(e);
        }
    }
}

Now that we have implemented everything, it is time to compile all the source code. Assuming that the current working directory is in our CLASSPATH, we compile the compiler as follows:

C:\mycompilers> javac uk\co\brainycreatures\smallpascal\Main.java

If no errors are found in the source, all the sources will be compiled. We then just have to debug the compiler, that is, test it with small Pascal programs. The best way to test a compiler is by feeding it erroneous input, that is, invalid programs.

If you have any queries, please do not hesitate to send an email at fidel dot viegas at brainycreatures dot co dot uk

Best Regards
Fidel.

References:

  1. Etienne Gagnon, “SableCC, An Object-Oriented Compiler Framework”, Master’s thesis, McGill University, Montreal, Quebec, March 1998.