Thread Deadlock in Java

Deadlock occurs when a group of processes blocks forever because each process is waiting for resources which are held by another process in the group. What happens is that a task is stuck waiting for another task to release a resource which itself is stuck waiting for another task to release a resource and so on such that a circular wait loop ensures. The result is that no task can proceed. Thus a deadlock results.

The classis case of deadlock is the Dining Philosopher problem.
In this problem we have, say 5 philosophers sitting down for dinner at a round table.
To the left and right of each philosopher is a chopstick and there are 5 of these chopsticks F1 – F5, illustrated below:

Dining philosopher problem


In order for each philospher to eat, they must pick up the left and right chopsticks.
Each philosopher decides to pick up the chopstick on his right 1st before picking up the one on his left.
After picking up the chopstick on the right, each philosopher attempts to pick up the chopstick on his left and if it is not yet available, has to wait.

Thus we can have the following scenario:

P1 picks up F1, waits to pick up F2
P2 picks up F2, waits to pick up F3
P3 picks up F3, waits to pick up F4
P4 picks up F4, waits to pick up F1
P5 picks up F5, waits to pick up F1

Thus we have a circular wait scenario, where each philosopher is waiting on the next philosopher to his left to drop his right chopstick and so on such that no philosopher can eat.

Here is Java code for a simpler example of deadlock involving 2 tasks:

public class DeadlockDemo
  public Integer obj1 = 1;
  public Integer obj2 = 2;

  private class Runnable1 implements Runnable
   public void run()
     System.out.println("R1: Obtained lock on obj1:" + obj1);
     System.out.println("R1: Waiting to obtain lock on obj2..."):
      System.out.println("R1: Obtained lock on obj2:" + obj2);

  private class Runnable2 implements Runnable
   public void run()
     System.out.println("R2: Obtained lock on obj2:" + obj2);
     System.out.println("R2: Waiting to obtain lock on obj1..."):
      System.out.println("R2: Obtained lock on obj1:" + obj1);

  public static void main(String[] args)
   DeadlockDemo dDemo=new DeadlockDemo();
   Runnable Runnable1();
   Runnable r2=new Runnable2();
   new Thread(r1).start();
   new Thread(r2).start();


I ran the above code and it produced the following result:

R2: Obtained lock on obj2:2
R2: Waiting to obtain lock on obj1...
R1: Obtained lock on obj1:1
R1: Waiting to obtain lock on obj2...

and the program hung with the 2 threads stuck in a circular wait.

Defining and Starting a Thread in Java

There are 3 ways to do this:

1. Subclass Thread class

i. Subclass the Thread class and override the run() method
ii. Instantiate the Thread subclass.
iii. Call Thread.start() method.

public class MyThread extends Thread {
   public void run() { 
      System.out.println("Thread via Extending Thread!"); 
   public static void main(String []args) { 
      (new MyThread()).start(); 



2. Provide a Runnable object by implementing Runnable

Runnable interface defines a single method run(), meant to contain the code executed in the thread.

i. Implement the Runnable interface by implementing run() method.
ii. Pass instance of Runnable object to the Thread(..) constructor.
iii. Call Thread.start() method.

public class MyRunnable implements Runnable {

   public void run() {
      System.out.println("Thread via Implementing Runnable!");

   public static void main(String[] args) {
      (new Thread(new MyRunnable())).start();



Notice that both cases above invoke Thread.start() in order to start the new thread. In either case above, the result is a Thread object, where the run() method is the body of the thread. When the start() method of the Thread object is called, the interpreter creates a new thread to execute the run() method. The new thread continues to run until the run() method exits. Meanwhile the original thread continues running itself, starting with the statement following the start() method.


3. Using Executor interface

The Executor interface can be used to invoke threads as well. It isn’t really a new idiom, since either a Thread object or object that implements Runnable needs to be created first. The Executor interface provides a layer of indirection between a client and the execution of a task; instead of a client executing a task directly, an intermediate object executes the task. Executors allow you to manage the execution of asynchronous tasks without having to explicitly manage the lifecycle of threads.


  1. Implement and create a new instance of Runnable.
  2. Create a concrete instance of ExecutorService by calling one of the Executors factory methods.
  3. Call Executor.execute(..) method, passing the Runnable object as argument.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MyExecutor {

	public static void main(String[] args) {
		ExecutorService exec = Executors.newCachedThreadPool(); 
	    for(int i = 0; i < 5; i++) 
	      exec.execute(new MyRunnable()); //MyRunnable implements Runnable


Thread vs. Runnable
Which of these idioms should you use? The first idiom, which employs a Runnable object, is more general, because the Runnable object can subclass a class other than Thread. Also using Runnable enhances separation of concerns vi a composition, by separating the method of execution from the interface used to construct the Runnable
The second idiom is easier to use in simple applications, but is limited by the fact that your task class must be a descendant of Thread.

Executor Factory Methods

FactoryMethod Details
newCachedThreadPool() Creates thread pool that creates new threads as needed, but will reuse previously constructred threads when they’re available
newFixedThreadPool(..) Creates thread pool that reuses a fixed number of threads operating off a shared unbounded queue
newScheduledThreadPool(..) Creates a thread pool that can schedule commands to run after a given delay, or to execute periodically
newSingleThreadExecutor() Creates an Executor that uses a single worker thread operating off an unbounded queue
newSingleThreadScheduledExecutor() Creates a single-threaded executor that can schedule commands to run after a given delay, or to execute periodically

Abstract Data Types vs Data Structures

What is the difference between an abstract data type (ADT) and a data structure ?

The question above can be reframed in a more concrete manner by asking this question:
What is the difference between an array and a stack ?
An array is a data structure while a stack is an abstract data type.
An abstract data type (ADT) is a specification for how an ‘data’ interface should behave without any reference to its actual implementation.

The wikipedia definition of an ADT is as follows:
An abstract data type is defined as a mathematical model of the data objects that make up a data type as well as the functions that operate on these objects.
From the NIST Dictionary of Algorithms and Data structures we have the following definition of an ADT:
A set of data values and associated operations that are precisely specified independent of any particular implementation.

Thus in the case of a stack, it exhibits Last In First Out (LIFO) behavior when elements are added to and removed from it. The concrete implementation of the ADT is where the data structure comes in. Thus a Stack can be implemented as an array or as a linked list.

This might lead us to conclude that an ADT is more a theoretical or abstract concept, while the data structure has more to do with the concrete implementation.

Under the definitions above, the following constructs are ADTs:

  • Stack
  • Queue
  • Bag
  • List
  • Priority Queue
  • Trie
  • Heap
  • Binary Tree
  • Set
  • Map

while these are data structures:

  • array
  • linked list
  • hash map/dictionary

We can gain a better understanding of this when we consider these constructs in Java.
The ADT corresponds to the interface type, while the data structure would correspond to the concrete class.Thus a Java array, ArrayList, LinkedList, HashMap are actually data structures and the corresponding interfaces they implement would be equivalent to ADTs.
See this article for more about abstract data types in Java

Variable storage in Java

In order to figure out where a variable is stored in Java the most important factor is where the variable is declared.
A general rule of thumb is this:

  • local variables are stored on the stack
  • instance variables are stored on the heap
  • static variables are stored on the PermGen area of the heap

There are caveats to this however, which are explained below:

Variable Storage Details

Local variables
Primitives and object references declared in a method will be stored on the stack. However, the actual object, if created using new() will be stored on the heap, regardless of where the declaration took place. Hence in the following piece of code:
void aMethod()
int playerNum=5;
Player pl=new Player();

The primitive variable playerNum and object reference variable pl will be stored on the stack, while the actual Player object itself will live on the heap. When the code exits aMethod and goes out of scope, playerNum and pl will be popped from the stack and cease to exist but the Player object will persist on the heap until it is eventually garbage collected.
Instance variables
Instance variables, even primitives live on the heap.

Consider the code below:

public class Car {
int vinNumber;
String make;
String model;
int year;
String class;

Car(int vin, String make, String model, int year, String class)
public static void main(String[] args)
Car c=new Car(19281,"Audi", "A6",2012,"sedan");

Since an instance of Car can only be instantiated via a call to new(), we see that:

  • The Car object c lives on the heap
  • All instance primitives and objects that are part of the Car object are also stored on the heap.

Static variables
The rule for static variables is this: Static methods, primitive variables and object references are stored in the PermGen section of the heap since they are part of the reflection i.e. class, not instance related data. However, in the case of objects, the actual object itself is stored in the regular areas of the heap (young/old generation or survivor space).




Clustered Indexes

In this article I explain what is meant by a clustered index and compare it with a non-clustered index.


A clustered index on a table determines the order in which the rows of the table are stored on disk. If the table has a clustered index, the rows of the table are stored in the same order as that of the clustered index. As an illustration, suppose we have a Customer table that contains the following columns:

  • customerID
  • firstName
  • lastName

where customerID is the primary key on the table. If we define the clustered index on customerID, then the rows of the table will be stored in sorted order according to customerID. This means that rows with customerID=1000, customerID=1001, etc will be adjacent to each other on disk.

Advantages and Disadvantages

The advantages of having a clustered index include the following:

  1. Range queries involving the clustered index will be faster since once the row with 1st key value is located the remaining rows physically stored next to each other and no more searching is needed.
  2. The leaf nodes of the B-tree that make up the clustered index contain the actual data pages as opposed to a non-clustered index where the leaf nodes contain pointers to rows on data pages. Hence there is 1 less level of indirection for a clustered index and this improves performance.

The disadvantages of a clustered index include:

  1. Updates involving columns used in the clustered index result in a performance hit since the rows may have to be re-arranged to keep the table in sorted order in line with the clustered index. In light of this, it is recommended that a clustered index is created on a primary or foreign key, since this would be less prone to updates.

Comparison of clustered vs non-clustered indexes

  • Returning to our Customer table example, if we have a clustered index on Customer id, then the leaf node of the clustered index will contain the actual row data (data pages) for a particular customerID while for a non-clustered index the value of customerID and a pointer to the actual row is what is stored at the leaf node.
  • There can only be 1 clustered index per table, but multiple non-clustered indexes (up to 249 in the case of Sybase). This is because the rows in the table are physically ordered according to the clustered index and there is only 1 way of doing so.
  • A clustered index determines the order in which the rows of the table can be stored on disk, but this is not the case for a non-clustered index
  • A clustered index can be a performance hit in the case of updates to an indexed column since the rows may have to be re-ordered in order to maintain order. Range queries and queries involving indexed foreign keys tend to show better performance for a clustered vs non-clustered index.


For a more in depth look at clustered indexes, see the following very good article by Michelle Ufford : 
Effective Clustered Indexes

Removing duplicates from a table with no primary keys

Consider the following table:

mysql> SELECT * FROM Philosopher;
| philosopherID | firstName | lastName |
| 1234 | John | Locke |
| 1234 | John | Locke |
| 2345 | Rene | Descartes |
| 2347 | John Stuart | Mill |
| 1562 | Emmanuel | Kant |
| 1562 | Emmanuel | Kant |
| 1671 | Baruch | Spinoza |
| 1562 | Emmanuel | Kant |
| 1761 | Jean-Paul | Sartre |
9 rows in set (0.00 sec)

Come up with a strategy to first identify and then remove the duplicate rows.

What SQL queries would you use ?


1. Identify duplicate rows:

SELECT * FROM Philosopher
 GROUP BY philosopherID, firstName, lastName HAVING COUNT(*) > 1;

2. Remove duplicate rows

i. Create temporary table and copy data over:

 philosopherID INT,
 firstName VARCHAR(50),
 lastName varchar(50)

SELECT philosopherID,firstName, lastName FROM Philosopher;

ii. Truncate original table and do SELECT DISTINCT:

INSERT INTO Philosopher(philosopherID, firstName, lastName)
SELECT DISTINCT philosopherID, firstName, lastName 
FROM PhilosopherDups;

iii. Another option to using DISTINCT would be to use the rowId in the temptable to delete duplicate rows, truncate the original table and copy data back:

DELETE FROM PhilosopherDups
SELECT MAX(rowId) FROM PhilosopherDups  
GROUP BY philosopherID, firstName, lastName 


INSERT INTO Philosopher(philosopherID, firstName, lastName)
SELECT philosopherID, firstName, lastName 
FROM PhilosopherDups;