Linked List Implementation

A linked list looks like this:

[Figure: a linked list]

Note that a linked list consists of one or more nodes. Each node contains some data (in this example, item 1, item 2, etc.) and a pointer. For each node other than the last one, the pointer points to the next node in the list. For the last node, the pointer is null (indicated in the example by a diagonal line). To implement linked lists in Java, we define a ListNode class to represent the individual nodes of the list.

Note that the next field of a ListNode is itself of type ListNode. That works because in Java every variable of a non-primitive type holds a reference (a pointer): a ListNode variable is either null or points to a piece of storage (allocated at runtime) that consists of two fields named data and next.
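The ListNode class itself is not reproduced in this article. As a minimal sketch, assuming the data field is a plain Object (the original may differ, for example by using generics), a class consistent with the getNext/setNext calls and the two constructors used in the snippets below could look like this:

```java
// Minimal singly linked list node. Storing data as Object is an assumption
// made to match the snippets in this article.
class ListNode {
    private Object data;    // the item stored in this node
    private ListNode next;  // the next node, or null if this is the last node

    // Creates a node with no successor (next is null).
    ListNode(Object data) {
        this(data, null);
    }

    // Creates a node and links it to an existing successor in one step.
    ListNode(Object data, ListNode next) {
        this.data = data;
        this.next = next;
    }

    Object getData() { return data; }
    void setData(Object data) { this.data = data; }
    ListNode getNext() { return next; }
    void setNext(ListNode next) { this.next = next; }
}
```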

Linked List Operations

Before thinking about how to implement lists using linked lists, let’s consider some basic operations on linked lists:

  • Adding a node after a given node in the list.
  • Removing a given node from the list.

Adding a node

Assume that we are given:

  1. n, (a pointer to) a node in a list (i.e., n is a ListNode), and
  2. newdat, the data to be stored in a new node

and that the goal is to add a new node containing newdat immediately after n.
To do this we must perform the following steps:

  1. create the new node using the given data
  2. “link it in”:
    a) make the new node’s next field point to whatever n’s next field was pointing to
    b) make n’s next field point to the new node.

[Figure: adding a new node after node n]

And here’s the code:

ListNode tmp = new ListNode(newdat);    // Step 1
tmp.setNext( n.getNext() );             // Step 2(a)
n.setNext( tmp );                       // Step 2(b)

Note that it is vital to first copy the value of n’s next field into tmp’s next field (step 2(a)) before setting n’s next field to point to the new node (step 2(b)). If we set n’s next field first, we would lose our only pointer to the rest of the list after node n!

Also note that, in order to follow the steps shown in the picture above, we needed to use variable tmp to create the new node (in the picture, step 1 shows the new node just “floating” there, but that isn’t possible — we need to have some variable point to it so that we can set its next field, and so that we can set n’s next field to point to it). However, we could in fact accomplish steps 1 and 2 with a single statement that creates the new node, fills in its data and next fields, and sets n’s next field to point to the new node! Here is that amazing statement:

n.setNext( new ListNode(newdat, n.getNext()) ); // steps 1, 2(a), 2(b)

Now consider the worst-case running time for this add operation. Whether we use the single statement or the list of three statements, we are really doing the same thing:

  1. Using new to allocate space for a new node (start step 1).
  2. Initializing the new node’s data and next fields (finish step 1 + step 2(a)).
  3. Changing the value of n’s next field (step 2(b)).

We will assume that storage allocation via new takes constant time. Setting the values of the three fields also takes constant time, so the whole operation is a constant-time (O(1)) operation. In particular, the time required to add a new node immediately after a given node is independent of the number of nodes already in the list.
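To make the comparison concrete, here is a small sketch that wraps both versions in static methods. The class name AddAfterDemo and the methods addAfter, addAfterOneLine and render are hypothetical names chosen for this illustration; each add method does a constant amount of work no matter how long the list is.

```java
class AddAfterDemo {
    // Minimal node class, repeated here so the sketch compiles on its own.
    static class ListNode {
        private final Object data;
        private ListNode next;
        ListNode(Object data) { this(data, null); }
        ListNode(Object data, ListNode next) { this.data = data; this.next = next; }
        Object getData() { return data; }
        ListNode getNext() { return next; }
        void setNext(ListNode next) { this.next = next; }
    }

    // Three-statement version: steps 1, 2(a) and 2(b) done separately.
    static void addAfter(ListNode n, Object newdat) {
        ListNode tmp = new ListNode(newdat); // step 1
        tmp.setNext(n.getNext());            // step 2(a): must come before 2(b)
        n.setNext(tmp);                      // step 2(b)
    }

    // One-statement version: the two-argument constructor does steps 1 and 2(a).
    static void addAfterOneLine(ListNode n, Object newdat) {
        n.setNext(new ListNode(newdat, n.getNext()));
    }

    // Builds a printable view of the list so the results can be checked.
    static String render(ListNode head) {
        StringBuilder sb = new StringBuilder("[");
        for (ListNode p = head; p != null; p = p.getNext()) {
            sb.append(p.getData());
            if (p.getNext() != null) sb.append(", ");
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        ListNode a = new ListNode("ant", new ListNode("cat"));
        addAfter(a, "bat");                            // ant, bat, cat
        addAfterOneLine(a.getNext().getNext(), "dog"); // add after cat
        System.out.println(render(a));                 // prints [ant, bat, cat, dog]
    }
}
```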

Removing a node

To remove a given node n from a linked list, we need to change the next field of the node that comes immediately before n in the list to point to whatever n’s next field was pointing to. Here’s the conceptual picture:

[Figure: removing node n from a linked list]

Note that the fact that n’s next field is still pointing to a node in the list doesn’t matter — n has been removed from the list, because it cannot be reached from L. It should be clear that in order to implement the remove operation, we first need to have a pointer to the node before node n (because that node’s next field has to be changed). The only way to get to that node is to start at the beginning of the list. We want to keep moving along the list as long as the current node’s next field is not pointing to node n. Here’s the appropriate code:

ListNode tmp = L;
while (tmp.getNext() != n) 
       tmp = tmp.getNext();  // find the node before n

Note that this kind of code (moving along a list until some condition holds) is very common. For example, similar code would be used to implement a lookup operation on a linked list (an operation that determines whether there is a node in the list that contains a given piece of data).
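As a sketch of that lookup operation, the method below (lookup and LookupDemo are hypothetical names) reuses the same “keep moving along the list” loop, but stops either on a match or when tmp becomes null. It uses Objects.equals so that null data values do not cause a NullPointerException.

```java
import java.util.Objects;

class LookupDemo {
    // Minimal node class, repeated here so the sketch compiles on its own.
    static class ListNode {
        private final Object data;
        private final ListNode next;
        ListNode(Object data) { this(data, null); }
        ListNode(Object data, ListNode next) { this.data = data; this.next = next; }
        Object getData() { return data; }
        ListNode getNext() { return next; }
    }

    // Returns true if some node in the list that starts at L contains val.
    // The loop stops when it finds val or falls off the end (tmp == null).
    static boolean lookup(ListNode L, Object val) {
        ListNode tmp = L;
        while (tmp != null && !Objects.equals(tmp.getData(), val)) {
            tmp = tmp.getNext();
        }
        return tmp != null;
    }
}
```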
Note also that there is one case when the code given above will not work. When n is the very first node in the list, the picture is like this:

[Figure: n is the first node in the list]

In this case, the test (tmp.getNext() != n) will always be true, since no node’s next field points to n, and eventually we will “fall off the end” of the list (i.e., tmp will become null, and we will get a runtime error, a NullPointerException, when we try to dereference it). We will take care of that case in a minute; first, assuming that n is not the first node in the list, here’s the code that removes n from the list:

ListNode tmp = L;
while (tmp.getNext() != n) 
       tmp = tmp.getNext();  // find the node before n 
tmp.setNext( n.getNext() );  // remove n from the linked list

How can we test whether n is the first node in the list, and what should we do in that case? If n is the first node, then L will be pointing to it, so we can test whether L == n. The following before and after pictures illustrate removing node n when it is the first node in the list:

[Figure: before and after removing node n when it is the first node in the list]

Here’s the complete code for removing node n from a linked list, including the special case when n is the first node in the list:

if (L == n) {
    // special case: n is the first node in the list
    L = n.getNext();
} else {
    // general case: find the node before n, then "unlink" n
    ListNode tmp = L;
    while (tmp.getNext() != n) tmp = tmp.getNext();
    tmp.setNext( n.getNext() );
}
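One caveat when packaging this code as a method: Java passes references by value, so a method cannot reassign the caller’s variable L in the special case. A common workaround, sketched below under the assumption that n really is in the list, is to return the (possibly new) first node and have the caller write L = remove(L, n). RemoveDemo and remove are hypothetical names.

```java
class RemoveDemo {
    // Minimal node class, repeated here so the sketch compiles on its own.
    static class ListNode {
        private final Object data;
        private ListNode next;
        ListNode(Object data) { this(data, null); }
        ListNode(Object data, ListNode next) { this.data = data; this.next = next; }
        Object getData() { return data; }
        ListNode getNext() { return next; }
        void setNext(ListNode next) { this.next = next; }
    }

    // Removes node n (assumed to be in the list) and returns the first node
    // of the resulting list. Callers use it as: L = remove(L, n);
    static ListNode remove(ListNode L, ListNode n) {
        if (L == n) {
            // special case: n is the first node, so its successor becomes first
            return n.getNext();
        }
        // general case: find the node before n, then "unlink" n
        ListNode tmp = L;
        while (tmp.getNext() != n) {
            tmp = tmp.getNext();
        }
        tmp.setNext(n.getNext());
        return L;
    }
}
```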

List class methods

Let’s discuss these 3 methods:

  1. The version of add that adds to the end of the list.
  2. The version of add that adds to a given position in the list.
  3. The constructor.

add (to end of list)
Recall that the first version of method add adds a given value to the end of the list. We have already discussed how to add a new node to a linked list following a given node. The only question is how best to handle adding a new node at the end of the list. A straightforward approach would be to traverse the list, looking for the last node (i.e., use a variable tmp as was done above in the code that looked for the node before node n). Once the last node is found, the new node can be inserted immediately after it.
The disadvantage of this approach is that it requires O(N) time to add a node to the end of a list with N items. An alternative is to add a lastNode field (often called a tail pointer) to the List class, and to implement the methods that modify the linked list so that lastNode always points to the last node in the linked list (which will be the header node if the list is empty). There is more opportunity for error (since several methods will need to ensure that the lastNode field is kept up to date), but the use of the lastNode field will mean that the worst-case running time for this version of add is always O(1).
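Here is a minimal sketch of the tail-pointer approach, assuming a header node (as described for the List constructor below) and Object data. TailList is a hypothetical class name used to avoid confusion with java.util.List; add runs in O(1) because it never traverses the list.

```java
// Hypothetical list class with a header node and a lastNode (tail) pointer.
class TailList {
    static class ListNode {
        private final Object data;
        private ListNode next;
        ListNode(Object data) { this.data = data; }
        Object getData() { return data; }
        ListNode getNext() { return next; }
        void setNext(ListNode next) { this.next = next; }
    }

    private final ListNode items;  // header node; holds no data
    private ListNode lastNode;     // last node, or the header when the list is empty
    private int numItems;

    // An empty list is just a header node, pointed to by both items and lastNode.
    TailList() {
        items = new ListNode(null);
        lastNode = items;
        numItems = 0;
    }

    // O(1): link the new node after lastNode, then advance lastNode to it.
    void add(Object d) {
        lastNode.setNext(new ListNode(d));
        lastNode = lastNode.getNext();
        numItems++;
    }

    int size() { return numItems; }

    // Lists the data items in order, skipping the header node.
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("[");
        for (ListNode p = items.getNext(); p != null; p = p.getNext()) {
            sb.append(p.getData());
            if (p.getNext() != null) sb.append(", ");
        }
        return sb.append("]").toString();
    }
}
```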

add (at a given position)
As discussed above for the “add to the end” method, we already know how to add a node to a linked list after a given node. So to add a node at position pos, we just need to find the previous node in the list.
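A sketch of this method, again assuming a header node, Object data, 0-based positions and a hypothetical class name (PositionalList): walking pos steps from the header lands on the node before position pos, and the header means inserting at position 0 needs no special case. If the new node is added after the current last node, lastNode has to be updated as well. The walk makes this O(N) in the worst case.

```java
class PositionalList {
    static class ListNode {
        private final Object data;
        private ListNode next;
        ListNode(Object data, ListNode next) { this.data = data; this.next = next; }
        Object getData() { return data; }
        ListNode getNext() { return next; }
        void setNext(ListNode next) { this.next = next; }
    }

    private final ListNode items = new ListNode(null, null);  // header node
    private ListNode lastNode = items;  // the header while the list is empty
    private int numItems = 0;

    // Adds d so that it ends up at position pos (0 = first item).
    void add(int pos, Object d) {
        if (pos < 0 || pos > numItems) {
            throw new IndexOutOfBoundsException("bad position: " + pos);
        }
        // Walk pos steps from the header to reach the node before position pos.
        ListNode prev = items;
        for (int k = 0; k < pos; k++) {
            prev = prev.getNext();
        }
        prev.setNext(new ListNode(d, prev.getNext()));  // link in after prev
        if (prev == lastNode) {
            lastNode = prev.getNext();  // the new node is now the last one
        }
        numItems++;
    }

    // Data in the last node (null for an empty list, whose lastNode is the header).
    Object last() { return lastNode.getData(); }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("[");
        for (ListNode p = items.getNext(); p != null; p = p.getNext()) {
            sb.append(p.getData());
            if (p.getNext() != null) sb.append(", ");
        }
        return sb.append("]").toString();
    }
}
```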

Here’s a picture of the “ant, bat, cat” list, when the implementation includes a lastNode pointer:

[Figure: the “ant, bat, cat” list with a header node and a lastNode pointer]

The List constructor
The List constructor needs to initialize the three List fields:

  1. ListNode items (the pointer to the header node)
  2. ListNode lastNode (the pointer to the last node in the list)
  3. int numItems

so that the list is empty. An empty list is one that has just a header node, pointed to by both items and lastNode.

Linked List Variations

Here we discuss 2 variations of a linked list:

  1. doubly linked lists
  2. circular linked lists

Doubly linked lists

Recall that, given (only) a pointer to a node n in a linked list with N nodes, removing node n takes time O(N) in the worst case, because it is necessary to traverse the list looking for the node just before n. One way to fix this problem is to require two pointers: a pointer to the node to be removed, and also a pointer to the node just before that one. Another way to fix the problem is to use a doubly linked list.

Here’s the conceptual picture:

[Figure: a doubly linked list]

Each node in a doubly linked list contains three fields: the data, and two pointers. One pointer points to the previous node in the list, and the other pointer points to the next node in the list. The previous pointer of the first node, and the next pointer of the last node are both null. Here’s the Java class definition for a doubly linked list node: DoubleListNode
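Since the class definition is only referenced by name here, the following is a minimal sketch of what DoubleListNode might look like, assuming Object data and the getPrev/setPrev/getNext/setNext accessors used in the removal code below:

```java
// Minimal doubly linked list node (Object data is an assumption).
class DoubleListNode {
    private DoubleListNode prev;  // previous node, or null for the first node
    private Object data;          // the item stored in this node
    private DoubleListNode next;  // next node, or null for the last node

    DoubleListNode(Object data) { this.data = data; }

    Object getData() { return data; }
    void setData(Object data) { this.data = data; }
    DoubleListNode getPrev() { return prev; }
    void setPrev(DoubleListNode prev) { this.prev = prev; }
    DoubleListNode getNext() { return next; }
    void setNext(DoubleListNode next) { this.next = next; }
}
```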

To remove a given node n from a doubly linked list, we need to change the prev field of the node to its right, and we need to change the next field of the node to its left, as illustrated below.

[Figure: removing node n from a doubly linked list]

Here’s the code for removing node n:


// Step 1: change the prev field of the node after n
  DoubleListNode tmp = n.getNext();
  tmp.setPrev( n.getPrev() );

// Step 2: change the next field of the node before n
  tmp = n.getPrev();
  tmp.setNext( n.getNext() );

Unfortunately, this code doesn’t work (causes an attempt to dereference a null pointer) if n is either the first or the last node in the list. We can add code to test for these special cases, or we can use a circular, doubly linked list, as discussed below.
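One way to add those tests, sketched below with hypothetical names (DoubleRemoveDemo, remove): skip step 1 when there is no node after n, skip step 2 when there is no node before n, and in the latter case return n’s successor as the new first node, since (as with singly linked lists) a Java method cannot reassign the caller’s L.

```java
class DoubleRemoveDemo {
    // Minimal doubly linked node, repeated so this sketch compiles on its own.
    static class DoubleListNode {
        private final Object data;
        private DoubleListNode prev, next;
        DoubleListNode(Object data) { this.data = data; }
        Object getData() { return data; }
        DoubleListNode getPrev() { return prev; }
        void setPrev(DoubleListNode prev) { this.prev = prev; }
        DoubleListNode getNext() { return next; }
        void setNext(DoubleListNode next) { this.next = next; }
    }

    // Removes n from a non-circular doubly linked list whose first node is L
    // and returns the (possibly new) first node.
    static DoubleListNode remove(DoubleListNode L, DoubleListNode n) {
        DoubleListNode before = n.getPrev();
        DoubleListNode after = n.getNext();
        if (after != null) {
            after.setPrev(before);   // step 1; skipped when n is the last node
        }
        if (before != null) {
            before.setNext(after);   // step 2; skipped when n is the first node
        } else {
            L = after;               // n was the first node
        }
        return L;
    }
}
```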

Circular linked lists

Both singly and doubly linked lists can be made circular. Here are the conceptual pictures:

[Figure: circular singly and doubly linked lists]

The class definitions are the same as for the non-circular versions. The difference is that, instead of being null, the next field of the last node points to the first node, and (for doubly linked circular lists) the prev field of the first node points to the last node.

The code given above for removing node n from a doubly linked list will work correctly except when node n is the first node in the list. In that case, the variable L that points to the first node in the list needs to be updated, so special-case code will always be needed unless the list includes a header node.
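A sketch of that header-node approach, assuming a circular doubly linked list in which an empty list is a header node whose prev and next fields point to the header itself. CircularDoubleDemo, newEmptyList and addLast are hypothetical names. Because every data node always has a real neighbor on both sides (possibly the header), the two removal steps need no null checks and no special case for the first node:

```java
class CircularDoubleDemo {
    // Minimal doubly linked node, repeated so this sketch compiles on its own.
    static class DoubleListNode {
        private final Object data;
        private DoubleListNode prev, next;
        DoubleListNode(Object data) { this.data = data; }
        Object getData() { return data; }
        DoubleListNode getPrev() { return prev; }
        void setPrev(DoubleListNode prev) { this.prev = prev; }
        DoubleListNode getNext() { return next; }
        void setNext(DoubleListNode next) { this.next = next; }
    }

    // An empty list: a header whose prev and next both point back to itself.
    static DoubleListNode newEmptyList() {
        DoubleListNode header = new DoubleListNode(null);
        header.setNext(header);
        header.setPrev(header);
        return header;
    }

    // Inserts d just before the header, i.e. at the end of the list.
    static DoubleListNode addLast(DoubleListNode header, Object d) {
        DoubleListNode n = new DoubleListNode(d);
        n.setPrev(header.getPrev());
        n.setNext(header);
        header.getPrev().setNext(n);
        header.setPrev(n);
        return n;
    }

    // The same two steps as the non-circular code, with no special cases.
    static void remove(DoubleListNode n) {
        n.getNext().setPrev(n.getPrev());  // step 1
        n.getPrev().setNext(n.getNext());  // step 2
    }
}
```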

Another issue that you must address if you use a circular linked list is that if you’re not careful, you may end up going round and round in circles! For example, what happens if you try to search for a particular value val using code like this:

ListNode tmp = L;
while (tmp != null && !tmp.getData().equals(val)) tmp = tmp.getNext();

and the value is not in the list? You will have an infinite loop!
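One way to avoid this, sketched below for a singly linked circular list whose first node is L (CircularLookupDemo and lookup are hypothetical names): remember where the search started and stop once tmp gets back to L, using a do-while loop so that the first node is still examined.

```java
import java.util.Objects;

class CircularLookupDemo {
    // Minimal node class, repeated here so the sketch compiles on its own.
    static class ListNode {
        private final Object data;
        private ListNode next;
        ListNode(Object data) { this.data = data; }
        Object getData() { return data; }
        ListNode getNext() { return next; }
        void setNext(ListNode next) { this.next = next; }
    }

    // Searches a circular list starting at L; terminates even if val is absent.
    static boolean lookup(ListNode L, Object val) {
        if (L == null) {
            return false;  // empty list
        }
        ListNode tmp = L;
        do {
            if (Objects.equals(tmp.getData(), val)) {
                return true;
            }
            tmp = tmp.getNext();
        } while (tmp != L);  // back at the start: every node has been checked
        return false;
    }
}
```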


Comparison of Linked List Variations

The major disadvantage of doubly linked lists (over singly linked lists) is that they require more space (every node has two pointer fields instead of one). Also, the code to manipulate doubly linked lists needs to maintain the prev fields as well as the next fields; the more fields that have to be maintained, the more chance there is for errors. The major advantage of doubly linked lists is that they make some operations (like the removal of a given node, or a right-to-left traversal of the list) more efficient.

The major advantage of circular lists (over non-circular lists) is that they eliminate some special-case code for some operations. Also, some applications lead naturally to circular list representations. For example, a computer network might best be modeled using a circular list.


Complete Linked List Implementation

The complete linked list implementation may be found here: ListImpl
References
http://pages.cs.wisc.edu/~vernon/cs367/notes/4.LINKED-LIST.html


Thread Deadlock in Java

Deadlock occurs when a group of processes blocks forever because each process is waiting for a resource held by another process in the group. A task is stuck waiting for another task to release a resource, that task is itself stuck waiting for yet another task to release a resource, and so on, until the chain of waits loops back to the first task. The result is a circular wait in which no task can proceed.

The classic example of deadlock is the Dining Philosophers problem.
In this problem we have, say, 5 philosophers sitting down for dinner at a round table.
To the left and right of each philosopher is a chopstick and there are 5 of these chopsticks F1 – F5, illustrated below:

[Figure: the dining philosophers problem]

Reference: http://samplecodes.files.wordpress.com/2008/11/23.jpg

In order to eat, each philosopher must pick up both the left and the right chopstick.
Each philosopher decides to pick up the chopstick on his right first, before picking up the one on his left.
After picking up the chopstick on the right, each philosopher attempts to pick up the chopstick on his left and, if it is not yet available, has to wait.

Thus we can have the following scenario:

P1 picks up F1, waits to pick up F2
P2 picks up F2, waits to pick up F3
P3 picks up F3, waits to pick up F4
P4 picks up F4, waits to pick up F5
P5 picks up F5, waits to pick up F1

Thus we have a circular wait scenario, where each philosopher is waiting on the next philosopher to his left to drop his right chopstick and so on such that no philosopher can eat.
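The circular wait can be traced mechanically. The sketch below does not use real threads; it simply models the scenario above under its stated rules (philosopher Pi holds chopstick Fi and waits for the next chopstick, F1 after F5) and follows the chain “whoever holds the chopstick I am waiting for”. Starting from any philosopher, the chain returns to that philosopher, which is exactly the cycle that makes this a deadlock. CircularWaitModel and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class CircularWaitModel {
    static final int N = 5;  // philosophers P1..P5, chopsticks F1..F5

    // The chopstick philosopher p is waiting for (F1 after F5).
    static int waitsFor(int p) { return p % N + 1; }

    // In the scenario above, chopstick Fk is held by philosopher Pk.
    static int holderOf(int chopstick) { return chopstick; }

    // Follows "who holds what I am waiting for?" until it returns to start.
    static List<Integer> waitChain(int start) {
        List<Integer> chain = new ArrayList<>();
        int p = start;
        do {
            chain.add(p);
            p = holderOf(waitsFor(p));
        } while (p != start);
        return chain;
    }

    public static void main(String[] args) {
        System.out.println(waitChain(1));  // prints [1, 2, 3, 4, 5]
    }
}
```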

Here is Java code for a simpler example of deadlock involving 2 tasks:

public class DeadlockDemo
{
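  // Boxed Integers keep the printed output readable; real code should lock on
  // dedicated private final Object instances, because small Integer values are
  // cached and shared across the whole JVM.
  public Integer obj1 = 1;
  public Integer obj2 = 2;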

  private class Runnable1 implements Runnable
  {
   public void run()
   {
    synchronized(obj1)
    {
     System.out.println("R1: Obtained lock on obj1:" + obj1);
     System.out.println("R1: Waiting to obtain lock on obj2...");
     synchronized(obj2)
     {
      System.out.println("R1: Obtained lock on obj2:" + obj2);
     }
    }
   }
  }

  private class Runnable2 implements Runnable
  {
   public void run()
   {
    synchronized(obj2)
    {
     System.out.println("R2: Obtained lock on obj2:" + obj2);
     System.out.println("R2: Waiting to obtain lock on obj1...");
     synchronized(obj1)
     {
      System.out.println("R2: Obtained lock on obj1:" + obj1);
     }
    }
   }
  }

  public static void main(String[] args)
  {
   DeadlockDemo dDemo=new DeadlockDemo();
   Runnable r1=dDemo.new Runnable1();
   Runnable r2=dDemo.new Runnable2(); // inner class: needs an enclosing instance, like r1
   new Thread(r1).start();
   new Thread(r2).start();
  }
 }


I ran the above code and it produced the following result:

R2: Obtained lock on obj2:2
R2: Waiting to obtain lock on obj1...
R1: Obtained lock on obj1:1
R1: Waiting to obtain lock on obj2...

and the program hung with the 2 threads stuck in a circular wait.
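Whether the program above actually deadlocks depends on thread scheduling: if one thread happens to acquire both locks before the other starts, it finishes normally. The sketch below (DeadlockCheck is a hypothetical name) uses a CountDownLatch to make sure each thread holds its first lock before trying for the second, and then asks the JVM, via ThreadMXBean.findDeadlockedThreads(), which threads it considers deadlocked. The threads are marked as daemon threads so the stuck pair does not prevent the JVM from exiting.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class DeadlockCheck {
    // Starts a daemon thread that locks 'first', waits until the other
    // thread also holds its first lock, then tries to lock 'second'.
    static void startLocker(Object first, Object second, CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();  // forces the circular wait every time
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    System.out.println("not reached when the threads deadlock");
                }
            }
        });
        t.setDaemon(true);  // a stuck daemon thread does not keep the JVM alive
        t.start();
    }

    // Recreates the DeadlockDemo lock order and returns how many threads the
    // JVM reports as deadlocked, polling for up to about 5 seconds.
    static int runDemo() {
        Object lock1 = new Object();
        Object lock2 = new Object();
        CountDownLatch latch = new CountDownLatch(2);
        startLocker(lock1, lock2, latch);  // like Runnable1: lock1, then lock2
        startLocker(lock2, lock1, latch);  // like Runnable2: lock2, then lock1

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 50; i++) {
            long[] ids = mx.findDeadlockedThreads();  // null if none are deadlocked
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + runDemo());
    }
}
```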

Defining and Starting a Thread in Java

There are 3 ways to do this:

1. Subclass Thread class

i. Subclass the Thread class and override the run() method
ii. Instantiate the Thread subclass.
iii. Call Thread.start() method.

public class MyThread extends Thread {
   public void run() { 
      System.out.println("Thread via Extending Thread!"); 
   }
   public static void main(String []args) { 
      (new MyThread()).start(); 
   }

}


2. Provide a Runnable object by implementing Runnable

The Runnable interface defines a single method, run(), meant to contain the code executed in the thread.

i. Implement the Runnable interface by implementing run() method.
ii. Pass instance of Runnable object to the Thread(..) constructor.
iii. Call Thread.start() method.

public class MyRunnable implements Runnable {

   @Override
   public void run() {
      System.out.println("Thread via Implementing Runnable!");
   }

   public static void main(String[] args) {
      (new Thread(new MyRunnable())).start();

   }

}

Notice that both cases above invoke Thread.start() in order to start the new thread. In either case, the result is a Thread object whose run() method (its own, or that of the Runnable passed to it) is the body of the thread. When the start() method of the Thread object is called, the JVM creates a new thread to execute the run() method. The new thread continues to run until the run() method exits. Meanwhile, the original thread continues running, starting with the statement that follows the call to start().
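A quick way to see the difference between start() and calling run() directly: the sketch below (StartVsRun and threadThatRan are hypothetical names) records the name of the thread that executes the task. With start(), it is the new thread, here named "worker"; calling run() directly is just an ordinary method call, so the task runs in the calling thread.

```java
class StartVsRun {
    // Returns the name of the thread that actually executed the task.
    static String threadThatRan(boolean useStart) {
        final String[] name = new String[1];
        Thread t = new Thread(() -> name[0] = Thread.currentThread().getName(), "worker");
        if (useStart) {
            t.start();  // the JVM runs the task on the new "worker" thread
            try {
                t.join();  // wait for it; join also makes name[0] visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        } else {
            t.run();  // plain method call: runs in the calling thread
        }
        return name[0];
    }

    public static void main(String[] args) {
        System.out.println(threadThatRan(true));   // prints worker
        System.out.println(threadThatRan(false));  // prints main
    }
}
```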


3. Using Executor interface

The Executor interface can be used to run tasks on threads as well. It isn’t really a new idiom, since an object that implements Runnable (a Thread object also qualifies) still needs to be created first. The Executor interface provides a layer of indirection between a client and the execution of a task: instead of the client executing a task directly, an intermediate object executes it. Executors let you manage the execution of asynchronous tasks without having to explicitly manage the lifecycle of threads.


  1. Implement and create a new instance of Runnable.
  2. Create a concrete instance of ExecutorService by calling one of the Executors factory methods.
  3. Call Executor.execute(..) method, passing the Runnable object as argument.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MyExecutor {

   public static void main(String[] args) {
      ExecutorService exec = Executors.newCachedThreadPool();
      for (int i = 0; i < 5; i++)
         exec.execute(new MyRunnable()); // MyRunnable implements Runnable (defined above)
      exec.shutdown(); // accept no new tasks; the JVM can exit once the 5 tasks finish
   }

}

Thread vs. Runnable
Which of these idioms should you use? The Runnable idiom is more general, because a class that implements Runnable can still subclass a class other than Thread. Using Runnable also improves separation of concerns via composition: the task (the Runnable) is kept separate from the mechanism that executes it (a Thread or an Executor).
The Thread-subclass idiom is easier to use in simple applications, but is limited by the fact that your task class must be a descendant of Thread.
References: http://www.coderanch.com/t/233370/threads/java/Thread-vs-Runnable
http://stackoverflow.com/questions/541487/java-implements-runnable-vs-extends-thread

Executor Factory Methods

  • newCachedThreadPool(): creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available
  • newFixedThreadPool(..): creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue
  • newScheduledThreadPool(..): creates a thread pool that can schedule commands to run after a given delay, or to execute periodically
  • newSingleThreadExecutor(): creates an Executor that uses a single worker thread operating off an unbounded queue
  • newSingleThreadScheduledExecutor(): creates a single-threaded executor that can schedule commands to run after a given delay, or to execute periodically
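A short sketch of two of these factory methods in use (ExecutorFactoryDemo and its methods are hypothetical names): a fixed pool of 2 threads working through 10 queued tasks, and a single-threaded scheduled executor running a Callable after a 100 ms delay. Both executors are shut down afterwards so their threads do not keep the JVM alive.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ExecutorFactoryDemo {
    // newFixedThreadPool: at most 2 threads; extra tasks wait in the queue.
    static int runOnFixedPool(int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(completed::incrementAndGet);
        }
        pool.shutdown();  // stop accepting tasks; already-queued tasks still run
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    // newSingleThreadScheduledExecutor: run one Callable after a 100 ms delay.
    static String runDelayed() {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        try {
            ScheduledFuture<String> f =
                ses.schedule(() -> "ran after delay", 100, TimeUnit.MILLISECONDS);
            return f.get();  // blocks until the delayed task has produced its value
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            ses.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnFixedPool(10));  // prints 10
        System.out.println(runDelayed());        // prints ran after delay
    }
}
```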

Abstract Data Types vs Data Structures

What is the difference between an abstract data type (ADT) and a data structure?

The question above can be reframed more concretely by asking:
What is the difference between an array and a stack?
An array is a data structure, while a stack is an abstract data type.
An abstract data type (ADT) is a specification of how a data type should behave, without any reference to its actual implementation.

The Wikipedia definition of an ADT is as follows:
An abstract data type is defined as a mathematical model of the data objects that make up a data type as well as the functions that operate on these objects.
From the NIST Dictionary of Algorithms and Data Structures we have the following definition of an ADT:
A set of data values and associated operations that are precisely specified independent of any particular implementation.

A stack, for example, is specified by its Last In First Out (LIFO) behavior: the element removed is always the one most recently added. The concrete implementation of the ADT is where the data structure comes in: a stack can be implemented using an array or using a linked list.
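To make the distinction concrete, here is a sketch in Java of one stack ADT with two data structures behind it. IntStack, ArrayIntStack and LinkedIntStack are hypothetical names, and the stack holds ints only to keep the example short. Code that uses an IntStack sees the same LIFO behavior whichever implementation it is given.

```java
import java.util.Arrays;

// The ADT: only the LIFO operations, nothing about how values are stored.
interface IntStack {
    void push(int x);
    int pop();          // removes and returns the most recently pushed value
    boolean isEmpty();
}

// Data structure 1: a growable array; the top is the last used slot.
class ArrayIntStack implements IntStack {
    private int[] items = new int[4];
    private int size = 0;

    public void push(int x) {
        if (size == items.length) {
            items = Arrays.copyOf(items, size * 2);  // grow when full
        }
        items[size++] = x;
    }

    public int pop() {
        if (size == 0) throw new IllegalStateException("empty stack");
        return items[--size];
    }

    public boolean isEmpty() { return size == 0; }
}

// Data structure 2: a singly linked list; the top is the first node.
class LinkedIntStack implements IntStack {
    private static class Node {
        final int value;
        final Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    private Node top = null;

    public void push(int x) { top = new Node(x, top); }

    public int pop() {
        if (top == null) throw new IllegalStateException("empty stack");
        int v = top.value;
        top = top.next;
        return v;
    }

    public boolean isEmpty() { return top == null; }
}
```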

This might lead us to conclude that an ADT is more a theoretical or abstract concept, while the data structure has more to do with the concrete implementation.

Under the definitions above, the following constructs are ADTs:

  • Stack
  • Queue
  • Bag
  • List
  • Priority Queue
  • Trie
  • Heap
  • Binary Tree
  • Set
  • Map

while these are data structures:

  • array
  • linked list
  • hash map/dictionary

We can gain a better understanding of this when we consider these constructs in Java.
The ADT corresponds to an interface type, while the data structure corresponds to a concrete class. Thus ArrayList, LinkedList and HashMap are data structures (as is a Java array), and the interfaces they implement, such as List and Map, are the equivalent of ADTs.
See this article for more about abstract data types in Java
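The same idea with the standard library: the sketch below (DequeAsStackDemo and drain are hypothetical names) is written against the Deque interface, used here as a stack interface, and gets identical LIFO behavior from two different concrete classes, ArrayDeque (array-backed) and LinkedList (a doubly linked list).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedList;

class DequeAsStackDemo {
    // Depends only on the Deque interface, not on any concrete class.
    static String drain(Deque<String> stack) {
        stack.push("a");
        stack.push("b");
        stack.push("c");
        StringBuilder out = new StringBuilder();
        while (!stack.isEmpty()) {
            out.append(stack.pop());  // LIFO: the last value pushed comes out first
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(drain(new ArrayDeque<>()));  // prints cba
        System.out.println(drain(new LinkedList<>()));  // prints cba
    }
}
```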

Variable storage in Java

In order to figure out where a variable is stored in Java, the most important factor is where the variable is declared.
A general rule of thumb is this:

  • local variables are stored on the stack
  • instance variables are stored on the heap
  • static variables are stored with the class data (in the PermGen area of the heap on older HotSpot JVMs)

There are caveats to this however, which are explained below:

Variable Storage Details

Local variables
Primitives and object references declared in a method are stored on the stack. However, the actual object, if created using new, is stored on the heap, regardless of where the declaration took place. Hence, in the following piece of code:
void aMethod()
{
   int playerNum=5;
   Player pl=new Player();
}

The primitive variable playerNum and object reference variable pl will be stored on the stack, while the actual Player object itself will live on the heap. When the code exits aMethod and goes out of scope, playerNum and pl will be popped from the stack and cease to exist but the Player object will persist on the heap until it is eventually garbage collected.
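A small sketch of that last point, assuming a hypothetical Player class with a single number field: the locals playerNum and pl disappear when aMethod returns, but the Player object itself survives because the method hands its reference back to the caller. (It becomes eligible for garbage collection only once no references to it remain.)

```java
class ScopeDemo {
    // Hypothetical Player class, defined here only for the illustration.
    static class Player {
        final int number;
        Player(int number) { this.number = number; }
    }

    // playerNum and pl live in this method's stack frame; the Player object
    // created with new lives on the heap and outlives the frame.
    static Player aMethod() {
        int playerNum = 5;
        Player pl = new Player(playerNum);
        return pl;  // the reference is copied out; the object is not
    }

    public static void main(String[] args) {
        Player p = aMethod();          // aMethod's frame has been popped here
        System.out.println(p.number);  // prints 5: the object still exists
    }
}
```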
Instance variables
Instance variables, even primitive ones, live on the heap, inside the object they belong to.

Consider the code below:

public class Car {
   int vinNumber;
   String make;
   String model;
   int year;
   String carClass;   // "class" is a reserved word in Java, so the field needs another name

   Car(int vin, String make, String model, int year, String carClass)
   {
      this.vinNumber=vin;
      this.make=make;
      this.model=model;
      this.year=year;
      this.carClass=carClass;
   }
   // ...
   public static void main(String[] args)
   {
      Car c=new Car(19281,"Audi", "A6",2012,"sedan");
   }
}

Since an instance of Car can only be created via new, we see that:

  • The Car object referenced by c lives on the heap (c itself is a local variable on main’s stack)
  • All instance fields of the Car object, primitives and object references alike, are stored on the heap as part of that object.

Static variables
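The rule for static variables is this: static primitive variables and static object references are class data rather than instance data. In older HotSpot JVMs (Java 6 and earlier) they were stored in the PermGen area together with the class metadata, such as the code of the class’s methods; from Java 7 on, HotSpot keeps static fields on the regular heap, and Java 8 replaced PermGen with Metaspace. In every case, an object referenced by a static variable is itself stored in the regular areas of the heap (young/old generation or survivor space).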


References:

  1. http://www.tutorialspoint.com/java/java_variable_types.htm
  2. http://www.coderanch.com/t/202217/Performance/java/JVM-heap-stores-local-objects
  3. http://stackoverflow.com/questions/8387989/where-is-a-static-method-and-a-static-variable-stored-in-java-in-heap-or-in-sta
  4. http://stackoverflow.com/questions/3698078/where-does-the-jvm-store-primitive-variables

Clustered Indexes

In this article I explain what is meant by a clustered index and compare it with a non-clustered index.

Definition

A clustered index on a table determines the order in which the rows of the table are stored on disk. If the table has a clustered index, the rows of the table are stored in the same order as that of the clustered index. As an illustration, suppose we have a Customer table that contains the following columns:

  • customerID
  • firstName
  • lastName

where customerID is the primary key on the table. If we define the clustered index on customerID, then the rows of the table will be stored in sorted order according to customerID. This means that rows with customerID=1000, customerID=1001, etc. will be adjacent to each other on disk.

Advantages and Disadvantages

The advantages of having a clustered index include the following:

  1. Range queries involving the clustered index are faster, since once the row with the first key value is located, the remaining rows are stored physically next to it and no more searching is needed.
  2. The leaf nodes of the B-tree that make up the clustered index contain the actual data pages as opposed to a non-clustered index where the leaf nodes contain pointers to rows on data pages. Hence there is 1 less level of indirection for a clustered index and this improves performance.

The disadvantages of a clustered index include:

  1. Updates to columns used in the clustered index result in a performance hit, since the rows may have to be rearranged to keep the table in sorted order in line with the clustered index. For this reason, it is recommended that the clustered index be created on a primary or foreign key, since these are less prone to updates.

Comparison of clustered vs non-clustered indexes

  • Returning to our Customer table example, if we have a clustered index on customerID, then the leaf nodes of the clustered index contain the actual row data (data pages) for each customerID, while the leaf nodes of a non-clustered index on customerID contain the value of customerID and a pointer to the actual row.
  • There can only be 1 clustered index per table, but there can be multiple non-clustered indexes (up to 249 in the case of Sybase). This is because the rows in the table are physically ordered according to the clustered index, and there is only 1 way of doing so.
  • A clustered index determines the order in which the rows of the table are stored on disk, but this is not the case for a non-clustered index.
  • A clustered index can be a performance hit in the case of updates to an indexed column, since the rows may have to be re-ordered to maintain the sort order. Range queries and queries involving indexed foreign keys tend to perform better with a clustered index than with a non-clustered one.
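As a rough in-memory analogy only (a real database stores B-tree pages on disk, not Java maps), the sketch below models the comparison above with two sorted maps. IndexAnalogy, Customer and the method names are hypothetical. The “clustered” map holds the rows themselves in customerID order, so a range query is one seek followed by a walk over neighboring entries; the “non-clustered” index on lastName holds only the customerID, so fetching a row through it takes a second lookup, the extra level of indirection mentioned earlier.

```java
import java.util.Collection;
import java.util.TreeMap;

class IndexAnalogy {
    static final class Customer {
        final int customerID;
        final String firstName;
        final String lastName;
        Customer(int customerID, String firstName, String lastName) {
            this.customerID = customerID;
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // "Clustered index": the entries are the rows, kept sorted by customerID.
    private final TreeMap<Integer, Customer> rowsById = new TreeMap<>();
    // "Non-clustered index" on lastName: each entry holds only a pointer
    // (here the customerID) to where the full row lives.
    private final TreeMap<String, Integer> lastNameIndex = new TreeMap<>();

    void insert(Customer c) {
        rowsById.put(c.customerID, c);
        lastNameIndex.put(c.lastName, c.customerID);  // assumes unique last names, for brevity
    }

    // Range query on the clustered key: rows come back in key order.
    Collection<Customer> idRange(int from, int to) {
        return rowsById.subMap(from, true, to, true).values();
    }

    // Lookup through the non-clustered index: find the pointer, then the row.
    Customer findByLastName(String lastName) {
        Integer id = lastNameIndex.get(lastName);
        return id == null ? null : rowsById.get(id);
    }
}
```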

References

For a more in-depth look at clustered indexes, see the following very good article by Michelle Ufford:
Effective Clustered Indexes

Removing duplicates from a table with no primary keys

Consider the following table:

mysql> SELECT * FROM Philosopher;
+---------------+-------------+-----------+
| philosopherID | firstName   | lastName  |
+---------------+-------------+-----------+
|          1234 | John        | Locke     |
|          1234 | John        | Locke     |
|          2345 | Rene        | Descartes |
|          2347 | John Stuart | Mill      |
|          1562 | Emmanuel    | Kant      |
|          1562 | Emmanuel    | Kant      |
|          1671 | Baruch      | Spinoza   |
|          1562 | Emmanuel    | Kant      |
|          1761 | Jean-Paul   | Sartre    |
+---------------+-------------+-----------+
9 rows in set (0.00 sec)

Come up with a strategy to first identify and then remove the duplicate rows.

What SQL queries would you use?

Solution

1. Identify duplicate rows:

SELECT philosopherID, firstName, lastName, COUNT(*) AS copies
FROM Philosopher
GROUP BY philosopherID, firstName, lastName
HAVING COUNT(*) > 1;

2. Remove duplicate rows

i. Create a temporary table and copy the data over. In MySQL, CREATE TABLE ... SELECT does both in a single statement; rowId is filled in automatically by AUTO_INCREMENT:

CREATE TEMPORARY TABLE PhilosopherDups
(
 rowId INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
 philosopherID INT,
 firstName VARCHAR(50),
 lastName VARCHAR(50)
)
SELECT philosopherID, firstName, lastName FROM Philosopher;

ii. Truncate original table and do SELECT DISTINCT:

TRUNCATE TABLE Philosopher;
INSERT INTO Philosopher(philosopherID, firstName, lastName)
SELECT DISTINCT philosopherID, firstName, lastName 
FROM PhilosopherDups;

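iii. An alternative to using DISTINCT is to use rowId to delete the duplicate rows from the copy, keeping only the row with the lowest rowId in each group, and then truncate the original table and copy the data back. MySQL does not allow a TEMPORARY table to be referenced more than once in the same statement, so for this variant PhilosopherDups must be created in step i as an ordinary table (CREATE TABLE rather than CREATE TEMPORARY TABLE) and dropped at the end:

DELETE d1 FROM PhilosopherDups d1
JOIN PhilosopherDups d2
  ON  d1.philosopherID = d2.philosopherID
  AND d1.firstName = d2.firstName
  AND d1.lastName = d2.lastName
  AND d1.rowId > d2.rowId;

TRUNCATE TABLE Philosopher;

INSERT INTO Philosopher(philosopherID, firstName, lastName)
SELECT philosopherID, firstName, lastName
FROM PhilosopherDups;

DROP TABLE PhilosopherDups;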

Pass-by-value vs Pass-by-reference in Java and C++

In this article I illustrate what it means to pass-by-value as opposed to pass-by-reference with a focus on Java vs C++.

The question often asked is this: Is Java pass-by-reference?
A common but erroneous answer is: Java is pass-by-reference for objects, and pass-by-value for primitives.
This is WRONG. To illustrate why, let me refer you to this quote by the father of Java himself, James Gosling:

Some people will say incorrectly that objects are passed “by reference.” In programming language design, the term pass by reference properly means that when an argument is passed to a function, the invoked function gets a reference to the original value, not a copy of its value. If the function modifies its parameter, the value in the calling code will be changed because the argument and parameter use the same slot in memory…. The Java programming language does not pass objects by reference; it passes object references by value. Because two copies of the same reference refer to the same actual object, changes made through one reference variable are visible through the other. There is exactly one parameter passing mode — pass by value — and that helps keep things simple.

— James Gosling, et al., The Java Programming Language, 4th Edition

The above clearly states that Java passes object references by value meaning that when the reference is passed, a copy of that reference (which is an address) is passed. Since the copy of the reference and the reference refer to the same object, if a call is made to a method that modifies the object in Java, that object is modified, hence the line “Because two copies of the same reference refer to the same actual object, changes made through one reference variable are visible through the other”.
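Here is a short Java sketch of that sentence (ReferenceCopyDemo and its methods are hypothetical names). A StringBuilder is used because, unlike Integer, it can be modified: appending through the copied reference is visible to the caller, while assigning a new object to the parameter only changes the method’s own copy of the reference.

```java
class ReferenceCopyDemo {
    // Modifies the object that both the caller's reference and sb point to.
    static void appendWorld(StringBuilder sb) {
        sb.append(" world");
    }

    // Rebinds only the local copy of the reference; the caller is unaffected.
    static void reassign(StringBuilder sb) {
        sb = new StringBuilder("replaced");
    }

    public static void main(String[] args) {
        StringBuilder greeting = new StringBuilder("hello");
        appendWorld(greeting);
        reassign(greeting);
        System.out.println(greeting);  // prints hello world
    }
}
```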

I will now illustrate what pass-by-reference means, via a clear example in C++.

Let us create the following files in an appropriate directory with the following contents:

PassByReference.hpp:
#ifndef PassByReference_hpp
#define PassByReference_hpp
void swapIntByRef(int& iParam, int& jParam);
void swapIntByVal(int iParam, int jParam);
#endif

PassByReference.cpp:
#include <iostream>
#include "PassByReference.hpp"
using namespace std;

int main()
{
int i=1000;
int j=2300;

cout << "Illustration of Pass By Reference:\n";
cout << "Before: i=" << i << " j=" << j;
cout << "\n";
swapIntByRef(i,j);
cout << "After: i=" << i << " j=" << j;
cout << "\n";

cout << "\nIllustration of Pass By Value:\n";

i=1100;
j=2500;

cout << "Before: i=" << i << " j=" << j;
cout << "\n";
swapIntByVal(i,j);
cout << "After: i=" << i << " j=" << j;
cout << "\n";

}

void swapIntByRef(int& iParam, int& jParam)
{
int temp(iParam);
iParam=jParam;
jParam=temp;
}

void swapIntByVal(int iParam, int jParam)
{
int temp(iParam);
iParam=jParam;
jParam=temp;
}

We now compile and run the code (assuming you have the g++ compiler):

g++ -o PassByReference PassByReference.cpp
./PassByReference
Illustration of Pass By Reference:
Before: i=1000 j=2300
After: i=2300 j=1000

Illustration of Pass By Value:
Before: i=1100 j=2500
After: i=1100 j=2500

The results above illustrate the difference between pass-by-reference and pass-by-value, at least from the C++ point of view.
Because the parameters of swapIntByRef are declared as references (int&), iParam and jParam are aliases for the caller’s variables i and j. The swap inside the function therefore modifies i and j themselves, so when the function returns to main() and the values are printed, i and j have been swapped.

In the pass-by-value case, swapIntByVal receives copies of i and j rather than references to them.
As a result, even though the function swaps its parameters, the original actual parameter values remain unchanged, which is what we see in the output.

Pass-by-value is what Java uses in all cases, including objects.

Here is an illustration in Java for both primitives and object references:

Create the file PassByValueDemo.java:

public class PassByValueDemo {

    public static void main(String[] args) {
        int i = 1000;
        int j = 2300;
        System.out.println("Primitives Case");
        System.out.println("----------------");
        System.out.println(" Before: i=" + i + " j=" + j);
        swapInt(i, j);

        System.out.println(" After: i=" + i + " j=" + j + "\n");

        System.out.println("Wrapper Case");
        System.out.println("--------------");
        Integer iw = 1000;   // autoboxed into Integer objects
        Integer jw = 2300;
        System.out.println(" Before: iw=" + iw + " jw=" + jw);
        swapInteger(iw, jw);

        System.out.println(" After: iw=" + iw + " jw=" + jw);
    }

    // Swaps only the method's local copies of the int values.
    static void swapInt(int iParam, int jParam)
    {
        int temp = jParam;
        jParam = iParam;
        iParam = temp;
        System.out.println(" iParam=" + iParam + " jParam=" + jParam);
    }

    // Swaps only the method's local copies of the Integer references;
    // the caller's iw and jw still refer to their original objects.
    static void swapInteger(Integer iParam, Integer jParam)
    {
        Integer temp = jParam;
        jParam = iParam;
        iParam = temp;
        System.out.println(" iParam=" + iParam + " jParam=" + jParam);
    }

}

We now compile and run the code:

javac PassByValueDemo.java

java PassByValueDemo

which produces:

Primitives Case
----------------
Before: i=1000 j=2300
iParam=2300 jParam=1000
After: i=1000 j=2300

Wrapper Case
--------------
Before: iw=1000 jw=2300
iParam=2300 jParam=1000
After: iw=1000 jw=2300

Thus in both the primitive and the wrapper-class case, the values of the actual parameters (i and j, iw and jw) remain unchanged in the calling method main. There is no way in Java to achieve the effect we observed with the C++ function swapIntByRef, where the caller's original variables are changed. A method can change the object that a reference parameter refers to by calling a modifying method on it (Integer, being immutable, has none), but the reference parameter itself is always a copy of the caller's reference.
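To make the distinction in the previous paragraph concrete, here is a small sketch using a mutable object. The class ReferenceCopyDemo and its methods append and reassign are hypothetical, written only for illustration; StringBuilder is used simply because, unlike Integer, it has modifying methods. Calling a modifying method through the copied reference is visible to the caller, while assigning a new object to the parameter is not.

```java
public class ReferenceCopyDemo {

    // Calls a modifying method on the object the copied reference points to.
    // The caller's reference points to the same object, so it sees the change.
    static void append(StringBuilder sb) {
        sb.append(" world");
    }

    // Reassigns the method's local copy of the reference.
    // The caller's reference is untouched and still points to the old object.
    static void reassign(StringBuilder sb) {
        sb = new StringBuilder("replaced");
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("hello");

        append(sb);
        System.out.println("After append:   " + sb);   // hello world

        reassign(sb);
        System.out.println("After reassign: " + sb);   // still hello world
    }
}
```

Both lines print "hello world": the object was changed through the copied reference, but the caller's variable was never redirected to the "replaced" object.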

Summary

  • C++ supports both pass-by-value and pass-by-reference (the latter by declaring a parameter with &).

  • Java supports pass-by-value ONLY. What is erroneously thought of as pass-by-reference is really pass-by-value of an object reference.
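Since Java cannot swap a caller's variables, code that needs a swap usually works on a shared container instead. The sketch below is one common workaround, not a language feature: it swaps two elements of an array the caller owns. The class SwapWorkaround and its swap method are hypothetical names chosen for illustration. The array reference is still passed by value, but both copies point to the same array object.

```java
import java.util.Arrays;

public class SwapWorkaround {

    // Swaps arr[a] and arr[b] in place. The reference 'arr' is a copy,
    // but it refers to the caller's array, so the caller sees the swap.
    static void swap(int[] arr, int a, int b) {
        int temp = arr[a];
        arr[a] = arr[b];
        arr[b] = temp;
    }

    public static void main(String[] args) {
        int[] pair = {1000, 2300};
        System.out.println("Before: " + Arrays.toString(pair));   // [1000, 2300]
        swap(pair, 0, 1);
        System.out.println("After:  " + Arrays.toString(pair));   // [2300, 1000]
    }
}
```

Another option is to have the method return the new values rather than modify its arguments; either way, what changes is an object, never the caller's variables themselves.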

Future of 21st Century Databases meetup

Some takeaways from the Future of 21st Century Databases meetup hosted by AppNexus in NYC, which featured a roundtable of NoSQL database heavyweights:

  • Eliot Horowitz, CTO and Co-Founder, 10gen / MongoDB
  • Barry Morris, Founder and CEO, NuoDB
  • Bob Wiederhold, President and CEO, Couchbase

NuoDB
SQL interface, but NoSQL underneath
Doesn’t have a document model yet
Claims to have solved the problem of how to scale SQL elastically
NuoDB isn’t open source yet.
Largest known installation: 400 nodes w/ sharding

Couchbase
Many travel sites use Couchbase, e.g. Orbitz migrated its cache from Oracle Coherence to Couchbase.
Couchbase doesn’t consider NuoDB a competitor, since NuoDB isn’t really a document database.
Couchbase DevDays – ways to build up skills
Largest known installation: 80 nodes w/ no application sharding
SQL databases will still be around, but the growth area is NoSQL

MongoDB
MongoDB has more document features than Couchbase
Largest known installation: 100 nodes w/ no application sharding

Installing Hortonworks Hadoop on Amazon EC2

After watching this YouTube video and following the article How to Hadoop, I was able to successfully install Hortonworks Hadoop on a 4-node cluster on Amazon EC2. However, I needed to make the following changes to get it working properly:

1. Make sure that you install postgresql-8.4, not postgresql-9.1

If you ran yum install postgresql, it may have installed postgresql-9.1.
If so, erase it:

yum erase postgresql

Download the 8.4 version
curl -O http://yum.postgresql.org/8.4/redhat/rhel-5-x86_64/pgdg-centos-8.4-3.noarch.rpm

yum install postgresql84
yum install postgresql84-server

2. Edit the file /usr/sbin/ambari-server.py as follows:

Change

PG_HBA_DIR = "/var/lib/pgsql/data/"

to

PG_HBA_DIR = "/var/lib/pgsql/8.4/data/"

3. Make sure that the PostgreSQL bin directory is added to the PATH:

export PATH=$PATH:/usr/pgsql-8.4/bin

4. Make sure that the “ambari-server” database user is created.

Otherwise, you will get the following error:
...
Internal Exception: org.postgresql.util.PSQLException: FATAL: Ident authentication failed for user "ambari-server"

Also, edit the /var/lib/pgsql/8.4/data/pg_hba.conf file
and add the following line:

host    all         all         127.0.0.1/32          md5
Solution referenced from: http://stackoverflow.com/questions/4562471/connecting-to-local-instance-of-postgresql-with-jdbc