Monday, April 13, 2009

What are JMS Grid Network Connections?

One of the interesting features in JMS Grid (aka Sun Message Service Grid) is the concept of Network Connections.

JMS Grid has two types of connections that can exist between active daemons. Cluster connections are the more familiar of the two: these are the synchronous connections that exist between daemons in a cluster. Each daemon has its own data store, so these connections ensure the data store for each daemon is maintained in a consistent state. This includes ensuring the same messages are available at each daemon, and that messages are not simultaneously dequeued from two different daemons.

JMS Grid also has the concept of Network connections. These are the connections that can exist between clusters. Network connections are asynchronous, and are recommended for links spanning large geographical areas or unreliable networks.

The following diagram shows the distinction between these types of connection:

I was initially drawn to Network connections as a way of solving a specific high availability problem. The solution in question consisted of two sites: a primary site and a disaster recovery site. The primary site needed to be highly available in its own right and consisted of two daemons on a LAN: this was an obvious case for cluster connections.

In addition, the disaster recovery site needed an up-to-date store of all messages. In the case of a failover to disaster recovery, clients would be redirected to the DR daemons, and continue dequeuing any existing messages, or enqueuing new messages. Asynchronous network connections seemed like the perfect way of ensuring message state was persisted at the geographically remote disaster recovery site without impacting performance at the active site.

Sun's documentation implies this usage should succeed:

"Clusters can be connected to each other to form networks of clusters.
These intercluster connections are known as network connections...
Networks differ from clusters in that they are loosely coupled."

Unfortunately the JMS Grid User Guide is vague on the exact intention of network connections. The following statements imply some of the caveats that apply to network connections:

"Messages are only routed between two clusters when a message sent by a client on one cluster needs to be delivered to a client on the other cluster."

This raised the question of how JMS Grid knows which messages should be routed between clusters. My initial assumption was that all messages would be routed to all clusters, thereby allowing messages to be enqueued onto any daemon and dequeued from any daemon.

The following provided a clue to how this works:

"A network filter explicitly controls the propagation of subscriptions and hence messages across a network connection."

I initially overlooked this statement, but it is the key to understanding the purpose of network connections. Network connections are specifically intended for cases where messages are enqueued in one cluster, and then dequeued in a different (but specific) cluster at the other end of a network connection. This is not what I needed: I needed messages to be enqueued and dequeued from the same cluster, unless there was a failover to the DR site, in which case dequeuing would occur in a different cluster. I specifically wanted all messages to be available for dequeuing from any daemon in the architecture.
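To make the difference concrete, here is a toy model in Python of subscription-driven routing as I understand it from the quotes above. It is only a sketch and not the JMS Grid API: the `Cluster` class, its methods and the filter callables are all hypothetical names, and I am assuming that a message is forwarded to the remote cluster rather than copied to both.

```python
class Cluster:
    """Hypothetical stand-in for a JMS Grid cluster (not a real API)."""

    def __init__(self, name):
        self.name = name
        self.store = {}            # destination -> messages held in this cluster
        self.peers = []            # (peer cluster, filter) pairs for network connections
        self.remote_interest = {}  # destination -> peer clusters with a subscriber

    def connect(self, peer, allow):
        # 'allow' plays the role of the network filter on this side of the link.
        self.peers.append((peer, allow))

    def subscribe(self, destination):
        # A subscription is propagated to a peer only if the filter permits it.
        for peer, allow in self.peers:
            if allow(destination):
                peer.remote_interest.setdefault(destination, []).append(self)

    def send(self, destination, message):
        targets = self.remote_interest.get(destination, [])
        if targets:
            # Assumption: the message is routed to the cluster with the subscriber.
            for target in targets:
                target.store.setdefault(destination, []).append(message)
        else:
            # No known remote subscriber: the message never leaves this cluster.
            self.store.setdefault(destination, []).append(message)


primary, dr = Cluster("primary"), Cluster("dr")
primary.connect(dr, lambda destination: True)
dr.connect(primary, lambda destination: True)

primary.send("orders", "m1")   # nobody at DR has subscribed
print(dr.store)                # {} : the DR site holds no copy of m1
```

In this model the DR cluster only ever receives messages once a consumer there has subscribed, and at that point the messages are routed away from the primary. At no point do both sites hold the same set of messages, which is exactly what my DR use case required.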

The documentation for JMS Grid does not clearly state that network connections do not fulfil the use case I intended, but I have verified this through trial and error. Given this intended use, it is difficult to see many scenarios where network connections can provide any value. I have never encountered a scenario where a business always wanted to enqueue in one geographic region, and then dequeue in a separate geographic region. I have however come across many situations where the usage I intended is required.

It does seem reasonable that network connections cannot fulfill the purpose I intended. Network connections use asynchronous links - but send data to active daemons. The JMS Grid network could not manage transactional dequeuing of messages across such a network, since two clients may attempt to dequeue in different clusters at the same time. This scenario can only be handled with synchronous communication (cluster connections) between daemons, as the dequeuing daemon needs to check with all other daemons that the message hasn't been dequeued already.
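The race described above can be shown with a small Python simulation. This is a sketch under my own assumptions, not how JMS Grid is implemented: `Replica`, `dequeue_async`, `dequeue_sync` and `flush` are made-up names, and each replica stands in for an active daemon holding a full copy of the queue.

```python
from collections import deque


class Replica:
    """Hypothetical active daemon holding a full copy of one queue."""

    def __init__(self, messages):
        self.queue = deque(messages)
        self.outbox = []   # dequeue notices not yet delivered to the peer
        self.peer = None

    def dequeue_async(self):
        # Asynchronous link: hand out the message now, tell the peer later.
        if not self.queue:
            return None
        message = self.queue.popleft()
        self.outbox.append(message)
        return message

    def flush(self):
        # Deliver pending dequeue notices to the peer (arrives "eventually").
        for message in self.outbox:
            if message in self.peer.queue:
                self.peer.queue.remove(message)
        self.outbox.clear()

    def dequeue_sync(self):
        # Synchronous link: the peer's copy is removed before the call returns.
        if not self.queue:
            return None
        message = self.queue.popleft()
        if message in self.peer.queue:
            self.peer.queue.remove(message)
        return message


def make_pair(messages):
    a, b = Replica(messages), Replica(messages)
    a.peer, b.peer = b, a
    return a, b


a, b = make_pair(["m1"])
print(a.dequeue_async(), b.dequeue_async())  # m1 m1 : delivered twice

a, b = make_pair(["m1"])
print(a.dequeue_sync(), b.dequeue_sync())    # m1 None : delivered once
```

With the asynchronous variant both clients receive `m1` because the second dequeue happens before the first notice is flushed. Only the synchronous variant prevents the duplicate, at the cost of a round trip to the other daemon on every dequeue.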

My intended usage of network connections was more akin to the use of Data Guard with Oracle. This uses asynchronous log shipping to keep two Oracle instances in sync. The reason Data Guard is able to accomplish this over an asynchronous channel is that only one of the Oracle instances is active at a time, and therefore data does not need to be synchronized in both directions simultaneously.
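The active/passive idea is simple enough to sketch in a few lines of Python. This is an illustration of one-way log replay in general, not of how Oracle implements it; the `Primary` and `Standby` classes and their methods are hypothetical.

```python
class Store:
    """Hypothetical message store that applies enqueue/dequeue operations."""

    def __init__(self):
        self.messages = []

    def apply(self, op, message):
        if op == "enqueue":
            self.messages.append(message)
        elif op == "dequeue":
            self.messages.remove(message)


class Primary(Store):
    def __init__(self):
        super().__init__()
        self.log = []   # ordered record of every operation applied

    def execute(self, op, message):
        self.apply(op, message)
        self.log.append((op, message))


class Standby(Store):
    def __init__(self):
        super().__init__()
        self.applied = 0   # number of log entries already replayed

    def catch_up(self, log):
        # Replay whatever has arrived since the last call, in order.
        for op, message in log[self.applied:]:
            self.apply(op, message)
        self.applied = len(log)


primary, standby = Primary(), Standby()
primary.execute("enqueue", "m1")
primary.execute("enqueue", "m2")
primary.execute("dequeue", "m1")

standby.catch_up(primary.log)   # may run seconds or minutes later
print(standby.messages)         # ['m2']
```

Because the standby never accepts client operations of its own, there is nothing to reconcile: it only has to replay the primary's log in order, however late the log arrives. The price is that any operations still in flight at the moment of failover are missing from the standby.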

JMS Grid only allowed my required architecture to be implemented by placing a clustered daemon at the DR location. This however limited my flexibility, since only two active daemons are recommended in any cluster. The ultimate solution to this would be providing database-backed storage to JMS Grid. This would allow me to use Data Guard to ensure my DR site had a copy of the message store, and remove the responsibility from JMS Grid.

cisdal - simply integration
