Today, GlassFish 3.1 was released. This is an important milestone for our team and for me as I’ve been working on this release for the last 10 months. As the technical lead for the administration infrastructure area, I’ve been involved with the implementation of many parts of the clustering feature for GlassFish 3.1, which was one of the main release drivers.
The clustering functionality in 3.1 is very much like that of GlassFish 2.1, but the design and implementation are quite different in several areas. Specifically, the data synchronization algorithm used when an instance is started is more efficient at determining which files need to be transferred to the instance, and dynamic reconfiguration is now based on command replication rather than state replication. The remainder of this article focuses on how dynamic reconfiguration works in GlassFish 3.1. The design and implementation for this was primarily developed by Vijay Ramachandran, but towards the later part of the project, I inherited responsibility for this part of the code as Vijay moved on to another project within Oracle.
The goal of dynamic reconfiguration is to ensure that as administrative actions are taken on clusters and instances, the results of those actions are propagated to all affected instances. In the 3.1 design, this is accomplished using command replication. As each asadmin command (or console action via the REST interface) is executed, the command is first executed on the domain administration server (DAS), and then, if necessary, it is replicated to one or more instances. To do this:
- commands are annotated to indicate where they need to be executed, for example, all instances, or all instances in a cluster, or just one instance,
- given the target for the command, the infrastructure determines on which specific instances to execute the command,
- depending on the state of the instances (up or down), either the command is executed on the instance, or information about the command that could not be executed is preserved.
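The three steps above can be sketched roughly as follows. This is an illustrative model only; the class, field, and method names are hypothetical and are not the actual GlassFish 3.1 code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the replication flow, not the real GlassFish classes.
class ReplicationSketch {
    // target instances resolved from the command's target; the value
    // records whether the instance is currently up
    final Map<String, Boolean> targets = new LinkedHashMap<>();
    // instances that missed a command and will need a restart
    final List<String> missedUpdate = new ArrayList<>();

    List<String> replicate(String command) {
        List<String> log = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : targets.entrySet()) {
            if (e.getValue()) {
                // instance is up: execute the command remotely
                log.add("replicated " + command + " to " + e.getKey());
            } else {
                // instance is down: record the miss so its state can be
                // reported later
                missedUpdate.add(e.getKey());
                log.add("recorded missed " + command + " for " + e.getKey());
            }
        }
        return log;
    }
}
```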
To accomplish the first step, a new annotation, @ExecuteOn, has been developed. The @ExecuteOn annotation tells the framework where the command should be executed. Many of the list-* commands only need to be executed on the DAS because the DAS has all of the information that is necessary. When a new instance in a cluster is created, the registration command has to be executed on all of the instances in the cluster, so that each instance knows about the other instances in the cluster.
In GlassFish 3.1, many subcommands now take a "--target" option. For example, the --target option for the deploy subcommand specifies the cluster to which the application is to be deployed. The deploy command uses the @TargetType annotation (also new for 3.1) to specify the valid target types, such as cluster, config, clustered instance or stand-alone instance. For example, with deploy, cluster and stand-alone instance are valid targets, but deploying an application to a single clustered instance is not allowed. The framework (GlassFishClusterExecutor class) uses this information and the name of the target to determine on which instances to execute the command.
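To make the annotation usage concrete, here is a minimal sketch of a deploy-like command class. The annotation and enum definitions below are simplified stand-ins so the example is self-contained; the real annotations live in the GlassFish admin and config-support packages, and the exact enum constants may differ.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Simplified stand-ins for the GlassFish annotations discussed above.
enum RuntimeType { DAS, INSTANCE }
enum CommandTarget { DAS, CONFIG, CLUSTER, STANDALONE_INSTANCE, CLUSTERED_INSTANCE }

@Retention(RetentionPolicy.RUNTIME)
@interface ExecuteOn { RuntimeType[] value(); }

@Retention(RetentionPolicy.RUNTIME)
@interface TargetType { CommandTarget[] value(); }

// A deploy-like command: it runs on the DAS and on instances, and it
// accepts a cluster or a stand-alone instance as a target, but not a
// single clustered instance.
@ExecuteOn({RuntimeType.DAS, RuntimeType.INSTANCE})
@TargetType({CommandTarget.CLUSTER, CommandTarget.STANDALONE_INSTANCE})
class DeployCommandSketch {
}
```

The framework can then read these annotations reflectively to decide where to run the command and whether the supplied target is legal for it.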
Once the framework has the list of instances, the ClusterOperationUtil class takes care of executing the command on each instance. If the instance is down, a state file on the DAS (config/.instancestate) is updated with information about what command failed to execute. The information in this file is used in the output of the list-instances subcommand to report whether an instance needs to be restarted because it missed an update. Once an instance has missed an update, no further updates are sent to that instance until it is restarted.
When you execute commands that are replicated to instances, the command output may include warnings about being unable to replicate commands if the target instances are down. This output is suppressed if the instance has never been started. The command replication framework implements state transitions for the instances that are recorded in the .instancestate file.
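The behavior described above can be modeled as a small state machine. The state names and methods below are illustrative assumptions, not the actual enum recorded in the .instancestate file.

```java
// Hypothetical model of the instance state transitions tracked by the
// command replication framework.
enum State { NEVER_STARTED, RUNNING, STOPPED, RESTART_REQUIRED }

class InstanceStateSketch {
    State state = State.NEVER_STARTED;

    /** Should the DAS attempt to replicate a command to this instance? */
    boolean shouldReplicate() {
        // Once an instance has missed an update, it receives no further
        // commands until it restarts and resynchronizes.
        return state == State.RUNNING;
    }

    /** Should a failed replication produce a warning in command output? */
    boolean warnOnFailure() {
        // Warnings are suppressed for instances that have never started.
        return state != State.NEVER_STARTED;
    }

    void commandMissed() {
        if (state != State.NEVER_STARTED) {
            state = State.RESTART_REQUIRED;
        }
    }

    void restarted() {
        // A restart resynchronizes the instance with the DAS.
        state = State.RUNNING;
    }
}
```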
Because commands are only replicated to those instances that need the information, the domain.xml files for each instance will not be identical over time. However, the instances do have the information that they need. When instances are restarted, the synchronization process copies the domain.xml from the DAS to the instance, bringing them back into exact synchronization.
When implementing an additional asadmin subcommand for GlassFish 3.1, it may be necessary to use the @ExecuteOn and @TargetType annotations to ensure that the command executes on the correct instances.