A while ago, I was experimenting with h2o and wanted to generate a Plain Old Java Object (POJO) model. I found the documentation useful, but I decided to write a post with a simple example for future reference.
In this post, we will see how to:
- build a simple h2o model in R.
- convert the model to POJO.
- create a main program in Java to use the POJO model.
- compile and run the program.
Initialize h2o
First of all, we need to load the h2o library and connect to an h2o instance.
## load packages
library(h2o)
## initialize h2o
h2o.init()
Import data
Once we are connected to h2o, we will import the iris dataset, which we are going to use in our example, using h2o.importFile().
## import data
iris_path = system.file("extdata", "iris.csv", package = "h2o")
iris_hex = h2o.importFile(path = iris_path, destination_frame = "iris_hex")
We can check the returned dataframe, which has four columns corresponding to the flower features, while the last column corresponds to the class/label.
## display dataframe
iris_hex
C1 C2 C3 C4 C5
1 5.1 3.5 1.4 0.2 Iris-setosa
2 4.9 3.0 1.4 0.2 Iris-setosa
3 4.7 3.2 1.3 0.2 Iris-setosa
4 4.6 3.1 1.5 0.2 Iris-setosa
5 5.0 3.6 1.4 0.2 Iris-setosa
6 5.4 3.9 1.7 0.4 Iris-setosa
[150 rows x 5 columns]
Build h2o model
For demo purposes, we will create a simple k-means model to cluster the entries in the iris dataset.
## create kmeans model with 3 clusters
iris_km_model = h2o.kmeans(training_frame = iris_hex,
                           x = c("C1", "C2", "C3", "C4"),
                           k = 3,
                           model_id = "iris_km_model",
                           seed = 1000)
You can print some info about the model as follows:
## print model summary
iris_km_model
Model Details:
==============
H2OClusteringModel: kmeans
Model ID: iris_km_model
Model Summary:
number_of_rows number_of_clusters number_of_categorical_columns
1 150 3 0
number_of_iterations within_cluster_sum_of_squares total_sum_of_squares
1 6 140.02859 596.00000
between_cluster_sum_of_squares
1 455.97141
H2OClusteringMetrics: kmeans
** Reported on training data. **
Total Within SS: 140.0286
Between SS: 455.9714
Total SS: 596
Centroid Statistics:
centroid size within_cluster_sum_of_squares
1 1 52.00000 42.99326
2 2 48.00000 48.87702
3 3 50.00000 48.15831
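As a quick sanity check on the summary above, the total sum of squares should equal the within-cluster plus the between-cluster sum of squares, and the centroid sizes should add up to the number of rows. The following is a minimal Java sketch of that arithmetic using the numbers printed above; ModelSummaryCheck and its methods are illustrative names only and are not part of h2o.

```java
// Sanity check of the k-means summary printed above.
// ModelSummaryCheck is a hypothetical helper, not part of h2o.
final class ModelSummaryCheck {
    // Values copied from the model summary.
    static final double WITHIN_SS = 140.02859;
    static final double BETWEEN_SS = 455.97141;
    static final double TOTAL_SS = 596.0;
    static final int[] CLUSTER_SIZES = {52, 48, 50};

    // Total SS should decompose into within + between (up to rounding).
    static boolean sumsOfSquaresAgree(double tolerance) {
        return Math.abs(WITHIN_SS + BETWEEN_SS - TOTAL_SS) < tolerance;
    }

    // Cluster sizes should add up to the number of rows in the frame.
    static int totalRows() {
        int total = 0;
        for (int size : CLUSTER_SIZES) {
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("SS decomposition holds: " + sumsOfSquaresAgree(1e-4));
        System.out.println("rows covered by clusters: " + totalRows());
    }
}
```

The total of 596 is also what one would expect if the four columns were standardized with the sample standard deviation before clustering, since each of the 4 columns then contributes 150 - 1 = 149.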
Convert h2o model to POJO
To convert your model to POJO, you can simply use h2o.download_pojo() with a path to save the java file.
## download pojo model (replace "saved_models" with your path)
h2o.download_pojo(iris_km_model, path = here::here("saved_models"))
A java file, entitled iris_km_model.java, will be created in the given directory.
The code corresponding to the POJO model in this file is shown below.
/*
Licensed under the Apache License, Version 2.0
http://www.apache.org/licenses/LICENSE-2.0.html
AUTOGENERATED BY H2O at 2019-07-13T17:22:02.421+02:00
3.22.1.1
Standalone prediction code with sample test data for KMeansModel named iris_km_model
How to download, compile and execute:
mkdir tmpdir
cd tmpdir
curl http:/localhost/127.0.0.1:54321/3/h2o-genmodel.jar > h2o-genmodel.jar
curl http:/localhost/127.0.0.1:54321/3/Models.java/iris_km_model > iris_km_model.java
javac -cp h2o-genmodel.jar -J-Xmx2g -J-XX:MaxPermSize=128m iris_km_model.java
(Note: Try java argument -XX:+PrintCompilation to show runtime JIT compiler behavior.)
*/
import java.util.Map;
import hex.genmodel.GenModel;
import hex.genmodel.annotations.ModelPojo;
import hex.genmodel.IClusteringModel;
@ModelPojo(name="iris_km_model", algorithm="kmeans")
public class iris_km_model extends GenModel implements IClusteringModel {
public hex.ModelCategory getModelCategory() { return hex.ModelCategory.Clustering; }
// Names of columns used by model.
public static final String[] NAMES = NamesHolder_iris_km_model.VALUES;
// Column domains. The last array contains domain of response column.
public static final String[][] DOMAINS = new String[][] {
/* C1 */ null,
/* C2 */ null,
/* C3 */ null,
/* C4 */ null
};
public iris_km_model() { super(NAMES,DOMAINS,null); }
public String getUUID() { return Long.toString(2946031392675382139L); }
// Pass in data in a double[], pre-aligned to the Model's requirements.
// Jam predictions into the preds[] array; preds[0] is reserved for the
// main prediction (class for classifiers or value for regression),
// and remaining columns hold a probability distribution for classifiers.
public final double[] score0( double[] data, double[] preds ) {
Kmeans_preprocessData(data,iris_km_model_MEANS.VALUES,iris_km_model_MULTS.VALUES,iris_km_model_MODES.VALUES);
preds[0] = KMeans_closest(iris_km_model_CENTERS.VALUES, data, DOMAINS);
return preds;
}
// Pass in data in a double[], in a same way as to the score0 function.
// Cluster distances will be stored into the distances[] array. Function
// will return the closest cluster. This way the caller can avoid to call
// score0(..) to retrieve the cluster where the data point belongs.
public final int distances( double[] data, double[] distances ) {
Kmeans_preprocessData(data,iris_km_model_MEANS.VALUES,iris_km_model_MULTS.VALUES,iris_km_model_MODES.VALUES);
int cluster = KMeans_distances(iris_km_model_CENTERS.VALUES, data, DOMAINS, distances);
return cluster;
}
// Returns number of cluster used by this model.
public final int getNumClusters() {
int nclusters = iris_km_model_CENTERS.VALUES.length;
return nclusters;
}
}
// The class representing training column names
class NamesHolder_iris_km_model implements java.io.Serializable {
public static final String[] VALUES = new String[4];
static {
NamesHolder_iris_km_model_0.fill(VALUES);
}
static final class NamesHolder_iris_km_model_0 implements java.io.Serializable {
static final void fill(String[] sa) {
sa[0] = "C1";
sa[1] = "C2";
sa[2] = "C3";
sa[3] = "C4";
}
}
}
// Column means of training data
class iris_km_model_MEANS implements java.io.Serializable {
public static final double[] VALUES = new double[4];
static {
iris_km_model_MEANS_0.fill(VALUES);
}
static final class iris_km_model_MEANS_0 implements java.io.Serializable {
static final void fill(double[] sa) {
sa[0] = 5.843333333333333;
sa[1] = 3.053999999999999;
sa[2] = 3.758666666666667;
sa[3] = 1.1986666666666665;
}
}
}
// Reciprocal of column standard deviations of training data
class iris_km_model_MULTS implements java.io.Serializable {
public static final double[] VALUES = new double[4];
static {
iris_km_model_MULTS_0.fill(VALUES);
}
static final class iris_km_model_MULTS_0 implements java.io.Serializable {
static final void fill(double[] sa) {
sa[0] = 1.2076330213409388;
sa[1] = 2.3063033203973875;
sa[2] = 0.5667583466456685;
sa[3] = 1.3103399393571;
}
}
}
// Mode for categorical columns
class iris_km_model_MODES implements java.io.Serializable {
public static final int[] VALUES = new int[4];
static {
iris_km_model_MODES_0.fill(VALUES);
}
static final class iris_km_model_MODES_0 implements java.io.Serializable {
static final void fill(int[] sa) {
sa[0] = -1;
sa[1] = -1;
sa[2] = -1;
sa[3] = -1;
}
}
}
// Normalized cluster centers[K][features]
class iris_km_model_CENTERS implements java.io.Serializable {
public static final double[][] VALUES = new double[3][];
static {
iris_km_model_CENTERS_0.fill(VALUES);
}
static class iris_km_model_CENTERS_0_0 implements java.io.Serializable {
public static final double[] VALUES = new double[4];
static {
iris_km_model_CENTERS_0_0_0.fill(VALUES);
}
static final class iris_km_model_CENTERS_0_0_0 implements java.io.Serializable {
static final void fill(double[] sa) {
sa[0] = -0.0685873626223117;
sa[1] = -0.8873945545098232;
sa[2] = 0.3438624614956358;
sa[3] = 0.2839741837806723;
}
}
}
static class iris_km_model_CENTERS_0_1 implements java.io.Serializable {
public static final double[] VALUES = new double[4];
static {
iris_km_model_CENTERS_0_1_0.fill(VALUES);
}
static final class iris_km_model_CENTERS_0_1_0 implements java.io.Serializable {
static final void fill(double[] sa) {
sa[0] = 1.1276273336771014;
sa[1] = 0.08687075840163734;
sa[2] = 0.9821922147369432;
sa[3] = 0.9954215739316106;
}
}
}
static class iris_km_model_CENTERS_0_2 implements java.io.Serializable {
public static final double[] VALUES = new double[4];
static {
iris_km_model_CENTERS_0_2_0.fill(VALUES);
}
static final class iris_km_model_CENTERS_0_2_0 implements java.io.Serializable {
static final void fill(double[] sa) {
sa[0] = -1.0111913832028125;
sa[1] = 0.8394944086246519;
sa[2] = -1.3005214861029282;
sa[3] = -1.2509378621062437;
}
}
}
static final class iris_km_model_CENTERS_0 implements java.io.Serializable {
static final void fill(double[][] sa) {
sa[0] = iris_km_model_CENTERS_0_0.VALUES;
sa[1] = iris_km_model_CENTERS_0_1.VALUES;
sa[2] = iris_km_model_CENTERS_0_2.VALUES;
}
}
}
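To make the generated code a bit less opaque, here is a simplified sketch of what score0() appears to do for this model: standardize the input row with the stored means and reciprocal standard deviations, then pick the center with the smallest squared Euclidean distance. It assumes numeric columns with no missing values, copies the constants from the POJO above, and uses hypothetical names (PojoScoringSketch, standardize, closest) that are not part of the h2o-genmodel API.

```java
// Simplified re-implementation of the scoring logic in the POJO above.
// Assumptions: numeric columns only, no missing values. The class and method
// names are hypothetical and are not the h2o-genmodel API.
final class PojoScoringSketch {
    // Column means and reciprocal standard deviations, copied from the POJO.
    static final double[] MEANS =
        {5.843333333333333, 3.053999999999999, 3.758666666666667, 1.1986666666666665};
    static final double[] MULTS =
        {1.2076330213409388, 2.3063033203973875, 0.5667583466456685, 1.3103399393571};
    // Normalized cluster centers, copied from the POJO.
    static final double[][] CENTERS = {
        {-0.0685873626223117, -0.8873945545098232, 0.3438624614956358, 0.2839741837806723},
        {1.1276273336771014, 0.08687075840163734, 0.9821922147369432, 0.9954215739316106},
        {-1.0111913832028125, 0.8394944086246519, -1.3005214861029282, -1.2509378621062437}
    };

    // Standardize a raw row: (value - mean) * (1 / standard deviation).
    static double[] standardize(double[] row) {
        double[] z = new double[row.length];
        for (int i = 0; i < row.length; i++) {
            z[i] = (row[i] - MEANS[i]) * MULTS[i];
        }
        return z;
    }

    // Index of the center with the smallest squared Euclidean distance.
    static int closest(double[] z) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < CENTERS.length; c++) {
            double dist = 0;
            for (int i = 0; i < z.length; i++) {
                double diff = z[i] - CENTERS[c][i];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] row = {6.4, 3.2, 5.3, 2.3};
        System.out.println("cluster: " + closest(standardize(row)));
    }
}
```

For the row 6.4 3.2 5.3 2.3, this sketch selects the center at index 1, which agrees with the prediction shown at the end of this post.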
Create main java program
Now you need to write the main program main.java and save it in the same directory as iris_km_model.java. The program is supposed to take four values and return the predicted cluster.
Note that you need to:
- provide the POJO model name included in the file downloaded in the previous step (iris_km_model).
- use the right class for your model, which is ClusteringModelPrediction here. You can see further details about classes on the POJO Model Javadoc page.
import java.io.*;
import hex.genmodel.easy.RowData;
import hex.genmodel.easy.EasyPredictModelWrapper;
import hex.genmodel.easy.prediction.*;

public class main {
    // Name of the POJO class generated by h2o.download_pojo()
    private static String modelClassName = "iris_km_model";

    public static void main(String[] args) throws Exception {
        // Load the POJO class by name and wrap it for row-based prediction
        hex.genmodel.GenModel rawModel;
        rawModel = (hex.genmodel.GenModel) Class.forName(modelClassName).newInstance();
        EasyPredictModelWrapper model = new EasyPredictModelWrapper(rawModel);

        // Build the input row from the four command-line values
        RowData row = new RowData();
        row.put("C1", args[0]);
        row.put("C2", args[1]);
        row.put("C3", args[2]);
        row.put("C4", args[3]);

        // Predict the cluster and print it together with the inputs
        ClusteringModelPrediction p = model.predictClustering(row);
        System.out.printf("Input values: %s %s %s %s \n", args[0], args[1], args[2], args[3]);
        System.out.printf("cluster: %s", p.cluster);
    }
}
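The main program above reads args[0] to args[3] directly, so it stops with an ArrayIndexOutOfBoundsException if fewer than four values are passed. A small, optional sketch of validating the inputs first might look like the following; InputCheck and parseInputs are hypothetical names, and the code is independent of h2o.

```java
// Hypothetical helper for validating command-line inputs before scoring.
final class InputCheck {
    static final String[] COLUMNS = {"C1", "C2", "C3", "C4"};

    // Parses exactly four numeric arguments, or throws IllegalArgumentException.
    static double[] parseInputs(String[] args) {
        if (args.length != COLUMNS.length) {
            throw new IllegalArgumentException(
                "expected " + COLUMNS.length + " values (C1 C2 C3 C4), got " + args.length);
        }
        double[] values = new double[args.length];
        for (int i = 0; i < args.length; i++) {
            try {
                values[i] = Double.parseDouble(args[i]);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(COLUMNS[i] + " is not a number: " + args[i]);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Fixed example input so the sketch runs without command-line arguments.
        double[] values = parseInputs(new String[] {"6.4", "3.2", "5.3", "2.3"});
        System.out.println("parsed " + values.length + " values");
    }
}
```

In main.java, one could call such a check before filling the RowData, so that a wrong number of inputs produces a readable message instead of a stack trace.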
Compile java program
Now it is time to compile your program, but first you need to download h2o-genmodel.jar into the same directory as iris_km_model.java and main.java.
$ cd saved_models
$ curl http://localhost:54321/3/h2o-genmodel.jar > h2o-genmodel.jar
Having the three files, you can compile your program as follows:
$ javac -cp h2o-genmodel.jar -J-Xmx2g -J-XX:MaxPermSize=128m iris_km_model.java main.java
If things work fine, you will get no errors and you will find new .class files generated in the same directory.
Make predictions
Now you are ready to use the compiled program and make predictions. For this, you should write java -cp ".;h2o-genmodel.jar" main followed by the four expected inputs (corresponding to C1, C2, C3, C4) as follows:
$ java -cp ".;h2o-genmodel.jar" main 6.4 3.2 5.3 2.3
The printed result will follow whatever format you specified in the main.java program. Here I set it to print the input values, then the predicted cluster.
Input values: 6.4 3.2 5.3 2.3
cluster: 1
NOTE: I used ; in java -cp ".;h2o-genmodel.jar" as I am using Windows. On other operating systems, such as Linux or macOS, you will need to use : instead.
There could be better ways to do the same thing, but this was a minimal example with some basics for demo purposes.
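On the class path separator mentioned in the note above: Java exposes the separator of the current operating system as java.io.File.pathSeparator, so the -cp value can be assembled without hard-coding ; or :. The sketch below only builds and prints the command string; ClasspathSketch and join are hypothetical names used for illustration.

```java
// Builds the -cp value with the separator of the current operating system.
// ClasspathSketch is a hypothetical helper used only for illustration.
final class ClasspathSketch {
    // Joins class path entries with the given separator.
    static String join(String separator, String... entries) {
        return String.join(separator, entries);
    }

    public static void main(String[] args) {
        // java.io.File.pathSeparator is ";" on Windows and ":" on Linux and macOS.
        String cp = join(java.io.File.pathSeparator, ".", "h2o-genmodel.jar");
        System.out.println("java -cp \"" + cp + "\" main 6.4 3.2 5.3 2.3");
    }
}
```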
Session info
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] h2o_3.22.1.1
loaded via a namespace (and not attached):
[1] compiler_3.5.0 bookdown_0.9 htmltools_0.3.6 tools_3.5.0 RCurl_1.95-4.12 yaml_2.2.0
[7] Rcpp_1.0.1 rmarkdown_1.12 blogdown_0.11 knitr_1.22 jsonlite_1.6 xfun_0.5
[13] digest_0.6.18 bitops_1.0-6 evaluate_0.13