Testing a Java and Elasticsearch 5.0 application

2016-11-30 elasticsearch

A long time ago, I wrote this article in french explaining how to test a Java application talking to Elasticsearch. At that time (Elasticsearch 1.x), it was easy to start en embedded Elasticsearch, it was a oneliner:

Node node = NodeBuilder.nodeBuilder().node();

Starting with Elasticsearch 5.0, it’s forbidden to start an embedded Elasticsearch. The NodeBuilder class used above doesn’t exist anymore, and most of classes used to start Elasticsearch as a server are hidden and sealed. In short, it’s not as easy as before to test your Elasticsearch access layer, Java developers have lost their advantage of using the same language as Elasticsearch. Most solutions described in this article can probably apply to other languages: Ruby, Python, JavaScript…​

Let’s see which options are left.

Elasticsearch Testing framework

This test framework is used by the Elastic team to test Elasticsearch itself. Undercover, it uses the Randomized Testing which is also used to test Apache Lucene. This library can be downloaded as a Maven dependency:

<dependency>
  <groupId>org.elasticsearch.test</groupId>
  <artifactId>framework</artifactId>
  <version>5.0.1</version>
  <scope>test</scope>
</dependency>

It contains an ESIntegTestCase which is able to bootstrap an Elasticsearch for tests:

public class CustomerDaoTests extends ESIntegTestCase {

}

It looks simple and convenient, but for the average Java developer, this library raises multiple problems:

  • It brings numerous dependencies in your test environment (Log4J2, Commons *…​)

  • It doesn’t mix properly with usual libraries (see jar hell error below)

  • It enforces security by enabling Java’s Security Manager (see access denied error below)

  • It doesn’t support HTTP protocol (only internal transport procol)

java.lang.RuntimeException: found jar hell in test classpath

	at org.elasticsearch.bootstrap.BootstrapForTesting.<clinit>(BootstrapForTesting.java:90)
	at org.elasticsearch.test.ESTestCase.<clinit>(ESTestCase.java:145)


java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "accessDeclaredMembers")

	at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)

In short, this library is appropriate to develop an Elasticsearch plugin, but that’s all.

If we can not start Elasticsearch from inside the test, let’s start it from outside.

Elasticsearch scripting

From now on, we will start a real Elasticsearch process, we won’t use anymore a slimmed down version with specifics settings. Our tests will get more realistic, but also harder to set up.

In order to automate test execution, my first solution is to create a script to install and start Elasticsearch. This script can later be ran from CI configuration, before running the build script (Maven, Gradle…​).

Here is a short example:

elasticsearch-start.sh
#!/usr/bin/env bash

TARGET_DIR="$(dirname $0)/target"
ES_VERSION=5.0.0
ES_DIR=${TARGET_DIR}/elasticsearch-${ES_VERSION}
mkdir -p ${TARGET_DIR}

ES_TAR=${TARGET_DIR}/elasticsearch-${ES_VERSION}.tar.gz
ES_URL=https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-${ES_VERSION}.tar.gz
curl -o ${ES_TAR} ${ES_URL}                                                     (1)
tar -xzf ${ES_TAR} -C ${TARGET_DIR}                                             (2)

cd ${ES_DIR}
bin/elasticsearch -d -p pid                                                     (3)

sleep 10s
curl "http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=30s" (4)
1 Download Elasticsearch. We could also download it from a corporate repository (Nexus, web server…​).
2 Unzip the downloaded archive.
3 Start Elasticsearch as a background process and keep it’s PID so as to be able to stop it at the end.
4 Wait for Elasticsearch to be up and running.

The main drawback of this script is that it can not run on Windows: it is not portable. We could rewrite it using Ant, Python or Groovy/Gradle:

build.xml
<target name="elasticsearch-start" >
    <get src="https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-${es.version}.tar.gz"
         dest="${target.dir}/elasticsearch-${es.version}.tar.gz"
         verbose="true" skipexisting="true"
    />
    <untar src="${target.dir}/elasticsearch-${es.version}.tar.gz" dest="${target.dir}" compression="gzip"/>
    <exec executable="cmd" failonerror="true" osfamily="winnt" dir="${es.dir}">
        <arg value="/c"/>
        <arg value="bin/elasticsearch.bat"/>
    </exec>
    <exec executable="sh" failonerror="true" osfamily="unix" dir="${es.dir}">
        <arg value="bin/elasticsearch"/>
        <arg value="-d"/>
        <arg value="-p"/>
        <arg value="pid"/>
    </exec>
    <waitfor maxwait="30" maxwaitunit="second" checkevery="10" checkeveryunit="second">
        <http url="http://localhost:9200/_cluster/health?wait_for_status=yellow&amp;timeout=5s"/>
    </waitfor>
</target>

This Ant script does exactly the same as the above shell script (download, unzip, start and wait). I have to admit, this idea is not mine, it’s explained in detail on David Pilato’s blog. This Ant script could be called from Maven using the Antrun plugin or from Groovy/Gradle using AntBuilder.

I’ve created a sample Java project using Elasticsearch Java Rest client to access read and write data. Maven starts Elasticsearch, runs integration tests and finally stops Elasticsearch:

        <profile>
            <id>ant</id>
            <properties>
                <elasticsearch.url>http://localhost:9200</elasticsearch.url>
            </properties>
            <build>
                <plugins>
                    <plugin>
                        <groupId>org.apache.maven.plugins</groupId>
                        <artifactId>maven-dependency-plugin</artifactId>
                        <version>2.10</version>
                        <executions>
                            <execution>
                                <id>get-es</id>
                                <phase>pre-integration-test</phase>
                                <goals>
                                    <goal>copy</goal>
                                </goals>
                                <configuration>
                                    <artifactItems>
                                        <artifactItem>
                                            <groupId>org.elasticsearch.distribution.zip</groupId>
                                            <artifactId>elasticsearch</artifactId>
                                            <version>${elasticsearch.version}</version>
                                            <type>zip</type>
                                        </artifactItem>
                                    </artifactItems>
                                    <outputDirectory>${project.build.directory}/it</outputDirectory>
                                </configuration>
                            </execution>
                        </executions>
                    </plugin>
                    <plugin>
                        <artifactId>maven-antrun-plugin</artifactId>
                        <version>1.8</version>
                        <executions>
                            <execution>
                                <id>start-es</id>
                                <phase>pre-integration-test</phase>
                                <goals>
                                    <goal>run</goal>
                                </goals>
                                <configuration>
                                    <target>
                                        <property name="target.dir" value="${project.build.directory}"/>
                                        <property name="elasticsearch.version" value="${elasticsearch.version}"/>
                                        <ant antfile="${basedir}/it.xml" target="start"/>
                                    </target>
                                </configuration>
                            </execution>
                            <execution>
                                <id>stop-es</id>
                                <phase>post-integration-test</phase>
                                <goals>
                                    <goal>run</goal>
                                </goals>
                                <configuration>
                                    <target>
                                        <property name="target.dir" value="${project.build.directory}"/>
                                        <property name="elasticsearch.version" value="${elasticsearch.version}"/>
                                        <ant antfile="${basedir}/it.xml" target="stop"/>
                                    </target>
                                </configuration>
                            </execution>
                        </executions>
                    </plugin>
                    <plugin>
                        <artifactId>maven-failsafe-plugin</artifactId>
                        <version>2.19.1</version>
                        <executions>
                            <execution>
                                <goals>
                                    <goal>integration-test</goal>
                                    <goal>verify</goal>
                                </goals>
                            </execution>
                        </executions>
                    </plugin>
                </plugins>
            </build>
        </profile>
$ mvn -Pant install

------------------------------------------------------------------------
Building Testing with Elasticsearch 5.0 0.0.1-SNAPSHOT
------------------------------------------------------------------------

...

--- maven-antrun-plugin:1.8:run (start-es) @ test-elasticsearch5 ---
Executing tasks

main:

setup:

start:
     [echo] Starting Elasticsearch 5.0.1
     [echo] Started Elasticsearch with PID 6256
Executed tasks

--- maven-failsafe-plugin:2.19.1:integration-test (default) @ test-elasticsearch5 ---

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running com.github.gquintana.elasticsearch.ProductRepositoryIT
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.141 sec - in com.github.gquintana.elasticsearch.ProductRepositoryIT

Results :

Tests run: 4, Failures: 0, Errors: 0, Skipped: 0


--- maven-antrun-plugin:1.8:run (stop-es) @ test-elasticsearch5 ---
Executing tasks

main:

stop:
     [echo] Stopping Elasticsearch with PID 6256
     [echo] Stopped Elasticsearch
Executed tasks
...

This solution based on scripting has several shortcomings:

  • It should be improved to avoid downloading and decompressing the archive again and again.

  • It should be completed with a script to stop Elasticsearch and do household chores (remove, data, logs).

  • It could become more complicated: it is sometimes needed to install plugins, tweak the elasticsearch.yml configuration file or set some environment variables.

  • As any piece of code, it should be loved and maintained.

Elasticsearch in a container

We can delegate the downloading, starting and stopping logic to Docker. Starting Elasticsearch in a container is nearly as easy as:

docker run -d --name elasticsearch-5.0 -v /usr/share/elasticsearch/data:$PWD/target/data -P elasticsearch:5.0.1
curl "http://172.17.0.1:9200/_cluster/health?wait_for_status=yellow&timeout=30s"

# Run tests here...

docker stop elasticsearch-5.0

We should still wait for Elasticsearch to be started before running tests. Once you’re familiar with Docker, you can run the Elasticsearch container using:

  • Docker Jenkins plugin

  • Docker Maven plugin:

            <profile>
                <id>docker</id>
                <properties>
                    <elasticsearch.url>http://172.17.0.1:9200</elasticsearch.url>
                </properties>
                <build>
                    <plugins>
                        <plugin>
                            <groupId>io.fabric8</groupId>
                            <artifactId>docker-maven-plugin</artifactId>
                            <version>0.18.1</version>
                            <configuration>
                                <images>
                                    <image>
                                        <alias>elasticsearch5</alias>
                                        <name>elasticsearch:5.0.1</name>
                                        <run>
                                            <volumes>
                                                <bind>
                                                    <volume>/usr/share/elasticsearch/data:${project.build.directory}/data</volume>
                                                </bind>
                                            </volumes>
                                            <env>
                                                <ES_JAVA_OPTS>-Xmx1g -Xms1g</ES_JAVA_OPTS>
                                            </env>
                                            <ports>
                                                <port>9200:9200</port>
                                            </ports>
                                            <wait>
                                                <http>
                                                    <url>${elasticsearch.url}/_cluster/health?wait_for_status=yellow&amp;timeout=30s</url>
                                                    <method>GET</method>
                                                    <status>200</status>
                                                </http>
                                            </wait>
                                        </run>
                                    </image>
                                </images>
                            </configuration>
                            <executions>
                                <execution>
                                    <id>docker-start</id>
                                    <phase>pre-integration-test</phase>
                                    <goals>
                                        <goal>start</goal>
                                    </goals>
                                </execution>
                                <execution>
                                    <id>docker-stop</id>
                                    <phase>post-integration-test</phase>
                                    <goals>
                                        <goal>stop</goal>
                                    </goals>
                                </execution>
                            </executions>
                        </plugin>
                        <plugin>
                            <artifactId>maven-failsafe-plugin</artifactId>
                            <version>2.19.1</version>
                            <executions>
                                <execution>
                                    <goals>
                                        <goal>integration-test</goal>
                                        <goal>verify</goal>
                                    </goals>
                                </execution>
                            </executions>
                        </plugin>
                    </plugins>
                </build>
            </profile>

However, running Elasticsearch in docker may not be as easy as it may seem at first sight. On many Linux boxes, the Elasticsearch container will stop immediately with this kind of error:

[2016-11-26T14:58:32,140][INFO ][o.e.b.BootstrapCheck     ] [3mI2H8T] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
ERROR: bootstrap checks failed
max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

When Elasticsearch 5.0 is running inside a Docker container, it doesn’t to listen on localhost interface, but on a container interface. This network setting makes Elasticsearch think it is running in production mode. As a consequence, Elasticsearch does some additional bootstrap checks to avoid common production issues. Like on your production server, you’ll have to do some the system level tuning to allow it to start:

sudo sysctl -w vm.max_map_count=262144

If you don’t have sufficient privileges to change such setting, then you’re in trouble. I personally miss a setting to be able to disable bootstrap checking.

As my colleague Mickael Jeanroy pointed out, we could also start this Docker container from JUnit. There are indeed several libraries to mix Docker and JUnit: Docker JUnit Rule,another JUnit Docker Rule, Test containers. Most of these libraries are based on Spotify’s Docker client for Java.

With that kind of library, starting Elasticsearch becomes easy again:

@ClassRule
public static DockerRule elasticsearchRule = DockerRule.builder()
        .imageName("elasticsearch:5.0.1")
        .mountFrom("/usr/share/elasticsearch/data").to(dataDir())
        .env("ES_JAVA_OPTS","-Xmx1g -Xms1g")
        .expose("9200", "9200")
        .waitForMessage("started")
        .waitForHttpPing(9200)
        .build();