I am a programmer and architect (the kind that writes code) with a focus on testing and open source; I maintain the PHPUnit_Selenium project. I believe programming is one of the hardest and most beautiful jobs in the world. Giorgio is a DZone MVB and is not an employee of DZone and has posted 637 posts at DZone.

Set Up Solr and Get it Running

06.10.2010
What is Solr? Lucene, but done right and on steroids. The official definition is:
Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.


The main feature of Solr (or at least the most useful one) is its REST-like API: instead of playing with language-specific APIs and drivers to talk to Solr, you can simply make HTTP requests and get results in JSON or XML. (And it's pronounced "solar", in case you wonder.)


I would not say that this is a perfect REST interface which incarnates all the principles of HTTP 1.1, but the point is that data has a simple representation which goes back and forth from the client to the server, and no encapsulation in SOAP's nightmarish envelopes. Plus, it's human-readable since XML and JSON can be written easily for exploratory testing purposes.

A year ago, when I heard that CouchDB had a REST API, I said: what a useless level of abstraction, why can't I just use a PHP extension/driver like with MySQL, SQLite and similar databases?

Now I see the potential of such a universal interface:

  • it is language agnostic thanks to XML and JSON, which can be interpreted by nearly everything nowadays. The usual yardstick is JavaScript: if you can handle a format in JavaScript, in a browser with all its limitations, you can handle it everywhere. And JavaScript natively supports both JSON (via eval(), even if that is not secure) and XML (via the DOM).
  • it is data type agnostic thanks to HTTP, which here carries everything as strings. There is no type safety, but great interoperability. Dynamic languages like PHP are so successful partly because the underlying protocol has no strict types. If the front end is only going to print a value, why shouldn't a string be enough?
  • it is more or less a standard protocol (although the data representation is not): if everyone who invents a database publishes their own binary communication protocol, we need a new client library for each database in each language.

Solr is written in Java, but you can access it from your language of choice, simply by making GET requests to search the index and POST requests to add documents.

Of course there is usually a library for every popular programming language that wraps the REST-like interface, but it is dead simple to build such a library yourself, for example in JavaScript, and it is not mandatory to use a library at all. This leads us to another point: if Solr had a binary interface like MySQL's, how would you wrap that in JavaScript? HTTP is universal, since nearly everything nowadays, from computers to ovens to hair dryers, can make HTTP requests.
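
As a rough illustration of how little a client needs, here is a minimal sketch in plain Java with no Solr library at all. It assumes the bundled Jetty example on its default address, http://localhost:8983/solr, and uses the standard q and wt request parameters; the class and method names are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical helper, not part of Solr or SolrJ.
final class SolrHttpQuery {

    // Builds a URL for the /select end point: "q" carries the query,
    // "wt" selects the response format (for example json or xml).
    static String selectUrl(String baseUrl, String query, String format) {
        try {
            return baseUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&wt=" + URLEncoder.encode(format, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Performs the GET request and returns the raw response body as text.
    static String get(String url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("GET");
        InputStream in = connection.getInputStream();
        try {
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                body.write(buffer, 0, read);
            }
            return body.toString("UTF-8");
        } finally {
            in.close();
            connection.disconnect();
        }
    }
}
```

Against a running instance, SolrHttpQuery.get(SolrHttpQuery.selectUrl("http://localhost:8983/solr", "*:*", "json")) should return the matching documents as JSON text, ready to be handed to whatever parser your language offers.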

Starting out

Simplicity of use, starting from the protocol, is a key advantage over plain Lucene, and it is indeed very easy to set up Solr and get it running.

Since Solr's primary interface is web-based, it needs a servlet container (reinventing the wheel by implementing a full HTTP stack would not have been a great idea). Solr has several servlets that handle different end points, like /update or /select.

It comes with a prepackaged example based on Jetty (a small, lightweight servlet container), so it runs out of the box, but you can also deploy it as a Tomcat web app if you want.

All you need to do to run Solr is uncompress the release, go into the example/ folder and run 'java -jar start.jar'. Then you can post sample documents and make some queries for exploratory testing.

Of course the bundled schema is an example one, so you can quickly modify schema.xml to craft a custom one. Just add some <field> tags:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="PageTitle" type="string" indexed="true" stored="true"/>
<field name="Artist" type="string" indexed="true" stored="true"/>
<field name="Title" type="string" indexed="true" stored="true"/>
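
To give an idea of what a document for such a schema looks like on the wire, the sketch below renders a map of field values as an <add> message, the XML format accepted by the /update end point. The class name and the sample values are invented for illustration; only the <add>, <doc> and <field> element names come from Solr.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Solr or SolrJ.
final class SolrAddMessage {

    // Escapes the characters that are special in XML text and attribute values.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    // Renders one document as an <add> message, one <field> per map entry.
    static String toXml(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            xml.append("<field name=\"").append(escape(field.getKey())).append("\">")
               .append(escape(field.getValue()))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        // Sample values only; a LinkedHashMap keeps the fields in insertion order.
        Map<String, String> song = new LinkedHashMap<String, String>();
        song.put("id", "song-1");
        song.put("Artist", "Simon & Garfunkel");
        song.put("Title", "The Boxer");
        System.out.println(toXml(song));
    }
}
```

The resulting string is what you would POST to /update with a text/xml content type, followed by a <commit/> message to make the document searchable.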

Test automation with a Solr index

Since Solr is so simple to use, let's do something scary with it: test automation.

When I started integrating Solr into my multimedia search application project, I wrote an integration test with JUnit plus a SolrWrapper class in less than two pomodoros, most of which went into learning the API of SolrJ, the Java library that wraps the HTTP-based interface. I could also have done it simply with the native URL and HttpURLConnection classes if I hadn't had a library available.

package it.polimi.chansonnier.test;

import java.util.ArrayList;
import java.util.Collection;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

import junit.framework.TestCase;

public class SolrIntegrationTest extends TestCase {
    private SolrWrapper solrWrapper;

    public void testSolrInstanceCanBeStartedQueriedAndStopped() throws Exception {
        solrWrapper = new SolrWrapper();
        solrWrapper.start();

        String url = "http://localhost:8983/solr";
        CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
        server.setParser(new XMLResponseParser());
        server.deleteByQuery("*:*"); // delete everything!

        // Index a single document and commit, so that it becomes searchable.
        SolrInputDocument doc1 = new SolrInputDocument();
        doc1.addField("id", "id1", 1.0f);
        doc1.addField("name", "doc1", 1.0f);
        Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        docs.add(doc1);
        server.add(docs);
        server.commit();

        // Query for everything and check that the document comes back.
        SolrQuery query = new SolrQuery();
        query.setQuery("*:*");
        //query.addSortField( "name", SolrQuery.ORDER.asc );
        QueryResponse rsp = server.query(query);
        SolrDocumentList docList = rsp.getResults();
        assertEquals("[id1, doc1]", docList.get(0).values().toString());
    }

    public void tearDown() {
        solrWrapper.stop();
    }
}

package it.polimi.chansonnier.test;

import java.io.File;

public class SolrWrapper {
    private Process solr;

    // Launches the bundled Jetty example from the configured Solr folder.
    public void start() throws Exception {
        Runtime r = Runtime.getRuntime();
        solr = r.exec("/usr/bin/java -jar start.jar", null, getSolrRoot());
        Thread.sleep(5000); // give Solr some time to boot
    }

    public void stop() {
        if (solr != null) {
            solr.destroy();
        }
    }

    // The folder containing start.jar is read from a system property.
    private File getSolrRoot() throws Exception {
        String root = System.getProperty("it.polimi.chansonnier.solr.root");
        if (root == null) {
            throw new Exception("Solr path is not specified, please add the property it.polimi.chansonnier.solr.root");
        }
        return new File(root);
    }
}
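
The fixed five-second pause in start() is the fragile part of this wrapper: too short on a slow machine, wasted time on a fast one. One possible alternative, sketched here with only JDK classes, is to poll Solr over HTTP until it answers. The class name, the 200 ms polling interval and the timeouts are arbitrary choices for this example.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of Solr or SolrJ.
final class SolrReadiness {

    // Returns true as soon as the URL answers with HTTP 200, or false
    // if that does not happen within timeoutMillis.
    static boolean waitUntilReady(String url, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (respondsOk(url)) {
                return true;
            }
            try {
                Thread.sleep(200); // short pause between attempts
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt
                return false;
            }
        }
        return false;
    }

    private static boolean respondsOk(String url) {
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) new URL(url).openConnection();
            connection.setConnectTimeout(1000);
            connection.setReadTimeout(1000);
            return connection.getResponseCode() == HttpURLConnection.HTTP_OK;
        } catch (IOException e) {
            return false; // e.g. connection refused: the server is not up yet
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }
}
```

In SolrWrapper.start(), the Thread.sleep(5000) call could then give way to something like SolrReadiness.waitUntilReady("http://localhost:8983/solr/select?q=*:*", 10000), which returns as soon as Solr responds and lets the test fail fast when it never does.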

Remember that an integration test, by my definition (and that of Growing Object-Oriented Software, Guided by Tests), exercises only the behavior of an external entity, to make sure we understand how it works and that the contract our code is programmed against is correct.

Now my acceptance tests start Solr and stop it at the end of the test, resetting its index at the start of every test method:

package it.polimi.chansonnier.test;

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

import junit.framework.TestCase;

public abstract class AcceptanceTest extends TestCase {
    private SolrWrapper solrWrapper;
    protected CommonsHttpSolrServer solrServer;

    // Let failures propagate, so that a broken fixture fails the test
    // instead of being hidden behind a printed stack trace.
    public void setUp() throws Exception {
        solrWrapper = new SolrWrapper();
        solrWrapper.start();
        String url = "http://localhost:8983/solr";
        solrServer = new CommonsHttpSolrServer(url);
        solrServer.setParser(new XMLResponseParser());
        solrServer.deleteByQuery("*:*"); // reset the index before every test
    }

    public void tearDown() {
        solrWrapper.stop();
    }
}

Now what?

Now that I have a running Solr instance for my application, I plan to use AJAX Solr, a JavaScript library that makes requests directly to Solr, to build a rich front end. AJAX Solr is the proof of how browsers have become powerful today: you can actually see it doing queries via Firebug (imagine doing that with relational databases as back ends).

I've always been scared of rich-client applications because of the difficulty of testing them, but tools have now grown to support that. Of course I'll test-drive my web interface with HttpUnit, as I did with my plain HTML one; it uses Rhino to execute JavaScript-powered pages and assert that they're generated correctly.

Published at DZone with permission of Giorgio Sironi, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Nicholas Choate replied on Thu, 2010/06/10 - 11:03am

I have a hard time figuring out the following:

AJAX Solr is the proof of how browsers have become powerful today: you can actually see it doing queries via Firebug (imagine doing that with relational databases as back ends)

I have a relational database as my backend database, and I can see Firebug make "queries" through JavaScript because I wrote a RESTful API on top of my database services. Which is all that the Solr guys have done. They have a ton of search code written, and then they exposed pieces and parts via a RESTful service. So the comment above makes no sense... it's the WEB SERVICE doing the work, not the backend.

I'm sure some will say, oh he's just a relational database fanboy. To that, I'd say probably so, but you are probably a "No-SQL" fanboy, so let's call it even. I just don't want anyone to miss that it's the RESTful-ness of these web services that enables the powerful usability without hurting your back from writing a ton of code. Your backend is your backend, whether it is a relational database, a Java-based search engine or even the mainframe. There, I said it. You can expose services built on top of COBOL.

All that being said, I love Solr and AjaxSolr. It's great, it's awesome. It does everything I want it to do and more. If you haven't looked at it yet, I would definitely recommend it.

 

Giorgio Sironi replied on Thu, 2010/06/10 - 11:32am in response to: Nicholas Choate

I work with ORMs on a regular basis, so I am not against the use of a relational database either. The point was the universality of HTTP as a protocol that makes it good even for communicating with a "database" like Solr (really an index service and not a true database). From my point of view, Solr is the back end since I populate it with data directly from Java threads (via HTTP) and make queries from an Ajax front end (via HTTP) and do exploratory testing on the admin console (via HTTP). Browsers can't talk MySQL or Oracle protocol via XMLHttpRequest without an additional layer that performs the translation, thus here's the goodness of the browser + REST service pair. :)

Peter Karussell replied on Thu, 2010/06/10 - 2:53pm

Isn't it easier to use the in-built test case?

 public class SolrTest extends AbstractSolrTestCase {

    private SolrServer server;

    @Override
    public String getSchemaFile() {
        return "solr/conf/schema.xml";
    }

    @Override
    public String getSolrConfigFile() {
        return "solr/conf/solrconfig.xml";
    }

    @Before
    @Override
    public void setUp() throws Exception {
        super.setUp();        

        server = new EmbeddedSolrServer(h.getCoreContainer(), h.getCore().getName());
    }
... 

Giorgio Sironi replied on Thu, 2010/06/10 - 3:00pm in response to: Peter Karussell

I'm not sure this would work in my case for a few reasons:

- I am in an OSGi environment, and I kept Solr as an external app for now.

- I already need to extend another TestCase.

- I need a real SolrServer since I'm testing that an HTML page makes HTTP requests to it via JavaScript.

I transitioned to keeping Solr running all the time and resetting it in every acceptance test setUp() now.

 

Peter Karussell replied on Thu, 2010/06/10 - 3:47pm

> I am in an OSGi environment, and I kept Solr as an external app for now.

solr is external for me too, but in the test it will use the provided xml files and start an instance in your /tmp folder. So this is more a unit-test scenario in my case ...

>I need already to extend other TestCase

you can of course use a Java trick and instantiate the test case as an anonymous subclass:

 

tester = new AbstractSolrTestCase() {

    @Override
    public String getSchemaFile() {
        return "solr/conf/schema.xml";
    }

    @Override
    public String getSolrConfigFile() {
        return "solr/conf/solrconfig.xml";
    }
};

 and then call tester.setUp in the setUp method

Giorgio Sironi replied on Fri, 2010/06/11 - 5:28am in response to: Peter Karussell

I will check it out, for now the reset query is working well.

Peter Karussell replied on Thu, 2010/11/11 - 1:01pm in response to: Giorgio Sironi

Just one side note: distributed search is not easily testable with my approach. Giorgio's approach should work, but there is even a third one with jettyrunner, found in the Solr test classes themselves:

https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/test/org/apache/solr/SolrTestCaseJ4.java

https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/test/org/apache/solr/BaseDistributedSearchTestCase.java

Kookee Gacho replied on Thu, 2012/06/14 - 6:42am

Apache Lucene and Apache Solr are both produced by the same Apache Software Foundation development team since the two projects were merged in 2010. It is common to refer to the technology or products as Lucene/Solr or Solr/Lucene.

Carla Brian replied on Wed, 2012/07/04 - 7:20pm

I need to know more about this one. I am new to this. I need more tutorials as well.
