Set Up Solr and Get it Running
Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.
The main feature of Solr (or at least the most useful) is its REST-like API: instead of playing with APIs and drivers to talk to Solr, you can simply make HTTP requests and get results in JSON or XML (and it's pronounced "solar", in case you wonder).
I would not say that this is a perfect REST interface that embodies all the principles of HTTP 1.1, but the point is that data has a simple representation that goes back and forth between the client and the server, with no encapsulation in SOAP's nightmarish envelopes. Plus, it's human-readable, since XML and JSON can easily be written by hand for exploratory testing purposes.
A year ago, when I heard that CouchDB had a REST API, I said: what a useless level of abstraction, why can't I just use a PHP extension/driver like with MySQL, SQLite and similar databases?
Now I see the potential of such a universal interface:
- it is language agnostic due to XML or JSON usage, which can be interpreted by nearly everything nowadays. The usual yardstick is JavaScript: if you can process it in JavaScript in a browser, with all its limitations, you can do it everywhere. Of course, JavaScript natively supports both JSON (via eval(), even if that is not secure) and XML (with the DOM).
- it is data type agnostic due to HTTP, which can only transmit strings. There is no type safety, but great interoperability. Dynamic languages like PHP succeed so much partly because the basic protocol has no strict types. If the front end is only going to print a value, why shouldn't a string be enough?
- it is more or less a standard protocol (although the data representation is not): if everyone who invents a database publishes their own binary communication protocol, we'll need many more libraries.
Solr is written in Java, but you can access it with your language of choice if you need, simply by doing GET requests to search the index and POST ones to add documents.
Of course there is usually a library for every known programming language that wraps the REST-like interface, but it is dead simple to build such a library, for example in JavaScript, and it is not mandatory to use a library at all. And this leads us to another point: if Solr had a binary interface like MySQL's, how would you wrap that with JavaScript? HTTP is universal, since nearly everything now, from computers to ovens to hair dryers, can make HTTP requests.
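To make the "no library needed" point concrete, here is a minimal sketch in Java of building a search request by hand. It assumes the /select endpoint and Solr's q (query) and wt (response format) parameters; the SolrUrls class and its selectUrl method are hypothetical names invented for this illustration, not part of Solr or SolrJ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Illustrative helper (not part of Solr): builds search URLs by hand. */
final class SolrUrls {
    private SolrUrls() {}

    /**
     * @param baseUrl root of the Solr web app, e.g. http://localhost:8983/solr
     * @param query   query string, e.g. *:* to match every document
     * @param format  response format passed as the wt parameter, e.g. json or xml
     */
    static String selectUrl(String baseUrl, String query, String format) {
        try {
            // Query values must be URL-encoded: ':' becomes %3A, spaces become '+'.
            return baseUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&wt=" + URLEncoder.encode(format, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("http://localhost:8983/solr", "*:*", "json"));
    }
}
```

The resulting string can be pasted into a browser or opened with java.net.URL, which is all a client in any language really has to do.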
Starting out
Simplicity of usage, starting from the protocol, is a key feature in comparison with Lucene, and it is indeed very easy to set up Solr and get it running.
Since Solr's primary interface is web-based, it needs a servlet container (reinventing the wheel and implementing a full HTTP stack would not have been a great idea). Solr has several servlets that handle different end points, like /update or /select.
However, it comes with a prepackaged example with Jetty (a small, lightweight servlet container) so that it runs out of the box, but you can deploy it also as a Tomcat web app if you want.
All you need to do to run Solr is uncompress the release, go into the example/ folder and run 'java -jar start.jar'. Then you can post sample documents and make some queries for exploratory testing.
Of course the bundled schema is an example one, so you can quickly modify schema.xml to craft a custom one. Just add some <field> tags:
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="PageTitle" type="string" indexed="true" stored="true"/>
<field name="Artist" type="string" indexed="true" stored="true"/>
<field name="Title" type="string" indexed="true" stored="true"/>
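Once fields like these exist, a document is added by POSTing an XML message to /update. The following is a rough sketch of building such a message in Java; it assumes Solr's XML update format (an add element wrapping doc and field elements), and the SolrAddMessage class with its toXml and escape methods are hypothetical names used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative helper (not part of Solr): renders one document as an XML add message. */
final class SolrAddMessage {
    private SolrAddMessage() {}

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Field order follows the iteration order of the map. */
    static String toXml(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            xml.append("<field name=\"").append(escape(field.getKey())).append("\">")
               .append(escape(field.getValue()))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("id", "song-1");          // sample values, made up for the example
        doc.put("Artist", "Some Artist");
        doc.put("Title", "Rock & Roll");
        System.out.println(toXml(doc));
    }
}
```

The field names must match those declared in schema.xml, and a separate commit message is needed before the document becomes searchable.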
Test automation with a Solr index
Since Solr is so simple to use, let's do something scary with it: test automation.
When I started integrating Solr in my multimedia search application project, I wrote an integration test with JUnit plus a SolrWrapper class in less than two pomodoros, most of which went into learning the API of SolrJ, the Java library that wraps the HTTP-based interface. I could also have done it simply with the native URL and HttpURLConnection objects if I hadn't had a library available.
package it.polimi.chansonnier.test;

import java.util.ArrayList;
import java.util.Collection;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

import junit.framework.TestCase;

public class SolrIntegrationTest extends TestCase {
    private SolrWrapper solrWrapper;

    public void testSolrInstanceCanBeStartedQueriedAndStopped() throws Exception {
        solrWrapper = new SolrWrapper();
        solrWrapper.start();
        String url = "http://localhost:8983/solr";
        CommonsHttpSolrServer server = new CommonsHttpSolrServer( url );
        server.setParser(new XMLResponseParser());
        server.deleteByQuery( "*:*" ); // delete everything!

        // index a single document and make it visible to searches
        SolrInputDocument doc1 = new SolrInputDocument();
        doc1.addField( "id", "id1", 1.0f );
        doc1.addField( "name", "doc1", 1.0f );
        Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        docs.add( doc1 );
        server.add( docs );
        server.commit();

        // query for everything and check that the document comes back
        SolrQuery query = new SolrQuery();
        query.setQuery( "*:*" );
        //query.addSortField( "name", SolrQuery.ORDER.asc );
        QueryResponse rsp = server.query( query );
        SolrDocumentList docList = rsp.getResults();
        assertEquals("[id1, doc1]", docList.get(0).values().toString());
    }

    public void tearDown() {
        solrWrapper.stop();
    }
}
package it.polimi.chansonnier.test;

import java.io.File;

public class SolrWrapper {
    private Process solr;

    public void start() throws Exception {
        Runtime r = Runtime.getRuntime();
        // launch the bundled Jetty from the Solr example folder
        solr = r.exec("/usr/bin/java -jar start.jar", null, getSolrRoot());
        // give Solr a fixed amount of time to boot
        Thread.sleep(5000);
    }

    public void stop() {
        if (solr != null) {
            solr.destroy();
        }
    }

    private File getSolrRoot() throws Exception {
        String root = System.getProperty("it.polimi.chansonnier.solr.root");
        if (root == null) {
            throw new Exception("Solr path is not specified, please add the property it.polimi.chansonnier.solr.root");
        }
        return new File(root);
    }
}
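The fixed Thread.sleep(5000) in start() is the simplest thing that works, but it waits too long on a fast machine and not long enough on a slow one. A possible alternative, sketched here under the assumption that Solr answers HTTP requests at its base URL once it has booted, is to poll until the server responds or a timeout expires. SolrReadiness and waitUntilUp are hypothetical names, and this is not how the original SolrWrapper works.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/** Illustrative helper (not part of Solr or SolrJ): waits for an HTTP server to come up. */
final class SolrReadiness {
    private SolrReadiness() {}

    /** Polls the URL until it answers, returning false if the timeout expires first. */
    static boolean waitUntilUp(String url, long timeoutMillis, long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (responds(url)) {
                return true;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return false;
    }

    /** True if the server accepted the connection and did not report a server error. */
    private static boolean responds(String url) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(500);
            conn.setReadTimeout(500);
            return conn.getResponseCode() < 500;
        } catch (IOException e) {
            return false; // connection refused or timed out: not up yet
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

In SolrWrapper.start() the sleep could then be replaced by something like waitUntilUp("http://localhost:8983/solr", 30000, 250), failing the test if it returns false.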
Remember that an integration test, by my definition (and that of Growing Object-Oriented Software, Guided by Tests), exercises only the behavior of an external entity, to make sure we understand how it works and that the contract our code is programmed against is correct.
Now my acceptance tests start Solr and stop it at the end of the test, resetting its index at the start of every test method:
package it.polimi.chansonnier.test;

import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

import junit.framework.TestCase;

public abstract class AcceptanceTest extends TestCase {
    private SolrWrapper solrWrapper;
    protected CommonsHttpSolrServer solrServer;

    public void setUp() {
        solrWrapper = new SolrWrapper();
        try {
            solrWrapper.start();
            String url = "http://localhost:8983/solr";
            solrServer = new CommonsHttpSolrServer( url );
            solrServer.setParser(new XMLResponseParser());
            // start every test method from an empty index
            solrServer.deleteByQuery( "*:*" );
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void tearDown() {
        solrWrapper.stop();
    }
}
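The reset in setUp() goes through SolrJ, but the same effect can be sketched over plain HTTP, which is handy when the test client is not written in Java. The sketch below assumes that /update under the base URL accepts Solr's XML delete and commit messages; SolrIndexReset and its methods are hypothetical names, and nothing here is taken from the original project.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Illustrative helper (not part of Solr or SolrJ): empties an index over plain HTTP. */
final class SolrIndexReset {
    static final String DELETE_ALL = "<delete><query>*:*</query></delete>";
    static final String COMMIT = "<commit/>";

    private SolrIndexReset() {}

    static String updateUrl(String baseUrl) {
        return baseUrl + "/update";
    }

    /** POSTs one XML message and returns the HTTP status code. */
    static int postXml(String url, String xml) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(xml.getBytes(StandardCharsets.UTF_8));
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /** Deletes every document, then commits so that searches see the empty index. */
    static void reset(String baseUrl) throws IOException {
        String url = updateUrl(baseUrl);
        for (String message : new String[] { DELETE_ALL, COMMIT }) {
            int status = postXml(url, message);
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Solr answered " + status + " to " + message);
            }
        }
    }
}
```

Calling SolrIndexReset.reset("http://localhost:8983/solr") before each test would play the same role as the deleteByQuery call, with the commit made explicit.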
Now what?
Now that I have a running Solr instance for my application, I plan to use AJAX Solr, a JavaScript library that makes requests directly to Solr, to build a rich front end. AJAX Solr is the proof of how browsers have become powerful today: you can actually see it doing queries via Firebug (imagine doing that with relational databases as back ends).
I've always been scared of rich-client applications because of the difficulty of testing them, but tools have now grown to support that. Of course I'll test-drive my web interface with HttpUnit, as I did with my plain HTML one; it uses Rhino to crawl JavaScript-powered pages and assert they're generated correctly.
Comments
Nicholas Choate replied on Thu, 2010/06/10 - 11:03am
I have a hard time figuring out the following:
AJAX Solr is the proof of how browsers have become powerful today: you can actually see it doing queries via Firebug (imagine doing that with relational databases as back ends)
I have a relational database as my backend database, and I can see Firebug make "queries" through JavaScript because I wrote a RESTful API over the top of my database services. Which is all that the Solr guys have done. They have a ton of search code written and then they exposed pieces and parts via a RESTful service. So the comment above makes no sense... it's the WEB SERVICE doing the work, not the backend.
I'm sure some will say, oh he's just a relational database fanboy. To that, I'd say probably so, but you are probably a "No-SQL" fanboy, so let's call it even. I just don't want anyone to miss that it's the RESTful-ness of these web services that enables the powerful usability without hurting your back from writing a ton of code. Your backend is your backend, whether it is a relational database, a Java-based search engine or even the mainframe. There, I said it. You can expose services built on top of COBOL.
All that being said, I love Solr and AjaxSolr. It's great, it's awesome. It does everything I want it to do and more. If you haven't looked at it yet, I would definitely recommend it.
Peter ___ replied on Thu, 2010/06/10 - 2:53pm
Isn't it easier to use the in-built test case?
public class SolrTest extends AbstractSolrTestCase {
    private SolrServer server;

    @Override
    public String getSchemaFile() {
        return "solr/conf/schema.xml";
    }

    @Override
    public String getSolrConfigFile() {
        return "solr/conf/solrconfig.xml";
    }

    @Before
    @Override
    public void setUp() throws Exception {
        super.setUp();
        // embedded server backed by the core that the test harness (h) created
        server = new EmbeddedSolrServer(h.getCoreContainer(), h.getCore().getName());
    }
}
Giorgio Sironi replied on Thu, 2010/06/10 - 3:00pm
in response to:
Peter ___
I'm not sure this would work in my case for a few reasons:
- I am in an OSGi environment, and I kept Solr as an external app for now.
- I already need to extend another TestCase.
- I need a real SolrServer since I'm testing that an HTML page makes HTTP requests to it via JavaScript.
I transitioned to keeping Solr running all the time and resetting it in every acceptance test setUp() now.
Peter ___ replied on Thu, 2010/06/10 - 3:47pm
> I am in an OSGi environment, and I kept Solr as an external app for now.
solr is external for me too, but in the test it will use the provided xml files and start an instance in your /tmp folder. So this is more a unit-test scenario in my case ...
>I need already to extend other TestCase
You can of course use a Java trick and instantiate the test case as an anonymous subclass:
tester = new AbstractSolrTestCase() {
    @Override
    public String getSchemaFile() {
        return "solr/conf/schema.xml";
    }

    @Override
    public String getSolrConfigFile() {
        return "solr/conf/solrconfig.xml";
    }
};
and then call tester.setUp() in your own setUp() method.
Peter ___ replied on Thu, 2010/11/11 - 1:01pm
in response to:
Giorgio Sironi
Just one side note: distributed search is not easily testable with my approach. Giorgio's approach should work, but there is even a third one with JettyRunner, found in the Solr test classes themselves:
https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/test/org/apache/solr/SolrTestCaseJ4.java
https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/test/org/apache/solr/BaseDistributedSearchTestCase.java