HTML5 Zone is brought to you in partnership with:

David Catuhe is a Microsoft HTML5 Evangelist based in Paris, France. He does much of his research on Windows 8, Kinect, and the various HTML5 standards and JS libraries. David is a DZone MVB and is not an employee of DZone and has posted 22 posts at DZone. You can read more from them at their website. View Full User Profile

Creating a Local Full Text Search Index

12.21.2012
| 2111 views |
  • submit to reddit

One issue I have with UrzaGatherer is related to the search. As you may know, I have a huge list of cards saved locally and my application allows the user to search specific cards using a part of their name. But browsing more than 20 000 cards using a text search can be really expensive mainly on low-end hardware.

Currently the code looks like something like that:

cards = UrzaGatherer.CardsList.createFiltered(queryFunction)

I use the WinJS.Binding.List object to create a filtered view using my search pattern.

The filter function uses a simple indexOf function:

var queryFunction = function (card) {
    if (card.name.indexOf(textSearch) !== -1) {
        return true;
    }

    return false;
};

But obviously, it can take ages to perform a search using this solution. The complexity is almost O(n * m) which can be simplified to O(n²) (where n is the number of cards and m the average length of cards name).

So the question is: How can I optimize my search?

Building a full text search tree

One solution can be found with a search tree. This kind of structure allows you to perform a search with a complexity of O(n) where n is the average length of a card’s name.

You have to build the tree by feeding it with the strings you want to search. For each string, the tree will generate a branch with all the characters then a branch with n-1 character (starting after the first one) and so on.

For example, if we use the “urza” string the tree will look like that:

image

The leafs contain the id of the associated card.

If I add a new string like “Pizza” to my previous tree the resulting tree is:

image

Please note that some leafs can contain many cards (like for “ZA” and “A”)

The related code is pretty simple:

var root;

var processString = function(string, offset, node, object) {
    if (!string || offset == string.length) {
        return;
    }

    var currentNode = node;

    for (var index = offset; index < string.length; index++) {
        var c = string[index];

        if (currentNode[c] === undefined) {
            currentNode[c] = {};
        }

        currentNode = currentNode[c];
    }

    if (currentNode.ref == undefined)
        currentNode.ref = [];

    if (currentNode.ref.indexOf(object.id) == -1) {
        currentNode.ref.push(object.id);
    }

    processString(string, offset + 1, root, object);
};

var addString = function (string, object) {
    if (!string)
        return;
    
    if (!root)
        root = { };

    processString(string.toLowerCase(), 0, root, object);
};

Searching using the full text search tree

The search function is then a simple tree traversal:

var concatArray = function (source, data) {
    for (var index = 0; index < data.length; index++) {
        source[data[index]] = {};
    }
};

var gatherResults = function (node, results) {
    for (var prop in node) {
        if (prop == "ref")
            concatArray(results, node[prop]);
        else
            gatherResults(node[prop], results);
    }
};

var searchString = function (string) {
    var currentNode = root;

    for (var index = 0; index < string.length; index++) {
        var c = string[index];

        if (currentNode[c] === undefined)
            return {};

        currentNode = currentNode[c];
    }

    var results = {};
    gatherResults(currentNode, results);

    return results;
};

When the searched string is completely found, the algorithm gathers all the children leafs to produce the final result.

You can also persist your tree easily with JSON:

var persistIndex = function(filename) {
    var data = JSON.stringify(root);
    return Windows.Storage.ApplicationData.current.localFolder.createFileAsync(filename, 
Windows.Storage.CreationCollisionOption.replaceExisting)
        .then(function (file) {
        return Windows.Storage.FileIO.writeTextAsync(file, data);
    });
};

var resetIndex = function() {
    root = undefined;
};

var loadIndex = function(filename) {
    return Windows.ApplicationModel.Package.current.installedLocation.getFolderAsync("data")
        .then(function(localFolder) {
        return localFolder.getFileAsync(filename).then(function(file) {
            return Windows.Storage.FileIO.readTextAsync(file).then(function(data) {
                if (data) {
                    root = JSON.parse(data);
                }
            });
        });
    });
};

For instance, the complete index for my 20 000 cards weights 6 Mb.

Results

Blazing fast !! Nothing more Sourire. Using my search tree I can search through the entire collection in less than 300ms where before it can take up to 20s for the same result.

But beware: this optimization is costly in terms of memory consumption (due to the index).

At the end of the day, it is a good tool for you if you want to search your data without using a backend server (or when you are offline).



Published at DZone with permission of David Catuhe, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)