
Max De Marzi is a seasoned web developer. He started building websites in 1996 and has worked with Ruby on Rails since 2006. The web forced Max to wear many hats and master a wide range of technologies: he can be a system admin, database developer, graphic designer, back-end engineer and data scientist in the course of one afternoon. Max is a graph database enthusiast. He built the Neography Ruby gem, a REST API wrapper for the Neo4j graph database. He is addicted to learning new things and loves a challenge and finding pragmatic solutions. Max is very easy to work with, focuses under pressure and has the patience of a rock. Max is a DZone MVB, is not an employee of DZone, and has posted 60 posts at DZone.

Matches are the New Hotness

10.22.2012

How do you help a person without a job find one online? A search screen. How do you help a person find love online? A search screen. How do you find which camera to buy online? A search screen. How do you help a sick person self-diagnose online? I have no idea; I go to the doctor. It doesn’t matter. What I want to tell you is that there is another way.

Now, search is great. It usually helps people find what they’re looking for, but sometimes they have to dig through tons of stuff they don’t really want. Why? Because people can usually describe what they want, but not what they don’t want to come back. So you end up with tons of results that are not very relevant to your user, and unless you are one of the major search engines, your search is not very smart.

Does your search know about your user? Does it know what your user has permission to see? How does it tie into personalized recommendations? Does it take advantage of the things we already know about a user? It’s this last question that’s most important, and I want to show you how you can use a graph database like Neo4j to help your users find what they want in a different way.

Pretend you are looking for a job. Think of the things you put on your resume. Your location, your education, your work history, your skills, your references, etc. Now pretend you ARE a job post. Yes, I know I am asking you to be an inanimate object. Just give it a shot. What do you want in a candidate? Someone near your location, with a certain level of education, and mix of skills, preferably vetted by someone you respect, etc.

Both the job candidate and the job post are thinking about the same things, but if you look at a resume and a job description, you will realize they aren’t speaking the same language. Why not? It’s so obvious that it has been driving me crazy for years, and it was one of the reasons I built Vouched and got into this graph database stuff in the first place. So let’s solve this problem with a graph.

I am going to make candidates and job posts speak the same language by connecting them to location and skill nodes. I am keeping things simple here; anything that relates to both people and job posts could be used.

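Mary lives in California, knows C# and knows PostgreSQL. There is a job post that requires a candidate to be in California, know C# and know PostgreSQL. We connect the two in our graph: Mary points to the California node through a lives_in relationship and to the C# and PostgreSQL nodes through has relationships, while the job post points to California through an in_location relationship and to C# and PostgreSQL through requires relationships.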
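To make that shape concrete, here is a minimal in-memory sketch in Python (not Neo4j, and not the author's code) that stores the graph as (start node, relationship type, end node) triples. The relationship types are the ones used in the Cypher query; the node names and the `out_neighbors` helper are illustrative assumptions.

```python
# A toy, in-memory stand-in for the graph (not Neo4j): each edge is a
# (start node, relationship type, end node) triple. Relationship types
# mirror the Cypher query; node names are illustrative.
EDGES = [
    ("Mary", "lives_in", "California"),
    ("Mary", "has", "C#"),
    ("Mary", "has", "PostgreSQL"),
    ("Job", "in_location", "California"),
    ("Job", "requires", "C#"),
    ("Job", "requires", "PostgreSQL"),
]


def out_neighbors(node, rel_type, edges=EDGES):
    """Nodes reached from `node` by following outgoing `rel_type` edges."""
    return {end for start, rel, end in edges if start == node and rel == rel_type}


# Mary and the job point at the very same location and skill nodes,
# which is what lets a traversal walk from one to the other.
print(out_neighbors("Mary", "has") == out_neighbors("Job", "requires"))  # True
print(out_neighbors("Mary", "lives_in") == out_neighbors("Job", "in_location"))  # True
```

Sharing nodes, rather than copying skill names into both records, is what makes "Mary to job" a two-hop walk instead of a text comparison.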

Now we can use our graph to get from Mary to this job, or to any job where she would be a great candidate, with the following Cypher query:

START me=node:users_index(name={user})
MATCH skills<-[:has]-me-[:lives_in]->city<-[:in_location]-job-[:requires]->requirements
WHERE me-[:has]->()<-[:requires]-job
WITH DISTINCT city.name AS city_name, job.name AS job_name,
LENGTH(me-[:has]->()<-[:requires]-job) AS matching_skills,
LENGTH(job-[:requires]->()) AS job_requires,
COLLECT(DISTINCT requirements.name) AS req_names, COLLECT(DISTINCT skills.name) AS skill_names
RETURN city_name, job_name, FILTER(name IN req_names WHERE NOT name IN skill_names) AS missing
ORDER BY matching_skills / job_requires DESC, job_requires
LIMIT 10

That looks a little complicated, so let’s break it down piece by piece.

START me=node:users_index(name={user})

We start with what we know, and right now all we know is that we have a user named Mary, whom we can look up in our users_index and use as the starting point of the traversal. Other users will run this same query, so we parameterize it with {user}.
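The role of START plus a parameter can be sketched in a few lines of Python. This is an illustration only: `USERS_INDEX` is a hypothetical dictionary standing in for the users_index legacy index, and `start_node` is a hypothetical helper, not a Neo4j API.

```python
# Hypothetical stand-in for users_index: maps a user's name to a node id.
USERS_INDEX = {"Mary": 1, "Bob": 2}


def start_node(user):
    """Resolve the {user} parameter to the node the traversal starts from."""
    if user not in USERS_INDEX:
        raise KeyError(f"no user named {user!r} in users_index")
    return USERS_INDEX[user]


# One query "text", many callers: only the parameter value changes.
for name in ("Mary", "Bob"):
    print(name, start_node(name))
# Mary 1
# Bob 2
```

Because the value for {user} is supplied separately from the query text, the same query serves every user without building a new string each time.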

MATCH skills<-[:has]-me-[:lives_in]->city<-[:in_location]-job-[:requires]->requirements

Then we look for patterns in the graph that have the shape of the MATCH segment above. Cypher knows “me” and nothing else, but as it traverses relationships from “me” it binds the nodes and relationships along the way to the names we give them. This is a little different from SQL, where all your tables already have names. Here I name every node that “me” reaches through an outgoing relationship of type “has” skills, because that is what they are in my mind; I could have called them anything, and it wouldn’t matter to Neo4j. I do the same for the other nodes along the pattern.
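A rough Python sketch of what MATCH produces may help (an in-memory illustration, not how Neo4j executes it; the edge list and the helpers `outgoing`, `incoming` and `match` are hypothetical). Each way the pattern fits the graph becomes one row, so every combination of one of the user's skills with one of the job's requirements is a separate row. That is why the query later needs DISTINCT and COLLECT.

```python
# Toy graph as (start, relationship type, end) triples; names are illustrative.
EDGES = [
    ("Mary", "has", "C#"),
    ("Mary", "has", "PostgreSQL"),
    ("Mary", "lives_in", "California"),
    ("Job1", "in_location", "California"),
    ("Job1", "requires", "C#"),
    ("Job1", "requires", "Java"),
]


def outgoing(node, rel):
    return [end for start, r, end in EDGES if start == node and r == rel]


def incoming(node, rel):
    return [start for start, r, end in EDGES if end == node and r == rel]


def match(me):
    """Yield one row per way the pattern
    skills<-[:has]-me-[:lives_in]->city<-[:in_location]-job-[:requires]->requirements
    fits the graph, binding each named node as we go."""
    for city in outgoing(me, "lives_in"):
        for job in incoming(city, "in_location"):
            for skills in outgoing(me, "has"):
                for requirements in outgoing(job, "requires"):
                    yield {"skills": skills, "city": city,
                           "job": job, "requirements": requirements}


rows = list(match("Mary"))
# 2 skills x 2 requirements for the one job in Mary's city gives 4 rows.
print(len(rows))  # 4
```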

WHERE me-[:has]->()<-[:requires]-job

I only want to match jobs WHERE I share at least one skill that it requires. Notice we are using a pattern as our where clause. We don’t name the nodes in between because at this point we only care that one exists.
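The pattern predicate only asks whether at least one path exists. As a Python sketch with made-up data (`MY_SKILLS`, `JOB_REQUIREMENTS` and `shares_a_skill` are illustrative names), that is an existence test on shared skills:

```python
# Toy data (illustrative): the user's skills and each job's required skills.
MY_SKILLS = {"C#", "PostgreSQL"}
JOB_REQUIREMENTS = {
    "Job1": {"C#", "Java"},
    "Job2": {"Ruby"},
    "Job3": {"C#", "PostgreSQL"},
}


def shares_a_skill(my_skills, required):
    """True if some path me-[:has]->()<-[:requires]-job exists, i.e. at least
    one skill node is shared. Which one it is does not matter."""
    return any(skill in my_skills for skill in required)


kept = [job for job, req in JOB_REQUIREMENTS.items() if shares_a_skill(MY_SKILLS, req)]
print(kept)  # ['Job1', 'Job3']
```

`any` stops at the first shared skill, which matches the intent of an unnamed middle node: existence, not identity.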

WITH DISTINCT city.name AS city_name, job.name AS job_name,

We can chain query parts together using the WITH clause, which pipes the results of one part into the next. DISTINCT keeps only unique records. Here we start by passing on the name properties of the city and the job.
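Because this WITH also contains COLLECT, an aggregate function, Cypher groups the incoming rows by the non-aggregated expressions, much like SQL's GROUP BY. Here those are city_name and job_name, plus the two counts, which are constant for a given job. DISTINCT then drops any duplicate result rows. A minimal Python sketch of that grouping, using the four rows the MATCH yields for one user and one job (illustrative data):

```python
from collections import defaultdict

# The four MATCH rows for Mary and Job1: one per skill/requirement pair.
rows = [
    {"city": "California", "job": "Job1", "skills": "C#", "requirements": "C#"},
    {"city": "California", "job": "Job1", "skills": "C#", "requirements": "Java"},
    {"city": "California", "job": "Job1", "skills": "PostgreSQL", "requirements": "C#"},
    {"city": "California", "job": "Job1", "skills": "PostgreSQL", "requirements": "Java"},
]

# Rows with the same non-aggregated values (city name, job name) form one group;
# aggregates such as COLLECT are then computed once per group.
groups = defaultdict(list)
for row in rows:
    groups[(row["city"], row["job"])].append(row)

print(len(groups))  # 1
print(len(groups[("California", "Job1")]))  # 4
```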

LENGTH(me-[:has]->()<-[:requires]-job) AS matching_skills,

We continue capturing the results we want to pass to the next part. Here we use the LENGTH function not to find the length of the path between the user and a job (we know that is just 2 hops), but to count how many such paths exist between them, which is the number of skills they share.
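Counting paths of the form me-[:has]->skill<-[:requires]-job amounts to counting shared skill nodes, assuming there is at most one has and one requires relationship per skill. A Python sketch with illustrative data (`matching_skills` is a hypothetical helper):

```python
# Toy data (illustrative).
MY_SKILLS = {"C#", "PostgreSQL"}
JOB_REQUIREMENTS = {"Job1": {"C#", "Java"}, "Job3": {"C#", "PostgreSQL"}}


def matching_skills(my_skills, required):
    """Count paths me-[:has]->skill<-[:requires]-job: one per shared skill node."""
    return sum(1 for skill in required if skill in my_skills)


for job, required in JOB_REQUIREMENTS.items():
    print(job, matching_skills(MY_SKILLS, required))
# Job1 1
# Job3 2
```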

LENGTH(job-[:requires]->()) AS job_requires,

We use the same trick again to get the number of skills required by each job.
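In the same spirit, a Python sketch of job_requires: count the job's outgoing requires relationships and nothing else, so its in_location edge is not included. The edge list and `job_requires` helper are illustrative.

```python
# Toy edge list (illustrative): (start, relationship type, end).
EDGES = [
    ("Job1", "requires", "C#"),
    ("Job1", "requires", "Java"),
    ("Job1", "in_location", "California"),
    ("Job3", "requires", "C#"),
    ("Job3", "requires", "PostgreSQL"),
    ("Job3", "requires", "Ruby"),
]


def job_requires(job, edges=EDGES):
    """Count paths job-[:requires]->(), i.e. the job's required skills."""
    return sum(1 for start, rel, _ in edges if start == job and rel == "requires")


print(job_requires("Job1"), job_requires("Job3"))  # 2 3
```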

COLLECT(DISTINCT requirements.name) AS req_names, COLLECT(DISTINCT skills.name) AS skill_names

Then we COLLECT the names of the requirements and skills into two arrays.
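The DISTINCT inside each COLLECT matters because the MATCH repeats every skill once per requirement (and vice versa). A Python sketch over the four rows for one user and one job; `collect` is a hypothetical helper, and the element order here simply follows the row order:

```python
# The four MATCH rows for Mary and Job1 (one per skill/requirement pair).
rows = [
    {"skills": "C#", "requirements": "C#"},
    {"skills": "C#", "requirements": "Java"},
    {"skills": "PostgreSQL", "requirements": "C#"},
    {"skills": "PostgreSQL", "requirements": "Java"},
]


def collect(rows, key, distinct=False):
    """Gather one column of a group into a list, optionally dropping repeats."""
    values = []
    for row in rows:
        if distinct and row[key] in values:
            continue
        values.append(row[key])
    return values


print(collect(rows, "requirements"))  # ['C#', 'Java', 'C#', 'Java']
req_names = collect(rows, "requirements", distinct=True)
skill_names = collect(rows, "skills", distinct=True)
print(req_names, skill_names)  # ['C#', 'Java'] ['C#', 'PostgreSQL']
```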

RETURN city_name, job_name, FILTER(name IN req_names WHERE NOT name IN skill_names) AS missing

We then RETURN the values we are interested in, and use the FILTER function to return only the names of the job’s required skills that the user does not have.
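FILTER keeps the elements of a collection that satisfy a predicate. In Python the same step is a list comprehension (illustrative values):

```python
req_names = ["C#", "Java"]          # the job's required skills
skill_names = ["C#", "PostgreSQL"]  # the user's skills

# Same idea as FILTER(name IN req_names WHERE NOT name IN skill_names)
missing = [name for name in req_names if name not in skill_names]
print(missing)  # ['Java']
```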

ORDER BY matching_skills / job_requires DESC, job_requires

We then ORDER BY the fraction of the job’s required skills that the user has, in DESCending order, and use the number of skills the job requires (fewest first) as a tie breaker.
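The ordering can be sketched in Python with a sort key of (negated ratio, required count); the job tuples are illustrative. One caution: in current Neo4j versions `/` between two integers is integer division, so 2/3 evaluates to 0. If that applies to your version, writing the ratio as `matching_skills * 1.0 / job_requires` keeps it fractional. The sketch shows how truncation would tie jobs that should rank differently.

```python
# (job name, matching_skills, job_requires): illustrative numbers.
jobs = [
    ("Job A", 1, 2),
    ("Job B", 2, 2),
    ("Job C", 1, 1),
    ("Job D", 2, 3),
]

# Highest match ratio first; among equal ratios, fewer required skills first.
# Float division keeps e.g. 2/3 as 0.666... rather than truncating it.
ranked = sorted(jobs, key=lambda j: (-(j[1] / j[2]), j[2]))
print([name for name, _, _ in ranked])  # ['Job C', 'Job B', 'Job D', 'Job A']

# With integer division, Job A and Job D would both score 0.
print([m // r for _, m, r in jobs])  # [0, 1, 1, 0]
```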

LIMIT 10

Lastly we LIMIT the number of results to the top 10.

We now have data we can display to a user.
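For readers who want to see the whole pipeline in one place, here is an end-to-end sketch in plain Python. It mirrors the query's steps (start at the user, go through the city, require a shared skill, count, find missing skills, order, limit) over hypothetical toy data. It is an illustration of the logic, not the author's implementation and not a substitute for running the Cypher query against Neo4j.

```python
# Toy data (illustrative): where users live, what they know, and the jobs.
USERS = {"Mary": {"city": "California", "skills": {"C#", "PostgreSQL"}}}
JOBS = [
    {"name": "Job A", "city": "California", "requires": {"C#", "PostgreSQL"}},
    {"name": "Job B", "city": "California", "requires": {"C#", "Java", "Ruby"}},
    {"name": "Job C", "city": "California", "requires": {"Ruby"}},
    {"name": "Job D", "city": "Texas", "requires": {"C#"}},
]


def matches(user, limit=10):
    """Mimic the Cypher query: same city, at least one shared skill,
    ranked by match ratio (desc) then number of requirements (asc)."""
    me = USERS[user]                                    # START
    rows = []
    for job in JOBS:
        if job["city"] != me["city"]:                   # MATCH through the city node
            continue
        shared = job["requires"] & me["skills"]
        if not shared:                                  # WHERE: at least one shared skill
            continue
        rows.append({
            "city_name": job["city"],
            "job_name": job["name"],
            "matching_skills": len(shared),
            "job_requires": len(job["requires"]),
            "missing": sorted(job["requires"] - me["skills"]),  # FILTER
        })
    rows.sort(key=lambda r: (-r["matching_skills"] / r["job_requires"],
                             r["job_requires"]))        # ORDER BY
    return rows[:limit]                                  # LIMIT


for row in matches("Mary"):
    print(row["job_name"], row["missing"])
# Job A []
# Job B ['Java', 'Ruby']
```

Job C is dropped because it shares no skill with Mary, and Job D because it is in another city.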

A user would then be able to see which jobs they qualify for and which jobs they almost qualify for. Maybe they do know those skills and just forgot to mention them in their profile, or maybe it’s something they can learn quickly… or they can lie.

If you collect the user’s skills and location up front, you don’t even need to provide a search screen; you can go right into results.

You can see it running live on Heroku, and as always, the code is available on GitHub.

If you think you might have a use for this solution, don’t hesitate to get in touch and I’ll be happy to help.


Published at DZone with permission of Max De Marzi, author and DZone MVB.
