Monday, 25 March 2013

Best way of searching by word and sorting the result by number of matching in MySql database

This is a wonderful search technique on MySql data base, I came across while I was in my last internship program at WowWay labs Pvt. Ltd.

The Problem statement was something like this:
 I had a table with all the product details and along with name of the products. I have to search the table by field product_name, and the search result should be prominent, like, if you search "Nokia Lumia 800", then at first all the Nokia Lumia 800, then all the Nokia Lumia products and after that all the Nokia product, as per the requirement. There was lot of algorithms to follow, but none of them came with the exact result all the time, some flaws were there. After working a lot on it I found out about this:
Suppose we got a string "Nokia Lumia 800"
We can simply split it into three part that is, "Nokia", "Lumia", "800", then search for each and every token in product_name, the maximum match found will be sorted to the upper level. For example, if I have a product with product_name "Nokia Lumia 800", it got 3 match found, if there's product named "Nokia Lumia 510", then 2 match found and so on.

So the sql query for this operation will be:

SELECT * FROM (
                                SELECT * ,(
                                       CASE WHEN product_name LIKE '%Nokia%' THEN 3 ELSE 0 END                                   + CASE WHEN product_name LIKE '%Lumia%' THEN 2 ELSE 0 END
                                    + CASE WHEN product_name LIKE '%800%' THEN 1 ELSE 0 END
                                  ) AS numMatches
                                FROM Production ) AS t
                  WHERE numMatches >= 1
 ORDER BY numMatches DESC;

What it is doing basically, creating a virtual field "numMatches", where it's storing the number of matching values for every corresponding product_name.
In this case, the product_name got a match with "Nokia" it'll add 3 to the numMatches, in the same way matching "Lumia" will give it a got for 2 more credit total 3+2 = 5, and for "800" will give 1 credit and all total 5+1 = 6. Now the table will look like something:

product_name                                     numMatches

Nokia Lumia 800                                      6
Nokia Lumia 500                                      5
Nokia Lumia 510                                      5
Nokia 100                                                1

so on..

So basically who have at least one match that can fill up the result but in a order, who has maximum string matched comes fast.

This is faster than any other easy approach and this is also easy indeed. You can modify the algorithm as per your requirement. Hope this will be helpful to you.

Thank you 

No comments:

Post a Comment