What approaches can I take to figure out the "relevancy" of certain terms in a string?

by Ben   Last Updated January 14, 2018 09:05 AM

I'm not even sure "relevancy" is the most accurate word, so I'll just describe the problem:

I'm building an app that needs to parse product descriptions from a popular website (let's just say it's Amazon) and work out which certifications each product has, based on the description text alone. The descriptions are not written consistently (different companies write them), but they always contain certain keywords that I'm looking for, and those keywords have to be "close together" in the description to be considered for the result set.

For example, given the following CSV data:

ProductName,ProductDescription
Product1,Product1 is a really cool product that is certified for Certification1 on Region1
Product2,Product2 has Region2 which has Certification3 and Region3 with Certification4. It also has Certification5

I'd want to generate the following output:

{  
   "Product1":{  
      "Region1":"Certification1",
      "UnknownRegions": []
   },
   "Product2":{  
      "Region2":"Certification3",
      "Region3":"Certification4",
      "UnknownRegions":[  
         "Certification5"
      ]
   }
}

I have almost no idea how to solve this problem, other than one thought: could some NLP technique help me produce the desired output above? If so, which one? I've heard of a technique called Named Entity Extraction (also known as Named Entity Recognition), but I don't know whether it applies here.

Any advice is much appreciated here. Thank you in advance!
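To make the "close together" requirement concrete, here is a minimal sketch of a naive token-distance baseline in Python. It is not an NLP solution, and several things in it are assumptions made only for illustration: the keywords are taken to look like the placeholders in the example (`Region<N>`, `Certification<N>`), "close together" is taken to mean at most `max_gap` word tokens apart (3 is an arbitrary value), and the function names are made up. Regions are processed left to right and each one claims the nearest unclaimed certification. The ordering matters: in the Product2 row, Certification3 is actually one token closer to Region3 than to Region2, so a plain "nearest keyword wins" rule would pair it with the wrong region.

```python
import csv
import io
import json
import re

# Placeholder patterns taken from the example data; real keyword lists would differ.
REGION = re.compile(r"Region\d+")
CERT = re.compile(r"Certification\d+")

SAMPLE = """ProductName,ProductDescription
Product1,Product1 is a really cool product that is certified for Certification1 on Region1
Product2,Product2 has Region2 which has Certification3 and Region3 with Certification4. It also has Certification5
"""


def pair_certifications(description, max_gap=3):
    """Pair each region with the nearest unclaimed certification
    that is at most max_gap word tokens away (hypothetical rule)."""
    tokens = re.findall(r"\w+", description)
    regions = [(i, t) for i, t in enumerate(tokens) if REGION.fullmatch(t)]
    certs = [(i, t) for i, t in enumerate(tokens) if CERT.fullmatch(t)]

    result = {}
    claimed = set()  # token positions of certifications already paired
    for r_pos, region in regions:  # regions in left-to-right order
        candidates = [
            (abs(c_pos - r_pos), c_pos, cert)
            for c_pos, cert in certs
            if c_pos not in claimed and abs(c_pos - r_pos) <= max_gap
        ]
        if candidates:
            _, c_pos, cert = min(candidates)  # smallest distance, then leftmost
            claimed.add(c_pos)
            result[region] = cert

    # Certifications that no region claimed go into the catch-all list.
    result["UnknownRegions"] = [c for pos, c in certs if pos not in claimed]
    return result


def parse_products(csv_text, max_gap=3):
    """Apply pair_certifications to every row of a ProductName/ProductDescription CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["ProductName"]: pair_certifications(row["ProductDescription"], max_gap)
        for row in reader
    }


if __name__ == "__main__":
    print(json.dumps(parse_products(SAMPLE), indent=3))
```

On the two sample rows this reproduces the desired output above, but it is brittle: a region with no certification within `max_gap` tokens is silently dropped, and descriptions containing commas would need proper CSV quoting.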


