IR4.1 Vocabulary mismatch in IR
Suppose we have a query, and the query is "U.S. ends Qaddafi operations". A few years back this was all over the news. So suppose that this is what the user is searching for, and this is a document, a news story, and we want to measure the similarity of the query to that news story. Now, will that news story be relevant to the query? Does it talk about the right stuff? Yes, from the beginning, very much so. So it's clear that it is relevant, and you would want to see a high similarity between this query and this document. So how do we do it? Well, we've just gone over this, right? We have our friend the vector space model, and the vector space model says: take
the query and convert the query into a vector. We're going to use individual words, so that's what the vector is going to look like, and these are the frequencies. Take the document and convert the document into a vector. Again we have a bunch of single words, and these are the frequencies of those words in the document. You get it quite easily by splitting the text on spaces and punctuation. We'll talk about tokenization more at the end of this lecture, but for now we've got a query, we've got a document, and we know what to do, right? We have our trusty tf-idf weighted sum, so we just need to compute this quantity, and computing it is really, really easy. You could sketch it in Python in three lines of code. So what you need
to do is compute the IDFs and load the IDFs into a dictionary. Then, for every token in your query, you take your similarity and you increment it by TF times IDF, right? IDF is a dictionary that contains IDF values for every word, which you've precomputed, and TF is just these frequencies. So it all looks nice and simple until you realize that there is a problem, and the problem is this: when you're doing that, the token w in the query had better be the exact same token as this w in TF, and they are not. The query has "U.S." written with the dots, and the document has "US" without the dots, so they're not the same thing. The query has the word "ends"
and the document has the word "end". It has it three times, but that doesn't help: they are different strings, so they won't match. Qaddafi is spelled in two different ways: it's spelled with a Q in the query and with a G in the document, and those won't match. And of course the query mentions an operation and the document mentions a mission. Now you as a human know that an operation and a mission mean the same thing, but Python doesn't know that. And you know that Qaddafi and Gaddafi probably refer to the same guy because they sound similar, but again, Python doesn't know that. So that is what this lecture is about, and it's actually going to be a sequence of lectures: we'll talk about how to overcome all of these difficulties.
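
The scoring loop described above, and the way it breaks on this query, can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `tf_idf_similarity` is hypothetical, the document text is an invented stand-in for the news story (the lecture's actual story is not reproduced here), the IDF values are made up rather than computed from a collection, and tokens are split on whitespace only so that "U.S." stays a single token.

```python
from collections import Counter

def tf_idf_similarity(query_tokens, doc_tokens, idf):
    """Sum tf * idf over query tokens, matching tokens by exact string equality."""
    tf = Counter(doc_tokens)  # term frequencies in the document
    sim = 0.0
    for w in query_tokens:
        # Counter returns 0 for tokens the document never contains
        sim += tf[w] * idf.get(w, 0.0)
    return sim

# Whitespace-only tokenization (an assumption), so "U.S." is one token.
query = "U.S. ends Qaddafi operations".split()
# Hypothetical stand-in for the news story, not the text from the lecture.
doc = "US to end Gaddafi mission as allies end strikes and end patrols".split()

# Made-up IDF values; real ones would be precomputed from a document collection.
idf = {w: 2.0 for w in set(query) | set(doc)}

score = tf_idf_similarity(query, doc, idf)
print(score)  # 0.0: no query token is string-identical to a document token

matched = [w for w in query if w in set(doc)]
print(matched)  # []
```

Even though "end" occurs three times in the stand-in document, the query token is "ends", so the lookup finds nothing and the relevant story scores zero. Calling the same function with the query token `"end"` instead would contribute 3 × 2.0 to the score, which is the gap the following lectures set out to close.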