I recently changed my search functionality to use trigrams generated with NGram. The only issue is that results containing just one part of my input string come back before the relevant result. I am not searching one word at a time; I am searching a list of FAQs, with a whole question as the input string. Sometimes I enter the exact question that is in my index as the query, but the desired result still doesn't appear until much later in the result array, because other questions in the database share similar keywords.
Here is my FQL function. (The _ params are only there because they are required by GraphQL on paginated functions.)
Query(
  Lambda(
    ["search_input", "_", "__", "___",],
    Map(
      Paginate(
        Union(
          Map(
            NGram(Var("search_input"), 3, 3),
            Lambda(
              'nGramInput',
              Match(Index('search_qa_fuzzy'), Var('nGramInput'))
            )
          )
        )
      ),
      Lambda('ref', Get(Var('ref')))
    )
  )
)
Here is my index:
CreateIndex({
  name: 'search_qa_fuzzy',
  source: {
    collection: Collection('freq_asked_questions'),
    fields: {
      ngrams: Query(
        Lambda(
          'questionDoc',
          Distinct(
            NGram(
              LowerCase(
                Select(['data', 'question'], Var('questionDoc'))
              ),
              3,
              3,
            )
          )
        )
      )
    }
  },
  terms: [
    {
      binding: 'ngrams'
    }
  ]
})
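As a rough illustration of what this binding stores, here is a plain-JavaScript emulation (not FQL) that lists the terms a single question would be indexed under. It assumes NGram(str, 3, 3) yields every contiguous 3-character substring; trigrams and indexTerms are hypothetical helper names, not Fauna functions.

```javascript
// Plain-JavaScript emulation of the index binding (not FQL).
// Assumption: NGram(str, 3, 3) yields every contiguous 3-character substring.
function trigrams(text) {
  const grams = [];
  for (let i = 0; i + 3 <= text.length; i++) {
    grams.push(text.slice(i, i + 3));
  }
  return grams;
}

// Mirrors Distinct(NGram(LowerCase(question), 3, 3)) from the binding.
function indexTerms(question) {
  return [...new Set(trigrams(question.toLowerCase()))];
}

const sampleQuestion = "How big is the Earth?";
const terms = indexTerms(sampleQuestion);
console.log(terms);

// Trigrams of the same text WITHOUT lowercasing, as the search function
// produces them, that have no counterpart among the indexed terms.
const unmatched = trigrams(sampleQuestion).filter((g) => !terms.includes(g));
console.log(unmatched);
```

If that assumption holds, the capitalized trigrams of an un-lowercased search input (such as "How" or "Ear") would find nothing in the index, so wrapping the search input in LowerCase before NGram may be worth trying.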
For example purposes, say the 5 questions I have in my database are:
“How big is the Earth?”
“How many people live on Earth?”
“What planets are further than Earth to the sun?”
“Is the Earth over 2 billion years old”
“How long ago did dinosaurs live on Earth?”
Currently, if I enter this question as my search term: “How long ago did dinosaurs live on Earth?”, it shows up last in my results, despite being an exact match for one of my questions. The other questions are returned ahead of it because they also contain the word “Earth”.
The only thing I can think of is to count, for each result, how many of the NGrams from my input query match it, and then sort the results by that count, highest first (essentially a ranking system). I believe this would make the most relevant results appear first, since the result that shares the most NGrams with the query should resemble the question most closely.
For example, the other results might each have a count of 1 because only “Earth” matches, while the exact question would have a count of 8 because all of its words match. (Of course, these numbers are not accurate, because it is not the words themselves that match but the 3-character substrings that make up each word's NGrams, but you get the point.)
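To make the counting concrete, here is a minimal sketch of that ranking in plain JavaScript (not FQL), run against the five example questions. The helper names trigramSet and rankByTrigramOverlap are made up for illustration, and lowercasing both sides is an assumption borrowed from the index binding.

```javascript
// Sketch of the proposed ranking in plain JavaScript (not FQL).
// Assumption: both the query and the stored questions are lowercased
// and split into distinct 3-character substrings.
function trigramSet(text) {
  const lower = text.toLowerCase();
  const grams = new Set();
  for (let i = 0; i + 3 <= lower.length; i++) {
    grams.add(lower.slice(i, i + 3));
  }
  return grams;
}

// Score each question by how many of the query's trigrams it contains,
// drop questions with no overlap, and sort by score, highest first.
function rankByTrigramOverlap(query, questions) {
  const queryGrams = trigramSet(query);
  return questions
    .map((question) => {
      const grams = trigramSet(question);
      let score = 0;
      for (const g of queryGrams) {
        if (grams.has(g)) score++;
      }
      return { question, score };
    })
    .filter((result) => result.score > 0)
    .sort((a, b) => b.score - a.score);
}

const faqs = [
  "How big is the Earth?",
  "How many people live on Earth?",
  "What planets are further than Earth to the sun?",
  "Is the Earth over 2 billion years old",
  "How long ago did dinosaurs live on Earth?",
];

const ranked = rankByTrigramOverlap(
  "How long ago did dinosaurs live on Earth?",
  faqs
);
console.log(ranked);
```

With this scoring, the exact question matches every trigram of the query, so it sorts ahead of the questions that only share fragments such as "earth".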
I just have no idea how to do this in FQL. Any ideas?