FQL Query for Prefix of Search term

I’m trying to write a query to retrieve documents which are tagged with a prefix
corresponding to possible search strings. To put it concretely:

  • I have transactions with hierarchical categories of the form
    {category: ["Healthcare", "Medecine", "Family Medicine"]}.
  • I’m storing mappings which correspond to a subtree of the hierarchy, such
    as:
    • {prefix: ["Healthcare", "Medecine"], color: 'Yellow'}
    • {prefix: ["Shops", "Tobacco"], color: 'Green'}
    • {prefix: ["Bank Fees"], color: 'Red'}
  • When a new transaction comes in, I’m trying to find the colour which matches
    its full category.

For now, I’ve solved the problem by indexing on a string derived from the prefix
arrays ("Healthcare#Medecine"). When I need to retrieve a colour, I try to
index on all possible prefixes of the category like so:

Get(
  Select(
    [0],
    Filter(
      Map(
        [
          'Healthcare#Medecine#Family Medicine',
          'Healthcare#Medecine',
          'Healthcare',
        ],
        (x) => Match(Index('category_by_prefix'), x)
      ),
      (x) => Exists(x)
    ),
    null
  )
);

This works, but it feels like there ought to be a solution that doesn’t involve
making multiple reads each time. Any ideas?

I would look into using Intersection for the query. I also have a few cases myself were I stringify a list in an index binding so that would be another option that would allow you to lookup by the full category.

I’m not following 100%.
It seems to me though that you could preprocess these prefixed that you want to match on and save them in an index binding?

Like the Ngram solution here… only you would do it with prefixes instead.

Sorry, I’m sure that there’s a better way of explaining myself but I haven’t found it yet!

I actually already have an index binding so that a document with the category “prefix” ["Healthcare", "Medecine"] is indexable as Healthcare#Medecine.

The problem is that Healthcare#Medecine is a category “prefix” - representing not just one category but a whole subtree of categories (Healthcare#Medecine#Family Medecine, Healthcare#Medecine#Alternative Medecine, …).

Given a “full” category (such as [A, B, C, D]) I want to retrieve documents with a matching prefix (such as [A,B] or [A,B,C] but not [A,D], for example). Currently I’m doing it by searching separately for every possible prefix (i.e. ABCD, ABC, AB, and A in this example), but I was hoping that there would be a more efficient solution - possibly including range queries or something.

I also just wanted to add this to be explicit: the possible “postfixes” (not sure that’s a word!) are not known at the time the document is created. By “postfix” I mean any string that can be constructed from a given prefix. e.g. for a prefix “ABC” the possible postfixes would be “ABCD”, “ABCDDD”, “ABCPizza”, etc.

Given that, I can’t really adopt the ngram approach of precomputing every possible search string. Because they’re prefixes though, it intuitively feels like there’s a solution using sorted ranges, but maybe that’s just me being too optimistic!

Sorry for the late response, I only check in periodically on the forums, since I had a lot of work lately.

Quick explanation of how it would work
Yes you can write a binding that transforms documents that have the following tags in:

doc1: { ref: 'hypotheticref1', data: { tag: 'Healthcare#Medecine#Family Medecine', ...} }
doc2: { ref: 'hypotheticref2',  data: { tag: 'Healthcare#Medecine#Alternative Medecine', ...}}

You would then write a binding on the tag that introduces a new attribute… let’s say: prefixes which transforms that tag in a set of prefixes.

That set of prefixes is of course an array, you can index the values within an array so that’s fine. You could put that binding in terms and the reference in values. If I would invent the structure on how such an index might look… you would get something like that:

[ 
  // doc1 entries
   { values: ['hypotheticref1'], terms: ['Healthcare']},
   { values: ['hypotheticref1'] terms: ['Healthcare#Medecine']},
   { values: ['hypotheticref1'] terms: ['Healthcare#Medecine#Family Medecine']},

   // doc2 entries
   { values: ['hypotheticref2'], terms: ['Healthcare']},
   { values: ['hypotheticref2'] terms: ['Healthcare#Medecine']},
   { values: ['hypotheticref2'] terms: ['Healthcare#Medecine#Alternative Medecine']},
]

You can then easily write a query that searches for the term ‘Healthcare#Medecine’ and get both the reference of hypotheticref1 and hypotheticref2

What I assume you are struggling with ( might be wrong)
Fauna’s FQL does not have loops at the moment. That makes it quite cumbersome to retrieve prefixes like that. A workaround is to start with an array (of course, the amount of prefixes is then limited but I assume that’ll be fine for your usecase) and loop over that. For example:

q.Map(
        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
        Lambda('index', <FQL to split on '#' and take the first n elements>)
 ),

We also use that in the fwitter example we wrote: https://github.com/fauna-brecht/fwitter/blob/master/src/fauna/setup/searching.js

If you need inspiration for a StringSplit, Eigil has a cool NPM library that includes that: https://github.com/shiftx/faunadb-fql-lib/blob/master/src/functions/StringSplit.ts

The other function in his library (Slice, which we will soon have in native FQL) will also help you significantly: https://github.com/shiftx/faunadb-fql-lib/blob/master/src/functions/Slice.ts