Query an index against an n-sized input-array

Hello everybody,

I’m trying to work out a query for multiple conditions.

One condition is to check a particular index against each item in an n-sized “input-array”, i.e. ["cafe", "pub", "restaurant", "foo", "bar", "etc..."].

The following example code is working, but note the repetitive and cumbersome code inside the Union(), which obviously wouldn’t scale for an n-sized input-array.

Note that each document also has an n-sized “Categories” array, e.g. "categories":["cafe","restaurant"], that I want to check the input-array against.

The goal of the code inside the Union() is to retrieve all documents that contain any value from the input-array in their categories-array, if that makes sense.

Finally, a document should only appear once in the final output, even if it has multiple matches throughout the categories-array vs input-array check.

I’ve tried various ways of Map()ing over the input-array, but I’m just going in circles and wasn’t able to get the query working in a scalable way.

Paginate(
  Intersection(
    Join(
      Range(Match(Index("businesses_by_latitude")),[-38.020203088869685],[-37.66123013113332]),
      Lambda(["value", "ref"], Match(Index("businesses_by_ref"), Var("ref")))
    ),
    Join(
      Range(Match(Index("businesses_by_longitude")),[144.70098123095727],[145.1919327690424]),
      Lambda(["value", "ref"], Match(Index("businesses_by_ref"), Var("ref")))
    ),
    Union(
      Join(
        Match(Index("businesses_by_category"), ["cafe"]),
        Lambda(["ref"], Match(Index("businesses_by_ref"), Var("ref")))
      ),
      Join(
        Match(Index("businesses_by_category"), ["pub"]),
        Lambda(["ref"], Match(Index("businesses_by_ref"), Var("ref")))
      ),
      Join(
        Match(Index("businesses_by_category"), ["restaurant"]),
        Lambda(["ref"], Match(Index("businesses_by_ref"), Var("ref")))
      )
    )
  )
)

For reference, the following are the indexes that are being used in above query:

{
  name: "businesses_by_ref",
  unique: false,
  serialized: true,
  source: "businesses",
  terms: [
    {
      field: ["ref"]
    }
  ],
  values: [
    {
      field: ["ref", "id"]
    },
    {
      field: ["data", "latitude"]
    },
    {
      field: ["data", "longitude"]
    }
  ]
}
{
  name: "businesses_by_category",
  unique: false,
  serialized: true,
  source: "businesses",
  terms: [
    {
      field: ["data", "categories"]
    }
  ],
  values: [
    {
      field: ["ref"]
    }
  ]
}
{
  name: "businesses_by_latitude",
  unique: false,
  serialized: true,
  source: "businesses",
  values: [
    {
      field: ["data", "latitude"]
    },
    {
      field: ["ref"]
    }
  ]
}
{
  name: "businesses_by_longitude",
  unique: false,
  serialized: true,
  source: "businesses",
  values: [
    {
      field: ["data", "longitude"]
    },
    {
      field: ["ref"]
    }
  ]
}

Note that I raised the initial question on Slack yesterday (ref), and Brecht suggested I post here if I’m still having trouble…

Just in case other people come across a similar question, this is how I resolved the issue:

I ended up changing the data type of categories to a single string with comma-separated values, instead of an array of strings, in order to pull down this ‘fake’ array in one line.

(Had I left categories to be an array, I would have received multiple lines per document; one for each value inside of the array). The filtering I was after is now happening inside my app using plain js.

That being said, if this filtering is possible through FQL as per my original post, I’d still be very interesting to know how to actually do that. My workaround is fine because I only have a small data set.

Just to be clear, what you are trying to do is a kind of and + or query right? I think it makes sense that you clarify exactly what the Union part is or should be doing. Does it simply check whether a specific document is either a cafe, a pub or a restaurant?

But I do think this is indeed what I described above, then one question remains. How long will this array become? Do you dynamically select a few elements you want to search for, how many would you select?

Besides of that, you have indeed based yourself heavily on this specific stackoverflow post. That’s great since it’s one element that you need but realize that it’s targeted towards multiple range conditions. Which is good since you also need that. It seems however that you are applying this as well to exact matches.

More specifically this part

You can drop the join completely here since the values of your index only contains ref. The join is actually dropping the other values besides of the ref but since the only value it contains is a ref, it’s a noop. In essence this could already reduce your query to:

Paginate(
  Intersection(
    Join(
      Range(Match(Index("businesses_by_latitude")),[-38.020203088869685],[-37.66123013113332]),
      Lambda(["value", "ref"], Match(Index("businesses_by_ref"), Var("ref")))
    ),
    Join(
      Range(Match(Index("businesses_by_longitude")),[144.70098123095727],[145.1919327690424]),
      Lambda(["value", "ref"], Match(Index("businesses_by_ref"), Var("ref")))
    ),
    Union(
      Match(Index("businesses_by_category"), "cafe"),
      Match(Index("businesses_by_category"), "pub"),
      Match(Index("businesses_by_category"), "restaurant")
    )
  )
)

Much better already right? (you can also drop the arrays if you are only matching on one element).

I think there are two solutions to your problem.

1. Use the host language from the driver

(if you can? not sure you are writing a UDF or are writing FQL in a driver)

Realize that FQL functions are just functions. Imagine I’m using JavaScript, then you can manipulate them with JS to construct your query. For example, something in the vein of (did not run the code, be wary for syntax errors):

const matches = ['cafe', 'pub', 'restaurant']
const args = matches.map(m => Match(Index("businesses_by_category"), m))
...
Union(...args);
...

This is, of course, constructing a query. If you need your query to be predefined yet take dynamic parameters (e.g. in a UDF), it won’t work. But there is a more beautiful solution.

2. Map + Union

If fact, you were quite close with the Map idea. What I suspect that you didn’t realize is that Union is intelligent and acts a bit like a flattener.

So we can match with Map.

Map(
     ["restaurant", "pub"], 
     Lambda(["m"], Match(Index("businesses_by_category"), Var("m")))
)

And you’ll get an array of sets:

[
  {
    "@set": {
      match: Index("businesses_by_category"),
      terms: "restaurant"
    }
  },
  {
    "@set": {
      match: Index("businesses_by_category"),
      terms: "pub"
    }
  }
]

The cool thing is that Union can work both with a set as with an array of sets.

Paginate(Union(
  Map(["restaurant", "pub"], Lambda(["m"], Match(Index("businesses_by_category"), Var("m"))))
))

And with paginate in front it will simply return the same format as you had before.

{
  data: [
    Ref(Collection("businesses"), "274099702193979909"),
    Ref(Collection("businesses"), "274099775405556230"),
    Ref(Collection("businesses"), "274099791398437381")
  ]
}

Finally:

I don’t really get this request. The query you currently have already does that right? I suspect that you might expect Union to throw out doubles, however Union is just like a Union in Set theory. FQL is heavily based on that (we also call these non-materialized results ‘sets’)

Union(['A', 'B', 'C'], ['B', 'C'], ['B', 'C', 'D']),
>   [ 'A', 'B', 'C', 'D' ],