How to optimally use FQL to convert from SQL that has OR and/or AND with <> in the WHERE clause?

I’m having a hard time understanding how to use multiple WHERE clause conditions that include an OR and/or AND and do value comparisons in SQL to make the equivalent request in FQL in an efficient way.

Here’s an example that I’d like to convert:

SELECT * FROM sometable WHERE docType = "xyz" AND
     docID = "123-123-123-123" AND 
     ((isRead = false AND status <> "pending") OR (isShipping = true)) 
     ORDER BY created DESC;

The most efficient way to perform the search that you want is to use Indexes. Combining multiple Indexes can take some planning to get the order of operations correct.

theory for WHERE clause

One thing you did not bring up is inequalities (i.e. searching for less-than or greater-than), but it’s important to note it. See this Stack Overflow post for a good discussion on that.

What we need is a separate Index with the respective term for every equality condition (= or <>), and an Index with the respective value for every inequality condition (<, >, <=, >=). Then we can use Set operations to combine the Indexes (See Set functions in the FQL cheat sheet for reference).

In general the following WHERE clauses will roughly translate into using FQL functions

  • ANDIntersection
  • ORUnion
  • NOTDifference
  • any inequalities → Range

theory for ORDERED BY clause

For ORDER BY, sorting with FQL requires that either all Indexes output the specific value you wish to sort on, or by using the Join function to apply one additional Index to sort.

How to apply to your query

Let’s work from the innermost parts out. We’ll create the necessary Indexes and then build up the query piece by piece.

isRead = false

CreateIndex({
  name: "sometable_by_isRead",
  source: Collection("SomeCollection"),
  terms: [{ field: ["data", "isRead"] }]
})
isRead = false --> 

Match(Index("sometable_by_isRead"), false)

_ AND status <> “pending”

CreateIndex({
  name: "sometable_by_status",
  source: Collection("SomeCollection"),
  terms: [{ field: ["data", "status"] }]
})

Consider that _ <> _ is equivalent to NOT _ = _.

_ AND status <> "pending" -->
_ AND NOT status = "pending" --> 

Difference(_, Match(Index("sometable_by_status"), "pending"))

// including the previous Index is
Difference(
  Match(Index("sometable_by_isRead"), false), 
  Match(Index("sometable_by_status"), "pending")
)

_ OR (isShipping = true)

CreateIndex({
  name: "sometable_by_isShipping",
  source: Collection("SomeCollection"),
  terms: [{ field: ["data", "isShipping"] }]
})
_ OR (isShipping = true) -->

Union(_, Match(Index("sometable_by_isShipping"), true))

// including the previous Index is
Union(
  Difference(
    Match(Index("sometable_by_isRead"), false), 
    Match(Index("sometable_by_status"), "pending")
  ),
  Match(Index("sometable_by_isShipping"), true)
)

docType = “xyz” AND docID = “123-123-123-123” AND _

CreateIndex({
  name: "sometable_by_docType",
  source: Collection("SomeCollection"),
  terms: [{ field: ["data", "docType"] }]
})

CreateIndex({
  name: "sometable_by_docID",
  source: Collection("SomeCollection"),
  terms: [{ field: ["data", "docID"] }]
})
docType = "xyz" AND docID = "123-123-123-123" AND _ -->

Intersection(
  Match(Index("sometable_by_docType"), "XYZ"),
  Match(Index("sometable_by_docID"), "123-123-123-123"),
  _
)

// including the previous Index is
Intersection(
  Match(Index("sometable_by_docType"), "XYZ"),
  Match(Index("sometable_by_docID"), "123-123-123-123"),
  Union(
    Difference(
      Match(Index("sometable_by_isRead"), false), 
      Match(Index("sometable_by_status"), "pending")
    ),
    Match(Index("sometable_by_isShipping"), true)
  )
)

ORDER BY created DESC

For ORDER BY created DESC we will join with another Index.

CreateIndex({
  name: "sometable_by_ref__created_desc",
  source: Collection("SomeCollection"),
  terms: [{ field: ["ref"] }],
  values: [
    { field: ["data", "created"], reverse: true }, // this value to sort by
    { field: ["ref"] },                            // this ref to obtain the full Document
  ]
})
Join(
  Intersection(
    Match(Index("sometable_by_docType"), "XYZ"),
    Match(Index("sometable_by_docID"), "123-123-123-123"),
    Union(
      Difference(
        Match(Index("sometable_by_isRead"), false), 
        Match(Index("sometable_by_status"), "pending")
      ),
      Match(Index("sometable_by_isShipping"), true)
    )
  ),
  Index("sometable_by_ref__created_desc")
)

SELECT *

For the SELECT * part, we will use Map to transform our results into Documents.

Map(
  Paginate(
    Join(
      Intersection(
        Match(Index("sometable_by_docType"), "XYZ"),
        Match(Index("sometable_by_docID"), "123-123-123-123"),
        Union(
          Difference(
            Match(Index("sometable_by_isRead"), false), 
            Match(Index("sometable_by_status"), "pending")
          ),
          Match(Index("sometable_by_isShipping"), true)
        )
      ),
      Index("sometable_by_ref__created_desc")
    )
  ),
  Lambda(["created", "ref"], Get(Var("ref")))
)
1 Like

@ptpaterson thank you so much! This is immensely helpful and your explanation of SQL to Fauna equivalents helps so much with the learning curve. The examples and further explanations are super helpful too! I’ve read this multiple times and it’s starting to sink in and I’m sure I’ll be coming back to this many times as my guide! :slight_smile:

I did run in to 2 issues when trying to follow above.

  1. When creating the Index that will be used for the Join I get the following error:
Error: [
  {
    "position": [
      "create_index"
    ],
    "code": "validation failed",
    "description": "document data is not valid.",
    "failures": [
      {
        "field": [
          "source"
        ],
        "code": "value required",
        "description": "Required value not found."
      }
    ]
  }
]

I think I solved the error correctly by adding the source: Collection("SomeCollection") where I want the Index to be created? Adding that did create the index successfully in the specified collection:

CreateIndex({
  name: "sometable_by_ref__created_desc",
  source: Collection("SomeCollection"),
  terms: [{ field: ["ref"] }],
  values: [
    { field: ["data", "created"], reverse: true },
    { field: ["ref"] },
  ]
})
  1. With the above index created successfully, and trying the full code with the Join, the moment I introduce the Join I get the following error:
Error: [
  {
    "position": [
      "collection"
    ],
    "code": "invalid argument",
    "description": "Array or Page expected, Set provided."
  }
]

Also, is it correct understanding that I must use the shell or driver to create the Index with setting reverse to true as I did not see it in the web UI for creating index?

1)

Oh my… yes I left the source field out of all of the CreateIndex functions :person_facepalming:. That is a good catch – I’ve updated my post.

2)

I also forgot to wrap the query in a Paginate function. I’ve updated the post again. The Map function expects to be passed an Array or a Page. Paginate returns a Page. Join takes a Set and returns another Set.

3)

That is correct, the Dashboard does not have an option to set reverse on your values, so you will need to use the shell.

@bomletepso I should also highlight that creating separate Indexes for each individual criteria is helpful for flexible query operations, but can be optimized if you are doing the same search regularly.

Intersection can be expensive. If it’s possible, combine multiple criteria into fewer indexes. That is, use multiple terms in a single Index and use that instead:

CreateIndex({
  name: "sometable_by_docType_and_docID",
  source: Collection("SomeCollection"),
  terms: [
    { field: ["data", "docType"] },
    { field: ["data", "docID"] }
  ]
})

It really depends on your use case, the arity of your Index terms, etc.

An optimization example

For your query we could even do some boolean algebra to reshape what you have,

WHERE a AND b AND ((c AND NOT d) OR e)

to

WHERE (a AND b AND c AND NOT d) OR (a AND b AND c AND e)

We can use this fact to pare down what we came up with before. I can’t tell you for certain if it would perform better – in general, the fewer Set operations (Intersection, Union, etc.) you have, the better! Again, it ultimately depends on your data, and you should to run your own tests to find out.

The main point, though, is that you have options. Importantly, you have the power to make really good optimizations for your specific data.

CreateIndex({
  name: "special_search_by_isRead",
  source: Collection("SomeCollection"),
  terms: [
    { field: ["data", "docType"] },
    { field: ["data", "docID"] },
    { field: ["data", "isRead"] },
  ]
})

CreateIndex({
  name: "special_search_by_isShipping",
  source: Collection("SomeCollection"),
  terms: [
    { field: ["data", "docType"] },
    { field: ["data", "docID"] },
    { field: ["data", "isShipping"] },
  ]
})

With those added Indexes, we can do the following query, and eliminate the Intersection function!

Map(
  Paginate(
    Join(
      Union(
        Difference(
          Match(Index("special_search_by_isRead"), ["XYZ", "123-123-123-123", false]), 
          Match(Index("sometable_by_status"), "pending")
        ),
        Match(Index("special_search_by_isShipping"), ["XYZ", "123-123-123-123", true]), 
      ),
      Index("sometable_by_ref__created_desc")
    )
  ),
  Lambda(["created", "ref"], Get(Var("ref")))
)

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.