SQL group by, having counterpart in FaunaDB

Hi,

I’m evaluating and trying to understand idiomatic usages of FaunaDB. Maybe I’m not yet thinking in the “Fauna way” but I couldn’t figure out how I would structure very classical aggregation queries in SQL world on FaunaDB.

The example in documentation is straightforward and takes advantage of sorting of indexes. But when it comes to sum or count I can’t make it work.

As an example, the following would give the number of customers per country:

SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
HAVING COUNT(CustomerID) > 5;

What’s the FQL equivalent of this query?

FaunaDB is not declarative so you’ll have to write a little bit more. There are reasons for that, the query you write is the exact query plan which is quite useful in pay-as-you-go to avoid surprises on the bill :). The verbosity makes it daunting at first but you know how to code, you’ll see that very very complex queries (especially those with conditional logic) are easier in FQL.

That means you will essentially think in the same way as if you would write it in regular code.
And in regular code, you would get all countries and then map over them and get the count per country. Then filter out all of the counts that are bigger than 5. You already know how to code and just have to apply the same principles.

We can start of simple. In my case I did (as I would do that) model the countries in a different collection. That way you don’t have to reduce the collection to unique countries anymore (group by is going to be more efficient) and it’s just clean design as well. In other words:


If you prefer to keep all in one collection, no problem, Do a match on the customers and call Distinct() on those and then you have your starting point to start mapping over as below.

Step 1:

We’ll use an index to get Customers by country.

{
  name: "customers_by_country",
  unique: false,
  serialized: true,
  source: "Customers",
  terms: [
    {
      field: ["data", "country"]
    }
  ]
}
Map(
  Paginate(Documents(Collection("Countries"))),
  Lambda(
    ["country"],
    Let(
      {
        customerRefs: Match(Index("customers_by_country"), Var("country"))
      },
      {
        customers: Var("customerRefs"),
        country: Var("country")
      }
    )
  )
)

This will result in:

{
  data: [
    {
      customers: {
        "@set": {
          match: Index("customers_by_country"),
          terms: Ref(Collection("Countries"), "272588574829838853")
        }
      },
      country: Ref(Collection("Countries"), "272588574829838853")
    },
    {
      customers: {
        "@set": {
          match: Index("customers_by_country"),
          terms: Ref(Collection("Countries"), "272588585684697607")
        }
      },
      country: Ref(Collection("Countries"), "272588585684697607")
    }
  ]
}

Sets since we did not Paginate on them yet. The country is a reference since we ddidn’t Get it yet, let’s go further! :slight_smile:

Step 2

Next up is counting, according to our docs Count() accepts a set, great we can directly apply the Count instead of paginating first! As you will see, Let() is your friend in structuring a query (we didn’t need to do that in step 1 but it makes things so much simpler once you start increasing the query complexity).

Map(
  Paginate(Documents(Collection("Countries"))),
  Lambda(
    ["country"],
    Let(
      {
        customerRefs: Match(Index("customers_by_country"), Var("country")),
        customerCount: Count(Var("customerRefs"))
      },
      {
        count: Var("customerCount"),
        country: Var("country")
      }
    )
  )
)

Result looks already closer to what we want:

{
  data: [
    {
      count: 2,
      country: Ref(Collection("Countries"), "272588574829838853")
    },
    {
      count: 1,
      country: Ref(Collection("Countries"), "272588585684697607")
    }
  ]
}

Step 3

And then we can continue and use Filter() which also takes a set (among others)… however, we’ll apply it on the result of the map that executes the count so we’re essentially already working on pages since Map only works on pages. The enforcement of pagination is different from what you are used to in SQL. FaunaDB considers that a good idea in a massive scalable pay-as-you-go database to protect the user from full tablescans in this way. If you need more data you can increase pagesize or request the next page.

For this we need to look carefully at the previous result. The ‘data’ in there is what you’ll typically see when you deal with pages but in this case we’re mapping over the elements of the page so you don’t need to take that into account.
To get the ‘count’ we’ll use Selec()t()

Filter(
  Map(
    Paginate(Documents(Collection("Countries"))),
    Lambda(
      ["country"],
      Let(
        {
          customerRefs: Match(Index("customers_by_country"), Var("country")),
          customerCount:  Count(Var("customerRefs"))
        },
        {
          count: Var("customerCount"),
          country: Var("country")
        }
      )
    )
  ),
  Lambda(['countRes'], GT(Select(['count'], Var('countRes')), 5))
)

In my case… that returns nothing so let me add some documents quickly…


{
  data: [
    {
      count: 7,
      country: Ref(Collection("Countries"), "272588574829838853")
    }
  ]
}

We can now choose to Map over that again and get the countries since I assume you’ll want more than the ref. We could have done that before but it’s a bit silly to get documents that we will filter out right?

Map(
  Filter(
    Map(
      Paginate(Documents(Collection("Countries"))),
      Lambda(
        ["country"],
        Let(
          {
            customerRefs: Match(Index("customers_by_country"), Var("country")),
            customerCount: Count(Var("customerRefs"))
          },
          {
            count: Var("customerCount"),
            country: Var("country")
          }
        )
      )
    ),
    Lambda(["countRes"], GT(Select(["count"], Var("countRes")), 5))
  ),
  Lambda(["countRes"], {
    country: Get(Select(["country"], Var("countRes"))),
    count: Select(["count"], Var("countRes"))
  })
 )

Too verbose? If you use a language driver, you can always split it up and reuse the queries. Since we are creating queries via function composition we can do a lot of reuse.


var GetCountAndCountryRef = Map(
  Paginate(Documents(Collection("Countries"))),
  Lambda(
    ["country"],
    Let(
      {
        customerRefs: Match(Index("customers_by_country"), Var("country")),
        customerCount: Count(Var("customerRefs"))
      },
      {
        count: Var("customerCount"),
        country: Var("country")
      }
    )
  )
)

Map(
  Filter(
    GetCountAndCountryRef,
    Lambda(["countRes"], GT(Select(["count"], Var("countRes")), 5))
  ),
  Lambda(["countRes"], {
    country: Get(Select(["country"], Var("countRes"))),
    count: Select(["count"], Var("countRes"))
  })
 )

Granted, there are many things we can still do to reduce the verbosity of FQL and we will ;).

1 Like

Hey, I would be pretty satisfied with an answer but you’ve written a blog post instead :slight_smile:

Thank you very much! That helps a lot.

1 Like