Index binding: Transform value n times (Uber H3 algorithm)

Hey everyone,

I’m trying to implement Uber’s H3 Geospatial indexing system (https://h3geo.org/) within Fauna.

For each LngLat, I need to create a binding – parentH3s – which is an array of all the parent H3s pertaining to each LngLat.

To create the index, I loop over all the resolutions I would like to calculate [0, ..., 11], then for each resolution, I calculate the parent H3: which involves iterating over an array [resolution + 1, ..., 12], and transforming the output on each iteration to get the final parentH3.

The issue is the Reduce function – because this can’t be used in an index binding. I’ve also tried recursion, using a Foreach + Merge to update an existing Var, and a couple of other things, but I can’t seem to get it working.

Here’s an example of what I’d like to achieve:

export default {
  name: 'lngLatByParentH3s',
  source: {
    collection: Collection('LngLat'),
    fields: {
      parentH3s: Query(
        Lambda(
          'doc',
          Map(
            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], // Resolutions of parent H3s to generate for each document
            Lambda(
              'res',
              Reduce(
                Lambda(['nextH3', 'nextRes'], UpdateH3(Var('nextH3'), Var('nextRes'))), // Update H3 until end of loop
                SetInitialH3(ToInteger(Select('h3', Var('doc'))), Var('res')), // Set initial value
                Drop(Add(Var('res'), 1), [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]) // Drop res + 1 from start of array
              ),
            ),
          ),
        ),
      ),
      // Final result should look like this:
      // [ '578938781177613823', '583442380804984319', '587945980432354815', ..., '628478377078689279' ]
    },
  },
  terms: [{
    binding: 'parentH3s'
  }],
}

Are there any other ways I can achieve this?

The short answer is to pre-calculate all the constants that you can. Since you cannot Call a UDF, then recursive functions are also not possible, so you need to “unroll” what that recursive function would have meant.

Let’s see what it takes to determine parents for an H3 index.

H3 format

Here is the binary format for an H3 index, which is represented with a 64-bit integer.

// Reserved
1000000000000000000000000000000000000000000000000000000000000000

// Index Mode
0111100000000000000000000000000000000000000000000000000000000000

// Mode dependent
0000011100000000000000000000000000000000000000000000000000000000

// resolution
0000000011110000000000000000000000000000000000000000000000000000

// base cell
0000000000001111111000000000000000000000000000000000000000000000

// resolution digits
0000000000000000000111000000000000000000000000000000000000000000
0000000000000000000000111000000000000000000000000000000000000000
0000000000000000000000000111000000000000000000000000000000000000
0000000000000000000000000000111000000000000000000000000000000000
0000000000000000000000000000000111000000000000000000000000000000
0000000000000000000000000000000000111000000000000000000000000000
0000000000000000000000000000000000000111000000000000000000000000
0000000000000000000000000000000000000000111000000000000000000000
0000000000000000000000000000000000000000000111000000000000000000
0000000000000000000000000000000000000000000000111000000000000000
0000000000000000000000000000000000000000000000000111000000000000
0000000000000000000000000000000000000000000000000000111000000000
0000000000000000000000000000000000000000000000000000000111000000
0000000000000000000000000000000000000000000000000000000000111000
0000000000000000000000000000000000000000000000000000000000000111

Useful Constants

We can obtain some useful constants so we can manipulate the H3 index. We need to use the Decimal number with fauna, so the decimal is recorded to the right of the binary.

// resolution bit mask RMASK
0000000011110000000000000000000000000000000000000000000000000000   67553994410557440

// resolution clear mask RCLEAR
0111111100001111111111111111111111111111111111111111111111111111   9155818042444218367

// resolutions RES[x]
[
0000000000000000000000000000000000000000000000000000000000000000   0
0000000000010000000000000000000000000000000000000000000000000000   4503599627370496
0000000000100000000000000000000000000000000000000000000000000000   9007199254740992
0000000000110000000000000000000000000000000000000000000000000000   13510798882111488
0000000001000000000000000000000000000000000000000000000000000000   18014398509481984
0000000001010000000000000000000000000000000000000000000000000000   22517998136852480
0000000001100000000000000000000000000000000000000000000000000000   27021597764222976
0000000001110000000000000000000000000000000000000000000000000000   31525197391593472
0000000010000000000000000000000000000000000000000000000000000000   36028797018963968
0000000010010000000000000000000000000000000000000000000000000000   40532396646334464
0000000010100000000000000000000000000000000000000000000000000000   45035996273704960
0000000010110000000000000000000000000000000000000000000000000000   49539595901075456
0000000011000000000000000000000000000000000000000000000000000000   54043195528445952
0000000011010000000000000000000000000000000000000000000000000000   58546795155816448
0000000011100000000000000000000000000000000000000000000000000000   63050394783186944
0000000011110000000000000000000000000000000000000000000000000000   67553994410557440
]

// resolution digit filler FILL[x]
[
0000000000000000000111111111111111111111111111111111111111111111   35184372088831
0000000000000000000000111111111111111111111111111111111111111111   4398046511103
0000000000000000000000000111111111111111111111111111111111111111   549755813887
0000000000000000000000000000111111111111111111111111111111111111   68719476735
0000000000000000000000000000000111111111111111111111111111111111   8589934591
0000000000000000000000000000000000111111111111111111111111111111   1073741823
0000000000000000000000000000000000000111111111111111111111111111   134217727
0000000000000000000000000000000000000000111111111111111111111111   16777215
0000000000000000000000000000000000000000000111111111111111111111   2097151
0000000000000000000000000000000000000000000000111111111111111111   262143
0000000000000000000000000000000000000000000000000111111111111111   32767
0000000000000000000000000000000000000000000000000000111111111111   4095
0000000000000000000000000000000000000000000000000000000111111111   511
0000000000000000000000000000000000000000000000000000000000111111   63
0000000000000000000000000000000000000000000000000000000000000111   7
]

Pseudo calculations

Given an index with resolution n, here are some steps to calculate the parents at resolutions lower than n

h3[n] = some_index

h3_cleared_res = h3[n] & RCLEAR

h3[n-1] = h3_cleared_res | RES[n-1] | FILL[n-1]
h3[n-2] = h3_cleared_res | RES[n-2] | FILL[n-2]
h3[n-3] = h3_cleared_res | RES[n-3] | FILL[n-3]
...
h3[2]   = h3_cleared_res | RES[2]   | FILL[2]
h3[1]   = h3_cleared_res | RES[1]   | FILL[1]
h3[0]   = h3_cleared_res | RES[0]   | FILL[0]

As long as we know the current resolution, we can directly calculate valid parent H3 indexes for a given index.

Testing in FQL and estimating cost

I like to make regular request that mock an Index binding well before I create an Index with bindings. This lets me iterate quickly and debug errors. It also lets me inspect how much each query costs, so I can estimate how many Compute Ops it will cost to calculate the binding.

Code

Define constants

set the resolution bit and clear masks

      RMASK: 67553994410557440,
      RCLEAR: 9155818042444218367,

To avoid unnecessary use of the Select function (which would mean more compute ops), put all of the RES and FILL constants into a table

      PARENT_CONSTS: [
        [63050394783186944, 7],
        [58546795155816448, 63],
        [54043195528445952, 511],
        [49539595901075456, 4095],
        [45035996273704960, 32767],
        [40532396646334464, 262143],
        [36028797018963968, 2097151],
        [31525197391593472, 16777215],
        [27021597764222976, 134217727],
        [22517998136852480, 1073741823],
        [18014398509481984, 8589934591],
        [13510798882111488, 68719476735],
        [9007199254740992, 549755813887],
        [4503599627370496, 4398046511103],
        [0, 35184372088831],
      ],

Filter the table

We need to remove the resolutions from the table that would result in invalid indexes. That is, we want to only calculate resolutions that are lower than the given index.

First, get the index from the doc. Using your example that could look like this

      h3: Select(["data", "h3"], Var("doc")),

Next, calculate the current resolution

      current_res: BitAnd(Var("h3"), Var("RMASK")),

Now we have enough to filter the table

      parent_consts_filtered: Filter(
        Var("PARENT_CONSTS"),
        Lambda(["res", "_"], LT(Var("res"), Var("current_res")))
      ),

Calculate the parent indexes

To finish the math, we need to clear the resolution bit

      h3_nores: BitAnd(Var("h3"), Var("RCLEAR")),

And we can reuse that value to calculate all of the parents

      parents: Map(
        Var("parent_consts_filtered"),
        Lambda(
          ["res", "fill"],
          BitOr(Var("h3_nores"), Var("res"), Var("fill"))
        )
      )

Put it all together

Let(
  {
    doc: { data: { h3: "644722037633318912" } },
  },
  // START MOCK BINDING
  Let(
    {
      // Define constants
      RMASK: 67553994410557440,
      RCLEAR: 9155818042444218367,
      PARENT_CONSTS: [
        [63050394783186944, 7],
        [58546795155816448, 63],
        [54043195528445952, 511],
        [49539595901075456, 4095],
        [45035996273704960, 32767],
        [40532396646334464, 262143],
        [36028797018963968, 2097151],
        [31525197391593472, 16777215],
        [27021597764222976, 134217727],
        [22517998136852480, 1073741823],
        [18014398509481984, 8589934591],
        [13510798882111488, 68719476735],
        [9007199254740992, 549755813887],
        [4503599627370496, 4398046511103],
        [0, 35184372088831],
      ],

      // Filter the table
      h3: ToInteger(Select(["data", "h3"], Var("doc"))), // don't forget `data` in the Select path
      current_res: BitAnd(Var("h3"), Var("RMASK")),
      parent_consts_filtered: Filter(
        Var("PARENT_CONSTS"),
        Lambda(["res", "_"], LT(Var("res"), Var("current_res")))
      ),

      // Calculate the parent indexes
      h3_nores: BitAnd(Var("h3"), Var("RCLEAR")),
      parents: Map(
        Var("parent_consts_filtered"),
        Lambda(
          ["res", "fill"],
          ToString(BitOr(Var("h3_nores"), Var("res"), Var("fill")))
        )
      ),
    },
    Var("parents")
  )
  // END MOCK BINDING
)

JAVASCRIPT WARNING

Javascript will immediately turn all of your integers into floating point values, so precision is lost before you send the request to Fauna. Furthermore, javascript will also turn the numbers returned from Fauna into floating point values, so you will lose precision again once you receive the result.

The request above will work with other drivers without issue, but beware of javascript numbers!!

workaround for javascript

You can pass the values to Fauna as strings and make sure Fauna also returns strings back to you.

Let(
  {
    doc: { data: { h3: "644722037633318912" } },
  },
  // START MOCK BINDING
  Let(
    {
      // Define constants
      RMASK: ToInteger("67553994410557440"),
      RCLEAR: ToInteger("9155818042444218367"),
      PARENT_CONSTS: [
        [ToInteger("63050394783186944"), ToInteger("7")],
        [ToInteger("58546795155816448"), ToInteger("63")],
        [ToInteger("54043195528445952"), ToInteger("511")],
        [ToInteger("49539595901075456"), ToInteger("4095")],
        [ToInteger("45035996273704960"), ToInteger("32767")],
        [ToInteger("40532396646334464"), ToInteger("262143")],
        [ToInteger("36028797018963968"), ToInteger("2097151")],
        [ToInteger("31525197391593472"), ToInteger("16777215")],
        [ToInteger("27021597764222976"), ToInteger("134217727")],
        [ToInteger("22517998136852480"), ToInteger("1073741823")],
        [ToInteger("18014398509481984"), ToInteger("8589934591")],
        [ToInteger("13510798882111488"), ToInteger("68719476735")],
        [ToInteger("9007199254740992"), ToInteger("549755813887")],
        [ToInteger("4503599627370496"), ToInteger("4398046511103")],
        [ToInteger("0"), ToInteger("35184372088831")],
      ],

      // Filter the table
      h3: ToInteger(Select(["data", "h3"], Var("doc"))),
      current_res: BitAnd(Var("h3"), Var("RMASK")),
      parent_consts_filtered: Filter(
        Var("PARENT_CONSTS"),
        Lambda(["res", "_"], LT(Var("res"), Var("current_res")))
      ),

      // Calculate the parent indexes
      h3_nores: BitAnd(Var("h3"), Var("RCLEAR")),
      parents: Map(
        Var("parent_consts_filtered"),
        Lambda(
          ["res", "fill"],
          ToString(BitOr(Var("h3_nores"), Var("res"), Var("fill")))
        )
      ),
    },
    Var("parents")
  )
  // END MOCK BINDING
)

If you want the extra performance by avoiding unnecessary function calls, you can use a non-JS driver to create the Index, but convert the final results to strings, so that javascript can consume it later. In this example, the H3 index is stored as a string, and the parent indexes are converted to strings as well.

Cost Analysis

When I run the javascript version in the shell, I see that it costs 3 compute ops.

When I run without the ToInteger / ToString workaround, it costs only 1 Compute Op in the best case (Index starts with low resolution), and 2 Compute Ops in the worst case (has max resolution = 15).

This makes sense since in order to set up the query we call ToInteger 30 times and convert the output up to 15 times with ToString. That’s 45 FQL verbs that we can avoid by avoiding javascript. (50 FQL verbs = 1 Compute Op charged)

For write operations, remember that using an array as an index term or value means that an Index entry is created for every item in the array.

Create the Index

Now that we have tested code, let’s go ahead and make an index

CreateIndex({
  name: "lngLatByParentH3s",
  source: {
    collection: Collection("LngLat"),
    fields: {
      parentH3s: Query(
        Lambda(
          "doc",

          Let(
            {
              RMASK: 67553994410557440,
              RCLEAR: 9155818042444218367,
              PARENT_CONSTS: [
                [63050394783186944, 7],
                [58546795155816448, 63],
                [54043195528445952, 511],
                [49539595901075456, 4095],
                [45035996273704960, 32767],
                [40532396646334464, 262143],
                [36028797018963968, 2097151],
                [31525197391593472, 16777215],
                [27021597764222976, 134217727],
                [22517998136852480, 1073741823],
                [18014398509481984, 8589934591],
                [13510798882111488, 68719476735],
                [9007199254740992, 549755813887],
                [4503599627370496, 4398046511103],
                [0, 35184372088831],
              ],

              h3: Select(["data", "h3"], Var("doc")),
              current_res: BitAnd(Var("h3"), Var("RMASK")),
              parent_consts_filtered: Filter(
                Var("PARENT_CONSTS"),
                Lambda(
                  ["res", "_"],
                  LT(Var("res"), Var("current_res"))
                )
              ),

              h3_nores: BitAnd(Var("h3"), Var("RCLEAR")),
              parents: Map(
                Var("parent_consts_filtered"),
                Lambda(
                  ["res", "fill"],
                  ToString(
                    BitOr(Var("h3_nores"), Var("res"), Var("fill"))
                  )
                )
              ),
            },
            Var("parents")
          )
        )
      ),
    },
  },
  terms: [
    {
      binding: "parentH3s",
    },
  ],
})

Test it out

create a document

image

Note the following costs:

  • 2 Compute Ops: Fauna has to compute the binding for the new document
  • 3 Write Ops: Fauna has to write index entries for each of the parent H3 indexes

And search for it! :tada::tada::tada:

image

Cost notes:

  • 2 Read Ops: 1 for paginating the index, and 1 for Get’ing the ref.
2 Likes

Paul, I don’t have much to say – you are simply amazing :laughing:

Now I think of it, I probably could have solved this with Filter in my original solution, but your method improves upon mine in many ways (bitwise operations have never been my strong suit lol). And really appreciate the insight into cost analysis!

It’s working great – can’t thank you enough! Certainly the most detailed answer I’ve ever gotten for anything :laughing:

Fauna is lucky to have you! :slightly_smiling_face:

1 Like

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.