Bulk Upsert - query help needed :)

Hey there,

I started today with Netlify & FaunaDB. While I’m sure it will be great once it’s running, right now I feel super overwhelmed by all the FaunaDB lingo :smiley:

I am trying to upsert data received from an API call, inside a Netlify function.

Data received in POST request:

[
  {
    key: "f90b38d9-77d7-47f1-920c-79b2b4bb2c8d",
    value: {
      "542e2364-5911-4b7e-b5f8-e29a5461cca0": 0,
      "652e2364-5911-4b7e-b5f8-e29a5461sxa2": 0,
    },
  },
  {
    key: "f90b38d9-77d7-47f1-920c-79b2b4bb2c8d",
    value: {
      "542e2364-5911-4b7e-b5f8-e29a5461cca0": 0,
      "652e2364-5911-4b7e-b5f8-e29a5461sxa2": 0,
    },
  }
]

This data is variable in length, so it can be one object, or many.

Now I’m trying to save this (with my extremely limited FaunaDB knowledge) in a collection, so that I can later query it by looking up the key UUID.

I’ve found a post in here, #488, that explains how to do it. I’ve adjusted it, but no dice.
I’m getting this error:

description: 'Value not found at path [data,key].',
  requestResult: RequestResult {
    method: 'POST',
    path: '',
    query: null,
    requestRaw: '{"map":{"lambda":["d"],"expr":{"if":{"exists":{"match":{"index":"landing_page_redirects_search_by_uuid"},"terms":{"select":["data","key"],"from":{"var":"d"}}}},"then":{"replace":{"select":"key","from":{"get":{"match":{"index":"landing_page_redirects_search_by_uuid"},"terms":{"select":["data","key"],"from":{"var":"d"}}}}},"params":{"var":"d"}},"else":{"create":{"collection":"landing_page_redirects"},"params":{"var":"d"}}}},"collection":[{"object":{"key":"f90b38d9-77d7-47f1-920c-79b2b4bb2c8d","value":{"object":{"542e2364-5911-4b7e-b5f8-e29a5461cca0":0}}}}]}',
    requestContent: Expr { raw: [Object] },
    responseRaw: '{"errors":[{"position":["map","expr","if","exists","terms","from"],"code":"value not found","description":"Value not found at path [data,key]."}]}',

When using this method:

const multiUpsert = (data) => {
    return q.Map(
      data,
      q.Lambda(
        ["d"],
        q.If(
          q.Exists(
            q.Match(
              q.Index(indexMap[params.collection]),
              q.Select(["data", "key"], q.Var("d"))
            )
          ),
          q.Replace(
            q.Select(
              "key",
              q.Get(
                q.Match(
                  q.Index(indexMap[params.collection]),
                  q.Select(["data", "key"], q.Var("d"))
                )
              )
            ),
            q.Var("d")
          ),
          q.Create(q.Collection(collectionMap[params.collection]), q.Var("d"))
        )
      )
    );
  };

  let res;
  try {
    res = await client.query(multiUpsert(data));
    console.log(res);

    return {
      statusCode: 200,
      body: JSON.stringify("ok"),
    };
  } catch (err) {
    console.error(err);

    return {
      statusCode: 500,
      body: JSON.stringify("Could not perform this action"),
    };
  }

And this is my schema:

client
  .query(
    CreateCollection({ name: "landing_page_redirects", history_days: 0 })
  )
  .then(() => {
    return client.query(
      CreateCollection({ name: "offer_redirects", history_days: 0 })
    );
  })
  .then(() => {
    return client.query(
      CreateIndex({
        name: "landing_page_redirects_search_by_uuid",
        permissions: { read: "public" },
        source: Collection("landing_page_redirects"),
        terms: [{ field: ["data", "key"] }],
        values: [{ field: ["data", "key"] }, { field: ["data", "value"] }],
        unique: true,
      })
    );
  })
  .then(() => {
    return client.query(
      CreateIndex({
        name: "offer_redirects_search_by_uuid",
        permissions: { read: "public" },
        source: Collection("offer_redirects"),
        terms: [{ field: ["data", "key"] }],
        values: [{ field: ["data", "key"] }, { field: ["data", "value"] }],
        unique: true,
      })
    );
  });

Where am I going wrong here? (I know, probably everywhere haha)

All Documents in Fauna have several top-level fields. Regular user Documents have this shape:

{
  ref: Ref(Collection(COLLECTION_NAME), ID),
  ts: 163460972345697,
  data: {
    /* user data */
  }
}

When you do this:

q.Create(q.Collection(collectionMap[params.collection]), q.Var("d"))

it tries to set top-level fields key and value, but there are none. Wrap your data in a data field:

q.Create(q.Collection(collectionMap[params.collection]), {
  data: q.Var("d")
})

As a side note, I don’t think your Indexes will do what you want. Objects are not indexed, so setting your Index values to { field: ["data", "value"] } will cause the values to resolve to null. For example, I expect your results will look something like this:

Paginate(Match(
  Index("landing_page_redirects_search_by_uuid"),
  "f90b38d9-77d7-47f1-920c-79b2b4bb2c8d"
))

// expected results
{
  data: [ 
    ["f90b38d9-77d7-47f1-920c-79b2b4bb2c8d", null],
    ["f90b38d9-77d7-47f1-920c-79b2b4bb2c8d", null]
  ]
}
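
If all you actually need is to look a document up by its key and read the whole value object, you don’t need value in the Index at all; you can Match on the key and Get the document (a sketch using your names):

Get(Match(
  Index("landing_page_redirects_search_by_uuid"),
  "f90b38d9-77d7-47f1-920c-79b2b4bb2c8d"
))

// returns the full Document, e.g.
// { ref: ..., ts: ..., data: { key: "f90b38d9-...", value: { ... } } }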

Hey @ptpaterson,

thanks a lot for your PM as well as for helping me out here :slight_smile:

I have rewritten the code slightly (at the moment it’s honestly 99% trial & error haha).
It now works for the create portion, but as soon as a document already exists,
I get an error.

const multiUpsert = (data) => {
    return q.Map(
      data,
      q.Lambda(
        ["d"],
        q.If(
          q.Exists(
            q.Match(
              q.Index(indexMap[params.collection]),
              q.Select(["data", "key"], {
                data: q.Var("d"),
              })
            )
          ),
          q.Replace(
            q.Select(
              ["data", "key"],
              q.Get(
                q.Match(
                  q.Index(indexMap[params.collection]),
                  q.Select(["data", "key"], {
                    data: q.Var("d"),
                  })
                )
              )
            ),
            q.Var("d")
          ),
          q.Create(q.Collection(collectionMap[params.collection]), {
            data: q.Var("d"),
          })
        )
      )
    );
  };

  let res;
  try {
    res = await client.query(multiUpsert(data));
    console.log(res);

    return {
      statusCode: 200,
      body: JSON.stringify("ok"),
    };
  } catch (err) {
    console.error(err);

    return {
      statusCode: 500,
      body: JSON.stringify("Could not perform this action"),
    };
  }

error:

 description: 'Ref expected, String provided.',
  requestResult: RequestResult {
    method: 'POST',
    path: '',
    query: null,
    requestRaw: '{"map":{"lambda":["d"],"expr":{"if":{"exists":{"match":{"index":"landing_page_redirects_search_by_uuid"},"terms":{"select":["data","key"],"from":{"object":{"data":{"var":"d"}}}}}},"then":{"replace":{"select":["data","key"],"from":{"get":{"match":{"index":"landing_page_redirects_search_by_uuid"},"terms":{"select":["data","key"],"from":{"object":{"data":{"var":"d"}}}}}}},"params":{"var":"d"}},"else":{"create":{"collection":"landing_page_redirects"},"params":{"object":{"data":{"var":"d"}}}}}},"collection":[{"object":{"key":"f90b38d9-77d7-47f1-920c-79b2b4bb2c8d","value":{"object":{"542e2364-5911-4b7e-b5f8-e29a5461cca0":0}}}}]}',
    requestContent: Expr { raw: [Object] },
    responseRaw: '{"errors":[{"position":["map","expr","then","replace"],"code":"invalid argument","description":"Ref expected, String provided."}]}',

Also, regarding the indexes, apologies if I don’t quite follow yet.
I just read the best practices, which say to create an index when the ref isn’t known beforehand, so I wanted an index that I can use to look up "some-uuid" and get back the data: {} object we are trying to save here in the first place!

Is this not what I am doing there in my schema setup script?

Sorry if that is a dumb question haha, I have not really grasped how Fauna’s data structures work.

hmmm… You are already guarding for whether any matches exist with your Exists check.

I think we can be confident that the Match is not getting the right data. Can you check that the Index is working as you expected? What are the results if you Match on an index with a key that you know should be there?

q.Paginate(
  q.Match(
      q.Index("landing_page_redirects_search_by_uuid"),
      "f90b38d9-77d7-47f1-920c-79b2b4bb2c8d"
  )
)

Other Notes

Using Select

You don’t have to wrap other instances of Var("d") in a data field if you just need to read something. The reason we needed the data field when we do Create is to get the shape of the Document right. You can change

// from
Select(["data", "key"],  { data: q.Var("d") })

// to 
Select(["key"],  q.Var("d"))

Replace/Update need a Ref

By the way, I think you need to Select the "ref" here. This:

          q.Replace(
            q.Select(
              ["data", "key"],

should probably be:

          q.Replace(
            q.Select(
              ["ref"],

Replace and Update won’t know what to do with your key.

Permissions

It looks like you are using the old permission system. The permissions field is deprecated.

You should create a Role and provide it the required ABAC privileges to access the Index.
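
For example, something like this (the role name is just a placeholder; give it whichever privileges your function actually needs):

q.CreateRole({
  name: "public_redirect_reader",
  privileges: [
    {
      resource: q.Index("landing_page_redirects_search_by_uuid"),
      actions: { read: true },
    },
    {
      resource: q.Collection("landing_page_redirects"),
      actions: { read: true },
    },
  ],
})

// then issue a key scoped to that role
q.CreateKey({ role: q.Role("public_redirect_reader") })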

@ptpaterson thanks a lot for these explanations!

That helped a lot! I got it to work with your changes, and I also needed to replace q.Var("d") inside the Replace call with a data object:

const multiUpsert = (data) => {
    return q.Map(
      data,
      q.Lambda(
        ["d"],
        q.If(
          q.Exists(
            q.Match(
              q.Index(indexMap[params.collection]),
              q.Select(["key"], q.Var("d"))
            )
          ),
          q.Replace(
            q.Select(
              ["ref"],
              q.Get(
                q.Match(
                  q.Index(indexMap[params.collection]),
                  q.Select(["key"], q.Var("d"))
                )
              )
            ),
            {
              data: q.Var("d"),
            }
          ),
          q.Create(q.Collection(collectionMap[params.collection]), {
            data: q.Var("d"),
          })
        )
      )
    );
  };

I also removed the additional fields in my schema; both indexes are now created like this:

CreateIndex({
  name: "landing_page_redirects_search_by_uuid",
  source: Collection("landing_page_redirects"),
  terms: [{ field: ["data", "key"] }],
  unique: true,
})

The upsert is now working; it creates or replaces.

Which brings me to the question: what is the difference between Replace & Update?
Only that Update makes a partial replacement & merges?

And finally, for my understanding:

As far as I understand now, FaunaDB is a document database that creates new document versions for every change.

The way we are using it, as you’ve probably already gathered from the naming of our collections & indexes, is for redirects.

Basically, a Netlify function queries Fauna, picks the next link with the lowest number of visits (the int inside my data object), and then redirects to that URL.
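
Very roughly (and completely untested), I imagine the function looking something like this; the index name is the real one, everything else is a placeholder:

// inside the async Netlify function handler
// landingPageKey comes from the incoming request
const doc = await client.query(
  q.Get(
    q.Match(q.Index("landing_page_redirects_search_by_uuid"), landingPageKey)
  )
);

// pick the target UUID with the fewest visits so far
const [targetUuid, visits] = Object.entries(doc.data.value).sort(
  ([, a], [, b]) => a - b
)[0];

// bump its counter – assuming Update merges nested objects (see my question above)
await client.query(
  q.Update(doc.ref, { data: { value: { [targetUuid]: visits + 1 } } })
);

// then respond with a 302 redirect to the URL behind targetUuid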

For this to work at scale, performance is of course mandatory.

The way I’ve planned for this to work in the most performant manner is:

  • Set Collection history_days to 0
  • Set Collection ttl_days to null (We need every document to exist indefinitely, or redirects break)

This is all we can do to optimize performance, as we need to update the data in real time while a redirect is happening. Therefore we cannot use techniques like those described here: Scaling write throughput | Fauna Documentation

Are my assumptions here true or false? :smiley:

Thanks again!

Replace and Update

You got it! :slight_smile: Replace completely overwrites the data field. Update will merge the existing data field with what you provide.
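
For example, given an existing document whose data is { key: "abc", value: { a: 1, b: 2 } } (the document ID here is made up):

// Update merges: nested objects are merged, untouched fields are kept
q.Update(q.Ref(q.Collection("landing_page_redirects"), "1234567890"), {
  data: { value: { a: 5 } },
})
// data is now { key: "abc", value: { a: 5, b: 2 } }

// Replace overwrites the entire data field
q.Replace(q.Ref(q.Collection("landing_page_redirects"), "1234567890"), {
  data: { value: { a: 5 } },
})
// data is now { value: { a: 5 } } – key and b are gone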

Scaling your queries

Regarding performance, this pattern may run into two problems at scale:

  1. Simultaneous read + write operations on the same Index/Document should be expected to encounter contention if performed about 5 or more times per second. I’ve walked through an example contention scenario before here in the forums.
  2. Even though you set history_days to 0, cleanup of history is performed as a background task. This means you’ll always have some buildup of history that reaches a kind of equilibrium with garbage collection. Assuming an extreme case that avoids contention but still has a lot of updates, you might see increased query time and cost for reading Index history.

Even if you use the Event Sourcing pattern and create periodic snapshots, you can still read all the latest data in real time. Basically, read the snapshot plus all events since the snapshot, and calculate the latest value on the fly.
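
A very rough sketch of that read, with made-up index names (a snapshot index whose values are [ts, count] newest-first, and an event index whose values are [ts, delta]):

const latestCount = (key) =>
  q.Let(
    {
      // newest snapshot for this key, or [0, 0] if none has been written yet
      snapshot: q.If(
        q.Exists(q.Match(q.Index("visit_snapshots_by_key"), key)),
        q.Select(
          ["data", 0],
          q.Paginate(q.Match(q.Index("visit_snapshots_by_key"), key), { size: 1 })
        ),
        [0, 0]
      ),
      snapshotTs: q.Select(0, q.Var("snapshot")),
      snapshotCount: q.Select(1, q.Var("snapshot")),
      // deltas of every event recorded after the snapshot's timestamp
      deltas: q.Select(
        "data",
        q.Map(
          q.Paginate(
            q.Range(
              q.Match(q.Index("visit_events_by_key"), key),
              [q.Var("snapshotTs")],
              []
            )
          ),
          q.Lambda(["ts", "delta"], q.Var("delta"))
        )
      ),
    },
    // latest value = snapshot + everything that happened since
    q.Add(q.Var("snapshotCount"), q.Sum(q.Var("deltas")))
  );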

You of course need serialized guarantees for the new_value = current_value + 1 pattern to work. If you could work out an algorithm that can handle eventual consistency, then you can avoid contention by making your indexes nonserialized. Note that this also, by definition, cannot enforce uniqueness.
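
If you do go down that route, the index definition just sets serialized to false (names made up to match the sketch above):

q.CreateIndex({
  name: "visit_events_by_key",
  source: q.Collection("visit_events"),
  terms: [{ field: ["data", "key"] }],
  values: [{ field: ["ts"] }, { field: ["data", "delta"] }],
  serialized: false, // relaxes strict serialization of index updates to reduce contention
})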

At this point we’re getting pretty far off-topic from the original post. If you have additional questions, I encourage you to create a new topic to drill into scaling your queries.

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.