Non-numeric Document Ref ID

Fauna excites me because it is a lot like RethinkDB. Real-time streaming, a relational document model, and all without the overhead of managing any instance - a super winning combination for sure! It reminds me of Rethink, and I wonder if the engineers studied it.

One feature of Rethink that was brilliant was the ability to override the unique identifier, or let it be set automatically. Fauna allows you to do the same, but the identifier must be a number. The difference between a numeric and an alphanumeric identifier may seem small, but it’s actually significant in a relational model.

This one feature in Rethink actually defined how we structured all of our data. The reason is that there are many cases where a document’s reference id should be easy to remember, reason about, and map in code. So alphanumeric is key for developer sanity. And of course you want that simple identifier to be the primary key, because that keeps things simple. You can achieve the same effect by adding a field with a human-readable “code”, but this adds a lot of overhead and complexity to the data model.

So, I’m sure this is a major ask, but would you consider implementing the ability to set an alphanumeric reference id for documents?

Thank-you!!!

Hi @tehcromic and welcome! :wave:

My response is two-fold. 1) you can use a custom primary key in Fauna to manage relations (with some work), but 2) you’ll want to consider if it’s worth it from a performance/cost perspective. I will still describe what it would be like to use a primary key, and then compare to what is currently more idiomatic.

Regardless of the current state of Fauna, you still have great feedback! I will move this topic to the Feature Requests category. There, you and other folks can vote on it to highlight your desire for the feature. The topic can continue to stand as a place of discussion and use cases for adding custom primary keys to Fauna.

Quick comparison

The document’s ID in Fauna is THE key for that document. When you Get a document with its ID (by calling Get(Ref(Collection("coll"), ID))) the database can go straight to your document in storage and fetch it for you. Reads in Fauna can be optimized for the fact that all IDs are 64-bit integers.

If I understand correctly, RethinkDB uses the primary key as a means to shard, index and store data. Given the primary key, RethinkDB can go straight to the document in the primary index and fetch it. In this way, RethinkDB has the advantage of convenience, since the document itself is stored in the primary index.

How to model primary key/primary index in Fauna

You can create an Index in Fauna for whatever business-id that you need. Granted, you do have to be more explicit about it.

With Fauna, you can store a primary key value in a relationship field and resolve that relationship using your “primary index”.

define indexes

CreateIndex({
  name: "users_primary",
  source: Collection("users"),
  terms: [{ field: ["data", "email"] }] // email as PK
})
CreateIndex({
  name: "cars_primary",
  source: Collection("cars"),
  terms: [{ field: ["data", "VIN"] }] // VIN as PK
})
CreateIndex({
  name: "cars_by_owner",
  source: Collection("cars"),
  terms: [{ field: ["data", "owner"] }] // foreign key
})

create some data

Create(Collection("users"), {
  data: {
    /* ... */
    email: "me@fauna.com" // PK
  }
})
Create(Collection("cars"), {
  data: {
    /* ... */
    VIN: "WBAFB3345YLH46720",
    owner: "me@fauna.com" // store user PK
  }
})

query for the data

// get owner of a car
// costs 3 Read Ops
Let(
  {
    car: Get(Match(Index("cars_primary"), "WBAFB3345YLH46720")),  // 1 Read Op for reading the index
    owner_email: Select(["data", "owner"], Var("car"))
  },
  Get(  // 1 Read Op for `Get`ing the document
    Match(Index("users_primary"), Var("owner_email")) // 1 Read Op for reading the index
  ) 
)
// get all cars owned by user
// costs 1 + N Read Ops
Let(
  {
    user_email: "me@fauna.com",
  },
  Map(
    Paginate(Match(Index("cars_by_owner"), Var("user_email"))), // 1 min Read Op for reading the Index
    Lambda("ref", Get(Var("ref"))) // 1 Read Op for each car document to `Get`
  )
)

Note that the example requires an extra layer of indirection when getting the owner of the car, since the car stores only the owner’s email and that email has to be resolved through the users_primary index. It would be much more efficient to store the related document’s Ref, since that is a pointer directly to the document.

Alternative

It is indeed often helpful to reach for your data via a human-friendly value. My recommendation is then to:

  • manage your relations in Fauna using document Refs and Indexes on the Refs
  • add indexes on your business id(s) that enable user-friendly queries
  • expand your query results using the efficient relations, the code for which does not need to handle the Fauna ids, just the fields in which they are stored.

We’ve already compromised somewhat in our example, since the Indexes return document Refs. But now consider if we stored the user Ref in the car, rather than just the user email.

Create(Collection("cars"), {
  data: {
    /* ... */
    VIN: "WBAFB3345YLH46720",
    owner: Ref(Collection("users"), "101") // store user Ref
  }
})

We save on read operations by avoiding a second index lookup when we get the owner of a car:

// get owner of a car
// costs 2 Read Ops
Let(
  {
    car: Get(Match(Index("cars_primary"), "WBAFB3345YLH46720")),  // 1 Read Op for reading the index
    owner: Select(["data", "owner"], Var("car"))
  },
  Get(Var("owner")) // 1 Read Op for `Get`ing the document
)

We add a small cost, fetching the user Ref first, when we get all cars for a given user email:

// get all cars owned by user
// costs 2 + N Read Ops
Let(
  {
    user: Get(Match(Index("users_primary"), "me@fauna.com")), // 1 Read Op for `Get`ing the document
    user_ref: Select("ref", Var("user"))
  },
  Map(
    Paginate(Match(Index("cars_by_owner"), Var("user_ref"))), // 1 min Read Op for reading the Index
    Lambda("ref", Get(Var("ref"))) // 1 Read Op for each car document to `Get`
  )
)
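
To make the trade-off between the two models easier to see, here is a rough in-memory sketch in plain JavaScript. It is not Fauna and does not reproduce Fauna's billing rules: the `ToyDb` class, the `countHops` helper, and the ref strings `"users/101"` and `"cars/201"` are all made up for illustration. It simply counts every index read and every document fetch as one "hop", so the absolute numbers can differ from the Read Op comments above; the point is the one-hop difference between the two schemes.

```javascript
// Toy in-memory model (hypothetical, not Fauna and not its billing rules).
// Every index read and every document fetch counts as one "hop".
class ToyDb {
  constructor() {
    this.docs = new Map();    // ref string -> document data
    this.indexes = new Map(); // index name -> Map(term -> [ref, ...])
    this.hops = 0;
  }
  // Store a document and add it to each named index under the given term.
  put(ref, data, indexed) {
    this.docs.set(ref, data);
    for (const [indexName, term] of Object.entries(indexed)) {
      if (!this.indexes.has(indexName)) this.indexes.set(indexName, new Map());
      const idx = this.indexes.get(indexName);
      idx.set(term, [...(idx.get(term) || []), ref]);
    }
  }
  // One hop: read an index and return the matching refs.
  match(indexName, term) {
    this.hops += 1;
    const idx = this.indexes.get(indexName);
    return (idx && idx.get(term)) || [];
  }
  // One hop: fetch a document by its ref.
  get(ref) {
    this.hops += 1;
    return this.docs.get(ref);
  }
}

// Run fn against db and report how many hops it took.
function countHops(db, fn) {
  db.hops = 0;
  const result = fn();
  return { result, hops: db.hops };
}

const USER_REF = "users/101"; // made-up ref strings
const CAR_REF = "cars/201";
const VIN = "WBAFB3345YLH46720";
const EMAIL = "me@fauna.com";

// Scheme A: the car stores the owner's email (business key).
const byKey = new ToyDb();
byKey.put(USER_REF, { email: EMAIL }, { users_primary: EMAIL });
byKey.put(CAR_REF, { VIN, owner: EMAIL }, { cars_primary: VIN, cars_by_owner: EMAIL });

// Scheme B: the car stores the owner's Ref.
const byRef = new ToyDb();
byRef.put(USER_REF, { email: EMAIL }, { users_primary: EMAIL });
byRef.put(CAR_REF, { VIN, owner: USER_REF }, { cars_primary: VIN, cars_by_owner: USER_REF });

// Owner of a car.
const ownerByKey = countHops(byKey, () => {
  const car = byKey.get(byKey.match("cars_primary", VIN)[0]);
  return byKey.get(byKey.match("users_primary", car.owner)[0]); // extra index hop
});
const ownerByRef = countHops(byRef, () => {
  const car = byRef.get(byRef.match("cars_primary", VIN)[0]);
  return byRef.get(car.owner); // the Ref points straight at the document
});

// All cars owned by a user, starting from the email.
const carsByKey = countHops(byKey, () =>
  byKey.match("cars_by_owner", EMAIL).map((ref) => byKey.get(ref))
);
const carsByRef = countHops(byRef, () => {
  const userRef = byRef.match("users_primary", EMAIL)[0]; // extra index hop
  return byRef.match("cars_by_owner", userRef).map((ref) => byRef.get(ref));
});

console.log(ownerByKey.hops, ownerByRef.hops); // 4 3
console.log(carsByKey.hops, carsByRef.hops);   // 2 3
```

In this toy model, storing Refs saves one hop when walking from a car to its owner and costs one extra hop when starting from an email, which mirrors the direction of the Read Op differences in the queries above.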

Thanks! This is indeed a viable option, and I’m continuing to test out Fauna based on it.

The first, slightly more expensive option is better in many cases because it lets me see what I am referencing, which is important in some parts of my model: some id values are mapped to objects in the code for good reason, etc. The downside is that having my own custom keys like this means more than one index per table, and might require many on certain tables. That comes with write costs that would not exist if I could simply reference directly, as you say.
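
The write-cost concern can be sketched roughly in plain JavaScript. This is a toy model, not Fauna's actual billing: it assumes that creating a document writes one entry into every index whose source is that document's collection. The `indexEntriesPerCreate` helper and the `withNativeKeys` scenario (document IDs that could themselves be an email or VIN) are hypothetical.

```javascript
// Toy model, not Fauna's billing: assume creating a document writes one entry
// into every index whose source is that document's collection.
const currentIndexes = [
  { name: "users_primary", source: "users" }, // exists only to emulate a custom key
  { name: "cars_primary", source: "cars" },   // exists only to emulate a custom key
  { name: "cars_by_owner", source: "cars" },  // needed either way for the relation
];

// Hypothetical scenario: if the document ID itself could be the email or VIN,
// the two "*_primary" indexes would not be needed.
const withNativeKeys = currentIndexes.filter((ix) => !ix.name.endsWith("_primary"));

// Count the index entries touched when one document is created in a collection.
function indexEntriesPerCreate(indexes, collection) {
  return indexes.filter((ix) => ix.source === collection).length;
}

console.log(indexEntriesPerCreate(currentIndexes, "cars"));  // 2
console.log(indexEntriesPerCreate(withNativeKeys, "cars"));  // 1
console.log(indexEntriesPerCreate(currentIndexes, "users")); // 1
console.log(indexEntriesPerCreate(withNativeKeys, "users")); // 0
```

Under that assumption, each emulated key adds one index entry to every create on its table, and the overhead grows with the number of custom keys.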

OK, I hijacked my own thread here. I’m going to start a different topic on the JS driver.