Substring search with variable lengths and positions

Hi,

I’m working on a booking system which I’ve mentioned in another topic. It relies to an external and most of the time a bit archaic API, where I get the availability of an item in this format:

{
    "item_id": 1602,
    "year": 2020,
    "as": "111111111100000000000000011111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111111100000000000000000000000000000000000111111111111111111111000000011111110000000000000000000000000000000000110000000000000000000000000000000000000000000000000000000000000000000000000000000000"
}

So:

  • Every char on the as string represents a day for the year
  • as is always 365 or 366 (leap year) chars long.
  • 0 means available, 1 means booked.

When I receive a query with a date range, it means translating those days to a start position, and length, and looking for a substring of 0s starting at that position.

For example, if the user is looking for available items between 10th Jan (index:9) and 15th Jan (length:6), the pseudocode I’d have would be similar to:

Filter(
    Paginate(Match(Index('availability'))),
    Lambda(['as', 'ref'], Equals(SubStr(Var('as'), 9, 6), "000000")
)

The catch is I have around 10.000 of those items per year and I’m planning to keep current and next year’s availability information which accounts for around 20.000 records. The above approach wouldn’t work as I’d get lots of pages with 2 or 3 items.

I’ve searched through the forum, and StackOverflow, and they were really good answers to similar questions, mostly on using Ngrams and custom binding for indexes. I believe, -although perfectly happy to be corrected, my use case is different from what has been suggested.

Am I missing anything obvious here?

The first things I wonder though is, why are you not using FaunaDB dates? You could do range queries on there via index values and Range(). You could combine that together with a ‘available’ boolean and could therefore write a Match on the availability index term to filter on that availability and then further filter using Range(), all leveraging the index.

You do not even have to use dates, you could just use the number of the ‘day in the year’ together with the availability.

I assume this data format might come from an external system. In that case you could opt to write an index binding that transforms this string in an array of objects and filter on those or transform them with a custom script or custom FQL at the moment you write the object.

I do think your case is different than the fuzzy searching posts you’ve read but the insights are similar. Each of these fuzzy searching solutions relies on breaking down the document attribute in other things (e.g. ngrams). In this case we’ll just break down the string as well into subcomponents that are index-friendly.

For example:

Your index could look like:

{
  name: "test",
  source: Collection("items"),
  terms: [
    {
      field: ["data", "as", "available"]
    }
  ],
  values: [
    {
      field: ["data", "as", "day"]
    }
  ]
}

(add values after the day field if you want other return values (such as the ref)

However, I think you will be better served by separating your days in another collection. Storing all those in an array is probably not the best idea if you want to easily update that document (except if you are planning on updating that complete array every time anyway… you know… due to some external system).

In which case your index would become:
Your index could look like:

{
  name: "test",
  source: Collection("days"),
  terms: [
    {
      field: ["data", "as", "available"]
    }
  ],
  values: [
    {
      field: ["data", "as", "day"]
    }
  ]
}

And you could add the reference of the day from the return values here, then use Map and Get on the user_reference to join this data with the original item again. This approach is way more flexible when you update items. But it will probably cost more reads.

Thank you @databrecht

You are right into assuming the as data format coming from an external system. Transforming those into days either in an array for an item, or to its own collection makes sense. I shall investigate that further.

Just a side question as you mentioned FaunaDB dates. I was trying to create another document which had date fields in it. Using the Javascript client, I had a json with a Date object but when I checked the field in the dashboard I saw that field populated as {}. I assumed I wasn’t allowed to create docs with dates as in them. Is that right?

You cannot use the javascript Date object, i.e. the result new Date(), directly. Did you use Fauna’s Date/Time functions?

Can I use those outside of the query context? I mean I pass a regular JSON object to q.Create. You mean I can replace the date field with the result of Fauna’s Date function?

I assume what Paul means is to use the iso string that comes from the javascript date and pass that into FaunaDB’s date function:

q.Date('1970-01-01')
1 Like