Index value has an array, calling the index returns same documents as the amount of items in the array

I have an index that returns the necessary values to the frontend:

CreateIndex({
    name: "all_blogs_by_status",
    source: Collection("Blog"),
    terms: [
        {
            field: ["data", "status"]
        }
    ],
    values: [
        {
            field: ["ref", "id"]
        },
        {
            field: ["data", "title"]
        },
        {
            field: ["data", "slug"]
        },
        {
            field: ["data", "image"]
        },
        {
            field: ["data", "type"]
        },
        {
            field: ["data", "answered"]
        },
        {
            field: ["data", "tags"]
        },
        {
            field: ["data", "body"]
        }
    ]
})

The documents looks like this:

{
  title: "Test 3",
  slug: "test-3",
  image: "some-image",
  type: "blog",
  answered: false,
  tags: ["test tag 1", "test tag 2", "test tag 3"],
  status: "published",
  body: "# test post 1",
}

Calling the index like this:

const { data }: any = await client.query(
    q.Map(
      q.Paginate(q.Match(q.Index("all_blogs_by_status"), "published")),
      q.Lambda("post", q.Var("post"))
    )
  );

The result is this:

{
  data: [
    [
      '321165580971278848',
      'Test 1',
      'test-1',
      'https://images.unsplash.com/photo-1497215728101-856f4ea42174?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2940&q=80',
      'blog',
      false,
      'test tag 1',
      '# test post 1'
    ],
    [
      '321165580971278848',
      'Test 1',
      'test-1',
      'https://images.unsplash.com/photo-1497215728101-856f4ea42174?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2940&q=80',
      'blog',
      false,
      'test tag 2',
      '# test post 1'
    ],
    [
      '321165580971278848',
      'Test 1',
      'test-1',
      'https://images.unsplash.com/photo-1497215728101-856f4ea42174?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2940&q=80',
      'blog',
      false,
      'test tag 3',
      '# test post 1'
    ],
    [
      '321165580975473152',
      'Test 2',
      'test-2',
      'https://images.unsplash.com/photo-1497215728101-856f4ea42174?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2940&q=80',
      'blog',
      false,
      'test tag 1',
      '# test post 1'
    ],
    [
      '321165580975473152',
      'Test 2',
      'test-2',
      'https://images.unsplash.com/photo-1497215728101-856f4ea42174?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2940&q=80',
      'blog',
      false,
      'test tag 2',
      '# test post 1'
    ],
    [
      '321165580975473152',
      'Test 2',
      'test-2',
      'https://images.unsplash.com/photo-1497215728101-856f4ea42174?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2940&q=80',
      'blog',
      false,
      'test tag 3',
      '# test post 1'
    ],
    [
      '321165580979667456',
      'Test 3',
      'test-3',
      'https://images.unsplash.com/photo-1497215728101-856f4ea42174?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2940&q=80',
      'blog',
      false,
      'test tag 1',
      '# test post 1'
    ],
    [
      '321165580979667456',
      'Test 3',
      'test-3',
      'https://images.unsplash.com/photo-1497215728101-856f4ea42174?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2940&q=80',
      'blog',
      false,
      'test tag 2',
      '# test post 1'
    ],
    [
      '321165580979667456',
      'Test 3',
      'test-3',
      'https://images.unsplash.com/photo-1497215728101-856f4ea42174?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2940&q=80',
      'blog',
      false,
      'test tag 3',
      '# test post 1'
    ],
    [
      '321165580981764608',
      'Test 4',
      'test-4',
      'https://images.unsplash.com/photo-1497215728101-856f4ea42174?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2940&q=80',
      'blog',
      false,
      'test tag 1', # post 4 tag 1
      '# test post 1'
    ],
    [
      '321165580981764608',
      'Test 4',
      'test-4',
      'https://images.unsplash.com/photo-1497215728101-856f4ea42174?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2940&q=80',
      'blog',
      false,
      'test tag 2', # post 4 tag 2
      '# test post 1'
    ],
    [
      '321165580981764608',
      'Test 4',
      'test-4',
      'https://images.unsplash.com/photo-1497215728101-856f4ea42174?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2940&q=80',
      'blog',
      false,
      'test tag 3', # post 4 tag 3
      '# test post 1'
    ]
  ]
}

As you can see the index returns the same documents as the amount of items in the tags array.

Couldnt find any examples on how to return the index with the tags array as an array in each index result.

How would I make this work?

Feel free to formulate the question better for the future people who needs this answered, if needed. Thanks!

Per the documentation:

When a term field is an array, one index entry per array item is created, which makes it easy to search for any one of the array’s items.

If you don’t want this behavior, don’t index the array field. Since you are calling Get on every item in the result set, you are retrieving all of the information that the index was returning anyway, which is a bit redundant.

If you index multiple fields that contain arrays, then the number of index entries per document is the cartesian product of the number of items in each array.

I think you misread the query, I am not calling Get, just getting each item stored in the index, although your suggestion worked. When deleting the tags from the returning values each unique document is returned once and not for each item in the value tags array.

Is it possible to return a value that is also an array? Like this:

[
[
      '321165580971278848',
      'Test 1',
      'test-1',
      'image',
      'blog',
      false,
      ['tag 1','tag 2','tag 3'], # array in returning values from index
      '# test post 1'
    ],
[
      '321165580975473152',
      'Test 2',
      'test-2',
      'image',
      'blog',
      false,
      ['tag 1','tag 2','tag 3'], # array in returning values from index
      '# test post 1'
    ],
]

Or do I need to delete the tags from the index values and get them separately?

Thanks!

Is it even possible to return an array in an index?
Trying this now, but not working:

CreateIndex({
    name: "all_blogs_by_status",
    source: {
        Collection("Blog"),
        fields:{
            tags: Query(
                Lambda(
                   "post",
                   Select(["data", "tags"], Var("post"))
                )
            )
        }
    },
    terms: [
        {
            field: ["data", "status"]
        }
    ],
    values: [
        {
            field: ["ref", "id"]
        },
        {
            field: ["data", "title"]
        },
        {
            field: ["data", "slug"]
        },
        {
            field: ["data", "image"]
        },
        {
            field: ["data", "type"]
        },
        {
            binding: "tags"
        },
        {
            field: ["data", "answered"]
        },
        {
            field: ["data", "body"]
        }
    ]
})

Sorry, you’re right, you’re not using Get.

If you index an array, one index entry per array item is created. Thus, an index cannot return a complete array.

If you convert the array to a JSON string, or similar encoding, that might work. But that’s probably not what you want.

Yea. I totally get that. But would be nice to use the values section to return more than just strings. If I understand correct, both the terms and the values are indexed even the arrays in a document.

Would be nice to be able to choose if a returning value could disable indexing so it can return the original content value like you can in JS arrays. E.g. having a object or array as a item in an array.

If I wanted to try the JSON string method would the returning data look like this?:

[
      '321165580971278848',
      'Test 1',
      'test-1',
      'image',
      'blog',
      false,
      # tags section
      'test tag 1',
      'test tag 2',
      'test tag 3',
      # tags section
      '# test post 1'
    ],

The method i am using now is to remove all the returning values from the index and create new objects in a JS map function on the api.

I am doing this for now since the resources used would not make that much difference. But then again I have no idea how much an index that returns required values vs returning the whole documents use of resources.

That is for the future to log the resources used with different situations.

By the way I am starting to fall in love with Fauna schema migrate and Fauna docker image. They make development a breeze. I have setup some automation scripts in vs code to tear down the db with every change to the schema and rebuild it with test data. All of the keys used are automatically inserted in the environment variables. I will publish a guide on it when i get everything up and running:)

Thanks for the awesome tools you provide!

Yes, if a document field is an array, and that field is included in an index’s terms or values definitions, one index entry per array item is created. Aside from encoding the field, the only choice available, for this situation, is whether you index an array field or not.

If you use a JSON string, the result set would include JSON strings. If you have an array that looks like this:

["apple", "banana", "cucumber"]

The string version looks like this:

'["apple","banana","cucumber"]'

To use the tags effectively, a JSON string is probably not what you want, especially since there are no built-in JSON encode/decode functions (interpreting the tags field would likely all be client-side, rather than in the database). However, encoding is the only viable option for returning an array-like value that does not create multiple index entries per document.

In terms of efficiency, building an index that covers all of a documents fields means that your per-document storage is twice what it would be compared to not having the index, and for every document written, there is at least one similarly-sized index entry to write. So your storage and write ops effectively double.

If you built your index(es) to avoid including the tags field, then you wouldn’t have one index entry per array item, so the savings would be greater (5 array items would cause 5 index entries to be created; the storage multiplies, but the writes are batched, so it depends on how many 1K writes need to be made).

An alternative approach would be to avoid indexing the tags field, and fetch its value whenever you actually need it. For example, you could Select(["data", "tags"], Get(ref)). Whether that costs you more than indexing the field would depend heavily on how often you need that value.

From a cost savings perspective, use the minimum number of indexes and index the minimum number of fields, that still supports your desired workflow.

I’m glad that you’re enjoying the tools that we provide! I look forward to your upcoming guide!

1 Like

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.