Excluding documents from index

Hi there!

I have a use case where I’d like to index only documents that fulfil a certain condition; all other documents should be excluded from the index entirely.

I suspect this is possible using bindings. I saw “Find Documents when structure changes”, which seems related. However, I’d prefer not to add an extra term to my index. Ideally, the index would simply leave out the documents I want to exclude, without affecting how the index is queried.

Is there a way to do this?

I just learned that it’s possible to create indices for multiple collections. This is an appropriate solution for my use case (some indices will consider documents in either of two collections, while other indices only consider documents in one collection).

I’m still interested in an answer to my original question, though.

Hi Felix,

I’m not quite sure what you mean by “documents that fulfill a certain condition”. If you mean indexing only documents whose name attribute starts with “B”, or something like that, then yes, the answer would be to use an index binding. If you’re only interested in documents that have a name attribute at all, then you would just add name to the terms of the index.

Can you share more details about what kind of conditions you’re trying to limit your search to?

Cory

Thanks, Cory, let me elaborate a bit.

The use case is that some documents in my collection can be flagged as “obsolete”:

{
    "id": 12345,
    "foo": "bar",
    "obsolete": true // or false or absent
}

Now I would, for example, like to have an index that includes only those documents where obsolete is false or absent, and whose only term would be for the id field.

How would I declare such an index? Would I declare a binding that evaluates to null if obsolete is false or absent, and then use that binding as one of the index’s values? Would that exclude the document from the index?

Hi Felix,

Thanks for that additional context. In this case you would need two indexes: one that matches when obsolete is false, and one that matches when obsolete is not present. Then run a Union over the two.

We’ll assume you already have an index named is_obsolete whose term is the obsolete field, so it can match either true or false. The index with a binding for the case where obsolete is missing would be:

CreateIndex({
  name: "null_obsolete",
  source: [{
    collection: Collection("collection"),
    fields: {
      // true when the document has no obsolete field, false otherwise
      null_obsolete: Query(
        Lambda(
          "doc",
          Equals(Select(["data", "obsolete"], Var("doc"), null), null)
        )
      )
    }
  }],
  // query with Match(Index("null_obsolete"), true)
  terms: [ {binding: "null_obsolete"} ]
})

And then the Union would be:

Paginate(Union(
  Match(Index("null_obsolete"), true),  // obsolete is missing
  Match(Index("is_obsolete"), false)    // obsolete is false
))

In my case, the results look like:

{
  data: [
    Ref(Collection("collection"), "301569487722775048"),
    Ref(Collection("collection"), "301569507541910024")
  ]
}

Obviously your results will vary.
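To make the reasoning behind the two indexes concrete, here is a plain JavaScript model of the approach. It is not FQL and does not talk to Fauna; it only mimics the behavior described above, under the assumptions that is_obsolete uses the obsolete field as its term and that a document whose term value is missing gets no entry in that index. The sample documents and refs are hypothetical.

```javascript
// Plain-JavaScript model (NOT Fauna/FQL code) of the two-index Union approach.
// Assumption: a document whose term value is missing (null) gets no index entry.

// Build an index as a Map from term value to the refs stored under that term.
function buildIndex(docs, termFn) {
  const index = new Map();
  for (const doc of docs) {
    const term = termFn(doc);
    if (term === null || term === undefined) continue; // null term: no entry
    if (!index.has(term)) index.set(term, []);
    index.get(term).push(doc.ref);
  }
  return index;
}

// Stand-in for Match(Index(name), term): the refs stored under that term.
function match(index, term) {
  return index.get(term) ?? [];
}

// Stand-in for Union: distinct refs from all the given sets.
function union(...sets) {
  return [...new Set(sets.flat())];
}

// Assumed term of "is_obsolete": the obsolete field itself (missing -> null).
const isObsoleteTerm = (doc) => doc.data.obsolete ?? null;
// Mirrors the null_obsolete binding: true when obsolete is missing.
const nullObsoleteTerm = (doc) => (doc.data.obsolete ?? null) === null;

// Hypothetical sample documents.
const docs = [
  { ref: "1", data: { foo: "a", obsolete: true } },
  { ref: "2", data: { foo: "b", obsolete: false } },
  { ref: "3", data: { foo: "c" } },
];

const isObsolete = buildIndex(docs, isObsoleteTerm);
const nullObsolete = buildIndex(docs, nullObsoleteTerm);

const notObsolete = union(match(nullObsolete, true), match(isObsolete, false));
console.log(notObsolete); // [ '3', '2' ]
```

Because a document with no obsolete field never gets an entry in is_obsolete (under the assumption above), the null_obsolete index is what picks it up, and the Union combines both sets.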


@Felix This can be done with a single index. If a binding returns null, that value is not saved in the index. You can conditionally return either null or the document’s Ref depending on some condition, and avoid using any terms at all.

CreateIndex({
  name: "not_obsolete",
  source: [
    {
      collection: Collection("my_collection"),
      fields: {
        ref_if_not_obsolete: Query(
          Lambda(
            "doc",
            Let(
              {
                // assume that "not present" means "not obsolete"
                is_obsolete: Select(["data", "obsolete"], Var("doc"), false)
              },
              If(
                Var("is_obsolete"),
                
                // if obsolete, set the binding to null so that it is not saved
                null, 
                
                // if not obsolete, save the Document Ref
                Select("ref", Var("doc"))
              )
            )
          )
        )
      }
    }
  ],
  
  // Provide the binding (which is the Ref ONLY if not obsolete)
  // A resulting Match will look like a default index (returns just the Refs)
  values: [{ binding: "ref_if_not_obsolete" }]
})

and query it like this:

Paginate(Match(Index("not_obsolete")))
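The single-index version can be modeled the same way. Again, this is a plain JavaScript sketch, not FQL: the hypothetical refIfNotObsolete function mirrors the ref_if_not_obsolete binding, and the model assumes, as stated above, that a document whose only value is null gets no entry. The documents and refs are hypothetical.

```javascript
// Plain-JavaScript model (NOT Fauna/FQL code) of the single "not_obsolete" index.
// Assumption: with no terms, a document whose only value is null is not stored.

// Mirrors the ref_if_not_obsolete binding: the ref, or null if obsolete.
function refIfNotObsolete(doc) {
  const isObsolete = doc.data.obsolete ?? false; // missing means "not obsolete"
  return isObsolete ? null : doc.ref;
}

// Build a values-only index: one entry per non-null binding result.
function buildValuesIndex(docs, bindingFn) {
  const entries = [];
  for (const doc of docs) {
    const value = bindingFn(doc);
    if (value !== null) entries.push(value); // null binding: document skipped
  }
  return entries;
}

// Hypothetical sample documents.
const docs = [
  { ref: "1", data: { foo: "a", obsolete: true } },
  { ref: "2", data: { foo: "b", obsolete: false } },
  { ref: "3", data: { foo: "c" } },
];

// Stand-in for Paginate(Match(Index("not_obsolete"))): every stored entry.
const notObsolete = buildValuesIndex(docs, refIfNotObsolete);
console.log(notObsolete); // [ '2', '3' ]
```

Compared with the two-index Union, this keeps the obsolete check inside the index definition, so the query stays a plain Match with no terms.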

Example

With some data like this:

[image: sample documents]

and the provided index, the result looks like this:

[image: query result]

