Full text search

What will full-text search look like in the first release, and in later releases? I'm hoping eventually to be able to generate a cross-collection index and map fields from each collection to fields in that cross-collection search index. For example, name may be a searchable field in the cross-collection index; in a user collection it might map to username, and in an article collection it might map to title, and so on. You could then give name a weight in the cross-collection index and search on it or on all fields, using paradigms similar to Algolia/Elasticsearch/Azure Cognitive Search etc.

We can't say that for sure yet (especially not me, since I'm not on the core team), but I think the question deserves an answer. Disclaimer: this is my personal opinion; I don't have the information you need yet. That said, your feedback has been noted, thanks! :slight_smile:

Will it be cross-collection search?

What you are describing (cross-collection search) is a very specific use case that databases rarely offer (or do they?). Searching also comes in many forms:

  • Fuzzy search on full-blown long documents (e.g. searching through large bodies of text)
  • Exact autocompletion
  • Slightly fuzzy autocompletion (example in fwitter: https://github.com/fauna-brecht/fwitter-node)
  • Fuzzy autocompletion (can be implemented by applying NGram both to the searched fields in your index and to the search value, combined with Union)
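To make the n-gram approach from the last bullet more concrete, here is a minimal TypeScript sketch that simulates it in memory rather than in FQL: each document's field is split into trigrams (roughly what an index binding using NGram would emit), the search value is split the same way, and the matching document ids are unioned. The function names and sample data are hypothetical.

```typescript
/** Distinct trigrams (3-character substrings) of a lowercased string. */
function trigrams(text: string): string[] {
  const s = text.toLowerCase();
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= s.length; i++) {
    grams.add(s.slice(i, i + 3));
  }
  return Array.from(grams);
}

/** Inverted index: trigram -> ids of documents whose field contains that trigram. */
function buildTrigramIndex(docs: Map<string, string>): Map<string, Set<string>> {
  const index = new Map<string, Set<string>>();
  docs.forEach((value, id) => {
    for (const gram of trigrams(value)) {
      if (!index.has(gram)) index.set(gram, new Set<string>());
      index.get(gram)!.add(id);
    }
  });
  return index;
}

/** Split the search value into trigrams and union the matches for each one. */
function fuzzySearch(index: Map<string, Set<string>>, query: string): Set<string> {
  const result = new Set<string>();
  for (const gram of trigrams(query)) {
    (index.get(gram) ?? new Set<string>()).forEach((id) => result.add(id));
  }
  return result;
}

// Usage with hypothetical user names.
const users = new Map<string, string>([
  ["u1", "brecht"],
  ["u2", "bright"],
  ["u3", "alice"],
]);
const trigramIndex = buildTrigramIndex(users);
console.log(Array.from(fuzzySearch(trigramIndex, "brekht"))); // [ 'u1' ]: the typo still shares "bre"
```

A plain union returns every document that shares even one trigram with the search value, which is what makes it tolerant of typos; ranking results by the number of shared trigrams is a common refinement, though not something described in this thread.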

Then there is a choice to:

  • Update search indexes transactionally when data changes.
  • Update search indexes asynchronously in batches (this is how most search engines work: you can't assume something is searchable from the moment you index it)

Depending on:

  • the size of your documents
  • how frequently they are accessed (and therefore the price/performance considerations)
  • the user experience you envision

you might choose a different approach for your use case. Elasticsearch is a prime example of a system that provides many ways to do these things.
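To illustrate the transactional-versus-batched choice, here is a minimal in-memory TypeScript sketch. The `SearchIndex` class, its methods and its simple whitespace tokenizer are hypothetical and not a FaunaDB API; updates and deletes of previously indexed text are ignored for brevity.

```typescript
// Hypothetical sketch (not a FaunaDB API): one word index maintained either
// transactionally (on every write) or asynchronously in batches.
type Mode = "transactional" | "batched";

class SearchIndex {
  private readonly mode: Mode;
  private postings = new Map<string, Set<string>>(); // word -> document ids
  private pending: Array<[string, string]> = []; // writes not yet indexed

  constructor(mode: Mode) {
    this.mode = mode;
  }

  /** Store a document: index it right away, or queue it for the next batch. */
  write(id: string, text: string): void {
    if (this.mode === "transactional") {
      this.indexText(id, text); // the write pays the indexing cost now
    } else {
      this.pending.push([id, text]); // the write stays cheap; indexing is deferred
    }
  }

  /** Index every queued write in one go (e.g. from a periodic job). */
  flush(): void {
    this.pending.forEach(([id, text]) => this.indexText(id, text));
    this.pending = [];
  }

  /** Exact word lookup; ids are sorted for stable output. */
  search(word: string): string[] {
    return Array.from(this.postings.get(word.toLowerCase()) ?? new Set<string>()).sort();
  }

  // Split on whitespace and add the id to each word's posting set.
  // Removing words of updated or deleted documents is omitted for brevity.
  private indexText(id: string, text: string): void {
    text
      .toLowerCase()
      .split(/\s+/)
      .filter((word) => word.length > 0)
      .forEach((word) => {
        if (!this.postings.has(word)) this.postings.set(word, new Set<string>());
        this.postings.get(word)!.add(id);
      });
  }
}

const batched = new SearchIndex("batched");
batched.write("a1", "Full text search");
console.log(batched.search("search")); // []: not searchable until the batch runs
batched.flush();
console.log(batched.search("search")); // [ 'a1' ]
```

The batched mode keeps writes cheap at the cost of a window in which new data cannot be found, which is exactly the trade-off described above.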

You can already implement most (all?) of these yourself in FaunaDB

Before we continue, I want to make sure that users who read this realise that you can already implement many search use cases on top of FaunaDB.

Arguably, you have even more control over how your search behaves than with a dedicated search index. This is similar to the reasoning this author gives for implementing a custom geosearch. A few examples:

  • You could implement prefix search in FaunaDB by generating an array of prefixes in an index binding and writing a query that matches on those prefixes.
  • You could use NGram to divide your document attributes into parts of words and search on those parts.
  • You could use NGram to divide your document attributes into trigrams, divide your search value into trigrams as well, and take the Union of a Match on each trigram.
  • You can write indexes over multiple collections, which covers your cross-collection need.
  • You could choose an index with bindings that updates immediately when the underlying documents are written (this might of course slow down writes; everything in IT is a tradeoff), or you could update a separate collection for searching with ChangeSets (perfectly feasible, and actually very efficient with our temporality constructs) to get eventually consistent search that updates in batches and does not impact your writes.
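The first and fourth ideas can be combined to answer the original cross-collection question. The TypeScript sketch below is an in-memory model, not FQL: a per-collection "binding" maps a collection-specific field (`username` in users, `title` in articles) to a shared name term and emits every prefix of it, so a prefix search becomes one exact lookup. The collection names, field mapping and `minLength` parameter are assumptions for illustration.

```typescript
// Hypothetical in-memory model of a cross-collection prefix index (not FQL).
interface Doc {
  collection: string;
  id: string;
  data: Record<string, string>;
}

// Assumed mapping: which field feeds the shared `name` term in each collection.
const nameField: Record<string, string> = { users: "username", articles: "title" };

/** "Binding": every prefix of the mapped field, from minLength up to the full value. */
function namePrefixes(doc: Doc, minLength = 1): string[] {
  const field = nameField[doc.collection];
  if (field === undefined) return []; // collection not mapped into this index
  const value = (doc.data[field] ?? "").toLowerCase();
  const prefixes: string[] = [];
  for (let end = minLength; end <= value.length; end++) {
    prefixes.push(value.slice(0, end));
  }
  return prefixes;
}

/** Build prefix -> "collection/id" refs across all collections. */
function buildPrefixIndex(docs: Doc[], minLength = 1): Map<string, string[]> {
  const index = new Map<string, string[]>();
  docs.forEach((doc) => {
    namePrefixes(doc, minLength).forEach((prefix) => {
      const refs = index.get(prefix) ?? [];
      refs.push(`${doc.collection}/${doc.id}`);
      index.set(prefix, refs);
    });
  });
  return index;
}

/** Prefix search is now a single exact lookup on the search term. */
function searchByPrefix(index: Map<string, string[]>, term: string): string[] {
  return index.get(term.toLowerCase()) ?? [];
}

const docs: Doc[] = [
  { collection: "users", id: "1", data: { username: "databrecht" } },
  { collection: "articles", id: "7", data: { title: "Data modelling in Fauna" } },
  { collection: "users", id: "2", data: { username: "alice" } },
];
const prefixIndex = buildPrefixIndex(docs);
console.log(searchByPrefix(prefixIndex, "Data")); // [ 'users/1', 'articles/7' ]
```

The cost moves to write time: each document stores one term per prefix, which is why a minimum prefix length is a reasonable knob to keep the index small.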

What will be offered out-of-the-box?

Elasticsearch specialises in search; we specialise in something more low-level (you could build an Elasticsearch on top of FaunaDB). Still, we do feel there should be search functionality available, especially to support fuzzy attribute matching in GraphQL without having to dive into FQL. I can't tell you what that would look like, but I assume (read: personal opinion) that it will, at least at first, be limited to a specific attribute on a specific collection.

After all, it makes much more sense to give people the building blocks to do whatever they want, and to build those blocks very well.

Ready to dive in for a custom implementation?

That said, feel free to ask me how to implement a specific use-case, I’m happy to help!

The link is broken; this is the one that works: https://github.com/fauna-brecht/fwitter

Simple text search can be implemented with ContainsStr.

There’s an example of that in this article:

As @databrecht mentioned, you could also use NGram to split up a string and search the resulting array. There's also an example of that in the article I linked.

If that is not sufficient, you can also use FindStrRegex to match regex patterns.

Those 3 options should cover the majority of use cases.
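As a rough mental model, ContainsStr and FindStrRegex act as per-document predicates inside a Filter over a page of results. The TypeScript sketch below only mimics those predicates in memory; the function names and sample data are hypothetical, and FQL's regex dialect may not match JavaScript's exactly.

```typescript
// Sketch of a Filter over one page with ContainsStr- or FindStrRegex-style
// predicates: every document in the page is tested, so the cost is O(n).
interface Article {
  id: string;
  title: string;
}

/** Case-insensitive substring test, comparable to ContainsStr on a lowercased field. */
function containsFilter(page: Article[], needle: string): Article[] {
  const lowered = needle.toLowerCase();
  return page.filter((article) => article.title.toLowerCase().includes(lowered));
}

/** Regex test, comparable in spirit to FindStrRegex (regex dialects may differ). */
function regexFilter(page: Article[], pattern: string): Article[] {
  const re = new RegExp(pattern, "i"); // no "g" flag, so test() keeps no state between calls
  return page.filter((article) => re.test(article.title));
}

const page: Article[] = [
  { id: "1", title: "Full text search in FaunaDB" },
  { id: "2", title: "Geosearch tricks" },
  { id: "3", title: "Pagination basics" },
];
console.log(containsFilter(page, "search").map((a) => a.id)); // [ '1', '2' ]
console.log(regexFilter(page, "^(full|pagination)\\b").map((a) => a.id)); // [ '1', '3' ]
```

Because every document in the page is examined, this style is simple and flexible but scales linearly with the number of documents scanned.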

That's very valuable, thank you!
Is ContainsStr a new function?

AFAIK it has existed since 2.6

Any timeline for the release of full-text search?

This can help for small collections. However, for a large amount of data you need to "somehow" iterate over all the pages inside the Filter function to evaluate every document, which can lead to performance issues since we end up with linear complexity.

Is there any workaround for this? I'm currently in a situation where I have to search inside a collection with 1,000+ documents.

You can have pages with up to 100,000 items with the size setting of Paginate:

If you need more items than that, you can use the NGram solution with an index binding and an index search term. There's an example at the end of this article in the section "Filtering by any letter":
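The idea behind the binding-plus-term approach can be sketched in TypeScript as an in-memory simulation (not the article's code and not FQL): every substring of the field is computed once at write time, so a "contains" query becomes a single index lookup instead of a scan over every page. The names and the `maxLen` cap are hypothetical; the cap bounds how many terms each document produces.

```typescript
// Sketch: precompute every substring (n-grams of length 1..maxLen) of a field at
// write time, so a "contains" query is one exact term lookup, not a full scan.

/** All distinct substrings of `value` with length between 1 and maxLen. */
function allNgrams(value: string, maxLen: number): string[] {
  const s = value.toLowerCase();
  const grams = new Set<string>();
  for (let len = 1; len <= Math.min(maxLen, s.length); len++) {
    for (let start = 0; start + len <= s.length; start++) {
      grams.add(s.slice(start, start + len));
    }
  }
  return Array.from(grams);
}

/** term -> ids, built once per write (in Fauna this work would live in an index binding). */
function buildSubstringIndex(
  docs: Array<{ id: string; name: string }>,
  maxLen: number,
): Map<string, string[]> {
  const index = new Map<string, string[]>();
  docs.forEach(({ id, name }) => {
    allNgrams(name, maxLen).forEach((gram) => {
      const ids = index.get(gram) ?? [];
      ids.push(id);
      index.set(gram, ids);
    });
  });
  return index;
}

/** Terms longer than maxLen were never indexed, so they return []. */
function containsLookup(index: Map<string, string[]>, term: string): string[] {
  return index.get(term.toLowerCase()) ?? [];
}

const people = [
  { id: "1", name: "Brecht" },
  { id: "2", name: "Albrecht" },
  { id: "3", name: "Alice" },
];
const substringIndex = buildSubstringIndex(people, 10);
console.log(containsLookup(substringIndex, "rech")); // [ '1', '2' ]
```

The number of stored terms grows roughly with field length times `maxLen`, so this trades storage and write cost for reads that no longer depend on collection size.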

Are you guys still planning on implementing full-text search?

It might also be interesting to develop a direct integration with a service like Meilisearch or Elasticsearch. That would provide the requested functionality, and the integration could directly cover the import/update/remove and security logic.

Meilisearch is getting so much traction right now!
It would be wonderful to have an integration like the one Mike suggested.

Maybe look into an integration with Typesense as well.

@Mike @zvictor @man2xxl Regarding connecting with external systems, we are partnering with Airbyte to provide a Fauna connector. The connector is open source and provides a way to perform full or incremental sync operations.

Airbyte has connectors for Meilisearch and Typesense, so you might be interested in the Airbyte connector.

The Fauna Airbyte Connector is now available as a cloud connector as well. Check it out here