Filtering and deleting duplicates with a specific value

Heya, I am pretty new to FaunaDB so this question may seem trivial but I can’t seem to solve this issue I am having and maybe someone can help.
I am building a jobs aggregator: a program pulls in jobs data from multiple sources.
Jobs are sometimes advertised in multiple sources, so my Fauna collection contains duplicate documents. Not every key of a document needs to be unique, so I need to remove duplicate documents by a particular set of keys (job title, location, company name, etc.) while ignoring others (apply URL). Additionally, I am using pagination on the front end.
I am trying to remove the duplicate jobs by filtering the collection so that it only contains jobs with a unique set of values (job title, location, company name, etc.) and deleting any duplicate documents.
What I have attempted
I have tried to get the distinct values from the collection (How to get distinct values from documents of a given collection?) but since this returns an array, I can’t use the “after” parameter of Paginate.

Hi Hitesh,

This is an interesting question. It would help if you could share some example documents you're working with and, if you already have one or more indexes defined, share those as well.

That said, building the index around the terms you’re trying to filter on and using Distinct() on those terms to get only a single reference seems like your best path forward. Whether you delete the references not returned or just rely on the index to only return one of them would be up to you.
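
To make the "unique set of keys" idea concrete, here is a rough client-side sketch in plain Python. It is not FQL and does not call the Fauna driver; it only illustrates the logic of keeping the first document per key and collecting the rest as candidates for deletion. The field names (title, location, company, apply_url), the id values, and the helper split_duplicates are all assumptions made up for illustration, since no example documents have been shared yet.

```python
from typing import Iterable

# Hypothetical field names; adjust these to match the real documents.
KEY_FIELDS = ("title", "location", "company")


def split_duplicates(docs: Iterable[dict], key_fields=KEY_FIELDS):
    """Return (kept, duplicates): the first doc seen per key, and the rest."""
    seen = set()
    kept, duplicates = [], []
    for doc in docs:
        # Normalise values so "Acme " and "ACME" count as the same key.
        key = tuple(str(doc.get(f, "")).strip().lower() for f in key_fields)
        if key in seen:
            duplicates.append(doc)
        else:
            seen.add(key)
            kept.append(doc)
    return kept, duplicates


# Made-up sample data: jobs 1 and 2 differ only in case, whitespace and apply_url.
jobs = [
    {"id": "1", "title": "Data Engineer", "location": "Berlin",
     "company": "Acme", "apply_url": "https://a.example/1"},
    {"id": "2", "title": "data engineer", "location": "Berlin ",
     "company": "ACME", "apply_url": "https://b.example/9"},
    {"id": "3", "title": "Data Engineer", "location": "Paris",
     "company": "Acme", "apply_url": "https://a.example/2"},
]

kept, duplicates = split_duplicates(jobs)
print([d["id"] for d in kept])        # ids of documents to keep
print([d["id"] for d in duplicates])  # ids of documents that could be deleted
```

Whether the values should be normalised (lower-cased and trimmed) before comparing is a design choice that depends on how consistent the sources are. An index built on the same fields would play the role of the seen set on the database side.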

