How to get distinct values from documents of a given collection?

Let say I have a collection ‘people’ with following documents.
{
first: “Grace”,
last: “Hopper”,
degrees: [“BA”, “MA”, “PhD”],
letter: “C”
}

{
first: “Stephen”,
last: “Cook”,
degrees: [“BS”, “PhD”],
letter: “F”
}

{
first: “Leslie”,
last: “Lamport”,
degrees: [“BS”, “MA”, “PhD”]
}

{
first: “Grace”,
last: “Turing”,
degrees: [“BA”, “MA”, “MS”, “PhD”],
letter: “B”
}

How do I get distinct ‘first’ i.e. Garce,Stephen,Leslie from above collection?

1 Like

Hello, you might need to create an Index over the first field as values.

CreateIndex({
  name: 'myIndex',
  source: Collection('people'),
  values: [{ field: ['data', 'first'] }],
})

It will return all first values. Then you can use Paginate(Distinct(Match(Index('myIndex')))).
The issue is that you don’t have the ref, which can be more complicated, but using Join and an other Index or finding an other way might be possible. But I guess I don’t know what you really want for the next steps.

1 Like

I am going into same direction right now, although I need only distinct values which you specified for now, I am sure in very near future I will need ref for sure. So trying to build something which returns ref too. Thanks for the reply though…

The only thing I can think of is :

Map(
  Paginate(
    Distinct(
      Match(
        Index("myIndex"),
      )
    )
  ),
  x => Select('ref', Get(Match(Index('myIndexByFirst'), x)))
)

But I don’t find this query elegant. If someone has a better way, might be great to share !

@n44ps @gschauhan, if you need a distinct name and ref, better include the ref in the Index like this.

CreateIndex({
  name: 'myIndex',
  source: Collection('people'),
  values: [{ field: ['data', 'first'] }, { field: ['ref'] }],
})

This query can miss some entries. Consider the below sample data and Distinct(Match(Index("myIndex"))) will have only one GRACE and Select('ref', Get(Match(Index('myIndexByFirst'), x))) will only fetch one entry while you wanted two.

{
first: “Grace”,
last: “Turing”,
degrees: [“BA”, “MA”, “MS”, “PhD”],
letter: “B”
}
first: “Grace”,
last: “Hopper”,
degrees: [“BA”, “MA”, “PhD”],
letter: “C”
}

From my understanding, @gschauhan wanted to only get the first Grace by saying “Distinct”.
So the end result if I’m not wrong would be :

[{
    first: 'Grace',
    last: 'Hopper',
    degrees: ['BA', 'MA', 'PhD'],
    letter: 'C'
},
{
    first: 'Stephen',
    last: 'Cook',
    degrees: ['BS', 'PhD'],
    letter: 'F'
},
{
    first: 'Leslie',
    last: 'Lamport',
    degrees: ['BS', 'MA', 'PhD']
}]

So my previous query was written on purpose to skip other Documents with the same first value ^^

@n44ps No worries, good discussion and brainstorming. I appreciate your contribution to the Fauna Community. Up to @gschauhan to design per their need. :slight_smile:

2 Likes

This is what I was looking for. Requirements changed and we went with somewhat different implementation.

Very good point. We need distinct by given field irrespective of other fields, but this really helped and cleared some doubts.