Getting distinct results per element of an index result

Hi,

I’m implementing a search for a booking system. User can search for bookings (which are called offers in the index names) using multiple combinations of search filters. I’ve built the logic very close to the “Option 3” in this article.

I use separate indexes for each criterion I want to search for, like:

{
  name: "offers_by_country",
  unique: false,
  serialized: true,
  source: "offers",
  terms: [
    {
      field: ["data", "startBase", "country"],
      transform: "casefold"
    }
  ],
  values: []
}

and I have one which I use to join it by ref getting me all the values I want to display a search result.

{
  name: "offers_all_values_by_ref",
  unique: false,
  serialized: true,
  source: "offers",
  terms: [
    {
      field: ["ref"]
    }
  ],
  values: [
    {
      field: ["data", "x"]
    },
    {
      field: ["data", "y"]
    },
    {
      field: ["data", "z"]
    },
    {
      field: ["data", "uid"]
    }
  ]
}

I join each offers_by_foo index with the offers_all_values_by_ref index as in:

  q.Join(
    q.Match(q.Index("offers_by_country"), q.Casefold("italy")),
    q.Lambda(
      ["ref"],
      q.Match(q.Index("offers_all_values_by_ref"), q.Var("ref"))
    )
  )

push those different joins into an array, and I query in the end using an Intersection of the array items.

This works well.

But I get multiple results with the same uid (see the offers_all_values_by_ref's last value). This is normal as they are different offers. But for my use case, I want a single result per uid. They are already sorted by price, so I’m happy with getting the first one.

I hope I haven’t put myself into a corner using this technique but I can’t find an easy way to get what I want. I’d appreciate any help.

Thanks

@fred I am having trouble to understand your question. Do you mind sharing all the Index definition and Query you are trying to run ? offers_by_foo index details seems missing.

Are you trying to implement Get me the first offers document (sorted by price) for a given country and uid ?

Sorry @Jay-Fauna,

The reason I used the index name offers_by_foo is I have lots of indexes which have the exact same structure. I used foo as a placeholder. They are for example, offers_by_country, offers_by_name, offers_by_year, offers_by_type, etc. I use Intersection to combine them in the end.

offers_by_country is one of them.

These indexes each take 1 term, and they return ref as value.

Reading my question again, I realise I unnecessarily complicated it.

If I can get an answer to the question: Get me the list of offers documents (sorted by price) for a given country where there is only 1 offer per uid

With the following indexes and the query, I think I can manage the rest:

Index: offers_by_country

{
  name: "offers_by_country",
  unique: false,
  serialized: true,
  source: "offers",
  terms: [
    {
      field: ["data", "startBase", "country"],
      transform: "casefold"
    }
  ],
  values: []
}

Index: offers_all_values_by_ref

{
  name: "offers_all_values_by_ref",
  unique: false,
  serialized: true,
  source: "offers",
  terms: [
    {
      field: ["ref"]
    }
  ],
  values: [
    {
      field: ["data", "price"]
    },
    {
      field: ["data", "uid"]
    }
  ]
}

Note: offers_all_values_by_ref contains many more fields between price and uid. I removed them for simplification.

Query:

q.Paginate(
  q.Join(
    q.Match(q.Index("offers_by_country"), q.Casefold("croatia")),
    q.Lambda(
      ["ref"],
      q.Match(q.Index("offers_all_values_by_ref"), q.Var("ref"))
    )
  )
)

Thanks

As a further simplification, I think the last query can be even reduced to:

q.Paginate(
  q.Join(
    q.Match(q.Index("offers_by_country"), q.Casefold("croatia")),
    q.Index("offers_all_values_by_ref")
  )
)

But I’m still stuck with getting a single offer per uid
I also tried to approach with the tools @databrecht has given in this answer . But no luck, at least for me

@fred giving an another try to explain. I am assuming your offers data looks like this.

Country Name Year Price uid
US Jay 1990 1000 aaa
US Jay 1900 1500 aaa
US Jay 1900 400 aaa
France Bob 2020 1000 bbb
France Bob 2020 1500 bbb
France Bob 2000 400 ccc
US Julie 2020 800 ddd
US Julie 2020 5000 ddd
US Julie 2000 400 yyy
Aus Jay 2020 10000 zzz
Aus Jay 1990 6000 zzz

Query - Get all offers for a given Country and Name and Year group by uid, sorted by price desc and ranking = 1. So for Name = 'Jay and 'Year = 1900, results should be

Country Name Year Price uid
US Jay 1900 1500 aaa
Aus Jay 2020 10000 zzz

Few Indexes

CreateIndex({
  name: "offers_by_country",
  source: Collection("offers"),
  terms: { field: ["data", "Country"] },
  values: [
    { field: ["data", "Country"] },
    { field: ["data", "uid"] }
  ]
}
)
CreateIndex({
  name: "offers_by_name",
  source: Collection("offers"),
  terms: { field: ["data", "Name"] },
  values: [
    { field: ["data", "Country"] },
    { field: ["data", "uid"] }
  ]
}
)
CreateIndex({
  name: "offers_by_year",
  source: Collection("offers"),
  terms: { field: ["data", "Year"] },
  values: [
    { field: ["data", "Country"] },
    { field: ["data", "uid"] }
  ]
}
)
CreateIndex({
  name: "offers_by_country_uid_sortby_price_desc",
  source: Collection("offers"),
  terms: [{ field: ["data", "Country"] }, {field: ["data", "uid"]} ],
  values: [
    { field: ["data", "Price"], reverse: true },
    { field: ["data", "uid"] },
    { field: ["data", "Country"] },
    { field: ["data", "Name"] },
    { field: ["data", "Year"] }
  ]
}
)

SQL query - SELECT Country, uid FROM Offers WHERE Year = "1990" AND Name = "Jay";

Select(["data"],
Paginate(
Intersection(
Match(Index("offers_by_year"),"1990"),
Match(Index("offers_by_name"),"Jay"),
)))

[
  ["Aus", "zzz"],
  ["US", "aaa"]
]

Next step is to use Index offers_by_country_uid_sortby_price_desc to get grouping and sorting.

Map(
Select(["data"],
Paginate(
Intersection(
Match(Index("offers_by_year"),"1990"),
Match(Index("offers_by_name"),"Jay"),
), {size:100000}))
,Lambda("x",Select(["data"],Paginate(Match(Index("offers_by_country_uid_sortby_price_desc"),Var("x")))))
)

[
  [
    [10000, "zzz", "Aus", "Jay", "2020"],
    [6000, "zzz", "Aus", "Jay", "1990"]
  ],
  [
    [1500, "aaa", "US", "Jay", "1900"],
    [1000, "aaa", "US", "Jay", "1990"],
    [400, "aaa", "US", "Jay", "1900"]
  ]
]

Next, you can add {size: 1} to the Pagination to get the first Offer.

Map(
Select(["data"],
Paginate(
Intersection(
Match(Index("offers_by_year"),"1990"),
Match(Index("offers_by_name"),"Jay"),
), {size:100000}))
,Lambda("x",Union(Select(["data"],Paginate(Match(Index("offers_by_country_uid_sortby_price_desc"),Var("x")), {size: 1}))))
)

[
  [10000, "zzz", "Aus", "Jay", "2020"],
  [1500, "aaa", "US", "Jay", "1900"]
]

Hope this explains how Map, Pagination, Union, Intersection, Select are used. FQL is powerful language and there are multiple ways to achieve a solution. In this case Map/Lambda/Pagination is a better choice than Join.

Hi @Jay-Fauna

Thanks very much, I’ve quickly checked this and appreciate your detailed explanation. I’ll try it and get back to you.

Just to be sure, I’m assuming on your second and third indexes (offers_by_name , offers_by_year) values should be respectively Name and Year, instead of Country right?

The values of these indexes are terms for the Index offers_by_country_uid_sortby_price_desc. So it should be Country and uid.

Thanks @Jay-Fauna

In your first query and its result:

Select(["data"],
Paginate(
Intersection(
Match(Index("offers_by_year"),"1990"),
Match(Index("offers_by_name"),"Jay"),
)))

[
  ["Aus", "zzz"],
  ["US", "aaa"]
]

Given your data, the result would be:

[
  ["Aus", "zzz"],
  ["US", "aaa"],
  ["US", "aaa"]
]

which breaks the uniqueness and end up with the ultimate result:

[
  [6000, "zzz", "Aus", "Jay", 1900],
  [400, "aaa", "US", "Jay", 1900],
  [400, "aaa", "US", "Jay", 1900]
]

I think wrapping your last query Distinct to the last query would solve that?

Apart from that, it all makes sense.

Thanks a lot

@Fred there was a small typo in data. Last row was

Aus Jay 1900 6000 zzz
should be

Aus Jay 1990 6000 zzz

Paginate(Match(Index("offers_by_name"),"Jay"))

{
  data: [
    ["Aus", "zzz"],
    ["Aus", "zzz"],
    ["US", "aaa"],
    ["US", "aaa"],
    ["US", "aaa"]
  ]
}

Paginate(Match(Index("offers_by_year"),"1990"))

{
  data: [
    ["Aus", "zzz"],
    ["US", "aaa"]
  ]
}

Paginate(Intersection((Match(Index("offers_by_name"),"Jay")), (Match(Index("offers_by_year"),"1990"))))

{
  data: [
    ["Aus", "zzz"],
    ["US", "aaa"]
  ]
}
1 Like

Hi @Jay-Fauna,

I think there might be a problem with the solution.

If you add the following document to the dataset:

{ "Country": "US",  "Name": "Jay",  "Year": "1990", "Price": "4000",  "uid": "aaa"}

The result is no longer unique by uid

@fred Mind sharing what you see ? This is what I see.

Map(
Select(["data"],
Paginate(
Intersection(
Match(Index("offers_by_year"),"1990"),
Match(Index("offers_by_name"),"Jay"),
), {size:100000}))
,Lambda("x",Union(Select(["data"],Paginate(Match(Index("offers_by_country_uid_sortby_price_desc"),Var("x")), {size: 1}))))
)

[
  [10000, "zzz", "Aus", "Jay", "2020"],
  [1500, "aaa", "US", "Jay", "1900"]
]

@Jay-Fauna

Sure, we only need two rows to show my case.

Country Name Year Price uid
US Jay 1990 1000 aaa
US Jay 1990 2000 aaa

To make sure we are on the same page, here are the scripts I run:

Do(
  CreateIndex({
    name: "offers_by_country",
    source: Collection("offers"),
    terms: [{ field: ["data", "Country"] }],
    values: [
      { field: ["data", "Country"] },
      { field: ["data", "uid"] }
  ]}),
  CreateIndex({
    name: "offers_by_name",
    source: Collection("offers"),
    terms: [{ field: ["data", "Name"] }],
    values: [
      { field: ["data", "Country"] },
      { field: ["data", "uid"] }
  ]}),
  CreateIndex({
    name: "offers_by_year",
    source: Collection("offers"),
    terms: [{ field: ["data", "Year"] }],
    values: [
      { field: ["data", "Country"] },
      { field: ["data", "uid"] }
  ]}),
  CreateIndex({
    name: "offers_by_country_uid_sortby_price_desc",
    source: Collection("offers"),
    terms: [{ field: ["data", "Country"] }, {field: ["data", "uid"]} ],
    values: [
      { field: ["data", "Price"], reverse: true },
      { field: ["data", "uid"] },
      { field: ["data", "Country"] },
      { field: ["data", "Name"] },
      { field: ["data", "Year"] }
  ]})
)
Do(
  Create(Collection("offers"),{
    data: {"Country": "US", "Name": "Jay", "Year": "1990", "Price": 1000, "uid": "aaa"}
  }),
  Create(Collection("offers"),{
    data: {"Country": "US", "Name": "Jay", "Year": "1990", "Price": 2000, "uid": "aaa"}
  })
)

And then the query:

Map(
Select(["data"],
Paginate(
Intersection(
Match(Index("offers_by_year"),"1990"),
Match(Index("offers_by_name"),"Jay"),
), {size:100000}))
,Lambda("x",Union(Select(["data"],Paginate(Match(Index("offers_by_country_uid_sortby_price_desc"),Var("x")), {size: 1}))))
)

gives me:

[
  [2000, "aaa", "US", "Jay", "1990"],
  [2000, "aaa", "US", "Jay", "1990"]
]

Thanks