Query data field, statistics

Hi,

I am trying to generate some statistics from match data (JSON:https://api.opendota.com/api/matches/5132512783)

I have a table of these matches and I am trying to find out for example which is the most picked hero or most banned which would need to be counted from using hero_id and picks_bans:
picks_bans": [
{
“is_pick”: false,
“hero_id”: 17,
“team”: 1,
“order”: 0,
“ord”: 0,
“match_id”: 5212816885
},
{
“is_pick”: false,
“hero_id”: 69,
“team”: 0,
“order”: 1,
“ord”: 1,
“match_id”: 5212816885
},

as well as check top stats such as most kills which would need to be extracted and counted from:
“benchmarks”: {
“gold_per_min”: {
“raw”: 390,
“pct”: 0.19154557463672392
},
“xp_per_min”: {
“raw”: 468,
“pct”: 0.12681638044914134
},
“kills_per_min”: {
“raw”: 0.05689900426742532,
“pct”: 0.08982826948480846
}

these are all part of the match JSON object and I am wondering if Fauna is even a good choice for these kind of applications?

Any help is greatly appreciated.

You probably want to tell us what kind of document we are looking at.
Are these properties on some kind of ‘player’ document?

I can already say this though.

Aggregations

There are limits in place due to the distributed/scalable nature. Since Fauna is an OLTP database and not an OLAP, these kinds of aggregations are (currently) not its strength. You can write it, but you’ll bump into pagination or computation limits probably when your data size grows. More about how to do that can be found in related posts such as:

In your case, for your aggregation you would probably (I didn’t think this through so it might not be the best way) start with all hero ids. Map over that and then per hero get all heroes that were picked by using an index on is_pick and hero_id as terms and count that. That way you will get all counts within a page and could sort externally… or could opt to continue paginating if the result of 100 000 max per page is not enough (you have more players) and then sort afterwards.

Alternative approach 1 - precalculate

Instead of aggregating on the fly, you could store aggregations. For example you could:

  • Opt to store a document per hero and increase the count each time a hero is picked. This is a tradeoff since if many users would write to that document at the same time, that’s a bad idea. There are techniques to reduce the amount of writes to the same document, e.g you could have 10 documents and randomly pick one.
  • Opt to store a count of picks in the document you provided above, which if I’m correct that this is a player, means you have a picked hero count per player. You could also store that in a separate document though that links back to player and hero. This is an intermediate solution, writes are separated over multiple documents (to which you write anyway) yet you have to count less compared to counting over all picks_bans in te array.

Alternative approach 2 - stream to external system

Stream data into an engine that is good in aggregations (RockSet / Clickhouse), you could use temporality to get changesets and stream it into such a system.

I don’t think I will be running in any limits, I am already filtering the matches on league_id (matches belonging to tournament) and then further to filter them on date/time + region they were played in. A tournament will at most have 500 matches and I am only looking to generate statistics once, after tournament concluded and store the report in a separate table in Fauna to use in a web-app.

The pick_bans is a property of the match document which contains all possible information about the match.

I was looking at: data aggregation and hoping to get similar to:

picks_bans: [
    { hero_id: '17', total_picks_bans: 10,  total_picks: 4,  total_bans: 6},
     { hero_id: '54', total_picks_bans: 5,  total_picks: 2,  total_bans: 3},
     { hero_id: '2', total_picks_bans: 22,  total_picks: 13,  total_bans: 9}
  ]

but I am not sure if my way of thinking is the best where I have an array variable of picks_bans and unique_pick_bans and map and filter for pick/ban?

I would avoid arrays personally, you are going to update that document for each user that selects a hero. That means, many writes to one document, all these writes have to put in a certain order (serialized) which will slow down writes. I would at the very east make a hero_statistics collection or something and store these separately per hero instead of using an array.

When I made an index with terms is_pick and hero_id, I think due to the picks_bans array being nested in the match document I get data duplication

For example if I create an Index “picks_bans_by_match_id” that returns hero_id and query it with match_id 5212816885 and is_pick false then I would get all hero_ids wether they are is_pick true OR false.

{
  name: "picks_bans_by_match_id",
  unique: false,
  serialized: true,
  source: "Matches",
  terms: [
    {
      field: ["data", "match_id"]
    },
    {
      field: ["data", "picks_bans", "is_pick"]
    }
  ],
  values: [
    {
      field: ["data", "picks_bans", "hero_id"]
    }
  ]
}

Would you happen to have a suggestion to fix that - am I creating indexes wrong?

Alternatively I have tried to create an Index with Bindings where I create a binding of filtered picks_bans where is_pick= true and then use that array binding picks as a term:

  name: "match_id_picks",
  source: {
    collection: Collection("Matches"),
    fields: {
      picks: Query(
        Lambda(
          "matchDoc",
          Filter(Select(["data","picks_bans"], Var("matchDoc")), Lambda("pick_ban", Equals(Select("is_pick", Var("pick_ban")), true)))
        )
      )
    }
  },
  terms: [
    {
      field: ["data", "picks","hero_id"]
    }
  ],
  values: [
    { binding: "picks"}
  ]
})

Unfortunately, I am not getting any data returned for some reason?

Hey @Fyriif , since your questions are very domain specific, try to provide the full definition each time and try to make sure that we can just run the queries to get the data you have.

CreateCollection(...)
CreateIndex(...)
Create(Collection(..),  < your data> )

And test whether it gets created correctly. That will make it much faster for us to respond to your questions. The answer to this particular question is that your index is incorrect, you have written “data” “picks” “hero_id” while according to your data model it needs to be “data” “picks_bans” “hero_id”.

Try to reduce your tests to simpler cases in case you are stuck. For example, try to test first with:

CreateIndex({
  name: "match_id_picks2",
  source: {
    collection: Collection("test")
  },
  terms: [
    {
      field: ["data", "picks", "hero_id"]
    }
  ]
}

and you would have seen that this one already doesn’t work and that it’s not necessarily related to your binding. Once you fix that, separate out your binding and test it separately for a given document e.g.

Filter(
  Select(
    ["data", "picks_bans"],
    Get(Ref(Collection("test"), "291325376922124801"))
  ),
  Lambda("pick_ban", Equals(Select("is_pick", Var("pick_ban")), true))
)

Clearly your binding is correct.

[
  {
    is_pick: true,
    hero_id: 30,
    team: 0,
    order: 1,
    ord: 1,
    match_id: 5050331668
  }
]

Hi,

I think I explained my problem wrong so I do apologize:

In my collection I have for example two documents:

{
  match_id: 5132545461,
  picks_bans: [
    {
      is_pick: true,
      hero_id: 105,
      team: 0,
      order: 0,
      ord: 0,
      match_id: 5132545461
    },
    {
      is_pick: false,
      hero_id: 66,
      team: 1,
      order: 1,
      ord: 1,
      match_id: 5132545461
    },
  ],
}

and

{
  match_id: 5132545462,
  picks_bans: [
    {
      is_pick: true,
      hero_id: 115,
      team: 0,
      order: 0,
      ord: 0,
      match_id: 5132545461
    },
    {
      is_pick: true,
      hero_id: 66,
      team: 1,
      order: 1,
      ord: 1,
      match_id: 5132545461
    },
  ],
}

So I create an index that should return document that has int hero_id and bool is_pick i.e hero_id 66, is_pick true which should return 1 of the documents.

{
  name: "hero_id_is_picked",
  unique: false,
  serialized: true,
  source: "Matches",
  terms: [
    {
      field: ["data", "picks_bans", "hero_id"]
    },
    {
      field: ["data", "picks_bans", "is_pick"]
    }
  ]
}

but the index would return two of the documents, ignoring the boolean value.

One of the solutions I tried to do was to create a binding index to filter the picks_bans to create array of just picks or bans (depending on index) like so:

CreateIndex({
  name: "picks_hero_id",
  source: {
    collection: Collection("Matches"),
    fields: {
      picks: Query(
        Lambda(
          "matchDoc",
          Filter(Select(["data","picks_bans"], Var("matchDoc")), Lambda("pick_ban", Equals(Select("is_pick", Var("pick_ban")), true)))
        )
      )
    }
  },
  terms: [
    { binding: ["picks","hero_id"]}
  ],
  values: [
    { binding: "picks"}
  ]
})

But this returns no data matches.

Ultimately my goal is to filter match documents on other criteria such as league_id, start_time and cluster and then see how many of certain hero_id picks or bans there are.

Also my current solution is to do the follows:

Let(
  {
    matches: Map(
      Paginate(Match(Index("matches_by_leagueid"), 11517)),
      Lambda("ref", Get(Var("ref")))
    ),
    pbs_data: Reduce(Lambda(["accumulator", "value"], Append(Var("value"), Var("accumulator"))), Array(), Map(Var("matches"),Lambda("match", Select(["data", "picks_bans"], Var("match"))))),
    heroes_picks: Map(Filter(Select(['data', 0], Var("pbs_data")), Lambda("pick_ban", Equals(Select("is_pick", Var("pick_ban")), false))), Lambda("pick", Select("hero_id", Var("pick")))),
    heroes_bans: Map(Filter(Select(['data', 0], Var("pbs_data")), Lambda("pick_ban", Equals(Select("is_pick", Var("pick_ban")), false))), Lambda("pick", Select("hero_id", Var("pick")))),
    unique_heroes: Distinct(Union(Var("heroes_picks"),Var("heroes_bans"))),
    pbs: Map(
      Var("unique_heroes"),
      Lambda(
        "hero",
        Let(
          {
            p_heroes: Filter(Var("heroes_picks"), Lambda("heroid", Equals(Var("hero"), Var("heroid")))),
            b_heroes: Filter(Var("heroes_bans"), Lambda("heroid", Equals(Var("hero"), Var("heroid")))),
          },
          {
            hero: Var("hero"),
            total_picks: Count(Var("p_heroes")),
            total_bans: Count(Var("b_heroes")),
            popularity: Add(Count(Var("p_heroes")),Count(Var("b_heroes"))),
          }
        )
      )
    ),
  },
  {
    radiant_wins: Select(["data",0],Count(Filter(Var("wins"), Lambda("win", Equals(Var("win"), true))))),
    dire_wins: Select(["data",0],Count(Filter(Var("wins"), Lambda("win", Equals(Var("win"), false))))),
    heroes_picks_bans_popularity: Var("pbs")
  }
)

where I get all the matches for league_id and use Map function to filter and thus count hero_id occurrences in either the heroes_picked or heroes_banned arrays but of course I get quite close to the 12000 compute ops limit

That’s how indexes combine multiple values from arrays. If you want to get a clear view of that you could throw them in values instead of terms and see hwo these values are combined. That’s definitely not what you are looking for since it generates combinations for all possibilities.

image

Your other solution with

seems a bit strange to me and I can’t execute such an index in the shell. image

Isn’t what you are looking for rather:

CreateIndex({
  name: "picks_hero_id4",
  source: {
    collection: Collection("test"),
    fields: {
      hero_picked: Query(
        Lambda(
          "matchDoc",
          Map(
            Filter(
              Select(["data", "picks_bans"], Var("matchDoc")),
              Lambda(
                "pick_ban",
                Equals(Select("is_pick", Var("pick_ban")), true)
              )
            ),
            Lambda("x", Select(["hero_id"], Var("x")))
          )
        )
      )
    }
  },
  terms: [{ binding: "hero_picked" }],
  values: [{ field: "ref" }]
})

Which works (I think)

If you do need to combine values from multiple arrays.

If you need to combine values (but I don’t think you do), you can workaround this by combining the values. For example the following statement would combine them:

Map(
  Filter(
    Select(
      ["data", "picks_bans"],
      Get(Ref(Collection("test"), "291329370225639941"))
    ),
    Lambda("pick_ban", Equals(Select("is_pick", Var("pick_ban")), true))
  ),
  Lambda('doc',Concat([
    ToString(Select(['is_pick'], Var('doc'))),
    "___",
    ToString(Select(['hero_id'], Var('doc')))
  ]))
)

Which you could then throw in an index binding:

CreateIndex({
  name: "picks_hero_id",
  source: {
    collection: Collection("test"),
    fields: {
      picks: Query(
        Lambda(
          "matchDoc",
          Map(
            Filter(
              Select(
                ["data", "picks_bans"],
                Var('matchDoc')
              ),
              Lambda("pick_ban", Equals(Select("is_pick", Var("pick_ban")), true))
            ),
            Lambda('doc',Concat([
              ToString(Select(['is_pick'], Var('doc'))),
              "___",
              ToString(Select(['hero_id'], Var('doc')))
            ]))
          )
        )
      )
    }
  },
  terms: [
    { binding: "picks"}
  ]
})

However, you don’t need those true values right you have already filtered them out in the binding? For example, throwing it in ‘values’ instead of terms we now see have an index only on the true values:

image

But in case you do need to combine vlaues, the above solution could work and you could then search for true__105 in the terms to receive all references that picked the hero.

I hope I’ve been of help, I might not be able to help further though since this is really starting to go deep into a specific application domain.

Hi,

Thank you for the help, the index binding did the exact thing I was looking for.

Just one more question, the index binding, how am I able to Join for an index with league_id? So that I can count the hero picks in matches that belong to a certain league_id?

would it be via:

Count(Intersection(
  Match(Index("picks_hero_id_final"),86),
  Match(Index("matches_by_leagueid"),11280)
  ))

You will need t o provide us with much more details on the different document structures involved to answer that question.

the match file example from earlier also has the following attributes

  • “leagueid”: 11280,
  • “start_time”: 1569981543,
  • “cluster”: 236,

How would I go by only getting matches that have inputted leagueid, range of start_time and range of cluster?

I tried to do

Let(
 {
   matches:Intersection(
     Match(Index("matches_by_leagueid"),11517),
     Range(Match(Index("matches_by_start_time")),1574971461,1575232551),
     Range(Match(Index("matches_by_cluster")),131,138),
   )
 },
 {
   counted_m: Count(Var("matches")),
 }
)

but I am not getting matches even if I know they exist? Am I meant to be using Join?

my indexes are as follows:

cluster:

{
  name: "matches_by_cluster",
  unique: false,
  serialized: true,
  source: "Matches",
  terms: [
    {
      field: ["data", "cluster"]
    }
  ]
}

leagueid:

{
  name: "matches_by_leagueid",
  unique: false,
  serialized: true,
  source: "Matches",
  terms: [
    {
      field: ["data", "leagueid"]
    }
  ]
}

start_time:

{
  name: "matches_by_start_time",
  unique: false,
  serialized: true,
  source: "Matches",
  terms: [
    {
      field: ["data", "start_time"]
    }
  ]
}

That depends on your index values, this is explained in this stackoverflow topic: How to query by multiple conditions in faunadb? - Stack Overflow

Your indexes are not correct, range indexes only work on values, not on terms. Once you use values, you will notice that intersections won’t work as you worte it there, since you can only intersect on things that return the same values. Therefore, you will need join to get only the reference out of there. Not ideal, we are looking into better ways for future versions of FQL. This is explained in that stackoverflow question.

Question from me: are you building a game that we will be able to play? :slight_smile:

Hi,

Not building a game, I am making a web-app that displays stats from game called DotA2.

So I was able to get the join/intersection working by following the link, my only question with it is why doesn’t it work on non-range match?

Range(Match(Index("by_leagueid")),11280,11280),
works but
Match(Index("by_leagueid"),11280,11280),
doesn’t

Also with my previous question that you’ve answered about the index bind:

CreateIndex({
  name: "picks_hero_id_final",
  source: {
    collection: Collection("Matches"),
    fields: {
      hero_picked: Query(
        Lambda(
          "matchDoc",
          Map(
            Filter(
              Select(["data", "picks_bans"], Var("matchDoc")),
              Lambda(
                "pick_ban",
                Equals(Select("is_pick", Var("pick_ban")), true)
              )
            ),
            Lambda("x", Select(["hero_id"], Var("x")))
          )
        )
      )
    }
  },
  terms: [{ binding: "hero_picked" }],
  values: [{ field: "ref" }]
})

how would this be adapted to be intersected with:

matches: Intersection(
      Join(
        Range(Match(Index("by_leagueid")),11280,11280),
        Lambda(['value', 'ref'], Match(Index('ref_by_ref'), Var('ref'))
      )),
      Join(
        Range(Match(Index("by_cluster")),231,236),
        Lambda(['value', 'ref'], Match(Index('ref_by_ref'), Var('ref'))
      ))
    ),

I tried the following but get data [0] even though I should have had multiple values returned:

Let(
  {
    matches: Intersection(
      Join(
        Range(Match(Index("by_leagueid")),11280,11281),
        Lambda(['value', 'ref'], Match(Index('ref_by_ref'), Var('ref'))
      )),
      Join(
        Range(Match(Index("by_cluster")),231,236),
        Lambda(['value', 'ref'], Match(Index('ref_by_ref'), Var('ref'))
      ))
    ),
    match_docs: Map(
      Paginate(Var("matches")),
      Lambda("ref", Get(Var("ref")))
    ),
    
  },
  {
    hero_count:Count(Intersection(
      Join(
        Range(Match(Index("picks_hero_id_final")),86,86),
        Lambda(['value', 'ref'], Match(Index('ref_by_ref'), Var('ref'))
      )),
      Var("matches"),
  
    )),
  }
)

Match works on terms
Range works on values
I would read through these tutorials Getting started with FQL, Fauna’s native query language - part 1

There is a great tutorial on indexes in there

Sorry but I currently can’t jump deeper into the code. I tried providing you with all the necessary details but can’t further debug code else I won’t get other work done. I hope you understand.

Thank you,

No worries I realize that you’ve been more than helpful!