Sometimes I want to perform a collection-wide or database-wide data transformation or analytics operation that spans several pages of data. Currently there’s no in-database way of performing this; it requires an external client to maintain state across pagination calls. It would be nice if Fauna could figure out some way of supporting this use case, even if it means relaxing the strong consistency guarantees that currently exist.
I had a similar use case.
Just curious, did you find a solution for your use case?
Fauna is optimized for operational workloads rather than analytics. Any analytical workload in Fauna is essentially a brute-force compute operation, whereas an analytics datastore built to optimize such queries will perform much more efficiently.
If you have a dataset that is too big to fetch in a single transaction, and you need to transform and aggregate over it, we recommend either:
- running a batch process where you download all the data at a given snapshot and then process it in your desired platform/language.
- syncing your Fauna data with an external analytics datastore.
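The first option can be sketched with the FQL v10 JavaScript driver (the `fauna` npm package). The `orders` collection, its `amount` field, and the `FAUNA_SECRET` environment variable are hypothetical stand-ins; the driver is imported lazily so the pure aggregation step can run anywhere:

```javascript
// Local aggregation applied after the download. Pure, so it is easy to test.
function totalAmount(orders) {
  return orders.reduce((sum, order) => sum + order.amount, 0);
}

// Downloads every document in a hypothetical "orders" collection at a single
// snapshot, then aggregates locally. v10 pagination is pinned to the snapshot
// time of the first request, so the full set is consistent across pages.
async function snapshotTotal() {
  const { Client, fql } = await import("fauna");
  const client = new Client({ secret: process.env.FAUNA_SECRET });
  try {
    const firstPage = await client.query(fql`orders.all().pageSize(500)`);
    const orders = await Array.fromAsync(client.paginate(firstPage.data).flatten());
    return totalAmount(orders);
  } finally {
    client.close();
  }
}
```

The same shape works for any transformation: fetch the full set once at a consistent snapshot, then run the compute step in your own platform rather than in the database.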
Working with external data stores
You can use the FQL API to sync your Fauna data with an external data store or to export data to some format. This is the approach we used to build the Fauna source connector for Airbyte.
We are also working on a new first-class data export feature. The proposed feature would enable users to pull a complete snapshot of an entire database from Fauna, including the database’s full set of records, where all of the data in the snapshot is consistent as of a single transaction time.
NOTE: the Airbyte connector was built with FQL v4 in mind, and requires v4 indexes and roles. We have plans to update it to FQL v10 and rely on newer features like change feeds and data export.
Pagination
Unlike FQL v4, pagination in FQL v10 is tied to the snapshot time of the first query, so when you fetch an entire Set with pagination, it is consistent with that snapshot. You can even kick off pagination of multiple Sets in a single query to align the timestamp for each Set, and then complete each one with separate requests. Here’s an example with the latest JS driver:
import { Client, fql } from "fauna"

const client = new Client({
  secret: SECRET,
  max_attempts: MAX_ATTEMPTS,
  max_backoff: MAX_BACKOFF,
})

// Start both Sets in one query so they share a single snapshot time.
const response = await client.query(fql`{
  things: things.all().pageSize(50),
  users: users.all().pageSize(50)
}`)

// Each paginator walks its remaining pages at that same snapshot.
const thingsPaginator = client.paginate(response.data.things)
const usersPaginator = client.paginate(response.data.users)

// Drain both Sets concurrently.
const promiseForAllThings = Array.fromAsync(thingsPaginator.flatten())
const promiseForAllUsers = Array.fromAsync(usersPaginator.flatten())
const everything = await Promise.all([promiseForAllThings, promiseForAllUsers])