Because this method scans the full set, it returns an error if there are more than 16,000 documents in the set. This method can also time out for large sets under that limit.
I don’t anticipate my collection having more than 16k records, but I’d like to prepare for it regardless. Plus, I definitely wouldn’t want this to error out because of a timeout. What’s the safest way to ensure all documents matching the criteria are deleted?
When you need to update an entire collection that is too big to handle in a single transaction, you will need to break the work up into multiple transactions.
Paginate and update some in each transaction
The .paginate method materializes a Set immediately in your query and returns a page of data along with an after cursor. Use .paginate to start the operation, then use Set.paginate(after) with that cursor in subsequent queries.
Initial query
// *** Step 1: Define the Set in the initial query
// assumes an index for an efficient search; filtering with `.where` instead would perform a full collection scan
let page = collection_transactions
  .byAccount(collection_accounts.byId("389333327894544977"))
  .pageSize(50)
  .paginate()
// *** Step 2: Perform the mutation
let data = page.data
data.forEach(transaction => someUpdateAction(transaction))
// - or -
// let newData = data.map(transaction => someUpdateAction(transaction))
page {
  data,  // can return data or something else, or no data at all
  after  // but make sure you return the pagination cursor!
}
Subsequent queries
// *** Step 1: Define the Set in subsequent queries
let page = Set.paginate("hdWDxoJmdGhpbmdzgwAAZipudWxsKoHKhoMAAGYq...")
// *** Step 2: Perform the mutation
let data = page.data
data.forEach(transaction => someUpdateAction(transaction))
// - or -
// let newData = data.map(transaction => someUpdateAction(transaction))
page {
  data,  // can return data or something else, or no data at all
  after  // but make sure you return the pagination cursor!
}
Note that “Step 2” in both requests is the same.
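Applied to the original delete use case, the end-to-end flow could look something like the sketch below. It assumes the Fauna v10 JavaScript driver (the `fauna` npm package), a `FAUNA_SECRET` environment variable, and the `collection_transactions.byAccount` index from the example above; it simply repeats the two-step pattern until no `after` cursor comes back.

```ts
import { Client, fql } from "fauna";

const client = new Client({ secret: process.env.FAUNA_SECRET! });

async function deleteTransactionsForAccount(accountId: string) {
  // Initial query: define the Set, delete one page, and return the cursor.
  let result = await client.query<{ after: string | null }>(fql`
    let page = collection_transactions
      .byAccount(collection_accounts.byId(${accountId}))
      .pageSize(50)
      .paginate()
    page.data.forEach(transaction => transaction.delete())
    page { after }
  `);

  // Subsequent queries: keep deleting one page per transaction
  // until no cursor comes back.
  while (result.data.after != null) {
    result = await client.query<{ after: string | null }>(fql`
      let page = Set.paginate(${result.data.after})
      page.data.forEach(transaction => transaction.delete())
      page { after }
    `);
  }
}

await deleteTransactionsForAccount("389333327894544977");
client.close();
```

Each iteration of the loop is its own transaction, so no single request has to touch more documents than one page.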
Load into your client and perform the update in batches
The first method is probably the most common approach, but there may be scenarios where you want to load all of the information locally and then operate on it. Maybe you need to perform a complex aggregation on the data first, or analyze the data and confirm the operation before it starts.
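As a rough sketch of that approach (same assumptions as the previous sketch: the v10 JavaScript driver, a `FAUNA_SECRET` environment variable, and the example index), you could accumulate every page locally, review the results, and only then issue the deletes in batches:

```ts
import { Client, fql } from "fauna";

const client = new Client({ secret: process.env.FAUNA_SECRET! });

type PageResult = { data: { id: string }[]; after: string | null };

// Pull every matching document into the client first.
const docs: { id: string }[] = [];
let result = await client.query<PageResult>(fql`
  let page = collection_transactions
    .byAccount(collection_accounts.byId("389333327894544977"))
    .pageSize(100)
    .paginate()
  page { data, after }
`);
docs.push(...result.data.data);

while (result.data.after != null) {
  result = await client.query<PageResult>(fql`
    let page = Set.paginate(${result.data.after})
    page { data, after }
  `);
  docs.push(...result.data.data);
}

// Analyze the data and confirm the operation here before deleting anything.
console.log(`About to delete ${docs.length} transactions`);

// Then delete in batches, one transaction per batch of 50 ids.
for (let i = 0; i < docs.length; i += 50) {
  const ids = docs.slice(i, i + 50).map((d) => d.id);
  await client.query(fql`
    ${ids}.forEach(id => collection_transactions.byId(id)!.delete())
  `);
}
client.close();
```

The trade-off is that everything has to fit in client memory and the data can change between the read and the deletes, which is part of why the first method is the more common choice.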
This is helpful, thank you! Is there a way to kick off this deletion in the background instead of having to wait for the response? For example:
1. I start the mutation from a client-side or serverless call.
2. I invoke a Fauna UDF that handles the deletion for me: it paginates, runs through multiple iterations, and so on until the process is complete.
Would I have to “wait” for Fauna to complete the job, or can I exit early and allow Fauna to do its thing?
UDFs are a way to encapsulate reusable pieces of FQL. They run as part of your request, not as an external process, so you cannot exit early and let Fauna finish the job on its own. If you need to run multiple operations consecutively, you will need to orchestrate that yourself with your preferred tools.
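For example, you could wrap one page’s worth of deletion in a UDF and then drive it from a client or serverless function that calls the UDF repeatedly until it stops returning a cursor. The sketch below assumes the same driver setup as the earlier examples; the `deleteTransactionPage` UDF name is made up, and each call still runs as its own independent request and transaction.

```ts
import { Client, fql } from "fauna";

const client = new Client({ secret: process.env.FAUNA_SECRET! });

// One-time setup: a hypothetical UDF that deletes a single page of
// transactions and returns the next cursor (or null when done).
const body = `(accountId, cursor) => {
  let page = if (cursor != null) {
    Set.paginate(cursor)
  } else {
    collection_transactions
      .byAccount(collection_accounts.byId(accountId))
      .pageSize(50)
      .paginate()
  }
  page.data.forEach(transaction => transaction.delete())
  page.after
}`;
await client.query(fql`Function.create({ name: "deleteTransactionPage", body: ${body} })`);

// The caller has to drive the iterations: Fauna does not keep working
// in the background after each request returns.
let cursor: string | null = null;
do {
  const result = await client.query<string | null>(
    fql`deleteTransactionPage("389333327894544977", ${cursor})`
  );
  cursor = result.data;
} while (cursor != null);

client.close();
```

Exiting early would simply stop the loop; Fauna itself only performs the work contained in each individual request.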