abstract def close(errorOrNull: Throwable): Unit

Called when stopping to process one partition of new data on the executor side. This is
guaranteed to be called whether `open` returns `true` or `false`. However, `close` won't be
called in the following cases:

- the JVM crashes without throwing a `Throwable`
- `open` throws a `Throwable`.

errorOrNull: the error thrown during processing of the data, or null if there was no error.
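As an illustration only, the sketch below buffers rows while the partition is being processed and uses `errorOrNull` in `close` to decide whether to keep the output. The class name, the local file path, and the buffer-then-commit strategy are assumptions made for this sketch, not part of the API.

    import java.nio.file.{Files, Path, Paths}
    import scala.collection.mutable.ArrayBuffer

    import org.apache.spark.sql.ForeachWriter

    class BufferAndCommitWriter extends ForeachWriter[String] {
      private var buffer: ArrayBuffer[String] = _
      private var outFile: Path = _

      override def open(partitionId: Long, epochId: Long): Boolean = {
        buffer = ArrayBuffer.empty[String]
        // Hypothetical per-partition, per-epoch output file.
        outFile = Paths.get(s"/tmp/foreach-sink/part-$partitionId-$epochId.txt")
        true
      }

      override def process(value: String): Unit = buffer += value

      // close() runs whether open() returned true or false, but not if open() threw or the
      // JVM crashed; errorOrNull reports any failure seen while processing rows.
      override def close(errorOrNull: Throwable): Unit = {
        if (errorOrNull == null) {
          Files.createDirectories(outFile.getParent)
          Files.write(outFile, buffer.mkString("\n").getBytes("UTF-8"))
        } else {
          buffer.clear() // drop partial output; the engine's retry will resend this epoch's data
        }
      }
    }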
abstract def open(partitionId: Long, epochId: Long): Boolean

Called when starting to process one partition of new data in the executor. See the class
docs for more information on how to use the `partitionId` and `epochId`.

partitionId: the partition id.
epochId: a unique id for data deduplication.
returns: `true` if the corresponding partition and epoch should be processed; `false`
indicates the partition should be skipped.
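As a hedged sketch of how `epochId` can support deduplication, the writer below records a commit marker per (partitionId, epochId) and returns `false` from `open` when that marker already exists, so reprocessed data is skipped. The class name and the marker directory are assumptions made for this example; a real sink would consult its own transactional metadata.

    import java.nio.file.{Files, Path, Paths}

    import org.apache.spark.sql.ForeachWriter

    class IdempotentWriter extends ForeachWriter[String] {
      private val markerDir = "/tmp/foreach-sink/commits" // hypothetical commit-marker location
      private var marker: Path = _
      private var firstAttempt = false

      override def open(partitionId: Long, epochId: Long): Boolean = {
        marker = Paths.get(markerDir, s"$partitionId-$epochId")
        firstAttempt = !Files.exists(marker)
        // Returning false tells Spark to skip every row of this partition and epoch.
        firstAttempt
      }

      override def process(value: String): Unit = {
        // write `value` to the external sink here
      }

      override def close(errorOrNull: Throwable): Unit = {
        // close() also runs when open() returned false, so only commit freshly processed work.
        if (firstAttempt && errorOrNull == null) {
          Files.createDirectories(Paths.get(markerDir))
          Files.createFile(marker) // record that this (partitionId, epochId) has been written
        }
      }
    }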
abstract def process(value: T): Unit

Called to process the data on the executor side. This method will be called only if `open`
returns `true`.
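A small sketch of the usual pattern around `process`: a resource is created in `open`, each row is appended in `process` (which only runs because `open` returned `true`), and the resource is released in `close`. The class name and the local output directory are assumptions made for this example.

    import java.io.{BufferedWriter, File, FileWriter}

    import org.apache.spark.sql.ForeachWriter

    class LocalFileWriter extends ForeachWriter[String] {
      @transient private var out: BufferedWriter = _

      override def open(partitionId: Long, epochId: Long): Boolean = {
        val dir = new File("/tmp/foreach-sink/raw") // hypothetical output directory
        dir.mkdirs()
        out = new BufferedWriter(new FileWriter(new File(dir, s"part-$partitionId-$epochId.txt")))
        true
      }

      // Invoked once per row, and only because open() returned true above.
      override def process(value: String): Unit = {
        out.write(value)
        out.newLine()
      }

      override def close(errorOrNull: Throwable): Unit = {
        if (out != null) out.close()
      }
    }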
The abstract class for writing custom logic to process data generated by a query. This is often used to write the output of a streaming query to arbitrary storage systems. Any implementation of this base class will be used by Spark in the following way.
- A single instance of this class is responsible for all the data generated by a single task
  in a query. In other words, one instance is responsible for processing one partition of the
  data generated in a distributed manner.

- Any implementation of this class must be serializable, because each task will get a fresh
  serialized-deserialized copy of the provided object. Hence, it is strongly recommended that
  any initialization for writing data (e.g. opening a connection or starting a transaction)
  is done after the `open(...)` method has been called, which signifies that the task is ready
  to generate data.

- The lifecycle of the methods is as follows.

  For each partition with `partitionId`:
      For each batch/epoch of streaming data (if it is a streaming query) with `epochId`:
          Method `open(partitionId, epochId)` is called.
          If `open` returns true:
              For each row in the partition and batch/epoch, method `process(row)` is called.
          Method `close(errorOrNull)` is called with the error (if any) seen while processing rows.

Important points to note:

- The `partitionId` and `epochId` can be used to deduplicate generated data when failures
  cause reprocessing of some input data. This depends on the execution mode of the query. If
  the streaming query is being executed in the micro-batch mode, then every partition
  represented by a unique tuple (partitionId, epochId) is guaranteed to have the same data.
  Hence, (partitionId, epochId) can be used to deduplicate and/or transactionally commit data
  and achieve exactly-once guarantees. However, if the streaming query is being executed in
  the continuous mode, then this guarantee does not hold and therefore should not be used for
  deduplication.

- The `close()` method will be called if the `open()` method returns successfully
  (irrespective of the return value), except if the JVM crashes in the middle.

Scala example:
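The sketch below shows one way such a writer might be wired up. It assumes a local SparkSession and a socket source producing one string per row, and the writer only logs what it sees; the object name and source configuration are illustrative, not part of the API.

    import org.apache.spark.sql.{Dataset, ForeachWriter, SparkSession}

    object ForeachWriterExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[2]").appName("foreach-writer-example").getOrCreate()
        import spark.implicits._

        // Assumed input: a socket source producing one string per row.
        val datasetOfString: Dataset[String] = spark.readStream
          .format("socket")
          .option("host", "localhost")
          .option("port", 9999)
          .load()
          .as[String]

        val query = datasetOfString.writeStream
          .foreach(new ForeachWriter[String] {

            // Called once per partition and epoch; returning true asks Spark to send the rows.
            def open(partitionId: Long, epochId: Long): Boolean = {
              // e.g. open a connection here
              true
            }

            // Called for every row in the partition, only because open returned true.
            def process(record: String): Unit = {
              println(record)
            }

            // Called after the rows were processed (or a failure occurred while processing them).
            def close(errorOrNull: Throwable): Unit = {
              // e.g. close the connection; errorOrNull is null when no error was seen
            }
          })
          .start()

        query.awaitTermination()
      }
    }

Because the writer instance is serialized and sent to each task, `open`, `process`, and `close` run on the executors, so any output written by them appears in the executor logs (or on the console in local mode).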
A Java implementation follows the same structure: create an anonymous subclass of `ForeachWriter<String>`, override `open(long partitionId, long epochId)`, `process(String value)` and `close(Throwable errorOrNull)`, and pass the instance to `writeStream().foreach(...)`.
Since: 2.0.0