pyspark.streaming.DStream.groupByKeyAndWindow

DStream.groupByKeyAndWindow(windowDuration: int, slideDuration: int, numPartitions: Optional[int] = None) → pyspark.streaming.dstream.DStream[Tuple[K, Iterable[V]]]

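Return a new DStream by applying groupByKey over a sliding window. This is similar to DStream.groupByKey(), except that the values for each key are grouped across all batches that fall within the window rather than within a single batch.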

Parameters

windowDuration : int
    Width of the window; must be a multiple of this DStream’s batching interval.
slideDuration : int
    Sliding interval of the window (i.e., the interval after which the new DStream will generate RDDs); must be a multiple of this DStream’s batching interval.
numPartitions : int, optional
    Number of partitions of each RDD in the new DStream.
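
Examples

The following is a minimal pure-Python sketch of the windowing semantics described above, not Spark code and not the library's implementation. It assumes each micro-batch is a plain list of (key, value) pairs and that the first window may be only partially filled. The helper name `group_by_key_and_window` and its `batch_interval` argument are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Batch = List[Tuple[Hashable, object]]


def group_by_key_and_window(
    batches: List[Batch],
    batch_interval: int,
    window_duration: int,
    slide_duration: int,
) -> List[Dict[Hashable, list]]:
    """Model of grouping by key over a sliding window (hypothetical helper)."""
    # Both durations must be whole multiples of the batching interval.
    if window_duration % batch_interval or slide_duration % batch_interval:
        raise ValueError(
            "window_duration and slide_duration must be multiples of batch_interval"
        )
    window_len = window_duration // batch_interval  # batches per window
    slide_len = slide_duration // batch_interval    # batches between outputs

    results = []
    # Emit one grouped result each time slide_len further batches have arrived.
    for end in range(slide_len, len(batches) + 1, slide_len):
        start = max(0, end - window_len)  # early windows may be partial
        grouped = defaultdict(list)
        for batch in batches[start:end]:
            for key, value in batch:
                grouped[key].append(value)
        results.append(dict(grouped))
    return results


# Four batches, one per time unit; window of 3 units, sliding every 2 units.
batches = [
    [("a", 1)],
    [("b", 2), ("a", 3)],
    [("a", 4)],
    [("b", 5)],
]
windows = group_by_key_and_window(
    batches, batch_interval=1, window_duration=3, slide_duration=2
)
```

In this sketch the first output covers batches 1 and 2, and the second covers batches 2 to 4, so the pair ("b", 2) appears in both outputs because the windows overlap. With a real pair DStream the corresponding call would take the form `pairs.groupByKeyAndWindow(windowDuration, slideDuration)`, where `pairs` is an assumed name for the stream; the order of values within each group should not be relied on there.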
