RDD.subtractByKey(other, numPartitions=None)
Return each (key, value) pair in self that has no pair with matching key in other.
New in version 0.9.1.
Parameters
other : RDD
    another RDD
numPartitions : int, optional
    the number of partitions in the new RDD
Returns
RDD
    an RDD with the pairs from this RDD whose keys are not in other
See also
RDD.subtract()
Examples
>>> rdd1 = sc.parallelize([("a", 1), ("b", 4), ("b", 5), ("a", 2)])
>>> rdd2 = sc.parallelize([("a", 3), ("c", None)])
>>> sorted(rdd1.subtractByKey(rdd2).collect())
[('b', 4), ('b', 5)]
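
The semantics can also be sketched without a Spark context. The following is a minimal plain-Python model, not Spark's implementation: `subtract_by_key` and `subtract` are hypothetical helpers written for illustration, and they work on ordinary lists rather than RDDs. The sketch contrasts key-based removal with the whole-element removal performed by RDD.subtract().

```python
def subtract_by_key(pairs, other_pairs):
    """Model of subtractByKey (hypothetical helper, not a Spark API).

    Keeps every (key, value) pair whose key never appears in other_pairs.
    The values in other_pairs are ignored.
    """
    other_keys = {key for key, _ in other_pairs}
    return [(key, value) for key, value in pairs if key not in other_keys]


def subtract(pairs, other_pairs):
    """Model of subtract (hypothetical helper, not a Spark API).

    Drops only elements that are equal, as whole elements, to an element
    of other_pairs.
    """
    other_elements = set(other_pairs)
    return [pair for pair in pairs if pair not in other_elements]


# Same data as the doctest above, held in plain lists.
rdd1_data = [("a", 1), ("b", 4), ("b", 5), ("a", 2)]
rdd2_data = [("a", 3), ("c", None)]

by_key = sorted(subtract_by_key(rdd1_data, rdd2_data))
whole = sorted(subtract(rdd1_data, rdd2_data))

print(by_key)  # [('b', 4), ('b', 5)]
print(whole)   # [('a', 1), ('a', 2), ('b', 4), ('b', 5)]
```

Both pairs with key "a" are removed by the key-based version because "a" occurs in the second dataset, even though the values differ. The whole-element version keeps them, since neither ("a", 1) nor ("a", 2) is equal to ("a", 3).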