pyspark.TaskContext
Contextual information about a task which can be read or mutated during execution. To access the TaskContext for a running task, use: TaskContext.get().
New in version 2.2.0.
Examples
>>> from pyspark import TaskContext
Get a task context instance from an RDD.
>>> spark.sparkContext.setLocalProperty("key1", "value")
>>> taskcontext = spark.sparkContext.parallelize([1]).map(lambda _: TaskContext.get()).first()
>>> isinstance(taskcontext.attemptNumber(), int)
True
>>> isinstance(taskcontext.partitionId(), int)
True
>>> isinstance(taskcontext.stageId(), int)
True
>>> isinstance(taskcontext.taskAttemptId(), int)
True
>>> taskcontext.getLocalProperty("key1")
'value'
>>> isinstance(taskcontext.cpus(), int)
True
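TaskContext.get() returns a live context only inside a running task; called on the driver it returns None, so user functions can tell the two environments apart. A minimal check, assuming a standard driver session:

>>> TaskContext.get() is None  # on the driver, outside any task
True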
Get a task context instance from a DataFrame via a Python UDF.
>>> from pyspark.sql import Row
>>> from pyspark.sql.functions import udf
>>> @udf("STRUCT<anum: INT, partid: INT, stageid: INT, taskaid: INT, prop: STRING, cpus: INT>")
... def taskcontext_as_row():
...     taskcontext = TaskContext.get()
...     return Row(
...         anum=taskcontext.attemptNumber(),
...         partid=taskcontext.partitionId(),
...         stageid=taskcontext.stageId(),
...         taskaid=taskcontext.taskAttemptId(),
...         prop=taskcontext.getLocalProperty("key2"),
...         cpus=taskcontext.cpus())
...
>>> spark.sparkContext.setLocalProperty("key2", "value")
>>> [(anum, partid, stageid, taskaid, prop, cpus)] = (
...     spark.range(1).select(taskcontext_as_row()).first()
... )
>>> isinstance(anum, int)
True
>>> isinstance(partid, int)
True
>>> isinstance(stageid, int)
True
>>> isinstance(taskaid, int)
True
>>> prop
'value'
>>> isinstance(cpus, int)
True
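getLocalProperty returns None rather than raising when a key was never set on the driver; a quick check (the key name here is a hypothetical, unset key):

>>> taskcontext = spark.sparkContext.parallelize([1]).map(lambda _: TaskContext.get()).first()
>>> taskcontext.getLocalProperty("never_set_key") is None  # hypothetical key, never set
True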
Get a task context instance from a DataFrame via a Pandas UDF.
>>> import pandas as pd
>>> from pyspark.sql.functions import pandas_udf
>>> @pandas_udf("STRUCT<"
...     "anum: INT, partid: INT, stageid: INT, taskaid: INT, prop: STRING, cpus: INT>")
... def taskcontext_as_row(_):
...     taskcontext = TaskContext.get()
...     return pd.DataFrame({
...         "anum": [taskcontext.attemptNumber()],
...         "partid": [taskcontext.partitionId()],
...         "stageid": [taskcontext.stageId()],
...         "taskaid": [taskcontext.taskAttemptId()],
...         "prop": [taskcontext.getLocalProperty("key3")],
...         "cpus": [taskcontext.cpus()]
...     })
...
>>> spark.sparkContext.setLocalProperty("key3", "value")
>>> [(anum, partid, stageid, taskaid, prop, cpus)] = (
...     spark.range(1).select(taskcontext_as_row("id")).first()
... )
>>> isinstance(anum, int)
True
>>> isinstance(partid, int)
True
>>> isinstance(stageid, int)
True
>>> isinstance(taskaid, int)
True
>>> prop
'value'
>>> isinstance(cpus, int)
True
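Beyond reading IDs, partitionId() is often used inside mapPartitions to tag records with the partition that produced them. A minimal sketch, assuming the default slicing of parallelize (elements 0 and 1 land in partition 0, elements 2 and 3 in partition 1):

>>> def tag_with_partition(iterator):
...     taskcontext = TaskContext.get()
...     return ((taskcontext.partitionId(), x) for x in iterator)
...
>>> sorted(spark.sparkContext.parallelize(range(4), 2)
...        .mapPartitions(tag_with_partition).collect())
[(0, 0), (0, 1), (1, 2), (1, 3)]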
Methods
attemptNumber()
    How many times this task has been attempted.

cpus()
    CPUs allocated to the task.

get()
    Return the currently active TaskContext.

getLocalProperty(key)
    Get a local property set upstream in the driver, or None if it is missing.

partitionId()
    The ID of the RDD partition that is computed by this task.

resources()
    Resources allocated to the task (see the sketch after this list).

stageId()
    The ID of the stage that this task belongs to.

taskAttemptId()
    An ID that is unique to this task attempt (within the same SparkContext, no two task attempts will share the same attempt ID).
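resources() is the one accessor not exercised in the examples above. It returns a dict keyed by resource name (for example "gpu"); with no custom resources configured, the dict is typically empty. A minimal sketch mirroring the RDD example:

>>> resmap = spark.sparkContext.parallelize([1]).map(lambda _: TaskContext.get().resources()).first()
>>> isinstance(resmap, dict)
True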