pyspark.sql.functions.max

pyspark.sql.functions.max(col)

Aggregate function: returns the maximum value of the expression in a group.

New in version 1.3.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
col : Column or str

The target column on which the maximum value is computed.

Returns
Column

A column containing the computed maximum value.
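
The result is an ordinary Column, so it can be renamed with alias like any other expression. A minimal illustrative sketch (the name max_id is an assumption, not part of the original reference):

>>> import pyspark.sql.functions as sf
>>> df = spark.range(10)
>>> df.select(sf.max("id").alias("max_id")).show()
+------+
|max_id|
+------+
|     9|
+------+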

Notes

  • Null values are ignored during the computation (see Example 5).

  • NaN values are considered larger than any other numeric value, so a column that contains NaN returns NaN as its maximum (see Example 6).

Examples

Example 1: Compute the maximum value of a numeric column

>>> import pyspark.sql.functions as sf
>>> df = spark.range(10)
>>> df.select(sf.max(df.id)).show()
+-------+
|max(id)|
+-------+
|      9|
+-------+

Example 2: Compute the maximum value of a string column

>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame([("A",), ("B",), ("C",)], ["value"])
>>> df.select(sf.max(df.value)).show()
+----------+
|max(value)|
+----------+
|         C|
+----------+

Example 3: Compute the maximum value of a column in a grouped DataFrame

>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame([("A", 1), ("A", 2), ("B", 3), ("B", 4)], ["key", "value"])
>>> df.groupBy("key").agg(sf.max(df.value)).show()
+---+----------+
|key|max(value)|
+---+----------+
|  A|         2|
|  B|         4|
+---+----------+

Example 4: Compute the maximum value of multiple columns in a grouped DataFrame

>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame(
...     [("A", 1, 2), ("A", 2, 3), ("B", 3, 4), ("B", 4, 5)], ["key", "value1", "value2"])
>>> df.groupBy("key").agg(sf.max("value1"), sf.max("value2")).show()
+---+-----------+-----------+
|key|max(value1)|max(value2)|
+---+-----------+-----------+
|  A|          2|          3|
|  B|          4|          5|
+---+-----------+-----------+
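
max also accepts arbitrary column expressions, not just plain columns. As an additional sketch reusing the DataFrame from Example 4 (the alias max_sum is illustrative, not from the original reference):

>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame(
...     [("A", 1, 2), ("A", 2, 3), ("B", 3, 4), ("B", 4, 5)], ["key", "value1", "value2"])
>>> df.groupBy("key").agg(sf.max(df.value1 + df.value2).alias("max_sum")).show()
+---+-------+
|key|max_sum|
+---+-------+
|  A|      5|
|  B|      9|
+---+-------+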

Example 5: Compute the maximum value of a column with null values

>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame([(1,), (2,), (None,)], ["value"])
>>> df.select(sf.max(df.value)).show()
+----------+
|max(value)|
+----------+
|         2|
+----------+

Example 6: Compute the maximum value of a column with “NaN” values

>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame([(1.1,), (float("nan"),), (3.3,)], ["value"])
>>> df.select(sf.max(df.value)).show()
+----------+
|max(value)|
+----------+
|       NaN|
+----------+
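
Because NaN sorts above every other numeric value, it dominates the maximum. If that is not desired, one option (an illustrative sketch, not part of the original examples) is to filter NaN values out with isnan before aggregating:

>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame([(1.1,), (float("nan"),), (3.3,)], ["value"])
>>> df.where(~sf.isnan(df.value)).select(sf.max(df.value)).show()
+----------+
|max(value)|
+----------+
|       3.3|
+----------+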