pyspark.pandas.Series.apply
Series.apply(func, args=(), **kwds)
Invoke a function on the values of a Series. The function can be a Python function that works on single values.
Note
This API executes the function once to infer the return type, which is potentially expensive, for instance when the dataset is created after aggregations or sorting.
To avoid this, specify the return type as a type hint in func, for instance:

>>> def square(x) -> np.int32:
...     return x ** 2

pandas-on-Spark uses the return type hint and does not try to infer the type.
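A minimal sketch of the hint's effect (assuming the square function above, plus the standard imports): the result dtype follows the hint, with no extra execution to infer it.

>>> import numpy as np
>>> import pyspark.pandas as ps
>>> psser = ps.Series([1, 2, 3])
>>> psser.apply(square)  # dtype comes from the np.int32 hint, not inference
0    1
1    4
2    9
dtype: int32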
Parameters
func : function
    Python function to apply. Note that a type hint for the return type is required.
args : tuple
    Positional arguments passed to func after the Series value.
**kwds
    Additional keyword arguments passed to func.

Returns
Series
See also
Series.aggregate
Only perform aggregating type operations.
Series.transform
Only perform transforming type operations.
DataFrame.apply
The equivalent function for DataFrame.
Examples
Create a Series with typical summer temperatures for each city.
>>> import numpy as np
>>> import pyspark.pandas as ps
>>> s = ps.Series([20, 21, 12],
...               index=['London', 'New York', 'Helsinki'])
>>> s
London      20
New York    21
Helsinki    12
dtype: int64
Square the values by defining a function and passing it as an argument to apply().

>>> def square(x) -> np.int64:
...     return x ** 2
>>> s.apply(square)
London      400
New York    441
Helsinki    144
dtype: int64
Define a custom function that needs additional positional arguments and pass these additional arguments using the args keyword.

>>> def subtract_custom_value(x, custom_value) -> np.int64:
...     return x - custom_value
>>> s.apply(subtract_custom_value, args=(5,))
London      15
New York    16
Helsinki     7
dtype: int64
Define a custom function that takes keyword arguments and pass these arguments to apply.

>>> def add_custom_values(x, **kwargs) -> np.int64:
...     for month in kwargs:
...         x += kwargs[month]
...     return x
>>> s.apply(add_custom_values, june=30, july=20, august=25)
London      95
New York    96
Helsinki    87
dtype: int64
Use a function from the NumPy library.
>>> def numpy_log(col) -> np.float64:
...     return np.log(col)
>>> s.apply(numpy_log)
London      2.995732
New York    3.044522
Helsinki    2.484907
dtype: float64
You can omit the type hint and let pandas-on-Spark infer the return type.
>>> s.apply(np.log)
London      2.995732
New York    3.044522
Helsinki    2.484907
dtype: float64
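As an illustrative combination (scale_and_shift is a hypothetical helper, not part of the original examples), positional arguments via args and keyword arguments can be passed together:

>>> def scale_and_shift(x, factor, offset=0) -> np.int64:
...     return x * factor + offset
>>> s.apply(scale_and_shift, args=(2,), offset=1)
London      41
New York    43
Helsinki    25
dtype: int64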