pyspark.pandas.melt

pyspark.pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value')

Unpivot a DataFrame from wide format to long format, optionally leaving identifier variables set.

This function is useful for reshaping a DataFrame into a format where one or more columns are identifier variables (id_vars), while the measured variables (value_vars; by default, all other columns) are “unpivoted” to the row axis, leaving just two non-identifier columns, ‘variable’ and ‘value’.
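To make the reshaping concrete, here is a minimal pure-Python sketch of the same unpivot applied to a column-oriented dict. `melt_records` is a hypothetical helper, not part of pyspark: it ignores Spark execution, the default index, and pyspark's type handling, and only mirrors the row order visible in the examples below (each input row yields one output row per measured column, in column order).

```python
from typing import Any, Dict, List, Optional, Sequence


def melt_records(
    columns: Dict[str, List[Any]],
    id_vars: Sequence[str] = (),
    value_vars: Optional[Sequence[str]] = None,
    var_name: str = "variable",
    value_name: str = "value",
) -> Dict[str, List[Any]]:
    """Hypothetical helper (not pyspark): unpivot a dict of equal-length columns."""
    if value_vars is None:
        # Default: every column that is not an identifier is unpivoted.
        value_vars = [c for c in columns if c not in id_vars]
    n_rows = len(next(iter(columns.values()))) if columns else 0
    out: Dict[str, List[Any]] = {c: [] for c in id_vars}
    out[var_name] = []
    out[value_name] = []
    for i in range(n_rows):
        # Each input row produces one output row per measured column.
        for v in value_vars:
            for c in id_vars:
                out[c].append(columns[c][i])  # repeat the identifier values
            out[var_name].append(v)           # which column the value came from
            out[value_name].append(columns[v][i])
    return out


table = {"A": ["a", "b", "c"], "B": [1, 3, 5], "C": [2, 4, 6]}
long_table = melt_records(table, id_vars=["A"])
print(long_table)
# {'A': ['a', 'a', 'b', 'b', 'c', 'c'], 'variable': ['B', 'C', 'B', 'C', 'B', 'C'], 'value': [1, 2, 3, 4, 5, 6]}
```

The nested loop makes the output length explicit: n input rows times k measured columns.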

Parameters

frame (DataFrame): DataFrame to unpivot.
id_vars (tuple, list, or ndarray, optional): Column(s) to use as identifier variables. A single column label is also accepted.
value_vars (tuple, list, or ndarray, optional): Column(s) to unpivot. A single column label is also accepted. If not specified, all columns that are not set as id_vars are used.
var_name (scalar, optional): Name to use for the ‘variable’ column. If None (the default), frame.columns.name is used, or ‘variable’ if that is not set.
value_name (scalar, default ‘value’): Name to use for the ‘value’ column.
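How the optional arguments fall back to their defaults can be sketched in plain Python. `resolve_melt_args` and its `columns_name` argument (standing in for frame.columns.name) are hypothetical and not part of pyspark; the sketch handles lists, tuples, and single labels but not ndarrays, and it does not check that the labels exist in the frame.

```python
from typing import Hashable, List, Optional, Sequence, Tuple, Union

# A column spec may be None, a single label, or a list/tuple of labels.
ColumnSpec = Union[None, Hashable, Sequence[Hashable]]


def _as_list(spec: ColumnSpec) -> List[Hashable]:
    """Normalize a column spec: None -> [], list/tuple -> list, scalar -> [scalar]."""
    if spec is None:
        return []
    if isinstance(spec, (list, tuple)):
        return list(spec)
    return [spec]


def resolve_melt_args(
    columns: Sequence[Hashable],
    id_vars: ColumnSpec = None,
    value_vars: ColumnSpec = None,
    var_name: Optional[Hashable] = None,
    columns_name: Optional[Hashable] = None,  # hypothetical stand-in for frame.columns.name
) -> Tuple[List[Hashable], List[Hashable], Hashable]:
    """Hypothetical helper (not pyspark): apply the documented parameter defaults."""
    ids = _as_list(id_vars)
    if value_vars is None:
        # Not specified: unpivot every column that is not an identifier.
        values = [c for c in columns if c not in ids]
    else:
        values = _as_list(value_vars)
    if var_name is None:
        # Fall back to the column-axis name, then to 'variable'.
        var_name = columns_name if columns_name is not None else "variable"
    return ids, values, var_name


cols = ["A", "B", "C"]
print(resolve_melt_args(cols, id_vars="A"))
# (['A'], ['B', 'C'], 'variable')
print(resolve_melt_args(cols, value_vars=("B",), columns_name="letters"))
# ([], ['B'], 'letters')
```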

Returns

DataFrame: Unpivoted DataFrame.

Examples

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
...                    'B': {0: 1, 1: 3, 2: 5},
...                    'C': {0: 2, 1: 4, 2: 6}},
...                   columns=['A', 'B', 'C'])
>>> df
   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6

>>> ps.melt(df)
  variable value
0        A     a
1        B     1
2        C     2
3        A     b
4        B     3
5        C     4
6        A     c
7        B     5
8        C     6

>>> df.melt(id_vars='A')
   A variable  value
0  a        B      1
1  a        C      2
2  b        B      3
3  b        C      4
4  c        B      5
5  c        C      6

>>> df.melt(value_vars='A')
  variable value
0        A     a
1        A     b
2        A     c

>>> ps.melt(df, id_vars=['A', 'B'])
   A  B variable  value
0  a  1        C      2
1  b  3        C      4
2  c  5        C      6

>>> df.melt(id_vars=['A'], value_vars=['C'])
   A variable  value
0  a        C      2
1  b        C      4
2  c        C      6

The names of the ‘variable’ and ‘value’ columns can be customized:

>>> ps.melt(df, id_vars=['A'], value_vars=['B'],
...         var_name='myVarname', value_name='myValname')
   A myVarname  myValname
0  a         B          1
1  b         B          3
2  c         B          5