pandas: Get unique values and their counts in a column | note.nkmk.me (2024)

This article explains how to get unique values and their counts in a column (= Series) of a DataFrame in pandas.

Use the unique(), value_counts(), and nunique() methods on Series. nunique() is also available as a method on DataFrame.

  • pandas.Series.unique() returns unique values as a NumPy array (ndarray)
  • pandas.Series.value_counts() returns unique values and their counts as a Series
  • pandas.Series.nunique() returns the number of unique values as an int; pandas.DataFrame.nunique() returns the number per column as a Series
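
As a quick self-contained sketch (using a small constructed Series rather than the sample data introduced below), the three methods behave as follows:

import pandas as pd

s = pd.Series(['NY', 'CA', 'NY'])

print(s.unique())        # unique values as an ndarray, in order of appearance
# ['NY' 'CA']

print(s.value_counts())  # unique values and their counts as a Series
# NY    2
# CA    1
# Name: count, dtype: int64

print(s.nunique())       # the number of unique values as an int
# 2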

This article begins by explaining the basic usage of each method, then shows how to get unique values and their counts, and more.

Contents

  • pandas.Series.unique()
  • pandas.Series.value_counts()
  • pandas.Series.nunique(), pandas.DataFrame.nunique()
  • Get the number of unique values
  • Get the list of unique values
  • Get the counts of each unique value
  • Get the dictionary of unique values and their counts
  • Get the mode (most frequent value) and its frequency
    • value_counts()
    • mode()
    • describe()
  • Get the normalized frequencies

To count values that meet certain conditions, refer to the following article.

  • pandas: Count DataFrame/Series elements matching conditions
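
As a brief, hedged aside (a constructed Series, since the sample data is introduced further below), such conditional counts typically come down to summing a boolean Series:

import pandas as pd

s = pd.Series([64.0, 70.0, 70.0, 88.0])

# Comparing a Series to a value yields a boolean Series; True counts as 1 when summed.
print((s >= 70).sum())
# 3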

The describe() method is useful to compute summary statistics including the mode and its frequency.

  • pandas: Get summary statistics for each column with describe()

The pandas version used in this article is as follows. Note that functionality may vary between versions.

The following data is used for the examples. Missing values (NaN) are inserted for explanation purposes.

import pandas as pd

print(pd.__version__)
# 2.1.4

df = pd.read_csv('data/src/sample_pandas_normal.csv')
df.iloc[1] = float('nan')
print(df)
#       name   age state  point
# 0    Alice  24.0    NY   64.0
# 1      NaN   NaN   NaN    NaN
# 2  Charlie  18.0    CA   70.0
# 3     Dave  68.0    TX   70.0
# 4    Ellen  24.0    CA   88.0
# 5    Frank  30.0    NY   57.0

pandas.Series.unique()

unique() returns unique values as a one-dimensional NumPy array (ndarray). Missing values (NaN) are included. The values are arranged in the order of appearance.

print(df['state'].unique())
# ['NY' nan 'CA' 'TX']

print(type(df['state'].unique()))
# <class 'numpy.ndarray'>

pandas.Series.value_counts()

value_counts() returns a Series where the unique values are the index (labels) and their counts are the values.

print(df['state'].value_counts())
# state
# NY    2
# CA    2
# TX    1
# Name: count, dtype: int64

print(type(df['state'].value_counts()))
# <class 'pandas.core.series.Series'>

By default, missing values (NaN) are excluded, but if the dropna argument is set to False, they are also counted.

print(df['state'].value_counts(dropna=False))
# state
# NY     2
# CA     2
# NaN    1
# TX     1
# Name: count, dtype: int64

By default, the values are sorted in descending order of frequency. If the ascending argument is set to True, they are sorted in ascending order. Alternatively, setting sort to False leaves them unsorted, arranged in their original order of appearance.

print(df['state'].value_counts(dropna=False, ascending=True))
# state
# NaN    1
# TX     1
# NY     2
# CA     2
# Name: count, dtype: int64

print(df['state'].value_counts(dropna=False, sort=False))
# state
# NY     2
# NaN    1
# CA     2
# TX     1
# Name: count, dtype: int64

If the normalize argument is set to True, the values are normalized so that their total is 1. Note that the resulting proportions depend on the dropna setting when NaN values are present.

print(df['state'].value_counts(normalize=True))
# state
# NY    0.4
# CA    0.4
# TX    0.2
# Name: proportion, dtype: float64

print(df['state'].value_counts(dropna=False, normalize=True))
# state
# NY     0.333333
# CA     0.333333
# NaN    0.166667
# TX     0.166667
# Name: proportion, dtype: float64

pandas.Series.nunique(), pandas.DataFrame.nunique()

nunique() on Series returns the number of unique values as an integer (int).

By default, missing values (NaN) are excluded; however, setting the dropna argument to False includes them in the count.

print(df['state'].nunique())
# 3

print(type(df['state'].nunique()))
# <class 'int'>

print(df['state'].nunique(dropna=False))
# 4

nunique() on DataFrame returns the number of unique values for each column as a Series.

Similar to Series, the nunique() method on DataFrame also has the dropna argument. Additionally, while the default counting is column-wise, changing the axis argument to 1 or 'columns' switches the count to row-wise.

print(df.nunique())
# name     5
# age      4
# state    3
# point    4
# dtype: int64

print(type(df.nunique()))
# <class 'pandas.core.series.Series'>

print(df.nunique(dropna=False))
# name     6
# age      5
# state    4
# point    5
# dtype: int64

print(df.nunique(dropna=False, axis='columns'))
# 0    4
# 1    1
# 2    4
# 3    4
# 4    4
# 5    4
# dtype: int64

Get the number of unique values

The number of unique values can be counted using nunique() on Series and DataFrame.

print(df['state'].nunique())
# 3

print(df.nunique())
# name     5
# age      4
# state    3
# point    4
# dtype: int64

Get the list of unique values

unique() returns unique values as a NumPy array (ndarray). ndarray can be converted to a Python built-in list (list) using the tolist() method.

  • Convert numpy.ndarray and list to each other
print(df['state'].unique().tolist())
# ['NY', nan, 'CA', 'TX']

print(type(df['state'].unique().tolist()))
# <class 'list'>

You can call tolist() on the index attribute of the Series returned by value_counts(), or use the values attribute to obtain the data as a NumPy array (ndarray).

print(df['state'].value_counts().index.tolist())
# ['NY', 'CA', 'TX']

print(type(df['state'].value_counts().index.tolist()))
# <class 'list'>

print(df['state'].value_counts().index.values)
# ['NY' 'CA' 'TX']

print(type(df['state'].value_counts().index.values))
# <class 'numpy.ndarray'>

unique() always includes NaN among the unique values, whereas value_counts() lets you choose whether to count NaN via the dropna argument.

print(df['state'].value_counts(dropna=False).index.tolist())
# ['NY', 'CA', nan, 'TX']

Get the counts of each unique value

To get the counts (frequency, number of occurrences) of each unique value, access the values of the Series returned by value_counts().

vc = df['state'].value_counts()
print(vc)
# state
# NY    2
# CA    2
# TX    1
# Name: count, dtype: int64

print(vc['NY'])
# 2

print(vc['TX'])
# 1

To extract the unique value and its count in a for loop, use the items() method.

for index, value in df['state'].value_counts().items():
    print(index, value)
# NY 2
# CA 2
# TX 1

This method was previously named iteritems(); it was renamed to items(), and iteritems() was removed in pandas 2.0.
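
For reference, a hedged sketch of what calling the removed name looks like on pandas 2.x (the exact error message may vary by version):

# iteritems() no longer exists in pandas 2.x, so this raises AttributeError; use items() instead.
try:
    for index, value in df['state'].value_counts().iteritems():
        print(index, value)
except AttributeError as e:
    print(e)
# 'Series' object has no attribute 'iteritems'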

Get the dictionary of unique values and their counts

You can call the to_dict() method on the Series returned by value_counts() to convert it into a dictionary (dict).

d = df['state'].value_counts().to_dict()
print(d)
# {'NY': 2, 'CA': 2, 'TX': 1}

print(type(d))
# <class 'dict'>

print(d['NY'])
# 2

print(d['TX'])
# 1

To extract the unique value and its count in a for loop, use the items() method.

  • Iterate through dictionary keys and values in Python
for key, value in d.items():
    print(key, value)
# NY 2
# CA 2
# TX 1

Get the mode (most frequent value) and its frequency

value_counts()

By default, value_counts() returns a Series sorted in order of frequency, so the first element represents the mode (most frequent value) and its frequency.

print(df['state'].value_counts())
# state
# NY    2
# CA    2
# TX    1
# Name: count, dtype: int64

print(df['state'].value_counts().index[0])
# NY

print(df['state'].value_counts().iat[0])
# 2

The original Series values become the index of the resulting Series. If this index is numeric, accessing it directly with [number] can raise errors, so use iat[number] for position-based access instead.

  • pandas: Select rows/columns by index (numbers and names)
# print(df['age'].value_counts()[0])
# KeyError: 0

print(df['age'].value_counts().iat[0])
# 2

You can apply it to each column of a DataFrame using the apply() method.

  • pandas: Apply functions to values, rows, columns with map(), apply()
  • Lambda expressions in Python
print(df.apply(lambda x: x.value_counts().index[0]))
# name     Alice
# age       24.0
# state       NY
# point     70.0
# dtype: object

print(df.apply(lambda x: x.value_counts().iat[0]))
# name     1
# age      2
# state    2
# point    2
# dtype: int64

As mentioned above, by default, missing values (NaN) are excluded. If the dropna argument is set to False, they are also counted.

Be aware that if there are multiple modes, this method returns only one of them.
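
A minimal sketch of both caveats, using a small constructed Series rather than the sample data above:

s = pd.Series(['a', 'a', 'b', 'b', float('nan'), float('nan'), float('nan')])

print(s.value_counts().index[0])
# a   (only one of the tied modes 'a' and 'b'; which comes first may depend on the pandas version)

print(s.value_counts(dropna=False).index[0])
# nan   (with dropna=False, NaN has the highest count and becomes the first entry)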

mode()

The mode() method on Series returns the modes as a Series. Converting this Series to a list with tolist() allows you to obtain the modes as a list. Even if there is only one mode, it will be a list.

print(df['state'].mode())
# 0    CA
# 1    NY
# Name: state, dtype: object

print(df['state'].mode().tolist())
# ['CA', 'NY']

print(df['age'].mode().tolist())
# [24.0]

Applying it with apply() to each column results in a Series with lists of modes as values.

s_list = df.apply(lambda x: x.mode().tolist())
print(s_list)
# name     [Alice, Charlie, Dave, Ellen, Frank]
# age                                    [24.0]
# state                                [CA, NY]
# point                                  [70.0]
# dtype: object

print(type(s_list))
# <class 'pandas.core.series.Series'>

print(s_list['name'])
# ['Alice', 'Charlie', 'Dave', 'Ellen', 'Frank']

print(type(s_list['name']))
# <class 'list'>

mode() is also available as a method of DataFrame. It returns a DataFrame. If the number of modes differs for each column, the empty parts are filled with missing values (NaN).

print(df.mode())
#       name   age state  point
# 0    Alice  24.0    CA   70.0
# 1  Charlie   NaN    NY    NaN
# 2     Dave   NaN   NaN    NaN
# 3    Ellen   NaN   NaN    NaN
# 4    Frank   NaN   NaN    NaN

By default, missing values (NaN) are excluded. If the dropna argument is set to False, they are also counted. For more details about mode(), refer to the following article.

  • pandas: Get the mode (the most frequent value) with mode()
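
A minimal sketch (again with a constructed Series, not the sample data) of how dropna=False can change the result of mode():

s = pd.Series([1.0, float('nan'), float('nan')])

print(s.mode())
# 0    1.0
# dtype: float64

print(s.mode(dropna=False))
# 0   NaN
# dtype: float64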

describe()

The describe() method can calculate the number of unique values, the mode, and its frequency for each column together. top represents the mode, and freq represents its frequency. Each item can be obtained with loc[].

print(df.astype('object').describe())
#          name   age state point
# count       5   5.0     5   5.0
# unique      5   4.0     3   4.0
# top     Alice  24.0    NY  70.0
# freq        1   2.0     2   2.0

print(df.astype('object').describe().loc['top'])
# name     Alice
# age       24.0
# state       NY
# point     70.0
# Name: top, dtype: object

In describe(), the listed items depend on the data type (dtype) of the column, so astype() is used for type conversion.
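
As a minimal sketch of why the conversion matters: without astype('object'), numeric columns get numeric summary statistics, so unique, top, and freq do not appear.

print(df['age'].describe().index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

print(df['age'].astype('object').describe().index.tolist())
# ['count', 'unique', 'top', 'freq']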

describe() excludes missing values (NaN), and unlike other methods, it does not have a dropna argument. Note that even if there are several modes, this method returns only one.

Get the normalized frequencies

When the normalize argument of value_counts() is set to True, the returned values are normalized so that their total is 1. Be aware that the proportions may differ depending on the dropna setting if missing values (NaN) are included.

print(df['state'].value_counts(normalize=True))
# state
# NY    0.4
# CA    0.4
# TX    0.2
# Name: proportion, dtype: float64

print(df['state'].value_counts(dropna=False, normalize=True))
# state
# NY     0.333333
# CA     0.333333
# NaN    0.166667
# TX     0.166667
# Name: proportion, dtype: float64