Creating Hierarchical Columns from Unique Values in a Pandas DataFrame

In this article, we’ll explore how to create hierarchical columns based on unique values in specific columns of a pandas DataFrame. This is particularly useful when working with data that has multiple categories or subcategories.

Problem Statement

Suppose you have a pandas DataFrame with three columns: S.No, Name1, and Name2. The Name1 and Name2 columns contain unique values, and you want to create hierarchical columns based on these unique values. This will result in a new DataFrame where the hierarchical column is created based on the unique values in Name1 and Name2.

Solution Overview

To solve this problem, we’ll use the following steps:

Create a helper DataFrame with all possible combinations of unique values from S.No, Name1, and Name2.
Merge the original DataFrame with the helper DataFrame using the merge function.
Pivot the resulting DataFrame to create hierarchical columns based on the unique values in Name1 and Name2.

Step 1: Creating the Helper DataFrame

The first step is to create a helper DataFrame that contains all possible combinations of unique values from S.No, Name1, and Name2. We can use the itertools.product function to achieve this.

from io import StringIO
from itertools import product
import pandas as pd

# Create a sample DataFrame
txt = '''S.No    Name1     Name2     Size    
1           ABC         XYZ         12    
2           BCA         XCZ         15    
3           DAB         ZXM         20'''

df = pd.read_table(StringIO(txt), sep="\s+")

# Get unique values from S.No, Name1, and Name2
unique_sno = df['S.No'].unique()
unique_name1 = df['Name1'].unique()
unique_name2 = df['Name2'].unique()

# Create a helper DataFrame with all possible combinations of unique values
fill_df = pd.DataFrame(list(product(unique_sno, unique_name1, unique_name2)), 
                                    columns=['S.No', 'Name1', 'Name2'])

Step 2: Merging the DataFrames

Next, we’ll merge the original DataFrame with the helper DataFrame using the merge function. We’ll use a right join to ensure that all rows from the helper DataFrame are included in the resulting DataFrame.

# Merge the original DataFrame with the helper DataFrame
df = df.merge(fill_df, on=['S.No', 'Name1', 'Name2'], how='right')

Step 3: Creating Hierarchical Columns

Finally, we’ll use the pivot_table function to create hierarchical columns based on the unique values in Name1 and Name2. We’ll set the index of the DataFrame to S.No, and specify Name1 and Name2 as the columns to be pivoted.

# Create a pivot table to create hierarchical columns
pvtdf = df.pivot_table(index='S.No', columns=['Name1', 'Name2'], 
                       values='Size', aggfunc='max', dropna=False)\
                       .rename_axis([None, None], axis="columns")

Example Use Case

Here’s an example of how to use this solution:

Suppose you have a DataFrame with the following data:

| S.No | Name1 | Name2 | Size |
| --- | --- | --- | --- |
| 1    | ABC   | XYZ   | 12   |
| 2    | BCA   | XCZ   | 15   |
| 3    | DAB   | ZXM   | 20   |

You can use the solution above to create a hierarchical column based on the unique values in Name1 and Name2. The resulting DataFrame would look like this:

| S.No | ABC     | BCA     | DAB     | XCZ    | XYZ    | ZXM   |
| --- | --- | --- | --- | --- | --- | --- |
| 1    | 12.0   | NaN    | NaN    | NaN   | 12.0 | NaN   |
| 2    | NaN    | 15.0  | NaN    | NaN   | NaN   | NaN   |
| 3    | NaN    | NaN    | 20.0  | NaN   | NaN   | NaN   |

As you can see, the hierarchical column is created based on the unique values in Name1 and Name2. The index of the DataFrame remains S.No, which allows for easy filtering and sorting.

Conclusion

In this article, we explored how to create hierarchical columns based on unique values in specific columns of a pandas DataFrame. We used the itertools.product function to create a helper DataFrame with all possible combinations of unique values from S.No, Name1, and Name2. We then merged the original DataFrame with the helper DataFrame using the merge function, and created hierarchical columns based on the unique values in Name1 and Name2 using the pivot_table function. This solution can be used to solve a variety of problems involving hierarchical data structures.

Last modified on 2024-12-11