Converting a List of Multi-Nested Dictionaries to a Pandas DataFrame
As data engineers and analysts, we often encounter complex data structures that require careful manipulation before being converted into a suitable format for analysis or visualization. In this article, we will explore the process of converting a list of multi-nested dictionaries to a pandas DataFrame.
Understanding the Problem
The problem at hand involves a list of nested dictionaries, where each dictionary represents a game with statistics about the teams involved. The goal is to convert this data into a pandas DataFrame that can be easily analyzed or visualized.
Here’s an example of what the data might look like:
game_stats = [
    {
        'id': 401282099,
        'teams': [
            {'conference': 'SEC', 'homeAway': 'away', 'points': 21, 'school': 'LSU', 'stats': [
                {'category': 'rushingTDs', 'stat': '2'},
                {'category': 'passingTDs', 'stat': '1'},
                {'category': 'kickingPoints', 'stat': '3'},
                {'category': 'fumblesRecovered', 'stat': '0'},
                {'category': 'firstDowns', 'stat': '22'}
            ]}
        ],
        'conference': 'SEC',
        'homeAway': 'home',
        'points': 42,
        'school': 'Kentucky'
    }
]
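To make the three nesting levels (game, team, stat entry) concrete, here is a minimal sketch that walks the structure by hand. It uses a trimmed copy of the sample record with only two stats, purely to keep the example short; the trimming and the `rows` name are illustrative assumptions, not part of the original data.

```python
# Trimmed copy of the sample record (two stats only) so the block runs on its own.
game_stats = [
    {
        'id': 401282099,
        'teams': [
            {'school': 'LSU', 'points': 21, 'stats': [
                {'category': 'rushingTDs', 'stat': '2'},
                {'category': 'firstDowns', 'stat': '22'},
            ]},
        ],
    }
]

# Level 1 is the game, level 2 the team, level 3 the individual stat entry.
rows = []
for game in game_stats:
    for team in game['teams']:
        for entry in team['stats']:
            rows.append((game['id'], team['school'], entry['category'], entry['stat']))

for row in rows:
    print(row)
```

Each tuple corresponds to one row of the long-format table we are aiming for. Note that the stat values are strings in the source data, not numbers.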
Exploring the Solution
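We can achieve this conversion in four steps: flatten the teams with pd.json_normalize(), give each stat its own row with explode(), expand the stat dictionaries into columns, and spread the categories into columns with pivot_table().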
First, let’s import pandas, the only library we need:
import pandas as pd
Next, we need the data in Python. The game_stats list shown above is already valid Python, so we will use it as is.
Step 1: Converting the List of Dictionaries to a DataFrame
We can use pd.json_normalize() to flatten our list of games into a DataFrame with one row per team. We pass it three arguments:
- data: the list of dictionaries
- record_path: the key that holds the nested list of records (teams)
- meta: the top-level field to copy onto every record row (id)
Here’s how we can do it:
df = pd.json_normalize(game_stats, 'teams', 'id')
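This produces one row per team, with columns for the team’s scalar fields (conference, homeAway, points, school), the id copied from the parent game, and a stats column whose cells still hold a list of dictionaries. The top-level conference, homeAway, points, and school keys of the game (the Kentucky values in the sample) are named in neither record_path nor meta, so they do not appear in the result. The stats lists still need to be unpacked.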
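To see what the normalized frame contains, the following sketch runs the same call with explicit keyword names on a trimmed copy of the sample (two stats only; the trimming is an assumption made for brevity) and inspects the result.

```python
import pandas as pd

# Trimmed copy of the sample record so the block is self-contained.
game_stats = [
    {
        'id': 401282099,
        'teams': [
            {'school': 'LSU', 'points': 21, 'stats': [
                {'category': 'rushingTDs', 'stat': '2'},
                {'category': 'firstDowns', 'stat': '22'},
            ]},
        ],
        'school': 'Kentucky',  # top-level field, not reached by record_path or meta
    }
]

# Same call as in the article, with the parameter names spelled out.
df = pd.json_normalize(game_stats, record_path='teams', meta='id')

print(df.columns.tolist())         # team fields plus the copied id
print(type(df.loc[0, 'stats']))    # each stats cell is still a Python list
print(df['school'].tolist())       # only the team inside 'teams' appears
```

The stats cell is still a list, which is why further steps are needed, and the top-level Kentucky value is absent because nothing in the call points at it.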
Step 2: Exploding the List of Dictionaries
To unpack the stats lists, we can use the explode() method:
df = df.explode('stats')
This creates a new row for each dictionary in the stats list and repeats the other columns, including id, on each of those rows. Because explode() also repeats the original index label on every new row, the index is no longer unique afterwards.
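A small sketch of what explode() does to a list-valued column, using a hypothetical one-row frame built by hand rather than the output of the previous step. It also shows the repeated index labels and one way to reset them.

```python
import pandas as pd

# Hypothetical one-row frame whose 'stats' cell holds a list of dictionaries.
df = pd.DataFrame({
    'id': [401282099],
    'school': ['LSU'],
    'stats': [[
        {'category': 'rushingTDs', 'stat': '2'},
        {'category': 'firstDowns', 'stat': '22'},
    ]],
})

exploded = df.explode('stats')
print(exploded.index.tolist())  # [0, 0] -- the source row's label is repeated

# Give every row its own label before combining with other frames.
exploded = exploded.reset_index(drop=True)
print(exploded)
```

Resetting the index matters for any later step that aligns rows by label, such as a column-wise pd.concat().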
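Step 3: Expanding the Stat Dictionaries into Columns
Each cell in the stats column now holds a single dictionary. We reset the index (explode() leaves duplicate labels), build a DataFrame from those dictionaries, and attach it to the remaining columns:
df = df.reset_index(drop=True)
df = pd.concat([df.drop(columns='stats'), pd.DataFrame(df['stats'].tolist())], axis=1)
This replaces the stats column with two new columns, category and stat, taken from the keys of each dictionary.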
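As an illustration of turning a column of dictionaries into real columns, the sketch below uses pd.json_normalize() on the list of dictionaries, which is one of several equivalent options for flat dictionaries. The two-row frame is hypothetical and stands in for the exploded data.

```python
import pandas as pd

# Hypothetical exploded frame: one dictionary per row in 'stats',
# with a clean 0..n-1 index.
df = pd.DataFrame({
    'id': [401282099, 401282099],
    'school': ['LSU', 'LSU'],
    'stats': [
        {'category': 'rushingTDs', 'stat': '2'},
        {'category': 'firstDowns', 'stat': '22'},
    ],
})

# Build one column per dictionary key; the result has a 0..n-1 index too,
# so the column-wise concat lines the rows up correctly.
stats_cols = pd.json_normalize(df['stats'].tolist())
flat = pd.concat([df.drop(columns='stats'), stats_cols], axis=1)
print(flat)
```

The concat aligns rows by index label, which is why both frames need the same unique index before they are combined.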
Step 4: Pivoting the DataFrame
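Finally, we pivot the DataFrame so that each stat category becomes its own column:
df = df.pivot_table(index=['id', 'school', 'points'], columns='category', values='stat', aggfunc='first').reset_index()
This produces one row per team, with one column per category from the stats list holding that category’s stat value. The aggfunc='first' argument is needed because the stat values are strings, which the default mean aggregation cannot handle.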
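If the stats are going to be used numerically, one option is to convert them before pivoting. The sketch below assumes every stat value parses as a number, which holds for the sample but may not hold for other categories; the `long_df` and `wide` names are illustrative.

```python
import pandas as pd

# Hypothetical long-format frame: one row per (team, category).
long_df = pd.DataFrame({
    'id': [401282099, 401282099],
    'school': ['LSU', 'LSU'],
    'points': [21, 21],
    'category': ['rushingTDs', 'firstDowns'],
    'stat': ['2', '22'],
})

# Assumption: all stat strings are numeric. to_numeric raises otherwise.
long_df['stat'] = pd.to_numeric(long_df['stat'])

# One row per team, one column per category.
wide = long_df.pivot_table(
    index=['id', 'school', 'points'],
    columns='category',
    values='stat',
    aggfunc='first',
).reset_index()
wide.columns.name = None  # drop the leftover 'category' axis label
print(wide)
```

The category columns come out in alphabetical order, since pivot_table sorts the new column labels.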
Putting it All Together
Here’s the complete code snippet:
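import pandas as pd

game_stats = [
    {
        'id': 401282099,
        'teams': [
            {'conference': 'SEC', 'homeAway': 'away', 'points': 21, 'school': 'LSU', 'stats': [
                {'category': 'rushingTDs', 'stat': '2'},
                {'category': 'passingTDs', 'stat': '1'},
                {'category': 'kickingPoints', 'stat': '3'},
                {'category': 'fumblesRecovered', 'stat': '0'},
                {'category': 'firstDowns', 'stat': '22'}
            ]}
        ],
        'conference': 'SEC',
        'homeAway': 'home',
        'points': 42,
        'school': 'Kentucky'
    }
]

# One row per team, with the game id copied onto each row
df = pd.json_normalize(game_stats, 'teams', 'id')
# One row per stat dictionary; reset the duplicated index labels
df = df.explode('stats').reset_index(drop=True)
# Replace the stats column with category and stat columns
df = pd.concat([df.drop(columns='stats'), pd.DataFrame(df['stats'].tolist())], axis=1)
# One column per category, one row per team; stat values are strings, so take the first
df = df.pivot_table(index=['id', 'school', 'points'], columns='category', values='stat', aggfunc='first').reset_index()
print(df.to_string(index=False))
Output (column spacing may vary slightly between pandas versions):
       id school  points firstDowns fumblesRecovered kickingPoints passingTDs rushingTDs
401282099    LSU      21         22                0             3          1          2
Only LSU appears, because it is the only entry in the teams list; the top-level Kentucky fields are not reached by the json_normalize() call.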
This is our final transformed DataFrame, ready for analysis or visualization.
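As an alternative sketch, pd.json_normalize() can reach the stat entries directly when record_path is given as a list of keys, with nested meta paths pulling fields from the team level. This yields the long format (one row per stat) in a single call and is offered as an option, not as the method used above. The data is a trimmed copy of the sample, and it assumes every team has a stats list.

```python
import pandas as pd

# Trimmed copy of the sample record so the block is self-contained.
game_stats = [
    {
        'id': 401282099,
        'teams': [
            {'school': 'LSU', 'points': 21, 'stats': [
                {'category': 'rushingTDs', 'stat': '2'},
                {'category': 'firstDowns', 'stat': '22'},
            ]},
        ],
    }
]

# record_path walks game -> teams -> stats; meta copies the game id and
# two team-level fields onto every stat row.
long_df = pd.json_normalize(
    game_stats,
    record_path=['teams', 'stats'],
    meta=['id', ['teams', 'school'], ['teams', 'points']],
)
print(long_df)
```

The team-level meta columns are named with a dotted prefix (teams.school, teams.points), so they may need renaming before a pivot.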
Last modified on 2024-09-07