JSON data from API missing empty columns when converting to DataFrame

Issue with Missing Fields in API Response

I’m working with an API that returns JSON data, but I’m having trouble when some fields are empty or missing. The problem happens when I try to convert this data into a pandas DataFrame.

What’s happening

When I get the API response, sometimes certain fields are completely missing from the JSON structure. This causes my DataFrame columns to shift around and not line up properly.

Here’s an example of what the complete data looks like:

response_data = {
    'total': 50,
    'items': [{
        'timestamp': '2023-01-01',
        'data': {
            'field_one': 'value1',
            'field_two': 'value2', 
            'field_three': 100,
            'field_four': 'sample',
            'field_five': {'nested': 'data'}
        }
    }]
}

But sometimes I get this instead:

response_data = {
    'total': 50,
    'items': [{
        'timestamp': '2023-01-01',
        'data': {
            'field_one': 'value1',
            'field_two': 'value2',
            'field_four': 'sample',
            'field_five': {'nested': 'data'}
        }
    }]
}

Notice how field_three is completely missing from the second example. When I use the first record to create my DataFrame column headers, any missing fields in later records mess up the alignment.

I’m using the field names from the first record as my column headers, but when subsequent records are missing some fields, everything gets misaligned. How can I handle these missing fields properly so my DataFrame stays consistent?

Any suggestions would be really helpful. I’m still learning about working with APIs and complex data structures.

I’ve hit this exact problem way too many times. APIs are annoyingly inconsistent.

Define your schema upfront instead of letting the first record decide everything. Here’s my approach:

import pandas as pd

# Define all possible fields you expect
expected_fields = {
    'field_one': None,
    'field_two': None, 
    'field_three': None,  # This one goes missing
    'field_four': None,
    'field_five': None
}

# Process each item and fill missing fields
processed_items = []
for item in response_data['items']:
    data_dict = item['data'].copy()
    # Add missing fields with None values
    for field in expected_fields:
        if field not in data_dict:
            data_dict[field] = None
    
    processed_items.append({
        'timestamp': item['timestamp'],
        **data_dict
    })

df = pd.DataFrame(processed_items)

Your DataFrame gets the same columns in the same order every time, no matter what the API throws at you.

Don’t know all possible fields? Scan through all records first to grab every unique field name, then use the same logic.
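A quick sketch of that discovery pass, assuming the same response_data structure as in the question:

# First pass: collect every field name that appears anywhere in the response
expected_fields = {}
for item in response_data['items']:
    for field in item['data']:
        expected_fields.setdefault(field, None)

# Then run the same fill-missing-fields loop as above against this dict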

Defining the schema upfront saves hours of debugging weird data alignment issues.

Been dealing with flaky APIs for years - all this manual preprocessing just creates headaches.

You need a solid data pipeline that handles inconsistencies automatically. I set up flows that grab API data, normalize it, and push clean results wherever needed.

You can define fallback values, handle nested objects properly, and validate data before it hits your DataFrame. No more custom Python scripts every time an API gets moody.

I built one for our user analytics API with similar issues. Some records had 15 fields, others had 8, nested objects appeared and disappeared randomly. Set up automatic field mapping with defaults and now it runs perfectly without babysitting.

Your DataFrame stays consistent, you get proper error handling, and you can modify logic when requirements change. Plus it runs automatically instead of manual scripts.

Check out how to build reliable data workflows: https://latenode.com

Struggled with this for months until I found out pandas has a built-in function that does exactly what we need. Skip the manual JSON preprocessing and use pd.json_normalize() - it handles missing fields without breaking.

from pandas import json_normalize

# Extract just the data portion and normalize
data_records = [item['data'] for item in response_data['items']]
df = json_normalize(data_records)

json_normalize creates columns for every field it finds across all records and fills missing values with NaN. So if field_three only shows up sometimes, you still get a clean DataFrame.

Just add your timestamp column back after normalization, or fold it into the normalization step itself. This beats manual schema definitions by miles, especially with APIs that have tons of optional fields.
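For example, one way to keep the timestamp is to normalize the full items list instead of just the data dicts (a small sketch; nested keys come out as dotted column names, which you can rename afterwards if you want):

# Normalizing the items directly keeps 'timestamp' and flattens 'data'
# into columns like 'data.field_one' and 'data.field_five.nested'
df = json_normalize(response_data['items'])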

Just pass the list of record dicts straight to pd.DataFrame and chain .fillna('') with whatever default value you need. Pandas creates columns for every key it finds across the records and fills the missing ones with NaN automatically. Way easier than doing all that preprocessing by hand.
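A minimal sketch of that one-liner, assuming data_records is the list of per-item 'data' dicts extracted the same way as in the json_normalize answer above:

import pandas as pd

# pandas unions the keys across all dicts into columns; missing fields
# come through as NaN, which fillna replaces with the chosen default
data_records = [item['data'] for item in response_data['items']]
df = pd.DataFrame(data_records).fillna('')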

Had the same headache with a payment API that randomly dropped fields depending on transaction types. The real problem isn’t just handling missing fields - it’s keeping data types consistent and stopping downstream errors.

I built a validation layer that catches these issues early. I map each API response against a predefined structure before creating the DataFrame, so I can spot when new fields show up or expected ones disappear.

import pandas as pd

def normalize_api_data(items, field_defaults):
    # Map every record onto the predefined schema so each row has the same keys
    normalized = []
    for item in items:
        record = {'timestamp': item['timestamp']}
        data = item.get('data', {})
        
        for field, default_val in field_defaults.items():
            record[field] = data.get(field, default_val)
        
        normalized.append(record)
    return pd.DataFrame(normalized)
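For example, called with defaults for the fields from the question (just a sketch; the default values here are illustrative, pick whatever makes sense for your data types):

# Hypothetical defaults matching the fields in the question
field_defaults = {
    'field_one': None,
    'field_two': None,
    'field_three': 0,      # e.g. a numeric default for the field that goes missing
    'field_four': None,
    'field_five': None,
}

df = normalize_api_data(response_data['items'], field_defaults)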

This saved me from debugging production issues where missing integer fields became strings or dates got mangled. You get consistent column ordering and proper data types every time, no matter what the API sends.