I’m working with a CSV file that has transit route data. The file contains stop identifiers that are usually long numbers followed by letters, but sometimes they’re just numbers without letters.
When I load this data using pandas, the numeric-only codes get converted to scientific notation automatically. Here's roughly what the column looks like after loading (values are representative):
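```
>>> df['stop_code'].head()
0    4.904843e+12
1       4909702EB
2    4.904844e+12
3       4909703WB
4    4.904845e+12
Name: stop_code, dtype: object
```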
The column ends up with object dtype, so typical numeric operations don't work on it. More importantly, I need these identifiers to keep their exact original values so I can join them with other datasets that use the same stop codes.
Is there a way to read the CSV file while keeping these large numbers in their original format instead of converting them to scientific notation?
I’ve hit this exact problem with large ID numbers in transit data. Here’s the thing - these IDs should be strings anyway since they’re labels, not numbers you’d actually calculate with. Even if they’re all numeric, you’re treating them as categories.
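A minimal sketch of that approach, assuming the file is called stops.csv and the ID column is named stop_code:

```python
import pandas as pd

# Tell read_csv up front that stop_code is text, so the numeric-only
# codes are never parsed as floats (no scientific notation)
df = pd.read_csv("stops.csv", dtype={"stop_code": str})

print(df["stop_code"].head())
# e.g. 4904843000001 stays "4904843000001" instead of 4.904843e+12
```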
This also handles mixed alphanumeric codes and keeps everything consistent. When joining with other datasets, just make sure those also use the same dtype for ID columns.
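For the join itself, something like this (routes.csv and its columns are placeholders):

```python
routes = pd.read_csv("routes.csv", dtype={"stop_code": str})

# Both sides are plain strings, so the merge matches on the exact original codes
merged = df.merge(routes, on="stop_code", how="left")
```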
You can also use the converters parameter for more control over how specific columns get parsed. Works great when you’ve got mixed data types in the same column:
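A sketch using converters, with the same hypothetical stops.csv / stop_code names as above:

```python
import pandas as pd

# The converter receives each raw field as a string, and its return value
# is stored as-is, so pandas never runs numeric inference on the column
df = pd.read_csv(
    "stops.csv",
    converters={"stop_code": str.strip},
)
```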
I ran into the same thing with product codes - leading zeros kept getting stripped. The converters method processes each value individually before pandas tries to guess the data type, so it completely skips the scientific notation conversion.
Another option is low_memory=False combined with a dtype specification. Pretty useful for larger files, where pandas might otherwise make different type guesses for different chunks of the same column.
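For example (same assumed file and column names as above):

```python
import pandas as pd

# low_memory=False parses the file in one pass instead of in chunks, so a
# column can't end up half float64 and half str; the explicit dtype makes
# the intended type unambiguous either way
df = pd.read_csv("stops.csv", dtype={"stop_code": str}, low_memory=False)
```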
Yeah, this is super annoying. Try keep_default_na=False with dtype=str when you're reading the file - pandas gets overly helpful sometimes and converts values like the literal string "NA" or empty fields into NaN. Also check your original CSV for stray whitespace that might be messing with parsing and matching.
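Rough version of that, assuming the same stops.csv layout as the other answers:

```python
import pandas as pd

# dtype=str keeps every column as text; keep_default_na=False stops pandas
# from turning literal "NA" or empty fields into NaN
df = pd.read_csv("stops.csv", dtype=str, keep_default_na=False)

# Strip stray whitespace that can interfere with matching/joining
df["stop_code"] = df["stop_code"].str.strip()
```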