How to prevent pandas from converting large numbers to exponential format

I’m working with a CSV file that has transit route data. The dataset includes station IDs that are mostly long numbers followed by letters, but some entries are just numeric values.

When I load this data using pandas, the purely numeric station IDs get automatically converted to scientific notation, which creates problems when I try to merge with other datasets.

import pandas as pd

# Sample of what my data looks like after loading
station_data = {
    'station_id': ['782019556AB', '782013456C', '782009834B', '7.82E+08'],
    'latitude': [40.7589, 40.7614, 40.7651, 40.7523],
    'longitude': [-73.9851, -73.9776, -73.9845, -73.9712],
    'next_station': ['782013456C', '782009834B', '7.82E+08', '782005123D']
}

df = pd.DataFrame(station_data)
print(df)

The column has object dtype, but the scientific notation format makes it impossible to join with my reference tables that contain the original numeric format. I can’t apply standard numeric conversion methods since most values contain letters.

What’s the best approach to handle this mixed data type situation and prevent the scientific notation conversion?

Had this exact problem with medical record IDs last year. Here's what worked: pass dtype={'station_id': str} to pd.read_csv() so the parser never infers that column as numeric. Adding engine='python' isn't strictly required for the dtype to be respected, but it's a useful fallback when the C parser chokes on messy rows. If your CSV was already saved with scientific notation in it, dtype alone can't undo that, but this regex normalizes the format: df['station_id'].str.replace(r'\d+\.?\d*E\+\d+', lambda m: str(int(float(m.group(0)))), regex=True). It catches those E+ patterns and expands them back to plain integers. One caveat: digits dropped when the value was rounded to 7.82E+08 are gone for good, so the expanded number may not match the original ID exactly.
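A minimal runnable sketch of the above, using an in-memory CSV in place of your real file:

```python
import io

import pandas as pd

# Simulated CSV where one station_id was already saved in scientific notation.
csv_text = """station_id,latitude
782019556AB,40.7589
7.82E+08,40.7523
"""

# An explicit string dtype keeps every value in the column as text.
df = pd.read_csv(io.StringIO(csv_text), engine="python",
                 dtype={"station_id": str})

# Expand entries already stored in scientific notation back to plain
# integers. Digits lost in the rounding cannot be recovered:
# 782019556 saved as 7.82E+08 comes back as 782000000.
df["station_id"] = df["station_id"].str.replace(
    r"\d+\.?\d*E\+\d+",
    lambda m: str(int(float(m.group(0)))),
    regex=True,
)

print(df["station_id"].tolist())  # ['782019556AB', '782000000']
```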

The problem happens during CSV parsing, not after. Use converters={'station_id': str} in pd.read_csv() to fix this. I've hit the same issue with ID columns that mix letters and numbers. A converter forces pandas to treat that column as strings from the start, so nothing gets auto-converted to scientific notation - more targeted than setting a dtype for every column. If you've already loaded the dataframe, fix the scientific-notation entries with df['station_id'] = df['station_id'].apply(lambda x: format(float(x), '.0f') if 'E' in str(x) else x). This converts only the problematic entries back to plain numbers while keeping the alphanumeric ones intact (watch out if any legitimate IDs contain the letter E, though).
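Here's a small self-contained sketch of both steps - the converter at load time, and the post-hoc repair for a frame that's already been mangled:

```python
import io

import pandas as pd

# Simulated CSV: one column is clean, the other already contains
# a scientific-notation entry.
csv_text = "station_id,next_station\n782013456C,7.82E+08\n"

# converters runs before type inference, so station_id stays text.
df = pd.read_csv(io.StringIO(csv_text),
                 converters={"station_id": str, "next_station": str})

# Post-hoc repair: only rewrite entries containing an exponent marker,
# leaving alphanumeric IDs untouched.
df["next_station"] = df["next_station"].apply(
    lambda x: format(float(x), ".0f") if "E" in str(x) else x
)

print(df["next_station"].tolist())  # ['782000000']
```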

try loading your csv with dtype=str in pd.read_csv(). it keeps long numbers as strings and avoids that annoying exp format. worked for me with similar issues!
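For completeness, a sketch of the whole-frame version - note that dtype=str also turns genuinely numeric columns like latitude into strings, so you'd cast those back with pd.to_numeric afterwards:

```python
import io

import pandas as pd

# dtype=str loads every column as text, so no value is ever
# reformatted into scientific notation.
df = pd.read_csv(io.StringIO("station_id,latitude\n782009834B,40.7651\n"),
                 dtype=str)

# Restore numeric columns that should actually be numbers.
df["latitude"] = pd.to_numeric(df["latitude"])

print(df.dtypes)
```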