Converting scientific notation to regular numbers in Pandas DataFrame

sophiac · March 31, 2025, 9:58pm

I’m working with a big CSV file that has info about bus routes. The stop codes are mostly long numbers with a letter at the end but some are just numbers. When I load this into pandas the big numbers turn into scientific notation. Like this:

code_o        lat_o     lon_o     code_d
490016444HN   51.56878  0.1811568 490013271R
490013271R    51.57493  0.1781319 490009721A
490009721A    51.57708  0.1769355 490010407C
...
4.90E+09      51.57071  0.2087701 490003049E

The column type is ‘object’ but I need these to be normal numbers so I can join with other tables. I can’t change the whole column at once because it’s not all ‘int’ or ‘float’.

Any ideas how to fix this? I tried a few things but nothing worked. It’s driving me crazy. Thanks for any help!

livbrown · April 13, 2025, 10:22pm

I’ve dealt with this exact problem before in my transportation data analysis work. What worked for me was a two-step approach:

First, I used pd.to_numeric with ‘errors=‘coerce’’ to convert what we can:
df[‘code_o’] = pd.to_numeric(df[‘code_o’], errors=‘coerce’)
Then, I applied a custom function to handle the rest:
def format_code(x):
return f’{x:.0f}’ if pd.notnull(x) and x >= 1e6 else x
df[‘code_o’] = df[‘code_o’].apply(format_code)

This preserves alphanumeric codes and converts scientific notation to regular numbers. It’s not perfect, but it solved 99% of my issues without manual intervention. Just make sure to double-check your results after applying this method.

Alex_Thunder · April 13, 2025, 3:38am

hey sophiac, pandas can be tricky w/ mixed data types. have u tried applying a custom function to the column? something like:

df[‘code_o’] = df[‘code_o’].apply(lambda x: ‘{:.0f}’.format(float(x)) if isinstance(x, float) else x)

this should keep strings as-is & convert scientific notation to reg nums. lmk if it helps!

alexlee · April 11, 2025, 1:04pm

I encountered a similar issue when working with large datasets containing mixed numeric and alphanumeric identifiers. One effective solution I found was using the ‘to_numeric’ function from pandas with the ‘errors’ parameter set to ‘coerce’. This approach converts the numeric values while preserving the alphanumeric ones:

df[‘code_o’] = pd.to_numeric(df[‘code_o’], errors=‘coerce’).fillna(df[‘code_o’])

This method handles the conversion gracefully, maintaining the original format for non-numeric entries. It’s particularly useful for datasets with inconsistent formatting across rows. Remember to import pandas as pd if you haven’t already. Hope this helps resolve your issue!