I’m working with a big CSV file that has info about bus routes. The stop codes are mostly long numbers with a letter at the end but some are just numbers. When I load this into pandas the big numbers turn into scientific notation. Like this:
code_o lat_o lon_o code_d
490016444HN 51.56878 0.1811568 490013271R
490013271R 51.57493 0.1781319 490009721A
490009721A 51.57708 0.1769355 490010407C
...
4.90E+09 51.57071 0.2087701 490003049E
The column type is ‘object’ but I need these to be normal numbers so I can join with other tables. I can’t change the whole column at once because it’s not all ‘int’ or ‘float’.
Any ideas how to fix this? I tried a few things but nothing worked. It’s driving me crazy. Thanks for any help!
I’ve dealt with this exact problem before in my transportation data analysis work. What worked for me was a two-step approach:
-
First, I used pd.to_numeric with ‘errors=‘coerce’’ to convert what we can:
df[‘code_o’] = pd.to_numeric(df[‘code_o’], errors=‘coerce’)
-
Then, I applied a custom function to handle the rest:
def format_code(x):
return f’{x:.0f}’ if pd.notnull(x) and x >= 1e6 else x
df[‘code_o’] = df[‘code_o’].apply(format_code)
This preserves alphanumeric codes and converts scientific notation to regular numbers. It’s not perfect, but it solved 99% of my issues without manual intervention. Just make sure to double-check your results after applying this method.
hey sophiac, pandas can be tricky w/ mixed data types. have u tried applying a custom function to the column? something like:
df[‘code_o’] = df[‘code_o’].apply(lambda x: ‘{:.0f}’.format(float(x)) if isinstance(x, float) else x)
this should keep strings as-is & convert scientific notation to reg nums. lmk if it helps!
I encountered a similar issue when working with large datasets containing mixed numeric and alphanumeric identifiers. One effective solution I found was using the ‘to_numeric’ function from pandas with the ‘errors’ parameter set to ‘coerce’. This approach converts the numeric values while preserving the alphanumeric ones:
df[‘code_o’] = pd.to_numeric(df[‘code_o’], errors=‘coerce’).fillna(df[‘code_o’])
This method handles the conversion gracefully, maintaining the original format for non-numeric entries. It’s particularly useful for datasets with inconsistent formatting across rows. Remember to import pandas as pd if you haven’t already. Hope this helps resolve your issue!