I’m working with a pandas DataFrame that has both float64 and string columns. When I use to_csv to save it, big numbers show up in scientific notation. For instance, 1344154454156.992676 becomes 1.344154e+12 in the file.
I want to keep the full numbers without scientific notation. I tried using float_format, but it didn’t work because of the string columns. Here’s a simple example:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'text': ['apple', 'banana', 'cherry'],
    'numbers': np.random.rand(3) * 1e14
})
df.to_csv('output.csv')
This outputs scientific notation for the ‘numbers’ column. How can I make it show the full numbers instead? I’d like the output to look something like this:
text numbers
0 apple 94184321380806.796875
1 banana 22383735919307.046875
2 cherry 99180119890642.859375
Any ideas on how to achieve this without breaking the string columns?
I’ve dealt with this problem in my data analysis work. A straightforward solution is to use the ‘float_format’ parameter with a custom format string. Try this:
df.to_csv('output.csv', float_format='%.6f')
This will format all float columns to 6 decimal places without scientific notation. It won’t affect your string columns, so you don’t need to worry about those.
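To check this without touching the disk, here is a minimal sketch that writes to an in-memory buffer instead of a file. The sample values are only for illustration (the first one is the number from the question), and the variable names buffer and csv_text are my own:

```python
import io
import pandas as pd

# Sample data for illustration only; the first number comes from the question.
df = pd.DataFrame({
    'text': ['apple', 'banana', 'cherry'],
    'numbers': [1344154454156.992676, 22383735919307.046875, 99180119890642.859375],
})

# Write to an in-memory buffer so the result can be inspected directly.
buffer = io.StringIO()
df.to_csv(buffer, float_format='%.6f')
csv_text = buffer.getvalue()

# The float column is written in fixed-point form; the text column is unchanged.
print(csv_text)
```

The format string is only applied to floating-point values, which is why the string column passes through untouched.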
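If you need a different format for each column, note that float_format takes a single format (not a per-column dictionary), so the per-column formatting has to happen before the write. One way is to build a dictionary of formats for the float columns and convert those columns to strings yourself:
formats = {col: '%.6f' for col in df.select_dtypes('float64').columns}
out = df.assign(**{c: df[c].map(lambda x, f=f: f % x) for c, f in formats.items()})
out.to_csv('output.csv')
This approach gives you flexibility to handle each float column individually while leaving the string columns alone. Just adjust the format strings in the dictionary as needed for your specific requirements.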
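hey, i’ve run into this too. one thing that worked for me was using pandas’ to_string() method first, then writing that to a file. something like:
with open('output.csv', 'w') as f:
    f.write(df.to_string(index=False, float_format='{:.6f}'.format))
this keeps the full numbers without scientific notation, as long as you pass float_format (without it, to_string can still fall back to scientific notation for large values). one caveat: the result is whitespace-aligned text rather than comma-separated, so it isn’t a real csv. just remember to set index=False if you don’t want the index column in your output.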
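here’s a small sketch of what to_string actually produces, in case it helps decide whether it fits. the data is made up, the name text_table is just mine, and i’m passing a callable as float_format so the floats come out in fixed-point form:

```python
import pandas as pd

# Made-up sample data, similar in scale to the question's numbers.
df = pd.DataFrame({
    'text': ['apple', 'banana', 'cherry'],
    'numbers': [94184321380806.796875, 22383735919307.046875, 99180119890642.859375],
})

# to_string returns an aligned, whitespace-separated table as one string.
# float_format here is a callable that formats each float with 6 decimals.
text_table = df.to_string(index=False, float_format='{:.6f}'.format)

print(text_table)
```

the columns are padded with spaces, so anything reading this back needs to split on whitespace rather than commas.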
I’ve encountered this issue before when working with large datasets containing mixed data types. One approach that worked for me is to run the float column through a custom formatter function before calling to_csv, since to_csv itself has no formatters argument (that one belongs to to_string and to_html). Here’s a solution I found effective:
def format_float(x):
    # Format floats in fixed-point notation; leave other values as they are.
    return f'{x:.6f}' if isinstance(x, float) else x

out = df.copy()
out['numbers'] = out['numbers'].map(format_float)
out.to_csv('output.csv')
This method writes your float values in fixed-point form with six decimal places, without scientific notation, and leaves the string columns untouched. The format_float function checks if the value is a float and formats it accordingly, otherwise it returns the value as is, so it is also safe to apply to a column that mixes floats and strings.
Keep in mind that this approach might slightly increase the file size due to the full representation of large numbers. If file size is a concern, you might want to consider alternative storage formats like parquet or HDF5 for more efficient handling of mixed data types.
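To get a rough feel for the size difference, here is a minimal sketch that compares the length of the CSV text with and without float_format. The sample data and the names default_size and fixed_size are assumptions for illustration, and the result on a real dataset will depend on the values involved:

```python
import pandas as pd

# Illustrative data only; real datasets will give different sizes.
df = pd.DataFrame({
    'text': ['apple', 'banana', 'cherry'],
    'numbers': [94184321380806.796875, 22383735919307.046875, 99180119890642.859375],
})

# With no path argument, to_csv returns the CSV content as a string.
default_size = len(df.to_csv())
fixed_size = len(df.to_csv(float_format='%.6f'))

print(f'default: {default_size} characters, fixed-point: {fixed_size} characters')
```

Multiplying the per-row difference by the number of rows gives a quick estimate of whether the extra digits matter for your files.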