Issue with Python interpreter cleanup in embedded C++ application
I have a C++ application that embeds Python using the CPython API. My workflow involves calling Py_Initialize() to start the interpreter, running some Python scripts, and then calling Py_FinalizeEx() to clean up before the next cycle.
Everything works perfectly when I run regular Python scripts multiple times. However, I encounter a problem when my Python script imports the tensorflow library. After the first successful execution, subsequent runs fail with this error:
Traceback (most recent call last):
  File "C:\Projects\MyApp\scripts\ml_processor.py", line 2, in <module>
    import tensorflow as tf
  File "C:\Python39\lib\site-packages\tensorflow\__init__.py", line 12, in <module>
    from . import core
  File "C:\Python39\lib\site-packages\tensorflow\core\__init__.py", line 8, in <module>
    from .engine import Engine
  File "C:\Python39\lib\site-packages\tensorflow\core\engine.py", line 19, in <module>
    from typing_extensions import Protocol
  File "C:\Python39\lib\site-packages\typing_extensions\__init__.py", line 15, in <module>
    from ._internal import (
ImportError: PyO3 modules compiled for CPython 3.8 or older may only be initialized once per interpreter process
It seems like Py_FinalizeEx() is not completely resetting the interpreter state when PyO3-based modules are involved. The error suggests that some internal components remain in memory even after finalization.
What’s the proper way to handle this cleanup issue?
I hit this exact problem building a document processing system that ran TensorFlow models on and off. PyO3’s initialization just doesn’t play nice with the initialize-finalize cycle - they’re fundamentally incompatible. Tried a bunch of cleanup strategies but nothing worked. Eventually I restructured the whole thing to keep the Python interpreter running for the app’s entire lifetime. Instead of reinitializing, I built a module reload system that clears specific entries from sys.modules and only reimports what’s needed between operations. This killed the PyO3 reinitialization error and kept memory usage reasonable. The trick was adding proper garbage collection calls and explicitly deleting large objects after each operation - otherwise you get memory bloat during long sessions.
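A minimal sketch of that reload mechanism, run inside the single long-lived interpreter. The `"ml_processor"` prefix matches the script name from the question's traceback, but the function name and prefix list are illustrative, not part of the original system:

```python
import gc
import sys

def reset_user_modules(prefixes=("ml_processor",)):
    """Drop cached user modules so the next import re-executes them,
    while leaving extension/PyO3 modules (and their static state) alone."""
    for name in list(sys.modules):
        if name.startswith(prefixes):  # str.startswith accepts a tuple
            del sys.modules[name]
    gc.collect()  # release objects the dropped modules were still holding
```

Calling `reset_user_modules()` between operations gives you "fresh script" semantics without ever touching Py_FinalizeEx, which is what sidesteps the PyO3 one-init-per-process guard.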
Had a similar issue with another lib too. PyO3 seems to hold on to some state, so reinitializing is a hassle. Best to just keep the interpreter alive instead of starting/stopping it. It's a bit messy but gets the job done.
This is a PyO3 initialization limitation, not a bug in your cleanup. PyO3 modules create static global state during initialization that sticks around after the Python interpreter shuts down. When Py_FinalizeEx runs, it cleans up Python objects but can't touch the Rust-side globals PyO3 maintains. I hit this exact problem in production when dynamically loading different ML models. Our workaround was keeping one long-running Python interpreter for the app's entire lifetime instead of cycling it. We built a reset mechanism that clears sys.modules entries and reloads scripts when needed, which gives you similar functionality without the restart overhead. You could also isolate TensorFlow ops in separate worker processes and use IPC, but that adds complexity and latency.
Yeah, this is a known issue with PyO3 extensions in embedded Python interpreters. PyO3 holds onto global state that doesn't get cleared even after calling Py_FinalizeEx. I've run into this with Rust-based Python packages: PyO3 modules get locked to a single process instance and won't reset when you restart the interpreter. There's no clean fix. You'll need to either redesign your app so it never reinitializes the interpreter, or spawn a new process for each TensorFlow operation (though that'll hurt performance).
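The per-operation-process route can be sketched like this. The inline worker source and the JSON-over-stdin/stdout protocol are assumptions for illustration; in a real setup the worker would be a separate script that imports tensorflow and runs the model:

```python
import json
import subprocess
import sys

# Hypothetical worker: a fresh interpreter per job means PyO3's
# once-per-process initialization limit never bites.
WORKER_SRC = """
import json, sys
payload = json.load(sys.stdin)
# ... import tensorflow and run the model here ...
json.dump({"echo": payload}, sys.stdout)
"""

def run_ml_job(payload: dict) -> dict:
    """Spawn a clean Python process, send input as JSON on stdin,
    read the result as JSON from stdout."""
    proc = subprocess.run(
        [sys.executable, "-c", WORKER_SRC],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,  # raise if the worker crashes
    )
    return json.loads(proc.stdout)
```

The process-spawn and TensorFlow import cost on every call is the performance hit mentioned above, so this only pays off when jobs are infrequent or long-running.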
Been there, done that. The PyO3 limitation isn’t going anywhere.
I gave up fighting interpreter cleanup and moved all TensorFlow stuff to external workflows. My C++ app just triggers automated pipelines that handle ML separately.
You get complete process isolation - no memory leaks, no PyO3 headaches, nothing. C++ stays clean while ML runs in its own environment.
Set up triggers for new data, process through TensorFlow, send results back. No interpreter management.
Bonus: TensorFlow crashes can’t kill your main app anymore. You can scale ML processing independently too.
Check out automation platforms for workflow orchestration: https://latenode.com