I’m using TensorFlow 2.x and exploring the various image augmentation functions, like tf.image.stateless_random_flip_up_down. While these functions randomly apply transformations to images, I’m struggling to find a straightforward way to know exactly what changes were made for each image in a batch.
This information is really important for tasks that require precise localization, such as when my predictions involve points or bounding boxes. If an image is flipped or altered in any way, I need my target data (y) to reflect these modifications accurately.
I suspect that the current image transformation APIs in TensorFlow 2.x don’t provide details about the applied transformations. Ideally, I’d prefer not to create custom solutions, as I’ve done with the older Keras data augmentation approach. Is there a simpler method to handle this?
Hit this same problem building a keypoint detection pipeline. Use tf.random.Generator with explicit state management instead of the stateless functions: create a generator, capture its state before augmentation, then use that same state to reproduce the random logic separately. You'll get deterministic behavior and can track which transformations actually happened (rough sketch below). Another option that worked well: switch to tf.keras.utils.image_dataset_from_directory with custom preprocessing functions that return (image, transformation_metadata) tuples. More setup, but you get full control over tracking. The stateless functions are great for reproducibility but suck for introspection, which is exactly what you need for coordinate tasks.
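A minimal sketch of the generator idea, assuming a single 3-D HWC image, keypoints as an [N, 2] float tensor of (x, y) pixel coordinates, and vertical flip as the only augmentation. The function name `augment_with_metadata` and the keypoint layout are mine, not anything from tf.image:

```python
import tensorflow as tf

# One generator drives every augmentation decision, so the flip flag
# and the flip itself come from the same random draw.
gen = tf.random.Generator.from_seed(42)

def augment_with_metadata(image, keypoints):
    """Flip the image up/down at random; return the image, adjusted
    keypoints, and a boolean flag recording what happened."""
    flipped = gen.uniform(()) > 0.5  # draw the decision explicitly
    height = tf.cast(tf.shape(image)[0], keypoints.dtype)
    image = tf.cond(flipped,
                    lambda: tf.image.flip_up_down(image),
                    lambda: image)
    # Mirror the y-coordinate of each (x, y) keypoint when flipped.
    keypoints = tf.cond(flipped,
                        lambda: tf.stack([keypoints[:, 0],
                                          height - 1.0 - keypoints[:, 1]],
                                         axis=1),
                        lambda: keypoints)
    return image, keypoints, flipped
```

Because the decision is drawn by your own code rather than hidden inside a stateless op, you can also snapshot gen.state before a batch and rebuild an identical generator with tf.random.Generator.from_state if you ever need to replay the exact same augmentations.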
You're right: those stateless random functions don't give you transformation metadata by default. Hit the same wall working on object detection where I needed to adjust bounding boxes after augmentation. Here's what worked for me: create a wrapper that uses the same random seed twice. The first pass tells you what transformation will happen, the second pass actually does it. For flips, just draw the same uniform value the op draws and check which side of the threshold it falls on before applying; now you know whether it flipped without TensorFlow telling you (sketch below). I've also used tf.py_function to wrap custom augmentation that returns both the transformed image and a flag. Yeah, it breaks graph compilation a bit, but it kept the flexibility I needed for coordinate transforms. The performance hit was pretty minimal since the actual image processing still runs through TensorFlow ops.
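A rough sketch of the same-seed-twice trick for the vertical flip. `flip_with_flag` is my own wrapper name, and the 0.5 threshold mirrors an internal detail of tf.image's flip implementation, so verify it against your installed TF version before relying on it:

```python
import tensorflow as tf

def flip_with_flag(image, seed):
    """Apply tf.image.stateless_random_flip_up_down and report whether
    it flipped, by drawing the same stateless uniform the op draws."""
    # First pass: predict the decision from the seed. In the TF sources
    # I've checked, the op draws a scalar uniform and flips when it is
    # below 0.5; confirm this against your version before trusting it.
    coin = tf.random.stateless_uniform([], seed=seed)
    flipped = coin < 0.5
    # Second pass: the op makes the identical draw from the same seed,
    # so its behavior matches the prediction above.
    image = tf.image.stateless_random_flip_up_down(image, seed=seed)
    return image, flipped

# Usage: seed is the usual shape-[2] stateless seed.
# image, flipped = flip_with_flag(image, seed=(1, 2))
```

With the flag in hand you can mirror bounding-box or keypoint y-coordinates in the same map step; the tf.py_function variant trades some graph performance for the ability to return arbitrary Python metadata alongside the image.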
you're overthinking this. just use tf.random.stateless_uniform with the same seed to predict the flip before applying it. the op draws a scalar uniform from the seed and flips based on a 0.5 threshold (in the TF source i've read it flips when the value is below 0.5, but check your version since that's an internal detail). recreate that logic yourself and track it. works great for my bbox stuff, no weird workarounds needed.