I’m working on a binary classification problem and trying to improve my model’s performance. My dataset has around 16k samples with 7 features each.
Current setup:
- Total parameters: approximately 48K
- Input data shape: (samples, 7)
- Binary output (0 or 1)
Sample input data:
data_x = np.array([[2019, 15, 45123, ..., 0, 7234, 0],
                   [2019, 15, 45123, ..., 0, 7899, 0],
                   [2019, 31, 45123, ..., 0, 7234, 0],
                   ...,
                   [8500, 25, 52341, ..., 0, 9876, 0]], dtype=int32)
Target labels:
labels_y = np.array([0., 0., 0., ..., 1., 0., 0.])
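Because the sample above is dominated by zeros, I also check the class balance before reading too much into raw accuracy. This is the check I use (the short labels_y below is a toy stand-in for my real ~16k-entry array):

```python
import numpy as np

# Toy stand-in for my real labels_y (~16k entries); values are illustrative.
labels_y = np.array([0., 0., 0., 1., 0., 0.])

# Count samples per class: a strong skew toward one class makes plain
# accuracy a misleading metric (always predicting 0 would already score high).
counts = np.bincount(labels_y.astype(int))
positive_rate = counts[1] / counts.sum()
```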
My current model architecture:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization

neural_net = Sequential()
neural_net.add(BatchNormalization(input_shape=(7,)))
neural_net.add(Dense(64, activation="relu"))
neural_net.add(Dense(64, activation="relu"))
neural_net.add(Dense(128, activation="relu"))
neural_net.add(Dense(32, activation="relu"))
neural_net.add(Dense(1, activation="sigmoid"))
Training configuration:
neural_net.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
training_history = neural_net.fit(train_x, train_y, batch_size=1024,
                                  validation_data=(test_x, test_y), epochs=30000)
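For what it’s worth, I also considered standardizing the raw features up front instead of relying only on the input BatchNormalization layer, since the columns span very different ranges (years vs. small counts vs. all-zero columns). A minimal sketch with toy rows (the values are placeholders, not my real data):

```python
import numpy as np

# Toy rows shaped like data_x; my real array has ~16k rows and 7 columns.
data_x = np.array([[2019, 15, 45123, 0, 7234, 0],
                   [2019, 15, 45123, 0, 7899, 0],
                   [8500, 25, 52341, 1, 9876, 0]], dtype=np.int32)

# Standardize each feature to zero mean / unit variance once, over the
# whole training set; this is deterministic and batch-independent, unlike
# BatchNormalization, which re-estimates statistics per mini-batch.
mean = data_x.mean(axis=0)
std = data_x.std(axis=0)
std[std == 0] = 1.0  # guard against constant (e.g. all-zero) columns
x_scaled = (data_x - mean) / std
```

In a real run the mean and std would be computed on train_x only and reused to transform test_x, to avoid leaking test statistics into training.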
I’ve tried several approaches so far:
- Adding batch normalization improved accuracy from 50% to 73%
- Tested both 'adam' and 'rmsprop' optimizers, with similar results
- Experimented with different batch sizes (1024 and 2048)
- Extended training to 30,000 epochs, which got me to 78.51% accuracy
- Doubled the layer sizes, but performance stayed about the same
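For reference, this is how I count correct predictions from the sigmoid output (probs and true_y below are toy values, not my real predictions):

```python
import numpy as np

# Toy stand-ins for neural_net.predict(test_x).ravel() and the true labels.
probs = np.array([0.2, 0.7, 0.55, 0.1, 0.9, 0.4])
true_y = np.array([0, 1, 0, 0, 1, 1])

# Threshold at 0.5, the same rule Keras' accuracy metric applies to a
# sigmoid output, then count matches against the ground truth.
preds = (probs > 0.5).astype(int)
n_correct = int((preds == true_y).sum())
```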
Currently getting around 6k correct predictions out of 16k samples. What techniques or modifications should I try to push the accuracy higher?