Simple Classification with a Neural Network

Muhamad Mustain · Published in Analytics Vidhya · Aug 29, 2020 · 5 min read


We know that many complex machine learning problems can easily be solved using neural networks. For example, in supervised learning (classification), we can use them to classify images or text.

Image credit: Frankfurt School

Now, what if we use one on a simple dataset that could actually be solved with “regular” machine learning?

In this post, we’ll use a simple dataset titled “Gender Classification” from Kaggle. It contains only 66 rows and 4 features (each representing a user preference), with the user’s gender as the target variable.

Our goal is to classify the gender of a user based only on their interests/preferences. Using data.info(), we can see that there are no NULL values in our dataset.
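A minimal sketch of loading the dataset and inspecting it (the CSV filename below is an assumption; use whatever name your Kaggle download has):

import pandas as pd

# Load the Kaggle "Gender Classification" dataset
# (the filename is an assumption)
data = pd.read_csv('gender_classification.csv')

data.info()  # 66 rows: 4 preference columns plus the Gender target, no nulls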

However, despite having no null values, since we only have 4 features and 2 classes, we still have a problem: we can’t be sure there are no inconsistent labels, i.e., rows with the same feature values but different genders. We can detect them by grouping the data as follows.

# Group by all feature columns (every column except the Gender target)
# and count how many distinct genders each feature combination maps to
grouping = data.groupby(list(data.columns)[:-1]).apply(lambda x: x.Gender.nunique())

# Feature combinations that map to both genders are inconsistent
grouping[grouping.eq(2)]

What should we do if we have different labels (outputs) for the same feature values (inputs)?

Well, in this case, there is nothing we can do for now. If the conflicts were caused by mistakes during data entry (human error), we could drop those rows. However, we can’t simply do that here, since it would introduce bias into our model.

In reality, it is perfectly plausible for both genders to share the same preferences in color, music genre, beverage, and soft drink. We could tackle this problem by adding more features, especially ones that capture characteristics unique to each gender. But for now, let’s take the data as it is.
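For reference only (we keep the rows in this post): if the conflicts had been data-entry errors, a sketch of dropping every conflicting row could reuse the grouping from above, like so:

# Feature combinations that map to both genders
conflicting = grouping[grouping.eq(2)].index

# Drop all rows whose feature values fall in a conflicting combination
# (for reference only; this post keeps those rows)
feature_cols = list(data.columns)[:-1]
mask = data.set_index(feature_cols).index.isin(conflicting)
data_clean = data[~mask]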

Preprocess the Data

The next thing to consider is that we can’t feed the data directly into a neural network, since it is still categorical text. Hence, we have to encode it using One-Hot Encoding (for the features) and Label Encoding (for the target).

Image credit: @michaeldelsole

Read more about both encodings at this link.

from sklearn.preprocessing import LabelEncoder

# Split the features and labels
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# One-hot encode the features and label-encode the target to 0/1
le = LabelEncoder()
X = pd.get_dummies(X)
y = le.fit_transform(y)

Then, since we have these anomalies in our data, I prefer to use K-Fold to evaluate the model and to find the train-test split that gives the best accuracy on both the train and the test set.

Image credit: Mingchao Li

Train and Test the Model

We wrap the model in a function so we can create a fresh model for every K-Fold iteration. For the input layer, we use an input shape of 20 (because we have 20 columns in total after preprocessing) and ‘float32’ as dtype (because I will export this model to TFJS, which supports float32).

We chose Adam as our optimizer since it is the most popular choice. You can find more about other optimizers at this link and learn more about Adam here.

import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping

def train_model(X_train, X_test, y_train, y_test):
    model = tf.keras.models.Sequential([
        tf.keras.Input(shape=(20,), dtype='float32'),
        tf.keras.layers.Dense(units=1024, activation='relu'),
        tf.keras.layers.Dropout(0.4),
        tf.keras.layers.Dense(units=1, activation='sigmoid')
    ])
    model.compile(optimizer=Adam(learning_rate=0.0001),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    # Callback to reduce learning rate if validation loss stops improving
    reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=10, min_lr=1e-8, verbose=0)
    # Callback to stop training if validation loss stops improving
    early_stop = EarlyStopping(monitor='val_loss', patience=20, verbose=0)
    history = model.fit(
        X_train, y_train,
        epochs=1000,
        validation_data=(X_test, y_test),
        callbacks=[reduce_lr, early_stop],
        verbose=0
    )
    tr_loss, tr_acc = model.evaluate(X_train, y_train)
    loss, accuracy = model.evaluate(X_test, y_test)
    return model, history, tr_loss, tr_acc, loss, accuracy

Then we run K-Fold and keep the best model across iterations.

from sklearn.model_selection import KFold

kfold = KFold(n_splits=5, random_state=42, shuffle=True)

loss_arr = []
acc_arr = []
trloss_arr = []
tracc_arr = []
temp_acc = 0

for train, test in kfold.split(data):
    model, history, trloss_val, tracc_val, loss_val, acc_val = train_model(X.iloc[train], X.iloc[test], y[train], y[test])
    # If we get better accuracy on validation, save the split scenario and the model
    if acc_val > temp_acc:
        print("Model changed")
        temp_acc = acc_val
        model.save('best_model.h5')
        train_index = train
        test_index = test
        best_history = history
    # Collect the metrics of every iteration
    trloss_arr.append(trloss_val)
    tracc_arr.append(tracc_val)
    loss_arr.append(loss_val)
    acc_arr.append(acc_val)
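Once the loop finishes, best_model.h5 holds the weights of the best split, and train_index / test_index record that split. As a quick sanity check (a minimal sketch, not part of the original run), we can reload it and re-evaluate on its held-out fold:

# Reload the best model saved during K-Fold
best_model = tf.keras.models.load_model('best_model.h5')

# Re-check accuracy on the held-out split of the best iteration
best_model.evaluate(X.iloc[test_index].astype('float32'), y[test_index])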

These are the accuracy results for each iteration.

Below are the accuracy and loss plots for our best model (from the 5th iteration).
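The plots come straight from the Keras History object; here is a minimal matplotlib sketch to reproduce them (matplotlib is assumed to be installed):

import matplotlib.pyplot as plt

# Accuracy and loss curves for the best model, from its History object
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(best_history.history['accuracy'], label='train')
ax1.plot(best_history.history['val_accuracy'], label='validation')
ax1.set_title('Accuracy'); ax1.legend()
ax2.plot(best_history.history['loss'], label='train')
ax2.plot(best_history.history['val_loss'], label='validation')
ax2.set_title('Loss'); ax2.legend()
plt.show()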

Lastly, I built a ReactJS app and deployed it on GitHub Pages. You can try to predict the gender from chosen preferences interactively.

Screenshot from https://musmeong.github.io/gender-guess/
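The app runs the converted model in the browser, but the same prediction can be sketched in Python; the sample row below is just an illustration, assuming the column order produced by pd.get_dummies:

# Predict the gender for one user; any one-hot row with the same
# 20 columns as X works (here we simply reuse the first row of X)
sample = X.iloc[[0]].astype('float32')
prob = best_model.predict(sample)[0][0]            # sigmoid output in [0, 1]
print(le.inverse_transform([int(prob > 0.5)]))     # map 0/1 back to the label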

The model I used is the same as the one proposed in this post; it is converted to TFJS with the following command (after pip install tensorflowjs).

!tensorflowjs_converter --input_format keras best_model.h5 models/
