Predict A Person's Personality Based on the Shape of Handwriting of Letters "i", "o", and "t" Using Levenberg Marquardt Backpropagation Method

The literature shows that graphology is common and relatively useful in daily life, for example as one of the requirements in job recruitment. Professional organizations hire a professional handwriting analyst, called a graphologist, to analyze the character traits of candidates from their handwriting. However, the accuracy of handwriting analysis depends on how skilled the graphologist is, and two graphologists who analyze the same handwriting may give different predictions. To improve the accuracy, we develop a system that automatically predicts a person's personality based on the shape of the handwritten letters "i", "o", and "t" using the Levenberg Marquardt Backpropagation method. In this research, the maximum accuracy was obtained by using 2 hidden layers: 71.42% for the letter "i", 76.92% for the letter "o", and 60% for the letter "t".


I. INTRODUCTION
Graphology is an ancient science that uses different attributes of handwriting to analyze a person's personality traits, including emotional outlay, fears, honesty, defenses, and many others. Graphology is also described as an empirical science because it is supported by phenomena observed in a population, where the results are quantified or come from statistical tests that can be accounted for [1].
In this paper, the personality traits are revealed by the shape of the letters "i", "o", and "t". The handwritten letter "i" can indicate someone's ego [2], the letter "o" can indicate someone's honesty [2], and the letter "t" can indicate how high someone's self-esteem is [3]. These letters are also selected because they can be analyzed in block letters, and in Indonesia handwriting tends to be written in block letters. This is supported by research [4], in which a data collection process with a sample of 150 respondents showed that almost all respondents use block letters in their handwriting. By contrast, the letters "y" and "d" can only be analyzed in non-block handwriting [2].
In Indonesia, professional organizations hire professional handwriting analysts, called graphologists, to analyze the character traits of candidates from their handwriting as one of the job requirements. However, the accuracy of handwriting analysis depends on how skilled the graphologist is, and two different graphologists who analyze the same handwriting may give different predictions. Therefore, an automated system for handwriting analysis is needed. This research aims to develop a system that predicts a person's personality based on the handwritten shape of the letters "i", "o", and "t" using the Levenberg Marquardt Backpropagation method.

A. Related Works
Predicting a person's personality based on the shape of the letter "i" using the Backpropagation Neural Network method has been done earlier by [5] and [4], who obtained accuracies of 95% and 80.97%. Related works that predict a person's personality based on the shape of the letter "t" using the Backpropagation method include [6], [7], [3], and [4], with accuracies of 60% [6], 73.33% [7], 88% [3], and 87.97% [4]. Meanwhile, [8], who implemented Levenberg Marquardt Backpropagation, obtained 65% accuracy in predicting someone's personality based on the shape of the letter "t".
Researcher [9] states that the accuracy of an ANN in recognizing handwriting patterns depends on the variation of the training data and the number of neurons in the hidden layer. The variations of personality traits for the letters "i" and "t" that are commonly used as training data are defined in [10] and [11]. Meanwhile, [12], which compared 2 different kinds of neural network, concluded that a neural network with only one hidden layer gives better accuracy than a neural network with more than one hidden layer. Similarly, [13] used a multilayer perceptron with one hidden layer to recognize handwritten English characters based on the shape of the letter and obtained 94% accuracy.
Based on those studies, there are a few things that affect the accuracy. First, when the accuracy of the letter recognition process does not reach 100%, it affects the accuracy of the personality prediction process. Second, parameters such as the variation of the training data, the number of hidden layers, and others also have an effect. Third, Levenberg Marquardt Backpropagation has already been shown to give reasonably good accuracy.
So, in this research, we decided to run several experiments to discover the best accuracy we can obtain. First, we do not perform letter recognition as [6] did, but only recognize a person's character based on the shape of the letters. Second, we use 7 kinds of personality traits for the letter "i", 5 kinds of personality traits for the letter "o", and 4 kinds of personality traits for the letter "t". Third, we implement Levenberg Marquardt Backpropagation to see its performance in predicting a person's character. Fourth, we find the optimal parameters of Levenberg Marquardt Backpropagation.

B. Theoretical Basis
B.1. Graphology
According to neuropsychology (the study of the structure and functions of the brain as they relate to a person's psychological processes and behavior), handwriting is often referred to as brainwriting [3]. This is the main reason why some people can still write with their foot or mouth [14]: each neurological brain pattern produces a unique neuromuscular movement that is the same for every person who has that particular personality trait. When writing, these tiny movements occur unconsciously. Each written movement or stroke reveals a specific personality trait [3]. There are two important things to remember about handwriting. First, it is normal for handwriting to keep changing. Second, handwriting cannot be forged.
It is normal for handwriting to keep changing because the structure of a person's brain is plastic: it is always changing through thoughts, feelings, and actions. In neurology, this condition is known as neuroplasticity (brain plasticity). This is the main reason why, when a person's character changes, that person's handwriting changes as well. Handwriting cannot be forged because, based on a surgical neuropsychology procedure, it was found that the content to be written is produced by the conscious brain, while the handwritten form is the result of strokes from the subconscious brain. This makes it almost impossible for a handwritten document to be completely forged: the automatic strokes produced by the subconscious brain of the original writer cannot be imitated by a counterfeiter because of the role of the limbic system (the part of the human brain that regulates changes in human emotions) [14].

B.2. The Personality Traits Based on The Letter "i", "o", and "t".
The different shapes of the letters "i", "o", and "t", along with the personality traits associated with them in Graphology [2] [15], are shown in Table I to Table III.

B.3. Levenberg Marquardt Backpropagation
Levenberg Marquardt Backpropagation (LMB) is a modified version of Backpropagation. The difference between the two is that Backpropagation updates the weight and bias values along the negative gradient, while LMB uses the Jacobian matrix to do so, which makes LMB converge faster than the Backpropagation algorithm [16].
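To illustrate the role of the Jacobian matrix, the following is a minimal sketch of the Levenberg Marquardt update as it is commonly formulated, delta_w = -(J^T J + mu I)^-1 J^T e, applied to a toy model y = w * x with a single weight. The paper does not give its exact formulation, so the function name `lm_fit_slope`, the damping factor `mu`, and the factor of 10 used to adapt it are assumptions made for illustration only.

```python
def lm_fit_slope(xs, ys, w=0.0, mu=1.0, max_epochs=100, tol=1e-12):
    """Fit y = w * x with a Levenberg Marquardt style update on one weight."""

    def sse(weight):
        # Sum of squared errors for a candidate weight.
        return sum((weight * x - y) ** 2 for x, y in zip(xs, ys))

    error = sse(w)
    for _ in range(max_epochs):
        if error <= tol:
            break
        # Residuals are e_i = w * x_i - y_i, so the Jacobian de_i/dw is x_i.
        jt_e = sum(x * (w * x - y) for x, y in zip(xs, ys))  # J^T e
        jt_j = sum(x * x for x in xs)                        # J^T J
        step = -jt_e / (jt_j + mu)                           # damped step
        new_error = sse(w + step)
        if new_error < error:
            # Accept the step and reduce the damping (closer to Gauss-Newton).
            w, error = w + step, new_error
            mu /= 10.0
        else:
            # Reject the step and increase the damping (smaller, safer steps).
            mu *= 10.0
    return w


# Toy data generated from y = 2 * x (illustrative, not from the paper).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
fitted_w = lm_fit_slope(xs, ys)
print(fitted_w)
```

In a real network the scalar quantities above become matrices, with one Jacobian column per weight or bias and one row per training error.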

B.4. Generalization Testing Using a Confusion Matrix.
The generalization testing process using a confusion matrix is done by finding the optimal parameters of Levenberg Marquardt Backpropagation. The accuracy is calculated with Eq. 1.

\[ \text{Accuracy} = \frac{\text{number of patterns correctly recognized}}{\text{total number of patterns tested}} \times 100\% \tag{1} \]

The error rate is calculated with Eq. 2.

\[ \text{Error rate} = \frac{\text{number of patterns not successfully recognized}}{\text{total number of patterns tested}} \times 100\% \tag{2} \]

III. RESEARCH METHOD
In the testing process, we use several parameters to find out how they affect the accuracy. These parameters are the number of neurons in the hidden layer, the number of epochs, the error tolerance, and the learning rate.

A. Data Collection
The handwriting images of the letters "i", "o", and "t" were collected on 3 different days for each respondent, where every respondent wrote those letters on the sample form shown in Figure 1. The collected handwriting samples were selected manually, according to the scope of the problem, by the graphologists we worked with, and we picked 100 samples for each letter that passed the selection, that is, that did not fall outside the scope of the problem. The samples that passed the selection were then assessed manually by the psychologist or graphologist to determine the personal character of each handwriting sample and to classify the samples into several personality types. In this process, 300 handwriting images were selected. The dataset distribution is shown in Table IV.
TABLE IV. DATA COLLECTION OF THE HANDWRITING IMAGES.
The dataset of each personality type for each letter is split into 75% training data and 25% testing data.
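A per-class 75%/25% split of this kind can be sketched as follows. This is only an illustration: the paper does not say how samples were shuffled or how fractional counts were rounded, so the function name `split_by_class`, the fixed seed, and the use of `round` are assumptions, and the sample identifiers are hypothetical.

```python
import random


def split_by_class(samples, train_fraction=0.75, seed=0):
    """Split (image_id, class_label) pairs into train and test sets per class."""
    rng = random.Random(seed)
    by_class = {}
    for item in samples:
        by_class.setdefault(item[1], []).append(item)

    train, test = [], []
    for label in sorted(by_class):
        items = by_class[label][:]
        rng.shuffle(items)
        # Rounding rule is an assumption; the paper does not state one.
        cut = round(len(items) * train_fraction)
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test


# Hypothetical dataset: 4 classes with 20 images each.
samples = [(f"img_{label}_{k}", label) for label in range(1, 5) for k in range(20)]
train_set, test_set = split_by_class(samples)
print(len(train_set), len(test_set))
```

Splitting inside each class keeps every personality type represented in both sets, which matters when some classes have only a few samples.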

B. Approach
In this research, we used the common pattern recognition process shown in Figure 2. The first step is scanning the handwriting images. The second step is image processing, which includes selecting the area, greyscaling, cropping, and resizing the images to 5x7 pixels. The third step is feature extraction to obtain the binary values of the images, followed by a reshaping process that reshapes the binary values into a matrix of shape [(row x column) x 1]. The network is then trained using the Levenberg Marquardt Backpropagation method and the result is saved as the training data, which is then used in the testing process. The testing process has the same steps as the training process.
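The feature extraction and reshaping steps can be sketched in a few lines. The sketch assumes that the resized image has 7 rows and 5 columns, that dark pixels are ink, that a threshold of 128 separates ink from background, and that values are flattened row by row (MATLAB's `reshape` works column by column, so the original ordering may differ). The function names and the sample image are hypothetical.

```python
def binarize(gray, threshold=128):
    """Map a greyscale grid (0-255) to 1 for ink (dark) and 0 for background."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


def to_column_vector(grid):
    """Reshape a rows x cols grid into a (rows * cols) x 1 column, row by row."""
    return [[value] for row in grid for value in row]


# Hypothetical 7 x 5 greyscale image of a letter "i": a tittle and a stem.
WHITE, BLACK = 255, 0
sample = [[WHITE] * 5 for _ in range(7)]
sample[0][2] = BLACK              # the tittle
for r in range(2, 7):
    sample[r][2] = BLACK          # the stem

vector = to_column_vector(binarize(sample))
print(len(vector), len(vector[0]))
```

With a 5x7 image the network therefore receives 35 binary inputs per handwriting sample.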

C. The Application Interface
The interface does not display the process or results of training and testing because these are only run in the Command Window in MATLAB. The interface of the e-graphology analysis application is shown in Figure 3.

A. System Testing
We used two techniques to test the system. The first is generalization testing. The second is finding the accuracy and error rate of the testing process by using the confusion matrix.

A.1. The Generalization Testing Process.
Generalization testing is done by finding the optimal combination of values for the Levenberg Marquardt Backpropagation parameters in the training process, namely the number of hidden neurons, the number of epochs, the error tolerance, and the learning rate.

A.1.1. The Influence of The Number of Hidden Neurons.
According to Haykin, the number of hidden neurons lies in the interval 1 to 9. So, in this experiment, we used 1, 3, 5, 7, and 9 hidden neurons, with an error tolerance of 10^-4, 1000 epochs, and a learning rate of 0.01. Each experiment is run 5 times. The graph of the relationship between the number of hidden neurons and the average generalization value is shown in Figure 4. Based on research [7], a larger number of hidden neurons makes the network more flexible and can produce a better generalization value, but if the number of hidden neurons is too large, the generalization value tends to decrease and become unstable, and the training process takes longer. As shown in Figure 4, the average generalization value for the letter "i" increases from 1 to 7 hidden neurons, while with 9 hidden neurons it begins to decrease and become unstable. So, in the next experiment, we only use 1 to 7 hidden neurons for the letter "i", 1 to 7 hidden neurons for the letter "o", and 5 to 9 hidden neurons for the letter "t".
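The experimental procedure described here, several candidate neuron counts with 5 runs each and an averaged score, can be sketched as a small harness. The names `sweep_hidden_neurons` and `dummy_train_and_score` are hypothetical; the dummy scorer is a made-up formula that stands in for real training and does not reproduce any result from this paper.

```python
from statistics import mean


def sweep_hidden_neurons(train_and_score, neuron_counts=(1, 3, 5, 7, 9), repeats=5):
    """Return the average score over `repeats` runs for each neuron count."""
    averages = {}
    for n in neuron_counts:
        scores = [train_and_score(hidden_neurons=n, run=r) for r in range(repeats)]
        averages[n] = mean(scores)
    return averages


def dummy_train_and_score(hidden_neurons, run):
    """Placeholder for real training; returns an invented score, not real data."""
    return 60.0 - 2.0 * abs(hidden_neurons - 5) + 0.1 * run


averages = sweep_hidden_neurons(dummy_train_and_score)
best = max(averages, key=averages.get)
print(best, averages)
```

The same harness can be reused for the epoch, error tolerance, and learning rate experiments by sweeping a different argument while holding the others fixed.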

A.1.2. The Influence of The Number of Epochs.
In this experiment, we used 1000, 1500, 2000, 2500, and 3000 epochs, with an error tolerance of 10^-4, a learning rate of 0.01, and the numbers of hidden neurons specified in section A. Based on research [7], a larger number of epochs results in a longer training time. So, in the next experiment, the letter "i" uses 3 hidden neurons with 2000 epochs and 7 hidden neurons with 1000 epochs. The letter "o" uses 3 hidden neurons with 2000 epochs and 5 hidden neurons with 1500 epochs. The letter "t" uses 7 hidden neurons with 1000 epochs.

A.1.3. The Influence of The Error Tolerance.
In this experiment, we used error tolerances of 10^-1, 10^-2, 10^-3, 10^-4, and 10^-5, with a learning rate of 0.01 and the numbers of hidden neurons and epochs specified in sections A.1.1 and A.1.2. Each experiment is run 5 times. The graphs of the relationship between the error tolerance and the average generalization value are shown in Figure 8, Figure 9, and Figure 10. Based on research [7], if the error tolerance is reduced, the accuracy of the network becomes higher, but if the error tolerance is too small it can cause symptoms of overfitting. This happened with the error tolerances 10^-4 and 10^-5 in Figure 10, where the accuracy decreased from 53.38% to 52.00%. Overfitting symptoms arise because the system learns the patterns in the training data too closely, which makes it difficult to recognize new patterns in the testing process. So, in the next experiment, the letter "i" uses an error tolerance of 10^-1 and 3 hidden neurons. The letter "o" uses an error tolerance of 10^-3 and 3 hidden neurons. The letter "t" uses an error tolerance of 10^-3 and 7 hidden neurons.

A.1.4. The Influence of The Learning Rate.
In this experiment, we used learning rates of 0.1, 0.5, and 0.01, with the numbers of hidden neurons, epochs, and error tolerances specified in sections A.1.1 to A.1.3. Each experiment is run 5 times. The graph of the relationship between the learning rate and the average generalization value is shown in Figure 11. Based on research [7], if the learning rate is too large the network becomes unstable, but if the learning rate is too small the network takes a long time to converge. All results for the optimal parameters are shown in Table V.

A.2. Finding The Accuracy and Error Rate For The Testing Process By Using The Confusion Matrix.
The second step is to find the accuracy and the error rate of the testing process for pattern recognition using the confusion matrix with Eq. 1 and Eq. 2. The optimal Levenberg Marquardt Backpropagation parameters used in the training process for this experiment come from sections A.1.1 to A.1.4.
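As a minimal sketch of how Eq. 1 and Eq. 2 are applied to a confusion matrix, the correctly recognized patterns are the entries on the diagonal and the total is the sum of all entries. The function name and the 3-class matrix below are hypothetical and are not taken from the tables in this paper.

```python
def accuracy_and_error_rate(confusion):
    """confusion[i][j] counts test patterns of true class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    accuracy = 100.0 * correct / total   # Eq. 1
    error_rate = 100.0 - accuracy        # Eq. 2, since wrong = total - correct
    return accuracy, error_rate


# Hypothetical 3-class confusion matrix with 20 test patterns in total.
confusion = [
    [5, 1, 0],
    [2, 3, 1],
    [0, 2, 6],
]
accuracy, error_rate = accuracy_and_error_rate(confusion)
print(accuracy, error_rate)
```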

A.2.1. Experiment On The Letter "i".
The character recognition process based on the shape of the letter "i" has 66 training images, 28 test images, and 7 classes. The shapes of the letter "i" that are analyzed are shown in Table VI.

Type  Class Definition
1     The tittle is fully filled and right above the "I" stem.
2     The tittle is fully filled and far above the "I" stem.
3     The tittle is fully filled and tends to be on the left side of the "I" stem.
4     The tittle is fully filled and tends to be on the right side of the "I" stem.
5     The tittle is replaced with a circle.
6     The tittle is replaced with a dash.

Based on Table VII, the accuracy for the letter "i" is 39.28% and the error rate is 60.71%. The testing process is only done on the 1st to 5th classes to avoid overfitting symptoms, because the amount of data in the 6th and 7th classes does not meet the training data requirement. Examples of test data that were successfully and unsuccessfully classified are shown in Table VIII and Table IX.
TABLE VIII. THE TESTING DATA THAT ARE SUCCESSFULLY CLASSIFIED.

A.2.2. Experiment On The Letter "o".
The character recognition process based on the shape of the letter "o" has 74 training images, 26 test images, and 5 classes. The shapes of the letter "o" that are analyzed are shown in Table X and the result of the testing process is shown in Table XI.

Type  Class Definition
      The circle is drawn full or closed.
3     The circle is drawn full or closed and has a tail.
4     The circle is drawn open at the top and has a tail that forms a small circle in the middle of the circle.
5     The circle is drawn open at the bottom.

Based on Table XI, the accuracy for the letter "o" is 26.92% and the error rate is 73.07%. Examples of test data that were successfully and unsuccessfully classified are shown in Table XII and Table XIII.
TABLE XII. THE TESTING DATA THAT ARE SUCCESSFULLY CLASSIFIED.

A.2.3. Experiment On The Letter "t".
The character recognition process based on the shape of the letter "t" has 75 training images, 25 test images, and 4 classes. The shapes of the letter "t" that are analyzed are shown in Table XIV and the result of the testing process is shown in Table XV.
TABLE XIV. THE LETTER "t" THAT WILL BE ANALYZED.

Type  Class Definition
1     The line "__" is short and it is almost at the top.
2     The line "__" is long and it is almost at the top.
3     The line "__" is long and right in the middle.
4     The line "__" is short and right in the middle.
Based on Table XV, the accuracy for the letter "t" is 32% and the error rate is 68%. Examples of test data that were successfully and unsuccessfully classified are shown in Table XVI and Table XVII. According to research [7], prediction errors can occur when the form of a test image is similar to the images of the predicted class, because the neurons in the first hidden layer are unable to properly recognize the components of the handwritten data pattern. Therefore, as a step to increase the accuracy, an experiment is done by adding another 2 hidden layers to the Levenberg Marquardt Backpropagation network, accompanied by a search for the optimal number of hidden neurons for the 2nd and 3rd hidden layers.

A.3. Experiment On Finding The Effect of The Number of Hidden Layers On The Accuracy.
This experiment was conducted to determine the effect of using more than one hidden layer in the Levenberg Marquardt Backpropagation Neural Network on its performance in the character recognition process based on the shape of the handwritten letters "i", "o", and "t".
This experiment uses three hidden layers and the optimal parameters obtained from experiments A.1.1 to A.1.4. The 1st hidden layer uses the optimal number of neurons, while the 2nd and 3rd hidden layers use 1, 3, 5, 7, and 9 neurons to find their optimal number of neurons. The graphs of the relationship between the number of hidden neurons in the 2nd and 3rd hidden layers and the average generalization value are shown in Figure 12 and Figure 13. Based on Figure 12, the optimal number of neurons in the second hidden layer is 5 and 7 neurons for the letter "i", 1 neuron for the letter "o", and 9 neurons for the letter "t". Based on Figure 13, the optimal number of neurons in the third hidden layer is 5 neurons for the letter "i", 1 neuron for the letter "o", and 7 neurons for the letter "t". The accuracy for each letter as a function of the number of hidden layers used is shown in Figure 14. It can be concluded that using only 2 hidden layers in this experiment increases the accuracy of recognizing a person's character based on the shape of the handwritten letters "i", "o", and "t". This is because, with only 1 additional hidden layer, the network can perform pattern recognition in more detail: the neurons in the second hidden layer are able to recognize each sub-component of the handwriting pattern better and in more detail than the neurons in the first hidden layer.
Based on this experiment, it can also be concluded that using 3 hidden layers decreases the accuracy. This happened because the network showed symptoms of overfitting: it learns the patterns in the training data too closely, which makes it difficult to recognize new patterns in the testing process.
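To make the layered architecture concrete, the sketch below builds a fully connected network from a list of layer sizes and runs one forward pass. It assumes 35 binary inputs (from the 5x7 image), one output neuron per personality class, sigmoid activations, and random initial weights in [-0.5, 0.5]; none of these details are stated in the paper, and the function names are hypothetical. The example sizes follow the letter "i" configuration with 3 neurons in the first hidden layer and 5 in the second.

```python
import math
import random


def init_network(layer_sizes, seed=0):
    """Create (weights, biases) for each pair of consecutive layers."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers


def forward(layers, inputs):
    """Propagate inputs through every layer using a sigmoid activation."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            1.0 / (1.0 + math.exp(-(sum(w * a for w, a in zip(row, activations)) + b)))
            for row, b in zip(weights, biases)
        ]
    return activations


def count_parameters(layers):
    """Total number of weights and biases, i.e. the number of Jacobian columns."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


# 35 inputs, hidden layers of 3 and 5 neurons, 7 outputs (assumed encoding).
network = init_network([35, 3, 5, 7])
outputs = forward(network, [0] * 35)
print(count_parameters(network), len(outputs))
```

Each extra hidden layer adds weights and biases to fit, so with only around 70 training images per letter, a larger network has more room to memorize the training data, which is consistent with the overfitting seen with 3 hidden layers.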

A. Conclusion
Based on the experimental results and discussions, we draw the following conclusions:
1. The e-graphology application is able to automatically predict a person's personality based on the form of handwriting, with a maximum accuracy of 71.42% for the letter "i", 76.92% for the letter "o", and 60% for the letter "t".
2. The optimal values of the Levenberg Marquardt Backpropagation parameters in the 1st hidden layer are 3 hidden neurons, an error tolerance of 10^-1, 2000 epochs, and a learning rate of 0.05 for the letter "i"; 3 hidden neurons, an error tolerance of 10^-3, 2000 epochs, and a learning rate of 0.01 for the letter "o"; and 7 hidden neurons, an error tolerance of 10^-3, 1000 epochs, and a learning rate of 0.01 for the letter "t".
3. The optimal number of hidden neurons in the 2nd hidden layer is 5 to 7 neurons for the letter "i", 1 neuron for the letter "o", and 9 neurons for the letter "t".
4. The optimal number of hidden neurons in the 3rd hidden layer is 5 neurons for the letter "i", 1 neuron for the letter "o", and 7 neurons for the letter "t".
5. By using only 2 hidden layers, the accuracy for the letter "i" increased from 39.28% to 71.42%, the accuracy for the letter "o" increased from 26.92% to 76.92%, and the accuracy for the letter "t" increased from 32% to 60%.

B. Future Works
This research can still be developed into a better system. Some suggestions for further research include:
1. Using additional techniques in the feature extraction process, such as template matching, to obtain better feature values.
2. Looking for more valid references on how to group the letters "i", "o", and "t", such as determining how far the tittle is from the stem of the letter "i" so that it can be grouped into certain personality classes, to reduce the error rate of the handwriting grouping process.