1. Introduction
In recent years many countries, including Malaysia, have become increasingly concerned about the behaviour of young people and its effect on their conduct, family finances and education. According to Steve Stillwell, young people today are raised in an increasingly complex world that requires them to make difficult decisions, which often have a significant impact on their future in areas such as spending habits, behaviour and educational performance. Lifestyle trends, mobile phones and education all require young people to start making choices and decisions at an early age. Youth societies currently help young people to build a better future either by using technologies that handle large amounts of data or in the traditional way, by observing and interviewing each young person about their interests and habits. The traditional method takes more time, both to analyse the answers and to make predictions about young people. Big data collection has the potential to change how science informs the decision-making process. It is possible to read and analyse thousands of survey responses manually, but doing so may take a great deal of time before anything can be predicted from the data collected. Progress in the last decade in the fields of high-performance computer simulation and complex real-time analytics, paired with the rapidly increasing volume and heterogeneity of data from various sources, is shaping the vision of a data-intensive science.
Data mining is the process of selecting, exploring and modelling large amounts of data in order to discover unknown patterns or relationships that provide a clear and useful result (1-s2.0-S1607551X12002173). Predictive data mining in social science, applied here to young people's interests, deals with learning models that predict young people's habits. Data mining methods are commonly applied in educational contexts but less often in social science contexts in general. Youth societies can now take advantage of data mining techniques such as classification to deal with large volumes of survey results and to predict young people's habits with a measurable predictive accuracy. With today's technologies, predictive data mining uses a classifier, normally implemented as an algorithm, to classify the data. To check whether the data are being classified correctly, predictive accuracy is measured on the related data. The accuracy of a classifier refers to its ability to predict the class label correctly, and the accuracy of a predictor refers to how well it can guess the value of the predicted attribute for new data.
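The distinction between classifier accuracy and predictor accuracy can be illustrated with a minimal sketch. The survey values below are hypothetical and are not taken from the study's data. The text does not name a measure for predictor accuracy, so mean absolute error is used here purely as one common choice.

```python
def classifier_accuracy(true_labels, predicted_labels):
    """Fraction of records whose predicted class label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one record is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def mean_absolute_error(true_values, predicted_values):
    """Average absolute gap between predicted and true numeric values."""
    if len(true_values) != len(predicted_values):
        raise ValueError("value lists must have the same length")
    if not true_values:
        raise ValueError("at least one record is required")
    gaps = (abs(t - p) for t, p in zip(true_values, predicted_values))
    return sum(gaps) / len(true_values)


# Hypothetical class labels: smoking habit reported by five respondents.
true_smoking = ["never", "tried", "current", "never", "former"]
pred_smoking = ["never", "tried", "never", "never", "former"]

# Hypothetical numeric attribute: spending rated on a 1 to 5 scale.
true_spending = [3, 5, 2, 4]
pred_spending = [3, 4, 2, 2]

print(classifier_accuracy(true_smoking, pred_smoking))    # 4 of 5 labels correct
print(mean_absolute_error(true_spending, pred_spending))  # average error in scale points
```

A classifier is scored on whether each label is exactly right, while a predictor of a numeric attribute is scored on how far each guess is from the true value.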
The purpose of this study was to find the factors, related to either the data or the classifier, that influence the predictive accuracy of a classifier on the young people survey data collected, and to predict smoking habits, spending habits, gender or family financial background from young people's interests. More generally, the study aims to help youth societies analyse the activities of young people so that they can organise events or activities that help young people to have a better life in future.
2.0 Literature Review
2.1 Defining young people
According to the Swedish perspective on youth and youth policy, young people are a heterogeneous group whose most common denominator is their age. The group consists of people aged between 13 and 25 years. Young people are a group with common needs, interests and characteristics. They are easily influenced by their environment, including family background, friends, finances, health and technology, and are inclined to try new things.
According to World Population Prospects: The 2010 Revision, young people are in general in the healthiest period of life, yet 1 per cent or less of 13 to 24 year olds do not survive to their 25th birthday. This may be because of their lifestyles, which also involve their interests and habits.
2.2 Defining classification in data mining
Classification consists of predicting a certain outcome based on a given input. In order to predict the outcome, the algorithm processes a training set containing a set of attributes and the respective outcome, usually called the goal or prediction attribute.
The algorithm tries to discover relationships between the attributes that would make it possible to predict the outcome. Next, the algorithm is given a data set it has not seen before, called the prediction set, which contains the same set of attributes except for the prediction attribute, which is not yet known. The algorithm analyses the input and produces a prediction.
The prediction accuracy defines how “good” the algorithm is. Predictive accuracy is usually expressed as a percentage.
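The workflow described above (learn from a training set, predict on an unseen prediction set, then report accuracy as a percentage) can be sketched as follows. This is only an illustration under stated assumptions: the attribute names, the ratings and the outcomes are hypothetical, and the nearest-neighbour rule is chosen for brevity, not because the study uses it.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two attribute tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_one(training_set, attributes):
    """Return the outcome of the training record closest to `attributes`."""
    _, nearest_outcome = min(
        training_set,
        key=lambda record: squared_distance(record[0], attributes),
    )
    return nearest_outcome


# Training set: (attributes, outcome). The attributes are hypothetical
# interest ratings on a 1 to 5 scale (shopping, sport, gadgets); the
# outcome is the goal (prediction) attribute, here a spending habit.
training_set = [
    ((5, 1, 4), "spender"),
    ((4, 2, 5), "spender"),
    ((1, 5, 1), "saver"),
    ((2, 4, 2), "saver"),
]

# Prediction set: same attributes, outcome withheld from the algorithm.
prediction_set = [(5, 2, 5), (1, 4, 1), (3, 3, 4), (2, 5, 1)]
# Outcomes kept aside only so the predictions can be scored afterwards.
known_outcomes = ["spender", "saver", "saver", "saver"]

predictions = [predict_one(training_set, row) for row in prediction_set]
correct = sum(p == k for p, k in zip(predictions, known_outcomes))
accuracy_percent = 100 * correct / len(known_outcomes)

print(predictions)
print(f"Predictive accuracy: {accuracy_percent:.1f}%")
```

In this toy data the third record is misclassified, so three of the four predictions match and the accuracy is 75%. Scoring is only possible because the true outcomes of the prediction set were held back; in real use those outcomes are unknown.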