DMMLACS 2020 Abstracts

Area 1 - Data Mining and Machine Learning Applications for Cyber Security

Full Papers
Paper Nr: 1

Detecting IoT Botnet Formation using Data Stream Clustering Algorithms


Gabriel C. Arimatéa and Admilson L. Ribeiro

Abstract: The Internet of Things has gained considerable importance due to its applicability to many ecosystems in everyday use. However, these embedded systems have several hardware constraints, and the security of such devices has been neglected. Consequently, botnet malware has taken advantage of the poor security schemes on these devices. This paper proposes an unsupervised machine learning approach based on data streams to detect botnet formation at the edge of the network. The algorithm achieves an average accuracy of 98.43% and takes about 20.07 ms to evaluate each sample from the stream, making it reliable and fast even on a constrained device such as a Raspberry Pi 3 B+.
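As a rough illustration of how a data stream clusterer can flag unusual traffic in a single pass, the following Python sketch uses a generic distance-threshold scheme. It is not the algorithm evaluated in the paper; the class names, the `radius` parameter, and the two traffic features (packets per second and distinct destination ports) are hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class MicroCluster:
    """Running summary (centre and size) of the samples absorbed by one cluster."""
    center: List[float]
    count: int = 1

    def absorb(self, point: Sequence[float]) -> None:
        # Incremental mean update, so past samples never need to be stored.
        self.count += 1
        self.center = [c + (x - c) / self.count for c, x in zip(self.center, point)]


@dataclass
class StreamClusterer:
    """Hypothetical one-pass clusterer (not the paper's algorithm).

    A sample within `radius` of its nearest cluster centre is absorbed;
    otherwise it opens a new cluster and is reported as novel.
    """
    radius: float
    clusters: List[MicroCluster] = field(default_factory=list)

    def process(self, point: Sequence[float]) -> bool:
        """Consume one sample; return True if it opened a new cluster."""
        if self.clusters:
            nearest = min(self.clusters, key=lambda m: math.dist(m.center, point))
            if math.dist(nearest.center, point) <= self.radius:
                nearest.absorb(point)
                return False
        self.clusters.append(MicroCluster(center=list(point)))
        return True


if __name__ == "__main__":
    # Hypothetical features: (packets per second, distinct destination ports).
    clusterer = StreamClusterer(radius=2.0)
    stream = [(10.0, 1.0), (11.0, 1.2), (10.5, 0.9), (200.0, 50.0)]
    print([clusterer.process(sample) for sample in stream])  # [True, False, False, True]
```

The first sample always opens a cluster, so a deployment along these lines would need a warm-up period on benign traffic before raising alerts. Keeping only centres and counts keeps memory and per-sample cost small, which is the property that matters on edge hardware.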

Paper Nr: 2

Computing Massive Trust Analytics for Twitter using Apache Spark with Account Self-assessment


Georgios Drakopoulos, Andreas Kanavos, Konstantinos Paximadis, Aristidis Ilias, Christos Makris and Phivos Mylonas

Abstract: Although trust is predominantly a human trait, it has been carried over to the Web almost since its very inception. Given the rapid evolution of the Web into a true melting pot of human activity, trust plays a central role, since a massive number of parties wish to interact in a multitude of ways yet have little or no a priori reason to trust each other. This has led to schemes for evaluating Web trust in contexts such as e-commerce, social media, recommender systems, and e-banking. Of particular interest in social networks are classification methods relying on network-dependent attributes pertaining to the past online behavior of an account. Since such methods are deployed at Internet scale, it is natural to rely on distributed processing platforms like Apache Spark. An added benefit of distributed platforms is that they pave the way, both algorithmically and computationally, for higher-order Web trust metrics. Here a Web trust classifier in MLlib, the machine learning library for Apache Spark, is presented. It relies both on the activity of the account itself and on that of similar accounts. Three datasets obtained by sampling trending Twitter topics serve as benchmarks. Based on the experimental results, best-practice recommendations are given.
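The idea of scoring an account using both its own activity and that of similar accounts can be illustrated, under stated assumptions, by augmenting each account's feature vector with the mean vector of its most similar peers before classification. The sketch below is single-machine plain Python rather than Spark MLlib, uses cosine similarity as an assumed similarity measure, and the function names and toy features are hypothetical; it is not the classifier described in the paper.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def augment_with_neighbors(features: Dict[str, List[float]], k: int) -> Dict[str, List[float]]:
    """Hypothetical helper: append to each account's vector the mean vector
    of its k most similar accounts (by cosine similarity)."""
    augmented: Dict[str, List[float]] = {}
    for acc, vec in features.items():
        neighbors = sorted(
            (other for other in features if other != acc),
            key=lambda o: cosine(vec, features[o]),
            reverse=True,
        )[:k]
        if neighbors:
            mean = [sum(features[o][i] for o in neighbors) / len(neighbors)
                    for i in range(len(vec))]
        else:
            mean = [0.0] * len(vec)
        augmented[acc] = vec + mean  # own activity followed by neighbourhood activity
    return augmented


if __name__ == "__main__":
    # Hypothetical per-account activity features, e.g. (retweet ratio, reply ratio).
    accounts = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
    for name, vec in augment_with_neighbors(accounts, k=1).items():
        print(name, vec)
```

The augmented vectors could then feed any standard classifier. At Twitter scale, the all-pairs similarity loop above is exactly the part that would need a distributed platform or an approximate nearest-neighbour search.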

Short Papers
Paper Nr: 3

SMS Spam Identification and Risk Assessment Evaluations


Alaa Mohasseb, Benjamin Aziz and Andreas Kanavos

Abstract: Short Message Service (SMS) is one of the most widely used communication media. It has become an integral part of people's lives and, like other communication media, SMS has been used to propagate spam messages. Although a broad range of anti-spam techniques have been proposed to reduce the frequency of such incidents, many difficulties remain due to text ambiguity: the same words can appear in seemingly similar texts, which makes spam messages harder to identify. In this paper, we propose an approach for identifying and classifying spam SMS based on the syntactical features and patterns of the message. The proposed approach consists of four main parts, namely SMS Pre-processing, Syntactical Features Extraction and Pattern Formulation, Classification, and Risk Analysis. Experimental results show that the proposed approach achieves a good level of accuracy. In addition, to show the effect of handling class imbalance on classification performance, two additional experiments were conducted using an implementation of the SMOTE algorithm. The results show that handling class imbalance helps improve identification and classification accuracy. Furthermore, building on these results, a risk model is proposed that addresses the risk probability and the impact of spam SMS.
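SMOTE rebalances a dataset by synthesising new minority-class samples (here, spam) along the line segments joining a minority sample to one of its nearest minority neighbours. The following is a minimal plain-Python sketch of that interpolation, assuming numeric feature vectors. It is a simplified variant that picks base samples at random rather than iterating over every minority sample, and it is not the implementation used in the paper; the function name and parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """Hypothetical simplified SMOTE: create n_synthetic points, each on the
    segment between a random minority sample and one of its k nearest
    minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)            # seeded for reproducible oversampling
    k = min(k, len(minority) - 1)
    synthetic: List[List[float]] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample (Euclidean distance)
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()               # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical two-dimensional feature vectors for spam messages.
    spam = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(spam, n_synthetic=3, k=2, seed=42):
        print(point)
```

Because every synthetic point is an interpolation between two real minority samples, it stays inside the region the minority class already occupies, which is what distinguishes SMOTE from simply duplicating spam examples.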