WEBIST 2020 Abstracts


Area 1 - Artificial Intelligence on the Web

Full Papers
Paper Nr: 15
Title:

Automated Tag Enrichment by Semantically Related Trends

Authors:

Antonella Arca, Salvatore Carta, Alessandro Giuliani, Maria M. Stanciu and Diego R. Recupero

Abstract: The technological evolution of modern content-sharing applications has led to an unbridled increase in video content creation and, with it, multimedia streaming, content sharing and video advertising. Managing huge volumes of video data becomes critical for various applications such as video browsing, retrieval, and recommendation. In such a context, video tagging, the task of assigning meaningful human-friendly words (i.e., tags) to a video, has become an important pillar for academia and companies alike. Indeed, tags may effectively summarize the content of a video and, in turn, attract the interest of users and advertisers. As manual tags are usually noisy, biased and incomplete, many efforts have recently been made in devising automated video tagging approaches. However, video search engines handle a massive amount of natural language queries every second. Therefore, a key aspect of video tagging consists of proposing tags that are not only related to video contents, but also popular among users' searches. In this paper, we propose a novel video tagging approach in which the proposed tags are generated by identifying semantically related popular search queries (i.e., trends). Experiments demonstrate the viability of our proposal.
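
The abstract does not specify how trends are matched to a video's existing tags; a minimal sketch of the general idea, scoring trending queries against the tag set by bag-of-words cosine similarity (function names and the threshold are hypothetical, not the authors' method), might look like:

```python
from collections import Counter
from math import sqrt

def term_vector(text):
    """Bag-of-words term-frequency vector for a short text."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = sqrt(sum(c * c for c in u.values()))
    nv = sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def enrich_tags(video_tags, trends, threshold=0.3):
    """Propose trending queries whose similarity to the
    existing tag set exceeds a threshold, best first."""
    tag_vec = term_vector(" ".join(video_tags))
    scored = [(t, cosine(tag_vec, term_vector(t))) for t in trends]
    return [t for t, s in sorted(scored, key=lambda x: -x[1]) if s >= threshold]
```

A real system would use semantic embeddings rather than surface term overlap, but the ranking-and-threshold structure stays the same.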

Short Papers
Paper Nr: 11
Title:

Text Mining of Medical Documents in Spanish: Semantic Annotation and Detection of Recommendations

Authors:

Carlos Tellería, Sergio Ilarri and Carlos Sánchez

Abstract: In medical practice, identifying relevant facts and therapeutic recommendations in health-related documents is a key issue to ensure an efficient and effective service to patients. However, the automatic analysis of text documents to extract relevant data is a challenging task, particularly for documents written in languages other than English, for which the availability of lexical resources and tools is much more limited and fewer experiences have been reported. In this paper, we present our experience dealing with texts written in Spanish in a medical context. By applying text mining techniques and exploiting semantic resources, we present an approach to automatically label documents with appropriate medical terms. In addition, we describe a technique that attempts to automatically detect practice recommendations for doctors in clinical guides. An experimental evaluation shows the benefits of applying text mining techniques as a support system for doctors, as well as their feasibility. The scarcity of experimental evaluations with medical documents in Spanish motivated our work.

Paper Nr: 16
Title:

Efficient Thumbnail Identification through Object Recognition

Authors:

Salvatore Carta, Eugenio Gaeta, Alessandro Giuliani, Leonardo Piano and Diego R. Recupero

Abstract: Given the overwhelming growth of online videos, providing suitable video thumbnails is important not only to influence users' browsing and searching experience, but also for companies that exploit video-sharing portals (YouTube, in our work) for their business activities (e.g., advertising). A main requirement for automated thumbnail generation frameworks is to be highly reliable and time-efficient and, at the same time, economical in terms of computational effort. As conventional methods often fail to produce satisfying results, video thumbnail generation is a challenging research topic. In this paper, we propose two novel approaches able to provide relevant thumbnails with minimum effort in terms of execution time and computational resources. The proposals rely on an object recognition framework that captures the most topic-related frames of a video and selects the thumbnails from the resulting frame set. Our approach is a trade-off between content coverage and time efficiency. We perform preliminary experiments aimed at assessing and validating our models, and we compare them with a baseline compliant with the state of the art. The assessments confirm our expectations and encourage future improvement of the proposed algorithms, as our proposals are significantly faster and more accurate than the baseline.

Paper Nr: 23
Title:

Converting Web Pages Mockups to HTML using Machine Learning

Authors:

Tiago Bouças and António Esteves

Abstract: Converting Web page mockups to code is a task that developers typically perform. Due to the time required to accomplish this task, the time available to devote to application logic is reduced. The main goal of the present work was therefore to develop deep learning models that automatically convert mockups of Web graphical interfaces into HTML, CSS and Bootstrap code, and to deploy the trained model as a Web application. Two deep learning models were built, resulting from two different approaches to integrate into the Web application. The first approach uses a hybrid architecture with a convolutional neural network (CNN) and two recurrent networks (RNNs), following the encoder-decoder architecture commonly adopted in image captioning. The second approach focuses on the spatial component of the problem and combines the YOLO network with a layout algorithm. Testing with the same dataset, the prediction correctness achieved with the first approach was 71.30%, while the second approach reached 88.28%. The first contribution of the present paper is a rich dataset of Web page GUI sketches and their captions; no dataset with sufficiently complex GUI sketches existed before we started this work. A second contribution is the application of YOLO to detect and localize HTML elements, together with a layout algorithm that converts the YOLO result into code, which is a completely different approach from what is found in the related work. Finally, the YOLO-based architecture achieved a prediction correctness higher than reported in the literature.
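
The paper's layout algorithm is not detailed in the abstract; a minimal sketch of the general idea, turning detected bounding boxes into HTML by grouping elements into rows by vertical position and ordering them left to right (the tag mapping and tolerance are hypothetical), could look like:

```python
def boxes_to_html(detections, row_tol=20):
    """Convert object detections (label + top-left x/y, in pixels)
    into nested row-based HTML, a toy stand-in for a layout algorithm."""
    tag = {"button": "<button>Button</button>",
           "text": "<p>Text</p>",
           "image": '<img src="placeholder.png">'}
    rows = []
    # Group detections whose vertical positions are close into one row.
    for det in sorted(detections, key=lambda d: d["y"]):
        if rows and abs(det["y"] - rows[-1][0]["y"]) <= row_tol:
            rows[-1].append(det)
        else:
            rows.append([det])
    html = ["<div class='container'>"]
    for row in rows:
        html.append("  <div class='row'>")
        # Within a row, order elements left to right.
        for det in sorted(row, key=lambda d: d["x"]):
            html.append("    " + tag[det["label"]])
        html.append("  </div>")
    html.append("</div>")
    return "\n".join(html)
```

A production version would also use box widths to infer Bootstrap column spans, but the row-grouping step is the core of any such conversion.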

Paper Nr: 24
Title:

What’s in a Definition? An Investigation of Semantic Features in Lexical Dictionaries

Authors:

Luigi Di Caro

Abstract: Encoding and generating word meanings as short definitions for user- and machine-consumption dictionaries remains a commonly adopted strategy within interpretable lexical-semantic resources. Definitions are natural for humans to create, and several techniques have been proposed for their automatic extraction from large corpora. However, the reverse process of going back from a definition to the words (i.e., onomasiological search) is anything but simple, for both humans and machines. Indeed, definitions show context- and conceptual-based properties which influence their quality. In this contribution, I want to draw attention to this problem through a simple content-to-form experiment with humans in the loop. The results give first insights into the relevance of the problem from a computational perspective. In addition, I analyzed (both quantitatively and qualitatively) a set of 1,901 word definitions taken from different sources, towards the modeling of features for their generation and automatic extraction.

Paper Nr: 37
Title:

Automatic Detection of Terms and Conditions in German and English Online Shops

Authors:

Daniel Braun and Florian Matthes

Abstract: Terms and Conditions in online shops are arguably among the most important (or at least the most widely used) forms of consumer contracts. At the same time, they are probably among the least read documents. Their automated analysis is thus of great interest, not just for research, but also from a consumer protection perspective. To be able to automatically process large amounts of Terms and Conditions and build the corpora that are necessary to train data-driven systems, we need means to identify Terms and Conditions automatically. In this paper, we present and evaluate four different approaches to the automatic detection of Terms and Conditions pages in German and English online shops. We treat the problem as a binary document classification problem for web pages and report an approach which achieves precision, recall, and F1-score above 0.9 in German and close to 0.9 in English, by analysing the URL of the page.
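
The abstract states that the best approach classifies pages from the URL alone; the actual URL features are not given, but a minimal illustrative sketch (the keyword list is hypothetical) of such a URL-only binary classifier could be:

```python
import urllib.parse

# Hypothetical German and English keywords that often mark
# Terms-and-Conditions pages in shop URLs.
KEYWORDS = ("agb", "geschaeftsbedingungen", "terms", "conditions")

def looks_like_terms_page(url):
    """URL-only binary classifier for Terms-and-Conditions pages:
    check whether the URL path contains a known T&C keyword."""
    path = urllib.parse.urlsplit(url).path.lower()
    return any(k in path for k in KEYWORDS)
```

A learned classifier over URL tokens would generalize better, but this shows why URL-based detection can be both cheap and surprisingly accurate: shops name these pages very consistently.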

Area 2 - Internet Technology

Full Papers
Paper Nr: 17
Title:

Transfer Learning to Extract Features for Personalized User Modeling

Authors:

Aymen Ben Hassen and Sonia Ben Ticha

Abstract: Personalized recommender systems help users to choose relevant resources and items from many choices, which remains an important challenge today. In recent years, we have witnessed the success of deep learning in several research areas such as computer vision, natural language processing, and image processing. In this paper, we present a new approach that exploits the images describing items to build a personalized user model. To this aim, we use deep learning to extract latent features describing the images, and then associate these features with user preferences to build the personalized model. This model is used in a Collaborative Filtering (CF) algorithm to make recommendations. We apply our approach to real data, the MovieLens dataset, and compare our results to other approaches based on collaborative filtering algorithms.

Paper Nr: 20
Title:

Selective Auctioning using Publish/Subscribe for Real-Time Bidding

Authors:

Sonia Slimani and Kaiwen Zhang

Abstract: Real-Time Bidding (RTB) advertising has recently experienced massive growth in the online marketing industry. RTB technologies allow an Ad Exchange (AdX) to conduct online auctions in order to sell targeted ad impressions by soliciting bids from potential buyers, called Demand-Side Platforms (DSPs). In the OpenRTB specification, a well-known open standard protocol for RTB, the AdX sends bid requests to all DSPs for every auction. This communication protocol is highly inefficient since, for each given auction, only a small fraction of DSPs will actually submit a competitive bid to the AdX. Sending bid requests to uninterested parties wastes valuable computation and communication resources. In this paper, we propose to leverage publish/subscribe to optimize the auction protocol used in RTB. We demonstrate how RTB semantics can be expressed using content-based subscriptions, which allows for selective dissemination of bid requests in order to eliminate no-bid responses. We also formulate the problem of minimizing the number of bid responses per auction, and propose combining top-k scoring with regression analysis over continuous variables as a heuristic solution to further reduce the number of irrelevant responses. We then adapt our solution to use discrete machine learning models for faster execution. Finally, we evaluate our proposed solutions against the OpenRTB baseline in terms of end-to-end latency and total paid price over time.
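
The core idea of expressing RTB semantics as content-based subscriptions can be sketched as follows: each DSP registers constraints on bid-request attributes, and the AdX forwards a request only to DSPs whose constraints match, so guaranteed no-bid responses never leave the exchange. Attribute names and the matching model below are illustrative, not the paper's actual formulation:

```python
def matches(subscription, bid_request):
    """Content-based match: every attribute constraint in the DSP's
    subscription must hold on the bid request."""
    for attr, allowed in subscription.items():
        if bid_request.get(attr) not in allowed:
            return False
    return True

def route(bid_request, subscriptions):
    """Forward the bid request only to DSPs whose subscription matches,
    instead of broadcasting it to all DSPs as in plain OpenRTB."""
    return [dsp for dsp, sub in subscriptions.items()
            if matches(sub, bid_request)]
```

A content-based pub/sub broker performs exactly this filtering, typically with index structures rather than a linear scan over subscriptions.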

Short Papers
Paper Nr: 21
Title:

A Management Model of Real-time Integrated Semantic Annotations to the Sensor Stream Data for the IoT

Authors:

Besmir Sejdiu, Florije Ismaili and Lule Ahmedi

Abstract: Wireless Sensor Networks (WSNs) are one of the most important components of the Internet of Things (IoT). They produce a continuous stream of data and transmit these data to a centralized server. Due to the dramatic increase in streaming data, their management and exploitation have become increasingly important. Furthermore, adding semantic annotations to sensor stream data provides better understanding and more meaningful descriptions, which enables IoT application areas to become much more intelligent. In this paper, a data stream management model of WSNs for IoT real-time monitoring systems is presented, which supports real-time integration of data from heterogeneous sensors with semantic annotations. To validate the proposed model, an IoT system for real-time water quality monitoring is built, which enables real-time integration of semantic annotations to the sensor stream data in the format of the Sensor Observation Service (SOS).

Paper Nr: 25
Title:

Analysis of Selected Characteristics of Open Data Inception Portals in the Context of Smart Cities IoT Data Accessibility

Authors:

Paweł Dymora, Mirosław Mazurek and Bartosz Kowal

Abstract: In this study, we focus on Open Government Data, the sphere of public services where this type of data can be most useful. In the Industry 4.0 concept, the primary data source is the IoT infrastructure. Open Data is of considerable importance for the software development process, and it is becoming a significant challenge nowadays, especially when it comes to preparing data for sharing, analyzing it, and searching for hidden dependencies, which opens up new possibilities for computing and artificial intelligence. The paper shows that the architecture of existing solutions, e.g., in Poland, follows global trends. Together with statistics based on the Socrata portal, it can be noticed that these data can be, and are, successfully used for data processing. New methods and software for processing data are continually being developed; the vast majority of software is data-driven, and data are needed for verification and validation. The article presents a comprehensive analysis of available open data portals with data.json files, as well as an analysis of the most commonly used data formats for Open Data Network portal databases.

Paper Nr: 27
Title:

Multifactorial Evolutionary Prediction of Phenology and Pests: Can Machine Learning Help?

Authors:

Francisco J. Lacueva-Pérez, Sergio I. Artigas, Juan B. Vargas, Gorka L. Lezaun and Rafael H. Alonso

Abstract: Agriculture is a key primary sector of the economy. It is necessary to develop and apply techniques that support the sustainable development of fields and maximize their productivity, while guaranteeing the maximum levels of health and quality of the crops. Precision agriculture refers to the use of technology to help in the decision-making process and can lead to the achievement of these goals. In this position paper, we argue that machine learning (ML) techniques can provide significant benefits to precision agriculture, but that obstacles exist which are preventing their widespread adoption and effective application. In particular, we focus on the prediction of phenology changes and pests, due to their importance in ensuring the quality of the crops. We analyze the state of the art, present the existing challenges, and outline our specific research goals.

Paper Nr: 34
Title:

Accommodating Negation in an Efficient Event-based Natural Language Query Interface to the Semantic Web

Authors:

Shane Peelar and Richard A. Frost

Abstract: Although the Semantic Web was built with the Open World Assumption (OWA) in mind, there are many cases where the Closed World Assumption would be a better fit. This is unfortunate because the OWA prevents rich queries involving negation from taking place, even in contexts where they would be appropriate. In this paper we present an English natural language query interface to event-based triplestores, based on compositional semantics, that can support both open- and closed-world semantics with “drop-in” denotations for “no”, “not”, “non”, and “the least”. Where closed-world semantics are not appropriate, omitting these denotations is sufficient to restore the OWA. The result is a highly expressive compositional semantics supporting complex linguistic constructs such as chained prepositional phrases, n-ary transitive verbs, superlative phrases, and negation, suitable for expert systems and knowledge bases.

Paper Nr: 43
Title:

Comparative Analysis between the k-means and Fuzzy c-means Algorithms to Detect UDP Flood DDoS Attack on a SDN/NFV Environment

Authors:

João A. Neto, Layse S. Souza and Admilson L. Ribeiro

Abstract: Distributed Denial of Service (DDoS) attacks are a growing issue for computer network security and have become a serious problem. Environments based on Software-Defined Networking (SDN) and Network Function Virtualization (NFV) offer the ability to program a network and allow the dynamic creation of flow policies. In addition, clustering algorithms can be used to classify and detect DDoS attacks. This paper presents a study and analysis of two unsupervised machine learning algorithms used to detect DDoS attacks in a simulated SDN/NFV environment. Both algorithms achieved an accuracy rate of 99%, and the k-means algorithm was 33% faster than fuzzy c-means, which demonstrates its effectiveness and scalability.
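
To illustrate the unsupervised setting, a plain k-means over per-flow traffic features (e.g. packets per second and distinct source counts) separates flood traffic from normal traffic without labels. This is a generic textbook sketch, not the paper's implementation, and the feature choice is hypothetical:

```python
import random

def kmeans(points, k=2, iters=20, seed=0):
    """Plain k-means on flow feature vectors, e.g. (packets/s, src IPs).
    Returns the final centroids and the point clusters."""
    rnd = random.Random(seed)
    centroids = rnd.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        # Assign each point to its nearest centroid (squared distance).
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # Recompute centroids as cluster means (keep old one if empty).
        centroids = [tuple(sum(xs) / len(xs) for xs in zip(*cl))
                     if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters
```

In a UDP-flood scenario the attack flows form a dense cluster far from normal traffic, which is why even this simple method can reach very high accuracy; fuzzy c-means replaces the hard assignment with membership degrees at extra computational cost.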

Paper Nr: 6
Title:

Using IoT Platform for 360-Degree Video User Logging and Analysis

Authors:

Antti Luoto, Kari Systä, Otto Hylli and Ville Heikkilä

Abstract: Smart cities are getting more and more attention due to urbanization and IoT trends. At the same time, 360-degree videos are also becoming more popular. The watchers of 360-degree videos provide a data source that fits the data collection aims of smart cities. This paper explores how well 360-degree video user data can be collected, using MQTT as a data transfer protocol, and analyzed with an open-source IoT platform. The results suggest that using MQTT with the chosen IoT platform is convenient and that general chart visualizations can provide useful insight about 360-degree video watchers. The research method used is design science.
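
Logging a 360-degree video viewer over MQTT amounts to publishing small JSON messages on a hierarchical topic per user and video. The abstract does not give the message schema, so the topic scheme and field names below are hypothetical; the sketch only builds the topic/payload pair that a client library would publish:

```python
import json
import time

def view_event(user_id, video_id, yaw, pitch, ts=None):
    """Build an MQTT-style topic and JSON payload for one
    head-orientation sample of a 360-degree video viewer."""
    topic = f"video360/{video_id}/users/{user_id}/orientation"
    payload = json.dumps({
        "user": user_id,
        "video": video_id,
        "yaw_deg": yaw,      # horizontal view direction
        "pitch_deg": pitch,  # vertical view direction
        "ts": ts if ts is not None else time.time(),
    })
    return topic, payload
```

The hierarchical topic lets the IoT platform subscribe with wildcards (e.g. one subscription per video) and feed the samples straight into time-series charts of where viewers look.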

Paper Nr: 7
Title:

Comparing Reliability Mechanisms for Secure Web Servers: Comparing Actors, Exceptions and Futures in Scala

Authors:

Danail Penev and Phil Trinder

Abstract: Modern web applications must be secure, and use authentication and authorisation for verifying the identity and the permissions of users. Programming language reliability mechanisms, including exceptions, actors and futures, commonly implement web application security. This paper compares the performance and programmability of these three reliability mechanisms for secure web applications on the popular Scala/Akka platform. Key performance metrics are throughput and latency for workloads comprising successful, unsuccessful and mixed requests across increasing levels of concurrent connections. We find that all reliability mechanisms fail fast: unsuccessful requests have low mean latency (1-2 ms) but reduce throughput dramatically, by more than 100x. For a realistic authentication workload, exceptions have the highest throughput (187K req/s) and the lowest mean latency (around 5 ms), followed by futures. Our programmability study focuses on the available attack surface, measured as code blocks in the web application implementation. For authentication and authorisation, actors have the smallest number of code blocks for both our benchmark (3) and a sequence of n security checks (n + 1); futures and exceptions both have 4 (2n) code blocks. We conclude that actors minimise programming complexity and hence attack surface.

Paper Nr: 36
Title:

Alternative Approaches for Supporting Lattice-based Access Control (LBAC) in the Fast Healthcare Interoperability Resources (FHIR) Standard

Authors:

Steven Demurjian, Thomas Agresta, Eugene Sanzi and John DeStefano

Abstract: A major challenge in the healthcare industry is the selective availability, at a fine-grained level of detail, of a patient's data to the various clinicians, nurses, specialists, home health aides, family members, etc., where the decision of who can see which information at which times is controlled by the patient. The information includes: contact and demographic data, current conditions, medications, test results, past medical history, history of substance abuse and treatment, mental health information, sexual health information, records relating to domestic violence, reproductive health records, and genetic information. To control sensitivity, multi-level security (MLS) using lattice-based access control (LBAC) can extend the traditional linear sensitivity levels of mandatory access control with the ability to define a complex lattice of sensitivity categorizations suitable for the wide variety of the aforementioned information types. This paper applies and extends our prior work on multi-level security for healthcare using LBAC by exploring alternative approaches to integrate this approach into the Fast Healthcare Interoperability Resources (FHIR) standard at the specification level of the standard.

Area 3 - Mobile Systems

Full Papers
Paper Nr: 19
Title:

Traffic Flow Modelling for Pollution Awareness: The TRAFAIR Experience in the City of Zaragoza

Authors:

Sergio Ilarri, David Sáez and Raquel Trillo-Lado

Abstract: Suitable traffic monitoring is a key issue for a smart city, as it can enable better decision making by both citizens and public administrations. For example, a city council can exploit the collected traffic data for traffic management (e.g., to define suitable traffic policies across the city, such as restricting the circulation of traffic in certain areas). Similarly, citizens could use those data to make appropriate mobility decisions. To measure traffic, a variety of detection methods can be used, but their widespread deployment throughout the whole city is expensive and difficult to maintain. Therefore, alternative approaches are required that allow estimating traffic in any area of the city based on only a few real traffic measurements. In this paper, we describe our approach to traffic flow modelling in the city of Zaragoza, which we are currently applying in the European project “TRAFAIR – Understanding Traffic Flows to Improve Air quality”. The TRAFAIR project aims at the development of a platform to estimate the air quality in different areas of a city, and for this purpose traffic data play a major role. Specifically, we have adopted an approach that combines historical real traffic measurements with the use of the traffic simulator SUMO on top of real roadmaps of the city, and applies a trajectory generation strategy that complements the functionalities of SUMO (e.g., SUMO's calibrators). An experimental evaluation compares several simulation alternatives and shows the benefits of the chosen approach.

Paper Nr: 28
Title:

Completeness Issues in Mobile Crowd-sensing Environments

Authors:

Souheir Mehanna, Zoubida Kedad and Mohamed Chachoua

Abstract: Mobile sensors are widely used to monitor air quality in order to quantify human exposure to air pollution. These sensors are prone to malfunctions, resulting in many data quality issues, which in turn impact the reliability of analytical studies. In this work, we address the problem of data quality evaluation in mobile crowd-sensing environments, focusing on data completeness. We introduce a multi-dimensional model to represent the data coming from the sensors in this context and discuss different facets of data completeness. We propose quality indicators capturing these facets of completeness, along with the corresponding quality metrics. We provide experiments showing the usefulness of our proposal.
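
Two facets of completeness that come up naturally for mobile sensor streams are temporal completeness (how many expected measurement slots actually hold an observation) and attribute completeness (how many required fields a record fills). The indicators below are generic illustrations, not the paper's metrics:

```python
def temporal_completeness(timestamps, start, end, period):
    """Fraction of expected measurement slots in [start, end) that
    contain at least one observation (slot length = period)."""
    expected = int((end - start) // period)
    filled = {int((t - start) // period) for t in timestamps
              if start <= t < end}
    return len(filled) / expected if expected else 1.0

def attribute_completeness(record, required):
    """Fraction of required attributes present (non-null) in a record."""
    present = sum(1 for a in required if record.get(a) is not None)
    return present / len(required)
```

Spatial completeness can be defined the same way by replacing time slots with grid cells of the monitored area.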

Short Papers
Paper Nr: 35
Title:

Exploring Voice Assistant Risks and Potential with Technology-based Users

Authors:

Andreas M. Klein, Andreas Hinderks, Maria Rauschenberger and Jörg Thomaschewski

Abstract: Voice user interfaces (VUIs) or voice assistants (VAs) such as Google Home or Google Assistant (Google), Cortana (Microsoft), Siri (Apple) or Alexa (Amazon) are highly available in the consumer sector and represent a smart home trend. Still, their acceptance seems to be culture-dependent, while the syntax of communication poses a challenge. So, there are some basic questions: ‘Why do people buy VAs?’ ‘What do they use them for?’ ‘What could be improved in the future?’ We explore the opinions of a German technology-based user group to identify the challenges and opportunities of VAs. We focus on the interaction behaviour, frequency of use, concerns, and opinions of this target group, as they show a higher variety of interaction as well as greater privacy concerns in representative population studies. Our preliminary findings confirm previous results (missing accuracy of commands and serious concerns about privacy issues) and show that technology-based users from Germany are intensive users, although with particular concerns about data collection. There is probably a correlation between privacy concerns and speech intelligibility, as queries relating to VAs are problematic due to repetitions and refinement.

Paper Nr: 33
Title:

Mobile Devices and Systems in ADHD Treatment

Authors:

Renato B. Alves, Mônica Ferreira da Silva, Eber A. Schmitz and Antonio J. Alencar

Abstract: Attention Deficit Hyperactivity Disorder (ADHD) is a neurobiological condition that appears during an individual's childhood and may follow her/him for life. Even though it is not a new disorder, ADHD treatment is limited to the use of drugs and behavioral therapy, even for children. The objective of this research was to investigate the potential of mobile devices, web-based information systems and other computer technologies to support the ADHD treatment phase. Results show the potential of these approaches as alternatives for long-term treatment, as well as the difficulties and limitations that persist today. In addition, the research highlighted that the use of computer technology could provide persistent long-term results.

Area 4 - Semantic Web

Full Papers
Paper Nr: 30
Title:

On the Fly SPARQL Execution for Structured Non-RDF Web APIs

Authors:

Torsten Spieldenner

Abstract: The concept of the Semantic Web, built around the idea of semantically described Linked Data and the data model of the Resource Description Framework (RDF), has become a prominent vision of seamless access to, and integration of, data. The number of tools to translate data from non-RDF to RDF representations has been increasing ever since. However, to this day, numerous Web APIs offer time-critical data only in non-RDF formats; examples are traffic and public transport live data. By its nature, an offline data dump, as mostly generated by RDF lifting tools, is not practical, as it quickly becomes inconsistent with the original data. In this paper, we present an approach that, published as a microservice, allows semantic queries to be sent directly against legacy Web interfaces and returns the results in RDF. The service API follows the SPARQL 1.1 Query API specification and also supports federated queries over distributed endpoints, providing an easy and accessible way to perform semantically enriched data integration over legacy endpoints.
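
The essential on-the-fly step is lifting a JSON API response into RDF triples at query time instead of building an offline dump. A minimal sketch of such a lifting (the vocabulary namespace and resource URIs are hypothetical, and triples are plain tuples rather than an RDF library's terms):

```python
def json_to_triples(uri, obj, vocab="http://example.org/vocab#"):
    """Map a (possibly nested) JSON object returned by a Web API
    to RDF-like (subject, predicate, object) triples on the fly."""
    triples = []
    for key, value in obj.items():
        pred = vocab + key
        if isinstance(value, dict):
            # Nested object becomes a linked resource of its own.
            node = f"{uri}/{key}"
            triples.append((uri, pred, node))
            triples.extend(json_to_triples(node, value, vocab))
        else:
            triples.append((uri, pred, value))
    return triples
```

A SPARQL engine can then match basic graph patterns against these freshly lifted triples, so results are always consistent with the live API.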

Paper Nr: 38
Title:

Soft Querying GeoJSON Documents within the J-CO Framework

Authors:

Giuseppe Psaila, Stefania Marrara and Paolo Fosci

Abstract: GeoJSON documents have become important sources of information over the Web, because they describe geographical information layers. Assuming such documents are stored in some JSON store, the problem arises of querying them in a flexible and easy way. In this paper, we propose a soft-querying model to easily express queries on features (i.e., data items) within GeoJSON documents, based on linguistic predicates. These are fuzzy predicates that evaluate the membership degree to fuzzy sets; this way, imprecise conditions can be expressed and features can be ranked accordingly. The paper presents a rewriting technique that translates soft queries on GeoJSON documents into fuzzy J-CO-QL queries: this is the query language of the J-CO Framework, an Internet-based framework able to get, manipulate and save collections of JSON documents in a way totally independent of the source JSON store.
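
A linguistic predicate is just a membership function into [0, 1], and soft querying ranks features by that degree instead of filtering with a hard threshold. The predicate shape, property names and parameters below are hypothetical illustrations of the idea, not J-CO-QL syntax:

```python
def close_to(distance_km, core=1.0, support=5.0):
    """Linguistic predicate 'close': membership 1 within `core` km,
    decreasing linearly to 0 at `support` km."""
    if distance_km <= core:
        return 1.0
    if distance_km >= support:
        return 0.0
    return (support - distance_km) / (support - core)

def rank_features(features, predicate, key):
    """Rank GeoJSON features by the membership degree of a fuzzy
    predicate evaluated on one of their properties."""
    scored = [(f, predicate(f["properties"][key])) for f in features]
    return sorted(scored, key=lambda fs: -fs[1])
```

A hard filter `distance <= 5` would treat 4.9 km and 0.1 km alike; the fuzzy version preserves the preference order, which is the whole point of soft querying.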

Short Papers
Paper Nr: 26
Title:

Semantic Web Applications for Danish News Media

Authors:

Astrid Ildor

Abstract: Most news media possess a publish-and-forget mindset: once a news article is published, the information it contains devalues in the messy place of the unstructured Web, and the great potential of combining and reusing data is missed. News media have long constituted an area of interest for Semantic Web researchers, but few studies merge technical knowledge with editorial insights. To fill this gap in the literature, this study combines technical analysis with interviews and Participatory Design studies involving eight Danish news journalists and digital editors. The exploration reveals three areas within the journalistic work process with significant potential for improvement: journalists' challenge of finding the right person to comment on a specific topic, issues in finding previously published articles, and the need to generate infoboxes. Each area is examined as a type of Semantic Web application. It is demonstrated how profound annotation of persons, places, organisations, and key terms mentioned in a body of articles is required for each application. Trustworthiness is another major challenge, as this cannot yet be fully achieved within the concept of the Semantic Web.

Paper Nr: 32
Title:

Ontology-quality Evaluation Methodology for Enhancing Semantic Searches and Recommendations: A Case Study

Authors:

Paula Peña, Raquel Trillo-Lado, Rafael D. Hoyo, María C. Rodríguez-Hernández and David Abadía

Abstract: In the big data era, there is an increasing demand for models and tools to evaluate the quality of data used in decision-making and search processes, as decisions based on wrong or poor-quality data can lead to enormous losses. Thus, data has become an asset and the most powerful enabler of any organization. In this context, ontologies and semantic techniques have gained importance over the last decades as means to represent data sources and metadata. In this paper, we describe our work in progress concerning the generation of models that encourage data quality through the use of ontologies. In particular, we present a use case where an enriched ontological model of ESCO (European Skills, Competences, Qualifications and Occupations) is used to improve the effectiveness of a search and recommendation system. In more detail, we focus on how ESCO is enriched by following the METHONTOLOGY methodology and the 101 methodological guidelines. We also present the design of a search and recommendation system oriented to the labour market that exploits the enhanced ontology to suggest the qualifications required by job seekers and employees to reach a specific occupation position, and different training itineraries to obtain those recommended qualifications.

Paper Nr: 41
Title:

Detecting Unsatisfiable Pattern Queries under Shape Expression Schema

Authors:

Shiori Matsuoka and Nobutaka Suzuki

Abstract: Among queries for RDF/graph data, the pattern query is the most popular and important one. A pattern query that returns an empty answer for every valid graph is clearly useless; such a query is called unsatisfiable. Formally, we say that a pattern query q is unsatisfiable under a schema S if there is no valid graph g of S such that the result of q over g is nonempty. It is desirable to detect unsatisfiable pattern queries efficiently before they are executed, since an unsatisfiable query may require much execution time but always returns an empty answer. In this paper, we focus on Shape Expression (ShEx) as the schema language, and we propose an algorithm for detecting unsatisfiable pattern queries under a given ShEx schema. Experimental results suggest that our algorithm can determine the satisfiability of pattern queries efficiently.

Paper Nr: 42
Title:

Transforming Property Path Query According to Shape Expression Schema Update

Authors:

Goki Akazawa, Naoto Matsubara and Nobutaka Suzuki

Abstract: Suppose that we have a query q under a schema S and then S is updated. We then have to update q according to the update of S, since otherwise q no longer reports correct answers. However, updating q manually is often a difficult and time-consuming task, since users may not fully understand the schema definition or may not be aware of the details of the schema update. In this paper, we consider transforming queries automatically according to schema updates. We focus on Shape Expression (ShEx) and Property Path as the schema and query language, respectively, and take a structural approach to transforming Property Path queries. For a Property Path query q and a schema update op to a ShEx schema S, our algorithm checks how op affects the structure of q under S, and transforms q according to the result.

Area 5 - Social Networking

Full Papers
Paper Nr: 18
Title:

A Curious Case of Meme Detection: An Investigative Study

Authors:

Chhavi Sharma and Viswanath Pulabaigari

Abstract: In recent times, internet "memes" have led social media-based communication from the front. Specifically, the more viral a meme tends to be, the higher the likelihood of it leading to a social movement with significant polarizing potential. Online hate speech is typically studied from a textual perspective, whereas memes, being a combination of images and text, pose a very recent challenge that is only beginning to be acknowledged. Our paper primarily focuses on meme vs. non-meme classification, to address the crucial first step towards studying memes. To characterize a meme, a metric-based empirical analysis is performed, and a system is built for classifying images as meme/non-meme using visual and textual features. An exhaustive set of experiments is performed to evaluate conventional image processing techniques for extracting low-level descriptors from an image, which suggests the effectiveness of Haar wavelet transform-based feature extraction. A further study establishes the importance of both the graphic and linguistic content within a meme for its characterization and detection. Along with reporting an optimal F1 score for meme/non-meme classification, we also highlight the efficiency of our proposed approach in comparison with other popular techniques. The insights gained in understanding the nature of memes through our systematic approach could help detect memes and flag those that are potentially disruptive in nature.

Paper Nr: 29
Title:

Knowledge-based Reliability Metrics for Social Media Accounts

Authors:

Nuno Guimaraes, Alvaro Figueira and Luis Torgo

Abstract: The growth of social media as an information medium without restrictive measures on the creation of new accounts has led to the rise of malicious agents with the intent to diffuse unreliable information in the network, ultimately affecting the perception of users on important topics such as political and health issues. Although the problem is being tackled within the domain of bot detection, the impact of studies in this area is still limited because 1) not all accounts that spread unreliable content are bots, 2) human-operated accounts are also responsible for the diffusion of unreliable information, and 3) bot accounts are not always malicious (e.g. news aggregators). Also, most of these methods are based on supervised models that require annotated data and updates to maintain their performance over time. In this work, we build a framework and develop knowledge-based metrics to complement current research in bot detection and to characterize the impact and behavior of a Twitter account, independently of the way it is operated (by a human or a bot). We proceed to analyze a sample of accounts using the proposed metrics and evaluate the necessity of these metrics by comparing them with the scores from a bot detection system. The results show that the metrics can characterize different degrees of unreliable accounts, from unreliable bot accounts with a high number of followers to human-operated accounts that also spread unreliable content (but with less impact on the network). Furthermore, evaluating a sample of the accounts with a bot detection system showed that bots make up around 11% of the sample of unreliable accounts extracted and that the bot score is not correlated with the proposed metrics. In addition, the accounts that achieve the highest values in our metrics present different characteristics than the ones that achieve the highest bot scores. This provides evidence of the usefulness of our metrics in the evaluation of unreliable accounts in social networks.

Short Papers
Paper Nr: 44
Title:

Meme vs. Non-meme Classification using Visuo-linguistic Association

Authors:

Chhavi Sharma, Viswanath Pulabaigari and Amitava Das

Abstract: Building on the foundation of consolidating humor with social relevance, internet memes have become an imperative communication tool of the modern era. Memes percolate through the dynamic ecosystem of the social network, influencing and changing the social order along the way. As a result, the status quo of the social balance changes significantly and is at times channelled in unwanted directions. Besides flagging harmful memes, detecting them amongst disparate multi-modal online content is of crucial importance, a task that has remained understudied. As an effort to characterize internet memes, we attempt to classify meme vs. non-meme by leveraging techniques such as Siamese networks and canonical correlation analysis (CCA) to capture the feature association between the visual and textual components of a meme. The experiments are observed to yield impressive performance and could further provide insights for applications such as meme content moderation on social media.

Area 6 - Web infrastructures, Architectures and Platforms

Short Papers
Paper Nr: 13
Title:

MISTRuST: accoMmodatIon Short Term Rental Scanning Tool

Authors:

Iván Ruiz-Rube, Inmaculada Arnedillo-Sánchez and Antonio Balderas

Abstract: The global irruption of ‘shared-accommodation platforms’ has ignited debate regarding the implications of the unprecedented growth of the short-term rental market. While some argue that these platforms have generated new business, work and wealth, others highlight their wider societal, economic and legal effects. Short-term rental removes long-term housing from the market, forces rent prices up, saturates areas with tourism, generates safety and liability concerns, and, by and large, the market is awash with likely illegal listings. This paper presents MISTRuST (accoMmodatIon Short Term Rental Scanning Tool), a computational-intelligence-based system aimed at uncovering whether a property is being listed on short-term rental (STR) platforms. It enables users to monitor a property by scheduling automatic searches and checking the listings returned by the system against the target property. The asynchronous pipeline architecture involves three stages: data ingestion, data enrichment and data matching. Preliminary test results with a set of listings are encouraging. However, further evaluation is needed to improve the accuracy of the system on a larger scale. It is hoped that MISTRuST will help stakeholders such as property owners, state agencies and others tackle the growing concern over unlawful STRs and contribute towards sustainable solutions.

Paper Nr: 1
Title:

Life Cycle of Software Development Design in European Structured Economic Reports

Authors:

Ignacio Santos, Elena Castro, Dolores Cuadra and Harith Aljumaily

Abstract: This proposal presents the complete life cycle of software development for semantic economic reports using the MDA paradigm. A panoramic view of the development of these reports in Europe using the MDM and the DPM is shown. Stock markets, financial institutions and others use these reports, and companies, organizations and agencies need to exchange accounting reports. A very high percentage of these reports are published and transmitted over the internet. The reports are structured and semantic; in general, the XBRL specification, based on XML, is used as a de facto standard. This research work examines the evolution of this design and analyses the Conceptual Model in detail. Regulators, through different central banks and European agencies, have established the DPM, a European standard, as a modelling tool in the context of the European Union (EU). Moreover, a minimum set of consistent definitions and rules based on the MDM using the MDA is proposed, and the DPM methodology is analysed. Finally, it is hoped that this study will help to make the design of reports easier.

Area 7 - Web Interfaces

Full Papers
Paper Nr: 22
Title:

What Web Users Copy to the Clipboard on a Website: A Case Study

Authors:

Ilan Kirsh

Abstract: The clipboard is a central tool in human-computer interaction. It is difficult to imagine productive day-to-day interaction with computers, tablets, and smartphones without copy and paste functionality. This study analyzes real usage data from a commercial website in order to understand what types of textual content users copy from the website, for what purposes, and what such user activity data can be used for. This paper advocates treating clipboard copy operations as a bidirectional human-computer dialogue, in which the computer can gain knowledge about users, their preferences, and their needs. Copy operation data may be useful in various applications. For example, users may copy to the clipboard words that make the text difficult to understand, in order to search for more information on the internet. Accordingly, word copying on a website may be used as an indicator in Complex Word Identification (CWI) and help in text simplification. Users may copy key sentences in order to use them in summaries or as citations; accordingly, the frequency of copying full sentences by web users could be used as an indicator in text summarization. Ten different potential uses of copy operation data are described and discussed in this paper. These proposed uses and applications span a wide range of areas, including web analytics, web personalization, adaptive websites, text simplification, text summarization, plagiarism detection, and search engine optimization.

Short Papers
Paper Nr: 45
Title:

Rapid Development of a Low-cost Web-based 360 Virtual Tour

Authors:

Maria Insa-Iglesias, Mark D. Jenkins and Gordon Morison

Abstract: The use of 360-degree Virtual Tours (VTs) is common practice in the education and tourism sectors. VTs have recently gained popularity given the benefits of bringing physical spaces into a 360-degree experience that can be explored in a simple and intuitive manner. Due to Covid-19, many organisations wish to utilise immersive 360 technologies, but many cannot afford them. In this paper, a 360-degree VT pipeline is proposed to allow organisations to develop a VT that can compete in functionality with other sophisticated VTs. Users with minimal coding experience are able to develop a low-cost web-based VT using a 360 camera, the open-source tool Marzipano, the developed software framework and the documentation from the GitHub repository. The contribution of this work is both the software framework, which complies with the Web Content Accessibility Guidelines (WCAG), for use with the Marzipano tool, and a university case study along with a user evaluation to demonstrate the effectiveness of the approach. The usability evaluation run with the stakeholders demonstrates the acceptance of this 360 experience in allowing new students to get a 360-degree view of the GCU Glasgow Campus.