Research Colloquium Data and Knowledge Engineering

As part of this research colloquium, current research work in the field of data and knowledge engineering (DKE) is presented. The colloquium usually takes place on Thursdays from 1:00 pm in room G29-301 or G29-130.
Please address questions about the colloquium to Andreas Nürnberger.


Current lectures:

Dec 14, 2022 (10:00 am cet in room G29-130)
Potentials and Limitations of observational population-based studies
Dr. Till Ittermann (Head of the Statistical Method Unit, Institute for Community Medicine, University Medicine Greifswald)

There is a broad range of medical research questions that can be addressed by population-based studies, including the description of the prevalence and incidence of diseases and risk factors, the definition of reference intervals for clinical biomarkers, the investigation of associations between potential (genetic) risk factors and diseases, the calculation and validation of prediction models for certain diseases, and data mining analyses. Limitations that have to be taken into account derive from selection bias, confounding bias, and information bias. This talk will give a summary of the potentials and limitations of population-based studies using examples from the Study of Health in Pomerania.


Past Lectures: 

Oct 21, 2019 (3:00 pm s.t. in room G29-301)
Headline/Summary Automated Evaluation - Challenges, SotA and HEvAS System
Dr. Marina Litvak (Sami Shamoon College of Engineering, Beer Sheva, Israel) 

Automatic headline generation is a sub-task of one-line summarization with many reported applications. Evaluation of systems generating headlines is a very challenging and undeveloped area. In this talk, I will introduce multiple metrics for automatic evaluation of systems in terms of the quality of the generated headlines. The metrics measure the headlines' quality both from the informativeness and the readability perspectives, where informativeness is evaluated at the lexical and semantic levels.


March 14, 2019 (13:00 s.t. in room G29-130)
Interpretable feature learning and classification: from time series feature tweaking to temporal abstractions in medical records
Prof. Panagiotis Papapetrou (Faculty of Social Sciences, Stockholm University)

The first part of the talk will tackle the issue of interpretability and explainability of opaque machine learning models, with a focus on time series classification. Time series classification has received great attention over the past decade, with a wide range of methods focusing on predictive performance by exploiting various types of temporal features. Nonetheless, little emphasis has been placed on interpretability and explainability. This talk will formulate the novel problem of explainable time series tweaking, where, given a time series and an opaque classifier that provides a particular classification decision for the time series, the objective is to find the minimum number of changes to be performed to the given time series so that the classifier changes its decision to another class. Moreover, it will be shown that the problem is NP-hard. Two instantiations of the problem will be presented. The second part of the talk will focus on temporal predictive models and methods for learning from sparse Electronic Health Records. The main application area is the detection of adverse drug events by exploiting temporal features and applying different levels of abstraction, without compromising predictive performance in terms of AUC.

Jan. 24, 2019 (1:00 pm s.t. in room G29-301)
Data driven innovation - research challenges and opportunities
Prof. Dr. Barbara Dinter (Business Informatics, Chemnitz University of Technology)

Modern big data & analytics technologies and methods lead to manifold opportunities for innovative use cases and business models. Although organizations have started to establish appropriate technical and organizational infrastructures (e.g., big data labs), they still need advice on how best to benefit from such investments, in particular if the big data activities should result not only in the optimization of existing applications and processes but also in true data-driven innovation. The talk will provide an overview of how the fields of big data & analytics and of innovation management converge, resulting in many challenging research questions. Following a framework with origins in the Service Dominant Logic, the potential mutual usage and impact of both fields will be presented. The role of open data and of open innovation for data-driven innovation will be illustrated by a research project in the field of open innovation for e-mobility. In addition, recent research on how to teach data-driven innovation will be presented.

Jan 18, 2019 (10:00 am s.t. in room G29-301)
From Ontology Development as Craft towards Ontology Engineering
Dr. Fabian Neuhaus (Institute for Intelligent Cooperating Systems, FIN, OVGU)

Ontologies have been in successful use for at least 20 years. Nevertheless, the development of ontologies is still a cumbersome and expensive process. In my presentation I will address three challenges for ontology developers: (1) It is difficult to reuse ontologies and adapt them for new purposes. (2) A plethora of representation languages leads to difficult choices and interoperability issues. (3) Ontology developers rarely evaluate their ontologies against requirements during development. These challenges are addressed by the Distributed Ontology, Modelling, and Specification Language (DOL), which is developed and implemented at OVGU and became an international standard of the Object Management Group (OMG) in 2018.

Jan 07, 2019 (3:00 pm s.t. in room G29-301)
How to Break an API: How Community Values Influence Practices
Prof. Dr. Christian Kästner (Carnegie Mellon University, Institute for Software Research)

Breaking the API of a package can create severe disruptions downstream, but package maintainers have flexibility in whether and how to perform a change. Through interviews and a survey, we found that developers within a community or platform often share cohesive practices (e.g., semver, backporting, synchronized releases), but that those practices differ from community to community, and that most developers are not aware of alternative strategies and practices, their tradeoffs, and why other communities adopt them. Most interestingly, practices and community consensus often seem to be driven by implicit values in each community, such as stability, rapid access, or ease of contribution. Understanding and discussing values openly can help to understand and resolve conflicts,

Nov. 15, 2018 (1:00 p.m. s.t. in room G22, 2nd floor, Faculty Center FWW)
Social Media Analytics - New Potentials and Challenges for Research and Practice
Prof. Dr. Stefan Stieglitz (Univ. Duisburg-Essen, Communication in Electronic Media / Social Media)

Researchers as well as companies collect and analyze social media communication for various reasons, e.g., to understand general patterns of interaction, but also to identify potential customers or to offer new services. A variety of methods are used to structure and visualize these heterogeneous data. By conducting a systematic literature review, we identified the major challenges in the context of social media analytics. Based on two case studies (one on crisis communication in social media and one on social bots), it will be highlighted why the 'dynamics of communication' and the 'quality of data' need to be carefully considered for meaningful analyses of social media communication.

The colloquium takes place in cooperation with the Faculty of Economics and Management (FWW).

Note: The slides will be in English, the presentation itself will be given in German.

02 Oct 2018 (3:00 p.m. s.t. in room G29-035, SwarmLab)
Development of evolutionary computation methods for multi-objective design optimization and decision-making
Prof. Dr. Hemant Singh (The University of New South Wales, Canberra, Australia)

Simultaneous optimization of multiple conflicting criteria is a problem commonly encountered in several disciplines, such as engineering, operations research and finance. The solution to such problems consists of not one but a set of best trade-off designs in the objective space, known as the Pareto Optimal Front (POF). Metaheuristics such as evolutionary algorithms (EAs) are commonly used to solve these problems owing to several advantages, including parallelizability, the global nature of the search and the ability to deal with highly non-linear/black-box functions. However, in their native form, EAs require large numbers of function evaluations to deliver good results, which becomes prohibitive if each design evaluation is done using a computationally expensive experiment (such as Finite Element Analysis, Computational Fluid Dynamics, etc.). This has motivated a number of past and ongoing studies towards developing strategies for reducing the number of design evaluations during the search. This talk discusses some of the recent efforts undertaken by the speaker with his research group in overcoming this challenge using spatially distributed surrogates and decomposition-based methods. Thereafter, mechanisms to support informed decision making (i.e., selecting the solutions or regions of interest from the POF) will also be discussed. A brief snapshot of some practical applications will also be presented.
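
As an illustration of the Pareto Optimal Front mentioned in the abstract, the following is a minimal sketch and not the speaker's method. It assumes that all objectives are to be minimized; the function names and the sample designs are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if design a dominates design b (all objectives minimized).

    a dominates b when a is no worse in every objective and strictly
    better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Return the non-dominated points, keeping their input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical designs with two objectives, e.g. (cost, weight).
designs = [(1.0, 9.0), (2.0, 7.0), (3.0, 8.0), (4.0, 3.0), (5.0, 4.0)]
front = pareto_front(designs)
print(front)
```

This brute-force filter compares every pair of designs, so it needs a number of comparisons quadratic in the number of designs. It only illustrates the definition of the front; it does nothing to reduce the number of expensive design evaluations, which is the actual topic of the talk.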

10 Sep 2018 (2:00 p.m. s.t. in room G29-301)
Modeling Attention for Post-Desktop User Interfaces
Dr. Felix Putze (Senior Researcher @ Cognitive Systems Lab, University of Bremen)

In recent years, many “post-desktop” user interfaces have emerged, for example the already omnipresent smartphones and smartwatches, but also interfaces for Virtual and Augmented Reality. This paradigm shift results in a trend towards mobile and concurrent use of technology, with frequent side effects such as distraction and information overload. By employing biosignal-based user modeling, we can provide information sources to detect and respond to such effects. In this talk, I will focus on different biosignal-based models of attention as one of the central user states, for example to manage the amount and type of information presented as well as to understand a user's implicitly communicated intent. I will show the results of multiple studies in which we monitor brain activity, eye gaze,

7 Sep 2018 (10:00 am s.t. in room G29-301)
Home Caring Robot and Its Key Technologies
Prof. Dr. Hon Chi Tin (Macau University of Science and Technology)

Robots have wide applications in elderly care scenarios, from lifting robots and social robots to companion robots. Behind an individual robot that accompanies an elderly person there are several key technologies, namely obstacle avoidance, behavioral pattern detection, fall detection, natural language processing, remote diagnosis and the like. The research team from Macau University of Science and Technology has developed a robot, Singou Butler, with the above key technologies. The talk will take Singou Butler as an example and discuss these technologies one by one.

Feb 15, 2018 (08:30 s.t. in room G29-301)
Evolution of machine learning - the way from neural networks to deep learning
Prof. Dr. Ali Reza Samanpour (South Westphalia University of Applied Sciences, Department of Engineering and Economics)

The history of Artificial Intelligence suggests that there has been a gradual and evolutionary development of a specific part of computational science underlying machine learning technologies that has not been defined by this perception/conception.

The bulk of these technologies consisted of the methods defined by what is known as computational intelligence, which includes neural networks, evolutionary algorithms, and fuzzy systems. As more data mining topics have emerged, driven by rapidly growing data volumes (Big Data) and combined with the related challenges of the Internet of Things (IoT), one can observe that the economy is changing accordingly. Nowadays you can find a number of vendors offering machine learning frameworks. Some of them enable the use of machine learning tools in the cloud. This possibility is offered mainly by big players such as Microsoft Azure ML, Amazon Machine Learning, IBM Bluemix and Google Prediction API, to name just a few.

Machine learning algorithms extract complex, high-level abstractions as data representations through a hierarchical learning process. Based on relatively simple abstractions formulated at the previous level in the hierarchy, complex abstractions are learned at a given level. Deep learning is a sub-area of machine learning, but could also be described as a further development of the classic artificial neural networks. While traditional machine learning algorithms rely on fixed sets of models for detection and classification, deep learning algorithms independently evolve, guide, or create their own new model layers within the neural networks. These layers do not have to be developed and implemented manually again and again for new circumstances, as would be the case with classic machine learning algorithms. The advantage of deep learning lies in the analysis and learning of large amounts of data. This makes it a valuable tool for data analytics in the context of raw data that is largely unlabeled and uncategorized.

In other words, how can computers be made to do what needs to be done without being told how it should be done?

Feb. 8, 2018 (12:00 p.m. s.t. in room G29-301)
SAP Health: Applications and Analytics
Dr.-Ing. Matthias Steinbrecher (SAP, Potsdam)

This talk will cover cohort analysis applications and projects of the SAP Health organization. Cohort analysis is about finding and analyzing patient groups for research or therapy. The use cases will cover existing products like SAP Medical Research Insights, upcoming releases like SAP Health for Clinical Quality as well as research topics around pattern visualization in medical records.

November 9, 2017 (1:00 p.m. s.t. in room G29-301)
Cohort analysis made visual: on explorative methods for medical research
Dr.-Ing. Thorsten May (Fraunhofer IGD, Darmstadt)

My talk will focus on the present, future, and past of medical visual analytics research at Fraunhofer IGD (in roughly that order). I will present two current examples from our projects on patient cohort analysis. Cohort analysis aims at defining subsets of patients that are comparable by virtue of properties that are relevant for prevention, diagnosis, or therapy. Visual Analytics research for cohort analysis aims at making this process visible and navigable for the medical researchers. Ideally, the visual cohort analysis enables the physician to embed her own knowledge into the cohort definition. We expect future research to extend the basis for cohort analysis beyond clinical, demographic, and follow-up data. Imaging-based approaches (MRI, CT, U/S, ... ) represent rich input that can be used for a more comprehensive analysis of the patients' situation. My talk will outline a number of challenges that remain to be solved. Our research line evolved from research on general multivariate visual data analysis and time-series analysis that started some 12 years ago. Hence, this talk concludes with the “tale of two arrows”, briefly reflecting on struggles to understand and explain what visual analytics actually is, beyond Keim's process model (with the arrows), and to structure our own lectures according to this understanding.

July 03, 2017 (1:15 pm s.t. in room G29-301)
Theory and Practice of Big Data Analytics for Railway Transportation Systems
Assoc. Prof. Luca Oneto (University of Genoa, Italy)

Big Data Analytics is one of the current trending research interests in many industrial sectors and in particular in the context of railway transportation systems. Indeed, many aspects of the railway world can greatly benefit from new technologies and methodologies able to collect, store, process, analyze and visualize large amounts of data, as well as from new methodologies coming from machine learning, artificial intelligence, and computational intelligence to analyze that data in order to extract actionable information. The EC H2020 In2Rail project is the perfect example of an initiative made to bring big data technologies into the railway world. The purpose of this talk is to show how theory and practice must be exploited together in order to solve real big data analytics problems in the field of railway transportation systems. In particular, we will focus on one of the problems that we are facing in the In2Rail project: predicting train delays in the Italian railway network by exploiting both data coming from Rete Ferroviaria Italiana and exogenous data sources. For this purpose, we will make use of the most recent advances in the analytics field of research: from deep learning to the thresholdout model selection framework.

May 12, 2017 (10:00 am s.t. in room G29-301)
Big Data Visualization: Graphics quality factors
Prof. Dr. Juan J. Cuadrado Gallego (Universidad de Alcalá, Spain)

Nowadays Big Data is used in almost all fields of human knowledge. The main goal of Big Data is to analyze big databases to find useful information that expands the knowledge in the field to which it is applied. In addition, the reason to gain knowledge is to share it. It is at these two points that data visualization plays a bigger role each day. Data visualization can help to analyze the data faster and to share the acquired knowledge more easily. For these reasons, many new data graphics are used and published every day. But do all of them serve the purposes for which they are used? That is, do all of them allow an easier and faster analysis of the databases and an easier and faster transmission of the information and knowledge obtained from the analysis of big databases? The answer is no. The reason is that it is not enough to use graphics to improve big data analysis. The user must know when and how to use data visualization. It is not enough to know how a graphic must be developed; one must also know which design aspects must be applied to make a graphic useful. This talk introduces the quality aspects that must be applied to obtain not just data visualization but higher-quality data visualization.

May 11, 2017 (11:00 am s.t. in room G29-301)
Three Algorithms Inspired by Data from the Life Sciences
Dr. Allan Tucker (Brunel University London)

In this talk I will discuss how the analysis of real-world data from health and the environment can shape novel algorithms. Firstly, I will discuss some of our work on modelling clinical data. In particular I will discuss the collection of longitudinal data and how this creates challenges for diagnosis and the modelling of disease progression. I will then discuss how cross-sectional studies offer additional useful information that can be used to model disease diversity within a population but lack valuable temporal information. Finally, I will discuss the importance of inferring models that generalise well to new independent data and how this can sometimes lead to new challenges, where the same variables can represent subtly different phenomena. Some examples in ecology and genomics will be described.

May 10, 2017 (5:00 pm s.t. in room G29-301)
Multiobjective Clustering
Prof. Dr. Sanghamitra Bandyopadhyay (Indian Statistical Institute, Kolkata)

When the only data that is available is unlabelled, clustering is one of the primary operations applied. The objective is to group those data points that are similar to each other, while clearly separating dissimilar groups from each other. In clustering, usually some similarity/dissimilarity metric is optimized such that a pre-defined objective attains its optimal value. The problem of clustering is therefore essentially one of optimization. The use of metaheuristic methods like genetic algorithms has been demonstrated successfully in the past for clustering a data set. The clustering problem inherently admits a number of criteria or cluster validity indices that have to be simultaneously optimized for obtaining improved results. Hence in recent times the problem has been posed in a multiobjective optimization (MOO) framework and popular metaheuristics for multiobjective optimization have been applied. In this talk, we will first briefly discuss the fuzzy c-means algorithm, followed by an introduction to the basic principles of MOO and the popular NSGA-II algorithm. Subsequently it will be shown how the algorithm is useful for solving the clustering problem. Since such algorithms provide a number of solutions, a way of combining the multiple clustering solutions so obtained into a single one using supervised learning will be explained. Finally, results will be demonstrated on clustering of some popular gene expression data sets.

Jan 19, 2017 (1:00 pm s.t. in room G29-301)
Random Shapelet Forests for time series classification
Prof. Panagiotis Papapetrou (Stockholm University)

In this talk I will present a novel technique for time series classification called random shapelet forest. Shapelets are discriminative subsequences of time series, usually embedded in shapelet-based decision trees. The enumeration of time series shapelets is, however, computationally costly, which in addition to the inherent difficulty of the decision tree learning algorithm to effectively handle high-dimensional data, severely limits the applicability of shapelet-based decision tree learning from large (multivariate) time series databases.

In the first part of the talk I will discuss a novel tree-based ensemble method for univariate and multivariate time series classification using shapelets, called the generalized random shapelet forest algorithm. The algorithm generates a set of shapelet-based decision trees, where both the choice of instances used for building a tree and the choice of shapelets are randomized. For univariate time series, it is demonstrated through an extensive empirical investigation that the proposed algorithm yields predictive performance comparable to the current state-of-the-art and significantly outperforms several alternative algorithms, while being at least an order of magnitude faster. Similarly for multivariate time series, it is shown that the algorithm is significantly less computationally costly and more accurate than the current state-of-the-art.

The second part of the talk will focus on early classification of time series. I will present a novel technique that extends the random shapelet forest to allow for early classification of time series. An extensive empirical investigation has shown that the proposed algorithm is superior to alternative state-of-the-art approaches, in case predictive performance is considered to be more important than earliness. The algorithm allows for tuning the trade-off between accuracy and earliness, thereby supporting the generation of early classifiers that can be dynamically adapted to specific needs at low computational cost.

Dec 15, 2016 (1:00 pm s.t. in room G29-301)
Handling Time-Series Data with Visual Analytics: Challenges and Examples
Dr. Theresia Gschwandtner (TU Wien)

Due to the ever-growing amounts of available data, we need effective ways to make these often complex and heterogeneous data accessible and analyzable. The aim of Visual Analytics (VA) is to support this information discovery process by combining humans’ outstanding capabilities of visual perception with the computational power of computers. By combining interactive visualizations for the visual exploration of trends, patterns and relationships with automatic methods, such as machine learning and data mining, VA enables knowledge discovery in large and complex bodies of data. The design of such VA solutions, however, requires careful consideration of the data and tasks at hand, as well as the knowledge and capabilities of the user who is going to work with the solution. Dealing with time-oriented data makes this task even more complex, as time is an exceptional data dimension with special characteristics. In my talk, I will illustrate different aspects and characteristics of time-oriented data and how we tackled these problems in previous work with respect to data, users, and tasks. I will give several examples of VA solutions developed in our group and I will put a particular focus on examples from the healthcare domain.

July 07, 2016 (3:00 pm s.t. in room G29-301)
Active Learning for Classification Problems Using Structure Information
Dr. rer. nat. Tobias Reitmaier (University of Kassel)

Nowadays, media, commercial, and personal content is increasingly consumed, exchanged, and thus stored in the digital world. IT companies increasingly try to exploit these data commercially by means of data mining or machine learning methods, which usually involves a time-consuming and costly categorization or classification of the data. An efficient approach to reducing these costs is active learning (AL), since AL steers the training process of a classifier by selectively querying individual data points, which are then labeled with a class by experts. However, an analysis of current methods shows that AL still has deficits. In particular, structure information, which is given by the spatial arrangement of the labeled and unlabeled data, is insufficiently used. In addition, many existing AL techniques pay too little attention to their practical applicability. To meet these challenges, this talk presents several solution approaches that build on one another: First, the structure of the data is captured with probabilistic generative models, and the self-adaptive, (almost) parameter-free selection strategy 4DS (Distance-Density-Distribution-Diversity Sampling) is developed, which uses structure information for sample selection. Subsequently, the AL process is extended by a transductive learning process in order to iteratively refine the data modeling during learning on the basis of the class information that becomes known. Building on this, the new data-dependent kernel RWM (Responsibility Weighted Mahalanobis) is defined for the AL training of a support vector machine (SVM).

July 01, 2016 (1:00 pm s.t. in room G29-301)
Together Against Crime: How the Combination of Data Mining and Game Theory Can Help Fight Crime
Prof. Richard Weber (Department of Industrial Engineering, Universidad de Chile)

Data mining methods have been used successfully for many years to detect crime patterns. Applications exist, for example, in fraud detection, the prediction of crime in public spaces, and cybercrime. Usually, data describing the crime are analyzed. In many cases, however, the interaction between criminals and those responsible for security is not adequately taken into account. In this talk we present a hybrid model for the classification of crime patterns that explicitly models this interaction. Using the identification of phishing emails as an example, a game between "attacker" and "defender" is described that provides input information for a binary classifier based on support vector machines. Using an extensive data set, we show the advantages of the described hybrid model. Numerous starting points for further work indicate the potential for future applied research.

May 25, 2016 (2:15 pm in room 301)
Metro Maps: Straight-line, Curved, and Concentric
Prof. Dr. Alexander Wolff (University of Würzburg)

The first schematic metro maps appeared in the 1930s, when the networks became too big to be readable in a geographically correct layout. Only 70 years later, computer scientists started to investigate ways to automate the drawing of metro maps. In my talk, I will present a few of these approaches.

Apr 14, 2016 (1:00 pm in room 301)
Space, Time, and Visual Analytics
Prof. Natalia Andrienko, Prof. Gennady Andrienko (Fraunhofer IAIS and City University London)

Visual analytics aims to combine the strengths of human and computer data processing. Visualization, whereby humans and computers cooperate through graphics, is the means through which this is achieved. Sophisticated synergies are required for analyzing spatio-temporal data and solving spatio-temporal problems. It is necessary to take into account the specifics of geographic space, time, and spatio-temporal data. While a wide variety of methods and tools are available, it is still hard to find guidelines for considering a data set systematically from multiple perspectives. To fill this gap, we systematically consider the structure of spatio-temporal data and possible transformations, and demonstrate several workflows of comprehensive analysis of different data sets, paying special attention to the investigation of data properties. We shall show several workflows of analysis of real data sets on human mobility, city traffic, animal movement, and football. We finish the talk by outlining directions for future research, including semantic level analysis and big data.

Jan 21, 2016 (1:15 pm in room 301)
Learning Shortest Paths for Text Summarisation
Prof. Dr. Ulf Brefeld (Leuphana University Lüneburg)

We cast multi-sentence compression as a structured prediction problem. Related sentences are represented by a word graph such that every path in the graph is considered a (more or less meaningful) summary of the collection. We propose to adapt shortest path algorithms to the data at hand so that the shortest path realises the best possible summary. We report on empirical results and compare our approach to state-of-the-art baselines using word graphs. The proposed technique can be applied to a great variety of objectives that are traditionally solved by dynamic programming. I’ll conclude with a short discussion of learning knapsack-like problems using the same framework.

Dec 10, 2015 (1:00 pm in room G26.1-010)
Trajectories Through the Disease Process: Cross Sectional and Longitudinal Data Analysis
Dr. Allan Tucker (Brunel University London)

Degenerative diseases such as cancer, Parkinson’s disease, and glaucoma are characterised by a continuing deterioration of organs or tissues over time. This monotonic increase in the severity of symptoms is not always straightforward, however. The rate can vary in a single patient during the course of their disease, so that sometimes rapid deterioration is observed and at other times the symptoms of the sufferer may stabilise (or even improve, for example when medication is used). A characteristic of many degenerative diseases is, however, a general transition from healthy to early onset to advanced stages. Clinical trials are typically conducted over a population within a defined time period in order to illuminate certain characteristics of a health issue or disease process. These cross-sectional studies provide a snapshot of these disease processes over a large number of people but do not allow us to model the temporal nature of disease, which is essential for modelling detailed prognostic predictions. Longitudinal studies, on the other hand, are used to explore how these processes develop over time in a number of people but can be expensive and time-consuming, and many studies only cover a relatively small window within the disease process. This talk explores the application of intelligent data analysis techniques for building reliable models of disease progression from both cross-sectional and longitudinal studies. The aim is to learn disease 'trajectories' from cross-sectional data, integrating longitudinal data and taking into account the sometimes non-stationary nature of the disease process.

Nov 19, 2015 (1:15 pm in room G29-301)
Knowledge based Tax Fraud Fighting
Prof. Dr. Hans-Joachim Lenz (Freie Universität Berlin)

Tax fraud is a criminal activity committed by a manager of a firm or at least one taxpayer who intentionally manipulates tax data to deprive the tax authorities or the government of money for his own benefit. Tax fraud is a kind of data fraud, and happens all the time and everywhere in daily life. Data fraud is extensionally characterized by four fields: spy-out, data plagiarism, manipulation and fabrication. Tax fraud investigations can be embedded into the methodology of knowledge-based reasoning. One way is to apply case-based reasoning, where similar stored cases are retrieved and their information re-used. Alternatively, we put the focus on Bayesian learning theory as a stepwise procedure integrating prior information, facts from first (and follow-up) investigations and partial or background information. There is and will be no omnibus test available to detect the underlying manipulations of (even double-entry) bookkeeping data in business with high precision. However, a bundle of techniques exists to give hints of tax fraud, such as probability distribution analysis methods, the application of Benford’s Law, the analysis of inliers and outliers, and tests of conformity between data and Business Key Indicator systems. Finally, investigators may be hopeful in the long run because fraudsters will never be able to construct a perfectly manipulated world of figures, cf. F. Wehrheim (2011).
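
Of the techniques listed in the abstract, the application of Benford's Law is easy to illustrate. The following is a minimal sketch under stated assumptions and not the speaker's procedure: it compares observed leading-digit counts with the Benford probabilities log10(1 + 1/d) using a plain chi-square statistic. The function names and the sample data are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def benford_expected(d: int) -> float:
    """Benford probability that the leading digit equals d (1..9)."""
    return math.log10(1 + 1 / d)


def first_digit(x: float) -> int:
    """Leading significant digit of a non-zero number."""
    # Scientific notation puts the leading digit first, e.g. '3.45...e-03'.
    return int(f"{abs(x):.15e}"[0])


def chi_square_statistic(values: Iterable[float]) -> float:
    """Chi-square distance between observed leading digits and Benford's Law."""
    digits = [first_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one non-zero value")
    n = len(digits)
    counts = Counter(digits)
    return sum(
        (counts.get(d, 0) - n * benford_expected(d)) ** 2 / (n * benford_expected(d))
        for d in range(1, 10)
    )


# Hypothetical data: powers of two are known to follow Benford's Law closely,
# whereas uniformly distributed leading digits do not.
benford_like = [float(2 ** k) for k in range(100)]
uniform_like = [float(d) for d in range(1, 10)] * 100

print(chi_square_statistic(benford_like))
print(chi_square_statistic(uniform_like))
```

With 8 degrees of freedom, the 5% critical value of the chi-square distribution is about 15.51. A statistic above that value is only a hint that the figures deserve a closer look, not evidence of fraud, in line with the abstract's remark that no omnibus test exists.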

Nov 12, 2015 (1:15 pm in room G29-301)
Ethical challenge for a mobility service provider when dealing with customer data in the digital age
Karl Partle (Head of the Institute for Computer Science of the Volkswagen Group)

As a branch of philosophy, information ethics is a field of ethics that examines moral issues in dealing with digitally available information in information and communication technologies. Challenges arise from the combination of ideas, some of which are older than 5000 years, with technologies that are less than 50 years old. Four theses:

  • It is not ethics as a philosophical discipline that is being changed, but ethical guidelines are to be reinterpreted in the course of digitalization.
  • All data tends to be available at any time and any place in a defined quality in real time (up-to-date).
  • The distinction between the real world (being) and the digital world (fictitious image of the real) is becoming increasingly blurred.
  • The secrecy of one's own data becomes arbitrarily difficult; however, this also applies to states, secret services and companies.

In view of this, questions arise about social responsibility and the use of consciously drawn moral barriers as a benchmark and measure of ethical foundations in the digital age. This is to be examined using the example of the challenges faced by a mobility service provider when dealing with customer data.

Last modified: Nov 30, 2022