This is the Newsletter of the Centre for Data Science and Systems Complexity (DSSC) at the University of Groningen.

Centre for Data Science and Systems Complexity (DSSC), University of Groningen

DSSC Newsletter

MARCH 2017

DSSC awarded COFUND grant
DSSC research profile: Dr. Marco Grzegorczyk (JBI)
PhD profile: MSc Henk van Waarde
The course Introduction to Data Science: look ahead by looking back
Kick-Off Meeting Fundamentals of the Universe

Horizon 2020
The Dutch Data Science Awards

DSSC thematic meeting: "Progress of DSSC PhD projects" (27 March, 13.30 - 15.30, 5161.165)
2nd Annual Wind Power Big Data & IoT Forum (30-31 March, Amsterdam)
Digital Innovation Forum 2017 (10-11 May, Amsterdam)
BeNeLearn 2017, The annual machine learning conference of Belgium and The Netherlands (9-10 June, Eindhoven University of Technology)
International Supercomputing Conference (18-22 June, Frankfurt)
TERATEC 2017 Forum: The international meeting for Simulation and High Performance Computing (27-28 June, Ecole Polytechnique)

DSSC awarded COFUND grant

The DSSC is pleased to announce that it has been awarded a Marie Skłodowska-Curie COFUND grant. COFUND actions are awarded to regional, national and international programmes for research training and career development that involve international mobility. With this grant, the DSSC programme will recruit ten early-stage researchers for four-year PhD positions in the areas of Adaptive Models & Big Data, Complex Systems & Engineering, and Advanced Instrumentation & Big Data.

The 10 new PhD candidates will join the group of 70+ researchers who collaborate at the DSSC to develop new methods for addressing the challenges that arise from the combination of Big Data and Systems Complexity.

The research projects and the selection procedure for the 10 PhD positions will be discussed at the next DSSC thematic meeting (27 March, 13.30 - 15.30, 5161.165). The agenda also includes research updates by current DSSC PhD students (MSc Oscar Portoles Marin and MSc Victor Arturo Bernal) and a presentation on CIT Data Scientists in research (by MSc Jonas Bulthuis and MSc Herbert Kruitbosch).

DSSC research profile: Dr. Marco Grzegorczyk (JBI)

In the topical field of systems biology, there is considerable interest in learning networks, such as gene regulatory networks and protein activation pathways, from post-genomic data.

Dynamic Bayesian networks (DBNs) are a flexible modelling tool that can be used for this purpose. However, the conventional DBN model known from textbooks assumes that the underlying regulatory processes are homogeneous, i.e. that they do not change over time. This assumption is unrealistic for many biological applications and can lead to biased results and erroneous conclusions.

Over the last few years, I have therefore been developing a variety of novel non-homogeneous DBN (NH-DBN) models. These models combine DBNs with, e.g., change point models, mixture models or hidden Markov models, so that non-homogeneous processes can also be adequately approximated. However, since the available data sets are usually sparse (e.g. short gene expression time series), these advanced NH-DBN models tend to be over-flexible and hence prone to overfitting the data. In my models, I use hierarchical Bayesian modelling approaches and concepts to regularize this flexibility.

Within the DSSC framework, I have already started a fruitful collaboration with various biological research groups from ERIBA. Together we supervise a joint DSSC PhD project whose goal is to analyse various data sets collected at ERIBA and to develop a general pipeline for analysing post-genomic data sets with Bayesian networks. Our PhD student, Victor Bernal, started his PhD project in August 2016.
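To make the difference between homogeneous and non-homogeneous DBNs concrete, here is a deliberately simplified sketch in Python. It scores a single regulatory edge (a parent gene at time t-1 driving a target gene at time t) under one homogeneous linear model and under models with a single change point, and keeps a change point only if it improves a BIC score. This is an illustrative assumption, not the method described in the profile: it uses least squares and BIC instead of hierarchical Bayesian inference, handles only one parent and at most one change point, and all function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, residual sum of squares)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, rss


def bic(rss, n, k):
    """Gaussian BIC up to an additive constant (lower is better); the floor avoids log(0)."""
    return n * math.log(max(rss, 1e-12) / n) + k * math.log(n)


def best_change_point(parent, target, min_seg=3):
    """Return the best single change point, or None if the homogeneous model wins.

    The edge is modelled as target[t] = a + b * parent[t-1] (a one-parent DBN edge).
    The returned index refers to the aligned series: the second segment starts
    at target[tau + 1].
    """
    xs, ys = parent[:-1], target[1:]  # align parent at t-1 with target at t
    n = len(ys)
    _, _, rss0 = fit_line(xs, ys)
    best_score, best_tau = bic(rss0, n, 2), None  # homogeneous: intercept + slope
    for tau in range(min_seg, n - min_seg + 1):
        # Separate regression coefficients before and after the change point.
        _, _, rss1 = fit_line(xs[:tau], ys[:tau])
        _, _, rss2 = fit_line(xs[tau:], ys[tau:])
        score = bic(rss1 + rss2, n, 5)  # two lines + change point location
        if score < best_score:
            best_score, best_tau = score, tau
    return best_tau
```

The BIC penalty is only a crude stand-in for the regularization that, according to the profile, is achieved with hierarchical Bayesian modelling: with short time series, a change point is accepted only when it explains substantially more of the data than the homogeneous model.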

PhD profile: MSc Henk van Waarde 

Henk van Waarde is a PhD candidate within the DSSC. He is supervised by Dr. Pietro Tesi (ENTEG) and Prof. Dr. Kanat Camlibel (JBI) in the project Complex Dynamical Networks: From Data to Connectivity Structure. The project concerns networks of dynamical systems, that is, collections of dynamical systems that communicate and/or interact with each other. Often, the connectivity structure of such networks is not known exactly, even though it is a key factor in determining the overall behavior of the network. The project therefore aims to develop new system-theoretic methods and algorithms to infer the connectivity structure of a network of dynamical systems from data observed on the network.

Van Waarde studied Applied Mathematics (Systems, Control and Optimization) at the University of Groningen. His research interests lie in systems and control theory, with an emphasis on networks of dynamical systems. He has previously worked on new graph-theoretic methods for the controllability analysis of networked dynamical systems, as well as on fault detection and isolation in water distribution networks.
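The idea of inferring connectivity from data can be illustrated with a minimal Python sketch, under strong simplifying assumptions that are ours rather than the project's: the network is linear and discrete-time, x(k+1) = A x(k), all node states are measured without noise, and the measured states span the whole state space. Under these assumptions, A can be estimated by least squares from pairs of consecutive states, and a nonzero off-diagonal entry A[i][j] is read as a directed link from node j to node i. All function names are hypothetical.

```python
def mat_vec(A, x):
    """Return the matrix-vector product A x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def solve(M, v):
    """Solve M a = v for a square, non-singular M (Gaussian elimination, partial pivoting)."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    a = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(aug[i][j] * a[j] for j in range(i + 1, n))
        a[i] = (aug[i][n] - s) / aug[i][i]
    return a


def identify_network(past, future):
    """Least-squares estimate of A from measured pairs x(k) -> x(k+1) = A x(k).

    Row i of A solves the normal equations
    (sum_k x(k) x(k)^T) a_i = sum_k x(k) x_i(k+1).
    Assumes the states in `past` span the whole state space.
    """
    n = len(past[0])
    gram = [[sum(x[i] * x[j] for x in past) for j in range(n)] for i in range(n)]
    A = []
    for i in range(n):
        rhs = [sum(xp[j] * xf[i] for xp, xf in zip(past, future)) for j in range(n)]
        A.append(solve(gram, rhs))
    return A


def edges(A, tol=1e-6):
    """Directed links j -> i for every off-diagonal entry with |A[i][j]| > tol."""
    n = len(A)
    return {(j, i) for i in range(n) for j in range(n) if i != j and abs(A[i][j]) > tol}
```

For example, measuring the one-step response of a three-node network from a few initial states and calling `edges(identify_network(past, future))` returns the inferred links. With noisy data or partially measured nodes, a fixed threshold like `tol` is no longer adequate, which is where more careful system-theoretic methods come in.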

The course Introduction to Data Science: look ahead by looking back

The course Introduction to Data Science was launched in November 2016 and was shared by the Data Science specialization of the Astronomy Master’s Programme and the Data Science and Systems Complexity specialization of the Computing Science Programme. The course brought together students with various backgrounds: Mathematics (9), Computer Science (36) and other disciplines, including Astronomy, Molecular Biology and Biotechnology, and Industrial Engineering & Management. Several of the students were exchange students.

The instructors, DSSC Pioneers Dr. Kerstin Bunte and Dr. Mircea Lungu, share their approach to the course:

“To simulate realistic data science situations, we assigned students to heterogeneous groups, making sure that each group had an interdisciplinary background. The assignments were designed so that students with different backgrounds would have an advantage in different parts of the practicals. The most successful groups eventually developed a team environment in which members shared their knowledge by teaching each other the aspects they had more experience with.

The students had the chance to practice a variety of skills that a data scientist must be familiar with: collaborative project development, including the use of the distributed version control system git; reading, summarizing and presenting relevant state-of-the-art research; and following the steps of the data analysis life cycle (identifying data sources, collecting and combining various sources, cleaning up, exploring and analyzing the data, and finally discussing the outcomes of their analysis).

To make the practicals attractive, we avoided toy examples and focused on offering the students a variety of realistic data science problems spanning various domains: descriptive analysis and visualization applied to data from open APIs; music and movie recommendations based on association rule mining, applied to hundreds of thousands of user preferences scraped from online services such as OMDB; clustering of astronomical data representing simulated galaxy evolution with several hundred thousand stars; text analysis and text summarization applied to literary texts; and classification applied to mixed marketing data.

We are already looking forward to the next iterations of the course, whose interdisciplinary character we expect to be reinforced, as other Master's programmes have already expressed their intention to recommend Introduction to Data Science to their students.”

Kick-Off Meeting Fundamentals of the Universe

On Thursday 19 January, the kick-off meeting of the Research Priority Fundamentals of the Universe took place in the Van Swinderen Huys. This Research Priority is an initiative of the Kapteyn Institute, the Van Swinderen Institute, KVI-CART and the Johann Bernoulli Institute, and lies at the heart of exemplary route 5, Bouwstenen van Materie en Fundamenten van Ruimte en Tijd (Building blocks of matter and Fundamentals of space and time), of the Dutch Nationale Wetenschapsagenda (NWA).

An exciting program with both local and national speakers presented the three Universes underlying this initiative: the Instrumentation Universe, the Building Blocks of the Universe and the Emergent Universe.

The afternoon program closed with a panel discussion on, among other things, the opportunities offered by this initiative (in particular for attracting financial support), possible future activities and the experiences of the DSSC Research Theme. Follow-up activities will include an informal lunch meeting at the Kapteyn Institute and the QU7 workshop on April 12 and 13, which this time will be dedicated to the Fundamentals of the Universe Research Priority.

CALLS FOR PROPOSALS
Horizon 2020
EINFRA-12-2017: Data and Distributed Computing e-infrastructures for Open Science
Deadline: March 29, 2017

EINFRA-21-2017: Platform-driven e-infrastructure innovation
Deadline: March 29, 2017

INFRAINNOV-01-2017: Fostering co-innovation for future detection and imaging technologies
Deadline: March 29, 2017

ICT-05-2017: Customised and low energy computing
Deadline: April 25, 2017

ICT-16-2017: Big data PPP: research addressing main technology challenges of the data economy
Deadline: April 25, 2017

The Dutch Data Science Awards

The Dutch Data Science Awards are offered by the Royal Holland Society of Sciences and Humanities and the Big Data Alliance to exceptionally innovative entrepreneurship and scientific research in the field of data science. The awards are divided into three categories: Startup, Institutional and Science. All disciplines and application areas are eligible. More information is available on the website of the Dutch Data Science Awards.
Deadline: 21 March. 


DSSC thematic meeting: "Progress of DSSC PhD projects"
27 March, 13.30 - 15.30, 5161.165

Open to all DSSC members
PhD project presentations:
Oscar Portoles Marin (MSc), "Uncovering the information processing underlying the interactions between brain areas."
Victor Arturo Bernal (MSc), "Clinical Big Data for multifactorial diseases: from molecular profiles to precision medicine."

CIT Data Scientists in research, by Jonas Bulthuis (MSc) and Herbert Kruitbosch (MSc) (CIT)

DSSC updates and announcements

Digital Innovation Forum
10-11 May, Amsterdam

This international event is Europe's industry-driven digital innovation conference, presenting research and innovation (R&I) results and emerging challenges towards a vision of the future built for and by industry. The schedule includes workshops on Smart Energy, Smart Health, Smart Manufacturing and Smart Mobility.

More information is available here.

International Supercomputing Conference
18-22 June, Frankfurt

ISC High Performance brings together different academic and commercial disciplines to share knowledge in the field of high performance computing. Topics: Processor Elements & Memory for HPC, Exascale System Developments: Future Concepts, Programming Models & User Experiences, Life Sciences & Drug Design, Earthquake Prediction & Sea Level Rise, HPC Influences on Energy Exploration, Combustion/Turbulences & Extreme Scale Algorithms, Advanced Material Science, Virtual & Simulated Reality, Large Scale Engineering & Cloud Computing, Big Data Analytics - SKA & LHC, Deep Learning Goes HPC, How HPC Influences Industrie 4.0.

More information is available here.

2nd Annual Wind Power Big Data & IoT Forum
30-31 March, Amsterdam

This event is aimed at scientists and managers interested in Wind Energy, Wind farm sustainability, Technology, Innovation, Research & Development, Operation & Maintenance, Big Data application, Wind Performance, Wind Data Analytics, Meteorology, Wind Turbine Reliability & Performance Monitoring, Loads Engineering, Machine Learning, Data Science, SCADA Operations, Condition Monitoring, Wind Farm Data Analysis and Performance Optimization, and Fleet Analysis.

Registration and information are available here.

BeNeLearn 2017
9-10 June, Eindhoven University of Technology

BeNeLearn is the annual machine learning conference of Belgium and The Netherlands. It serves as a forum for researchers to exchange ideas, present recent work, and foster collaboration in the broad field of Machine Learning and its applications. The main program consists of several tracks: a special track on Deep Learning, a special track on Complex Networks, and an Industry Track. Numerous other topics will also be covered.

Registration and more information are available here.

TERATEC 2017 Forum
27-28 June, Ecole Polytechnique

The TERATEC Forum is a major event in France and Europe that brings together the best international experts in HPC, Simulation and Big Data. It reaffirms the strategic importance of these technologies for developing industrial competitiveness and innovation capacity.

More information is available here.
