To develop both data science and the DSRC as a hub of data science expertise, the DSRC aims to encourage cooperation between different areas of informatics research across the two universities. To bootstrap this cooperation, the DSRC called for proposals in which researchers could apply for two research assistants (0.2 FTE) working for a full year on a data science topic, with an emphasis on projects that span different informatics disciplines. We received 20 thought-provoking and interesting project proposals, and were able to fund the following six, covering a variety of domains and techniques.
Our student assistants and advisors for 2014 working on data science seed projects pic.twitter.com/ypycbTsEO3
— Data Science A’dam (@dsrcnl) June 27, 2014
The list of projects:
- Accelerated processing of spatio-temporal social graphs
- Students: Jesse Donker, Vlad Tudose
- Supervisors: Claudio Martella (VU) / Ana Varbanescu (UvA)
- This project addresses the computational challenges of large spatio-temporal graphs using modern HPC architectures. The focus is on designing parallel graph pattern matching algorithms for a number of interesting crowd-dynamics questions, and on implementing them efficiently on many-core CPUs, GPUs, and combinations thereof.
- Using Linked Open Data for Medical Question Answering using a combined KR/IR approach
- Students: Bas Cornelissen and Florian Golemo
- Supervisors: Frank van Harmelen (VU) / Maarten de Rijke (UvA) / David Graus (UvA) / Annette ten Teije (VU)
- “What are the symptoms of diabetes?”, “What are the adverse effects of Amoxicillin?”, “Which drugs have major interactions with tetracycline?”. These are just some of the medical questions for which both the general public and medical practitioners turn to the Web for answers. However, most of the medical content of the Web is unstructured, in the form of text documents, making question answering difficult. This project will consider the quality of medical data sources in the Linked Open Data Cloud and seek algorithms for separating good from bad answers.
- Data Science in the Browser: Distributed Computing and Visualization of Machine Learning Research
- Students: Thomas Schoegje and Said Al Farab
- Supervisors: Ted Meeds (UvA) / Magiel Bruntink (UvA)
- Lighthouse: lighting up the warehouse with a SPARQL
- Students: Renske Augustijn and Andreea Sandu
- Supervisors: Spyros Voulgaris (VU) / Peter Boncz (VU/CWI)
- This project proposes Lighthouse, a parallel and distributed graph processing engine in which computations are expressed in a high-level language and automatically translated into optimized Pregel jobs. In particular, we propose to implement a query execution engine for SPARQL on top of Apache Giraph and Hadoop.
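To give a feel for the vertex-centric model Lighthouse targets, here is a loose, sequential Python sketch of how a two-hop SPARQL-style pattern (`?x :knows ?y . ?y :knows :alice`) could be evaluated with Pregel-like message passing. The graph, edge labels, and query are invented for illustration; they are not project code.

```python
# Toy illustration of a Pregel-style evaluation of a graph pattern:
# in superstep 0, vertices adjacent to the target announce themselves;
# in superstep 1, vertices adjacent to an announcer are matches.
# All names and data below are made up for this example.

def match_two_hop(graph, predicate, target):
    """Find all x with x -predicate-> y -predicate-> target."""
    # Superstep 0: vertices with an edge to `target` send a message to themselves.
    one_hop = set()
    for v, edges in graph.items():
        for (p, dst) in edges:
            if p == predicate and dst == target:
                one_hop.add(v)
    # Superstep 1: vertices with an edge to a one-hop vertex are matches.
    matches = set()
    for v, edges in graph.items():
        for (p, dst) in edges:
            if p == predicate and dst in one_hop:
                matches.add(v)
    return matches

graph = {
    "bob":   [("knows", "carol")],
    "carol": [("knows", "alice")],
    "dave":  [("knows", "alice")],
    "erin":  [("likes", "carol")],
}
print(match_two_hop(graph, "knows", "alice"))  # {'bob'}
```

In a real Giraph job each superstep would run in parallel across workers, with vertices exchanging messages along edges rather than scanning the whole graph.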
- Quantifying Historical Perspectives on WWII
- Students: Cristina Garbacea and Thomas Schoegje
- Supervisors: Laura Hollink (VU) / Jacco van Ossenbruggen (VU/CWI) / Victor de Boer (VU) / Daan Odijk (UvA)
- The Second World War is a defining event in our recent history. A huge amount of digital material has become available to study it, ranging from newspapers published during and after the war, to books about the war and, more recently, Web pages about the people, places and events that played a role. Each source has a different perspective on what happened, depending on the medium, time and location of publication. In this project we aim to quantify these different perspectives. For this purpose, we employ a data science pipeline for the selection, structuring, linking and visualization of WWII-related material from NIOD, the National Library of the Netherlands, and Wikipedia. With the data and visualization tools we produce, we provide insight into the volume, selection and depth of WWII-related topics across different media, times and locations.
- Students: Karl Lundefall
- Supervisors: Paola Grosso (UvA) / Patricia Lago (VU)
- Processing and analysing large amounts of data requires a proper computing infrastructure. This can in principle be present where the data resides, or only available remotely. Many criteria, e.g. monetary cost, privacy and legal considerations, can inform the decision to process the data in situ or to move the computation and analysis elsewhere. The aim is to develop a calculator, available to the data science community as an online tool, for assessing the environmental impact of Big Data transfer scenarios.
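The in-situ-versus-transfer decision above can be sketched as a simple cost comparison. This is a hypothetical toy model, not the project's calculator: the functions, the linear cost assumption, and all energy coefficients are invented for illustration.

```python
# Hypothetical sketch of the trade-off such a calculator could model:
# compare the energy of moving a dataset over the network against
# processing it where it resides. All coefficients are made up.

def transfer_energy_joules(data_bytes, joules_per_byte=2e-8):
    """Energy to move the data over the network (assumed linear in size)."""
    return data_bytes * joules_per_byte

def in_situ_energy_joules(data_bytes, joules_per_byte=5e-9):
    """Energy to process the data where it resides (assumed linear in size)."""
    return data_bytes * joules_per_byte

def should_process_in_situ(data_bytes):
    """Prefer in-situ processing when moving the data would cost more energy."""
    return transfer_energy_joules(data_bytes) > in_situ_energy_joules(data_bytes)

one_terabyte = 10**12
print(should_process_in_situ(one_terabyte))  # True under these toy coefficients
```

A real calculator would fold in further criteria the description mentions, such as monetary cost and privacy or legal constraints, rather than energy alone.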