A seminar organized by the Data Engineering Lab and the ACM AUTH Student Chapter
SparkLab May 2015

Apostolos N. Papadopoulos, Assistant Professor (papadopo@csd.auth.gr)
Data Engineering Lab, Dept of Informatics, Aristotle University of Thessaloniki
19 May 2015, 16:30, Room A

Spark is a very promising distributed engine for data management and analytics. In this seminar, we will demonstrate the most fundamental concepts of Spark using the Scala programming language. Spark itself is written in Scala, so writing code in Scala is like talking to Spark in its "mother tongue".

In this seminar, we will study the very basics of cluster computing using the Scala programming language and the exciting Spark engine. Spark is steadily becoming the state of the art in cluster computing and in big data processing and analytics, thanks to the excellent support it provides for several domains, such as SQL processing, streaming, machine learning and graphs. In addition, Spark supports three programming languages: Java, Scala and Python.

To gain as much as possible from this seminar, you are advised to bring your laptop to class. To participate in the hands-on part, you will need to either install the required software locally or log on to a remote server using ssh. However, to re-execute the examples and to write and test your own code at home, it is better to use the first option and keep the second only as a fallback.

In the seminar, we will discuss general issues related to cluster computing and briefly present Hadoop and its ecosystem. We will then learn some Scala programming, testing code in the Scala REPL and building with the Simple Build Tool (sbt). Finally, we will turn to Spark, discuss its fundamental concepts and test some code.

Download and carefully read the PDF containing the guidelines for installing and using the examples. Also download the examples tarfile that we are going to use. The guidelines PDF explains where to extract the examples and how to compile and run them. Be sure to install everything BEFORE the seminar.

Scala slides *** Spark slides *** Guidelines *** Source code