In this seminar, we will study the basics of cluster computing using the Scala programming language
and the exciting Spark engine. Spark is steadily becoming the state of the art in cluster computing and big data processing and analytics,
thanks to its strong support for several domains, such as SQL processing, streaming, machine learning, and graph processing.
In addition, Spark provides APIs for several programming languages, including Java, Scala, and Python.
|
To gain as much as possible from this seminar, you are advised to bring your laptop to class. To participate in the hands-on sessions,
you will need to either install the required software locally or log on to a remote server using ssh.
However, to rerun the examples and to write and test your own code at home, the first option is preferable;
use the second option only as a fallback.
|
In the seminar, we will discuss general issues related to cluster computing and briefly present Hadoop and its ecosystem.
We will then learn some Scala programming, test code in the Scala REPL, and build projects with the Simple Build Tool (sbt).
Finally, we will turn to Spark, discuss its fundamental concepts, and test some code.
|