Big data solutions often require parallelized computations, with data (and the corresponding transformations) distributed across several machines. Spark is an open-source, JVM-based tool for exactly this, adopted by, among others, Databricks and Microsoft Azure (as part of HDInsight and Synapse).
Spark is written in Scala, a Java-like, strongly typed language that compiles to bytecode and runs on the JVM, with full interoperability with Java.
During this seminar we will take a brief look at Spark's architecture and capabilities, and study some Scala code for handling data. Links for further immersion will of course be provided.
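To give a flavour of the kind of Scala code we will look at, here is a minimal word-count sketch using Spark's Dataset/DataFrame API. The app name, the local master setting, and the sample data are illustrative, not taken from the seminar material:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc

object WordCountSketch {
  def main(args: Array[String]): Unit = {
    // Start a local Spark session; on a cluster, the master would point at the cluster manager instead
    val spark = SparkSession.builder()
      .appName("seminar-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A tiny in-memory dataset; real jobs would read from HDFS, S3, a data lake, etc.
    val lines = Seq("spark runs on the jvm", "scala compiles to jvm bytecode").toDS()

    // Distributed transformations: split lines into words, group, and count
    val counts = lines
      .flatMap(_.split("\\s+"))
      .groupBy("value")
      .count()
      .orderBy(desc("count"))

    counts.show()
    spark.stop()
  }
}
```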
Speaker
Dmitri Apassov, ML and BI consultant at Avega, recently returned from a Scala/Spark project for Telia.