Big data solutions often require parallelized computations, with data (and the corresponding transformations) distributed across several machines. Spark is an open-source, JVM-based tool for exactly this, adopted by, among others, Databricks and Microsoft Azure (as part of HDInsight and Synapse).
Spark is written in Scala, a Java-like, strongly typed language that compiles to bytecode and runs on the JVM, with full interoperability with Java.
During this seminar we will take a brief look at Spark's architecture and capabilities, and study some Scala code for handling data. Links for further immersion will of course be provided.
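To give a flavour of the kind of Scala code we will look at, here is a minimal word-count sketch using Spark's Dataset/DataFrame API. The app name, the local master setting, and the sample data are illustrative, not taken from the seminar material:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc

object WordCountSketch {
  def main(args: Array[String]): Unit = {
    // Start a local Spark session; on a cluster, the master would point at the cluster manager instead
    val spark = SparkSession.builder()
      .appName("seminar-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A tiny in-memory dataset; real jobs would read from HDFS, S3, a data lake, etc.
    val lines = Seq("spark runs on the jvm", "scala compiles to jvm bytecode").toDS()

    // Distributed transformations: split lines into words, group, and count
    val counts = lines
      .flatMap(_.split("\\s+"))
      .groupBy("value")
      .count()
      .orderBy(desc("count"))

    counts.show()
    spark.stop()
  }
}
```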
Speaker
Dmitri Apassov, ML and BI consultant at Avega, recently returned from a Scala/Spark project for Telia.