Databricks launches Project Lightspeed, its next-gen Spark streaming engine
https://techcrunch.com/2022/06/28/databricks-launches-project-lightspeed-its-next-gen-spark-streaming-engine/
At its Data + AI Summit, Databricks today made the requisite number of announcements one would expect from a company's flagship developer event. Among them: the launch of Delta Lake 2.0, the next version of its platform for building data lakehouses; MLflow 2.0, the next generation of its platform for managing the machine learning pipeline, which now includes MLflow Pipelines with templates for bootstrapping model development; and a couple of announcements around the Apache Spark data analytics engine, which forms part of the core of the Databricks platform.
Databricks today also announced Spark Connect, a new client and server interface for Spark that is based on the DataFrame API. In Spark, a DataFrame is a distributed collection of data that is organized into columns and made available through an API in languages like Scala, Java, Python or R. Spark Connect keeps this concept but decouples the client from the server, which the company says will improve stability and make remote connectivity a built-in feature.
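To make the client/server split more concrete, here is a toy sketch in Python. It is an illustration under stated assumptions, not the Spark Connect protocol or the PySpark API: the names `RemoteDataFrame` and `make_server`, the JSON plan format, and the in-process "transport" are all hypothetical. The idea it shows is that the client only records DataFrame operations as a plan, and the server, which owns the data, decodes and executes that plan when results are requested.

```python
import json
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]
Step = Dict[str, Any]


# --- Client side (hypothetical): records operations, never touches the data. ---
class RemoteDataFrame:
    def __init__(self, send: Callable[[str], List[Row]],
                 plan: Optional[List[Step]] = None) -> None:
        self._send = send               # stands in for a network connection
        self._plan = list(plan or [])   # ordered list of recorded operations

    def filter(self, column: str, cmp: str, value: Any) -> "RemoteDataFrame":
        step = {"op": "filter", "column": column, "cmp": cmp, "value": value}
        return RemoteDataFrame(self._send, self._plan + [step])

    def select(self, *columns: str) -> "RemoteDataFrame":
        step = {"op": "select", "columns": list(columns)}
        return RemoteDataFrame(self._send, self._plan + [step])

    def collect(self) -> List[Row]:
        # Only here is the serialized plan shipped to the server.
        return self._send(json.dumps(self._plan))


# --- Server side (hypothetical): owns the data, decodes and runs the plan. ---
COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


def make_server(table: List[Row]) -> Callable[[str], List[Row]]:
    def handle(request: str) -> List[Row]:
        rows = table
        for step in json.loads(request):
            if step["op"] == "filter":
                cmp = COMPARATORS[step["cmp"]]
                rows = [r for r in rows if cmp(r[step["column"]], step["value"])]
            elif step["op"] == "select":
                rows = [{c: r[c] for c in step["columns"]} for r in rows]
            else:
                raise ValueError(f"unsupported operation: {step['op']}")
        return rows
    return handle


# Usage: the client builds a query lazily; the server does the work.
events = [
    {"user": "a", "clicks": 3},
    {"user": "b", "clicks": 7},
    {"user": "c", "clicks": 12},
]
df = RemoteDataFrame(make_server(events))
print(df.filter("clicks", ">", 5).select("user").collect())
# [{'user': 'b'}, {'user': 'c'}]
```

Because the client holds nothing but a description of the query, it can run in a separate process or on a separate machine from the engine, which is the property the company points to when it talks about stability and remote connectivity.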