Blog posts tagged "big_data"


robgibbon
15 October 2024

Apache Spark 4.0 beta release – try it now

Data Platform Article

Apache Spark is a popular framework for developing distributed, parallel data processing applications. Our solution for Apache Spark on Kubernetes has made significant progress in the year since we launched, adding support for Apache Iceberg, a new GPU-accelerated image using the NVIDIA Spark-RAPIDS plugin, and support for the Volcan ...


robgibbon
15 July 2024

Deploying and scaling Apache Spark on Amazon AWS EKS

Data Platform Article

Move over Hadoop, it’s time for Spark on Kubernetes. Apache Spark, a framework for parallel distributed data processing, has become a popular choice for building streaming applications, data lakehouses and big data extract-transform-load (ETL) processing. It is horizontally scalable, fault-tolerant, and performs well at high scale. H ...