Spark Performance Optimization Series: #1. Skew

Description

In a Spark cluster, data is typically read in as 128 MB partitions, which ensures an even distribution of data. However, as the data is transformed (e.g. aggregated), it is possible to have significantly…
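
To make the idea concrete, here is a minimal sketch in plain Python (not Spark code) of how evenly sized input can end up unevenly distributed once rows are grouped by key. The helper names `partition_sizes` and `skew_ratio`, the sample keys, and the use of `crc32` as a stand-in for Spark's own partitioner are all assumptions made for illustration.

```python
from statistics import median
from zlib import crc32


def partition_sizes(keys, num_partitions):
    """Count rows per partition when each row is routed by a hash of its key.

    crc32 is only a deterministic stand-in for a real partitioner (assumption).
    """
    sizes = [0] * num_partitions
    for key in keys:
        sizes[crc32(str(key).encode()) % num_partitions] += 1
    return sizes


def skew_ratio(sizes):
    """Largest partition divided by the median partition; values near 1 mean even."""
    return max(sizes) / max(median(sizes), 1)


# Hypothetical data: 1,000 distinct keys with one row each ...
even_keys = list(range(1000))
# ... plus a single "hot" key that accounts for 9,000 additional rows.
skewed_keys = even_keys + ["hot"] * 9000

even_sizes = partition_sizes(even_keys, 8)
skewed_sizes = partition_sizes(skewed_keys, 8)

print("even:  ", even_sizes, round(skew_ratio(even_sizes), 2))
print("skewed:", skewed_sizes, round(skew_ratio(skewed_sizes), 2))
```

All rows that share a key land in the same partition, so the hot key pushes one partition far above the others while the rest stay close to their original size. Comparing the largest partition against the median is one simple way to spot this; the right threshold depends on the workload.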

Spark Job Optimization: Dealing with Data Skew

Spark Application Optimization for Performance using Qubole Sparklens

Spark Performance Optimization Series: #2. Spill, by Himansu Sekhar, road to data engineering

Spark Performance Optimization Series: #1. Skew, by Himansu Sekhar, road to data engineering

Spark Performance Tuning & Best Practices - Spark By {Examples}

Kubernetes Architecture,Hands On!, by Himansu Sekhar

Spark Performance Tuning: Skewness Part 1, by Wasurat Soontronchai

List of cool blogs focussing on Spark performance optimization., by Sukul Mahadik

Databricks Notebook Promotion using Azure DevOps, by Himansu Sekhar, road to data engineering

High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark, by Holden Karau and Rachel Warren (ISBN 9781491943205)

Optimizing the Skew in Spark

Spark's Data Skew Odyssey: Conquering the Chaos, by Bharathkumar V