Apache Spark: Difference between revisions

From Celeste@Hoppinglife
== Code generation ==


[https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-SparkPlan-WholeStageCodegenExec.html A description of whole-stage code generation (WholeStageCodegenExec)].
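
Whole-stage code generation collapses a chain of physical operators into a single generated function, so each row passes through one loop instead of a stack of per-operator iterator calls. The toy below sketches that idea only. Spark generates Java source, whereas this uses Python and <code>exec</code> as a stand-in, and the operator tuples and the names <code>generate_fused_function</code> and <code>fused</code> are hypothetical, not part of Spark.

```python
def generate_fused_function(operators):
    """Build Python source for one loop that applies every operator in order."""
    lines = ["def fused(rows):", "    out = []", "    for row in rows:"]
    for kind, expr in operators:
        if kind == "filter":
            # A filter becomes an early `continue` inside the shared loop.
            lines.append(f"        if not ({expr}):")
            lines.append("            continue")
        elif kind == "project":
            # A projection rebinds `row` in place, with no intermediate list.
            lines.append(f"        row = {expr}")
        else:
            raise ValueError(f"unsupported operator: {kind}")
    lines.append("        out.append(row)")
    lines.append("    return out")
    return "\n".join(lines)


# Hypothetical two-operator pipeline: filter, then project.
pipeline = [
    ("filter", "row['amount'] > 100"),
    ("project", "{'id': row['id'], 'amount': row['amount'] * 2}"),
]
source = generate_fused_function(pipeline)

namespace = {}
exec(source, namespace)  # compile the generated source once
fused = namespace["fused"]

rows = [{"id": 1, "amount": 50}, {"id": 2, "amount": 150}]
result = fused(rows)
```

Printing <code>source</code> shows the single fused loop, which is the toy counterpart of inspecting the code Spark generates for a stage.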
 
== Interesting papers ==
 
* [http://www.vldb.org/pvldb/vol12/p1850-roy.pdf SparkCruise: Handsfree Computation Reuse in Spark]
 
The idea is to find common subexpressions across queries by logging the workload, and to selectively materialize those subqueries so that later queries can reuse them and run faster. This is similar to [https://www.microsoft.com/en-us/research/uploads/prod/2018/03/cloudviews-sigmod2018.pdf CloudViews].
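
The idea above can be sketched in a few lines. This is a minimal illustration under loud assumptions, not SparkCruise's implementation: plans are modelled as nested tuples rather than Spark plan objects, the selection rule is a plain frequency threshold rather than a cost-based choice, and the names <code>subexpressions</code> and <code>reuse_candidates</code> are hypothetical.

```python
from collections import Counter

# A logical plan is modelled as a nested tuple: (operator, argument, *children).
# This is a hypothetical toy representation, not Spark's Catalyst plan classes.


def subexpressions(plan):
    """Yield the plan itself and every subtree beneath it."""
    yield plan
    for child in plan[2:]:
        yield from subexpressions(child)


def reuse_candidates(query_log, min_queries=2):
    """Return subtrees that occur in at least `min_queries` logged queries."""
    counts = Counter()
    for plan in query_log:
        # Count each distinct subtree once per query.
        counts.update(set(subexpressions(plan)))
    return {sub: n for sub, n in counts.items() if n >= min_queries}


# A hypothetical workload log of three queries over the same table.
scan = ("scan", "sales")
recent = ("filter", "year = 2020", scan)
query_log = [
    ("aggregate", "sum(amount) by region", recent),
    ("aggregate", "count(*) by product", recent),
    ("project", "customer_id", ("filter", "amount > 100", scan)),
]

candidates = reuse_candidates(query_log)
# Skip bare scans: materializing a base table again saves nothing.
to_materialize = [sub for sub in candidates if sub[0] != "scan"]
```

Here the shared <code>filter</code> subtree is the only candidate left to materialize, since it appears in two of the three logged queries.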


== References ==


[https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/ Mastering Spark SQL]

Latest revision as of 01:27, 22 May 2020
