Apache Spark

== Code generation ==
[https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-SparkPlan-WholeStageCodegenExec.html A description of whole-stage code generation in Spark SQL].
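As a rough illustration of the idea behind whole-stage code generation, the sketch below fuses a filter, a projection and a sum into one generated function instead of chaining one iterator per operator. It is a toy in plain Python, not Spark's implementation (Spark emits Java source); the row layout and the names <code>volcano_pipeline</code>, <code>generate_fused_source</code> and <code>compile_fused</code> are made up for this example.

```python
# Toy illustration only: Spark generates Java source, not Python,
# and its generated code is far more involved than this.

def volcano_pipeline(rows):
    """Operator-at-a-time execution: every operator is its own iterator."""
    scanned = iter(rows)
    filtered = (r for r in scanned if r["age"] > 30)
    projected = (r["salary"] * 2 for r in filtered)
    return sum(projected)


def generate_fused_source(predicate, projection):
    """Emit source for a single loop that does filter, project and sum."""
    return (
        "def fused(rows):\n"
        "    total = 0\n"
        "    for r in rows:\n"
        f"        if {predicate}:\n"
        f"            total += {projection}\n"
        "    return total\n"
    )


def compile_fused(predicate, projection):
    """Compile the generated source and return the resulting function."""
    namespace = {}
    exec(generate_fused_source(predicate, projection), namespace)
    return namespace["fused"]


# Hypothetical input rows.
ROWS = [
    {"age": 25, "salary": 100},
    {"age": 35, "salary": 200},
    {"age": 45, "salary": 300},
]

fused = compile_fused('r["age"] > 30', 'r["salary"] * 2')
print(generate_fused_source('r["age"] > 30', 'r["salary"] * 2'))
print(volcano_pipeline(ROWS), fused(ROWS))
```

Both versions compute the same result; the fused one does it in a single loop with no per-operator iterator hand-off, which is the effect whole-stage code generation aims for.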
== Interesting papers ==
* [http://www.vldb.org/pvldb/vol12/p1850-roy.pdf SparkCruise: Handsfree Computation Reuse in Spark]
The idea is to find common subexpressions across queries through logging, and to selectively materialize those subexpressions so that later queries can reuse them and run faster. This is similar to [https://www.microsoft.com/en-us/research/uploads/prod/2018/03/cloudviews-sigmod2018.pdf CloudViews].
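A minimal sketch of this idea, assuming query plans are logged as nested tuples of the form <code>(operator, argument, *children)</code>. The plan format, the function names and the selection rule (keep any non-scan subexpression seen in at least two queries) are assumptions made for this example; this is not the algorithm from the paper.

```python
from collections import Counter

# Hypothetical plan format: (operator, argument, *children).


def signature(plan):
    """Canonical string for a subtree, so equal subtrees share a signature."""
    op, arg, *children = plan
    return f"{op}[{arg}](" + ",".join(signature(c) for c in children) + ")"


def subtrees(plan):
    """Yield the plan itself and every subtree below it."""
    yield plan
    for child in plan[2:]:
        yield from subtrees(child)


def count_subexpressions(logged_plans):
    """Count in how many logged queries each subexpression appears."""
    counts = Counter()
    for plan in logged_plans:
        # A set, so a subtree repeated inside one query is counted once.
        counts.update({signature(s) for s in subtrees(plan)})
    return counts


def pick_candidates(counts, min_queries=2):
    """Select shared subexpressions; bare scans are skipped as not worth caching."""
    shared = [
        sig for sig, n in counts.items()
        if n >= min_queries and not sig.startswith("Scan[")
    ]
    # Larger subexpressions first, as a crude proxy for saved work.
    return sorted(shared, key=len, reverse=True)


# Hypothetical logged workload of three queries over the same table.
scan = ("Scan", "sales")
shared_filter = ("Filter", "year = 2020", scan)
LOG = [
    ("Aggregate", "sum(amount)", shared_filter),
    ("Project", "region", shared_filter),
    ("Aggregate", "count(*)", ("Filter", "year = 2019", scan)),
]

COUNTS = count_subexpressions(LOG)
CANDIDATES = pick_candidates(COUNTS)
print(CANDIDATES)
```

Here the filter on <code>year = 2020</code> appears in two of the three queries, so it is the only candidate for materialization. A real system would also weigh the cost of computing and storing each candidate.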
== References ==


[https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/ Mastering Spark SQL]

Latest revision as of 01:27, 22 May 2020
