Apache Spark: Difference between revisions
Hoppinglife: Created page with "== References == https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/ Mastering Spark SQL"
Hoppinglife: No edit summary
== Code generation ==
[https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-SparkPlan-WholeStageCodegenExec.html A description of whole-stage code generation in Spark SQL].
== Interesting papers ==
* [http://www.vldb.org/pvldb/vol12/p1850-roy.pdf SparkCruise: Handsfree Computation Reuse in Spark]
The idea is to find common subexpressions across queries by logging the workload, then selectively materialize those subqueries so that later queries can reuse the results instead of recomputing them. This is similar to [https://www.microsoft.com/en-us/research/uploads/prod/2018/03/cloudviews-sigmod2018.pdf CloudViews].
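The reuse idea above can be illustrated with a toy sketch. This is not the algorithm from the paper: the plan representation (nested tuples), the function names, and the <code>min_queries</code> threshold are all assumptions made for illustration. It only shows the general shape of the approach, which is to enumerate subexpressions of logged query plans and keep those that recur across queries as candidates for materialization.

```python
from collections import Counter

def subexpressions(plan):
    """Yield every subtree of a plan.

    A plan is a nested tuple (operator, arg, ..., child_plan, ...).
    This representation is hypothetical, chosen only for the sketch.
    """
    yield plan
    for part in plan[1:]:
        if isinstance(part, tuple):
            yield from subexpressions(part)

def has_child(plan):
    """True if the plan has at least one child plan (it is not a bare scan)."""
    return any(isinstance(part, tuple) for part in plan[1:])

def reuse_candidates(query_log, min_queries=2):
    """Return non-leaf subexpressions shared by at least min_queries queries."""
    counts = Counter()
    for plan in query_log:
        # A set counts each subexpression once per query.
        counts.update(set(subexpressions(plan)))
    return {
        expr: n
        for expr, n in counts.items()
        if n >= min_queries and has_child(expr)
    }

# A tiny logged workload: two queries share the same filtered scan.
scan_orders = ("Scan", "orders")
filtered = ("Filter", "year = 2020", scan_orders)
q1 = ("Aggregate", "sum(price)", filtered)
q2 = ("Project", "customer", filtered)
q3 = ("Aggregate", "count(*)", ("Scan", "customers"))

candidates = reuse_candidates([q1, q2, q3])
print(candidates)
```

Here the shared <code>Filter</code> over <code>orders</code> is the only candidate. The sketch relies on exact structural equality of tuples and ignores cost entirely; a real system would need to normalize plans before comparing them and weigh storage cost against the expected savings before materializing anything.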
== References ==
[https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/ Mastering Spark SQL]
Latest revision as of 01:27, 22 May 2020