Apache Spark 2.0.0 Released, with API Updates

snake_l · Posted 2016-10-26 09:07:57

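　　Apache Spark 2.0.0 has been released. Apache Spark is an open-source cluster computing environment similar to Hadoop, but the two differ in ways that make Spark perform better on certain workloads: Spark supports in-memory distributed datasets, so in addition to serving interactive queries it can also speed up iterative workloads.
　　This release mainly updates the APIs, adds support for SQL 2003 and for R UDFs, and improves performance. Some 300 developers contributed 2,500 patches.
　　API updates in Apache Spark 2.0.0: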

　　Unifying DataFrame and Dataset: In Scala and Java, DataFrame and Dataset have been unified, i.e. DataFrame is just a type alias for Dataset of Row. In Python and R, given the lack of type safety, DataFrame is the main programming interface.
　　SparkSession: new entry point that replaces the old SQLContext and HiveContext for DataFrame and Dataset APIs. SQLContext and HiveContext are kept for backward compatibility.
　　A new, streamlined configuration API for SparkSession
　　Simpler, more performant accumulator API
　　A new, improved Aggregator API for typed aggregation in Datasets
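
　　The typed Aggregator mentioned above is a Scala/Java API built around four steps: an empty buffer (zero), folding one input into a buffer (reduce), combining two buffers (merge), and producing the final value (finish). The sketch below is a plain-Python model of that contract only; it does not use Spark, and the Employee record, the AverageSalary class and the run helper are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Tuple


@dataclass(frozen=True)
class Employee:
    """Hypothetical input record, used only for this illustration."""
    name: str
    salary: float


class AverageSalary:
    """Plain-Python model of the zero/reduce/merge/finish contract."""

    def zero(self) -> Tuple[float, int]:
        # Empty buffer: (running sum, row count).
        return (0.0, 0)

    def reduce(self, buf: Tuple[float, int], row: Employee) -> Tuple[float, int]:
        # Fold one input row into a buffer.
        return (buf[0] + row.salary, buf[1] + 1)

    def merge(self, b1: Tuple[float, int], b2: Tuple[float, int]) -> Tuple[float, int]:
        # Combine two partial buffers, e.g. from different partitions.
        return (b1[0] + b2[0], b1[1] + b2[1])

    def finish(self, buf: Tuple[float, int]) -> float:
        # Turn the final buffer into the output value.
        return buf[0] / buf[1] if buf[1] else 0.0


def run(agg: AverageSalary, partitions: List[List[Employee]]) -> float:
    """Simulate a distributed run: reduce each partition, then merge the partials."""
    partials = [reduce(agg.reduce, part, agg.zero()) for part in partitions]
    return agg.finish(reduce(agg.merge, partials, agg.zero()))


partitions = [
    [Employee("ann", 100.0), Employee("bob", 200.0)],
    [Employee("cat", 300.0)],
    [],  # an empty partition contributes only the zero buffer
]
average = run(AverageSalary(), partitions)
```

　　Keeping merge separate from reduce is what lets partial results from different partitions be combined in any order; in Spark itself the buffer and output types additionally need encoders, which this model leaves out.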

　　SQL updates in Apache Spark 2.0.0:

　　A native SQL parser that supports both ANSI SQL and HiveQL
　　Native DDL command implementations
　　Subquery support, including:
- Uncorrelated scalar subqueries
- Correlated scalar subqueries
- NOT IN predicate subqueries (in WHERE/HAVING clauses)
- IN predicate subqueries (in WHERE/HAVING clauses)
- (NOT) EXISTS predicate subqueries (in WHERE/HAVING clauses)
　　View canonicalization support
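
　　For readers unfamiliar with the subquery forms listed above, here is one small query per form. As an assumption made purely so the example runs anywhere, it uses Python's built-in sqlite3 module as a stand-in engine rather than Spark; the dept and emp tables are hypothetical, and whether Spark 2.0.0 accepts each statement unchanged has not been checked here.

```python
import sqlite3

# In-memory SQLite database used as a stand-in SQL engine (not Spark).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER, name TEXT);
    CREATE TABLE emp (name TEXT, dept_id INTEGER, salary INTEGER);
    INSERT INTO dept VALUES (1, 'eng'), (2, 'ops'), (3, 'empty');
    INSERT INTO emp VALUES ('ann', 1, 120), ('bob', 1, 100), ('cat', 2, 90);
""")

queries = {
    # Uncorrelated scalar subquery: the inner query does not reference the outer row.
    "uncorrelated_scalar":
        "SELECT name FROM emp "
        "WHERE salary > (SELECT AVG(salary) FROM emp) ORDER BY name",
    # Correlated scalar subquery: the inner query references the outer row (e.dept_id).
    "correlated_scalar":
        "SELECT e.name FROM emp e "
        "WHERE e.salary = (SELECT MAX(i.salary) FROM emp i WHERE i.dept_id = e.dept_id) "
        "ORDER BY e.name",
    # IN predicate subquery in a WHERE clause.
    "in_predicate":
        "SELECT name FROM dept WHERE id IN (SELECT dept_id FROM emp) ORDER BY name",
    # NOT IN predicate subquery in a WHERE clause.
    "not_in_predicate":
        "SELECT name FROM dept WHERE id NOT IN (SELECT dept_id FROM emp) ORDER BY name",
    # NOT EXISTS predicate subquery in a WHERE clause.
    "not_exists_predicate":
        "SELECT d.name FROM dept d "
        "WHERE NOT EXISTS (SELECT 1 FROM emp e WHERE e.dept_id = d.id) ORDER BY d.name",
}

# Run every query and keep the first column of each result row.
results = {label: [row[0] for row in conn.execute(sql)] for label, sql in queries.items()}

for label, rows in results.items():
    print(label, rows)
```

　　Note that NOT IN and NOT EXISTS agree here only because dept_id contains no NULLs; with NULLs in the subquery result, standard SQL semantics make NOT IN return no rows.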

　　Some new features:

　　Native CSV data source, based on Databricks’ spark-csv module
　　Off-heap memory management for both caching and runtime execution
　　Hive style bucketing support
　　Approximate summary statistics using sketches, including approximate quantile, Bloom filter, and count-min sketch.
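
　　To make the sketch-based statistics above more concrete, the following is a minimal, standard-library model of two of the named structures, a Bloom filter and a count-min sketch. It is not Spark's implementation: the class names, sizes and the salted SHA-256 hashing scheme are assumptions chosen for readability.

```python
import hashlib
from typing import Iterator


def _indexes(item: str, count: int, width: int) -> Iterator[int]:
    """Yield `count` bucket indexes in [0, width) from salted SHA-256 digests."""
    for seed in range(count):
        digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
        yield int.from_bytes(digest[:8], "big") % width


class BloomFilter:
    """Set membership with possible false positives but no false negatives."""

    def __init__(self, bits: int = 1024, hash_count: int = 4) -> None:
        self.bits = bits
        self.hash_count = hash_count
        self.array = [False] * bits

    def add(self, item: str) -> None:
        for idx in _indexes(item, self.hash_count, self.bits):
            self.array[idx] = True

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.array[idx] for idx in _indexes(item, self.hash_count, self.bits))


class CountMinSketch:
    """Frequency estimates that may overcount but never undercount."""

    def __init__(self, width: int = 256, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def add(self, item: str, count: int = 1) -> None:
        # One counter per row is incremented, each chosen by a different hash.
        for row, idx in enumerate(_indexes(item, self.depth, self.width)):
            self.table[row][idx] += count

    def estimate(self, item: str) -> int:
        # The smallest counter is the one least inflated by collisions.
        return min(self.table[row][idx]
                   for row, idx in enumerate(_indexes(item, self.depth, self.width)))


bloom = BloomFilter()
sketch = CountMinSketch()
for word in ["spark", "hadoop", "spark", "hive", "spark"]:
    bloom.add(word)
    sketch.add(word)

print(bloom.might_contain("spark"), sketch.estimate("spark"))
```

　　Both structures use fixed memory regardless of how many items are added, which is the trade-off behind "approximate": answers err only in one direction (false positives, overcounts), and the error shrinks as the bit array or table grows.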

　　Performance improvements:

　　Substantial (2-10x) performance speedups for common operators in SQL and DataFrames via a new technique called whole-stage code generation.
　　Improved Parquet scan throughput through vectorization
　　Improved ORC performance
　　Many improvements in the Catalyst query optimizer for common workloads
　　Improved window function performance via native implementations for all window functions
　　Automatic file coalescing for native data sources
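
　　The idea behind whole-stage code generation can be illustrated without Spark. In the hedged sketch below, the same filter, project and sum pipeline is written twice: once as a chain of separate iterator operators, and once fused by hand into a single loop, which is roughly the shape of code a whole-stage code generator aims to produce. Spark generates Java code at runtime; this Python model, including all function names, is only an illustration and makes no performance claim.

```python
from typing import Callable, Iterable, Iterator


# Operator-at-a-time style: each operator is its own iterator, so every row
# crosses one call boundary per operator in the plan.
def scan(rows: Iterable[int]) -> Iterator[int]:
    for row in rows:
        yield row


def filter_op(child: Iterator[int], pred: Callable[[int], bool]) -> Iterator[int]:
    for row in child:
        if pred(row):
            yield row


def project_op(child: Iterator[int], fn: Callable[[int], int]) -> Iterator[int]:
    for row in child:
        yield fn(row)


def sum_op(child: Iterator[int]) -> int:
    total = 0
    for value in child:
        total += value
    return total


def iterator_plan(rows: Iterable[int]) -> int:
    """SUM(x * 10) over rows WHERE x is even, as a chain of operators."""
    evens = filter_op(scan(rows), lambda x: x % 2 == 0)
    return sum_op(project_op(evens, lambda x: x * 10))


def fused_plan(rows: Iterable[int]) -> int:
    """The same query collapsed by hand into one loop with no per-operator calls."""
    total = 0
    for x in rows:
        if x % 2 == 0:
            total += x * 10
    return total


print(iterator_plan(range(10)), fused_plan(range(10)))
```

　　Both plans compute the same result; the fused version simply keeps intermediate values in local variables instead of passing rows between operators.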

　　For more release information, see the release notes.
　　Download: http://spark.apache.org/downloads.html
　　
