
[Software Release] Apache Spark 1.5.0 Officially Released

Posted 2015-9-22 12:01:54
  Spark 1.5.0 is the sixth release in the 1.x line, representing the work of 230+ contributors from 80+ organizations and 1,400+ patches in total. Notable improvements cover the following areas:
  

  •   APIs: RDD, DataFrame and SQL
  •   Backend execution: DataFrame and SQL
  •   Integrations: data sources, Hive, Hadoop, Mesos and cluster management
  •   R language
  •   Machine learning and advanced analytics
  •   Spark Streaming
  •   Deprecations, removals, configs, and behavior changes

    •   Spark Core
    •   Spark SQL & DataFrames
    •   Spark Streaming
    •   MLlib

  •   Known issues

    •   SQL/DataFrame
    •   Streaming

  •   Credits
  Download: spark-1.5.0.tgz
  For full details, see the release notes and change log.
  
  Complete list of new features:
  

  •   [SPARK-1855] - Provide memory-and-local-disk RDD checkpointing
  •   [SPARK-4176] - Support decimals with precision > 18 in Parquet
  •   [SPARK-4751] - Support dynamic allocation for standalone mode
  •   [SPARK-4752] -
  •   [SPARK-5133] - Feature Importance for Random Forests
  •   [SPARK-5155] - Python API for MQTT streaming
  •   [SPARK-5962] - [MLLIB] Python support for Power Iteration Clustering
  •   [SPARK-6129] - Create MLlib metrics user guide with algorithm definitions and complete code examples.
  •   [SPARK-6390] - Add MatrixUDT in PySpark
  •   [SPARK-6487] - Add sequential pattern mining algorithm PrefixSpan to Spark MLlib
  •   [SPARK-6813] - SparkR style guide
  •   [SPARK-6820] - Convert NAs to null type in SparkR DataFrames
  •   [SPARK-6833] - Extend `addPackage` so that any given R file can be sourced in the worker before functions are run.
  •   [SPARK-6964] - Support Cancellation in the Thrift Server
  •   [SPARK-7083] - Binary processing dimensional join
  •   [SPARK-7254] - Extend PIC to handle Graphs directly
  •   [SPARK-7293] - Report memory used in aggregations and joins
  •   [SPARK-7368] - add QR decomposition for RowMatrix
  •   [SPARK-7387] - CrossValidator example code in Python
  •   [SPARK-7422] - Add argmax to Vector, SparseVector
  •   [SPARK-7440] - Remove physical Distinct operator in favor of Aggregate
  •   [SPARK-7547] - Example code for ElasticNet
  •   [SPARK-7604] - Python API for PCA and PCAModel
  •   [SPARK-7605] - Python API for ElementwiseProduct
  •   [SPARK-7639] - Add Python API for Statistics.kernelDensity
  •   [SPARK-7690] - MulticlassClassificationEvaluator for tuning Multiclass Classifiers
  •   [SPARK-7879] - KMeans API for spark.ml Pipelines
  •   [SPARK-7888] - Be able to disable intercept in Linear Regression in ML package
  •   [SPARK-7988] - Mechanism to control receiver scheduling
  •   [SPARK-8019] - [SparkR] Create worker R processes with a command other than Rscript
  •   [SPARK-8124] - Created more examples on SparkR DataFrames
  •   [SPARK-8129] - Securely pass auth secrets to executors in standalone cluster mode
  •   [SPARK-8169] - Add StopWordsRemover as a transformer (see the usage sketch after this list)
  •   [SPARK-8302] - Support heterogeneous cluster nodes on YARN
  •   [SPARK-8313] - Support Spark Packages containing R code with --packages
  •   [SPARK-8344] - Add internal metrics / logging for DAGScheduler to detect long pauses / blocking
  •   [SPARK-8348] - Add in operator to DataFrame Column
  •   [SPARK-8364] - Add crosstab to SparkR DataFrames
  •   [SPARK-8431] - Add in operator to DataFrame Column in SparkR
  •   [SPARK-8446] - Add helper functions for testing physical SparkPlan operators
  •   [SPARK-8456] - Python API for N-Gram Feature Transformer
  •   [SPARK-8479] - Add numNonzeros and numActives to linalg.Matrices
  •   [SPARK-8484] - Add TrainValidationSplit to ml.tuning
  •   [SPARK-8522] - Disable feature scaling in Linear and Logistic Regression
  •   [SPARK-8538] - LinearRegressionResults
  •   [SPARK-8539] - LinearRegressionSummary
  •   [SPARK-8551] - Python example code for elastic net
  •   [SPARK-8564] - Add the Python API for Kinesis
  •   [SPARK-8579] - Support arbitrary object in UnsafeRow
  •   [SPARK-8598] - Implementation of 1-sample, two-sided, Kolmogorov Smirnov Test for RDDs
  •   [SPARK-8600] - Naive Bayes API for spark.ml Pipelines
  •   [SPARK-8671] - Add isotonic regression to the pipeline API
  •   [SPARK-8704] - Add missing methods in StandardScaler (ML and PySpark)
  •   [SPARK-8706] - Implement Pylint / Prospector checks for PySpark
  •   [SPARK-8711] - Add additional methods to JavaModel wrappers in trees
  •   [SPARK-8774] - Add R model formula with basic support as a transformer
  •   [SPARK-8777] - Add random data generation test utilities to Spark SQL
  •   [SPARK-8782] - GenerateOrdering fails for NullType (i.e. ORDER BY NULL crashes)
  •   [SPARK-8798] - Allow additional uris to be fetched with mesos
  •   [SPARK-8807] - Add between operator in SparkR
  •   [SPARK-8847] - String concatenation with column in SparkR
  •   [SPARK-8867] - Show the UDF usage for user.
  •   [SPARK-8874] - Add missing methods in Word2Vec ML
  •   [SPARK-8882] - A New Receiver Scheduling Mechanism
  •   [SPARK-8936] - Hyperparameter estimation in LDA
  •   [SPARK-8967] - Implement @since as an annotation
  •   [SPARK-8996] - Add Python API for Kolmogorov-Smirnov Test
  •   [SPARK-9022] - UnsafeProject
  •   [SPARK-9023] - UnsafeExchange
  •   [SPARK-9024] - Unsafe HashJoin
  •   [SPARK-9028] - Add CountVectorizer as an estimator to generate CountVectorizerModel
  •   [SPARK-9112] - Implement LogisticRegressionSummary similar to LinearRegressionSummary
  •   [SPARK-9115] - date/time function: dayInYear
  •   [SPARK-9143] - Add planner rule for automatically inserting Unsafe <-> Safe row format converters
  •   [SPARK-9178] - UTF8String empty string method
  •   [SPARK-9201] - Integrate MLlib with SparkR using RFormula
  •   [SPARK-9230] - SparkR RFormula should support StringType features
  •   [SPARK-9231] - DistributedLDAModel method for top topics per document
  •   [SPARK-9245] - DistributedLDAModel predict top topic per doc-term instance
  •   [SPARK-9246] - DistributedLDAModel predict top docs per topic
  •   [SPARK-9263] - Add Spark Submit flag to exclude dependencies when using --packages
  •   [SPARK-9381] - Migrate JSON data source to the new partitioning data source
  •   [SPARK-9391] - Support minus, dot, and intercept operators in SparkR RFormula
  •   [SPARK-9440] - LocalLDAModel should save docConcentration, topicConcentration, and gammaShape
  •   [SPARK-9464] - Add property-based tests for UTF8String
  •   [SPARK-9471] - Multilayer perceptron classifier
  •   [SPARK-9544] - RFormula in Python
  •   [SPARK-9657] - PrefixSpan getMaxPatternLength should return an Int
  •   [SPARK-10106] - Add `ifelse` Column function to SparkR
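  Most of the tickets above are small, self-describing API additions. To make one concrete, here is a minimal sketch of the StopWordsRemover transformer added by SPARK-8169, written against the Spark 1.5 ml API; the object name, sample data and local[*] master are assumptions made for illustration, not part of the release notes.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.ml.feature.StopWordsRemover

object StopWordsRemoverSketch {
  def main(args: Array[String]): Unit = {
    // Local-mode context, for illustration only.
    val sc = new SparkContext(new SparkConf().setAppName("StopWordsRemoverSketch").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)

    // StopWordsRemover operates on a column of already-tokenized text.
    val df = sqlContext.createDataFrame(Seq(
      (0, Seq("I", "saw", "the", "red", "balloon")),
      (1, Seq("Mary", "had", "a", "little", "lamb"))
    )).toDF("id", "raw")

    // Drops common English stop words ("the", "a", "had", ...) from the
    // input column using the transformer's default stop-word list.
    val remover = new StopWordsRemover()
      .setInputCol("raw")
      .setOutputCol("filtered")

    remover.transform(df).show()
    sc.stop()
  }
}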
  Apache Spark is an open-source cluster computing environment similar to Hadoop, but with some useful differences that make it perform better on certain workloads: Spark enables in-memory distributed datasets, which, in addition to supporting interactive queries, also optimizes iterative workloads.
  Spark is implemented in Scala and uses Scala as its application framework. Unlike with Hadoop, Spark and Scala are tightly integrated, and Scala can manipulate distributed datasets as easily as local collection objects.
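  That collection-like style is easy to see in a few lines of Scala. The sketch below is a minimal illustration (the object name, sample data and local[*] master are assumptions, not from this announcement): a distributed dataset is transformed with the same combinators used on local collections, and cache() keeps it in memory so the repeated passes typical of iterative workloads avoid recomputation.

import org.apache.spark.{SparkConf, SparkContext}

object CollectionLikeSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CollectionLikeSketch").setMaster("local[*]"))

    // A distributed dataset driven with the same combinators (map,
    // filter, reduce, ...) you would apply to a local Scala collection.
    val lengths = sc.parallelize(Seq("spark", "scala", "hadoop", "mesos"))
      .map(_.length)
      .cache() // keep the dataset in memory for reuse across passes

    val total = lengths.reduce(_ + _) // first action computes and caches
    val longest = lengths.max()       // later actions reuse the cached data
    println(s"total=$total longest=$longest")

    sc.stop()
  }
}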
  Although Spark was created to support iterative jobs on distributed datasets, it is in practice complementary to Hadoop and can run in parallel on the Hadoop file system; this is made possible through a third-party cluster framework called Mesos. Spark was developed at UC Berkeley's AMP Lab (Algorithms, Machines, and People Lab) and can be used to build large-scale, low-latency data analytics applications.