Hadoop是Apache提出的一个软件框架（即：开放源码并行运算编程工具和分布式文件系统，与MapReduce和Google档案系统的概念类似）

lanying56123 · 发表于 2016-12-13 07:15:07

　　Apache Hadoopis asoftware frameworkthat supports data-intensivedistributed applicationsunder afree license.[1]It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired byGoogle'sMapReduceandGoogle File System(GFS) papers.
　　Hadoop is a top-levelApacheproject being built and used by a global community of contributors,[2]using theJavaprogramming language.Yahoo!has been the largest contributor[3]to the project, and uses Hadoop extensively across its businesses.[4]
　　Hadoop was created byDoug Cutting,[5]who named it after his son's toy elephant.[6]It was originally developed to support distribution for theNutchsearch engine project.[7]

Contents
[hide]

1Architecture
- 1.1Filesystems
  - 1.1.1Hadoop Distributed File System
  - 1.1.2Other Filesystems
- 1.2Job Tracker and Task Tracker: the MapReduce engine
  - 1.2.1Scheduling
    - 1.2.1.1Fair scheduler
    - 1.2.1.2Capacity scheduler
- 1.3Other applications
2Prominent users
- 2.1Yahoo!
- 2.2Other users
3Hadoop on Amazon EC2/S3 services
4Hadoop at Google and IBM
5Running Hadoop in compute farm environments
- 5.1Grid Engine Integration
- 5.2Condor Integration
6Commercially supported Hadoop-related products
7See also
8References
9Bibliography
10External links

　　http://hadoop.apache.org/

What Is Apache Hadoop?
Who Uses Hadoop?
News
- March 2011 - Apache Hadoop takes top prize at Media Guardian Innovation Awards
- January 2011 - ZooKeeper Graduates
- September 2010 - Hive and Pig Graduate
- May 2010 - Avro and HBase Graduate
- July 2009 - New Hadoop Subprojects
- March 2009 - ApacheCon EU
- November 2008 - ApacheCon US
- July 2008 - Hadoop Wins Terabyte Sort Benchmark

What Is Apache Hadoop?
　　The Apache™ Hadoop™ project develops open-source software for reliable, scalable, distributed computing.
　　The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-avaiability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-availabile service on top of a cluster of computers, each of which may be prone to failures.
　　The project includes these subprojects:

Hadoop Common: The common utilities that support the other Hadoop subprojects.
Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
Hadoop MapReduce: A software framework for distributed processing of large data sets on compute clusters.

　　Other Hadoop-related projects at Apache include:

Avro™: A data serialization system.
Cassandra™: A scalable multi-master database with no single points of failure.
Chukwa™: A data collection system for managing large distributed systems.
HBase™: A scalable, distributed database that supports structured data storage for large tables.
Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
Mahout™: A Scalable machine learning and data mining library.
Pig™: A high-level data-flow language and execution framework for parallel computation.
ZooKeeper™: A high-performance coordination service for distributed applications.

Who Uses Hadoop?
　　A wide variety of companies and organizations use Hadoop for both research and production. Users are encouraged to add themselves to the HadoopPoweredBywiki page.

http://www.oschina.net/p/hadoop
　　Hadoop并不仅仅是一个用于存储的分布式文件系统，而是设计用来在由通用计算设备组成的大型集群上执行分布式应用的框架。
　　下图是Hadoop的体系结构：

账号		自动登录	找回密码
密码			立即注册

Centos6.5×64安装配置openmeetings3.0.3详

大疆运维招人啦，

C++ :try 语句块和异常处理

C++的多态

Red Hat RHCE 8 (EX294) Cert Guide

Java/C++ 区别：看完这一篇，就够用！

别再用过时库了！这 13 个顶级 C++ 库才是

[经验分享] Hadoop是Apache提出的一个软件框架（即：开放源码并行运算编程工具和分布式文件系统，与MapReduce和Google档案系统的概念类似）

浏览过的版块

扫码加入运维网微信交流群