o2geao posted on 2016-12-7 06:23:07

Cloudera Developer Training for Apache Hadoop

  · The Motivation For Hadoop
· Problems with traditional large-scale systems
· Requirements for a new approach
  · Hadoop Basic Concepts
· An Overview of Hadoop
· The Hadoop Distributed File System
· How MapReduce Works
· Anatomy of a Hadoop Cluster
· Other Hadoop Ecosystem Components
  · Writing a MapReduce Program
· The MapReduce Flow
· Examining a Sample MapReduce Program
· Basic MapReduce API Concepts
· The Driver Code
· The Mapper
· The Reducer
· Hadoop’s Streaming API
· Using Eclipse for Rapid Development
  · Integrating Hadoop Into The Workflow
· Relational Database Management Systems
· Storage Systems
· Creating workflows with Oozie
· Importing Data from RDBMSs With Sqoop
· Importing Real-Time Data with Flume
· Accessing HDFS Using FuseDFS and Hoop
  · Delving Deeper Into The Hadoop API
· Using Combiners
· Using LocalJobRunner Mode for Faster Development
· Reducing Intermediate Data with Combiners
· The configure and close methods for MapReduce Setup and Teardown
· Writing Partitioners for Better Load Balancing
· Directly Accessing HDFS
· Using The Distributed Cache
  · Using Hive and Pig
· Hive Basics
· Pig Basics
  · Common MapReduce Algorithms
· Sorting and Searching
· Indexing
· Machine Learning with Mahout
· Term Frequency - Inverse Document Frequency
· Word Co-Occurrence
  · Practical Development Tips and Techniques
· Testing with MRUnit
· Debugging MapReduce Code
· Using LocalJobRunner Mode for Easier Debugging
· Eclipse development techniques
· Retrieving Job Information with Counters
· Logging
· Splittable File Formats
· Determining the Optimal Number of Reducers
· Map-Only MapReduce Jobs
· Implementing Multiple Mappers using ChainMapper
  · More Advanced MapReduce Programming
· Custom Writables and WritableComparables
· Saving Binary Data using SequenceFiles and Avro Files
· Creating InputFormats and OutputFormats
  · Joining Data Sets in MapReduce Jobs
· Map-Side Joins
· The Secondary Sort
· Reduce-Side Joins
  · Graph Manipulation in Hadoop
· Introduction to graph techniques
· Representing Graphs in Hadoop
· Implementing a sample algorithm: Single Source Shortest Path
  · Creating Workflows with Oozie
· The Motivation for Oozie
· Oozie’s Workflow Definition Format