Hadoop Architecture
Edge nodes are the
interface between the Hadoop cluster and the outside network. For this reason,
they’re sometimes referred to as gateway nodes. Most commonly,
edge nodes are used to run client applications and cluster administration
tools. They are also often used as staging areas for data being
transferred into the Hadoop cluster.
The NameNode is the master node and
stores only the metadata of HDFS: the directory tree of all files in the file
system and the location of each file's blocks across the cluster. It does not
store the file data itself.
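To make the idea of "metadata only" concrete, here is a minimal sketch of the two mappings described above. The class ToyNameNode and its method names are made up for illustration and are not part of any Hadoop API; the real NameNode is far more involved.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical in-memory model of NameNode metadata (not a Hadoop API)."""
    # File path -> ordered list of block IDs that make up the file.
    namespace: dict = field(default_factory=dict)
    # Block ID -> set of DataNode names that hold a replica of the block.
    block_locations: dict = field(default_factory=dict)

    def add_file(self, path, blocks):
        """Register a file. `blocks` maps block ID -> iterable of DataNode names."""
        self.namespace[path] = list(blocks)
        for block_id, nodes in blocks.items():
            self.block_locations[block_id] = set(nodes)

    def locate(self, path):
        """Return [(block_id, sorted DataNode names)] for a file, in block order."""
        return [(b, sorted(self.block_locations[b])) for b in self.namespace[path]]
```

A client would ask this toy NameNode where the blocks of a file live and then read the bytes from the DataNodes directly; no file content ever passes through the metadata structure.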
The DataNode is responsible
for storing the actual data in HDFS. DataNodes and the
NameNode are in constant communication via heartbeats. When a DataNode goes down,
it does not affect the availability of data or the cluster, as long as its
blocks are replicated on other DataNodes. The NameNode will
arrange re-replication of the blocks that were held by the DataNode that is no
longer available.
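The heartbeat and re-replication behaviour can be sketched roughly as follows. This is a simplified model, not HDFS code: the function names are hypothetical, the 30-second timeout is an illustrative value rather than the HDFS default, and the target selection ignores rack awareness and disk usage. The replication factor of 3 matches the HDFS default.

```python
REPLICATION_FACTOR = 3     # HDFS default replication factor
HEARTBEAT_TIMEOUT = 30.0   # seconds; illustrative value, NOT the HDFS default


def find_dead_nodes(last_heartbeat, now, timeout=HEARTBEAT_TIMEOUT):
    """Return the DataNodes whose last heartbeat is older than `timeout`."""
    return {node for node, seen in last_heartbeat.items() if now - seen > timeout}


def plan_rereplication(block_locations, live_nodes, dead_nodes,
                       target=REPLICATION_FACTOR):
    """For each under-replicated block, pick live nodes to receive new replicas.

    Returns {block_id: [target DataNode names]}. Blocks with no surviving
    replica are skipped because there is nothing left to copy from.
    """
    plan = {}
    for block_id, holders in block_locations.items():
        survivors = holders - dead_nodes
        missing = target - len(survivors)
        if missing > 0 and survivors:
            # Simplification: choose targets alphabetically among live nodes
            # that do not already hold the block.
            candidates = sorted(live_nodes - survivors)
            plan[block_id] = candidates[:missing]
    return plan
```

With four DataNodes where one stops sending heartbeats, only the blocks that had a replica on the failed node show up in the plan; fully replicated blocks are left alone.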
The Secondary
NameNode periodically reads the file system edit log
and applies the changes to the fsimage file, thus bringing it up to date. This allows
the NameNode to start up faster next time. (It is not a backup of the primary
NameNode.)
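The checkpoint step described above amounts to replaying a log of changes onto a snapshot. The sketch below models the fsimage as a plain dictionary and the edit log as a list of tuples; the operation names ("create", "delete", "rename") and both function names are assumptions chosen for illustration and do not reflect the real on-disk formats.

```python
def apply_edit(fsimage, edit):
    """Apply one logged change to the in-memory image (mutates `fsimage`)."""
    op, path = edit[0], edit[1]
    if op == "create":
        fsimage[path] = edit[2]               # edit[2] is the list of block IDs
    elif op == "delete":
        fsimage.pop(path, None)
    elif op == "rename":
        fsimage[edit[2]] = fsimage.pop(path)  # edit[2] is the new path
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage, edit_log):
    """Merge the edit log into a copy of the image.

    Returns (new_fsimage, new_edit_log). The new log is empty because every
    logged change is now reflected in the image.
    """
    merged = dict(fsimage)
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged, []
```

The faster startup follows from the empty log: a NameNode that loads the merged image has no backlog of edits to replay before it can serve requests.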
JobTracker maintains a view
of all available processing resources in the Hadoop cluster and, as application
requests come in, it schedules and deploys them to the TaskTracker nodes for
execution.
The TaskTracker manages the processing resources on each slave node in the
form of processing slots, specifically the slots defined for map tasks and
reduce tasks. The total number of map and reduce slots determines how many map and
reduce tasks can be executed at one time on that slave node.
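The slot model used by the JobTracker and TaskTrackers can be illustrated with a small simulation. ToyTaskTracker and assign_tasks are hypothetical names, and the first-fit placement is a deliberate simplification: the real JobTracker hands out tasks in response to TaskTracker heartbeats and also considers data locality.

```python
from dataclasses import dataclass


@dataclass
class ToyTaskTracker:
    """Hypothetical model of one slave node's slots (not a Hadoop API)."""
    name: str
    map_slots: int
    reduce_slots: int
    running_maps: int = 0
    running_reduces: int = 0

    def free_slots(self, kind):
        """Number of unused slots of the given kind ("map" or "reduce")."""
        if kind == "map":
            return self.map_slots - self.running_maps
        if kind == "reduce":
            return self.reduce_slots - self.running_reduces
        raise ValueError(f"unknown task kind: {kind}")

    def launch(self, kind):
        """Occupy one slot of the given kind, or fail if none is free."""
        if self.free_slots(kind) <= 0:
            raise RuntimeError(f"{self.name} has no free {kind} slot")
        if kind == "map":
            self.running_maps += 1
        else:
            self.running_reduces += 1


def assign_tasks(trackers, pending):
    """First-fit assignment of pending tasks to trackers with a free slot.

    `pending` is a list of "map"/"reduce" strings.
    Returns (assignments, still_waiting).
    """
    assignments, waiting = [], []
    for kind in pending:
        tracker = next((t for t in trackers if t.free_slots(kind) > 0), None)
        if tracker is None:
            waiting.append(kind)   # cluster is out of slots of this kind
        else:
            tracker.launch(kind)
            assignments.append((kind, tracker.name))
    return assignments, waiting
```

With two trackers offering three map slots in total, a job with four map tasks leaves one task waiting until a slot frees up, which is exactly the limit the slot counts impose.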