Hadoop Architecture

Edge nodes are the interface between the Hadoop cluster and the outside network, which is why they are sometimes called gateway nodes. They are most commonly used to run client applications and cluster administration tools, and they often serve as staging areas for data being transferred into the Hadoop cluster.

The NameNode is the master node of HDFS and stores only metadata: the directory tree of all files in the file system and the location of each file's blocks across the cluster. It does not store the file contents themselves.
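To make the "metadata only" idea concrete, here is a minimal sketch of the two lookups a NameNode has to answer. It is a toy model, not Hadoop code: the class `ToyNameNode`, its methods, and the block and node names are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical toy model: holds metadata only, never file contents."""
    # file path -> ordered list of block IDs
    file_blocks: dict = field(default_factory=dict)
    # block ID -> set of DataNode names holding a replica
    block_locations: dict = field(default_factory=dict)

    def add_file(self, path, placements):
        """Register a file; placements maps block ID -> DataNode names."""
        self.file_blocks[path] = list(placements)
        for block_id, nodes in placements.items():
            self.block_locations[block_id] = set(nodes)

    def list_dir(self, directory):
        """Return all file paths found anywhere under a directory."""
        prefix = directory.rstrip("/") + "/"
        return sorted(p for p in self.file_blocks if p.startswith(prefix))

    def locate(self, path):
        """Return (block ID, sorted DataNode names) pairs for a file."""
        return [(b, sorted(self.block_locations[b])) for b in self.file_blocks[path]]


nn = ToyNameNode()
nn.add_file("/logs/app.log", {"blk_1": ["dn1", "dn2", "dn3"],
                              "blk_2": ["dn2", "dn3", "dn4"]})
nn.add_file("/data/users.csv", {"blk_3": ["dn1", "dn3", "dn4"]})

print(nn.list_dir("/logs"))        # files under /logs
print(nn.locate("/logs/app.log"))  # where each block of the file lives
```

A client would use the answer from `locate` to read the blocks directly from the DataNodes, so the file data never passes through the master.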

The DataNode stores the actual data in HDFS. DataNodes and the NameNode are in constant communication through heartbeats. When a DataNode goes down, the availability of the data and of the cluster is not affected, as long as its blocks are also replicated on other DataNodes. The NameNode then arranges re-replication of the blocks that were held by the unavailable DataNode.
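The heartbeat and re-replication logic described above can be sketched roughly as follows. This is a simplified illustration, not the real HDFS algorithm: the function names, the timeout value, and the replication target are assumptions chosen for the example, and real block placement also takes racks and disk usage into account.

```python
HEARTBEAT_TIMEOUT = 30  # seconds; illustrative value, not a Hadoop default
TARGET_REPLICAS = 3     # assumed replication factor for this sketch


def dead_nodes(last_heartbeat, now):
    """Return DataNodes whose last heartbeat is older than the timeout."""
    return {n for n, t in last_heartbeat.items() if now - t > HEARTBEAT_TIMEOUT}


def plan_replication(block_locations, last_heartbeat, now):
    """Return {block: [new target nodes]} for under-replicated blocks."""
    dead = dead_nodes(last_heartbeat, now)
    live = sorted(set(last_heartbeat) - dead)
    plan = {}
    for block, holders in sorted(block_locations.items()):
        surviving = set(holders) - dead
        missing = TARGET_REPLICAS - len(surviving)
        # A block can only be copied if at least one replica survives.
        if missing > 0 and surviving:
            candidates = [n for n in live if n not in surviving]
            plan[block] = candidates[:missing]
    return plan


# dn3 last reported at t=40, so at t=100 it is considered dead.
last_heartbeat = {"dn1": 100, "dn2": 98, "dn3": 40, "dn4": 99}
block_locations = {
    "blk_1": {"dn1", "dn2", "dn3"},
    "blk_2": {"dn1", "dn2", "dn4"},
}
plan = plan_replication(block_locations, last_heartbeat, now=100)
print(plan)  # {'blk_1': ['dn4']}
```

Only `blk_1` lost a replica, so it is the only block scheduled for copying; `blk_2` still has three live replicas and is left alone.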

The Secondary NameNode periodically reads the file system's edit log and applies the changes to the fsimage file, bringing it up to date. This allows the NameNode to start up faster the next time. (It is not a backup of the primary NameNode.)
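The checkpoint step can be pictured as replaying a list of logged changes on top of the last snapshot. The sketch below assumes a made-up in-memory format (a dict for the fsimage and tuples for edit log entries); the real fsimage and edit log are binary files with many more operation types.

```python
def apply_edits(fsimage, edit_log):
    """Return a new namespace snapshot with every logged change applied."""
    image = dict(fsimage)  # copy so the previous checkpoint is left untouched
    for op, path, *args in edit_log:
        if op == "create":
            image[path] = args[0]             # args[0] is the file's block list
        elif op == "delete":
            image.pop(path, None)
        elif op == "rename":
            image[args[0]] = image.pop(path)  # args[0] is the new path
        else:
            raise ValueError(f"unknown operation: {op}")
    return image


fsimage = {"/a.txt": ["blk_1"]}
edit_log = [
    ("create", "/b.txt", ["blk_2", "blk_3"]),
    ("rename", "/a.txt", "/c.txt"),
    ("delete", "/b.txt"),
]

checkpoint = apply_edits(fsimage, edit_log)
edit_log = []  # edits already folded into the checkpoint need no replay
print(checkpoint)  # {'/c.txt': ['blk_1']}
```

The faster startup follows from the last two lines: after a checkpoint the NameNode loads a current snapshot and only has to replay the few edits logged since, rather than the whole history.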

JobTracker maintains a view of all available processing resources in the Hadoop cluster and, as application requests come in, it schedules and deploys them to the TaskTracker nodes for execution.

The TaskTracker manages the processing resources on each slave node in the form of processing slots, which are defined separately for map tasks and for reduce tasks. The total number of map and reduce slots determines how many map and reduce tasks can run at the same time on that slave node.
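Here is a small sketch of how slot-based scheduling limits concurrency. It is a deliberately naive model: the `schedule` function and the tracker names are hypothetical, and it picks the first tracker with a free slot, whereas the real JobTracker also tries to place tasks close to their input data.

```python
def schedule(tasks, free_slots):
    """Assign tasks to trackers with a free slot of the matching kind.

    tasks: list of (task_id, kind), where kind is "map" or "reduce".
    free_slots: {tracker: {"map": n, "reduce": m}}; left unmodified.
    Returns (assignments, pending task IDs).
    """
    slots = {t: dict(s) for t, s in free_slots.items()}  # working copy
    assignments, pending = [], []
    for task_id, kind in tasks:
        # First tracker (by name) that still has a free slot of this kind.
        tracker = next((t for t in sorted(slots) if slots[t][kind] > 0), None)
        if tracker is None:
            pending.append(task_id)  # must wait until a slot frees up
        else:
            slots[tracker][kind] -= 1
            assignments.append((task_id, tracker))
    return assignments, pending


free_slots = {
    "tt1": {"map": 2, "reduce": 1},
    "tt2": {"map": 1, "reduce": 1},
}
tasks = [("m1", "map"), ("m2", "map"), ("m3", "map"), ("m4", "map"), ("r1", "reduce")]

assignments, pending = schedule(tasks, free_slots)
print(assignments)  # [('m1', 'tt1'), ('m2', 'tt1'), ('m3', 'tt2'), ('r1', 'tt1')]
print(pending)      # ['m4']
```

The cluster has three map slots in total, so the fourth map task has to wait even though reduce slots are still free: a map task cannot run in a reduce slot.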
