Hadoop Architecture
Edge nodes are the interface between the Hadoop cluster and the outside network, which is why they are sometimes called gateway nodes. Most commonly, they run client applications and cluster administration tools. They are also often used as staging areas for data being transferred into the Hadoop system.

The NameNode is the master and stores only HDFS metadata: the directory tree of all files in the file system, plus a record of where the data for those files is kept across the cluster. DataNodes store the actual data in HDFS. Each DataNode stays in constant communication with the NameNode via heartbeats. When a DataNode goes down, the availability of the data and the cluster is not affected, because the NameNode arranges for the blocks held by the unavailable DataNode to be re-replicated from their remaining copies.

The Secondary NameNode periodically reads the file system change log (the edit log) and applies it to the fsimage file, bringing it up to date. This allows the NameNode to sta