Hadoop Installation for Windows – Brain Mentors

Hadoop Installation for Windows

This blog post explains how to download and install Hadoop on Windows.

Steps to Install Hadoop

  • Install Java JDK 1.8
  • Download Hadoop, extract it and place it under the C drive
  • Set the Path in Environment Variables
  • Edit the config files under the Hadoop directory
  • Create the folders datanode and namenode under the data directory
  • Edit the HDFS and YARN files
  • Set the Java home in the Hadoop environment file
  • Setup complete. Test by executing start-all.cmd

There are two ways to install Hadoop:

  • Single node
  • Multi node

A single node cluster has only one DataNode, and all the daemons (NameNode, DataNode, ResourceManager and NodeManager) are set up on a single machine.

It is used for studying and testing purposes.

  • For example, to test whether Oozie jobs have scheduled all the processes (collecting, aggregating, storing and processing the data) in the proper sequence, we use a single node cluster.
  • It lets us test the sequential workflow easily and efficiently in a small environment, instead of a large environment that contains terabytes of data distributed across hundreds of machines.

 

In a multi node cluster, more than one DataNode is running, and each DataNode runs on a different machine. Multi node clusters are what organizations use in practice for analyzing Big Data. When we deal with petabytes of data in production, the data needs to be distributed across hundreds of machines to be processed, so a multi node cluster is used.

Setting up a single node Hadoop cluster

Prerequisites to install Hadoop on Windows

  • VIRTUAL BOX (for Linux only): it is used for installing the guest operating system.
  • OPERATING SYSTEM: You can install Hadoop on Windows or Linux based operating systems. On Linux, Ubuntu and CentOS are very commonly used.
  • JAVA: You need to install the Java 8 package on your system.
  • HADOOP: You need a recent Hadoop release (the paths in this guide use Hadoop 3.3.0).

 

  1. Install Java

– Download the Java JDK from this link:

https://www.oracle.com/java/technologies/javase-jdk8-downloads.html

– Extract and install Java in C:\Java

– To verify the installation, open cmd and type: javac -version
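
The version check above can also be scripted. The following is a minimal Python sketch, not part of the original tutorial: the helper names (parse_javac_version, is_java8, javac_output) are hypothetical, and it assumes that JDK 8 reports itself as "javac 1.8.x".

```python
import re
import subprocess


def parse_javac_version(output):
    """Return (major, minor) from text such as 'javac 1.8.0_281', or None."""
    match = re.search(r"javac\s+(\d+)(?:\.(\d+))?", output)
    if not match:
        return None
    return int(match.group(1)), int(match.group(2) or 0)


def is_java8(output):
    # JDK 8 identifies itself as 1.8.x; later JDKs print 9, 11, 17 and so on.
    return parse_javac_version(output) == (1, 8)


def javac_output():
    # JDK 8 writes the version to stderr, newer JDKs to stdout, so read both.
    result = subprocess.run(["javac", "-version"], capture_output=True, text=True)
    return result.stdout + result.stderr
```

Calling is_java8(javac_output()) on a machine where javac is on the Path should return True when the JDK from this step is the one being picked up.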

  2. Set the JAVA_HOME environment variable to the Java folder (C:\Java in this guide)
  3. Set the HADOOP_HOME environment variable to the Hadoop folder (C:\Hadoop-3.3.0 in this guide)
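
To double-check the environment variables after setting them, a small Python sketch like the one below can help. It is only an illustration: check_hadoop_env is a hypothetical helper, and the rule that Path should contain %HADOOP_HOME%\bin is an assumption based on common setups, not something this guide states explicitly.

```python
import ntpath


def check_hadoop_env(env):
    """Return a list of problems found in a mapping of environment variables."""
    problems = []
    for name in ("JAVA_HOME", "HADOOP_HOME"):
        if not env.get(name):
            problems.append(f"{name} is not set")

    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home:
        def norm(path):
            # Windows paths are case-insensitive, so normalise before comparing.
            return ntpath.normcase(ntpath.normpath(path))

        # The variable may be spelled Path or PATH depending on how it was read.
        path_value = next((v for k, v in env.items() if k.upper() == "PATH"), "")
        entries = [norm(p) for p in path_value.split(";") if p]
        # Assumption: the Hadoop bin folder is expected on the Path.
        if norm(ntpath.join(hadoop_home, "bin")) not in entries:
            problems.append("Path does not contain %HADOOP_HOME%\\bin")
    return problems
```

On a real machine the function can be called as check_hadoop_env(dict(os.environ)) after importing os; an empty list means nothing was flagged.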
  4. Configurations

Edit the file C:/Hadoop-3.3.0/etc/hadoop/core-site.xml,

paste the following XML into it and save the file.

 

<configuration>
   <property>
       <name>fs.defaultFS</name>
       <value>hdfs://localhost:9000</value>
   </property>
</configuration>
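
A quick way to confirm that an edited file is well-formed and holds the intended value is to parse it. The Python sketch below is an illustration only: read_site_properties is a hypothetical helper, and the XML is embedded as a string here rather than read from the real core-site.xml.

```python
import xml.etree.ElementTree as ET


def read_site_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)  # raises ParseError if the XML is malformed
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


CORE_SITE = """<configuration>
   <property>
       <name>fs.defaultFS</name>
       <value>hdfs://localhost:9000</value>
   </property>
</configuration>"""

print(read_site_properties(CORE_SITE)["fs.defaultFS"])  # hdfs://localhost:9000
```

The same function works for the other *-site.xml files in this guide, since they all share the configuration/property/name/value layout.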

======================================================

 

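If your Hadoop folder contains “mapred-site.xml.template” (older Hadoop 2.x releases), rename it to “mapred-site.xml”; Hadoop 3.3.0 already ships “mapred-site.xml”. Then edit the file C:/Hadoop-3.3.0/etc/hadoop/mapred-site.xml, paste the following XML into it and save the file.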

 

<configuration>
   <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
   </property>
</configuration>

======================================================

 

Create folder “data” under “C:\Hadoop-3.3.0”

Create folder “datanode” under “C:\Hadoop-3.3.0\data”

Create folder “namenode” under “C:\Hadoop-3.3.0\data”
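
The three folders can also be created in one go. This is a minimal Python sketch, assuming the Hadoop folder from this guide; create_data_dirs is a hypothetical helper name.

```python
from pathlib import Path


def create_data_dirs(hadoop_home):
    """Create data/namenode and data/datanode under hadoop_home; return their paths."""
    created = []
    for name in ("namenode", "datanode"):
        folder = Path(hadoop_home) / "data" / name
        # parents=True also creates the "data" folder; exist_ok=True makes reruns safe.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

For the layout used here the call would be create_data_dirs(r"C:\Hadoop-3.3.0").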

 

======================================================

Edit the file C:/Hadoop-3.3.0/etc/hadoop/hdfs-site.xml,

paste the following XML into it and save the file.

 

<configuration>
   <property>
       <name>dfs.replication</name>
       <value>1</value>
   </property>
   <property>
       <name>dfs.namenode.name.dir</name>
       <value>/hadoop-3.3.0/data/namenode</value>
   </property>
   <property>
       <name>dfs.datanode.data.dir</name>
       <value>/hadoop-3.3.0/data/datanode</value>
   </property>
</configuration>
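
Because every Hadoop config file repeats the same property/name/value pattern, the XML can be generated instead of typed by hand. The Python sketch below is one possible way to do it: render_site_xml is a hypothetical helper, and it only builds the text, leaving the writing to disk to the reader.

```python
from xml.sax.saxutils import escape


def render_site_xml(properties):
    """Build the text of a Hadoop *-site.xml file from a {name: value} mapping."""
    lines = ["<configuration>"]
    for name, value in properties.items():
        lines += [
            "   <property>",
            f"       <name>{escape(name)}</name>",
            # escape() protects characters such as & and < inside values.
            f"       <value>{escape(str(value))}</value>",
            "   </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


hdfs_site = render_site_xml({
    "dfs.replication": 1,
    "dfs.namenode.name.dir": "/hadoop-3.3.0/data/namenode",
    "dfs.datanode.data.dir": "/hadoop-3.3.0/data/datanode",
})
print(hdfs_site)
```

The values passed in are the ones from the hdfs-site.xml listing in this step; nothing else is added to the file.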

======================================================

 

Edit the file C:/Hadoop-3.3.0/etc/hadoop/yarn-site.xml,

paste the following XML into it and save the file.

 

<configuration>
   <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
   </property>
   <property>
       <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
       <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>
</configuration>

======================================================

 

Edit the file C:/Hadoop-3.3.0/etc/hadoop/hadoop-env.cmd

by changing the line

“set JAVA_HOME=%JAVA_HOME%” to “set JAVA_HOME=C:\Java”
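
The same edit can be applied programmatically. The following Python sketch assumes hadoop-env.cmd contains a line that starts with "set JAVA_HOME="; set_java_home is a hypothetical helper that works on the file's text rather than on the file itself.

```python
import re


def set_java_home(cmd_text, java_home):
    """Return cmd_text with its first 'set JAVA_HOME=...' line pointing at java_home."""
    # [^\r\n]* stops before the line ending, so Windows CRLF endings are preserved.
    pattern = re.compile(r"^([ \t]*set[ \t]+JAVA_HOME=)[^\r\n]*",
                         re.IGNORECASE | re.MULTILINE)
    if not pattern.search(cmd_text):
        raise ValueError("no 'set JAVA_HOME=' line found")
    # A function replacement keeps backslashes in the path from being read as escapes.
    return pattern.sub(lambda m: m.group(1) + java_home, cmd_text, count=1)


original = "@echo off\r\nset JAVA_HOME=%JAVA_HOME%\r\n"
print(set_java_home(original, r"C:\Java"))
```

Lines that are commented out with @rem are left alone, because the pattern only matches lines that begin with "set".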

 

======================================================

  5. Hadoop Configurations

Download

https://github.com/brainmentorspvtltd/BigData_RDE/blob/master/Hadoop%20Configuration.zip

or (for hadoop 3)

https://github.com/s911415/apache-hadoop-3.1.0-winutils

– Copy the bin folder from the download and replace the existing bin folder at

C:\Hadoop-3.3.0\bin

– Format the NameNode

– Open cmd and type the command “hdfs namenode -format”
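
If the format command fails, one thing worth checking is whether the replaced bin folder really contains the Windows native files. The Python sketch below is an assumption-laden illustration: missing_native_files is a hypothetical helper, and the list of required files (winutils.exe and hadoop.dll) is an assumed minimum, not something taken from this guide.

```python
from pathlib import Path

# Assumption: these two files are the minimum the Windows binaries should provide.
REQUIRED_FILES = ("winutils.exe", "hadoop.dll")


def missing_native_files(bin_dir):
    """Return the names from REQUIRED_FILES that are not present in bin_dir."""
    folder = Path(bin_dir)
    if not folder.is_dir():
        return list(REQUIRED_FILES)
    # Compare in lower case because Windows file names are case-insensitive.
    present = {p.name.lower() for p in folder.iterdir() if p.is_file()}
    return [name for name in REQUIRED_FILES if name not in present]
```

For this guide the call would be missing_native_files(r"C:\Hadoop-3.3.0\bin"); an empty list means both files were found.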

  6. Testing

– Open cmd and change directory to C:\Hadoop-3.3.0\sbin

– Type start-all.cmd

(Alternatively, you can start the services separately)

– Start the NameNode and DataNode with this command

– Type start-dfs.cmd

– Start YARN with this command

– Type start-yarn.cmd

 

Make sure these four daemons are running (each one opens in its own command window)

– Hadoop NameNode

– Hadoop DataNode

– YARN ResourceManager

– YARN NodeManager
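
Besides looking at the windows, the running daemons can be listed with the JDK's jps tool, which prints one "pid name" pair per line. The Python sketch below parses such output; missing_daemons is a hypothetical helper, and the sample text is made up for illustration rather than captured from a real run.

```python
EXPECTED_DAEMONS = ("NameNode", "DataNode", "ResourceManager", "NodeManager")


def missing_daemons(jps_output):
    """Return the expected Hadoop daemons that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            running.add(parts[1])  # parts[0] is the process id
    return [name for name in EXPECTED_DAEMONS if name not in running]


# Made-up sample in the shape jps prints; real process ids will differ.
sample = "4120 NameNode\n7312 DataNode\n9044 Jps\n"
print(missing_daemons(sample))  # ['ResourceManager', 'NodeManager']
```

An empty list means all four daemons from the list above were found.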

======================================================

Hadoop is installed successfully.

======================================================