$ ssh root@IP_Address -p Port_number

Replace "root" with a user that has sudo privileges, if necessary. Additionally, replace "IP_Address" and "Port_number" with your server's respective IP address and SSH port number.
You can check whether you have the proper Debian version installed on your server with the following command:

$ lsb_release -a

You should get this output:
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye

Before starting, you have to make sure that all OS packages installed on the server are up to date. You can do this by running the following commands:
$ sudo apt update
$ sudo apt upgrade -y

Step 2. Create a System User and Generate SSH Key

It is not a good idea to run Hadoop as root, so for security reasons, we will create a new system user:
$ sudo useradd -r hadoop -m -d /opt/hadoop --shell /bin/bash

This creates a system account ('-r') named 'hadoop', with its home directory ('-m -d') set to /opt/hadoop and bash as its login shell. Now let's log in as that user:
$ su - hadoop

Hadoop requires SSH access to manage its nodes, whether remote or local. To access the nodes without a password, we can generate an SSH key and copy the public key to the ~/.ssh/authorized_keys file:
$ ssh-keygen -t rsa

You will get output like this:

hadoop@debian11:~$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/opt/hadoop/.ssh/id_rsa):
Created directory '/opt/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /opt/hadoop/.ssh/id_rsa
Your public key has been saved in /opt/hadoop/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:QYHlb6Is9n05OtnR+6i71t4MZeN9gVqGVCoa28aiUXg hadoop@debian11.rosehosting.com
The key's randomart image is:
+---[RSA 3072]----+
| o+. . |
| oo o |
| . Eo. o |
| o *oo . . |
| . +S+oo ++. |
| .o.oo. =+ o.|
| o.o o =... o|
| . o .o * o= .|
| . o=+*o.+ |
+----[SHA256]-----+

Next, let's add hadoop's public key to the authorized keys file, to allow the 'hadoop' user to log in to the system without a password, using only the SSH key:
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
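Depending on your umask, the new authorized_keys file can end up with permissions looser than sshd's StrictModes check allows. Tightening the permissions is a safe precaution:

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys

Now log in to the system through SSH: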
$ ssh localhost

Accept the host key fingerprint if you are prompted on the first connection. You should now be able to log in over SSH without a password. Let's exit from user 'hadoop' and then continue to the next step:
$ exit

Step 3. Install Java

Hadoop is written in Java, so we need Java on our system to be able to run Hadoop. Let's run the command below to install the default JDK and JRE from the repository:
$ sudo apt install default-jdk default-jre -y

Java should be installed now. You can verify it by invoking this command:
$ java -version

Step 4. Download and Install Hadoop

In this tutorial, we will install Hadoop version 3.2.3. You can go to the download page at https://hadoop.apache.org/releases.html to check for a more recent version, if any.
Let's log in as user 'hadoop' to download and extract it, so we do not need to change the file and directory ownership afterward.
$ su - hadoop
$ wget https://dlcdn.apache.org/hadoop/common/hadoop-3.2.3/hadoop-3.2.3.tar.gz -O hadoop-3.2.3.tar.gz
$ tar -xzvf hadoop-3.2.3.tar.gz -C /opt/hadoop --strip-components=1

Before continuing to the next steps, make sure JAVA_HOME is pointing to the correct directory. You can check this by listing /usr/lib/jvm:

$ ls /usr/lib/jvm
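If you are not sure which entry in that directory is the active Java installation, resolving the java binary's symlink is a quick way to find out; everything before the trailing /bin/java is the value JAVA_HOME should hold. On Debian 11 with the default JDK, this typically prints a path like /usr/lib/jvm/java-11-openjdk-amd64/bin/java:

$ readlink -f /usr/bin/java

Now, let's edit /opt/hadoop/.bashrc: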
$ nano /opt/hadoop/.bashrc

Insert the following lines into the file:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

Save the file and exit, then run the command below to activate the newly added environment variables:
$ source ~/.bashrc
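As a quick sanity check (an optional step), you can ask the hadoop binary to report its version; if the PATH additions took effect, this should print the 3.2.3 release you just extracted:

$ hadoop version

Step 5. Configure Hadoop

Hadoop can be configured to run on a single node or in a multi-node cluster. In this tutorial, we will show you how to set up a Hadoop single-node cluster, also known as pseudo-distributed mode. There are some files we need to modify in this step; let's edit the Hadoop environment file first.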
$ nano /opt/hadoop/etc/hadoop/hadoop-env.sh

Add the following line to the file:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64

Next, edit the core-site.xml file:
$ nano /opt/hadoop/etc/hadoop/core-site.xml

Add these lines inside the <configuration> tag:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>

Note that fs.default.name is a deprecated alias; newer Hadoop releases prefer the name fs.defaultFS, and either will work here.
Edit the hdfs-site.xml file:

$ nano /opt/hadoop/etc/hadoop/hdfs-site.xml

Add these lines inside the <configuration> tag:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/opt/hadoop/hadoop_tmp/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/opt/hadoop/hadoop_tmp/hdfs/datanode</value>
</property>

Save the file by pressing CTRL + O, then exit with CTRL + X.
Edit the yarn-site.xml file:

$ nano /opt/hadoop/etc/hadoop/yarn-site.xml

Add these lines inside the <configuration> tag:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
The last file to modify is mapred-site.xml:

$ nano /opt/hadoop/etc/hadoop/mapred-site.xml

Add these lines inside the <configuration> tag:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

Do not forget to save the file and then exit from the nano editor.
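One optional addition worth knowing about: on Hadoop 3.x, MapReduce jobs submitted to YARN can fail with "Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster", because the job containers do not inherit your shell environment. If you run into that error later, a commonly suggested fix is to also add the following properties to mapred-site.xml:

<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
</property>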
Now that the files above have been modified, we need to create the directories referenced in hdfs-site.xml. Run this command:
$ mkdir -p /opt/hadoop/hadoop_tmp/hdfs/{namenode,datanode}

Prior to starting the Hadoop services for the first time, we need to format the namenode:
$ hdfs namenode -format

Then start the namenode and datanode:
$ start-dfs.sh

If you see this warning message:
Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

it means Hadoop could not load the precompiled native library on your system and is falling back to the pure-Java implementations instead. This is expected, and you can ignore the warning. If you are not comfortable with it, you can download the Hadoop source and compile it yourself to get a native library built for your platform.
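If you want to see exactly which native components are and are not available on your system, Hadoop ships a diagnostic command for this:

$ hadoop checknative -a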
Now let's start the YARN resource and node managers:
$ start-yarn.sh

Finally, verify that all of the daemons are running:

$ jps

You should get output like this:
106129 SecondaryNameNode
108050 Jps
105877 NameNode
106375 ResourceManager
105960 DataNode
106458 NodeManager
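As an optional smoke test (the path below is just an example of our choosing), you can create a directory in HDFS and list the filesystem root; if both commands succeed, HDFS is up and accepting requests:

$ hdfs dfs -mkdir -p /user/hadoop
$ hdfs dfs -ls /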
Now, you can go to http://YOUR_SERVER_IP_ADDRESS:9870/ and see the namenode, datanode, etc.
To check the YARN web portal, you can navigate to http://YOUR_SERVER_IP_ADDRESS:8088/
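To give the new cluster something to do, you can run one of the MapReduce examples bundled with Hadoop; the jar path below assumes the 3.2.3 tarball installed in this tutorial, so adjust the version in the filename if yours differs. The job estimates the value of pi with 2 map tasks and 5 samples per map:

$ yarn jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.3.jar pi 2 5

If it completes and prints an estimate of pi, both YARN and MapReduce are working end to end.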
That's it. You have successfully installed and configured Hadoop on a Debian 11 VPS.
Of course, you don't have to install Hadoop on Debian 11 if you have a Managed Debian Server with us. You can simply ask our support team to install Hadoop on Debian 11 for you. They are available 24/7 and will be able to help you with the installation.

PS. If you enjoyed reading this blog post on how to install Hadoop on Debian 11, feel free to share it on social networks or simply leave a comment in the comments section. Thanks.