Hadoop & Spark Study Notes (1): Environment Setup and Installing Scala

Install Hadoop and Spark on Ubuntu. Make sure Java is already installed before you start.

Hadoop: https://hadoop.apache.org/releases.html

Spark: http://spark.apache.org/downloads.html

[Figures] Source: 大數據分析與應用:使用Hadoop與Spark(最新版) (Big Data Analysis and Applications: Using Hadoop and Spark, latest edition)

Single node

Runs every component on a single machine, which suits anyone testing on one computer.


Hadoop Web UI

Open a browser and visit:

http://master:50070/    # HDFS (NameNode) Web UI
http://master:8088/ # Hadoop ResourceManager Web UI
# These are the Hadoop 2.x default ports; in Hadoop 3.x the NameNode UI moved to port 9870.

Trying out HDFS

hdfs dfs -mkdir -p /user/mis  # create a mis directory on HDFS
hdfs dfs -ls -R / # recursively list all directories and files on HDFS
# create test.txt
echo "To be or not to be that is the question" > test.txt
# copy test.txt to the mis directory on HDFS
hdfs dfs -copyFromLocal /home/mis/test.txt /user/mis
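hdfs dfs -ls -R / # list everything on HDFS again to confirm the upload
# run the wordcount example program (the output directory must not exist yet, or the job fails)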
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount /user/mis/test.txt /user/mis/output
# see which output files were created under /user/mis
hdfs dfs -ls -R /user/mis
# view the result
hdfs dfs -cat /user/mis/output/part-r-00000
# check the job status in a browser
http://master:8088/
# browse HDFS in a browser
http://master:50070/
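To sanity-check the wordcount result without the cluster, the same count can be reproduced locally. This is a minimal sketch, assuming the WordCount example's behaviour of splitting on whitespace and counting case-sensitively with no punctuation stripping; `local_wordcount` is a hypothetical helper name, not part of Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): count words the way the WordCount
# example does, i.e. whitespace-separated tokens, case-sensitive, punctuation kept.
# Prints "word<TAB>count" lines sorted in byte order, the same layout as the
# part-r-00000 file written by the job's single reducer.
local_wordcount() {
  # Read the file named in $1, or standard input when no argument is given.
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) printf "%s\t%d\n", w, count[w] }' "${1:--}" \
    | LC_ALL=C sort
}

# Same sentence that was uploaded to HDFS as test.txt.
echo "To be or not to be that is the question" | local_wordcount
```

Note that `To` and `to` are counted as different words. With the cluster running, `diff <(local_wordcount test.txt) <(hdfs dfs -cat /user/mis/output/part-r-00000)` should print nothing if the two results agree.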
# reduce spark-shell's log output: create log4j.properties from the template
cp /usr/local/spark/conf/log4j.properties.template /usr/local/spark/conf/log4j.properties
sudo gedit /usr/local/spark/conf/log4j.properties
# on line 19, change log4j.rootCategory=INFO, console to log4j.rootCategory=WARN, console
# Hadoop + spark-shell
spark-shell # start Spark
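Instead of editing line 19 by hand in gedit, the same change can be scripted with `sed`. A minimal sketch, assuming GNU sed (the Ubuntu default) and a file copied unchanged from the template, so the line reads exactly `log4j.rootCategory=INFO, console`; `set_spark_log_level_warn` is a hypothetical helper name. If /usr/local/spark is owned by root, the `sed` call needs write permission there (for example, run it with sudo). Spark 3.3 and later ship `log4j2.properties.template` with a different syntax instead, so this only applies to the log4j 1.x file used in these notes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: switch Spark's root logger from INFO to WARN in a
# log4j 1.x properties file (GNU sed syntax for in-place editing).
set_spark_log_level_warn() {
  # Default to the path used in these notes; pass another path to override.
  local conf="${1:-/usr/local/spark/conf/log4j.properties}"
  # Only the exact template line is rewritten; other lines are left untouched.
  sed -i 's/^log4j\.rootCategory=INFO, console$/log4j.rootCategory=WARN, console/' "$conf"
}
```

Calling `set_spark_log_level_warn` with no argument edits /usr/local/spark/conf/log4j.properties; `grep '^log4j.rootCategory' /usr/local/spark/conf/log4j.properties` then shows the new setting.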

Installing Scala

sudo apt update
sudo apt install scala

Written by

Machine Learning / Deep Learning / Python / Flutter cakeresume.com/yanwei-liu
