Big data keeps heating up, and if you aren't familiar with a few big data components, you don't even have the buzzwords to show off with. At the very least, you should be able to name-drop Hadoop, HDFS, MapReduce, YARN, Kafka, Spark, ZooKeeper, and Neo4j; these are the must-have bragging skills.
There are plenty of detailed Spark introductions online, just a search away. Below, let's walk through installing and briefly using standalone (single-machine) Spark.
0. Install a JDK. My machine already has one, so I can skip this step. The JDK is old news by now, so no need to elaborate, but you can't do without it when working with Java/Scala. (If you don't have one yet, see the install command after the version check below.)
ubuntu@VM-0-15-ubuntu:~$ java -version
openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
ubuntu@VM-0-15-ubuntu:~$
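If your machine doesn't have a JDK yet, on Ubuntu you can install OpenJDK 8 with a single apt command (a typical invocation; the package name may differ on other distros):

sudo apt-get install -y openjdk-8-jdk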
1. You don't necessarily need to install Hadoop; just pick the right Spark build. You don't need to download Scala either, since Spark ships with a Scala shell. Go to the official Spark download page; in an environment without Hadoop, you can choose spark-2.2.1-bin-hadoop2.7 (the "hadoop2.7" in the name means the Hadoop 2.7 client libraries are bundled, so no separate Hadoop install is needed), then extract it, as follows (reference download commands appear after the listing):
ubuntu@VM-0-15-ubuntu:~/taoge/spark_calc$ ll
total 196436
drwxrwxr-x 3 ubuntu ubuntu 4096 Feb 2 19:57 ./
drwxrwxr-x 9 ubuntu ubuntu 4096 Feb 2 19:54 ../
drwxrwxr-x 13 ubuntu ubuntu 4096 Feb 2 19:58 spark-2.2.1-bin-hadoop2.7/
-rw-r--r-- 1 ubuntu ubuntu 200934340 Feb 2 19:53 spark-2.2.1-bin-hadoop2.7.tgz
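For reference, fetching and unpacking the tarball looks roughly like this (the archive URL is my assumption; in practice, pick a mirror from the official download page):

wget https://archive.apache.org/dist/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
tar -xzf spark-2.2.1-bin-hadoop2.7.tgz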
2. Spark comes with both Python and Scala shells; below I'll use the Scala one, as follows:
ubuntu@VM-0-15-ubuntu:~/taoge/spark_calc/spark-2.2.1-bin-hadoop2.7$ bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/02/02 20:12:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/02/02 20:12:16 WARN Utils: Your hostname, localhost resolves to a loopback address: 127.0.0.1; using 172.17.0.15 instead (on interface eth0)
18/02/02 20:12:16 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Spark context Web UI available at http://172.17.0.15:4040
Spark context available as 'sc' (master = local[*], app id = local-1517573538209).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.1
      /_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_151)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
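As an aside, the Python shell mentioned in step 2 ships in the same bin directory; launching it is analogous:

bin/pyspark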
Now let's run some simple operations:
scala> val lines = sc.textFile("README.md")
lines: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[1] at textFile at <console>:24
scala> lines.count()
res0: Long = 103
scala> lines.first()
res1: String = # Apache Spark
scala> :quit
ubuntu@VM-0-15-ubuntu:~/taoge/spark_calc/spark-2.2.1-bin-hadoop2.7$
ubuntu@VM-0-15-ubuntu:~/taoge/spark_calc/spark-2.2.1-bin-hadoop2.7$ wc -l README.md
103 README.md
ubuntu@VM-0-15-ubuntu:~/taoge/spark_calc/spark-2.2.1-bin-hadoop2.7$ head -n 1 README.md
# Apache Spark
ubuntu@VM-0-15-ubuntu:~/taoge/spark_calc/spark-2.2.1-bin-hadoop2.7$
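As expected, lines.count() matches wc -l and lines.first() matches head -n 1. While you're in the shell, a few more standard RDD operations are worth trying; here is a minimal sketch (paste at the scala> prompt):

// keep only the lines that mention "Spark", then count them
val sparkLines = lines.filter(line => line.contains("Spark"))
sparkLines.count()

// the classic word count over the same file
val counts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.take(5).foreach(println)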
Now let's look at the visual web UI: while the shell is running, open http://ip:4040 in a browser on Windows (replace ip with your server's address).
OK, this post covers only a simple installation; we'll dig deeper into Spark in later posts.
Summary
That's all for this article. I hope its content offers some reference value for your study or work, and thanks for supporting 腳本之家.