
Compiling Spark from Source (to be continued)

Here we do not need to set up a standalone Spark cluster; instead we use the YARN client to tap the Hadoop cluster's compute resources.
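As a minimal sketch of what yarn-client mode looks like in practice (assuming the assembly jar built in the steps below has already been uploaded to HDFS, and that the Hadoop configuration lives at /etc/hadoop/conf, both of which you should adjust for your cluster):

# Point Spark at the assembly jar on HDFS and at the cluster configuration
export SPARK_JAR=hdfs://master001.bj:9000/jar/spark/spark-assembly-1.0.0-hadoop2.4.0.jar
export HADOOP_CONF_DIR=/etc/hadoop/conf   # assumption: adjust to your cluster

# Launch an interactive shell whose executors are allocated by YARN
./bin/spark-shell --master yarn-client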

Building a binary distribution from the Spark source:

Unpack the source and run the following command from the root directory (I have not tried the sbt build):

./make-distribution.sh --hadoop 2.4.0 --with-yarn --tgz --with-hive

A few important parameters:

--hadoop: specifies the Hadoop version

--with-yarn: YARN support is a must

--with-hive: reading Hive data is also a must. In any case I really dislike Shark; going forward, developers can wrap their own SQL/HQL clients on top of Spark, which is also a decent option.

#      --tgz: Additionally creates spark-$VERSION-bin.tar.gz
#      --hadoop VERSION: Builds against specified version of Hadoop.
#      --with-yarn: Enables support for Hadoop YARN.
#      --with-hive: Enable support for reading Hive tables.
#      --name: A moniker for the release target. Defaults to the Hadoop version.
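If the build succeeds, a staged distribution and (with --tgz) a tarball should appear; a quick sanity check, assuming the default --name (the Hadoop version), so the exact file name may vary on your machine:

# The staged distribution lives under dist/
ls dist/

# With --tgz, a tarball is also written to the source root
ls spark-1.0.0-bin-2.4.0.tgz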

Test: submit the bundled WordCount example to YARN:

SPARK_JAR="hdfs://master001.bj:9000/jar/spark/spark-assembly-1.0.0-hadoop2.4.0.jar" \
./bin/spark-class org.apache.spark.deploy.yarn.Client \
--jar ./lib/spark-examples-1.0.0-hadoop2.4.0.jar \
--class org.apache.spark.examples.JavaWordCount \
--args hdfs://master001.bj:9000/temp/read.txt \
--num-executors 50 \
--executor-cores 1 \
--driver-memory 2048M \
--executor-memory 1000M \
--name "word count on spark"
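Spark 1.0.0 also ships bin/spark-submit as the unified entry point; a sketch of the equivalent submission (same jar and input paths as above; yarn-cluster mode runs the driver inside the cluster):

SPARK_JAR="hdfs://master001.bj:9000/jar/spark/spark-assembly-1.0.0-hadoop2.4.0.jar" \
./bin/spark-submit \
--master yarn-cluster \
--class org.apache.spark.examples.JavaWordCount \
--num-executors 50 \
--executor-cores 1 \
--driver-memory 2048M \
--executor-memory 1000M \
--name "word count on spark" \
./lib/spark-examples-1.0.0-hadoop2.4.0.jar \
hdfs://master001.bj:9000/temp/read.txt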

 

Generating the jar for Spark application development:

mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
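The Maven build drops the assembly under assembly/target/; a sketch for making it available to the YARN test above (the scala-2.10 path matches the 1.0.0 source layout, so verify it against your checkout, and the HDFS paths are this cluster's):

# Locate the freshly built assembly jar
ls assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop2.4.0.jar

# Upload it to HDFS so SPARK_JAR can point at it
hadoop fs -mkdir -p /jar/spark
hadoop fs -put assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop2.4.0.jar \
hdfs://master001.bj:9000/jar/spark/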

 

For other build approaches, see "Four Ways to Compile Spark 1.0.0": http://www.tuicool.com/articles/q6faMv2

 

Reposted from: https://www.cnblogs.com/kxdblog/p/4503562.html

The original author is an interesting person; if anything here infringes, please ask for it to be taken down.
