Packaging and Running a Scala Spark Project with Maven

I am writing an application in Scala that uses Spark. I am packaging the application with Maven, but I am running into problems building an "uber" or "fat" jar.

The problem I face is that the application runs fine inside the IDE, or when a non-uber-jar version of the dependencies is supplied on the java classpath, but it does not work when the uber jar is supplied as the classpath, i.e.:

java -Xmx2G -cp target/spark-example-0.1-SNAPSHOT-jar-with-dependencies.jar debug.spark_example.Example data.txt

I get the following error message:

ERROR SparkContext: Error initializing SparkContext.
com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version'

I would really appreciate help with what to add to the pom.xml file, and why it is needed, to get this to work.

I searched the web and found the following resources, which I tried (see the pom) but could not get to work:

1) Spark user mailing list: http://apache-spark-user-list.1001560.n3.nabble.com/Packaging-a-spark-job-using-maven-td5615.html

2) How to package a Spark Scala application

I have a simple example that demonstrates the problem, a one-class project (src/main/scala/debug/spark_example/Example.scala):

package debug.spark_example

import org.apache.spark.{SparkConf, SparkContext}

object Example {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("Test").setMaster("local[2]"))
    val lines = sc.textFile(args(0))
    val lineLengths = lines.map(s => s.length)
    val totalLength = lineLengths.reduce((a, b) => a + b)
    lineLengths.foreach(println)
    println(totalLength)
  }
}

Here is the pom.xml file:

<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>debug.spark-example</groupId>
    <artifactId>spark-example</artifactId>
    <version>0.1-SNAPSHOT</version>
    <inceptionYear>2015</inceptionYear>
    <properties>
        <scala.majorVersion>2.11</scala.majorVersion>
        <scala.minorVersion>.2</scala.minorVersion>
        <spark.version>1.4.1</spark.version>
    </properties>
    <repositories>
        <repository>
            <id>scala-tools.org</id>
            <name>Scala-Tools Maven2 Repository</name>
            <url>http://scala-tools.org/repo-releases</url>
        </repository>
    </repositories>
    <pluginRepositories>
        <pluginRepository>
            <id>scala-tools.org</id>
            <name>Scala-Tools Maven2 Repository</name>
            <url>http://scala-tools.org/repo-releases</url>
        </pluginRepository>
    </pluginRepositories>
    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.majorVersion}${scala.minorVersion}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.majorVersion}</artifactId>
            <version>${spark.version}</version>
        </dependency>
    </dependencies>
    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-eclipse-plugin</artifactId>
                <configuration>
                    <downloadSources>true</downloadSources>
                    <buildcommands>
                        <buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
                    </buildcommands>
                    <additionalProjectnatures>
                        <projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
                    </additionalProjectnatures>
                    <classpathContainers>
                        <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
                        <classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
                    </classpathContainers>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.4</version>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>attached</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <tarLongFileMode>gnu</tarLongFileMode>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.2</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <shadedArtifactAttached>false</shadedArtifactAttached>
                            <createDependencyReducedPom>false</createDependencyReducedPom>
                            <artifactSet>
                                <includes>
                                    <include>*:*</include>
                                </includes>
                            </artifactSet>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>reference.conf</resource>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.7</version>
                <configuration>
                    <skipTests>true</skipTests>
                </configuration>
            </plugin>
        </plugins>
    </build>
    <reporting>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
            </plugin>
        </plugins>
    </reporting>
</project>

Many thanks in advance for your help.

The problem occurs because shading is done without specifying the needed transformers.

The fix is to add the following transformers to the maven-shade-plugin's <transformers> section of the pom.xml file:

<transformer
    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
    <resource>reference.conf</resource>
</transformer>
<transformer
    implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
    <manifestEntries>
        <Main-Class>akka.Main</Main-Class>
    </manifestEntries>
</transformer>
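
For context: Akka ships its default settings, including the akka.version key, in a reference.conf file inside each of its jars. When many jars are flattened into one, these identically named files overwrite one another unless they are merged, which is what the AppendingTransformer does; the ManifestResourceTransformer simply sets the jar's Main-Class manifest entry. A sketch of where these transformers sit inside the shade plugin declaration follows; it assumes plugin version 2.2 as in the question's pom, and substitutes this project's main class debug.spark_example.Example for the akka.Main shown above:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.2</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <!-- Concatenate all reference.conf files instead of letting
                         them overwrite each other, so akka.version survives -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>reference.conf</resource>
                    </transformer>
                    <!-- Set the Main-Class manifest entry of the shaded jar -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <manifestEntries>
                            <Main-Class>debug.spark_example.Example</Main-Class>
                        </manifestEntries>
                    </transformer>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>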

Cause of the problem: it is likely a Maven plugin ordering issue. The project uses both the maven-assembly-plugin and the maven-shade-plugin, and both are bound to the same phase of the Maven lifecycle. When that happens, Maven executes the plugins in the order they appear in the plugins section, so in this case the assembly plugin runs first and the shade plugin runs after it.

Fix: given which output jar you are trying to run and your shade transformations, you probably want the opposite order. For your use case, however, you may not even need the assembly plugin at all. You could try using the target/spark-example-0.1-SNAPSHOT-shaded.jar file instead.
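
To illustrate the mechanism described above, here is a trimmed sketch (plugin versions taken from the question's pom, configuration bodies omitted): both executions bind to the package phase, so the declaration order under <plugins> is what decides which runs first.

<build>
    <plugins>
        <!-- Bound to the package phase; declared first, so it runs first -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.2</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <!-- Also bound to the package phase; declared second, so it runs second -->
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>2.4</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>attached</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>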

Thanks for your answer, Jeff. I still can't get it to work. I tried reversing the plugin order and using the shaded jar. Reversing the plugin order did not change the uber jar. When using the shaded jar, the error message is: akka.ConfigurationException: Type [akka.dispatch.BoundedControlAwareMessageQueueSemantics] specified as akka.actor.mailbox.requirement [akka.actor.mailbox.bounded-control-aware-queue-based] in config can't be loaded due to [akka.dispatch.BoundedControlAwareMessageQueueSemantics]

Question: how do you package and run a Scala Spark project with Maven?

Cause: the program must be run with the spark-submit script rather than with the java command. Also, it appears to work with just the jar-with-dependencies, without the shaded jar.

Fix: configure the Maven project's dependencies and build plugins with the following pom.xml file, then run the program with the spark-submit script.


<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>debug.spark-example</groupId>
    <artifactId>spark-example</artifactId>
    <version>0.1-SNAPSHOT</version>
    <inceptionYear>2015</inceptionYear>
    <properties>
        <scala.majorVersion>2.11</scala.majorVersion>
        <scala.minorVersion>.2</scala.minorVersion>
        <spark.version>1.4.1</spark.version>
    </properties>
    <repositories>
        <repository>
            <id>scala-tools.org</id>
            <name>Scala-Tools Maven2 Repository</name>
            <url>http://scala-tools.org/repo-releases</url>
        </repository>
    </repositories>
    <pluginRepositories>
        <pluginRepository>
            <id>scala-tools.org</id>
            <name>Scala-Tools Maven2 Repository</name>
            <url>http://scala-tools.org/repo-releases</url>
        </pluginRepository>
    </pluginRepositories>
    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.majorVersion}${scala.minorVersion}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.majorVersion}</artifactId>
            <version>${spark.version}</version>
        </dependency>
    </dependencies>
    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-eclipse-plugin</artifactId>
                <configuration>
                    <downloadSources>true</downloadSources>
                    <buildcommands>
                        <buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
                    </buildcommands>
                    <additionalProjectnatures>
                        <projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
                    </additionalProjectnatures>
                    <classpathContainers>
                        <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
                        <classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
                    </classpathContainers>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.4</version>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>attached</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <tarLongFileMode>gnu</tarLongFileMode>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.7</version>
                <configuration>
                    <skipTests>true</skipTests>
                </configuration>
            </plugin>
        </plugins>
    </build>
    <reporting>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
            </plugin>
        </plugins>
    </reporting>
</project>

The command to run the program with the spark-submit script is as follows:

/spark-1.4.1/bin/spark-submit --class debug.spark_example.Example --master local[2] target/spark-example-0.1-SNAPSHOT-jar-with-dependencies.jar data.txt
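
Note that the uber jar has to be built before submitting. With this pom, the assembly plugin's attached goal is bound to the package phase, so running

mvn clean package

should produce the target/spark-example-0.1-SNAPSHOT-jar-with-dependencies.jar used above. (The /spark-1.4.1 prefix of the spark-submit path depends on where Spark is installed.)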
