This post documents, step by step, how to build Spark from source and run a simple demo:
- Environment
- Build and install
- Running a demo in local mode
[1] Environment
- Mac OS X 10.9.2
- Java 1.6.0_65
- Spark 0.9.1
[2] Build and Install
Download the source package spark-0.9.1.tgz from the official Spark download page, then extract and build it:
tar -zxvf spark-0.9.1.tgz
cd spark-0.9.1
./sbt/sbt assembly
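If the build succeeds, the assembled jars end up under assembly/target/scala-2.10/ and examples/target/scala-2.10/ — both paths show up in the build and run logs below.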
This release builds against Hadoop 1.0.4 by default. To target a specific Hadoop version, e.g. 2.2.0, run the following instead (SPARK_HADOOP_VERSION selects the Hadoop client version to link against, and SPARK_YARN=true builds in YARN support):
SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly
PS: If you have already run a build, execute ./sbt/sbt clean first; only after cleaning can you rebuild.
Build log for the default (Hadoop 1.0.4) version:
micmiu-mbp:spark-0.9.1 micmiu$ ./sbt/sbt assembly
Attempting to fetch sbt
  0.5% curl: (18) transfer closed with 1101591 bytes remaining to read
######################################################################## 100.0%
Launching sbt from sbt/sbt-launch-0.12.4.jar
[info] Loading project definition from /Users/micmiu/no_sync/opensource_code/hadoop/spark-0.9.1/project/project
[info] Updating {file:/Users/micmiu/no_sync/opensource_code/hadoop/spark-0.9.1/project/project/}default-a315e8...
[info] Resolving org.scala-sbt#precompiled-2_10_1;0.12.4 ...
[info] Done updating.
[info] Compiling 1 Scala source to /Users/micmiu/no_sync/opensource_code/hadoop/spark-0.9.1/project/project/target/scala-2.9.2/sbt-0.12/classes...
...... (omitted)
[info] Including: hadoop-client-1.0.4.jar
...... (omitted)
[info] Including: hadoop-client-1.0.4.jar
[info] Including: hadoop-core-1.0.4.jar
[info] Including: hbase-0.94.6.jar
...................... (omitted) by micmiu.com ......................
[info] Checking every *.class/*.jar file's SHA-1.
[info] SHA-1: f1b22c5654a3d43095de574d9f0e4c0c56a744fe
[info] Packaging /Users/micmiu/no_sync/opensource_code/hadoop/spark-0.9.1/examples/target/scala-2.10/spark-examples-assembly-0.9.1.jar ...
[info] Done packaging.
[info] Done packaging.
[info] Done packaging.
[success] Total time: 800 s, completed Apr 11, 2014 1:09:04 AM
Start the Spark shell: ./bin/spark-shell
micmiu-mbp:spark-0.9.1 micmiu$ ./bin/spark-shell
14/04/11 01:13:47 INFO spark.HttpServer: Starting HTTP Server
14/04/11 01:13:47 INFO server.Server: jetty-7.x.y-SNAPSHOT
14/04/11 01:13:47 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:62062
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 0.9.1
      /_/

Using Scala version 2.10.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_65)
Type in expressions to have them evaluated.
Type :help for more information.
14/04/11 01:13:52 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/04/11 01:13:52 INFO Remoting: Starting remoting
14/04/11 01:13:52 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@192.168.1.2:62065]
14/04/11 01:13:52 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@192.168.1.2:62065]
14/04/11 01:13:52 INFO spark.SparkEnv: Registering BlockManagerMaster
14/04/11 01:13:52 INFO storage.DiskBlockManager: Created local directory at /var/folders/cc/y1m41gjd64g156v0kjxwscbc0000gn/T/spark-local-20140411011352-43dc
14/04/11 01:13:52 INFO storage.MemoryStore: MemoryStore started with capacity 303.4 MB.
14/04/11 01:13:52 INFO network.ConnectionManager: Bound socket to port 62066 with id = ConnectionManagerId(192.168.1.2,62066)
14/04/11 01:13:52 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/04/11 01:13:52 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager 192.168.1.2:62066 with 303.4 MB RAM
14/04/11 01:13:52 INFO storage.BlockManagerMaster: Registered BlockManager
14/04/11 01:13:52 INFO spark.HttpServer: Starting HTTP Server
14/04/11 01:13:52 INFO server.Server: jetty-7.x.y-SNAPSHOT
14/04/11 01:13:52 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:62067
14/04/11 01:13:52 INFO broadcast.HttpBroadcast: Broadcast server started at http://192.168.1.2:62067
14/04/11 01:13:52 INFO spark.SparkEnv: Registering MapOutputTracker
14/04/11 01:13:52 INFO spark.HttpFileServer: HTTP File server directory is /var/folders/cc/y1m41gjd64g156v0kjxwscbc0000gn/T/spark-387f4bd8-4802-41bb-a913-7bc8f75034cc
14/04/11 01:13:52 INFO spark.HttpServer: Starting HTTP Server
14/04/11 01:13:52 INFO server.Server: jetty-7.x.y-SNAPSHOT
14/04/11 01:13:52 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:62068
14/04/11 01:13:52 INFO server.Server: jetty-7.x.y-SNAPSHOT
14/04/11 01:13:52 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage/rdd,null}
...... (omitted)
14/04/11 01:13:52 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/04/11 01:13:52 INFO ui.SparkUI: Started Spark Web UI at http://192.168.1.2:4040
14/04/11 01:13:52 INFO executor.Executor: Using REPL class URI: http://192.168.1.2:62062
2014-04-11 01:13:53.015 java[13946:1003] Unable to load realm info from SCDynamicStore
Created spark context..
Spark context available as sc.

scala> println("welcome to micmiu.com")
welcome to micmiu.com

scala>
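Once the shell is up, the REPL has already created a SparkContext and bound it to sc (note the "Spark context available as sc." line above). As a quick sanity check you can run a tiny job by hand; the snippet below is my own illustrative example, not part of the original session:

// Paste at the scala> prompt; sc is pre-created by the spark-shell REPL.
val nums  = sc.parallelize(1 to 1000)   // distribute a local range across workers
val evens = nums.filter(_ % 2 == 0)     // lazy transformation, no job runs yet
println(evens.count())                  // count() is an action: runs a job, prints 500

Each action such as count() shows up in the shell log as a job, and you can watch it in the Spark Web UI at the http://192.168.1.2:4040 address logged above.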
PS: The log above contains the message "Unable to load realm info from SCDynamicStore". On Mac OS X this calls for a JVM-option tweak: edit <SPARK_HOME>/conf/spark-env.sh and add:
export SPARK_JAVA_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
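These two properties simply hand the JVM an explicit Kerberos realm and KDC so it stops probing SCDynamicStore on OS X; the Oxford values are the placeholder that commonly circulates for this workaround, since nothing in a local Spark setup actually authenticates against them.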
[3] Running a Demo in Local Mode
Run the bundled logistic regression example, SparkLR (the argument local[2] runs Spark locally with two worker threads): ./bin/run-example org.apache.spark.examples.SparkLR local[2]
micmiu-mbp:spark-0.9.1 micmiu$ ./bin/run-example org.apache.spark.examples.SparkLR local[2]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/micmiu/no_sync/opensource_code/hadoop/spark-0.9.1/examples/target/scala-2.10/spark-examples-assembly-0.9.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/micmiu/no_sync/opensource_code/hadoop/spark-0.9.1/assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/04/11 01:15:59 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/04/11 01:15:59 INFO Remoting: Starting remoting
14/04/11 01:15:59 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@192.168.1.2:62121]
14/04/11 01:15:59 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@192.168.1.2:62121]
14/04/11 01:15:59 INFO spark.SparkEnv: Registering BlockManagerMaster
14/04/11 01:15:59 INFO storage.DiskBlockManager: Created local directory at /var/folders/cc/y1m41gjd64g156v0kjxwscbc0000gn/T/spark-local-20140411011559-7eb8
14/04/11 01:15:59 INFO storage.MemoryStore: MemoryStore started with capacity 74.4 MB.
14/04/11 01:15:59 INFO network.ConnectionManager: Bound socket to port 62122 with id = ConnectionManagerId(192.168.1.2,62122)
14/04/11 01:15:59 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/04/11 01:15:59 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager 192.168.1.2:62122 with 74.4 MB RAM
14/04/11 01:15:59 INFO storage.BlockManagerMaster: Registered BlockManager
14/04/11 01:15:59 INFO spark.HttpServer: Starting HTTP Server
14/04/11 01:15:59 INFO server.Server: jetty-7.x.y-SNAPSHOT
14/04/11 01:15:59 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:62123
14/04/11 01:15:59 INFO broadcast.HttpBroadcast: Broadcast server started at http://192.168.1.2:62123
14/04/11 01:15:59 INFO spark.SparkEnv: Registering MapOutputTracker
...... (omitted)
14/04/11 01:16:00 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/04/11 01:16:00 INFO ui.SparkUI: Started Spark Web UI at http://192.168.1.2:4040
14/04/11 01:16:30 INFO spark.SparkContext: Added JAR /Users/micmiu/no_sync/opensource_code/hadoop/spark-0.9.1/examples/target/scala-2.10/spark-examples-assembly-0.9.1.jar at http://192.168.1.2:62124/jars/spark-examples-assembly-0.9.1.jar with timestamp 1397150190914
Initial w: (-0.8066603352924779, -0.5488747509304204, -0.7351625370864459, 0.8228539509375878, -0.6662446067860872, -0.33245457898921527, 0.9664202269036932, -0.20407887461434115, 0.4120993933386614, -0.8125908063470539)
On iteration 1
14/04/11 01:16:31 INFO spark.SparkContext: Starting job: reduce at SparkLR.scala:64
14/04/11 01:16:31 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkLR.scala:64) with 2 output partitions (allowLocal=false)
14/04/11 01:16:31 INFO scheduler.DAGScheduler: Final stage: Stage 0 (reduce at SparkLR.scala:64)
14/04/11 01:16:31 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/04/11 01:16:31 INFO scheduler.DAGScheduler: Missing parents: List()
14/04/11 01:16:31 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at SparkLR.scala:62), which has no missing parents
14/04/11 01:16:31 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[1] at map at SparkLR.scala:62)
14/04/11 01:16:31 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/04/11 01:16:31 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 0 on executor localhost: localhost (PROCESS_LOCAL)
14/04/11 01:16:31 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 551879 bytes in 58 ms
14/04/11 01:16:31 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID 1 on executor localhost: localhost (PROCESS_LOCAL)
14/04/11 01:16:31 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as 551879 bytes in 33 ms
14/04/11 01:16:31 INFO executor.Executor: Running task ID 0
14/04/11 01:16:31 INFO executor.Executor: Running task ID 1
14/04/11 01:16:31 INFO executor.Executor: Fetching http://192.168.1.2:62124/jars/spark-examples-assembly-0.9.1.jar with timestamp 1397150190914
14/04/11 01:16:31 INFO util.Utils: Fetching http://192.168.1.2:62124/jars/spark-examples-assembly-0.9.1.jar to /var/folders/cc/y1m41gjd64g156v0kjxwscbc0000gn/T/fetchFileTemp7745502709566896302.tmp
14/04/11 01:17:02 INFO executor.Executor: Adding file:/var/folders/cc/y1m41gjd64g156v0kjxwscbc0000gn/T/spark-284bd4b9-9b34-42a6-90d4-4d7ac29bd886/spark-examples-assembly-0.9.1.jar to class loader
14/04/11 01:17:02 INFO spark.CacheManager: Partition rdd_0_0 not found, computing it
14/04/11 01:17:02 INFO spark.CacheManager: Partition rdd_0_1 not found, computing it
14/04/11 01:17:02 INFO storage.MemoryStore: ensureFreeSpace(734706) called with curMem=0, maxMem=77974732
14/04/11 01:17:02 INFO storage.MemoryStore: Block rdd_0_0 stored as values to memory (estimated size 717.5 KB, free 73.7 MB)
14/04/11 01:17:02 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Added rdd_0_0 in memory on 192.168.1.2:62122 (size: 717.5 KB, free: 73.7 MB)
14/04/11 01:17:02 INFO storage.BlockManagerMaster: Updated info of block rdd_0_0
14/04/11 01:17:02 INFO storage.MemoryStore: ensureFreeSpace(734706) called with curMem=734706, maxMem=77974732
14/04/11 01:17:02 INFO storage.MemoryStore: Block rdd_0_1 stored as values to memory (estimated size 717.5 KB, free 73.0 MB)
14/04/11 01:17:02 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Added rdd_0_1 in memory on 192.168.1.2:62122 (size: 717.5 KB, free: 73.0 MB)
14/04/11 01:17:02 INFO storage.BlockManagerMaster: Updated info of block rdd_0_1
14/04/11 01:17:02 INFO executor.Executor: Serialized size of result for 0 is 728
14/04/11 01:17:02 INFO executor.Executor: Sending result for 0 directly to driver
14/04/11 01:17:02 INFO executor.Executor: Finished task ID 0
14/04/11 01:17:02 INFO executor.Executor: Serialized size of result for 1 is 728
14/04/11 01:17:02 INFO executor.Executor: Sending result for 1 directly to driver
14/04/11 01:17:02 INFO executor.Executor: Finished task ID 1
14/04/11 01:17:02 INFO scheduler.TaskSetManager: Finished TID 0 in 31523 ms on localhost (progress: 1/2)
14/04/11 01:17:02 INFO scheduler.DAGScheduler: Completed ResultTask(0, 0)
14/04/11 01:17:02 INFO scheduler.TaskSetManager: Finished TID 1 in 31463 ms on localhost (progress: 2/2)
14/04/11 01:17:02 INFO scheduler.DAGScheduler: Completed ResultTask(0, 1)
14/04/11 01:17:02 INFO scheduler.DAGScheduler: Stage 0 (reduce at SparkLR.scala:64) finished in 31.537 s
14/04/11 01:17:02 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/04/11 01:17:02 INFO spark.SparkContext: Job finished: reduce at SparkLR.scala:64, took 31.795501 s
On iteration 2
14/04/11 01:17:02 INFO spark.SparkContext: Starting job: reduce at SparkLR.scala:64
14/04/11 01:17:02 INFO scheduler.DAGScheduler: Got job 1 (reduce at SparkLR.scala:64) with 2 output partitions (allowLocal=false)
14/04/11 01:17:02 INFO scheduler.DAGScheduler: Final stage: Stage 1 (reduce at SparkLR.scala:64)
14/04/11 01:17:02 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/04/11 01:17:02 INFO scheduler.DAGScheduler: Missing parents: List()
14/04/11 01:17:02 INFO scheduler.DAGScheduler: Submitting Stage 1 (MappedRDD[2] at map at SparkLR.scala:62), which has no missing parents
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[2] at map at SparkLR.scala:62)
14/04/11 01:17:03 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Starting task 1.0:0 as TID 2 on executor localhost: localhost (PROCESS_LOCAL)
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Serialized task 1.0:0 as 551875 bytes in 25 ms
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Starting task 1.0:1 as TID 3 on executor localhost: localhost (PROCESS_LOCAL)
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Serialized task 1.0:1 as 551875 bytes in 8 ms
14/04/11 01:17:03 INFO executor.Executor: Running task ID 3
14/04/11 01:17:03 INFO executor.Executor: Running task ID 2
14/04/11 01:17:03 INFO storage.BlockManager: Found block rdd_0_0 locally
14/04/11 01:17:03 INFO storage.BlockManager: Found block rdd_0_1 locally
14/04/11 01:17:03 INFO executor.Executor: Serialized size of result for 3 is 728
14/04/11 01:17:03 INFO executor.Executor: Sending result for 3 directly to driver
14/04/11 01:17:03 INFO executor.Executor: Finished task ID 3
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Finished TID 3 in 187 ms on localhost (progress: 1/2)
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Completed ResultTask(1, 1)
14/04/11 01:17:03 INFO executor.Executor: Serialized size of result for 2 is 728
14/04/11 01:17:03 INFO executor.Executor: Sending result for 2 directly to driver
14/04/11 01:17:03 INFO executor.Executor: Finished task ID 2
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Completed ResultTask(1, 0)
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Finished TID 2 in 217 ms on localhost (progress: 2/2)
14/04/11 01:17:03 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Stage 1 (reduce at SparkLR.scala:64) finished in 0.217 s
14/04/11 01:17:03 INFO spark.SparkContext: Job finished: reduce at SparkLR.scala:64, took 0.338009 s
On iteration 3
14/04/11 01:17:03 INFO spark.SparkContext: Starting job: reduce at SparkLR.scala:64
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Got job 2 (reduce at SparkLR.scala:64) with 2 output partitions (allowLocal=false)
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Final stage: Stage 2 (reduce at SparkLR.scala:64)
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Missing parents: List()
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Submitting Stage 2 (MappedRDD[3] at map at SparkLR.scala:62), which has no missing parents
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 2 (MappedRDD[3] at map at SparkLR.scala:62)
14/04/11 01:17:03 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 2 tasks
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Starting task 2.0:0 as TID 4 on executor localhost: localhost (PROCESS_LOCAL)
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Serialized task 2.0:0 as 551876 bytes in 6 ms
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Starting task 2.0:1 as TID 5 on executor localhost: localhost (PROCESS_LOCAL)
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Serialized task 2.0:1 as 551876 bytes in 7 ms
14/04/11 01:17:03 INFO executor.Executor: Running task ID 4
14/04/11 01:17:03 INFO executor.Executor: Running task ID 5
14/04/11 01:17:03 INFO storage.BlockManager: Found block rdd_0_0 locally
14/04/11 01:17:03 INFO storage.BlockManager: Found block rdd_0_1 locally
14/04/11 01:17:03 INFO executor.Executor: Serialized size of result for 5 is 728
14/04/11 01:17:03 INFO executor.Executor: Sending result for 5 directly to driver
14/04/11 01:17:03 INFO executor.Executor: Finished task ID 5
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Finished TID 5 in 29 ms on localhost (progress: 1/2)
14/04/11 01:17:03 INFO executor.Executor: Serialized size of result for 4 is 728
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Completed ResultTask(2, 1)
14/04/11 01:17:03 INFO executor.Executor: Sending result for 4 directly to driver
14/04/11 01:17:03 INFO executor.Executor: Finished task ID 4
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Finished TID 4 in 39 ms on localhost (progress: 2/2)
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Completed ResultTask(2, 0)
14/04/11 01:17:03 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Stage 2 (reduce at SparkLR.scala:64) finished in 0.039 s
14/04/11 01:17:03 INFO spark.SparkContext: Job finished: reduce at SparkLR.scala:64, took 0.057357 s
On iteration 4
14/04/11 01:17:03 INFO spark.SparkContext: Starting job: reduce at SparkLR.scala:64
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Got job 3 (reduce at SparkLR.scala:64) with 2 output partitions (allowLocal=false)
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Final stage: Stage 3 (reduce at SparkLR.scala:64)
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Missing parents: List()
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Submitting Stage 3 (MappedRDD[4] at map at SparkLR.scala:62), which has no missing parents
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 3 (MappedRDD[4] at map at SparkLR.scala:62)
14/04/11 01:17:03 INFO scheduler.TaskSchedulerImpl: Adding task set 3.0 with 2 tasks
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Starting task 3.0:0 as TID 6 on executor localhost: localhost (PROCESS_LOCAL)
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Serialized task 3.0:0 as 551879 bytes in 7 ms
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Starting task 3.0:1 as TID 7 on executor localhost: localhost (PROCESS_LOCAL)
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Serialized task 3.0:1 as 551879 bytes in 6 ms
14/04/11 01:17:03 INFO executor.Executor: Running task ID 6
14/04/11 01:17:03 INFO executor.Executor: Running task ID 7
14/04/11 01:17:03 INFO storage.BlockManager: Found block rdd_0_0 locally
14/04/11 01:17:03 INFO storage.BlockManager: Found block rdd_0_1 locally
14/04/11 01:17:03 INFO executor.Executor: Serialized size of result for 7 is 728
14/04/11 01:17:03 INFO executor.Executor: Serialized size of result for 6 is 728
14/04/11 01:17:03 INFO executor.Executor: Sending result for 7 directly to driver
14/04/11 01:17:03 INFO executor.Executor: Sending result for 6 directly to driver
14/04/11 01:17:03 INFO executor.Executor: Finished task ID 6
14/04/11 01:17:03 INFO executor.Executor: Finished task ID 7
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Completed ResultTask(3, 1)
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Finished TID 7 in 29 ms on localhost (progress: 1/2)
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Completed ResultTask(3, 0)
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Finished TID 6 in 38 ms on localhost (progress: 2/2)
14/04/11 01:17:03 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Stage 3 (reduce at SparkLR.scala:64) finished in 0.039 s
14/04/11 01:17:03 INFO spark.SparkContext: Job finished: reduce at SparkLR.scala:64, took 0.049715 s
On iteration 5
14/04/11 01:17:03 INFO spark.SparkContext: Starting job: reduce at SparkLR.scala:64
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Got job 4 (reduce at SparkLR.scala:64) with 2 output partitions (allowLocal=false)
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Final stage: Stage 4 (reduce at SparkLR.scala:64)
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Missing parents: List()
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Submitting Stage 4 (MappedRDD[5] at map at SparkLR.scala:62), which has no missing parents
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 4 (MappedRDD[5] at map at SparkLR.scala:62)
14/04/11 01:17:03 INFO scheduler.TaskSchedulerImpl: Adding task set 4.0 with 2 tasks
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Starting task 4.0:0 as TID 8 on executor localhost: localhost (PROCESS_LOCAL)
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Serialized task 4.0:0 as 551876 bytes in 6 ms
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Starting task 4.0:1 as TID 9 on executor localhost: localhost (PROCESS_LOCAL)
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Serialized task 4.0:1 as 551876 bytes in 6 ms
14/04/11 01:17:03 INFO executor.Executor: Running task ID 8
14/04/11 01:17:03 INFO executor.Executor: Running task ID 9
14/04/11 01:17:03 INFO storage.BlockManager: Found block rdd_0_0 locally
14/04/11 01:17:03 INFO storage.BlockManager: Found block rdd_0_1 locally
14/04/11 01:17:03 INFO executor.Executor: Serialized size of result for 8 is 728
14/04/11 01:17:03 INFO executor.Executor: Sending result for 8 directly to driver
14/04/11 01:17:03 INFO executor.Executor: Serialized size of result for 9 is 728
14/04/11 01:17:03 INFO executor.Executor: Finished task ID 8
14/04/11 01:17:03 INFO executor.Executor: Sending result for 9 directly to driver
14/04/11 01:17:03 INFO executor.Executor: Finished task ID 9
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Completed ResultTask(4, 0)
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Finished TID 8 in 27 ms on localhost (progress: 1/2)
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Completed ResultTask(4, 1)
14/04/11 01:17:03 INFO scheduler.TaskSetManager: Finished TID 9 in 20 ms on localhost (progress: 2/2)
14/04/11 01:17:03 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool
14/04/11 01:17:03 INFO scheduler.DAGScheduler: Stage 4 (reduce at SparkLR.scala:64) finished in 0.028 s
14/04/11 01:17:03 INFO spark.SparkContext: Job finished: reduce at SparkLR.scala:64, took 0.036149 s
Final w: (5816.075967498865, 5222.008066011391, 5754.751978607454, 3853.1772062206846, 5593.565827145932, 5282.387874201054, 3662.9216051953435, 4890.78210340607, 4223.371512250292, 5767.368579668863)
micmiu-mbp:spark-0.9.1 micmiu$
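In this output, "Initial w" is the random starting weight vector, each "On iteration N" marks one pass of gradient descent (one map/reduce job, hence one Stage per iteration), and "Final w" is the learned weights. Iteration 1 takes about 31 s because the example jar is fetched and the input partitions are computed and cached; iterations 2-5 finish in tens of milliseconds, since every later task reports "Found block rdd_0_0 locally". The sketch below condenses the loop that produces these lines; it is simplified from the SparkLR example (plain arrays and toy generated data instead of the example's own vector helper), so treat it as illustrative rather than the verbatim source:

import scala.math.exp
import org.apache.spark.SparkContext

object SparkLRSketch {
  case class Point(x: Array[Double], y: Double)   // y is the label: +1.0 or -1.0

  def main(args: Array[String]) {
    val sc   = new SparkContext("local[2]", "SparkLRSketch")
    val rand = new scala.util.Random(42)
    // toy synthetic data standing in for the example's generated points
    val points = Seq.fill(1000) {
      val y = if (rand.nextBoolean()) 1.0 else -1.0
      Point(Array.fill(10)(rand.nextGaussian() + y), y)
    }
    val data = sc.parallelize(points, 2).cache()        // cached: only iteration 1 recomputes
    var w = Array.fill(10)(2 * rand.nextDouble() - 1)   // random start -> "Initial w"
    println("Initial w: " + w.mkString("(", ", ", ")"))
    for (i <- 1 to 5) {
      println("On iteration " + i)
      // one distributed gradient computation = one "reduce" job in the log
      val gradient = data.map { p =>
        val margin = w.zip(p.x).map { case (wi, xi) => wi * xi }.sum
        val scale  = (1.0 / (1.0 + exp(-p.y * margin)) - 1.0) * p.y
        p.x.map(_ * scale)
      }.reduce((a, b) => a.zip(b).map { case (u, v) => u + v })
      w = w.zip(gradient).map { case (wi, gi) => wi - gi }   // gradient step
    }
    println("Final w: " + w.mkString("(", ", ", ")"))
    sc.stop()
  }
}

The real example wraps the same loop around its own generated data; the part worth noticing is the cache() call, which is exactly why only the first "reduce at SparkLR.scala:64" job is slow.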
References:
- https://spark.apache.org/screencasts/1-first-steps-with-spark.html
That concludes this walkthrough.
—————– EOF @Michael Sun —————–
Original article; when reposting, please credit: micmiu – software development + bits of life [ http://www.micmiu.com/ ]
Permalink: http://www.micmiu.com/bigdata/spark/spark-build-run/