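This article describes how to import the Nutch source code into IntelliJ IDEA (using Nutch 2.x as the example) and set up a Nutch development environment.
Contents
- Environment
- Downloading the source
- Configuring and building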
[……]
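The walkthrough below is based on a self-compiled and self-configured build of Nutch 2.2.1.
After the build, the bin directory contains two scripts, nutch and crawl. Run either one from the command line without arguments to see its usage: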
[cra[……]
Error message reported by Nutch during a fetch:

    FetcherJob: starting
    FetcherJob: batchId: 1420598193-2940
    Fetcher: No agents listed in 'http.agent.name' property.
    Exception in thread "main" java.lang.IllegalArgumentException: Fetcher: No agents listed in 'http.agent.name' property.
        at org.apache.nutch.fetcher.FetcherJob.checkConfiguration(FetcherJob.java:240)
        at org.apache.nutch.fetcher.FetcherJob.run(FetcherJob.java:152)
        at org.apache.nutch.fetcher.FetcherJob.fetch(FetcherJob.java:219)
        at org.apache.nutch.fetcher.FetcherJob.run(FetcherJob.java:301)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.fetcher.FetcherJob.main(FetcherJob.java:307)

Cause: the http.agent.name property has no value configured.
Solution: open $NUTCH_[......]
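To make the failing check concrete, here is a minimal sketch that reads a Hadoop-style configuration XML string and reports whether `http.agent.name` has a non-blank value. The class name `AgentNameCheck`, the sample XML, and the agent value `MyNutchSpider` are hypothetical and chosen only for illustration. Nutch performs its own check in `FetcherJob.checkConfiguration` (see the stack trace above) through Hadoop's configuration classes, not through this code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class AgentNameCheck {

    /** Returns the trimmed value of the named property, or null if the property or its value is absent. */
    static String readProperty(String xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            NodeList names = prop.getElementsByTagName("name");
            NodeList values = prop.getElementsByTagName("value");
            if (names.getLength() > 0 && name.equals(names.item(0).getTextContent().trim())) {
                return values.getLength() > 0 ? values.item(0).getTextContent().trim() : null;
            }
        }
        return null;
    }

    /** True when http.agent.name is present and not blank. */
    static boolean hasAgentName(String xml) throws Exception {
        String value = readProperty(xml, "http.agent.name");
        return value != null && !value.isEmpty();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical configuration with the property set.
        String configured = "<configuration><property>"
                + "<name>http.agent.name</name><value>MyNutchSpider</value>"
                + "</property></configuration>";
        // Hypothetical configuration with the property missing.
        String missing = "<configuration></configuration>";

        System.out.println("configured: " + hasAgentName(configured));
        System.out.println("missing:    " + hasAgentName(missing));
    }
}
```

A configuration for which `hasAgentName` returns false corresponds to the situation in which the fetcher refuses to start.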
Building the 2.x branch of the Nutch source downloaded from GitHub fails with the following error:

    [javac] /Users/micmiu/no_sync/opensource_code/nutch/nutch-src-github/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java:111: cannot access java.lang.AutoCloseable
    [javac] class file for java.lang.AutoCloseable not found
    [javac] client = node.client();
    [javac] ^

Solution:
Rebuild with JDK 1.7 or later. The java.lang.AutoCloseable interface was introduced in Java 7, so an older JDK cannot resolve it.
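Before rebuilding, it can help to confirm which JDK is actually being picked up. The sketch below prints the running Java version and home directory and checks whether `java.lang.AutoCloseable` can be loaded. The class name `JdkLevelCheck` is hypothetical. Note the assumption that the `java` command used to run this check belongs to the same JDK the build uses; if the build is launched with a different JDK, the result here does not apply to it.

```java
public class JdkLevelCheck {

    /** True when java.lang.AutoCloseable (added in Java 7) is visible to this runtime. */
    static boolean hasAutoCloseable() {
        try {
            Class.forName("java.lang.AutoCloseable");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Standard system properties describing the running Java installation.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.home    = " + System.getProperty("java.home"));
        System.out.println("AutoCloseable available: " + hasAutoCloseable());
    }
}
```

If the last line reports `false`, the runtime predates Java 7 and the compile error above is expected.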
———[……]
After building and installing Nutch 2.2.1, running the nutch inject command fails with the following error:

    micmiu@micmiu-mbp: ~/tmp/nutch $ nutch inject urls -crawlId micmiublog
    InjectorJob: starting at 2014-12-31 14:13:19
    InjectorJob: Injecting urlDir: urls
    InjectorJob: org.apache.gora.util.GoraException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Not a host:port pair: ?6460@micmiu-mbp.local192.168.1.100,62142,1420001742333
        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
        at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:221)
        at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:251)
        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:273)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:282)
    Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: Not a host:port pair: ?6460@micmiu-mbp.local192.168.1.100,62142,1420001742333
        at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:127)
        at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
        ... 7 more
    Caused by: java.lang.IllegalArgumentException: Not a host:port pair: ?6460@micmiu-mbp.local192.168.1.100,62142,1420001742333
        at org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:60)
        at org.apache.hadoop.hbase.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:63)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:354)
        at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:94)
        at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:109)
        ... 9 more

An error like this is usually caused by $NUTCH_[......]
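Contents
[1] Overview
JBoss IIOP provides CORBA/IIOP access to enterprise beans, as defined by the EJB specification, that are deployed in the JBoss application server. Both of the following approaches have [……]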