Crawling Chinese Websites with Nutch 2.1
2020-12-13 04:56
This post adds support for crawling Chinese websites to Nutch.

1. Crawling Chinese web pages
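A. Adjust the MySQL configuration so that Chinese text stored in MySQL is not garbled. Edit ${APACHE_NUTCH_HOME}/runtime/local/conf/gora.properties and make sure the JDBC URL includes useUnicode=true&characterEncoding=utf8: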
    ###############################
    # MySQL properties            #
    ###############################
    gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
    gora.sqlstore.jdbc.url=jdbc:mysql://10.10.11.252:3306/nutch?useUnicode=true&characterEncoding=utf8&autoReconnect=true&zeroDateTimeBehavior=convertToNull
    gora.sqlstore.jdbc.user=devuser
    gora.sqlstore.jdbc.password=devuser
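The JDBC URL parameters are what matter in this step. As a quick sanity check, the sketch below loads the MySQL section above with java.util.Properties, confirms that the URL sets useUnicode=true and a UTF-8 characterEncoding, and shows what garbled text looks like when UTF-8 bytes are decoded with a single-byte charset. The class and method names are hypothetical (not part of Nutch or Gora), and ISO-8859-1 is only a stand-in for a connection that does not use UTF-8, not a trace of exactly what MySQL does.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper class for illustration only.
public class GoraUrlCheck {

    // The MySQL section of gora.properties shown above.
    static final String GORA_PROPERTIES = String.join("\n",
        "###############################",
        "# MySQL properties            #",
        "###############################",
        "gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver",
        "gora.sqlstore.jdbc.url=jdbc:mysql://10.10.11.252:3306/nutch?useUnicode=true"
            + "&characterEncoding=utf8&autoReconnect=true&zeroDateTimeBehavior=convertToNull",
        "gora.sqlstore.jdbc.user=devuser",
        "gora.sqlstore.jdbc.password=devuser");

    // Parses the text with java.util.Properties and returns the configured JDBC URL.
    static String configuredUrl() {
        Properties props = new Properties();
        try {
            props.load(new StringReader(GORA_PROPERTIES));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("gora.sqlstore.jdbc.url");
    }

    // Splits the query part of a JDBC URL (everything after '?') into key/value pairs.
    static Map<String, String> jdbcParams(String url) {
        Map<String, String> params = new HashMap<>();
        int q = url.indexOf('?');
        if (q < 0) {
            return params;
        }
        for (String pair : url.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    // True when the URL sets useUnicode=true and characterEncoding to utf8 or utf-8.
    static boolean usesUtf8(String url) {
        Map<String, String> p = jdbcParams(url);
        String enc = p.getOrDefault("characterEncoding", "");
        return "true".equalsIgnoreCase(p.get("useUnicode"))
            && (enc.equalsIgnoreCase("utf8") || enc.equalsIgnoreCase("utf-8"));
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 configured: " + usesUtf8(configuredUrl()));

        // Four Chinese characters meaning "Chinese website", written as Unicode
        // escapes so the source compiles regardless of the platform encoding.
        String text = "\u4e2d\u6587\u7f51\u7ad9";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // Decoding UTF-8 bytes with a single-byte charset is one way mojibake appears.
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);
        String right = new String(utf8, StandardCharsets.UTF_8);
        System.out.println("ISO-8859-1: " + wrong.length() + " chars, intact=" + wrong.equals(text));
        System.out.println("UTF-8:      " + right.length() + " chars, intact=" + right.equals(text));
    }
}
```

Running main prints "UTF-8 configured: true", then 12 chars with intact=false for the ISO-8859-1 decode and 4 chars with intact=true for the UTF-8 decode: each of these characters takes three bytes in UTF-8, and a single-byte decoder turns every byte into its own character.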
B. Edit ${APACHE_NUTCH_HOME}/runtime/local/conf/nutch-site.xml.
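Set the http.accept.language property in this file. According to its description in nutch-default.xml, the value is sent as the HTTP "Accept-Language" request header, which allows selecting a non-English language as the default one to retrieve. It is a useful setting for search engines built for a particular national group.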
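http.accept.language only changes what the crawler asks for; each site decides whether to honor the header. The sketch below uses the JDK's RFC 4647 support (Locale.LanguageRange.parse and Locale.lookupTag) as a stand-in for how a server might pick a page language from that header. The header value zh-cn,zh;q=0.9,en-us;q=0.7,*;q=0.3 and the class name are hypothetical examples, not the value used in the original post.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class for illustration only.
public class AcceptLanguageDemo {

    // Hypothetical Accept-Language value that prefers Simplified Chinese.
    static final String ACCEPT_LANGUAGE = "zh-cn,zh;q=0.9,en-us;q=0.7,*;q=0.3";

    // Returns the offered tag chosen by RFC 4647 "lookup" matching, or null if none matches.
    static String pick(String acceptLanguage, List<String> offered) {
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
        return Locale.lookupTag(ranges, offered);
    }

    public static void main(String[] args) {
        // parse() returns the ranges sorted by q-value (weight), highest first.
        for (Locale.LanguageRange r : Locale.LanguageRange.parse(ACCEPT_LANGUAGE)) {
            System.out.println(r.getRange() + " q=" + r.getWeight());
        }
        // A server honoring the header and offering English and Chinese would choose Chinese.
        System.out.println(pick(ACCEPT_LANGUAGE, Arrays.asList("en-us", "zh-cn")));
        // Without a Chinese version, the next preference (en-us) wins.
        System.out.println(pick(ACCEPT_LANGUAGE, Arrays.asList("en-us", "fr")));
    }
}
```

The q-values only rank preferences; listing the Chinese tags first with the highest weight is what makes a cooperating server return Chinese pages to the crawler.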
Original article: http://www.cnblogs.com/haomad/p/3734893.html