Spark MLlib Naive Bayes Classification Example: Full Code and Run Walkthrough

Stella981
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.{NaiveBayes, NaiveBayesModel}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint


// Data format: label,feature1 feature2 feature3
// 0,1 0 1
// 1,0 2 0

object tmp_naive_bayes {

  def main(args: Array[String]): Unit = {
    // 1. Create the Spark context
    val conf = new SparkConf().setAppName("naive_bayes").setMaster("local")
    val sc = new SparkContext(conf)
    Logger.getRootLogger.setLevel(Level.WARN)


    // 2. Read the sample data and parse each line into a LabeledPoint
    val data = sc.textFile("C://Users/wpguoc/Desktop/Spark MLlib/navie_bayes_data.txt")
    val parsedData = data.map{ line =>
      val parts = line.split(',')
      LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(_.toDouble)))
    }


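    // 3. Split the samples into training (60%) and test (40%) sets
    val splits = parsedData.randomSplit(Array(0.6, 0.4), seed = 11L)
    val training = splits(0)
    val test = splits(1)


    // 4. Create and train the naive Bayes model (smoothing parameter lambda = 1.0)
    val model = NaiveBayes.train(training, lambda = 1.0, modelType = "multinomial")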


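    // 5. Predict on the test samples and print up to 20 (prediction, label) pairs
    val predictionAndLabel = test.map(p => (model.predict(p.features), p.label))
    val print_predict = predictionAndLabel.take(20)
    // The header literal means "Naive Bayes classification results:"
    println("贝叶斯分类结果:" + "\n" + "prediction" + "\t" + "label")
    print_predict.foreach { case (prediction, label) =>
      println(s"$prediction\t\t\t$label")
    }

    // Accuracy = correctly classified test samples / all test samples
    val accuracy = 1.0 * predictionAndLabel.filter(x => x._1 == x._2).count() / test.count()
    // The literal means "Naive Bayes classification accuracy"
    println("贝叶斯分类精度" + "\n" + "accuracy: Double = " + accuracy)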


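    // 6. Save the model, then load it back to check that it round-trips
    val ModelPath = "C://Users/wpguoc/Desktop/Spark_MLlib/"
    model.save(sc, ModelPath)
    val sameModel = NaiveBayesModel.load(sc, ModelPath)

    // Release the Spark context once the job is done
    sc.stop()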

  }

}
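
To make the `lambda = 1.0` and `modelType = "multinomial"` arguments in the training step more concrete, here is a minimal sketch of multinomial naive Bayes with additive (Laplace) smoothing in plain Scala, without Spark. It is an illustration of the standard formulation, not MLlib's implementation. The object name `NaiveBayesSketch`, its helper functions, and the toy rows are all assumptions made up for this example; they are not the article's data file.

```scala
// Minimal multinomial naive Bayes with additive (Laplace) smoothing.
// Plain Scala, no Spark; an illustration only, not MLlib's implementation.
object NaiveBayesSketch {

  // Parse one line of the "label,f1 f2 f3" format used by the article.
  def parse(line: String): (Double, Array[Double]) = {
    val parts = line.split(',')
    (parts(0).toDouble, parts(1).trim.split(' ').map(_.toDouble))
  }

  // Train on the samples and return a function that predicts a label.
  def train(samples: Seq[(Double, Array[Double])],
            lambda: Double): Array[Double] => Double = {
    val numFeatures = samples.head._2.length
    val byLabel = samples.groupBy(_._1)
    val numLabels = byLabel.size
    val params = byLabel.map { case (label, rows) =>
      // Per-feature totals over all rows of this class.
      val sums = Array.tabulate(numFeatures)(j => rows.map(_._2(j)).sum)
      // Smoothed log prior of the class.
      val logPrior =
        math.log(rows.size + lambda) - math.log(samples.size + numLabels * lambda)
      // Smoothed log likelihood of each feature given the class.
      val logDenom = math.log(sums.sum + numFeatures * lambda)
      val logTheta = sums.map(s => math.log(s + lambda) - logDenom)
      label -> (logPrior, logTheta)
    }
    // Pick the class with the highest log posterior score.
    features => params.maxBy { case (_, (logPrior, logTheta)) =>
      logPrior + features.zip(logTheta).map { case (x, t) => x * t }.sum
    }._1
  }

  def main(args: Array[String]): Unit = {
    // Toy rows invented for illustration only.
    val lines = Seq("0,1 0 0", "0,2 0 0", "1,0 1 0", "1,0 2 0", "2,0 0 1", "2,0 0 2")
    val predict = train(lines.map(parse), lambda = 1.0)
    println(predict(Array(3.0, 0.0, 0.0))) // prints 0.0
    println(predict(Array(0.0, 0.0, 4.0))) // prints 2.0
  }
}
```

With `lambda = 0` a feature that never occurs in a class would get a log likelihood of negative infinity; the smoothing term keeps every score finite. The multinomial model treats feature values as counts, so they must be non-negative.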

Run process and results

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/12/19 17:32:04 INFO SparkContext: Running Spark version 1.6.3
18/12/19 17:32:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/12/19 17:32:05 INFO SecurityManager: Changing view acls to: wpguoc
18/12/19 17:32:05 INFO SecurityManager: Changing modify acls to: wpguoc
18/12/19 17:32:05 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(wpguoc); users with modify permissions: Set(wpguoc)
18/12/19 17:32:05 INFO Utils: Successfully started service 'sparkDriver' on port 63424.
18/12/19 17:32:06 INFO Slf4jLogger: Slf4jLogger started
18/12/19 17:32:06 INFO Remoting: Starting remoting
18/12/19 17:32:06 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.66.80:63437]
18/12/19 17:32:06 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 63437.
18/12/19 17:32:06 INFO SparkEnv: Registering MapOutputTracker
18/12/19 17:32:06 INFO SparkEnv: Registering BlockManagerMaster
18/12/19 17:32:06 INFO DiskBlockManager: Created local directory at C:\Users\wpguoc\AppData\Local\Temp\blockmgr-4d798e34-90b0-4ee9-a811-586a893f4818
18/12/19 17:32:06 INFO MemoryStore: MemoryStore started with capacity 1127.3 MB
18/12/19 17:32:06 INFO SparkEnv: Registering OutputCommitCoordinator
18/12/19 17:32:06 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/12/19 17:32:06 INFO SparkUI: Started SparkUI at http://192.168.66.80:4040
18/12/19 17:32:06 INFO Executor: Starting executor ID driver on host localhost
18/12/19 17:32:06 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 63444.
18/12/19 17:32:06 INFO NettyBlockTransferService: Server created on 63444
18/12/19 17:32:06 INFO BlockManagerMaster: Trying to register BlockManager
18/12/19 17:32:06 INFO BlockManagerMasterEndpoint: Registering block manager localhost:63444 with 1127.3 MB RAM, BlockManagerId(driver, localhost, 63444)
18/12/19 17:32:06 INFO BlockManagerMaster: Registered BlockManager
18/12/19 17:32:10 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
18/12/19 17:32:10 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
贝叶斯分类结果:
prediction    label
0.0            0.0
0.0            0.0
2.0            2.0
2.0            2.0
2.0            2.0
贝叶斯分类精度
accuracy: Double = 1.0
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
18/12/19 17:32:14 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

Process finished with exit code 0
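
The reported accuracy of 1.0 rests on a small test split: `take(20)` returned only five pairs in this run. As a rough sketch in plain Scala (no Spark), the same accuracy calculation can be reproduced from the five (prediction, label) pairs printed in the output above. The object name `AccuracySketch` and its helper functions are hypothetical names introduced for this example.

```scala
// Recompute the accuracy from the pairs shown in the run output.
// Plain Scala, no Spark; names are illustrative assumptions.
object AccuracySketch {

  // (prediction, label) pairs copied from the printed output above.
  val pairs: Seq[(Double, Double)] =
    Seq((0.0, 0.0), (0.0, 0.0), (2.0, 2.0), (2.0, 2.0), (2.0, 2.0))

  // Fraction of pairs whose prediction equals the true label.
  def accuracy(ps: Seq[(Double, Double)]): Double =
    ps.count { case (prediction, label) => prediction == label }.toDouble / ps.size

  // Number of test rows per true label.
  def labelCounts(ps: Seq[(Double, Double)]): Map[Double, Int] =
    ps.groupBy(_._2).map { case (label, rows) => label -> rows.size }

  def main(args: Array[String]): Unit = {
    println(accuracy(pairs)) // prints 1.0
    println(labelCounts(pairs))
  }
}
```

Only labels 0.0 and 2.0 appear among the printed test rows, so this run says nothing about how well the model separates any other label in the data file. A larger data set or a different split would give a more informative estimate.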