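Logistic regression and decision trees are two commonly used classifiers. This post shows how to use logistic regression in Scala, through Spark ML, to predict whether a sentence contains a given word.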
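As background for the Tokenizer and HashingTF stages used in this post, here is a minimal plain-Scala sketch of the idea behind them: split a sentence into lowercase words, then hash each word into one of a fixed number of buckets and count occurrences. The object name `HashingSketch` and its helpers are hypothetical, and `String.hashCode` is only a stand-in; Spark uses its own hash function, so the bucket indices will not match what HashingTF produces.

```scala
// Illustrative sketch only: not Spark's implementation.
object HashingSketch {
  // Lowercase the text and split it on whitespace.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("\\s+").toSeq

  // Map a word to a bucket index in [0, numFeatures).
  // String.hashCode is an assumption made for this sketch.
  def bucket(term: String, numFeatures: Int): Int = {
    val h = term.hashCode % numFeatures
    if (h < 0) h + numFeatures else h
  }

  // Sparse term-frequency vector: bucket index -> number of words hashed there.
  def termFrequencies(words: Seq[String], numFeatures: Int): Map[Int, Double] =
    words.groupBy(w => bucket(w, numFeatures)).map {
      case (index, ws) => index -> ws.size.toDouble
    }
}
```

For example, `HashingSketch.termFrequencies(HashingSketch.tokenize("apache spark"), 1000)` yields a sparse map whose counts add up to the number of words in the sentence. Two different words can land in the same bucket, which is the usual trade-off of the hashing approach.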
import org.apache.spark.ml.feature._
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.{Pipeline,PipelineModel}
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.Row
// Training data: label is 1.0 when the text contains "spark", otherwise 0.0
val training = spark.createDataFrame(Seq(
  (0L, "a b c d e spark", 1.0),
  (1L, "b d", 0.0),
  (2L, "spark f g h", 1.0),
  (3L, "hadoop mapreduce", 0.0),
  (4L, "apache spark", 1.0),
  (5L, "hello spark", 1.0)
)).toDF("id", "text", "label")
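// Stage 1: split each sentence into lowercase words
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
// Stage 2: hash the words into a 1000-dimensional term-frequency vector
val hashingTF = new HashingTF().setNumFeatures(1000).setInputCol(tokenizer.getOutputCol).setOutputCol("features")
// Stage 3: logistic regression, at most 10 iterations, regularization parameter 0.01
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)
// Chain the three stages into one pipeline and fit it on the training data
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
val model = pipeline.fit(training)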
// Unlabeled test sentences
val test = spark.createDataFrame(Seq(
  (4L, "spark i j k"),
  (5L, "l m n"),
  (6L, "spark a"),
  (7L, "apache hadoop"),
  (8L, "apache hadoop"),
  (9L, "apache spark"),
  (10L, "apache hadoop"),
  (11L, "apache spark hadoop")
)).toDF("id", "text")
// Score the test sentences, then print the class probabilities and the predicted label
model.transform(test).
  select("id", "text", "probability", "prediction").
  collect().
  foreach { case Row(id: Long, text: String, prob: Vector, prediction: Double) =>
    println(s"($id, $text) --> prob=$prob, prediction=$prediction")
  }
Output:
(4, spark i j k) --> prob=[0.06514785966116181,0.9348521403388381], prediction=1.0
(5, l m n) --> prob=[0.6594623918792804,0.3405376081207197], prediction=0.0
(6, spark a) --> prob=[0.016899270159272606,0.9831007298407275], prediction=1.0
(7, apache hadoop) --> prob=[0.672723276314924,0.3272767236850759], prediction=0.0
(8, apache hadoop) --> prob=[0.672723276314924,0.3272767236850759], prediction=0.0
(9, apache spark) --> prob=[0.013955984619361126,0.9860440153806388], prediction=1.0
(10, apache hadoop) --> prob=[0.672723276314924,0.3272767236850759], prediction=0.0
(11, apache spark hadoop) --> prob=[0.06887499770773359,0.9311250022922664], prediction=1.0
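As the output shows, the model we built predicts whether a sentence contains the word spark: every test sentence that contains spark gets prediction 1.0, and every other sentence gets 0.0.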
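To make the relationship between the prob and prediction columns concrete, here is a minimal plain-Scala sketch of the decision rule for binary logistic regression. It assumes the default threshold of 0.5; the object name `DecisionRuleSketch` and its helpers are hypothetical and are not part of the Spark API.

```scala
// Illustrative sketch only: mirrors the usual binary decision rule.
object DecisionRuleSketch {
  // Logistic (sigmoid) function: maps a raw score to a probability in (0, 1).
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Class probabilities [P(label = 0), P(label = 1)] for a given raw score.
  def probabilities(margin: Double): Array[Double] = {
    val p1 = sigmoid(margin)
    Array(1.0 - p1, p1)
  }

  // Predict 1.0 when P(label = 1) exceeds the threshold, otherwise 0.0.
  // The 0.5 default is an assumption matching the unmodified model above.
  def predict(prob: Array[Double], threshold: Double = 0.5): Double =
    if (prob(1) > threshold) 1.0 else 0.0
}
```

Applied to the first output row, `DecisionRuleSketch.predict(Array(0.06514785966116181, 0.9348521403388381))` returns 1.0, and for the row (5, l m n) the same call with `Array(0.6594623918792804, 0.3405376081207197)` returns 0.0, which agrees with the prediction column.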
Source: https://www.cnblogs.com/alichengxuyuan/p/12576831.html