Updated on 2024-08-10 GMT+08:00

Spark Core Sample Projects (Scala)

Function

Collect statistics on female netizens who spend more than two hours shopping online during the weekend.

Sample Code

The following code snippet is an example. For the complete code, see the com.huawei.bigdata.spark.examples.FemaleInfoCollection class.

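import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Create the SparkSession, the entry point of the application.
val spark = SparkSession
  .builder()
  .appName("CollectFemaleInfo")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()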

// Read the input data from the path specified by the input parameter args(0).
val text = spark.sparkContext.textFile(args(0))
// Keep only the records of female netizens.
val data = text.filter(_.contains("female"))
// Sum the online time of each female netizen: field 0 is used as the key and field 2 as the time.
val femaleData: RDD[(String, Int)] = data.map { line =>
  val t = line.split(',')
  (t(0), t(2).toInt)
}.reduceByKey(_ + _)
// Keep the female netizens whose total online time exceeds 2 hours (the threshold 120 implies the time is recorded in minutes), and print the results.
val result = femaleData.filter(line => line._2 > 120)
result.collect().map(x => x._1 + ',' + x._2).foreach(println)
// Stop the SparkSession to release its resources.
spark.stop()
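
To make the data flow easier to follow, the minimal sketch below reproduces the same filter, aggregate, and threshold steps on an in-memory Scala collection instead of an RDD, so it can be run without a Spark cluster. It assumes that each record has the form name,gender,minutes, which is inferred from how the sample reads t(0) and t(2) and compares the total against 120. The sample records and the names FemaleInfoSketch, sampleLines, and longOnlineFemales are hypothetical and are not part of the sample project.

```scala
object FemaleInfoSketch {
  // Hypothetical records in the assumed "name,gender,minutes" layout.
  val sampleLines: Seq[String] = Seq(
    "alice,female,20",
    "bob,male,10",
    "alice,female,110",
    "carol,female,50",
    "carol,female,60",
    "bob,male,130"
  )

  // Mirrors the RDD pipeline: keep female records, sum the minutes per name,
  // then keep the names whose total exceeds the threshold.
  def longOnlineFemales(lines: Seq[String], thresholdMinutes: Int = 120): Map[String, Int] =
    lines
      .filter(_.contains("female"))
      .map { line =>
        val t = line.split(',')
        (t(0), t(2).trim.toInt)
      }
      .groupBy(_._1) // plays the role of reduceByKey on a local collection
      .map { case (name, records) => name -> records.map(_._2).sum }
      .filter { case (_, total) => total > thresholdMinutes }

  def main(args: Array[String]): Unit =
    longOnlineFemales(sampleLines).foreach { case (name, total) =>
      println(s"$name,$total")
    }
}
```

With these hypothetical records, only alice is kept: her two entries add up to 130 minutes, whereas carol's total of 110 minutes stays below the threshold and bob's records are removed by the gender filter. On a real cluster, reduceByKey performs the per-key sum in a distributed way; groupBy followed by sum is only a local stand-in for it.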