Scala Example Code
Function
Collects, from weekend log files, information about female netizens who spend two hours or more shopping online.
Example Code
The following code segment is only an example. For details, see com.huawei.bigdata.spark.examples.FemaleInfoCollection.
object FemaleInfoCollection {
  // Table structure, used to map the text data to a DataFrame.
  case class FemaleInfo(name: String, gender: String, stayTime: Int)

  def main(args: Array[String]): Unit = {
    // Configure the Spark application name.
    val spark = SparkSession
      .builder()
      .appName("FemaleInfo")
      .config("spark.some.config.option", "some-value")
      .getOrCreate()
    import spark.implicits._

    // Convert the RDD to a DataFrame through implicit conversion, then register a temporary view.
    spark.sparkContext.textFile(args(0)).map(_.split(","))
      .map(p => FemaleInfo(p(0), p(1), p(2).trim.toInt))
      .toDF.createOrReplaceTempView("FemaleInfoTable")

    // Use a SQL statement to select the online time of female netizens and aggregate records with the same name.
    val femaleTimeInfo = spark.sql(
      "select name, sum(stayTime) as stayTime from FemaleInfoTable where gender = 'female' group by name")

    // Filter out female netizens who spend two hours (120 minutes) or more online.
    femaleTimeInfo.filter("stayTime >= 120").collect().foreach(println)

    spark.stop()
  }
}
In the preceding code example, the data processing logic is implemented with SQL statements. It can also be implemented by invoking the Spark SQL API from Scala, Java, or Python code. For details, see http://spark.apache.org/docs/3.1.1/sql-programming-guide.html#running-sql-queries-programmatically.
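As one illustration of that programmatic approach, the same filtering and aggregation can be expressed with the DataFrame API instead of a SQL string. The sketch below assumes the same input schema (name, gender, stayTime in minutes); the helper name collectFemaleTimeInfo and the object name are ours, not part of the official sample.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.sum

object FemaleInfoCollectionDsl {
  // DataFrame-API equivalent of the SQL query above.
  // Expected input columns: name (string), gender (string), stayTime (int, minutes).
  def collectFemaleTimeInfo(df: DataFrame): DataFrame = {
    df.filter(df("gender") === "female")      // keep female netizens only
      .groupBy("name")                        // aggregate rows with the same name
      .agg(sum("stayTime").as("stayTime"))    // total online time per person
      .filter("stayTime >= 120")              // keep totals of two hours or more
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("FemaleInfoDsl").getOrCreate()
    import spark.implicits._

    // Parse the comma-separated log lines into a DataFrame, as in the SQL version.
    val data = spark.sparkContext.textFile(args(0))
      .map(_.split(","))
      .map(p => (p(0), p(1), p(2).trim.toInt))
      .toDF("name", "gender", "stayTime")

    collectFemaleTimeInfo(data).collect().foreach(println)
    spark.stop()
  }
}
```

Because the logic lives in a function that takes and returns a DataFrame, it can be unit-tested locally against a small in-memory dataset without reading any log files.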