Scala Sample Code
Function
Collects, from the log files, the information of female netizens who spend more than 2 hours online shopping on the weekend.
Sample Code
The following code segment is only an example. For details, see com.huawei.bigdata.spark.examples.FemaleInfoCollection.
import org.apache.spark.sql.SparkSession

object FemaleInfoCollection {
  // Table structure, used for mapping the text data to a DataFrame.
  case class FemaleInfo(name: String, gender: String, stayTime: Int)

  def main(args: Array[String]): Unit = {
    // Configure the Spark application name.
    val spark = SparkSession
      .builder()
      .appName("FemaleInfo")
      .config("spark.some.config.option", "some-value")
      .getOrCreate()
    import spark.implicits._

    // Convert the RDD to a DataFrame through the implicit conversion,
    // then register a temporary table. (registerTempTable is deprecated
    // since Spark 2.0; createOrReplaceTempView is the current equivalent.)
    spark.sparkContext.textFile(args(0)).map(_.split(","))
      .map(p => FemaleInfo(p(0), p(1), p(2).trim.toInt))
      .toDF().registerTempTable("FemaleInfoTable")

    // Use an SQL statement to select the records of female netizens and
    // aggregate the online time of records with the same name.
    val femaleTimeInfo = spark.sql(
      "select name, sum(stayTime) as stayTime from FemaleInfoTable " +
        "where gender = 'female' group by name")

    // Keep only the female netizens who spend at least 2 hours
    // (120 minutes) online and print the result.
    femaleTimeInfo.filter("stayTime >= 120").collect().foreach(println)

    spark.stop()
  }
}
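The sample reads comma-separated text records in the form name,gender,stayTime, where stayTime is the online time in minutes, matching the FemaleInfo case class above. The following input lines are hypothetical and only illustrate the expected format:

LiuYang,female,320
CaiXuyu,female,300
FangBo,male,50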
In the preceding code example, the data processing logic is implemented with SQL statements. The same logic can also be implemented by invoking the Spark SQL API in Scala, Java, or Python code. For details, see http://spark.apache.org/docs/3.1.1/sql-programming-guide.html#running-sql-queries-programmatically.
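For illustration, the following is a minimal sketch (not part of the product sample) of the same filtering and aggregation expressed with the DataFrame API instead of an SQL statement. The object name FemaleInfoCollectionDF is hypothetical:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object FemaleInfoCollectionDF {
  case class FemaleInfo(name: String, gender: String, stayTime: Int)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("FemaleInfoDF")
      .getOrCreate()
    import spark.implicits._

    // Map the text data to a DataFrame, as in the SQL-based sample.
    val femaleInfo = spark.sparkContext.textFile(args(0))
      .map(_.split(","))
      .map(p => FemaleInfo(p(0), p(1), p(2).trim.toInt))
      .toDF()

    // Filter female records, aggregate the stay time per name, and keep
    // only the netizens online for at least 120 minutes.
    femaleInfo.filter($"gender" === "female")
      .groupBy("name")
      .agg(sum("stayTime").as("stayTime"))
      .filter($"stayTime" >= 120)
      .collect()
      .foreach(println)

    spark.stop()
  }
}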