Scala Sample Code
Function
Collects the information of female netizens who spend more than 2 hours in online shopping on the weekend from the log files.
Sample Code
The following code segment is only an example. For details, see com.huawei.bigdata.spark.examples.FemaleInfoCollection.
object FemaleInfoCollection { //Table structure, used for mapping the text data to df case class FemaleInfo(name: String, gender: String, stayTime: Int) def main(args: Array[String]) { //Configure Spark application name val spark = SparkSession .builder() .appName("FemaleInfo") .config("spark.some.config.option", "some-value") .getOrCreate() import spark.implicits._ //Convert RDD to DataFrame through the implicit conversion, then register table. spark.sparkContext.textFile(args(0)).map(_.split(",")) .map(p => FemaleInfo(p(0), p(1), p(2).trim.toInt)) .toDF.registerTempTable("FemaleInfoTable") //Via SQL statements to screen out the time information of female stay on the Internet , and aggregated the same names. val femaleTimeInfo = spark.sql("select name,sum(stayTime) as stayTime from FemaleInfoTable where gender = 'female' group by name") //Filter information about female netizens who spend more than 2 hours online. val c = femaleTimeInfo.filter("stayTime >= 120").collect().foreach(println) spark.stop() } }
In the preceding code example, data processing logic is implemented by SQL statements. It can also be implemented by invoking the SparkSQL interface in Scala/Java/Python code. For details, see http://spark.apache.org/docs/3.1.1/sql-programming-guide.html#running-sql-queries-programmatically
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.