Scala Sample Code
Function
Collects the information of female netizens who spend more than 2 hours in online shopping on the weekend from the log files.
Sample Code
The following code segment is only an example. For details, see com.huawei.bigdata.spark.examples.FemaleInfoCollection.
object FemaleInfoCollection { //Table structure, used for mapping the text data to df case class FemaleInfo(name: String, gender: String, stayTime: Int) def main(args: Array[String]) { //Configure Spark application name val spark = SparkSession .builder() .appName("FemaleInfo") .config("spark.some.config.option", "some-value") .getOrCreate() import spark.implicits._ //Convert RDD to DataFrame through the implicit conversion, then register table. spark.sparkContext.textFile(args(0)).map(_.split(",")) .map(p => FemaleInfo(p(0), p(1), p(2).trim.toInt)) .toDF.registerTempTable("FemaleInfoTable") //Via SQL statements to screen out the time information of female stay on the Internet , and aggregated the same names. val femaleTimeInfo = spark.sql("select name,sum(stayTime) as stayTime from FemaleInfoTable where gender = 'female' group by name") //Filter information about female netizens who spend more than 2 hours online. val c = femaleTimeInfo.filter("stayTime >= 120").collect().foreach(println) spark.stop() } }
In the preceding code example, data processing logic is implemented by SQL statements. It can also be implemented by invoking the SparkSQL interface in Scala/Java/Python code. For details, see http://spark.apache.org/docs/3.1.1/sql-programming-guide.html#running-sql-queries-programmatically
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot