Python Sample Code

Updated on 2022-09-14 GMT+08:00

Function

Collects, from the log files, information about female netizens who spend more than 2 hours shopping online on the weekend.

Sample Code

The following code snippet is for reference only. For the complete code, see collectFemaleInfo.py.

import sys

from pyspark.sql import SparkSession


def contains(text, substr):
  # Return True if substr occurs anywhere in text.
  return substr in text

if __name__ == "__main__":
  if len(sys.argv) < 2:
    print("Usage: CollectFemaleInfo <file>")
    sys.exit(-1)

  spark = SparkSession \
    .builder \
    .appName("CollectFemaleInfo") \
    .getOrCreate()

  """
  The following code implements these steps:
  1. Read data from the path specified by the input parameter argv[1]. - text
  2. Keep only the records of female netizens. - filter
  3. Aggregate the total time that each female netizen spends online. - map/map/reduceByKey
  4. Keep only the female netizens whose total time online exceeds 2 hours (120 minutes). - filter
  """
  inputPath = sys.argv[1]
  # Each Row returned by read.text has a single column, so r[0] is the raw line.
  result = spark.read.text(inputPath).rdd.map(lambda r: r[0]) \
    .filter(lambda line: contains(line, "female")) \
    .map(lambda line: line.split(',')) \
    .map(lambda dataArr: (dataArr[0], int(dataArr[2]))) \
    .reduceByKey(lambda v1, v2: v1 + v2) \
    .filter(lambda tupleVal: tupleVal[1] > 120) \
    .collect()
  for (k, v) in result:
    print(k + "," + str(v))

  # Stop the SparkSession (and its underlying SparkContext).
  spark.stop()
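
The four steps listed in the sample can be traced on a small in-memory data set without a Spark cluster. The sketch below is plain Python, not the Spark implementation. The function name collect_female_info, the threshold_minutes parameter, and the sample records are hypothetical. The record layout (name, gender, minutes online, separated by commas) is an assumption inferred from how the sample indexes dataArr[0] and dataArr[2].

```python
from collections import defaultdict


def collect_female_info(lines, threshold_minutes=120):
    """Return sorted (name, total_minutes) pairs above the threshold.

    Plain-Python stand-in for the filter/map/reduceByKey/filter chain.
    Assumes each line looks like "name,gender,minutes" (hypothetical layout).
    """
    totals = defaultdict(int)
    for line in lines:
        # Step 2: keep only records that mention "female".
        if "female" not in line:
            continue
        # Step 3: split the record and add the minutes to the per-name total
        # (the role reduceByKey plays in the Spark version).
        fields = line.split(",")
        totals[fields[0]] += int(fields[2])
    # Step 4: keep only totals strictly greater than the threshold.
    return sorted(
        (name, total) for name, total in totals.items()
        if total > threshold_minutes
    )


# Hypothetical sample records, for illustration only.
SAMPLE_LINES = [
    "alice,female,50",
    "bob,male,200",
    "carol,female,90",
    "alice,female,80",
    "carol,female,30",
    "dave,male,15",
]

for name, total in collect_female_info(SAMPLE_LINES):
    print(name + "," + str(total))
# prints "alice,130"
```

In this sample, carol totals exactly 120 minutes and is excluded, because the comparison is strictly greater than 120, matching the final filter in the Spark code.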