
Connecting Spark to OBS

Updated on 2023-12-20 GMT+08:00

Overview

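Apache Spark is a fast, general-purpose compute engine for processing large-scale data sets. This section describes how to connect Spark to OBS so that Spark jobs can read data stored in OBS buckets.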

Prerequisites

Hadoop has been installed. For details, see Connecting Hadoop to OBS.

Precautions

To reduce log output, add the following configuration to the /opt/spark-2.3.3/conf/log4j.properties file (this file is created in step 3 of the procedure):

log4j.logger.com.obs=ERROR
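
The setting above can also be added from a script. The following is a minimal sketch, assuming a POSIX shell; add_obs_log_level is a hypothetical helper name, not part of Spark or OBS, and the default path is the one used in this guide.

```shell
# Hypothetical helper: append the OBS logger setting to a log4j.properties
# file, but only if no com.obs logger line is present yet.
# Argument 1 (optional): path to log4j.properties.
add_obs_log_level() {
  log4j_file="${1:-/opt/spark-2.3.3/conf/log4j.properties}"
  if ! grep -q '^log4j\.logger\.com\.obs' "$log4j_file" 2>/dev/null; then
    echo 'log4j.logger.com.obs=ERROR' >> "$log4j_file"
  fi
}
```

Calling add_obs_log_level with no argument targets the default path. Because of the grep check, running it more than once does not duplicate the line.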

Procedure

The following uses Spark 2.3.3 as an example.

  1. Download spark-2.3.3-bin-without-hadoop.tgz and decompress it to the /opt/spark-2.3.3 directory.
  2. Add the following content to the /etc/profile file:

    export SPARK_HOME=/opt/spark-2.3.3
    export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH

  3. Configure Spark.

    1. Rename spark-env.sh.template under /opt/spark-2.3.3/conf/ to spark-env.sh and add the following configuration, which puts the Hadoop JAR files on the Spark classpath:

      export SPARK_DIST_CLASSPATH=$(hadoop classpath)

      For more configuration options, see Apache Hadoop.

    2. Rename log4j.properties.template under /opt/spark-2.3.3/conf/ to log4j.properties.

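  4. Check whether the connection is successful by running the word count example on a file stored in OBS. Replace obs-bucket and input/test.txt with your own bucket name and object path: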

    $SPARK_HOME/bin/run-example org.apache.spark.examples.JavaWordCount obs://obs-bucket/input/test.txt
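
Step 3 of the procedure can be sketched as a small script. This is an illustration under stated assumptions, not an official installer: configure_spark is a hypothetical helper name, the Spark directory is passed as an argument, and the templates are copied rather than renamed so that the originals are preserved.

```shell
# Hypothetical helper covering step 3 of the procedure.
# Argument 1 (optional): Spark installation directory.
configure_spark() {
  conf_dir="${1:-/opt/spark-2.3.3}/conf"

  # Step 3.1: create spark-env.sh from its template if it does not exist,
  # then add the Hadoop classpath export once.
  [ -f "$conf_dir/spark-env.sh" ] \
    || cp "$conf_dir/spark-env.sh.template" "$conf_dir/spark-env.sh" \
    || return 1
  if ! grep -q '^export SPARK_DIST_CLASSPATH=' "$conf_dir/spark-env.sh"; then
    # Single quotes keep $(hadoop classpath) literal, so it is evaluated
    # when Spark sources spark-env.sh, not when this helper runs.
    echo 'export SPARK_DIST_CLASSPATH=$(hadoop classpath)' >> "$conf_dir/spark-env.sh"
  fi

  # Step 3.2: create log4j.properties from its template if it does not exist.
  [ -f "$conf_dir/log4j.properties" ] \
    || cp "$conf_dir/log4j.properties.template" "$conf_dir/log4j.properties" \
    || return 1
}
```

For the layout used in this guide, the call would be configure_spark /opt/spark-2.3.3, followed by the check in step 4. The helper can be rerun safely because existing files are left in place and the export line is added only once.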
