Harish Chandra Thuwal edited this page Oct 16, 2017 · 1 revision

Python Environment

  • Clone repo: git clone https://github.com/dufferzafar/github-analytics

  • Create virtualenv: virtualenv -p /usr/bin/python3.5 env

  • Install dependencies: env/bin/pip install -r requirements.txt

Spark Installation and setup

  1. wget https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0-bin-hadoop2.7.tgz

  2. tar xzf spark-2.2.0-bin-hadoop2.7.tgz

  3. Optionally, move the extracted folder to /usr/local:

    sudo mv spark-2.2.0-bin-hadoop2.7 /usr/local/spark

  4. Set environment variables, i.e. add the following lines to .bashrc

    • export SPARK_HOME=/usr/local/spark (or the path to the extracted Spark folder)
    • export PATH=$PATH:$SPARK_HOME/bin
  5. Run the following command in the terminal to open Spark with the Python shell

    pyspark

  6. Run the following command in the terminal to open Spark with the Scala shell

    spark-shell

  7. After running either of the above two commands, the Spark web UI can be accessed at

    http://localhost:4040

  8. Sample SparkPi program (estimates the value of pi)

    run-example org.apache.spark.examples.SparkPi
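SparkPi estimates pi with a Monte Carlo method: it samples random points in the unit square and counts how many land inside the unit circle, then parallelizes that counting across Spark workers. A minimal pure-Python sketch of the same idea (without Spark's parallelism; the function name, seed, and sample count are illustrative, not part of the Spark example):

```python
import random

def estimate_pi(num_samples, seed=42):
    """Estimate pi by sampling points in the unit square and
    counting those that fall inside the quarter unit circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Ratio of areas: quarter circle / unit square = pi / 4
    return 4.0 * inside / num_samples

print(estimate_pi(100_000))  # prints a value close to 3.14
```

The Spark version distributes the sampling loop over the cluster and sums the per-partition counts, which is why the estimate improves simply by raising the sample count.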
