Make sure you have selected “Ubuntu Server” as the AMI, and then choose m4.xlarge as the instance type.
Finally, connect to the instance (the virtual machine) via SSH:
$ ssh -i "keyPairFile.pem" ubuntu@PublicIP
For example:
$ ssh -i "keypair.pem" ubuntu@31.145.32.255
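Note that SSH refuses private keys with permissive file modes, so if the connection fails with an “UNPROTECTED PRIVATE KEY FILE” warning, restrict the key file first:
$ chmod 400 keypair.pem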
Now it is time to upload the application files to your Amazon S3 bucket (created earlier) using the following command:
$ aws s3 cp Application s3://mybucket/Application --recursive
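This assumes the AWS CLI is installed and configured with credentials (via aws configure). As a quick sanity check, you can list the uploaded objects afterwards:
$ aws s3 ls s3://mybucket/Application/ --recursive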
Back on the instance, install Java, Scala, and Python 3 (including pip, which later steps rely on):
$ sudo apt-get update
$ sudo apt-get install openjdk-8-jdk
$ sudo apt-get install scala
$ sudo apt-get install python3 python3-pip
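At this point it is worth confirming the toolchain is in place; each of these commands should print a version string:
$ java -version
$ scala -version
$ python3 --version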
Next, download Spark 2.2.0 and install it under /usr/local/spark:
$ sudo curl -O http://d3kbcqa49mib13.cloudfront.net/spark-2.2.0-bin-hadoop2.7.tgz
$ sudo tar xvf ./spark-2.2.0-bin-hadoop2.7.tgz
$ sudo mkdir /usr/local/spark
$ sudo cp -r spark-2.2.0-bin-hadoop2.7/* /usr/local/spark
$ pip3 install pandas
$ pip3 install matplotlib
$ pip3 install numpy
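A one-liner import check confirms the three libraries installed correctly (on success it prints nothing and exits cleanly):
$ python3 -c 'import pandas, matplotlib, numpy'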
a) Append the following line to your ~/.profile so that the Spark binaries are on your PATH:
export PATH="$PATH:/usr/local/spark/bin"
b) Execute source ~/.profile to update PATH in your current session.
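If everything went well, Spark should now resolve from any directory; this should print the Spark version banner:
$ spark-submit --version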
$ sudo vi /etc/hosts
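The guide does not show what to add here; a common reason to touch /etc/hosts on EC2 is that Spark resolves the instance's hostname at startup. A typical entry maps the private IP to the hostname (the values below are placeholders, so use your instance's own):
172.31.0.10   ip-172-31-0-10   # hypothetical private IP and hostname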
We’ve almost reached the interesting part, but first run the following commands:
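If the application files only exist in your bucket at this point, pull them onto the instance first; this fetch command is our assumption, simply mirroring the earlier upload:
$ aws s3 cp s3://mybucket/Application Application --recursive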
$ cd Application/src/
$ sh run.sh
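run.sh ships with the application and its contents are not reproduced in this guide; on a setup like this one it would typically wrap a spark-submit call. A purely hypothetical sketch (the entry point main.py and the local[4] master are placeholders, not taken from the original):
#!/bin/sh
# Hypothetical launcher: submit the application's driver to the local Spark install.
spark-submit --master local[4] main.py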
Now it is time to run the commands that produce the results:
$ python3 results.py 0
$ python3 results.py 1
$ python3 results.py 2
$ python3 results.py 3
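One last practical note: results.py presumably draws on the pandas/matplotlib/numpy stack installed earlier, and a headless EC2 instance has no display for interactive plot windows. If the script tries to open one, forcing matplotlib's non-interactive Agg backend lets it run (assuming the script writes its figures out with savefig):
$ MPLBACKEND=Agg python3 results.py 0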