Develop an Apache Spark application per the provided specifications, using PySpark in Google Colab.

Details

Use the following as a reference:

  • Create a new notebook in Google Colab
  • Download the Crunchbase Orgs dataset and upload it to the “Files” section of your Colab notebook (the upload may take a few minutes)
  • Read the Crunchbase Orgs dataset into a Spark DataFrame (see the sketch below)
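
A minimal setup sketch. The file name organizations.csv is an assumption (use whatever name the uploaded file has), and PySpark must be installed in the Colab runtime first:

    # In Colab, install PySpark once per runtime:
    # !pip install pyspark
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("CrunchbaseOrgs").getOrCreate()

    # "organizations.csv" is an assumed file name; adjust to the uploaded file.
    df = spark.read.csv("organizations.csv", header=True, inferSchema=True)
    df.printSchema()  # inspect the actual column names before filtering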

Implement PySpark code using DataFrames, RDDs, or Spark UDFs (illustrative sketches for each task follow the list):

  1. Find all entities whose names start with the letter “F” (e.g. Facebook):
    • print the count and show() the resulting Spark DataFrame
  2. Find all entities located in New York City:
    • print the count and show() the resulting Spark DataFrame
  3. Add a “Blog” column to the DataFrame with the row entries set to 1 if the “domain” field contains “blogspot.com”, and 0 otherwise.
    • show() only the records with the “Blog” field marked as 1
  4. Find all entities with names that are palindromes (name reads the same way forward and reverse, e.g. madam):
    • print the count and show() the resulting Spark DataFrame 
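
A sketch for task 1, assuming the DataFrame df from the setup sketch above and a column named "name" (confirm with printSchema()):

    from pyspark.sql.functions import col

    # Task 1: entities whose name starts with the letter "F"
    f_orgs = df.filter(col("name").startswith("F"))
    print(f_orgs.count())
    f_orgs.show()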
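A sketch for task 2. The column name "city" and the stored value are assumptions; the dataset may record “New York” or “New York City”, so the filter below matches both, case-insensitively:

    from pyspark.sql.functions import col, lower

    # Task 2: entities located in New York City ("city" column assumed)
    nyc_orgs = df.filter(lower(col("city")).isin("new york", "new york city"))
    print(nyc_orgs.count())
    nyc_orgs.show()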
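A sketch for task 3, using the "domain" field named in the spec; rows with a null domain fall through to 0:

    from pyspark.sql.functions import col, when

    # Task 3: Blog = 1 if "domain" contains "blogspot.com", else 0
    df_blog = df.withColumn(
        "Blog",
        when(col("domain").contains("blogspot.com"), 1).otherwise(0),
    )
    df_blog.filter(col("Blog") == 1).show()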
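A sketch for task 4, using built-in column functions rather than a UDF; reverse() applied to a string column returns the reversed string, and lower() makes the check case-insensitive:

    from pyspark.sql.functions import col, lower, reverse

    # Task 4: names that read the same forward and backward
    # (single-character names trivially match; add a length filter to exclude them)
    norm = lower(col("name"))
    palindromes = df.filter(norm == reverse(norm))
    print(palindromes.count())
    palindromes.show()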
