POC: Jobs recommendation on Apache Spark and BigDL

What is BigDL?

BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can write their deep learning applications as standard Spark programs, which can directly run on top of existing Spark or Hadoop clusters.

Deploy BigDL application on Databricks cloud

Step 1: build the analytics-zoo jar
- Clone analytics-zoo locally: git clone https://github.com/intel-analytics/analytics-zoo
- Build job2career-with-dependencies.jar: mvn clean install -DskipTests
Step 2: log in to the Databricks web console with your credentials
Step 3: set up a cluster
- Clusters -> Create cluster
- Name the cluster "intel", set the number of workers to 1, and uncheck autoscaling.
- Set the Spark configuration, for example:
  - spark.executor.cores 4
  - spark.cores.max 4
  - spark.shuffle.reduceLocality.enabled false
  - spark.shuffle.blockTransferService nio
  - spark.scheduler.minRegisteredResourcesRatio 1.0
  - spark.speculation false
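
The settings above can be kept in one place and rendered into the one-pair-per-line "key value" form that the cluster's Spark config box accepts. This is only a minimal sketch: the names SPARK_SETTINGS and render_spark_config are illustrative assumptions, not part of BigDL or Databricks.

```python
# Spark settings copied from the list above; values are kept as strings
# so they are rendered exactly as written.
SPARK_SETTINGS = {
    "spark.executor.cores": "4",
    "spark.cores.max": "4",
    "spark.shuffle.reduceLocality.enabled": "false",
    "spark.shuffle.blockTransferService": "nio",
    "spark.scheduler.minRegisteredResourcesRatio": "1.0",
    "spark.speculation": "false",
}


def render_spark_config(settings):
    """Return one 'key value' pair per line, in insertion order."""
    return "\n".join(f"{key} {value}" for key, value in settings.items())


if __name__ == "__main__":
    # Print the text so it can be pasted into the Spark config field.
    print(render_spark_config(SPARK_SETTINGS))
```

Keeping the values in a single mapping makes it easier to adjust the core counts when the cluster size changes.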
Step 4: upload the data and the dependency jar
- Data -> Create table -> upload the data and give it a name, for example "NEG50"
- /FileStore/tables/Jobs2Career/indexed/indexed/
- /FileStore/tables/Jobs2Career/indexed/NEG50/
- /FileStore/tables/Jobs2Career/lib/job2career_1_0_SNAPSHOT_job-0ca74.jar
Step 5: run the job
- Jobs -> Create job -> give the job a name
- Set JAR -> upload the jar, set the main class to "com.intel.analytics.bigdl.apps.job2Career.TrainWithD2VGlove", and set the arguments to "--inputDir /FileStore/tables/Jobs2Career/indexed/"
- Add the dependency library dbfs:/FileStore/tables/Jobs2Career/lib/job2career_1_0_SNAPSHOT_job-0ca74.jar
- Edit cluster -> Existing cluster, and choose the cluster you created
- Confirm -> Run now -> check the results in the log
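
For reference, the same job settings can be written down as data instead of being clicked through the UI. The sketch below only assembles a dictionary shaped like a Databricks Jobs API request with a spark_jar_task; it does not call any service. The helper name build_jar_job, the job name, and the cluster ID placeholder are assumptions for illustration, and the field names should be checked against the Jobs API version in use. The main jar uploaded through the UI would need its own entry in the library list; its DBFS path is not given here, so it is left out.

```python
import json


def build_jar_job(name, cluster_id, main_class, parameters, jar_paths):
    """Assemble a JAR job description as a plain dict; nothing is sent anywhere."""
    return {
        "name": name,
        "existing_cluster_id": cluster_id,
        # Each library entry points at a jar already uploaded to DBFS.
        "libraries": [{"jar": path} for path in jar_paths],
        "spark_jar_task": {
            "main_class_name": main_class,
            "parameters": list(parameters),
        },
    }


job = build_jar_job(
    name="job2career-train",         # hypothetical job name
    cluster_id="<your-cluster-id>",  # placeholder: the ID of the cluster from Step 3
    main_class="com.intel.analytics.bigdl.apps.job2Career.TrainWithD2VGlove",
    parameters=["--inputDir", "/FileStore/tables/Jobs2Career/indexed/"],
    jar_paths=[
        "dbfs:/FileStore/tables/Jobs2Career/lib/job2career_1_0_SNAPSHOT_job-0ca74.jar",
    ],
)

if __name__ == "__main__":
    # Pretty-print the job description for review.
    print(json.dumps(job, indent=2))
```

Note that the argument string from the UI is split into separate list items, one per token.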
