Overview
OpenHands assists with Spark migrations in several ways:
- Version upgrades: Migrate from Spark 2.x to 3.x, or between 3.x versions
- API modernization: Update deprecated APIs to current best practices
- Framework migrations: Convert from MapReduce, Hive, or other frameworks to Spark
- Cloud migrations: Move Spark workloads between cloud providers or to cloud-native services
Migration Scenarios
Spark Version Upgrades
Upgrading Spark versions often requires code changes due to API deprecations and behavioral differences.
Spark 2.x to 3.x Migration:
| Spark 2.x | Spark 3.x | Action Required |
|---|---|---|
| SQLContext | SparkSession | Replace with SparkSession |
| HiveContext | SparkSession with Hive | Update initialization |
| Dataset.unionAll() | Dataset.union() | Rename method calls |
| DataFrame.explode() | functions.explode() | Use SQL functions |
| Legacy date parsing | Proleptic Gregorian calendar | Review date handling |
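The rows above can be illustrated with a minimal PySpark sketch of the Spark 3.x side. It is an illustration only, not output produced by OpenHands: the function names, the `tags` column, and the app name are hypothetical, and `spark.sql.legacy.timeParserPolicy` is shown only as a temporary fallback while date handling is reviewed.

```python
def build_session(app_name="migrated-job", legacy_time_parser=False):
    """Create the Spark 3.x entry point that replaces SQLContext and HiveContext."""
    # Imported lazily so this module can be loaded without a Spark installation.
    from pyspark.sql import SparkSession

    builder = (
        SparkSession.builder
        .appName(app_name)
        .enableHiveSupport()  # HiveContext replacement; assumes Hive classes are available
    )
    if legacy_time_parser:
        # Temporary fallback to Spark 2.x date parsing while formats are reviewed.
        builder = builder.config("spark.sql.legacy.timeParserPolicy", "LEGACY")
    return builder.getOrCreate()


def combine_and_flatten(df_a, df_b, array_col="tags"):
    """Use union() and functions.explode() in place of the Spark 2.x calls."""
    from pyspark.sql import functions as F

    combined = df_a.union(df_b)  # positional union that keeps duplicates, like unionAll()
    # One output row per array element, replacing DataFrame.explode().
    return combined.withColumn(array_col, F.explode(F.col(array_col)))
```

Jobs that never touched Hive can drop the `enableHiveSupport()` call; the rest of the pattern stays the same.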
Migration from Other Big Data Frameworks
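For the MapReduce to Spark case, mapper and reducer logic typically carries over as arguments to RDD transformations. The sketch below assumes a word-count job, which is not taken from the original page; `tokenize` and `word_count` are hypothetical names, and `lines_rdd` is assumed to be an RDD of strings such as the result of `SparkContext.textFile`.

```python
from operator import add


def tokenize(line):
    """Mapper logic: emit a (word, 1) pair for every word on a line."""
    return [(word.lower(), 1) for word in line.split()]


def word_count(lines_rdd):
    """Spark equivalent of a word-count MapReduce job.

    `lines_rdd` is assumed to be an RDD of strings.
    """
    return (
        lines_rdd
        .flatMap(tokenize)   # map phase: one (word, 1) pair per word
        .reduceByKey(add)    # shuffle and reduce phase: sum the counts per word
    )
```

Keeping the mapper as a plain function means it can be unit tested without a cluster before it is wired into the Spark job.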
MapReduce to Spark:
Cloud Platform Migrations
On-premises to Cloud:
- AWS EMR
- Databricks
- Google Dataproc
Code Transformation
API Updates
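To give a sense of the mechanical part of an API update, the following sketch scans source text for a few of the Spark 2.x usages listed in the version upgrade table. It is a simplified illustration and does not describe how OpenHands works internally; `DEPRECATED_APIS` and `find_deprecated_calls` are hypothetical names, and a regex scan like this can produce false positives in comments or strings.

```python
import re

# Pattern -> suggested replacement, taken from the Spark 2.x to 3.x table.
DEPRECATED_APIS = {
    r"\bSQLContext\b": "SparkSession",
    r"\bHiveContext\b": "SparkSession with Hive support",
    r"\.unionAll\(": ".union(",
}


def find_deprecated_calls(source):
    """Return (line_number, matched_text, suggestion) for each deprecated usage."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for pattern, suggestion in DEPRECATED_APIS.items():
            match = re.search(pattern, line)
            if match:
                findings.append((line_number, match.group(0), suggestion))
    return findings
```

A report like this is useful as a checklist: each finding still needs a human or agent to confirm that the replacement preserves behavior.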
OpenHands can automatically update deprecated APIs.
Performance Optimization
Improve performance during migration:
| Anti-pattern | Optimization | Impact |
|---|---|---|
| Multiple count() calls | Cache and count once | Reduces recomputation |
| Small file output | Coalesce before write | Fewer files, faster reads |
| Skewed joins | Salting or broadcast | Eliminates stragglers |
| UDFs for simple ops | Built-in functions | Catalyst optimization |
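The optimizations in this table could look roughly like the PySpark sketch below. The function name, the `key` and `name` columns, and the default of 8 output files are assumptions made for illustration; the broadcast join only helps when `lookup_df` is small enough to fit in executor memory.

```python
def summarize_and_write(df, lookup_df, output_path, num_files=8):
    """Apply the four optimizations from the table to a hypothetical job."""
    # Imported lazily so this module can be loaded without a Spark installation.
    from pyspark.sql import functions as F

    # Cache and count once instead of calling count() repeatedly.
    df = df.cache()
    total_rows = df.count()

    # Broadcast the small side of the join to avoid a skewed shuffle.
    joined = df.join(F.broadcast(lookup_df), on="key", how="left")

    # Built-in function instead of a Python UDF, so Catalyst can optimize it.
    cleaned = joined.withColumn("name", F.upper(F.col("name")))

    # Coalesce before writing to avoid producing many small files.
    cleaned.coalesce(num_files).write.mode("overwrite").parquet(output_path)
    return total_rows
```

Salting is the alternative when both sides of a skewed join are too large to broadcast; it is omitted here to keep the sketch short.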
Best Practices Application
Apply modern Spark best practices.
Testing and Validation
Job Testing
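One common way to make a migrated job testable is to isolate each transformation in a function that takes and returns a DataFrame, then exercise it against a local session. The sketch below assumes a hypothetical `add_revenue` transformation with `price` and `quantity` columns; none of these names come from the original page.

```python
def add_revenue(df):
    """Transformation under test: revenue = price * quantity."""
    # Imported lazily so this module can be loaded without a Spark installation.
    from pyspark.sql import functions as F

    return df.withColumn("revenue", F.col("price") * F.col("quantity"))


def verify_add_revenue():
    """Run the transformation on a tiny in-memory dataset and check the result."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("job-test").getOrCreate()
    try:
        df = spark.createDataFrame([(2.0, 3), (1.5, 2)], ["price", "quantity"])
        revenues = sorted(row["revenue"] for row in add_revenue(df).collect())
        assert revenues == [3.0, 6.0]
    finally:
        spark.stop()  # release the local session even if the assertion fails
```

Running the same check against both the old and the new Spark version gives a quick signal that the transformation behaves identically after the upgrade.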
Create comprehensive tests for migrated jobs.
Performance Benchmarking
Compare performance between versions:
- Job duration (wall clock time)
- Shuffle read/write bytes
- Peak executor memory
- Task distribution (min/max/median)
- Garbage collection time
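Once these metrics have been collected for both versions, the comparison itself can be automated. The sketch below assumes each run is summarized as a dictionary of metric name to number, where lower is better; the metric names, the `compare_runs` helper, and the 10% tolerance are illustrative choices rather than anything prescribed by Spark or OpenHands.

```python
def compare_runs(baseline, migrated, tolerance=0.10):
    """Return {metric: relative_change} for metrics that regressed beyond tolerance.

    Both inputs map metric names to numbers where lower is better.
    """
    regressions = {}
    for metric, old_value in baseline.items():
        new_value = migrated.get(metric)
        if new_value is None or old_value == 0:
            continue  # metric missing in the new run, or no meaningful baseline
        change = (new_value - old_value) / old_value
        if change > tolerance:
            regressions[metric] = round(change, 3)
    return regressions


baseline = {"duration_s": 100.0, "shuffle_write_bytes": 2000, "gc_time_ms": 500}
migrated = {"duration_s": 90.0, "shuffle_write_bytes": 2500, "gc_time_ms": 520}
print(compare_runs(baseline, migrated))  # {'shuffle_write_bytes': 0.25}
```

Averaging several runs per version before comparing reduces the chance of flagging ordinary run-to-run noise as a regression.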
Data Validation
Ensure data correctness after migration:
- Row-Level
- Aggregate
- Schema
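The three levels of validation can be combined into one check that compares the output of the original job with the output of the migrated job. This is a minimal sketch under stated assumptions: `validate_migration` is a hypothetical helper, `numeric_col` is a column present in both outputs, and both DataFrames are small enough for a full row-level comparison to be affordable.

```python
def validate_migration(old_df, new_df, numeric_col):
    """Compare the outputs of the old and migrated jobs at three levels."""
    # Imported lazily so this module can be loaded without a Spark installation.
    from pyspark.sql import functions as F

    def totals(df):
        row = df.agg(F.count("*").alias("n"), F.sum(numeric_col).alias("total")).first()
        return row["n"], row["total"]

    # Schema check: column names, types, and nullability must match exactly.
    report = {"schema_match": old_df.schema == new_df.schema}

    # Aggregate check: row count and column sum. Floating-point sums can differ
    # slightly between runs, so exact equality suits integer or decimal columns.
    report["aggregate_match"] = totals(old_df) == totals(new_df)

    if report["schema_match"]:
        # Row-level check: exceptAll keeps duplicates, so repeated rows are counted.
        report["rows_only_in_old"] = old_df.exceptAll(new_df).count()
        report["rows_only_in_new"] = new_df.exceptAll(old_df).count()
    return report
```

A clean migration yields `schema_match` and `aggregate_match` both true and zero rows on either side of the row-level difference.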
Examples
Complete Spark 2 to 3 Migration
Hive to Spark SQL Migration
EMR to Databricks Migration
Related Resources
- Repository Setup - Configure your Spark repository for OpenHands
- Key Features - OpenHands capabilities overview
- Prompting Best Practices - Write effective prompts

