Working with two datasets in Hadoop
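I'm working with Hadoop and trying to write an intersection function for two datasets. What's the best approach? I can load one dataset into memory and intersect it with the other dataset in the map function, but if that dataset is larger than the available RAM, this is not a solution. Any thoughts?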
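To make the two situations concrete, here is a rough sketch in plain Python. It only simulates the map and reduce phases in a single process and is not Hadoop API code. The function names are made up for illustration, and each record is assumed to be a hashable value such as a line of text. The first function is the in-memory idea described above. The second is a commonly used alternative for when neither dataset fits in RAM: tag each record with its source and let the shuffle group identical records.

```python
from collections import defaultdict
from itertools import chain


def map_side_intersection(small_records, large_records):
    """Hypothetical map-only approach: keep the small dataset in a set
    and stream the large one past it. Only works if the small dataset
    fits in memory."""
    small = set(small_records)
    emitted = set()  # avoid emitting the same record twice
    for record in large_records:
        if record in small and record not in emitted:
            emitted.add(record)
            yield record


def reduce_side_intersection(dataset_a, dataset_b):
    """Hypothetical reduce-side approach: neither dataset is held in a
    lookup set. Here the shuffle is simulated with a dict; in a real
    job the framework would do the grouping on disk across nodes."""
    # Map phase: emit (record, source_tag) for every input record.
    tagged = chain(
        ((record, "A") for record in dataset_a),
        ((record, "B") for record in dataset_b),
    )

    # Simulated shuffle: group the tags by record.
    groups = defaultdict(set)
    for record, tag in tagged:
        groups[record].add(tag)

    # Reduce phase: keep records that were seen in both sources.
    return sorted(r for r, tags in groups.items() if tags == {"A", "B"})
```

In the reduce-side version each reducer only needs to see the tags for one key at a time, which is why it does not depend on either dataset fitting in memory, at the cost of shuffling both datasets.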
Thanks for the answers. I'll try these out and see which solution works best.
It's tough to pull off an intersection in raw MapReduce compared with the other platforms built around the Hadoop API. Hive has already been mentioned (intersections are super easy there if you have a SQL background); you may also consider: