Working with two datasets in Hadoop


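I'm working with Hadoop and trying to write an intersection function for two datasets. What's the best approach? I can load one dataset into memory and intersect it with the other dataset in the map function, but if that dataset is larger than RAM, keeping it in memory is not a solution. Any thoughts?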
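
To make the in-memory idea concrete, here is a rough sketch of what I mean. It is plain Python that only simulates the map step, with no Hadoop API involved; `load_small_dataset` and `map_intersect` are made-up names, and it assumes each record is a hashable value such as a line of text.

```python
def load_small_dataset(records):
    """Build the in-memory lookup set. This is the side that must fit in RAM."""
    return set(records)


def map_intersect(record, small_set):
    """Map function: emit the record only if it also appears in the small dataset."""
    if record in small_set:
        yield record


# Toy data standing in for the two datasets (hypothetical values).
small = load_small_dataset(["b", "c", "d"])
large = ["a", "b", "c", "e"]

# Each record of the large dataset is streamed through the map function.
result = [out for rec in large for out in map_intersect(rec, small)]
```

Note that a record repeated in the large dataset would be emitted once per occurrence, so a deduplication step may still be needed afterwards.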


Thanks for the answers. I'll try these out and see which is the best solution.

It's tough to do an intersection in raw MapReduce compared with the other platforms built around the Hadoop API. Hive has already been mentioned (intersections are super easy if you have a SQL background); you may also consider:

  • Pig
  • Cascading (specifically CoGroup if memory is a concern, and HashJoin if it isn't)
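
When neither dataset fits in memory, the usual idea behind a grouping-style join is to tag each record with its source, group by record, and keep only the keys seen from both sides. The sketch below illustrates that idea in plain Python; it is not Cascading, Pig, or Hadoop code. The function names and the `"A"`/`"B"` tags are hypothetical, and the `shuffle` step here runs in memory only to keep the example runnable, whereas in a real job the framework performs the sort and shuffle across nodes.

```python
from collections import defaultdict


def map_tag(record, source):
    """Map step: emit (record, source_tag) so the reducer knows where a record came from."""
    yield record, source


def shuffle(pairs):
    """Stand-in for the framework's shuffle: group the source tags by record key."""
    groups = defaultdict(set)
    for key, tag in pairs:
        groups[key].add(tag)
    return groups


def reduce_intersect(key, tags):
    """Reduce step: emit the key only if it was seen in both datasets."""
    if {"A", "B"} <= tags:
        yield key


def intersect(dataset_a, dataset_b):
    """Run the simulated map, shuffle, and reduce steps and return the sorted intersection."""
    pairs = []
    for rec in dataset_a:
        pairs.extend(map_tag(rec, "A"))
    for rec in dataset_b:
        pairs.extend(map_tag(rec, "B"))
    groups = shuffle(pairs)
    return sorted(out for key, tags in groups.items()
                  for out in reduce_intersect(key, tags))


common = intersect(["a", "b", "c", "b"], ["b", "c", "d"])
```

Because the reducer only needs the set of tags per key, each key is emitted at most once, so duplicates within either input do not appear in the result.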
