Working with two datasets in Hadoop


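I'm working with Hadoop and trying to write an intersection function for two datasets. What's the best approach? I can load one dataset into memory and intersect it with the other dataset in the map function, but if the dataset is larger than the available RAM, loading it into memory is not a solution. Any thoughts?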
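
The in-memory approach described in the question (often called a map-side or replicated join) can be sketched as below. This is a plain-Java simulation, not Hadoop API code: the class and method names are hypothetical, and it assumes the smaller dataset fits in RAM. In a real job the small side would typically be loaded once per mapper (for example in the `Mapper`'s `setup()` method) and the large side streamed through `map()`.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a map-side intersection, simulated without Hadoop.
// Assumption: the small dataset fits entirely in memory.
public class MapSideIntersection {

    // Keys of the small dataset held in memory (this is the part that must fit in RAM).
    private final Set<String> smallSide = new HashSet<>();

    public MapSideIntersection(Iterable<String> smallDataset) {
        for (String key : smallDataset) {
            smallSide.add(key);
        }
    }

    // Called once per record of the large dataset, like a mapper's map() call.
    // Returns true if the record exists in both datasets and should be emitted.
    public boolean map(String largeRecord) {
        return smallSide.contains(largeRecord);
    }

    // Runs the simulated mapper over the whole large dataset.
    public static Set<String> intersect(Iterable<String> small, Iterable<String> large) {
        MapSideIntersection mapper = new MapSideIntersection(small);
        Set<String> out = new LinkedHashSet<>(); // removes duplicates, keeps first-seen order
        for (String record : large) {
            if (mapper.map(record)) {
                out.add(record);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> small = List.of("apple", "kiwi", "pear");
        List<String> large = List.of("banana", "pear", "apple", "plum", "pear");
        System.out.println(intersect(small, large)); // prints [pear, apple]
    }
}
```

The only memory cost is the `HashSet` of the small side, which is exactly why this approach stops working once neither dataset fits in RAM.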


Thanks for the answers, I'll try these out and see which is the best solution.

An intersection is a tough maneuver in raw MapReduce compared with the other platforms in the family built around the Hadoop API. Besides Hive, which was already mentioned (intersections are super easy if you have a SQL background), you may consider:

  • Pig
  • Cascading (specifically CoGroup if memory is a concern, HashJoin if it isn't)
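
When memory is the concern, the usual alternative is a reduce-side intersection, which is roughly the idea behind the CoGroup option above: the map phase tags each record with its source dataset, the shuffle groups records by key, and the reducer emits a key only if it saw both tags. The sketch below simulates those three phases in plain Java; the tag values and method names are hypothetical, and a `TreeMap` stands in for Hadoop's sort and shuffle. In a real job no dataset has to fit in memory as a whole; here everything is in memory only for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a reduce-side (cogroup-style) intersection, simulated without Hadoop.
public class ReduceSideIntersection {

    // Hypothetical source tags; a real job might derive these from the input path.
    static final String TAG_A = "A";
    static final String TAG_B = "B";

    // Map phase: emit a (key, sourceTag) pair for every record of one dataset.
    static List<Map.Entry<String, String>> map(Iterable<String> records, String tag) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String r : records) {
            pairs.add(new SimpleEntry<>(r, tag));
        }
        return pairs;
    }

    // Reduce phase: called once per key with every tag seen for that key.
    // Only two booleans are kept, so memory per key stays constant.
    static boolean reduce(Iterable<String> tags) {
        boolean inA = false;
        boolean inB = false;
        for (String t : tags) {
            if (t.equals(TAG_A)) inA = true;
            else if (t.equals(TAG_B)) inB = true;
            if (inA && inB) return true;
        }
        return false;
    }

    public static List<String> intersect(Iterable<String> a, Iterable<String> b) {
        List<Map.Entry<String, String>> emitted = new ArrayList<>(map(a, TAG_A));
        emitted.addAll(map(b, TAG_B));
        // Shuffle/sort stand-in: group tags by key, with keys in sorted order.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> kv : emitted) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> group : grouped.entrySet()) {
            if (reduce(group.getValue())) {
                out.add(group.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = List.of("pear", "apple", "kiwi");
        List<String> b = List.of("plum", "apple", "pear", "pear");
        System.out.println(intersect(a, b)); // prints [apple, pear]
    }
}
```

The trade-off matches the CoGroup/HashJoin split in the answer: the reduce-side version pays for a full shuffle of both datasets but never needs either one in RAM, while the map-side version avoids the shuffle only when one side is small.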