Working with Hadoop with two datasets


I'm working with Hadoop and trying to write an intersection function over two datasets. What's the best approach? I can load one dataset into memory and intersect it with the other dataset in the map function, but if the dataset is larger than RAM, memory is not a solution. Any thoughts?
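The in-memory idea described above can be sketched roughly as follows. This is plain Python that only imitates what a mapper would do, not real Hadoop API code; the function names `load_small_dataset` and `map_intersect` and the sample records are made up for illustration, and it assumes each record is a single line that acts as its own key.

```python
def load_small_dataset(lines):
    """Build an in-memory set of keys from the smaller dataset (assumed to fit in RAM)."""
    return {line.strip() for line in lines if line.strip()}


def map_intersect(record, small_keys):
    """Mapper-style function: emit the record only if the small dataset also contains it."""
    key = record.strip()
    if key in small_keys:
        yield key


# Hypothetical sample data: 'small' is held in memory, 'large' is streamed record by record.
small = load_small_dataset(["a\n", "b\n", "c\n"])
large = ["b\n", "d\n", "c\n", "b\n"]

# Collect into a set to drop duplicates, then sort for a stable result.
result = sorted({key for record in large for key in map_intersect(record, small)})
print(result)
```

Each mapper only needs the set of keys from the smaller dataset, so this works as long as that set fits in the memory of a single task.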


Thanks for the answers, I'll try these out and see which is the best solution.

It's tough to maneuver an intersection in MapReduce compared with the other family of platforms built around the Hadoop API. Hive has been mentioned (intersections are super easy there if you have a SQL background); you may also consider:

  • Pig
  • Cascading (specifically CoGroup if memory is a concern, and HashJoin if it isn't)
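When neither dataset fits in memory, the usual pattern is a reduce-side intersection, which as far as I understand is the kind of grouping CoGroup does. A minimal sketch in plain Python, simulating the map, shuffle, and reduce steps locally; this is not Cascading or Hadoop code, and `map_tag`, `reduce_intersect`, `run_job`, and the source tags "A" and "B" are made-up names for illustration.

```python
from itertools import groupby
from operator import itemgetter


def map_tag(record, source):
    """Mapper-style function: emit (key, source tag) so the reducer knows where a key came from."""
    yield record.strip(), source


def reduce_intersect(key, sources):
    """Reducer-style function: emit the key only if it was seen in both datasets."""
    if {"A", "B"} <= set(sources):
        yield key


def run_job(dataset_a, dataset_b):
    """Simulate map, shuffle (sort and group by key), and reduce in a single process."""
    pairs = [kv for rec in dataset_a for kv in map_tag(rec, "A")]
    pairs += [kv for rec in dataset_b for kv in map_tag(rec, "B")]
    pairs.sort(key=itemgetter(0))  # stand-in for the shuffle phase
    output = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        output.extend(reduce_intersect(key, (source for _, source in group)))
    return output


# Hypothetical sample data.
print(run_job(["x", "y", "z", "y"], ["y", "z", "w"]))
```

Here no task has to hold a whole dataset: each reducer only sees the tags for one key at a time, at the cost of shuffling both datasets across the cluster.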
