Working with Hadoop with two datasets

- January 15, 2012

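I'm working with Hadoop and trying to write a function that computes the intersection of two datasets. What's the best approach? I could load one dataset into memory and intersect it with the other dataset in the map function, but if the dataset is too large for RAM, that isn't a solution. Any thoughts?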
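The in-memory approach described in the question is often called a map-side (or replicated) join. Below is a minimal sketch of it, written as plain Java so it runs without Hadoop on the classpath; the class and method names are hypothetical. In a real job the lookup set would typically be built once in the Mapper's `setup()` method from a file shipped with the distributed cache, and `map()` would write matches to the `Context` rather than to a list. The sketch assumes records are compared as whole strings and that the smaller dataset fits in RAM.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Plain-Java simulation of a map-side intersection (no Hadoop dependency).
 * Assumption: the smaller dataset fits in memory as a HashSet of records.
 */
public class MapSideIntersect {

    // Stands in for the lookup table a Mapper would build once in setup().
    private final Set<String> smallSide;

    public MapSideIntersect(List<String> smallDataset) {
        this.smallSide = new HashSet<>(smallDataset);
    }

    // Stands in for map(): called once per record of the large dataset,
    // "emitting" the record only if it also appears in the small dataset.
    public void map(String record, List<String> output) {
        if (smallSide.contains(record)) {
            output.add(record);
        }
    }

    // Hypothetical driver: streams every record of the large dataset through map().
    public static List<String> intersect(List<String> small, List<String> large) {
        MapSideIntersect mapper = new MapSideIntersect(small);
        List<String> output = new ArrayList<>();
        for (String record : large) {
            mapper.map(record, output);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> small = Arrays.asList("b", "c", "x");
        List<String> large = Arrays.asList("a", "b", "c", "d");
        System.out.println(intersect(small, large)); // prints [b, c]
    }
}
```

Each map call is a constant-time hash lookup, so the large dataset is read once and never shuffled; the price is that every mapper holds a full copy of the small side. Duplicate records in the large dataset are emitted once per occurrence.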

Thanks for the answers. I'll try these out and see which is the best solution.

Intersections are tough to pull off in raw MapReduce compared with the other platforms in the Hadoop family that sit on top of the Hadoop API. Hive has already been mentioned (intersections are very easy there if you have an SQL background); you may also consider:

Pig
Cascading (specifically CoGroup if memory is a concern, and HashJoin if it isn't)
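When neither dataset fits in memory, the usual alternative is a reduce-side join, which is roughly what a CoGroup does: the mappers tag each key with its source dataset, the shuffle groups all tags for a key, and the reducer keeps the keys that arrived from both sides. The plain-Java simulation below is an illustrative sketch with hypothetical names, not Hadoop or Cascading API code; the `TreeMap` only stands in for the framework's sort-and-shuffle, which on a real cluster is distributed and spills to disk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Plain-Java simulation of a reduce-side (tagged) intersection. */
public class ReduceSideIntersect {

    // Map phase: tag each key with the dataset it came from (0 = left, 1 = right).
    // Shuffle phase (simulated): group tags by key; TreeMap mimics sorted keys.
    static Map<String, Set<Integer>> mapAndShuffle(List<String> left, List<String> right) {
        Map<String, Set<Integer>> grouped = new TreeMap<>();
        for (String key : left) {
            grouped.computeIfAbsent(key, k -> new HashSet<>()).add(0);
        }
        for (String key : right) {
            grouped.computeIfAbsent(key, k -> new HashSet<>()).add(1);
        }
        return grouped;
    }

    // Reduce phase: a key belongs to the intersection iff both tags arrived for it.
    static List<String> reduce(Map<String, Set<Integer>> grouped) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Set<Integer>> entry : grouped.entrySet()) {
            if (entry.getValue().size() == 2) {
                out.add(entry.getKey());
            }
        }
        return out;
    }

    public static List<String> intersect(List<String> left, List<String> right) {
        return reduce(mapAndShuffle(left, right));
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("d", "a", "b", "b");
        List<String> right = Arrays.asList("b", "c", "d");
        System.out.println(intersect(left, right)); // prints [b, d]
    }
}
```

In a real reducer only the tags for the current key need to be tracked, so memory per key stays constant regardless of dataset size. Duplicates collapse to a single output key here, which gives set-intersection semantics.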
