Hadoop streaming with Python on Windows


I'm using Hortonworks HDP on Windows and have configured a master and two slaves.

I'm running the following command:

bin\hadoop jar contrib\streaming\hadoop-streaming-1.1.0-snapshot.jar -files file:///d:/dev/python/mapper.py,file:///d:/dev/python/reducer.py -mapper "python mapper.py" -reducer "python reduce.py" -input /flume/0424/userlog.mdac-hd1.mdac.local..20130424.1366789040945 -output /flume/o%1 -cmdenv pythonpath=c:\python27
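The question doesn't show the contents of mapper.py or reducer.py, so as a point of reference, here is a hypothetical minimal word-count pair of the kind Hadoop Streaming typically runs: the mapper emits tab-separated key/value lines on stdout, and the reducer consumes key-sorted lines from stdin. The function names and the single-file layout are illustrative assumptions, not the question's actual scripts.

```python
# Hypothetical word-count mapper/reducer sketch for Hadoop Streaming.
# Streaming feeds each stage text on stdin and reads "key\tvalue"
# lines from stdout; the shuffle sorts mapper output by key before
# the reducer sees it.
import sys


def map_lines(lines):
    """Mapper: emit 'word\t1' for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reduce_lines(lines):
    """Reducer: sum counts per key; input must be sorted by key."""
    current, total = None, 0
    for line in lines:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current:
            if current is not None:
                yield "%s\t%d" % (current, total)
            current, total = key, 0
        total += int(value)
    if current is not None:
        yield "%s\t%d" % (current, total)


if __name__ == "__main__":
    # One file doubling as both stages for illustration; a real job
    # would ship two scripts, invoked as "python mapper.py" etc.
    role = sys.argv[1] if len(sys.argv) > 1 else "map"
    stage = map_lines if role == "map" else reduce_lines
    for out in stage(sys.stdin):
        print(out)
```
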

The mapper runs through fine, but the log reports that the reduce.py file wasn't found. From the exception, it looks as though the Hadoop TaskRunner is creating a symlink for the reducer pointing at the mapper.py file.

When I check the job configuration file, I notice mapred.cache.files is set to:

hdfs://mdac-hd1:8020/mapred/staging/administrator/.staging/job_201304251054_0021/files/mapper.py#mapper.py

It looks as though, although the reducer file is being shipped with the job, it isn't being registered in the configuration correctly, so it can't be found when the reducer tries to run.

I think the command is correct; I've also tried using -file parameters instead, but then neither file is found.

Can anyone see, or know of, an obvious reason?

Please note, this is on Windows.

Edit: I've run the job locally and it worked, so it looks as though the problem may be with the copying of files around the cluster.
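Running the job locally, as described above, can also be approximated without Hadoop at all by piping sample input through the two scripts with a sort in between, mimicking the shuffle. A sketch of such a harness follows; the mapper/reducer paths shown are the ones from the question and would need to point at real scripts.

```python
# Sketch of a cluster-free sanity check for a streaming job:
# sample input -> mapper -> sort (stand-in for the shuffle) -> reducer.
import subprocess
import sys


def run_pipeline(input_text,
                 mapper="d:/dev/python/mapper.py",
                 reducer="d:/dev/python/reducer.py"):
    """Run input_text through mapper, sort the output by line, then
    run it through reducer; return the reducer's stdout."""
    mapped = subprocess.run(
        [sys.executable, mapper],
        input=input_text, capture_output=True, text=True, check=True,
    ).stdout
    # The shuffle phase delivers mapper output sorted by key.
    shuffled = "".join(sorted(mapped.splitlines(keepends=True)))
    return subprocess.run(
        [sys.executable, reducer],
        input=shuffled, capture_output=True, text=True, check=True,
    ).stdout
```

If this pipeline produces the expected output but the cluster job fails, the scripts themselves are fine and the problem lies in file distribution or cluster configuration, which is what the question ultimately found.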

I'd still welcome any input!

Well, that's embarrassing... my first question, and I answer it myself.

I found the problem: renaming the Hadoop conf file to force default settings meant the job was using the local job tracker.

The job then ran, which gave me room to work out what the real problem is; it looks as though communication around the cluster isn't as complete as it needs to be.

