
I cannot read the output of a mapreduce job #4

Open
fonsoim opened this issue May 13, 2013 · 6 comments

@fonsoim

fonsoim commented May 13, 2013

I cannot read the output of a mapreduce job.

The code:

data=to.dfs(1:10)
res = mapreduce(input = data, map = function(k, v) cbind(v, 2*v))
print(res())

[1] "/tmp/Rtmpr5Xv1g/file34916a6426bf"

And then....

from.dfs(res)

Exception in thread "main" java.io.FileNotFoundException: File does not exist: /tmp/Rtmpr5Xv1g/file34916a6426bf/_logs
...
...

Finally,

hdfs.ls("/tmp/Rtmpr5Xv1g/file34916a6426bf")

  permission  owner  group      size modtime          file
1 -rw-------  daniel supergroup    0 2013-05-13 18:24 /tmp/Rtmpr5Xv1g/file34916a6426bf/_SUCCESS
2 drwxrwxrwt  daniel supergroup    0 2013-05-13 18:23 /tmp/Rtmpr5Xv1g/file34916a6426bf/_logs
3 -rw-------  daniel supergroup  448 2013-05-13 18:24 /tmp/Rtmpr5Xv1g/file34916a6426bf/part-00000
4 -rw-------  daniel supergroup  122 2013-05-13 18:23 /tmp/Rtmpr5Xv1g/file34916a6426bf/part-00001

I note that /tmp/Rtmpr5Xv1g/file34916a6426bf/_logs is a directory

Why does the program try to read "_logs" as a file when it is actually a directory?

Thanks in advance

Alfonso

@piccolbo
Collaborator

On Mon, May 13, 2013 at 9:34 AM, fonsoim [email protected] wrote:

I cannot read the output of a mapreduce job.

The code:

data=to.dfs(1:10)
res = mapreduce(input = data, map = function(k, v) cbind(v, 2*v))
print(res())

This is not documented and you are not supposed to do it; it could break in
the next bugfix release. Any code using it should be considered incorrect,
and you are doing a disservice to the project by posting it. Just so you
know.

[1] "/tmp/Rtmpr5Xv1g/file34916a6426bf"

And then....

from.dfs(res)

Can you post the output of traceback() called immediately after this call?
What versions of rmr2 and hadoop are you using?

Antonio
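
For anyone reproducing this, a minimal sketch of how to capture that traceback, assuming rmr2 is loaded and from.dfs actually raises an R-level error (if the failure only surfaces in the hadoop subprocess, traceback() will have nothing to report):

library(rmr2)

data <- to.dfs(1:10)
res <- mapreduce(input = data, map = function(k, v) cbind(v, 2 * v))
from.dfs(res)  # the failing call
traceback()    # prints the call stack of the most recent uncaught R error;
               # run it immediately, before evaluating anything else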


@fonsoim
Author

fonsoim commented May 21, 2013

Sorry for submitting the same problem in different places.

I do not understand why I am not supposed to use this code.
It is a simple example like the ones in https://github.com/RevolutionAnalytics/rmr2/blob/master/docs/tutorial.md

The versions of rmr2 and hadoop are 2.1.0 and 2.0.0, respectively.
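
(A quick way to confirm these, for anyone reproducing the setup: packageVersion() ships with R, and hadoop version is a standard Hadoop command.)

In R:

packageVersion("rmr2")  # "2.1.0" in this report

At the shell prompt:

hadoop version          # "2.0.0" in this report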

The code:

data=to.dfs(1:10)
res = mapreduce(input = data, map = function(k, v) cbind(v, 2*v))
from.dfs(res)

The error:

from.dfs(res)
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/usr/lib/hadoop-hdfs/bin/hdfs: line 24: /usr/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/usr/lib/hadoop-hdfs/bin/hdfs: line 130: cygpath: command not found
/usr/lib/hadoop-hdfs/bin/hdfs: line 162: exec: : not found
Exception in thread "main" java.io.FileNotFoundException: File does not exist: /tmp/RtmpzXyC7B/file34c6342d57ed/_logs
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1312)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1258)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1231)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1213)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:392)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:170)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44064)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
    at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:972)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:960)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:171)
    at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:138)
    at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:131)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1117)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:249)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:82)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:746)
    at org.apache.hadoop.streaming.AutoInputFormat.getRecordReader(AutoInputFormat.java:56)
    at org.apache.hadoop.streaming.DumpTypedBytes.dumpTypedBytes(DumpTypedBytes.java:102)
    at org.apache.hadoop.streaming.DumpTypedBytes.run(DumpTypedBytes.java:83)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /tmp/RtmpzXyC7B/file34c6342d57ed/_logs
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1312)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1258)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1231)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1213)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:392)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:170)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44064)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)

    at org.apache.hadoop.ipc.Client.call(Client.java:1225)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at $Proxy9.getBlockLocations(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
    at $Proxy9.getBlockLocations(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:154)
    at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:970)
    ... 19 more

$key
list()

$val
list()

@piccolbo
Collaborator

On Tue, May 21, 2013 at 1:21 AM, fonsoim [email protected] wrote:

Sorry for submitting the same problem in different places.

I do not understand why I am not supposed to use this code.
It is a simple example like the ones in
https://github.com/RevolutionAnalytics/rmr2/blob/master/docs/tutorial.md

Where did you get the res() call and the idea of exposing the internal
representation of a big data object? Not from me.

The versions of rmr2 and hadoop are 2.1.0 and 2.0.0, respectively.

How about the OS? Are you running Windows? If so, unfortunately it's not
supported yet. If you are on Linux, let's do this experiment. In R, call

to.dfs(1:10, output = "/tmp/ls-test")

At the shell prompt try

hadoop dfs -ls /tmp/ls-test

The first two errors that you get point to Hadoop problems independent of R,
and this little experiment will help confirm that.

Antonio

The error (excerpted; the full trace is in the comment above):

/usr/lib/hadoop-hdfs/bin/hdfs: line 24: /usr/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/usr/lib/hadoop-hdfs/bin/hdfs: line 130: cygpath: command not found

This is where I suspect you are running the Windows version.


@fonsoim
Author

fonsoim commented May 22, 2013

The OS is Ubuntu 12.04

I did your experiment:

to.dfs(1:10, output = "/tmp/ls-test")
hadoop dfs -ls /tmp/ls-test

It works. HDFS contains the file at "/tmp/ls-test", and I can list it from the shell prompt.

@piccolbo
Collaborator

Maybe we have two problems here. One is that you have a configuration
error. Judging from a bit of googling it doesn't seem to be very common;
nonetheless, I suspect you won't be up and running until you fix it. Take a
look at this report, http://hortonworks.com/community/forums/topic/unable-to-start-the-datanode/,
and see if you can get some insight as to what is wrong with your
configuration. The other problem is from.dfs trying to read the _logs
directory. This is puzzling: there is an explicit filter that discards
anything starting with _. Could you try this in R:

rmr2:::part.list("/tmp/ls-test")

I am not sure what the connection between the two problems could be, but
related or not we need to solve both to make progress. Thanks

Antonio
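
To make that filter concrete, here is a minimal sketch of the kind of check involved, written with rhdfs's hdfs.ls() as used earlier in this thread. It illustrates the underscore filter only; this is an assumption for illustration, not rmr2's actual part.list() implementation:

library(rhdfs)
hdfs.init()

# Sketch only: list a job's output directory and drop Hadoop bookkeeping
# entries such as _SUCCESS and _logs, whose basenames start with "_",
# keeping only the part-xxxxx data files.
list.parts <- function(path) {
  files <- hdfs.ls(path)$file
  files[!grepl("^_", basename(files))]
}

list.parts("/tmp/ls-test")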


@kardes

kardes commented Mar 20, 2015

Hi, is this resolved? I have the same problem. Thanks.
