Multiprocess and/or CUDA error #171

Open
Michelvl92 opened this issue Dec 20, 2024 · 0 comments

Ubuntu 22.04
Intel Xeon, 24 cores @ 2.4 GHz
RTX 3090

This happens after roughly 30 seconds in a specific scene with custom RGB-D data. I am not sure how to fix it.

Traceback (most recent call last):
  File "slam.py", line 252, in <module>
    slam = SLAM(config, save_dir=save_dir)
  File "slam.py", line 110, in __init__
    self.frontend.run()
  File "/MonoGS/MonoGS/utils/slam_frontend.py", line 482, in run
    data = self.frontend_queue.get()
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 116, in get
    return _ForkingPickler.loads(res)
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/reductions.py", line 149, in rebuild_cuda_tensor
    storage = storage_cls._new_shared_cuda(
  File "/usr/local/lib/python3.8/dist-packages/torch/storage.py", line 1212, in _new_shared_cuda
    return torch.UntypedStorage._new_shared_cuda(*args, **kwargs)
RuntimeError: CUDA error: invalid resource handle
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Process Process-4:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/MonoGS/MonoGS/gui/slam_gui.py", line 688, in run
    app.run()
  File "/MonoGS/MonoGS/gui/slam_gui.py", line 676, in update
    self.scene_update()
  File "/MonoGS/MonoGS/gui/slam_gui.py", line 662, in scene_update
    self.receive_data(self.q_main2vis)
  File "/MonoGS/MonoGS/gui/slam_gui.py", line 394, in receive_data
    gaussian_packet = get_latest_queue(q)
  File "/MonoGS/MonoGS/gui/gui_utils.py", line 148, in get_latest_queue
    message_latest = q.get_nowait()
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 129, in get_nowait
    return self.get(False)
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 116, in get
    return _ForkingPickler.loads(res)
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/reductions.py", line 496, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 502, in Client
    c = SocketClient(address)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 630, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    pid, sts = os.waitpid(self.pid, flag)
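
For context, the "CUDA error: invalid resource handle" raised while rebuilding a tensor from frontend_queue.get() typically shows up when a CUDA tensor is pickled in one process and reconstructed in another whose CUDA context does not match (for example, forked processes or a producer that has already exited). I am not certain this is the root cause here, but a minimal sketch of the usual workaround, assuming the processes exchange tensors over a torch.multiprocessing queue, is to copy tensors to CPU before queueing them and re-upload on the receiving side. The producer/consumer names below are hypothetical, not MonoGS code:

import torch
import torch.multiprocessing as mp

def producer(q):
    # Hypothetical stand-in for the data the frontend/backend exchange.
    frame = torch.rand(3, 480, 640, device="cuda")
    # Detach and copy to CPU before queueing, so the consumer never has to
    # rebuild a CUDA storage that belongs to this process's CUDA context.
    q.put(frame.detach().cpu())

def consumer(q):
    frame = q.get()           # arrives as a CPU tensor
    frame = frame.to("cuda")  # re-upload inside the consumer's own CUDA context
    print(frame.shape)

if __name__ == "__main__":
    # "spawn" is generally the safer start method when CUDA is involved.
    mp.set_start_method("spawn", force=True)
    q = mp.Queue()
    p = mp.Process(target=producer, args=(q,))
    c = mp.Process(target=consumer, args=(q,))
    p.start()
    c.start()
    p.join()
    c.join()

This costs an extra host/device copy per message, but it avoids sharing CUDA storages across process boundaries entirely.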

And with single_thread: True set in the config, I get:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 239, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/reductions.py", line 569, in reduce_storage
    fd, size = storage._share_fd_cpu_()
  File "/usr/local/lib/python3.8/dist-packages/torch/storage.py", line 337, in wrapper
    return fn(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/storage.py", line 407, in _share_fd_cpu_
    return super()._share_fd_cpu_(*args, **kwargs)
RuntimeError: unable to write to file </torch_2080_950024735_489>: No space left on device (28)
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 142, in _serve
    with self._listener.accept() as conn:
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 466, in accept
    answer_challenge(c, self._authkey)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 752, in answer_challenge
    message = connection.recv_bytes(256)         # reject large message
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Bus error (core dumped)
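
This second failure looks different: "No space left on device (28)" from _share_fd_cpu_ usually means the shared-memory filesystem (/dev/shm) that PyTorch's file_descriptor sharing strategy writes its /torch_* segments to has filled up, which is common when running inside Docker with the default 64 MB shm size; the connection reset and the bus error then look like follow-on failures. I have not confirmed that this is the cause, but a small diagnostic sketch, assuming the default file_descriptor sharing strategy is in use:

import shutil
import torch.multiprocessing as mp

# Check how much space is left on the shared-memory filesystem that
# the file_descriptor strategy writes its /torch_* segments to.
usage = shutil.disk_usage("/dev/shm")
print(f"/dev/shm free: {usage.free / 2**30:.2f} GiB of {usage.total / 2**30:.2f} GiB")

# Optionally fall back to the file_system sharing strategy, which keeps the
# shared files on ordinary disk instead of /dev/shm.
mp.set_sharing_strategy("file_system")
print("current sharing strategy:", mp.get_sharing_strategy())

If /dev/shm turns out to be tiny, enlarging it (e.g. --shm-size when launching the container) would be another option besides changing the sharing strategy.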
