Inconsistent device in regressor.py #4

Open
lucasresck opened this issue Dec 12, 2024 · 0 comments

Dear authors,

Thank you for releasing the fast_l1 code together with datamodels.

While running the linear regression step of datamodels, I ran into an issue with tensors not being on the same device.

After running

python -m datamodels.regression.compute_datamodels \
    -C regression_config.yaml \
    --data.data_path "$tmp_dir/reg_data.beton" \
    --cfg.out_dir "$tmp_dir/reg_results"

I would get an error similar to

  File "/path_to_python3.9/site-packages/fast_l1-0.0.1-py3.9.egg/fast_l1/regressor.py", line 221, in train_saga
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

or

  File "/path_to_python3.9/site-packages/fast_l1-0.0.1-py3.9.egg/fast_l1/regressor.py", line 341, in train_saga
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

This happens because, at lines 221 and 341 of regressor.py, CPU tensors are indexed/sliced with tensors that live on the GPU, in this case idx and still_opt_outer:

a_prev[:, :num_keep].copy_(a_table[idx, :num_keep],
                           non_blocking=True)

inds_to_swap = inds_to_swap[still_opt_outer[inds_to_swap]]
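
For reference, the device mismatch can be reproduced in isolation with a few lines of PyTorch (a minimal sketch; the tensor names here are made up and unrelated to fast_l1):

    import torch

    table = torch.zeros(10, 5)                 # indexed tensor on CPU
    idx = torch.tensor([0, 2], device='cuda')  # index tensor on GPU

    # Raises: RuntimeError: indices should be either on cpu or on the
    # same device as the indexed tensor (cpu)
    table[idx]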

idx and still_opt_outer are both on the GPU because weight and train_loader in datamodels/datamodels/regression/compute_datamodels.py are on the GPU when train_saga is called:

        regressor.train_saga(weight,
                             bias,
                             train_loader,
                             val_loader,
                             lr=lr,
                             start_lams=max_lam,
                             update_bias=(use_bias > 0),
                             lam_decay=np.exp(np.log(eps)/k),
                             num_lambdas=k,
                             early_stop_freq=early_stop_freq,
                             early_stop_eps=early_stop_eps,
                             logdir=str(log_path))
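
A possible workaround (just a sketch, not tested against the full pipeline) would be to move the index tensors back to the device of the tensors they index before slicing, along these lines:

    # Assuming a_table and inds_to_swap live on CPU while idx and
    # still_opt_outer live on the GPU, as described above:
    a_prev[:, :num_keep].copy_(a_table[idx.cpu(), :num_keep],
                               non_blocking=True)

    inds_to_swap = inds_to_swap[still_opt_outer[inds_to_swap].cpu()]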