watched_tasks: maintain a list of spawned async tasks and propagate errors #47

hnez · 2023-10-02T14:21:18Z

This PR is based on an idea from a discussion in PR #43, where it was noticed that errors from tasks and threads propagate nowhere.

This PR fixed that by assembling a list of threads and tasks that should not complete before the end of the program and propagating all errors from these threads and tasks to main() and beyond.

This touches a lot of places, as we spawn quite a lot of async_std tasks (the tacd logs now say "Spawned 58 tasks tasks and 5 threads").

This PR allows us to add better error handling for tasks and threads (we can for example use the ? operator to propagate errors out and do not have to resort to .unwrap() to at least crash the tacd from within a task on error). Adding this better error handling is out of scope for this PR, as it is already large enough.

hnez · 2023-10-04T05:37:51Z

The failed cargo deny run should be fixed by the dependency updates in #42 and has nothing to do with this PR.

KarlK90

Nice! I like the changes and implementation. I thought whether or not the async runtime provides mechanisms to query running tasks and see if anything failed like tokio-console, but this might be messy (or not even possible). Anyway, there are some minor things that should be corrected but apart from that LGTM.

src/watched_tasks.rs

src/adc/iio/hardware.rs

src/watched_tasks.rs

hnez · 2023-10-06T10:10:47Z

While addressing your comments I have also changed the way ThreadHandle notifies a task about the completion of a thread.
Using a async_std::channel there was somewhat of a hack so I do not have to work with Future and Waker directly, because that sounded somewhat scary.
As it turns out working with Wakers is not scary at all, so we can just use one here:

tacd/src/watched_tasks.rs

Lines 29 to 54 in 69d52f7

    
           impl ThreadHandle { 
        
               fn new<F>(name: String, function: F) -> Result<Self> 
        
               where 
        
                   F: FnOnce() -> TaskResult + Send + 'static, 
        
               { 
        
                   let wake_on_exit = Arc::new(Mutex::new(None::<Waker>)); 
        
                   let wake_on_exit_thread = wake_on_exit.clone(); 
        
                   let handle = thread::Builder::new().name(name).spawn(move || { 
        
                       // We could std::panic::catch_unwind() here in the future to handle 
        
                       // panics inside of spawned threads. 
        
                       let res = function(); 
        
                       // Keep the Mutex locked until exiting the thread to prevent 
        
                       // a race with another thread checking for task completion and 
        
                       // adding a waker. 
        
                       let mut wake_on_exit = wake_on_exit_thread 
        
                           .lock() 
        
                           .map_err(|_| anyhow!("Tried to lock a tainted Mutex"))?; 
        
                       if let Some(waker) = wake_on_exit.take() { 
        
                           waker.wake(); 
        
                       } 
        
                       res 
        
                   })?;

This does however mean that you have to re-review that part.

hnez · 2023-10-06T10:25:04Z

The failed cargo clippy run should be fixed by #49 and has nothing to do with this PR.

KarlK90

This looks good to me now and will definitely improve the error handling story in the code base. Feel free to merge.

jluebbe · 2023-10-06T11:18:53Z

Adding a custom wrapper around standard APIs seems very invasive to me. Is this really such an unusual problem that there is no idiomatic way?

hnez · 2023-10-06T11:57:35Z

Hi @jluebbe

no idiomatic way

Not in async_std itself, so we would have to use a crate for it. I had a look around and found async_nursery which may be coerced into doing what we we want (minus the thread handling), but I would prefer not to use it because it does not look too mature (and does not support threads in addition to tasks).

I quite like the tailor-made solution implemented in this PR and think 206 not-that-dense lines of code are an acceptable complexity compared to adding a new dependency to handle this for us.

I would prefer merging this as-is rather than adding a new dependency.

hnez · 2023-10-06T13:53:13Z

The recently merged pull requests #33 and #34 made some changes to this PR necessary, as both of them spawn tasks.
Please re-review affected files.

hnez · 2023-11-09T11:20:33Z

I've rebased this PR on top of recent main and think it should be the next one we merge.

KarlK90 · 2023-11-09T11:30:36Z

I remember that @jluebbe wanted some written commentary why we choose to implement a bespoke solution and not use async_nursery or tokio's JoinSet. Could you add this to watched_tasks.rs? After that I'm happy to approve this.

The WatchedTasksBuilder makes sure that the program ends if any of the tasks fails and also handles error propagation from the tasks to the main() function. Signed-off-by: Leonard Göhrs <[email protected]>

Execution should stop if any of the important threads ends, not just one of the tasks. Signed-off-by: Leonard Göhrs <[email protected]>

The large re-indentation would have made the commit that adds the wtb.spawn_thread() call rather unreadable, as it mixes formatting and code changes. Instead add this commit that _only_ changes formatting. Signed-off-by: Leonard Göhrs <[email protected]>

Signed-off-by: Leonard Göhrs <[email protected]>

While .expect() still means panicking it is considerd a better kind of panicking than .unwrap(). Signed-off-by: Leonard Göhrs <[email protected]>

This allows us to display an error message if the server setup fails. Signed-off-by: Leonard Göhrs <[email protected]>

Many small functions is a nice thing in general, but a three line function that is only used once seems a bit over the top. Signed-off-by: Leonard Göhrs <[email protected]>

hnez · 2024-01-19T10:54:11Z

I've added a comment …

tacd/src/watched_tasks.rs

Lines 11 to 29 in 6c7a2cd

    
           // This is a wrapper around async_std::task:spawn() that keeps track of the 
        
           // tasks it spawned. This solves the problem of error propagation from tasks 
        
           // for us. 
        
           // When using async_std::task:spawn() you get a handle back that can be used 
        
           // to check if and how the task has completed, but there is no common way in 
        
           // async_std to have a list of tasks to watch at the same time. 
        
           // We want to keep track of the various long running tasks we spawn in the 
        
           // tacd and want to propagate errors from back to main(). 
        
           // We also want the same, with a similar API for actual threads instead of 
        
           // async tasks. WatchedTasks does both. 
        
           // 
        
           // There are other solutions that do basically the same, but did not quite 
        
           // fit our needs: 
        
           // 
        
           //   - async_nursery - Works for async_std but does not handle actual threads 
        
           //     and does not look that great code-wise. 
        
           //   - tokio JoinSet - Does roughly the same as WatchedTasks, but also without 
        
           //     native thread support and - more importantly for tokio (which we do 
        
           //     not use).

… that talks about why we went with our own task spawning wrapper instead of using something off-the-shelf.

KarlK90

@hnez The comment summarizes the discussion very well. You have my approval - feel free to merge.

This was referenced Oct 2, 2023

watched_tasks: improve error handling in spawned tasks and threads #48

Merged

tacd: error propagation instead of unwraping part 1 #43

Open

KarlK90 requested changes Oct 6, 2023

View reviewed changes

src/watched_tasks.rs Outdated Show resolved Hide resolved

src/adc/iio/hardware.rs Outdated Show resolved Hide resolved

src/adc/iio/hardware.rs Outdated Show resolved Hide resolved

src/adc/iio/hardware.rs Outdated Show resolved Hide resolved

KarlK90 reviewed Oct 6, 2023

View reviewed changes

src/watched_tasks.rs Outdated Show resolved Hide resolved

hnez force-pushed the watched-tasks branch from 8b0afc5 to 69d52f7 Compare October 6, 2023 09:51

KarlK90 approved these changes Oct 6, 2023

View reviewed changes

hnez force-pushed the watched-tasks branch from 69d52f7 to cd96a3d Compare October 6, 2023 13:51

KarlK90 approved these changes Oct 6, 2023

View reviewed changes

hnez force-pushed the watched-tasks branch from cd96a3d to 0738860 Compare October 6, 2023 14:03

hnez force-pushed the watched-tasks branch 5 times, most recently from 3114a63 to ea5cc2b Compare November 9, 2023 11:13

hnez self-assigned this Nov 23, 2023

hnez force-pushed the watched-tasks branch 2 times, most recently from f66b1b4 to 0612c98 Compare January 19, 2024 09:32

hnez added 4 commits January 19, 2024 11:05

watched_tasks: spawn long running tasks via special task builder

cf7bd77

The WatchedTasksBuilder makes sure that the program ends if any of the tasks fails and also handles error propagation from the tasks to the main() function. Signed-off-by: Leonard Göhrs <[email protected]>

watched_tasks: also handle threads

6c7a2cd

Execution should stop if any of the important threads ends, not just one of the tasks. Signed-off-by: Leonard Göhrs <[email protected]>

hnez added 4 commits January 19, 2024 11:06

watched_tasks: let spawn_task return Result instead of panicking

dc802bd

Signed-off-by: Leonard Göhrs <[email protected]>

adc: hardware: replace .unwraps() around setup done queue with .expect()

75e027e

While .expect() still means panicking it is considerd a better kind of panicking than .unwrap(). Signed-off-by: Leonard Göhrs <[email protected]>

tacd: finish the http server setup in init()

c823515

This allows us to display an error message if the server setup fails. Signed-off-by: Leonard Göhrs <[email protected]>

tacd: inline run() into main()

396136e

Many small functions is a nice thing in general, but a three line function that is only used once seems a bit over the top. Signed-off-by: Leonard Göhrs <[email protected]>

hnez force-pushed the watched-tasks branch from 0612c98 to 396136e Compare January 19, 2024 10:11

KarlK90 approved these changes Jan 19, 2024

View reviewed changes

hnez merged commit e94c996 into linux-automation:main Jan 19, 2024
10 checks passed

hnez deleted the watched-tasks branch January 19, 2024 14:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

watched_tasks: maintain a list of spawned async tasks and propagate errors #47

watched_tasks: maintain a list of spawned async tasks and propagate errors #47

hnez commented Oct 2, 2023

hnez commented Oct 4, 2023

KarlK90 left a comment •

edited

Loading

hnez commented Oct 6, 2023

hnez commented Oct 6, 2023

KarlK90 left a comment

jluebbe commented Oct 6, 2023

hnez commented Oct 6, 2023

hnez commented Oct 6, 2023

hnez commented Nov 9, 2023

KarlK90 commented Nov 9, 2023

hnez commented Jan 19, 2024

KarlK90 left a comment

watched_tasks: maintain a list of spawned async tasks and propagate errors #47

watched_tasks: maintain a list of spawned async tasks and propagate errors #47

Conversation

hnez commented Oct 2, 2023

hnez commented Oct 4, 2023

KarlK90 left a comment • edited Loading

Choose a reason for hiding this comment

hnez commented Oct 6, 2023

hnez commented Oct 6, 2023

KarlK90 left a comment

Choose a reason for hiding this comment

jluebbe commented Oct 6, 2023

hnez commented Oct 6, 2023

hnez commented Oct 6, 2023

hnez commented Nov 9, 2023

KarlK90 commented Nov 9, 2023

hnez commented Jan 19, 2024

KarlK90 left a comment

Choose a reason for hiding this comment

KarlK90 left a comment •

edited

Loading