Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tacd: error propagation instead of unwraping part 1 #43

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

KarlK90
Copy link
Member

@KarlK90 KarlK90 commented Sep 28, 2023

Instead of unwrapping Results and potentially panicking we replace it with proper error propagation that can be acted upon in the future e.g. return different exit codes for systemd or display error messages.

@KarlK90 KarlK90 force-pushed the topic/error-handling/all-the-unwraps-part1 branch 2 times, most recently from e4495ac to c0391af Compare September 28, 2023 10:10
Instead of unwrapping Results and potentially panicking we replace it
with proper error propagation that can be acted upon in the future e.g.
return different exit codes for systemd or display error messages.

Signed-off-by: Stefan Kerkmann <[email protected]>
@KarlK90 KarlK90 force-pushed the topic/error-handling/all-the-unwraps-part1 branch from c0391af to a0cb336 Compare September 28, 2023 10:51
Copy link
Member

@hnez hnez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,

thanks for taking on the task to make tacd less panicky!

I've added some comments. Mostly around Result propagation inside of async_std tasks and std threads.
This got me thinking about how tasks in the tacd should be monitored better in general.


let (mut src, _) = topic.clone().subscribe_unbounded();

spawn(async move {
while let Some(ev) = src.next().await {
dst.set_value((ev ^ inverted) as _).unwrap();
dst.set_value((ev ^ inverted) as _)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An error from the spawned task propagates nowhere and thus silently ignored.
In my opinion that's worse than crashing the tacd.

We could build a wrapped async_std::task::spawn() that works something like BrokerBuilder in that you pass it around everywhere you want to spawn a task during init and it adds the spawned task to an internal list.
The JoinHandles could then be watched for returned errors in main(), so we can break out of it if something happens.

The usage would be something like:

src/digital_io.rs

fn handle_line_wo(
    bb: &mut BrokerBuilder,
    watched: &mut WatchedTaskBuilder,) -> Result<Arc<Topic<bool>>> {
    …
    watched.spawn(async move {});
}

src/main.rs

#[async_std::main]
async fn main() -> Result<()> {
    let mut watched = WatchedTaskBuilder::new();let dig_io = DigitalIo::new(&mut bb, &mut watched, led.out_0.clone(), led.out_1.clone())?;
    …
    watched.run().await
}

That would also give us the opportunity to define tasks that should complete before marking the initialization as successful (i.e. when to notify systemd that we are ready, so that the RAUC slot can be marked good by rauc-mark-good.service) by e.g. spawning them via a special WatchedTaskBuilder::spawn_init() or the likes.

I quite like the idea. Let me mock something up real quick …

Comment on lines 236 to +241
pwr_line: &LineHandle,
discharge_line: &LineHandle,
fail_state: &AtomicU8,
) {
pwr_line.set_value(1 - PWR_LINE_ASSERTED).unwrap();
discharge_line.set_value(DISCHARGE_LINE_ASSERTED).unwrap();
) -> Result<()> {
pwr_line.set_value(1 - PWR_LINE_ASSERTED)?;
discharge_line.set_value(DISCHARGE_LINE_ASSERTED)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If setting the outputs fails the tacd should exit with an error code as fast as possible, so the fail safes can kick in and disable the output for us.
I am not saying that the error-propagating code can not ensure that. I just wanted to write this down as kind of a mental note to check that it is does.

Comment on lines -337 to +343
thread_res_tx
.try_send(Ok((tick, request.clone(), state.clone())))
.unwrap();
thread_res_tx.try_send(Ok((tick, request.clone(), state.clone())))?;

(tick_weak, request, state)
}
Err(e) => {
thread_res_tx.try_send(Err(e)).unwrap();
thread_res_tx.try_send(Err(e))?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Result returned from the spawned thread (this time an actual thread and not an async-std task) does not go anywhere (which is part of the reason we have the thread_res_tx solution in the first place).
In this case this would mean the tacd would wait forever for a message on thread_res_rx until the systemd watchdog would kill it.
The .unwrap() based implementation would have at least printed an error message as a clue to what went wrong.

PS: Now that I've had a second look I think dropping thread_res_tx would generate an event on thread_res_rx because it makes the Stream end, so we would at least get a "didn't receive thread result" message.

If you really want to get rid of this .unwrap() (for esthetic reasons - this .unwrap() should never hit) you could use async_std::task::spawn_blocking() instead of thread::spawn to spawn a thread for which you can receive the return value in an async fashion and use e.g. futures_lite::future::race() to check both the thread_res_rx channel, as well as the threads JoinHandle for an event. In this case we would only communicate a sucess via the channel, meaning it does no longer have to carry a Result.

That could also simplify the error handling in the thread a bit as we could ? in other places in the initialization and maybe also in the steady state.

@@ -442,7 +441,7 @@ impl DutPwrThread {
&pwr_line,
&discharge_line,
&state,
);
)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again the Results from the thread do not go anywhere and the Results are just silently dropped.
If the WatchedTaskBuilder (the name is still up for discussion) idea manifests we could just add the thread to the tasks to watch.

Comment on lines -455 to +465
discharge_line
.set_value(1 - DISCHARGE_LINE_ASSERTED)
.unwrap();
pwr_line.set_value(PWR_LINE_ASSERTED).unwrap();
discharge_line.set_value(1 - DISCHARGE_LINE_ASSERTED)?;
pwr_line.set_value(PWR_LINE_ASSERTED)?;
state.store(OutputState::On as u8, Ordering::Relaxed);
}
OutputRequest::Off => {
discharge_line.set_value(DISCHARGE_LINE_ASSERTED).unwrap();
pwr_line.set_value(1 - PWR_LINE_ASSERTED).unwrap();
discharge_line.set_value(DISCHARGE_LINE_ASSERTED)?;
pwr_line.set_value(1 - PWR_LINE_ASSERTED)?;
state.store(OutputState::Off as u8, Ordering::Relaxed);
}
OutputRequest::OffFloating => {
discharge_line
.set_value(1 - DISCHARGE_LINE_ASSERTED)
.unwrap();
pwr_line.set_value(1 - PWR_LINE_ASSERTED).unwrap();
discharge_line.set_value(1 - DISCHARGE_LINE_ASSERTED)?;
pwr_line.set_value(1 - PWR_LINE_ASSERTED)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Comment on lines -102 to +103
let max = led.max_brightness().unwrap();
let max = led.max_brightness()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again an error that propagates nowhere.

});
}

topic
Ok(topic)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handle_color() itself is still infallible. Errors from the spawned task do not propagate out.
No need for the -> Result<…> and Ok(topic).

SystemTime::now().checked_sub(age).unwrap()
SystemTime::now()
.checked_sub(age)
.with_context(|| "couldn't get system time")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting the system time is infallible, what's actually checkd here is if SystemTime::now() - age can be expressed as a SystemTime (SystemTime::checked_sub()).
I don't quite know why I did not just use a SystemTime::now() - age here as SystemTime actually implements Sub and this failing would mean something really strange is happening.
Maybe just do that to get rid of this .unwrap().

Comment on lines +91 to +98
match {
|| {
let time = self.in_system_time()?;
let timestamp = time.duration_since(SystemTime::UNIX_EPOCH)?;
let js_timestamp = 1000.0 * timestamp.as_secs_f64();
anyhow::Ok(js_timestamp)
}
}() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well that took me a bit to figure out. We should find a solution my brain can parse in a matter of seconds, not minutes.

How about adding a fn as_js_timestamp(&self) -> Result<f64> method to Timestamp and using it here?.

Comment on lines -99 to +113
let _js_timestamp = f64::deserialize(deserializer)?;
// We need both Serialize and Deserialize for Topics, even when they
// are never deserialized in practice like Timestamps.
unimplemented!();
use serde::de::Error;
Err(Error::custom("unused implementation"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reminds me that I always wanted to change the BrokerBuilder so that you do not have to implement Serialize for write-only (e.g. they can only be written from the HTTP API but never read) Topics and Deserialize for read-only Topics, but did not get around to it yet.

Until then this is the nicer workaround.

@hnez
Copy link
Member

hnez commented Sep 29, 2023

Hi,

I've hacked together a prototype that keeps track of running tasks and should propagate errors down the line. Did not do any testing yet though and as of now the cargo tests do not compile, but I think it's a good start.

have a look at my watched-tasks branch.

@hnez
Copy link
Member

hnez commented Oct 2, 2023

I've now cleaned up my work in watched-tasks and created a pull request #47.
It would be cool to hear what you think about it.

@hnez
Copy link
Member

hnez commented Nov 23, 2023

I'd suggest rebasing this once #48 is merged, because it should pave the way to make some of these changes actually work.
I'll assign @KarlK90 to keep track of this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants