-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ruby phase 1 #93
Ruby phase 1 #93
Conversation
* Custom metrics and optional metrics buffer for custom processing | ||
* Typed search attributes, memos, headers, etc | ||
* API key, RPC metadata, HTTP CONNECT proxy support, etc | ||
* Loggers with contextual information |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will likely be our own logger abstraction since we need a pluggable logger factory and Ruby stdlib doesn't really offer one
but in general Ruby typing right now is a mess and we likely will hit bugs or advanced cases a yard/RBS combination | ||
library won't work with. | ||
* ❓ How hard do we want to work to avoid duplicate yard/RBS typing? | ||
* 💭 Why not Sorbet? Sorbet can come later or provided separately, but RBS seems to be the newer movement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @cretz , the org where I work is very excited about the official Ruby SDK for temporal!
Just wanted to drop a note that we use Sorbet heavily and it seems like it's the much more popular typing system in the community (despite the introduction of RBS). It won't affect our usage much either way (although first-class sorbet typing would make it slightly easier to integrate), but if you were interested in exploring/chatting more about sorbet, just wanted to offer to jump on a synchronous call to talk through concerns/tradeoffs. I've added sorbet to many gems and applications and would be very happy to chat through things.
Thanks for prioritizing this work!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Yes, I definitely believe RBS is weaker than Sorbet, and less adopted, and Steep is not a great tool, but I am hoping to avoid non-stdlib deps where we can. We considered no typing at all and that still may be an option. But during PoC, RBS was acceptable for this limited use case, but I expect most users to treat this as an untyped library, including Sorbet users. So I definitely agree it won't affect most people's usage either way. At this point with such low RBS adoption, it essentially just becomes a check on ourselves (we did this successfully in the last incarnation at https://github.com/temporalio/sdk-ruby/tree/v0.1.1).
We were just wanting to avoid maintaining Sorbet annotations in repo. But we would totally encourage external ones at https://github.com/Shopify/rbi-central or https://github.com/sorbet/sorbet-typed or wherever.
Would definitely like to chat on this further. If easier, can come join us in #ruby-sdk
in Slack. The pinned message in that channel also has a link to a survey w/ a request at the end to setup a meeting to discuss Ruby if that's preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I filled out the survey, thanks!
I commented in that channel in case you wanted to chat through Sorbet more :)
Thanks again!
* 💭 Why classes instead of methods/functions like every other SDK? There are tradeoffs to both approaches. As | ||
classes, they are more easily referenced, a bit more easily typed in RBS, and metadata can be applied more easily at | ||
the class level. This comes at a cost of not being able to easily share state across classes, but users can provide | ||
something in the constructor multiple activities use. Open to considering activity methods instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might've already discussed this internally, but a middle ground here might be supporting any object that responds to #call
rather than #execute
.
Using #call
would potentially allow simple activities to be implemented as a Proc
or Lambda
, while still allowing people to use custom classes for more complex scenarios.
In the spirit of "there's more than one way to do it", this would give SDK users multiple options for defining activities, and they could choose whichever one is most appropriate for their circumstances:
module Activities
# Constant procs
SomeProcActivity = ->(params) {
# ...
}
# Proc factory methods
def self.another_proc_activity(config)
->(params) {
# can use config to customize proc
}
}
# Maybe some kind of DSL?
DSLActivity = Temporalio.activity(options) do |params|
# ...
end
# Child classes for full control
ChildClassActivity < Temporalio::Activity
def initialize(config)
# ...
end
def call(params)
# ...
end
end
end
# Activity registration using a hash, similar to TS API:
worker1 = Temporalio::Worker.new(
# ...
activities: {
SomeProcActivity: Activities::SomeProcActivity,
AnotherProcActivity: Activities.another_proc_activity(config),
DSLActivity: Activities::DSLActivity,
ChildClassActivity: Activities::ChildClassActivity,
# Even inline activities, which is almost certainly not a good
# idea for production code, but could be useful in some situations.
InlineActivity: ->(params) {
# ...
}
}
# ...
)
And possibly a block-based API on the worker:
worker.register_activity('MyActivity') do |params|
# ...
end
As well as anything that responds to #to_proc
:
worker.register_activity('MyActivity', &:my_activity)
All of these could of course also be supported using the #execute
method, but then there'd be a discrepancy between the SDK's #execute
and the proc's #call
.
Some similar APIs:
- the Rack specification, which makes heavy use of objects that respond to
#call
Logger#formatter
, which is documented as supported either a proc or an instanceLogger::Formatter
(which responds to#call
)- AWS SDK uses a
Callable
interface for various callback-like options in some services
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using #call would potentially allow simple activities to be implemented as a Proc or Lambda, while still allowing people to use custom classes for more complex scenarios.
A couple of important aspects we want for activities are 1) they are easily referenceable by a caller without calling (e.g. Workflow.execute_activity(MyActivityClass, ...
, w/ possible generic typing support w/ RBS/Sorbet), and 2) people can use existing things (e.g. mixins) and there is not much magic.
In the spirit of "there's more than one way to do it", this would give SDK users multiple options for defining activities,
We actually try to avoid this in SDKs where we can at declaration time. Sure some SDKs do have a couple of ways, but in general SDKs say "activities are defined as X" without many values for X and without more magic than we have to have. And remember there is the referenceability concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of important aspects we want for activities are 1) they are easily referenceable by a caller without calling (e.g. Workflow.execute_activity(MyActivityClass, ..., w/ possible generic typing support w/ RBS/Sorbet), and 2) people can use existing things (e.g. mixins) and there is not much magic.
Got it. If I'm understanding them correctly, those all make sense to me as goals / requirements 👍
For (1), is that referring to the ability to call an activity with something like MyActivity.execute
rather than workflow.execute_activity(MyActivity)
? If so, then using a method other than #call
definitely makes sense to me. IIUC, that'll help prevent users from trying to execute activity classes that haven't been wired up to Temporal.
We actually try to avoid this in SDKs where we can at declaration time. Sure some SDKs do have a couple of ways, but in general SDKs say "activities are defined as X" without many values for X and without more magic than we have to have. And remember there is the referenceability concern.
I totally get this, and I'm personally not a huge fan of how Ruby has adopted Perl's TIMTOWTDI philosophy. But I also think a lot of Rubyists definitely view that as one of Ruby's strengths, so just wanted to put in a plug for that way of thinking 😅
That being said, I don't think anyone will avoid using Temporal just because they can't implement activities as procs, so it's probably not worth the effort to implement (& support!) that level of flexibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is that referring to the ability to call an activity with something like
MyActivity.execute
rather thanworkflow.execute_activity(MyActivity)
?
No, we want the latter. We learned from other SDKs that it is best to not make the remote activity invocation look like a regular local method invocation.
they can't implement activities as procs
Maybe we can have sugar for this, e.g. class ProcActivity < Temporalio::Activity
and a class method on Temporalio::Activity
that creates it. But it is the same type of sugar a user can write if we make sure that we support activities as classes.
|
||
### Workflows in Client | ||
|
||
Early thoughts on workflows are that they'll be Ractors. For the purposes of this phase, all that may matter is how |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment about this in the Google form as well, but I'm not sure if Ractors would be the right move here. They're still very experimental and still have some fundamental limitations:
-
https://rubykaigi.org/2023/presentations/ko1.html
Ractor was introduced in Ruby 3.0 as an experimental feature to enable parallel programming in Ruby. Now Ractor is not widely used for many reasons. For example, from the perspective of a Ractor implementer, Ractor does not have enough performance and quality. In this talk, we will explain which features are not enough, why they are difficult, and how to solve these difficulties.
-
https://rubykaigi.org/2024/presentations/ko1.html
This talk presents recent updates to Ractor, which enables parallel and concurrent programming on Ruby.
Ractor still lacks fundamental features. For example, we cannot use “require” method and “timeout” methods on non-main Ractors because of synchronization and implementation issues. We will discuss such problems and how to solve them. From a performance point of view, we have introduced the M:N thread scheduler in Ruby 3.3 and we will show the performance analysis with recent improvements.
They also still output scary warnings, even in Ruby 3.3:
irb(main):001> Ractor.new { puts RUBY_VERSION }
(irb):1: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
3.3.4
=> #<Ractor:#2 (irb):1 terminated>
I definitely think that Ractors could be a good solution in a future world where Ractors are more polished, but IMO for now it would be safer to follow the coinbase/temporal-ruby
SDK and use Fibers as the basis for scaling workflow executions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the biggest challenges in SDK development is sandboxing for determinism. There are two main impediments to determinism: state isolation and restricted API use (e.g. disk IO). Ignoring the latter, if we can get the former it may be worth it. JS is lucky in that it can spin up a new V8 instance. But it is a nightmare for Go and Java and .NET and other devs to remember that they cannot share state. We expended a lot of effort making a Python sandbox just to get this state isolation so that globals don't pollute.
We will have to evaluate performance to see if this can work, and we can provide a way to opt-out of Ractors (for the most part none of the Ractor stuff would be visible to users unless they wanted to do bad things like expose something shareable).
Fibers as the basis for scaling workflow executions.
We will definitely still be using fibers (we will have our own fiber scheduler since we need deterministic coroutines), but if we can get state isolation by default, it may be worth the performance penalty. This is more of a phase 2 thing though, workflows aren't going to be done now.
Interestingly enough, deterministic isolated Temporal workflows is like one of the only good use cases I have seen for Ractors, heh. But it's subject to testing to make sure the good limitations are good and the bad limitations aren't too bad (and of course we can allow it to be disabled). Most users will never know there are Ractors under the hood powering workflows.
They also still output scary warnings, even in Ruby 3.3:
This is scary. Maybe it won't be ready. We will evaluate in phase 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, that makes sense! If it's only for workflow-internal things, I think Ractors sound great.
But if the goal is to isolate global state even between activities, I could see that being very challenging with Ractors. For example, I'm not sure if ActiveRecord would would play nicely with Ractors, because (AFAIK) the connection pool wouldn't be able to be shared between Ractors.
Interestingly enough, deterministic isolated Temporal workflows is like one of the only good use cases I have seen for Ractors, heh.
Same, TBH 😅
I want to use them, but haven't really found a use case yet, especially given their current limitations :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if the goal is to isolate global state even between activities, I could see that being very challenging with Ractors
Definitely not the goal. Activities will not have anything to do with Ractors at all. Activities are just normal Ruby code where you can do what you want (we will allow different executors so people can choose whether to run fiber/async vs threaded (the default)). In fact, it is a very common use case to share state between activities. For instance, it is very normal to have multiple activity invocations share a database client. This is why we will allow users to instantiate the activity class, meaning the same instance will be called on each attempt (but if you pass the class in, we will instantiate per attempt).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds great! Thanks for helping me understand!
* API key, RPC metadata, HTTP CONNECT proxy support, etc | ||
* Loggers with contextual information | ||
* Full error hierarchy with `Temporalio::Error` as the root | ||
* ❓ Are there any Ruby libraries so widely adopted they might as well be in the standard library and therefore deserve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ActiveSupport is probably the big contender here, but TBH I don't know how much the Temporal would really benefit from having it available.
Optional support for things like ActiveSupport::Duration
might be nice for things like start_delay
, but that could be a pain to implement (& maintain) alongside normal Numeric
arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 We definitely plan to have samples showing integrations, but not sure we need an explicit integration for this. Whatever integrations we do have would be optional/additional because it is really important that we have as few dependencies as possible in the primary SDK because our dependency constraints become users' transitive constraints. I believe we can have only 1 dependency (protobuf) at this time. temporalio-opentelemetry
is the only additional gem I can think we'll have at first. I expect active support, rails, etc samples to exist though.
* Workflow handles exist with all relevant calls. | ||
* Note that we use the kwargs constructor + options struct pattern so that a user can access duped options, change one, | ||
and reconstruct via `Client.new(**my_new_options.to_h)` | ||
* ❓Is there a better way to allow people to easily change an option? We didn't want a `with` pattern because, at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One very common pattern in Ruby is to have config methods (including some initializers) accept an optional block that yields the config object for modification:
class Client
def initialize(**options)
@config = Configuration.new(**options)
yield @config if block_given?
# ...
end
end
client = Client.new do |config|
config.namespace = "my-namespace"
end
I think that would feel very natural for many Rubyists, but might be a pretty large departure from how the other SDKs are configured?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For client connect and initialization (and some others) we are wanting keyword arguments for all but one or two required positional args since the vast majority are optional and we're not really wanting multiple ways to configure the client (e.g. allow both block and kwargs). In this case it's more like Client.connect("localhost:7233", "my-namespace")
which is a shortcut for Client.new(Connection.new("localhost:7233"), "my-namespace")
. There is a PR coming either here in a couple of hours or early tomorrow that will show this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the intentionally-limited PR: temporalio/sdk-ruby#155. Would love feedback there, I sprinkled some PR comments throughout.
* Activity can choose its executor, default is `:default`. | ||
* See worker section for more information on executors. | ||
* Cancellation is reported to the executor. | ||
* For thread pool executor (the default), this raises a cancellation exception via `Thread.raise`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this means that the executor will raise an exception on the activity by calling e.g. activity_thread.raise CancellationError
, that could potentially be really dangerous. Ruby's built-in Timeout
module uses this approach, and it can be problematic:
- https://headius.blogspot.com/2008/02/rubys-threadraise-threadkill-timeoutrb.html
- https://jvns.ca/blog/2015/11/27/why-rubys-timeout-is-dangerous-and-thread-dot-raise-is-terrifying/
- https://github.com/zombocom/rack-timeout/blob/main/doc/risks.md
That being said, the SDK would probably be fine >99% of the time by using Thread.handle_interrupt
, similar to Sidekiq:
But even that can be really subtle: sidekiq/sidekiq#6017 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this was a topic of discussion in Python too where this kind of thing is even less common (we have to make a C call to https://docs.python.org/3/c-api/init.html#c.PyThreadState_SetAsyncExc, it's not even in the standard library). We can provide a way to opt-out of this (and we will provide a "shield" way to ensure certain code runs), but we've found it's dangerous to let runaway activities continue to run that cannot get interrupted. If there is some other model in the standard library for propagating cancellation to threaded tasks, we'd like to use it.
I will make a note to make sure we use handle_interrupt
for our exception only and make sure to have tests demonstrate other forms of interrupt.
* 💭 Why no `Temporalio::Common`? Ruby expects to have common things where they make sense instead of under a `Common` | ||
module. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may have to walk this back a bit. It is unreasonable to have Temporalio::WorkflowIDReusePolicy
, Temporalio::WorkflowIDConflictPolicy
, etc cluttering the top-level (or is it?). However, we don't want RetryPolicy
to be in Common
. So I guess we may need some kinda hybrid rule that says use the Common
module for small or lesser used common classes?
Maybe we should just have common_enums.rb
at that defines top-level, common enums, but still under Temporalio
module.
* It has a class method for `current` that returns from thread/fiber-local and class method shortcuts for all instance | ||
methods. | ||
* 💭 Why class methods equivalents? `Temporalio::Activity::Context.heartbeat` is easier to read than | ||
`Temporalio::Activity::Context.current.heartbeat`. In .NET we made them get current instance first because that's | ||
normal there, and in Python we only offered package-level methods because that's normal there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* It has a class method for `current` that returns from thread/fiber-local and class method shortcuts for all instance | |
methods. | |
* 💭 Why class methods equivalents? `Temporalio::Activity::Context.heartbeat` is easier to read than | |
`Temporalio::Activity::Context.current.heartbeat`. In .NET we made them get current instance first because that's | |
normal there, and in Python we only offered package-level methods because that's normal there. | |
* It has a class method for `current` that returns from thread/fiber-local. |
I think we are going to walk back this decision for also having class methods. I think it's not really common in Ruby to define class and instance methods of the same name, and Singleton
and others show that people are ok calling .current
or something.
|
||
## Clients | ||
|
||
`Temporalio::Client` is a namespace-specific client and `Temporalio::Client::Connection` is a namespace-independent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Temporalio::Client::Connection
is a namespace-independent connection
Are you still holding on to this statement, given the semantic change introduced by Cloud's API Keys? I'm just curious, not arguing either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Connection
is the Rust Client
and the CloudOperationsClient
here (as in other SDKs) will leverage the Connection
. Can see this separation in practice already at main
of https://github.com/temporalio/sdk-ruby (and in Python SDK and .NET SDK).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Merged since all comments settled and we have long since started implementation. |
Summary
For the public, please also join us on
#ruby-sdk
in Slack.