Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression: Garnet 1.0.49 fails on unknown commands in Lua #902

Open
nightroman opened this issue Jan 7, 2025 · 14 comments
Open

Regression: Garnet 1.0.49 fails on unknown commands in Lua #902

nightroman opened this issue Jan 7, 2025 · 14 comments
Assignees

Comments

@nightroman
Copy link
Contributor

nightroman commented Jan 7, 2025

Describe the bug

Garnet 1.0.49 has issues with processing unknown commands in Lua (1.0.48 works more as expected).

Steps to reproduce the bug

With Garnet 1.0.48 using StackExchange.Redis ScriptEvaluate, we get:

  • invoking redis.call("missing") returns an empty string
  • invoking return redis.call("missing") returns the string ERR unknown command

With Garnet 1.0.49 the above calls result in exceptions and the server log shows the following records (they do not look as something by design):

05::14::27 fail: Session[0] [127.0.0.1:61856] [00480155] During Lua script execution System.Runtime.InteropServices.SEHException (0x80004005): External component has thrown an exception.    at KeraLua.NativeMethods.lua_error(IntPtr luaState)    at Garnet.server.LuaStateWrapper.RaiseErrorFromStack() in /_/libs/server/Lua/LuaStateWrapper.cs:line 476    at Garnet.server.LuaRunner.LuaStaticError(Int32 constStringRegistryIndex) in /_/libs/server/Lua/LuaRunner.cs:line 656    at Garnet.server.LuaRunner.ProcessResponse(Byte* ptr, Int32 length) in /_/libs/server/Lua/LuaRunner.cs:line 698    at Garnet.server.LuaRunner.ProcessCommandFromScripting[TGarnetApi](TGarnetApi api) in /_/libs/server/Lua/LuaRunner.cs:line 634
05::14::27 fail: Session[0] [127.0.0.1:61856] [00480155] Error executing Lua script System.Runtime.InteropServices.SEHException (0x80004005): External component has thrown an exception.    at KeraLua.NativeMethods.lua_error(IntPtr luaState)    at Garnet.server.LuaStateWrapper.RaiseErrorFromStack() in /_/libs/server/Lua/LuaStateWrapper.cs:line 476    at Garnet.server.LuaStateWrapper.RaiseError(String msg) in /_/libs/server/Lua/LuaStateWrapper.cs:line 466    at Garnet.server.LuaRunner.ProcessCommandFromScripting[TGarnetApi](TGarnetApi api) in /_/libs/server/Lua/LuaRunner.cs:line 642    at Garnet.server.LuaRunner.garnet_call_txn(IntPtr luaStatePtr) in /_/libs/server/Lua/LuaRunner.cs:line 502    at KeraLua.NativeMethods.lua_pcallk(IntPtr luaState, Int32 nargs, Int32 nresults, Int32 errorfunc, IntPtr ctx, IntPtr k)    at KeraLua.Lua.PCall(Int32 arguments, Int32 results, Int32 errorFunctionIndex)    at Garnet.server.LuaRunner.RunCommon[TResponse](TResponse& resp) in /_/libs/server/Lua/LuaRunner.cs:line 1084    at Garnet.server.LuaRunner.RunForSession(Int32 count, RespServerSession outerSession) in /_/libs/server/Lua/LuaRunner.cs:line 853    at Garnet.server.RespServerSession.ExecuteScript(Int32 count, LuaRunner scriptRunner) in /_/libs/server/Lua/LuaCommands.cs:line 264

Expected behavior

The above Lua scripts are intentionally not quite right, using unknown commands.
They should probably cause some errors, e.g. like in Garnet 1.0.48 but not like in 1.0.49

Release version

1.0.49

IDE

.NET 9

OS version

Windows 11

@kevin-montrose
Copy link
Contributor

Hmm, I can't quite reproduce what you're seeing in .49 (commit ce21c248f084744e45bbff08d0ecce0a51326cca). Specifically the log lines.

As to the responses, I think those are correct in .49 and were wrong in earlier Garnet. Comparing to Redis, ScriptEvaluate("redis.call(\"missing\")") returns an ERR the same way ScriptEvaluate("return redis.call(\"missing\")") does.

Could you share a repro script that produces those log lines?

@nightroman
Copy link
Contributor Author

nightroman commented Jan 8, 2025

@kevin-montrose If I use GarnetServer from the latest repo with --lua --lua-transaction-mode then I cannot repro the issue. And the errors are "ERR Unknown Redis command called from script" for both cases, as you say.

I observe problems in a server built from the NuGet package. Have you tried this scenario, by any chance?

@nightroman
Copy link
Contributor Author

My server built from the NuGet package is this https://github.com/nightroman/FarNet.Redis/tree/main/src/GarnetServer
with

    <PackageReference Include="Microsoft.Garnet" Version="1.0.49" />

@nightroman
Copy link
Contributor Author

nightroman commented Jan 8, 2025

FWIW My normal Lua tests pass (not many, a couple). It just abnormal Lua scripts calling missing Redis commands misbehave.

https://github.com/nightroman/FarNet.Redis/blob/main/tests/Scripting.test.ps1
(the last is commented out due to 1.0.49 issues)

@kevin-montrose
Copy link
Contributor

@kevin-montrose If I use GarnetServer from the latest repo with --lua --lua-transaction-mode then I cannot repro the issue. And the errors are "ERR Unknown Redis command called from script" for both cases, as you say.

I observe problems in a server built from the NuGet package. Have you tried this scenario, by any chance?

I tested with the repo at the .49 commit, I'll try with the nuget package.

@kevin-montrose
Copy link
Contributor

Brand new project hosting .49 (command line args (--lua --lua-transaction-mode --logger-level Trace), I cannot repro the exceptions:
Image

Using SE.Redis 2.8.22 in that LinqPad script, .NET 8 for the console app, tested Debug and Release configs.

@nightroman
Copy link
Contributor Author

Please try .NET 9 for the console app. I see problems with .NET 9, but not with .NET 8.

@nightroman
Copy link
Contributor Author

I can also reproduce the issue with the latest code from the Garnet repo, if I use in main/GarnetServer/GarnetServer.csproj

    <TargetFrameworks>net9.0</TargetFrameworks>

@kevin-montrose
Copy link
Contributor

kevin-montrose commented Jan 9, 2025

Yeah, .NET 9 does it. Interesting.

While reported from KeraLua, you get it with a plain p/invoke too.

Digging through lua_error, ultimately it's just a longjmp.

Very strange... but it twigged a memory. .NET team has been working on exception performance, and recently defaults a new implementation on.

If I disable it (setting environment variable DOTNET_LegacyExceptionHandling to 1 ) the error goes away.

So I think this is a .NET regression, I'm putting together a minimum repro and will open an issue over there.

@nightroman, as a sanity check, can you apply that setting and see if the issue goes away?

edit: Oops, misclicked and closed this by accident - reopened. Sorry for the email spam!

@kevin-montrose
Copy link
Contributor

Opened an issue against .NET: dotnet/runtime#111242

@nightroman
Copy link
Contributor Author

nightroman commented Jan 9, 2025

@kevin-montrose I can confirm that setting DOTNET_LegacyExceptionHandling to 1 does the trick and both calls work in the reasonable way.

NB And return redis.pcall("missing") (pcall, not call, not sure what it is :) now works the same way, used to fail "not nice" before (not necessarily in .48, I commented out this test some time ago due to unwanted server error log messages).
UPDATE: nope, sorry, still the same issue but this is a separate issue, if an issue at all.
It gives [string "return redis.pcall("missing")"]:1: attempt to call a nil value (field 'pcall')

@kevin-montrose
Copy link
Contributor

UPDATE: nope, sorry, still the same issue but this is a separate issue, if an issue at all. It gives [string "return redis.pcall("missing")"]:1: attempt to call a nil value (field 'pcall')

Yeah, that's because Garnet's Lua preamble hasn't defined all the things Redis does - so redis.pcall(...) just straight up does not exist (there are plenty of others where that is true too, not just pcall). It's not a regression, but a place for future improvement. I intend to get to it eventually, but it's not top of my list right now.

@badrishc
Copy link
Contributor

Closing as there is a workaround for .NET 9, and we would have to wait for .NET to fix the underlying cause.

@badrishc
Copy link
Contributor

Reopening as I see dotnet/runtime#111242 is closed?

@badrishc badrishc reopened this Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants