Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add: wire decoder for String #75

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

riz0id
Copy link
Collaborator

@riz0id riz0id commented Mar 15, 2022

  • Added a Parser RawPrimitive String decoder
  • Added a roundtrip test for the new combinator and the existing String encoder
  • Formatted proto3-wire.cabal
  • Fixed .gitignore

@riz0id riz0id self-assigned this Mar 15, 2022
@riz0id riz0id requested review from evanrelf and j6carey March 15, 2022 04:29
text >= 0.2 && <1.3,
transformers >=0.5.6.2 && <0.6,
unordered-containers >= 0.1.0.0 && <0.3,
utf8-string,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No bounds at all?


-- | Convert key-value pairs to a map of keys to a sequence of values with that
-- key, in their reverse occurrence order.
--
-- >>> toMap ([(FieldNumber 1, 3),(FieldNumber 2, 4),(FieldNumber 1, 6)] :: [(FieldNumber,Int)])
-- >>> toMap ([(1,3),(2,4),(1,6)] :: [(FieldNumber,Int)])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is documentation, so it's probably better to leave in the explicit newtype constructor.

(Maybe ParseError)
deriving (Show, Eq, Ord)
data ParseError
= -- | A 'WireTypeError' occurs when the type of the data in the protobuf
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @evanrelf mentioned on another PR, it might be helpful to have a separate PR for widespread style changes to code that you are not otherwise touching. I don't mind a bit of unrelated change, but this is pretty massive.

where
msg = "Wrong wiretype. Expected " ++ expected ++ " but got " ++ show wrong
let msg = "Wrong wiretype. Expected " ++ expected ++ " but got " ++ show wrong
in Left (WireTypeError (pack msg))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation for this change?

msg = "Failed to parse contents of " ++
expected ++ " field. " ++ "Error from cereal was: " ++ cerealErr
let msg = "Failed to parse contents of " ++ expected ++ " field. " ++ "Error from cereal was: " ++ cerealErr
in Left (BinaryError (pack msg))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation for this change?

-- | Decode a length-delimited wire field into a 'String'.
string :: Parser RawPrimitive String
string = Parser $ \case
LengthDelimitedField bs -> pure (ByteString.UTF8.toString bs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will silently use replacement characters in place of invalid UTF-8. It would be more consistent with existing practice, and perhaps slightly safer from a security perspective, if we were to fail out of the parser instead.

case decodeUtf8' $ BL.fromStrict bs of
Left err -> Left (BinaryError (pack ("Failed to decode UTF-8: " ++ show err)))
Right txt -> return txt
wrong -> throwWireTypeError "Text" wrong
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change seems to be incorrect. What is being discussed in the error message is wire format and perhaps protobuf field type ("string"), not how we represent types internally within a Haskell program ("Text"). In this case, we are distinguishing between type 2 and other wire types:

https://developers.google.com/protocol-buffers/docs/encoding#structure

Comment on lines +129 to +132
, roundTrip
"string"
(Encode.string 1)
(one Decode.string "a utf8 stringよ?よ" `at` 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding a test, but what is the motivation for providing a decode-to-String feature? Do you plan to actually serialize to and from String in a real program?

(Please note that proto3-suite does not make use of String in this way.)

Though it is true we already have a String encoder, I am not sure that it was ever really used, and might be dead code. Certainly the prior absence of a decoder suggests that it is not really more than a demonstration.

If you have a use for serializing to and from the String type, then great. I would just like to know that you do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants