How to handle literals in FETCH response #72

Open
lllama opened this issue Nov 25, 2021 · 5 comments
@lllama

lllama commented Nov 25, 2021

Thanks for this library - I've had some good success with it so far.

However, if I have an email with a double quote in the subject, then the FETCH response for the message is split over multiple lines, and uses a literal line for the subject.

Is there a recommended way to handle this? If I do a FETCH for all messages, then the situation seems even trickier to handle, as there seems to be no easy way to tell which line belongs to which message.

The standard lib imaplib seems to bundle the responses into tuples of envelopes and data but we only get the response lines (unless I've missed something).

@bamthomas
Collaborator

OK, thanks for this issue and the link. I'll have a look at this soon.

@bamthomas
Collaborator

As for #71, the API is not the same as imaplib: it does less formatting of the responses from the IMAP server. Yet the result is structured with uniform types (a list containing bytes strings for the IMAP protocol parts and bytearrays for the data).

This is not related to whether there is a double quote or not.

For example, I tried with a double quote in the subject. The first command is what an app would run to list the most recent messages without their bodies, here with only the UID, flags, and subject.

Then, when the user asks for a specific mail, the body is fetched with the second command.

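# List recent messages without bodies: only UID, flags, and subject.
result, lines = await imap_client.uid('fetch', '1950:*', '(UID FLAGS BODY.PEEK[HEADER.FIELDS (SUBJECT)])')
print(lines)
# Fetch the full message for a single UID.
result, lines = await imap_client.uid('fetch', '1984', 'BODY.PEEK[]')
print(lines)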

This will display (I separated each list item for clarity):

[
b'1544 FETCH (UID 1950 FLAGS (NonJunk) BODY[HEADER.FIELDS (SUBJECT)] {68}', 
bytearray(b'Subject: =?utf-8?Q?=5bSlack=5d_Sender_sent_you_a_message?=\r\n\r\n'), 
b')', 
b'1545 FETCH (UID 1951 FLAGS (NonJunk) BODY[HEADER.FIELDS (SUBJECT)] {167}', 
bytearray(b'Subject: [adherents] =?utf-8?Q?AGIT_-_Ne_manquez_pas_vos_procha?=\r\n\t=?utf-8?Q?ins_=C3=A9v=C3=A9nements_sur_la_Responsabi?=\r\n\t=?utf-8?Q?lit=C3=A9_Num=C3=A9rique_!?=\r\n\r\n'),
b')'
...
]

So we see that the lines come in groups of three:

  • first is the FETCH response,
  • second is the data
  • third is the closing parenthesis
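A minimal sketch of how such groups could be collected, assuming (as in the output above) that protocol lines arrive as bytes and literal data as bytearray. The helper name group_fetch_lines is hypothetical and not part of aioimaplib. It does not rely on a fixed group size: a bytes line ending in a literal marker such as {68} keeps the group open, and any other bytes line closes it, so single-line responses and the trailing status line are handled too.

```python
import re

# A FETCH response starts with "<sequence number> FETCH (".
FETCH_START = re.compile(rb'^\d+ FETCH \(')
# A line ending in "{N}" announces that a literal of N bytes follows.
LITERAL_MARKER = re.compile(rb'\{\d+\}$')


def group_fetch_lines(lines):
    """Group a flat list of response items into one list per message.

    Hypothetical helper: assumes protocol lines are `bytes` and
    literals are `bytearray`, as shown in this thread.
    """
    groups = []
    current = None
    for item in lines:
        if current is None:
            # Skip anything that does not start a FETCH response,
            # e.g. the final "Fetch completed" status line.
            if not (isinstance(item, bytes) and FETCH_START.match(item)):
                continue
            current = [item]
        else:
            current.append(item)
        # A protocol line without a trailing literal marker ends the message.
        if isinstance(item, bytes) and not LITERAL_MARKER.search(item):
            groups.append(current)
            current = None
    # An unterminated trailing group, if any, is discarded.
    return groups
```

Each returned group then holds the header line, any literals, and the closing line for exactly one message.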

Then the BODY.PEEK[] fetch will display:

[
b'1577 FETCH (UID 1984 BODY[] {3339}', 
bytearray(b'Return-Path: <[email protected]>\r\nDelivered-To: [email protected]\r\nX-Envelope-To: [email protected]\r\nReceived: (...message headers) ------GWONCOFAFSCBGQO9559AI5OF8JL2BA\r\nContent-Type: text/plain;\r\n charset=utf-8\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\nComment c\'est "double quotes" ?\r\n\r\n------GWONCOFAFSCBGQO9559AI5OF8JL2BA\r\nContent-Type: text/html;\r\n charset=utf-8\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\nComment c\'est "double quotes" ?\r\n------GWONCOFAFSCBGQO9559AI5OF8JL2BA--\r\n'), 
b')', 
b'Fetch completed (0.001 + 0.000 secs).'
]

Here it is much the same: first comes the IMAP response, then the mail content, which is always at index 1 and can be passed directly to the Python email API with:

import email

msg = email.message_from_bytes(lines[1])
print(msg['subject'])
# will print 'Test "guillemets" '
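The same approach should also work for the header-only literals from the first command. As a hedged sketch: the subjects there are RFC 2047 encoded words, and parsing with the standard library's email.policy.default (instead of the legacy default policy) returns them decoded. The raw value below is copied from the first FETCH output in this comment.

```python
import email
import email.policy

# Literal taken from the HEADER.FIELDS (SUBJECT) response shown earlier.
raw = bytearray(
    b'Subject: =?utf-8?Q?=5bSlack=5d_Sender_sent_you_a_message?=\r\n\r\n'
)

# policy.default decodes encoded words when the header is accessed.
msg = email.message_from_bytes(bytes(raw), policy=email.policy.default)
subject = str(msg['subject'])
print(subject)
```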

@jgosmann

I think it would be nice if aioimaplib provided a more high-level API for the fetch (similar to imaplib). While the current format has some structure, relying on it seems a bit hacky, especially if one wants to read out multiple data items (e.g. the UID in addition to the body). Concrete problems I see:

  • I don't think there is any guarantee on the order of data items. When fetching multiple multiline strings, an access like lines[1] is not sufficient; one has to extract the preceding "key". Also, one must be aware that the closing parenthesis might be preceded by additional data items (e.g. b'FLAGS (\\Seen))').
  • The server might send additional data items that weren't requested. I have only seen this for FLAGS, but I was unable to find anything in RFC3501 restricting this (though the formal syntax differentiates msg-att-dynamic and msg-att-static). This would be a problem if an additional literal is included in the response.
  • While any server will most likely always send the email body as a literal, I think it would also be allowed to use a quoted string, which then would not appear on a separate line (I suppose).
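To illustrate the first point, here is a rough sketch of extracting the "key" that precedes each literal instead of relying on list positions. It assumes the items of one message arrive as bytes lines and bytearray literals, as shown earlier in this thread. The helper name literals_by_key and the regular expression are illustrative only; this is not a full RFC3501 parser and it does not cover data items sent as quoted strings.

```python
import re

# Matches the data item name directly before a trailing literal marker,
# e.g. "BODY[HEADER.FIELDS (SUBJECT)] {68}" or "RFC822.HEADER {120}".
KEY_BEFORE_LITERAL = re.compile(
    rb'([A-Z0-9.]+(?:\[[^\]]*\])?(?:<\d+>)?) \{(\d+)\}$'
)


def literals_by_key(group):
    """Map each literal in one FETCH response to its data item name.

    Hypothetical helper: `group` is the list of items belonging to a
    single message (bytes lines and bytearray literals).
    """
    result = {}
    pending = None  # (key, announced size) from the preceding line
    for item in group:
        if isinstance(item, bytearray):
            if pending is None:
                raise ValueError('literal without a preceding key')
            key, size = pending
            if len(item) != size:
                raise ValueError(f'size mismatch for {key}')
            result[key] = bytes(item)
            pending = None
        else:
            match = KEY_BEFORE_LITERAL.search(item)
            pending = (
                (match.group(1).decode('ascii'), int(match.group(2)))
                if match else None
            )
    return result
```

Looking literals up by name this way stays correct if the server reorders data items or inserts an unrequested one between two literals.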

To actually be sure to handle all these corner cases, I think it is currently required that the consumer parses the response based on the RFC. I did this here using an implementation of the grammar with pyparsing. However, this still does not seem to work with Office365 IMAP (jgosmann/dmarc-metrics-exporter#17). It also has some rough edges:

  • I need to stitch the "structured" response back together, so that I can parse it myself.
  • The last line of the response is annoying because it only states FETCH completed, which is not standardized. Only the <TAG> OK prefix is, but that is already stripped away by aioimaplib. Thus, it is hard to tell whether something is just the completing line of the command or some invalid response (or a response not supported by the parser).
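For reference, the stitching step mentioned in the first point can be sketched roughly as follows. This assumes that aioimaplib only strips the CRLF line terminators and hands literals through unchanged as bytearray; the function name stitch_response is made up for this example. On the wire, a line ending in {N} is followed by CRLF and then exactly N bytes, after which the protocol line continues, so only bytes items get a CRLF re-appended.

```python
def stitch_response(lines):
    """Rebuild a raw byte stream from a list of response items.

    Hypothetical helper: assumes `bytes` items are protocol lines whose
    CRLF was stripped and `bytearray` items are untouched literals.
    """
    raw = bytearray()
    for item in lines:
        raw += item
        if isinstance(item, bytes):
            # Protocol lines end with CRLF; literals are followed
            # directly by the rest of the line.
            raw += b'\r\n'
    return bytes(raw)
```

Note that the lines shown in this thread carry neither the untagged "* " prefix nor the command tag, so a grammar that expects them would need those re-added before parsing.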

@lllama
Author

lllama commented Jan 10, 2022

I agree with the above. In my app, I'm making a FETCH request for the ENVELOPE of each mail in my mailbox. response.lines gives me a single line per mail, unless I've included a double quote (") in a subject line. This then triggers the subject to be returned as a literal (I'm using Dovecot in this case), which is included as a bytearray, and then the rest of the envelope is returned as a byte string in the next list element, i.e. I get three list elements instead of one.

My understanding is that imaplib will group these three lines into a list of their own, and then the entire response is returned as a list of lists.

I believe that's similar to what @jgosmann is talking about.

@jgosmann

My statements about the unknown order of data items and potential additional data items included were correct. RFC3501 actually references RFC2683 with recommendations for implementors and clearly states these points.
