Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: fetch and parse full gmail message #5160

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

rostaklein
Copy link
Contributor

first part of #4108
related PR #5081

@@ -33,6 +33,23 @@
"internalConsoleOptions": "openOnSessionStart",
"console": "internalConsole",
"cwd": "${workspaceFolder}/packages/twenty-server/"
},
{
"name": "twenty-server - gmail fetch debug",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems a bit overkill, let's remove it :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didnt mean to commit, but I needed a way to debug the email fetching, I was hoping that the worker would do that, but it didnt, any tips how to do that? :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the worker should do it actually, if you start the worker with the debugger and place a breakpoint it should respect it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! It finally works! :) The missing peace for me was actually uncommenting the MESSAGE_QUEUE_TYPE env variable so its not doing the sync type of queue which always landed in one process and the worker didnt pick it up :D Im still learning a lot about this project and there are always some parts which you guys take for granted and im getting surprised :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plus i got finally a person created out of the conversation as well 🎉 had to figure out all these conditions that were made with a good intention 😄 (sending a gmail to gmail message didnt do what I wanted it to do)

internalDate: string;
import { gmail_v1 } from 'googleapis';

export type GmailMessageParsedResponse = gmail_v1.Schema$Message & {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of having errors mixed with the message. Also, it seems that the new type is having all fields as optionals. Can we infer a type from gmail_v1.Schema$Message with all fields we need required?

Do we need to have errors here or can we add it as an additional parameter of functions when needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, it was there before, I assume perhaps because of the way how fetch-by-batch.service.ts is implemented? 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't you adding the errors to this type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to have errors here or can we add it as an additional parameter of functions when needed?

I tried to have a look at this once again and even though im still not completely able to understand what fetch-by-batch.service.ts it apparently adds the error property to the parsed object when parsing the response here https://github.com/twentyhq/twenty/blob/main/packages/twenty-server/src/modules/messaging/services/fetch-by-batch/fetch-by-batch.service.ts#L107

We can of course make the type nicer by making it a literal like gmail_v1.Schema$Message | GmailError but just imagine looping through such an array. Currently it checks whether the error property exists and returns early if it does https://github.com/twentyhq/twenty/blob/main/packages/twenty-server/src/modules/messaging/services/fetch-messages-by-batches/fetch-messages-by-batches.service.ts#L70-L74

Can we infer a type from gmail_v1.Schema$Message with all fields we need required?

I mean, this is the best type we can get, or is it not? It comes straight from Google that clearly says some properties might are optional. When testing, those we need were always defined, but isnt it better to be safe than sorry? Im doing a runtime assertion using:

assert(id);
assert(threadId);
assert(historyId);
assert(internalDate);

This would throw early (message wouldnt be parsed) if some of them are actually missing. Safer approach actually rather than letting it throw further down the line with "cant read property X of undefined" or saving invalid data into the DB.

Let me know what you think about these two topics :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the assert pattern that you are using

I've just remove the errors from your type and it looks good


if (!from) throw new Error('From value is missing');

const participants = [
...formatAddressObjectAsParticipants(from, 'from'),
...formatAddressObjectAsParticipants(to, 'to'),
...formatAddressObjectAsParticipants(cc, 'cc'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rostaklein why are we removing these participants?

@charlesBochet
Copy link
Member

@rostaklein I've rebased your PR on main and made minor changes.

Now I have two questions:

  1. why did we stop fetching participants from cc and bcc in fetch-messages-by-batch.service.ts?
  2. I have tried to import my email. And for some messages, "Message-ID" header is actually "Message-Id". I've added toLowerCase in the header check ==> I could not fully test that this was functionnal because I'm hitting my Google API dev rate limiting. I'll re-try in 1 hour

@charlesBochet
Copy link
Member

charlesBochet commented May 8, 2024

I was able to re-test and it seems to work.

High level strategy

However, you are right the error handling is broken. Here is what we should do IMO:

  • very early (so in fetchAllMessages from FetchMessagesByBatchesService) we should understand if we have an error or not. If we have a 404, let's just drop the message. if we have anything else (429, 403, 400, 500) we should put the message channel syncStatus in FAILED state (we will revisit these status later, they are not precise enough)

We should stop handling an array of errors in the code ; either we ignore for 404 and just drop the message, either we throw and put the whole channel in failed state. Treating these error edge cases early is the best way to have a simple and robust code. Anyway if we get a 429, 403... on a message, we will get it for all messages

Inputs

Could you do this change in the error handling? I believe this should only impact:

  • GmailFetchMessageContentFromCacheService: instead of processing errors array, we should have a try catch that's ready to handle the single error. Keep the current logic when fail (put back the messages in the cache and put the channel in failed state)
  • FetchMessagesByBatchesService: 404: silent + remove the message from the list / other error code: throw, this will be catched by GmailFetchMessageContentFromCacheService

Manual testing

We should make sure that the following cases are covered:

  • nominal case without error
  • 429 when rate limit is hit (just spam, you'll hit it)
  • 403 if token is not valid
  • 404 if the message has been deleted from gmail in the mean time

@rostaklein
Copy link
Contributor Author

rostaklein commented May 9, 2024

why did we stop fetching participants from cc and bcc in fetch-messages-by-batch.service.ts

For some reason I thought the new format is not returning it. I was mistaken. It is there indeed so Ive added it back 😉

404: silent + remove the message from the list / other error code: throw

Good idea. Done. 👍 Not sure how much youll like the solution thoug. I had to do it this way 👇 as thats exactly what we are getting via axios from FetchByBatchesService

export type GmailMessageParsedResponse =
  | gmail_v1.Schema$Message
  | GmailMessageError;

manual testing

I was able to test the 404 by completely deleting the message (also out of the trash), cant hit the limit as im using just a dummy email with 9 messages though. Tried to change token in the middle of request, but also didnt succeed (refresh token i guess?).

Comment on lines +279 to +283
await this.messageChannelRepository.updateSyncStatus(
gmailMessageChannelId,
MessageChannelSyncStatus.FAILED,
workspaceId,
);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now being called for every single caught error, not just 429. I think thats what we actually want, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

@charlesBochet
Copy link
Member

I've looked at your code, that's it!
I'll test it with all edge cases tonight

@rostaklein
Copy link
Contributor Author

Let me know whether theres anything else to be done, or we can continue with the further changes..? :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants