feat(NODE-6136): parse cursor responses on demand #4112

Open
wants to merge 22 commits into
base: main
from


nbbeeken
Contributor

@nbbeeken nbbeeken commented May 15, 2024

Description

What is changing?

  • Migrate all cursors to use on-demand mechanisms for parsing documents as the cursors are iterated.
  • Fixed the issue with encryption not supporting MongoDBResponse by changing the internal APIs to operate only on Uint8Arrays.
  • "decorate" functionality moved into CryptoConnection; it supports building the array of keys that were encrypted.
  • A new ExplainCursorResponse class defers refactoring explain handling by simulating a cursor response, similar to how the driver works now.
  • Preserve the elements generated in the first and only call to parseToElements; previously, error detection discarded the element parsing work.
  • Consolidate the definition of "error". A MongoDB command has failed only when ok is set to 0; all other conditions were approximations and led to unnecessary parsing work (remove isError, only check ok === 0).
    • WriteConcernErrors have not been modified, but their handling has been clarified. makeWriteConcernResultObject would delete properties when ok was set to 0, which is never the case for a writeConcernError: the "command succeeded" but the "write concern/requirements failed". The delete logic never ran, and the unit tests asserting that the properties did not exist passed only because the mock data fed in was inaccurate.
    • MongoWriteConcernError now takes an entire response (for simplicity, a plain object returned from BSON deserialize) if the response contains a writeConcernError property.
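
The distinction between a failed command and a failed write concern can be sketched on plain objects. This is only an illustration of the rule stated above: ServerReply, commandFailed, and hasWriteConcernError are hypothetical names, not driver APIs, and the sample replies are made up.

```typescript
// Hypothetical shape of a deserialized server reply (illustration only).
interface ServerReply {
  ok: number;
  writeConcernError?: { code: number; errmsg: string };
  [key: string]: unknown;
}

// A command has failed only when ok is 0.
function commandFailed(reply: ServerReply): boolean {
  return reply.ok === 0;
}

// A write concern error arrives on a reply whose command succeeded (ok: 1).
function hasWriteConcernError(reply: ServerReply): boolean {
  return reply.ok === 1 && reply.writeConcernError != null;
}

// Made-up sample replies.
const failed: ServerReply = { ok: 0, code: 59, errmsg: 'no such command' };
const wcFailed: ServerReply = {
  ok: 1,
  n: 1,
  writeConcernError: { code: 64, errmsg: 'waiting for replication timed out' }
};
```

In this sketch wcFailed is not a command failure, which is why delete logic guarded by ok being 0 would never run for it.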
Is there new documentation needed for these changes?

No.

What is the motivation for this change?

Performance and enhancing internal capabilities for discrete BSON parsing.

Release Highlight

Cursor responses are now parsed lazily 🦥

MongoDB cursors (find, aggregate, etc.) operate on batches of documents equal to batchSize. Each time the driver runs out of documents for the current batch it gets more (getMore) and returns each document one at a time through APIs like cursor.next() or for await (const doc of cursor).

Prior to this change, the Node.js driver decoded the entire BSON response as soon as it was received. Parsing BSON, just like parsing JSON, is a synchronous, blocking operation. This means that, throughout a cursor's lifetime, every invocation of .next() that needs to fetch a new batch is held up parsing batchSize (default 1000) documents before returning to the user.

In an effort to provide more responsiveness, the driver now decodes BSON "on demand". By operating on the layers of data returned by the server, the driver now receives a batch and initially reads only metadata, such as its size and whether there are more documents to iterate after this batch. After that, each document is parsed out of the BSON as the cursor is iterated.
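
The idea can be sketched without the driver. In the following illustration, JSON strings stand in for the raw BSON bytes of each document, and LazyBatch is a hypothetical class, not the driver's CursorResponse. It only shows the shape of the technique: metadata is available up front, and each document is parsed when it is requested.

```typescript
// Illustration only: JSON strings stand in for the raw BSON bytes of each document.
class LazyBatch<T> {
  private index = 0;
  private readonly rawDocuments: string[];
  // Counts how many documents have actually been parsed so far.
  parsedCount = 0;

  constructor(rawDocuments: string[]) {
    this.rawDocuments = rawDocuments;
  }

  // Metadata is available without parsing any document.
  get length(): number {
    return this.rawDocuments.length - this.index;
  }

  // Parse exactly one document per call; return null once the batch is exhausted.
  shift(): T | null {
    const raw = this.rawDocuments[this.index];
    if (raw === undefined) return null;
    this.index += 1;
    this.parsedCount += 1;
    return JSON.parse(raw) as T;
  }
}

const batch = new LazyBatch<{ _id: number }>(['{"_id":1}', '{"_id":2}', '{"_id":3}']);
const first = batch.shift(); // only the first document has been parsed at this point
```

A consumer that stops after a few documents never pays for parsing the rest of the batch.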

A perfect example of where this comes in handy is our beloved mongosh! 💚

test> db.test.find()
[
  { _id: ObjectId('665f7fc5c9d5d52227434c65'), ... },
  ...
]
Type "it" for more

That Type "it" for more message now prints after parsing only the documents displayed, rather than after the entire batch is parsed.

Double check the following

  • Ran npm run check:lint script
  • Self-review completed using the steps outlined here
  • PR title follows the correct format: type(NODE-xxxx)[!]: description
    • Example: feat(NODE-1234)!: rewriting everything in coffeescript
  • Changes are covered by tests
  • New TODOs have a related JIRA ticket

@nbbeeken nbbeeken force-pushed the NODE-6136-cursor-response branch 2 times, most recently from df69639 to e6de40a Compare May 16, 2024 14:53
@nbbeeken nbbeeken force-pushed the NODE-6136-cursor-response branch 2 times, most recently from 31efa9c to a37ec31 Compare June 4, 2024 20:28
@nbbeeken nbbeeken marked this pull request as ready for review June 4, 2024 21:25
@baileympearson baileympearson self-requested a review June 5, 2024 19:54
@baileympearson baileympearson self-assigned this Jun 5, 2024
@baileympearson baileympearson added the Primary Review In Review with primary reviewer, not yet ready for team's eyes label Jun 5, 2024
Comment on lines 126 to 133
const EMPTY_V = Uint8Array.from([
  ...[13, 0, 0, 0], // document size = 13 bytes
  ...[
    ...[4, 118, 0], // array type (4), "v\x00" basic latin "v"
    ...[5, 0, 0, 0, 0] // empty document (5 byte size, null terminator)
  ],
  0 // null terminator
]);
Contributor

Suggested change
- const EMPTY_V = Uint8Array.from([
-   ...[13, 0, 0, 0], // document size = 13 bytes
-   ...[
-     ...[4, 118, 0], // array type (4), "v\x00" basic latin "v"
-     ...[5, 0, 0, 0, 0] // empty document (5 byte size, null terminator)
-   ],
-   0 // null terminator
- ]);
+ const EMPTY_V = serialize({ v: [] });

Contributor Author

updated.
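
For readers checking the hand-written bytes against the suggested serialize({ v: [] }), the layout can be verified without the BSON library. This is a small sketch that only reads the two length prefixes; the variable names are made up for the illustration.

```typescript
// Hand-written BSON for { v: [] }, laid out as in the original constant.
const bytes = Uint8Array.from([
  13, 0, 0, 0, // int32, little-endian: total document size
  4, // element type 0x04 = array
  118, 0, // key "v" as a null-terminated string
  5, 0, 0, 0, 0, // embedded empty array: int32 size 5 plus its null terminator
  0 // null terminator of the outer document
]);

const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
const declaredSize = view.getInt32(0, true); // BSON lengths are little-endian
const embeddedSize = view.getInt32(7, true); // the array starts after the type byte and key
```

Both prefixes agree with the byte counts (13 for the outer document, 5 for the empty array), which is the property the one-line serialize call guarantees by construction.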

ns,
encrypted,
options,
(responseType ?? MongoDBResponse) as any
Contributor

How come casting is necessary here? I'd expect responseType ?? MongoDBResponse to produce a type of MongoDBResponse, which should satisfy super.command()'s types without casting, and should produce the correct return type without as unknown as MongoDBResponse.

Contributor Author

Technically, I'm committing type crimes. I added a comment clarifying this, but it is not right that we declare command of T and then pass in MongoDBResponse when responseType is nullish, because that may not match T. I can move the type crime into the generic or keep it here on the argument, but it will exist until we can pass in only a responseType without defaulting to a different subtype of the constraint.

'typeof MongoDBResponse' is assignable to the constraint of type 'T', but 'T' could be instantiated with a different subtype of constraint 'MongoDBResponseConstructor'
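
The compiler complaint can be reproduced in isolation. The sketch below uses hypothetical classes (BaseResponse, CursorLikeResponse) and a hypothetical command function, not the driver's real signatures. It shows why defaulting a generic constructor argument needs a cast, and what the cast allows.

```typescript
class BaseResponse {
  readonly bytes: Uint8Array;
  constructor(bytes: Uint8Array) {
    this.bytes = bytes;
  }
}

class CursorLikeResponse extends BaseResponse {
  get batchLength(): number {
    return this.bytes.byteLength;
  }
}

type ResponseConstructor = new (bytes: Uint8Array) => BaseResponse;

function command<T extends ResponseConstructor>(
  bytes: Uint8Array,
  responseType?: T
): InstanceType<T> {
  // `responseType ?? BaseResponse` has type `T | typeof BaseResponse`, which is not
  // assignable to `T`: a caller may pick a narrower T and still omit the argument.
  const ctor = (responseType ?? BaseResponse) as any;
  return new ctor(bytes);
}

// Sound: the argument pins T, so the instance matches the declared type.
const typed = command(new Uint8Array(4), CursorLikeResponse);

// Unsound but accepted: T is chosen explicitly, yet the default wins at runtime.
const mistyped = command<typeof CursorLikeResponse>(new Uint8Array(4));
```

Here mistyped is declared as a CursorLikeResponse but is a plain BaseResponse at runtime, which is the mismatch the error message warns about.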

Comment on lines +328 to 332
override shift(options?: BSONSerializeOptions | undefined) {
  if (this._length === 0) return null;
  this._length -= 1;
  return this.toObject(options);
}
Contributor

Suggested change
- override shift(options?: BSONSerializeOptions | undefined) {
-   if (this._length === 0) return null;
-   this._length -= 1;
-   return this.toObject(options);
- }
+ override shift(options?: BSONSerializeOptions | undefined) {
+   return this.toObject(options);
+ }

This is basically just duck-typing a cursor response and returning the "first document" as the explain result. Any reason we can't just always return the deserialized object?

Contributor Author

Not a specific reason beyond defensive programming. I think explain is ripe for a bit of redesign in terms of not pretending it is cursor-like, but while it is doing the song and dance of being like a cursor, I think it should not accidentally return a truthy object forever from a method that is expected to eventually "empty" out an array.
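
The defensive behaviour described here can be sketched with a hypothetical stand-in class (SingleDocumentResponse is not a driver type). The point is that a loop draining the "batch" terminates, because shift() hands out the document once and then reports empty.

```typescript
// Hypothetical stand-in for a response that pretends to be a one-document batch.
class SingleDocumentResponse<T> {
  private remaining = 1;
  private readonly document: T;

  constructor(document: T) {
    this.document = document;
  }

  get length(): number {
    return this.remaining;
  }

  // Hand out the document once, then report "empty" like a drained array.
  shift(): T | null {
    if (this.remaining === 0) return null;
    this.remaining -= 1;
    return this.document;
  }
}

const explain = new SingleDocumentResponse({ queryPlanner: {} });
const drained: unknown[] = [];
// This loop would never end if shift() always returned the document.
for (let doc = explain.shift(); doc !== null; doc = explain.shift()) {
  drained.push(doc);
}
```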

const response = await super.getMore(batchSize);

this.maxWireVersion = maxWireVersion(this.server);
this._processBatch(response as ChangeStreamAggregateRawResult<TChange>);
// as ChangeStreamAggregateRawResult<TChange>
Contributor

Suggested change
- // as ChangeStreamAggregateRawResult<TChange>

Contributor Author

✨🪄

@@ -45,7 +44,7 @@ export interface ExecutionResult {
/** The session used for this operation, may be implicitly created */
session?: ClientSession;
/** The raw server response for the operation */
- response: Document | CursorResponse;
+ response: CursorResponse;
Contributor

(non-blocking) move to abstract_cursor.ts maybe?

Contributor Author

I agree, and rename and better docs while we're at it

@@ -13,7 +13,7 @@ export interface CountDocumentsOptions extends AggregateOptions {
}

/** @internal */
- export class CountDocumentsOperation extends AggregateOperation<number> {
+ export class CountDocumentsOperation extends AggregateOperation {
Contributor

Suggested change
- export class CountDocumentsOperation extends AggregateOperation {
+ export class CountDocumentsOperation extends AggregateOperation<number> {

Contributor Author

AggregateOperation is not generic, and even if it were, a subclass cannot return a type incompatible with its parent's. (For example, given Parent -> A -> B, B cannot have a method incompatible with Parent; otherwise, code that expects a Parent will break when given a B.) This and the command below are TODO_NODE_3286 related. The operations layer's TypeScript does not work well, here in particular, because the expectation (and reality) is that every command returns an object. I think returning a "number" should be handled up in the coll.countDocuments API. Taking it even further, I don't see the value in this subclass. Can we just call this.aggregate from within collection.countDocuments?

Stuck the appropriate type annotation on here.
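
The alternative floated above, keeping the operation's return type uniform and producing the number at the API layer, might look roughly like this. It is a sketch under assumptions: AggregateLikeOperation and this countDocuments helper are hypothetical, the n field is assumed for the illustration, and nothing here is the driver's actual code.

```typescript
type Doc = Record<string, unknown>;

// In this sketch every operation resolves to documents, never to a bare number,
// so no subclass has to override the method with an incompatible return type.
class AggregateLikeOperation {
  protected readonly results: Doc[];

  constructor(results: Doc[]) {
    this.results = results;
  }

  async execute(): Promise<Doc[]> {
    return this.results;
  }
}

// The API-level helper converts the first result document into a number.
async function countDocuments(operation: AggregateLikeOperation): Promise<number> {
  const [first] = await operation.execute();
  const n = first?.n;
  // Assumption for this sketch: a missing result is reported as a count of 0.
  return typeof n === 'number' ? n : 0;
}
```

With this shape, code that expects the parent operation keeps working, and only the public method knows that the answer is a number.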

error: new MongoWriteConcernError({}, { errorLabels: ['RetryableWriteError'] }),
maxWireVersion: ABOVE_4_4
},
// {
Contributor

unintentional?

Contributor Author

Yes, fixed! One test is removed because it never happens (no code nor a label). For the others, I just reshaped the input to match the command result.

expect(() => new CursorResponse(BSON.serialize({ ok: 1 }))).to.throw(BSONError);
// @ts-expect-error: testing private getter
expect(() => new CursorResponse(BSON.serialize({ ok: 1 })).cursor).to.throw(
MongoUnexpectedServerResponseError
Contributor

All the getters in cursor response throw MongoUnexpectedServerResponseError. Should we be matching on the error message to assert that the error thrown is thrown because the field we're testing is missing?

Contributor Author

Sounds good, added

@@ -48,27 +48,5 @@ describe('Sessions - unit/sessions', function () {
return client.close();
});
});

it('does not mutate command options', function () {
Contributor

How come we're getting rid of this test? It doesn't look like this behavior has changed.

Contributor Author

https://github.com/mongodb/node-mongodb-native/blob/NODE-6136-cursor-response/test/integration/collection-management/collection.test.ts#L461-L466

Moved here and added those other small countDocument checks when I was fixing that operation up a bit.

@@ -88,6 +88,9 @@ describe('CRUD API explain option', function () {
context(`When explain is ${explainValue}, operation ${name}`, function () {
it(`sets command verbosity to ${explainValue} and includes ${explainValueToExpectation(explainValue)} in the return response`, async function () {
const response = await op.op(explainValue).catch(error => error);
if (response instanceof Error && !(response instanceof MongoServerError)) {
Contributor

Why are we even catching the error actually? Looks like we don't use it at all (we have no assertions that it would be an error).

Contributor Author

We do:

default:
  // for invalid values of explain
  expect(response).to.be.instanceOf(MongoServerError);
  break;

I added this for better debugging: when that assertion in the default case fails ("not instanceof"), it doesn't print what the error is.

Labels
Primary Review In Review with primary reviewer, not yet ready for team's eyes
Projects
None yet
2 participants