feature: support truncate collection #32994

Draft — PowderLi wants to merge 1 commit into base: master from master-truncate-collection
Conversation

PowderLi (Contributor)

sre-ci-robot added the area/internal-api and size/XXL (denotes a PR that changes 1000+ lines) labels on May 11, 2024
sre-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: PowderLi
To complete the pull request process, please assign yanliang567 after the PR has been reviewed.
You can assign the PR to them by writing /assign @yanliang567 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mergify bot added the dco-passed (DCO check passed) label on May 11, 2024

mergify bot commented May 11, 2024

@PowderLi

Invalid PR Title Format Detected

Your PR submission does not adhere to our required standards. To ensure clarity and consistency, please meet the following criteria:

  1. Title Format: The PR title must begin with one of these prefixes:
  • feat: for introducing a new feature.
  • fix: for bug fixes.
  • enhance: for improvements to existing functionality.
  • test: for adding tests to existing functionality.
  • doc: for modifying documentation.
  • auto: for pull requests created by a bot.
  2. Description Requirement: The PR must include a non-empty description detailing the changes and their impact.

Required Title Structure:

[Type]: [Description of the PR]

Where Type is one of feat, fix, enhance, test or doc.

Example:

enhance: improve search performance significantly 

Please review and update your PR to comply with these guidelines.


mergify bot commented May 11, 2024

@PowderLi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.


mergify bot commented May 11, 2024

@PowderLi ut workflow job failed, comment rerun ut can trigger the job again.

PowderLi force-pushed the master-truncate-collection branch from 32201a0 to 1a50b05 on May 13, 2024 09:52

mergify bot commented May 13, 2024

@PowderLi ut workflow job failed, comment rerun ut can trigger the job again.


mergify bot commented May 13, 2024

@PowderLi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.


codecov bot commented May 13, 2024

Codecov Report

Attention: Patch coverage is 95.78686%, with 34 lines in your changes missing coverage. Please review.

Project coverage is 82.31%. Comparing base (12e8c6c) to head (71676fe).
Report is 52 commits behind head on master.

Additional details and impacted files


@@            Coverage Diff             @@
##           master   #32994      +/-   ##
==========================================
+ Coverage   82.20%   82.31%   +0.10%     
==========================================
  Files        1009     1012       +3     
  Lines      128884   129688     +804     
==========================================
+ Hits       105945   106747     +802     
+ Misses      18952    18930      -22     
- Partials     3987     4011      +24     
Files Coverage Δ
internal/datacoord/index_meta.go 98.32% <100.00%> (+0.50%) ⬆️
internal/datacoord/index_service.go 95.83% <100.00%> (+1.11%) ⬆️
internal/distributed/datacoord/client/client.go 98.00% <100.00%> (+0.04%) ⬆️
internal/distributed/datacoord/service.go 89.03% <100.00%> (+0.19%) ⬆️
internal/distributed/rootcoord/client/client.go 93.81% <100.00%> (+0.12%) ⬆️
internal/distributed/rootcoord/service.go 81.88% <100.00%> (+0.14%) ⬆️
internal/metastore/catalog.go 100.00% <ø> (ø)
internal/proxy/impl.go 86.58% <100.00%> (+0.15%) ⬆️
internal/proxy/task.go 88.05% <100.00%> (+0.29%) ⬆️
internal/rootcoord/alter_alias_task.go 100.00% <100.00%> (ø)
... and 15 more

... and 35 files with indirect coverage changes

PowderLi force-pushed the master-truncate-collection branch from 1a50b05 to 3bc5311 on May 13, 2024 16:18

mergify bot commented May 13, 2024

@PowderLi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.


mergify bot commented May 20, 2024

@PowderLi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

PowderLi force-pushed the master-truncate-collection branch from ecfd633 to 24e277b on May 21, 2024 04:09
mergify bot added the ci-passed label on May 21, 2024
Signed-off-by: PowderLi <min.li@zilliz.com>
PowderLi force-pushed the master-truncate-collection branch from 24e277b to 71676fe on May 21, 2024 13:13
mergify bot added and removed the ci-passed label on May 21, 2024
// CreateIndexesForTemp copies all indexes of the provided collection to the temp collection.
func (s *Server) CreateIndexesForTemp(ctx context.Context, req *indexpb.CollectionWithTempRequest) (*commonpb.Status, error) {
	log := log.Ctx(ctx).With(zap.Int64("collection", req.CollectionID), zap.Int64("tempCollection", req.TempCollectionID))
	log.Info("receive CopyIndexes to temp collection request")
Contributor

We should also use info level in the following log.

err := s.meta.indexMeta.CreateIndex(&model.Index{
	CollectionID: req.CollectionID,
	IndexID:      int64(0),
	IsDeleted:    true,
Contributor

This will lead to the indexes of the collection being removed eventually. I don't think we can do this, since truncating a collection is not guaranteed to succeed.

Contributor Author

This is a guard index: it forbids new index operations on the same collection.
If the truncate fails, the guard index is removed when we drop the temp collection.

Contributor

Make the special IndexID 0 a constant for better readability.

return merr.Status(err), nil
}
log.Debug("created a guard index(id=0) of target collection")
indexes := s.meta.indexMeta.GetIndexesForCollection(req.GetCollectionID(), "")
Contributor

Isn't copying all these indexes enough?

Contributor Author

Which index do you think is missing?

Contributor

Sorry for my phrasing. I mean that copying all these indexes is enough; there is no need to create a guard index if we don't limit AlterIndex/CreateIndex in datacoord.

}

// DropIndexesForTemp drops the indexes created for the temp collection and clears the guard index on the target collection.
func (s *Server) DropIndexesForTemp(ctx context.Context, req *indexpb.CollectionWithTempRequest) (*commonpb.Status, error) {
Contributor

We can reuse datacoord.DropIndex directly.

Contributor Author

We need to clear the guard index in the target collection whether the truncate succeeds or fails.

return err
}
kvMap[key] = string(value)
log.Ctx(ctx).Debug("add new indexes",
Contributor

Use info level.

@@ -58,6 +58,7 @@ type Broker interface {
GcConfirm(ctx context.Context, collectionID, partitionID UniqueID) bool

DropCollectionIndex(ctx context.Context, collID UniqueID, partIDs []UniqueID) error
DropTempCollectionIndexes(ctx context.Context, collID UniqueID, tempCollID UniqueID) error
Contributor

DropCollectionIndex is enough.

return t.assignChannels()
}

func (t *copyCollectionTask) genCreateCollectionMsg(ctx context.Context, ts uint64) *ms.MsgPack {
Contributor

It would be wonderful if we could make this method a util function, since createCollectionTask can reuse it.

"github.com/milvus-io/milvus/pkg/util/typeutil"
)

type copyCollectionTask struct {
Contributor

From my perspective, copyCollectionTask does almost the same things as createCollectionTask except for the collection name; can we combine these two tasks into one?

log.Info("failed to release temp collection", zap.Error(err))
return merr.Status(err), nil
}
_, err = c.garbageCollector.GcCollectionData(ctx, t.collInfo)
Contributor

Wait for this ts to be synced.

Contributor

In fact, we can make this a redo task, which is very useful for recycling resources when any error is encountered, especially for releasing the collection.

Contributor

Just similar to dropping a collection.

}
defer func() {
log.Debug("drop the temp collection async")
c.dropCollectionForTemp(ctx, in, t.collectionID, t.tempCollectionID)
Contributor

Why should we do this even if the truncating operation is done?

Contributor

I think we can provide an abstraction which combines the undoTask and redoTask for the truncating workflow, like below:

type generalTask struct {
  undoTask
  redoTask
}

func (t *generalTask) Execute() error {
  if err := t.undoTask.Execute(); err != nil {
    return err
  }
  go t.redoTask.Execute()
  return nil
}

And truncating a collection can in fact be simplified to:

undo.AddStep(CreateTempCollection, RemoveTempCollection) // 1
undo.AddStep(CreateIndexesForCollection, nullstep) // 2
undo.AddStep(LoadTempCollection, nullstep) // 3
undo.AddStep(ExchangeCollections, nullstep) // 4
redo.AddStep(RemoveOriginalCollection) // 5

Consider the recovery and atomicity:

  • Fail at step 1: no temp collection was created;
  • Fail at step 2: the step executor will execute the undo step of 1, RemoveTempCollection, which will drop the indexes and release the temp collection; rootcoord will also do this when it starts;
  • Fail at step 3 or 4: same as 2;
  • We can return once 4 has succeeded, because 4 is atomic, so no side effects are introduced;
  • After 4 is done, the GC of the original collection is asynchronous.

PowderLi requested a review from longjiquan on May 27, 2024 05:05
PowderLi marked this pull request as draft on May 27, 2024 05:05
sre-ci-robot added the do-not-merge/work-in-progress (don't merge even if CI passed) label on May 27, 2024