Sharepoint server incremental sync #2335

sjors101 · 2024-04-03T11:59:30Z

Closes #2334

This allows users to run incremental syncs with the Sharepoint server connector. It uses the default timestamp cursor from the framework. Tested against Sharepoint 2019 (16.0.0.10337: 1). The test could use some improvement.

What is included in the incremental sync:

New objects (sites / libraries / lists / list_items / drive_items) since last sync
Modified objects (sites / libraries / lists / list_items / drive_items) since last sync

What is not included in the incremental sync:

Deleted objects (sites / libraries / lists / list_items / drive_items)

Checklists

Pre-Review Checklist

this PR does NOT contain credentials of any kind, such as API keys or username/passwords (double check config.yml.example)
this PR has a meaningful title
this PR links to all relevant github issues that it fixes or partially addresses
if there is no GH issue, please create it. Each PR should have a link to an issue
this PR has a thorough description
Covered the changes with automated tests
Tested the changes locally
Added a label for each target release version (example: v7.13.2, v7.14.0, v8.0.0)

artem-shelkovnikov

Thank you for your contribution!

I skimmed through the code and am now wondering: does the incremental sync here optimise calls to Sharepoint Server, or does it filter out the data entirely in the connector?

sjors101 · 2024-04-03T12:19:21Z

> Thank you for your contribution!
>
> I skimmed through the code and am now wondering: does the incremental sync here optimise calls to Sharepoint Server, or does it filter out the data entirely in the connector?

It filters out the data in the connector. This still requires a lot of calls to the Sharepoint API, since we need to retrieve the modified timestamp. Because 'all' Sharepoint objects are in a list, we can filter on the metadata before downloading the actual drive_items / attachments, which is a big win.
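To illustrate the idea of filtering on metadata before downloading, here is a minimal sketch. It is not the code from this PR: the `Modified` field name, the `changed_since` helper, and the sample listing are all hypothetical and only show how a timestamp cursor can decide which items are worth fetching.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as '2024-04-03T11:59:30Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def changed_since(items, cursor):
    """Yield only the metadata entries modified after the cursor timestamp.

    `items` is a hypothetical list of metadata dicts; nothing is downloaded here.
    """
    cursor_dt = parse_ts(cursor)
    for item in items:
        if parse_ts(item["Modified"]) > cursor_dt:
            yield item


# Hypothetical metadata listing, as it might come back from a list query.
listing = [
    {"id": "1", "Modified": "2024-04-01T08:00:00Z"},
    {"id": "2", "Modified": "2024-04-03T12:00:00Z"},
    {"id": "3", "Modified": "2024-04-03T09:30:00Z"},
]
last_sync = "2024-04-02T00:00:00Z"  # cursor stored after the previous sync

# Only these ids would need their content / attachments downloaded.
to_download = [item["id"] for item in changed_since(listing, last_sync)]
print(to_download)  # ['2', '3']
```

The metadata listing is still fetched in full, which matches the point above: the saving comes from skipping downloads, not from skipping the listing calls.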

artem-shelkovnikov · 2024-04-03T12:29:32Z

Theoretically, it should already work like this. Incremental syncs are already enabled for the Sharepoint Server connector:

https://github.com/elastic/connectors/blob/main/connectors/sources/sharepoint_server.py#L437-L442:

Since get_docs_incrementally is not specified in the connector, it will use the get_docs function and will skip documents whose _timestamp field is not later than the _timestamp of the documents already saved in the index.
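As a rough illustration of the skip rule described above (a sketch only, not the framework's actual code), the comparison could look like this. The `docs_to_index` helper and the `indexed_timestamps` mapping are hypothetical, and the sketch assumes the comparison is made per document id.

```python
def docs_to_index(fetched_docs, indexed_timestamps):
    """Yield fetched documents that are new or newer than the indexed copy.

    `indexed_timestamps` (hypothetical) maps document id -> _timestamp already
    stored in the index. Timestamps are assumed to be UTC ISO 8601 strings in
    one fixed format, so plain string comparison orders them chronologically.
    """
    for doc in fetched_docs:
        stored = indexed_timestamps.get(doc["_id"])
        if stored is not None and doc["_timestamp"] <= stored:
            continue  # not later than the indexed copy: skip it
        yield doc


indexed = {"a": "2024-04-01T10:00:00Z", "b": "2024-04-01T10:00:00Z"}
fetched = [
    {"_id": "a", "_timestamp": "2024-04-01T10:00:00Z"},  # unchanged
    {"_id": "b", "_timestamp": "2024-04-03T09:00:00Z"},  # modified
    {"_id": "c", "_timestamp": "2024-04-03T09:30:00Z"},  # new
]
result_ids = [doc["_id"] for doc in docs_to_index(fetched, indexed)]
print(result_ids)  # ['b', 'c']
```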

Did you by chance test whether your implementation is faster, or delivers different results, than our default incremental sync for this source?

sjors101 added 3 commits April 3, 2024 11:42

incremental sync

8b617eb

incremental sync

9e2da81

incremental sync

763f531

sjors101 requested a review from a team as a code owner April 3, 2024 11:59

github-actions bot added auto-backport v8.14.0.0 labels Apr 3, 2024

artem-shelkovnikov reviewed Apr 3, 2024

artem-shelkovnikov added sharepoint community-driven labels Apr 4, 2024
