Enhanced Document Tracking: Multi-Website Support and Continuous Relevancy Insights #17

Open
wants to merge 9 commits into
base: main

@HarounAns HarounAns commented Sep 24, 2023

Problem

In the existing design, fetched documents are visible only immediately after the index has been freshly seeded. This is restrictive: it covers only one website at a time, which makes comprehensive tracking and management difficult, and there is no way to determine which documents are relevant outside of that post-seeding window. The new design supports multiple websites simultaneously and surfaces any document deemed relevant, rather than only the most recently seeded ones.

Solution

Implemented a relevantDocs section that shows the documents fetched across multiple websites. This makes retrieval more transparent: the user can see which documents are being retrieved, regardless of recent crawler activity.

screen-recording-2023-09-24-at-13634-am_CYlbkfQ4.mov

^ Notice how the fetched documents come from different websites!
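
For reviewers who want a concrete picture of "documents from different websites", here is a minimal sketch of how retrieved documents could be grouped by source site before display. It is not taken from this PR's diff: the `RelevantDoc` shape, its field names, and the `groupByWebsite` helper are all hypothetical and only illustrate the idea.

```typescript
// Hypothetical shape of a retrieved document; these field names are
// assumptions for illustration, not the types used in this PR.
interface RelevantDoc {
  url: string;   // page the chunk was crawled from
  text: string;  // chunk content shown to the user
  score: number; // similarity score returned by the index
}

// Group retrieved documents by the hostname of their source URL so the
// UI can show which website each document came from.
function groupByWebsite(docs: RelevantDoc[]): Record<string, RelevantDoc[]> {
  const groups: Record<string, RelevantDoc[]> = {};
  for (const doc of docs) {
    const host = new URL(doc.url).hostname;
    if (!groups[host]) {
      groups[host] = [];
    }
    groups[host].push(doc);
  }
  return groups;
}

// Example usage with made-up documents from two sites.
const exampleDocs: RelevantDoc[] = [
  { url: "https://a.example.com/intro", text: "Intro page", score: 0.91 },
  { url: "https://b.example.org/faq", text: "FAQ page", score: 0.87 },
  { url: "https://a.example.com/setup", text: "Setup page", score: 0.82 },
];
const groupedExample = groupByWebsite(exampleDocs);
console.log(Object.keys(groupedExample));
```

Grouping by hostname is only one possible choice; a flat list with the source URL shown next to each document would serve the same transparency goal.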

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Infrastructure change (CI configs, etc)
  • Non-code change (docs, etc)
  • None of the above: (explain here)

Test Plan

  1. Navigate to the new relevantDocs section.
  2. Ensure that the fetched documents across different websites are being displayed.
  3. Test fetching documents both after seeding the index via the crawler and without seeding it.
  4. Confirm that the displayed documents in the relevantDocs section are consistent with the expected results based on the recent actions with the crawler.

@HarounAns HarounAns marked this pull request as draft September 24, 2023 05:31
@HarounAns HarounAns marked this pull request as ready for review September 24, 2023 05:47
@HarounAns HarounAns changed the title from "added relevant docs" to "Enhanced Document Tracking: Multi-Website Support and Continuous Relevancy Insights" Sep 24, 2023
@HarounAns
Author

@rschwabco please let me know your thoughts

@HarounAns
Author

@rschwabco do you feel like this PR has any value? If not, I can close it.
