Feat - DRAFT - Support plain text #2827

Open
wants to merge 4 commits into
base: dev

RichardoC

@RichardoC RichardoC commented Apr 25, 2024

Describe Your Changes

  • Attempts to add support for plaintext files for RAG

Known issue: this doesn't actually work yet. Running make dev and attempting to upload a plain text file fails with the following:

jan:dev: Warning: Indexing all PDF objects
jan:dev: Error
jan:dev:     at InvalidPDFExceptionClosure (/Users/REDACTED/jan/extensions/@janhq/assistant-extension/dist/node/pdf-b2cddb6e.js:45648:36)
jan:dev:     at Object.<anonymous> (/Users/REDACTED/jan/extensions/@janhq/assistant-extension/dist/node/pdf-b2cddb6e.js:45651:3)
jan:dev:     at __w_pdfjs_require__ (/Users/REDACTED/jan/extensions/@janhq/assistant-extension/dist/node/pdf-b2cddb6e.js:45243:31)
jan:dev:     at Object.<anonymous> (/Users/REDACTED/jan/extensions/@janhq/assistant-extension/dist/node/pdf-b2cddb6e.js:52978:24)
jan:dev:     at __w_pdfjs_require__ (/Users/REDACTED/jan/extensions/@janhq/assistant-extension/dist/node/pdf-b2cddb6e.js:45243:31)
jan:dev:     at /Users/REDACTED/jan/extensions/@janhq/assistant-extension/dist/node/pdf-b2cddb6e.js:45286:19
jan:dev:     at /Users/REDACTED/jan/extensions/@janhq/assistant-extension/dist/node/pdf-b2cddb6e.js:45289:11
jan:dev:     at webpackUniversalModuleDefinition (/Users/REDACTED/jan/extensions/@janhq/assistant-extension/dist/node/pdf-b2cddb6e.js:45222:20)
jan:dev:     at /Users/REDACTED/jan/extensions/@janhq/assistant-extension/dist/node/pdf-b2cddb6e.js:45223:4
jan:dev:     at Object.<anonymous> (/Users/REDACTED/jan/extensions/@janhq/assistant-extension/dist/node/pdf-b2cddb6e.js:63434:3)
jan:dev:     at Module._compile (node:internal/modules/cjs/loader:1271:14)
jan:dev:     at Module._extensions..js (node:internal/modules/cjs/loader:1326:10)
jan:dev:     at Module.load (node:internal/modules/cjs/loader:1126:32)
jan:dev:     at Module._load (node:internal/modules/cjs/loader:967:12)
jan:dev:     at l._load (node:electron/js2c/asar_bundle:2:13642)
jan:dev:     at Module.require (node:internal/modules/cjs/loader:1150:19) {
jan:dev:   message: 'Invalid PDF structure'
jan:dev: }

Fixes Issues

Self Checklist

  • Added relevant comments, esp in complex areas
  • Updated docs (for bug fixes / features)
  • Created issues for follow-up changes or refactoring needed

@RichardoC
Author

I've been using 28e4405 as the basis for this, but I can't actually get it to work, so I figured I'd raise this work-in-progress PR to get some help.

@RichardoC RichardoC changed the title from "# 1995 - DRAFT - Support plain text" to "Feat - DRAFT - Support plain text" Apr 25, 2024
@Van-QA Van-QA requested a review from a team April 26, 2024 02:57
@RichardoC
Author

RichardoC commented Apr 26, 2024

Also, I'm not sure this is the right approach for adding this functionality. It might be better to support a generic "upload file" action, then decide server-side whether the file is a PDF, a plain text file, or something else, and choose the right loader in extensions/assistant-extension/src/node/retrieval.ts.
It's probably also worth including a "filetype" variable with the RAG context so the model knows what type of file it was.
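
A minimal sketch of that server-side idea, for discussion only. It assumes the uploaded file's bytes and name are available on the Node side, and every name in it (FileKind, detectFileKind, loadForRetrieval, the extension list) is hypothetical rather than taken from Jan's code. It checks for the "%PDF-" header that PDF files normally start with before anything is handed to a PDF parser, which is roughly the check that would avoid the 'Invalid PDF structure' error when a text file is uploaded, and it attaches a filetype field to the result:

```typescript
// Hypothetical names throughout; none of these are Jan's actual API.
type FileKind = "pdf" | "text" | "unknown";

interface LoadedDocument {
  text: string;
  metadata: { source: string; filetype: FileKind };
}

// ASCII bytes for "%PDF-", the header PDF files normally begin with.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Extensions treated as plain text in this sketch (an assumption).
const TEXT_EXTENSIONS = new Set([".txt", ".md", ".csv"]);

function startsWithPdfMagic(bytes: Uint8Array): boolean {
  return (
    bytes.length >= PDF_MAGIC.length &&
    PDF_MAGIC.every((b, i) => bytes[i] === b)
  );
}

function extensionOf(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  return dot > 0 ? fileName.slice(dot).toLowerCase() : "";
}

// Decide the kind from content first, then fall back to the extension.
function detectFileKind(fileName: string, bytes: Uint8Array): FileKind {
  if (startsWithPdfMagic(bytes)) return "pdf";
  if (TEXT_EXTENSIONS.has(extensionOf(fileName))) return "text";
  return "unknown";
}

// Pick a loader based on the detected kind and record the filetype.
function loadForRetrieval(fileName: string, bytes: Uint8Array): LoadedDocument {
  const kind = detectFileKind(fileName, bytes);
  switch (kind) {
    case "text":
      return {
        text: new TextDecoder("utf-8").decode(bytes),
        metadata: { source: fileName, filetype: kind },
      };
    case "pdf":
      // A real implementation would delegate to the existing PDF loader here.
      throw new Error("PDF loading is out of scope for this sketch");
    default:
      throw new Error(`Unsupported file type: ${fileName}`);
  }
}
```

With this shape, a text file that was renamed to .pdf is reported as "unknown" instead of reaching the PDF parser, and the filetype in the metadata could be passed along with the retrieved chunks.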

Development

Successfully merging this pull request may close these issues.

feat: users can add epub and txt files for RAG retrieval functions in Jan