
How to show the output results on a web service? Or how can I get the inference results for other applications? #126

xujiangyu opened this issue Jan 22, 2024 · 3 comments
Labels: question (Further information is requested)

Comments


xujiangyu commented Jan 22, 2024

Prerequisites

Before submitting your question, please ensure the following:

  • I am running the latest version of PowerInfer. Development is rapid, and as of now, there are no tagged versions.
  • I have carefully read and followed the instructions in the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).

Question Details

Please provide a clear and concise description of your question. If applicable, include steps to reproduce the issue or behaviors you've observed.

Additional Context

Please provide any additional information that may be relevant to your question, such as specific system configurations, environment details, or any other context that could be helpful in addressing your inquiry.

xujiangyu added the question (Further information is requested) label on Jan 22, 2024
hodlen (Collaborator) commented Jan 22, 2024

Hi @xujiangyu ! If you are referring to the examples/server application, you can access it by entering the server address (e.g., 127.0.0.1:8080) in your browser. This allows you to interact with the model via a simple UI and see the outputs. For more details, please refer to the server documentation. Additionally, all inference outputs from the server are also printed to stdout.
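
If you would rather query the server programmatically than through the browser UI, here is a minimal sketch, assuming the server keeps the upstream llama.cpp-style /completion endpoint and listens on 127.0.0.1:8080 (the prompt text is just an illustration; check the server README for the exact request fields):

```python
import requests

# Minimal sketch: POST a prompt to the running examples/server instance
# and print the generated text. Assumes the llama.cpp-style /completion
# endpoint; field names may differ depending on your build.
resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={
        "prompt": "Building a website can be done in 10 simple steps:",
        "n_predict": 128,  # number of tokens to generate
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json().get("content", ""))
```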

Most of the other example applications print their inference results to the command line. You can find usage instructions in the examples/[application] directory, where each application's README and source code are available.

xujiangyu (Author) replied:

Thank you for your reply. I'm wondering how to add background knowledge through the parameters, such as for a RAG flow. I checked the parameters of the main function and didn't see such a parameter.

hodlen (Collaborator) commented Jan 24, 2024

Adding background knowledge is an application-layer concern and amounts to injecting the relevant information into the prompt. This project focuses on LLM inference and doesn't provide built-in support for that.

I suggest using a wrapper such as the llama-cpp-python library (you can use our forked version here) or the server endpoint. You can then use any mainstream orchestration framework, such as LangChain, to build the RAG workflow.
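
As a rough illustration, here is a minimal RAG-style sketch with llama-cpp-python; the model path, retrieved snippet, and prompt template below are placeholders, and the same idea applies if you call the server endpoint or let a framework like LangChain handle the retrieval:

```python
from llama_cpp import Llama

# Load the model through the llama-cpp-python wrapper (path is a placeholder).
llm = Llama(model_path="/path/to/powerinfer-model.gguf", n_ctx=2048)

# "Background knowledge" is simply retrieved text injected into the prompt.
# In a real RAG flow, an orchestration framework would fetch these snippets
# from a vector store; here they are hard-coded for illustration.
retrieved_context = (
    "PowerInfer is a fast LLM inference engine that exploits activation sparsity."
)
question = "What makes PowerInfer fast?"

prompt = (
    "Use the following context to answer the question.\n"
    f"Context: {retrieved_context}\n"
    f"Question: {question}\n"
    "Answer:"
)

output = llm(prompt, max_tokens=128, stop=["\n"])
print(output["choices"][0]["text"].strip())
```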
