How to instruct the model to return proper key-value pairs as JSON, without any other text #154

Open
Dineshkumar-Anandan-ZS0367 opened this issue Apr 26, 2024 · 6 comments

Comments

@Dineshkumar-Anandan-ZS0367

I need to get JSON results from a paragraph that contains key-value pairs, but the Llama 3 Instruct model returns the JSON along with some unwanted text. How do I get a clean answer from the Llama 3 model?

Or is there any other option in code, or a parameter, that I can use to get that result?

@aqib-mirza

If you specify the "format" option and set it to "json", you will get your desired results.
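For reference, a minimal sketch of that idea, under the assumption that the "format" option meant here is the one exposed by the Ollama REST API (where /api/generate accepts "format": "json" to constrain the reply to valid JSON); it is not an argument of the transformers pipeline used later in this thread:

import requests

# Assumption: Llama 3 is being served locally through Ollama on the default port.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Extract the key-value pairs from this paragraph as JSON: ...",
        "format": "json",   # ask Ollama to constrain the output to valid JSON
        "stream": False,
    },
    timeout=120,
)
print(response.json()["response"])  # the model's reply, a JSON-formatted string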

@Dineshkumar-Anandan-ZS0367
Author

For the Llama 3 8B Instruct model, how do I use this format parameter? Can you share an example or the related prompt documentation?

@aqib-mirza

Here is some example code:

import torch
import transformers

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.float16},
    device="cuda",
    token="HF-Token",  # your Hugging Face access token
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak and returns every answer in JSON format."},
    {"role": "user", "content": "Who are you?"},
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    format="JSON",
)

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])

@Dineshkumar-Anandan-ZS0367
Author

Thanks a ton sir! I will check this.

@Dineshkumar-Anandan-ZS0367
Author

With the same prompt and the same OCR text from an image, the LLM gives different results on each request. How can I keep the results consistent?

Is there any option for this? I understand that this is an LLM.

Can you suggest some ideas for a prompt to extract key-value pairs from a paragraph?
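One way to make runs repeatable (not stated in this thread, but a standard transformers option) is to fix the random seed or disable sampling entirely; a minimal sketch, reusing the pipeline, prompt, and terminators from the example above:

import transformers

# Option 1: keep sampling but fix the seed so repeated runs draw the same tokens.
transformers.set_seed(42)

# Option 2: turn off sampling; greedy decoding is deterministic for a given
# prompt (up to minor GPU nondeterminism), at some cost in output diversity.
outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=False,
)
print(outputs[0]["generated_text"][len(prompt):])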

@Dineshkumar-Anandan-ZS0367
Author

I am getting the same result as before, in spite of using:

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    format="JSON",
)
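Since the chat template only formats the prompt and does not enforce the output format, a common fallback (not from this thread) is to pull the JSON object out of the generated text and validate it with json.loads; a minimal sketch, where raw_output stands in for the pipeline's generated text:

import json
import re

def extract_json(raw_output: str):
    """Pull the first {...} block out of the model output and parse it.

    Heuristic fallback for when the model wraps its JSON in extra prose;
    assumes the output contains one top-level JSON object.
    """
    match = re.search(r"\{.*\}", raw_output, re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in model output")
    return json.loads(match.group(0))

# Example: strip the pirate-speak preamble around the JSON payload.
raw_output = 'Arr! Here be yer answer: {"name": "Blackbeard", "role": "pirate chatbot"}'
print(extract_json(raw_output))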
