I’m using the google/flan-t5-large model through the Hugging Face Inference API and having trouble controlling the length of its outputs. It consistently returns long, conversational replies instead of brief, direct responses.
Here’s how I’ve implemented it:
// main.js
const AUTH_TOKEN = "your_token_here";

async function callModel() {
  // POST the prompt to the hosted Inference API endpoint for flan-t5-large
  const result = await fetch("https://api-inference.huggingface.co/models/google/flan-t5-large", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${AUTH_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      inputs: "How are you feeling today?",
    }),
  });
  // The API responds with an array of generations: [{ generated_text: "..." }]
  const output = await result.json();
  console.log(output[0].generated_text.trim());
}

callModel();
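While debugging I also added a status check (sketch below) to rule out the possibility that the long output was actually an error payload from the API rather than a real generation:

// Debugging sketch: check the HTTP status before reading generated_text,
// since the Inference API can return an error object (e.g. while the
// model is loading) instead of an array of generations.
const MODEL_URL = "https://api-inference.huggingface.co/models/google/flan-t5-large";

async function callModelChecked() {
  const result = await fetch(MODEL_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${AUTH_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ inputs: "How are you feeling today?" }),
  });
  if (!result.ok) {
    // e.g. { "error": "Model ... is currently loading" }
    console.error("API error", result.status, await result.text());
    return;
  }
  const output = await result.json();
  console.log(output[0].generated_text.trim());
}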
And here’s my HTML setup:
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>Test Page</title>
  </head>
  <body>
    <script src="main.js"></script>
  </body>
</html>
I’ve also tried adjusting various parameters:
parameters: {
  temperature: 0.5,
  top_p: 0.8,
  return_full_text: false,
  max_length: 50
}
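In case the placement matters, here is the full body I’m sending, with the parameters nested alongside inputs. My understanding (an assumption on my part) is that do_sample must be true for temperature and top_p to take effect, and that max_new_tokens is the more reliable way to cap the reply length:

const body = JSON.stringify({
  inputs: "How are you feeling today?",
  parameters: {
    temperature: 0.5,
    top_p: 0.8,
    do_sample: true,      // assumption: required for temperature/top_p to apply
    max_new_tokens: 50,   // assumption: caps generated tokens, unlike max_length
  },
  options: {
    wait_for_model: true, // queue the request instead of failing while the model loads
  },
});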
Regardless of the parameter adjustments, and even after switching models, I’m still getting long, conversation-like outputs rather than succinct responses. I’m trying to build a chat interface similar to existing AI assistants, so I need the replies to be focused and relevant. Has anyone else faced similar challenges with controlling output length?