Using the Qualcomm AI Inference Suite Directly from a Web Page

This blog post was originally published at Qualcomm’s website. It is reprinted here with the permission of Qualcomm.

Calling the Qualcomm AI Inference Suite directly from a web page using JavaScript makes it easy to build web solutions and to understand how AI inference works within them.

Qualcomm Technologies, in collaboration with Cirrascale, offers a free-to-try AI inference service that makes it easy for developers to connect and run inference workloads from code. In our last blog post, we covered how easy it is to get started with the Python SDK, using a sentiment analysis scenario. In this one, we'll show how to access the REST APIs directly from a web page using JavaScript.

Sample scenario

In this example, we are going to have AI make "creative" content for us: haiku. Haiku are simple to craft and typically follow a 5-7-5 syllable layout. In the Japanese language, this isn't hard to do, as many words follow predictable syllabic patterns. In other languages, we just try to get close.

Haiku started as a form of entertainment where a group of people competed to come up with the best one on a chosen theme.  Matsuo Bashō (1644-1694) was a renowned Japanese poet, often considered the greatest master of haiku.  In typical form, many of these poems were about nature.  But with AI we can supply any topic and get something interesting and reflective.

Steps

  • Create a simple web page for the user interface
  • Set key and endpoint
  • Build your prompt from user input
  • Set up your API request
  • Call the chat function API
  • Clean up the result
  • Display the haiku in the web page

Steps with commentary

Create a simple web page for the user interface

To demonstrate how easy it is, we'll use straight HTML without any third-party JavaScript frameworks. For the user experience of this application, we don't need React, Vue, Angular or anything else. We simply create a vanilla page that asks the user for inspiration, with a display area for the haiku and a slot for the user's API key. We use JavaScript and the built-in onchange event to store variables from the user. When the data is ready, we insert it directly into the <div id="haiku"></div>.
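A minimal sketch of the page skeleton might look like this (the element ids, placeholder text and inline handlers are our own illustrative choices, not from the original page):

<input id="key" type="password" placeholder="Your API key"
       onchange="API_KEY = this.value;">
<input id="inspiration" type="text" placeholder="What inspires you?"
       onchange="getNewHaiku(this.value);">
<div id="haiku"></div>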

Set key and endpoint

First, we store the API endpoint and a placeholder variable for our API key.

let API_ENDPOINT = "https://aisuite.cirrascale.com/apis/v2/completions";
let API_KEY = "";

Build your prompt from user input

When using LLMs, you need to craft a prompt that returns only what you actually want. If you are not precise enough, you may get back far more of an answer than you intended. In this case, we build our prompt from a set of guidelines for the system to follow, so that it ideally returns only the text of a haiku.

let SYSTEM_ROLE = "You are a historical Japanese figure who is an expert at crafting haiku. ";
let USER_ROLE = "Please write a haiku about: ";
let GUARDRAILS = ". Respect the 5-7-5 syllable rule. Respond only with the haiku and no other text.";

[Image: example of an AI-generated haiku]

Later we will concatenate these together with the user’s inspiration to build a prompt that we submit to the inference service.
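For instance, if the user typed "autumn rain" (a hypothetical input), the assembled prompt would be:

let prompt = SYSTEM_ROLE + USER_ROLE + "autumn rain" + GUARDRAILS;
// "You are a historical Japanese figure who is an expert at crafting haiku.
//  Please write a haiku about: autumn rain. Respect the 5-7-5 syllable rule.
//  Respond only with the haiku and no other text."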

Set up your API request

We don't know how long the request will take to return, so we'll make the API call from an asynchronous function to avoid blocking the user experience. If the user didn't supply an API key, we notify them using another convenience function we created.
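That convenience function can be as simple as a one-liner; the post doesn't show its body, so here is a sketch:

function updateDivContent(id, text) {
    document.getElementById(id).innerText = text;
}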

async function getNewHaiku(inspiration) {
    let url = API_ENDPOINT;

    // check for presence of API_KEY
    if (API_KEY === "") {
        updateDivContent('haiku', 'Sorry, no poetry without an API key!');
        return;
    }

Next, we set up the options for our API call, using the built-in JavaScript function fetch to get a response. The API documentation specifies that we need to make this a POST call and put our key in the header. The body of the POST will be JSON containing our prompt, our selected LLM model, a flag indicating whether we want a streaming response, and finally how many tokens to generate.

Haiku are pretty short, so we chose to return only 32 tokens, which makes the response very fast.

let options = {
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + API_KEY
    },
    body: JSON.stringify({
        prompt: SYSTEM_ROLE + USER_ROLE + inspiration + GUARDRAILS,
        model: "Llama-3.1-8B",
        stream: false,
        max_tokens: 32
    }),
};

Call the chat function API

We are now ready to make our API call.  We’ll follow good practice and let the user know if the server is not responding.

// Retrieve an answer

let response = await fetch(url, options);
let responseOK = response && response.ok;
let data = "";
if (responseOK) {
    data = await response.json();
} else {
    // Let the user know and bail out so we don't try to parse a missing reply
    updateDivContent('haiku', 'no server response!');
    return;
}

Clean up the result

During testing, we found that the LLM would sometimes return extra data at the end of the response, and would sometimes insert line returns in the output and sometimes not. To provide a cleaned-up response, some regular expressions and string checking were needed to ensure that only a formatted haiku with proper line breaks came out. There is probably a more elegant way to do it, but this works.

let str = data.choices[0].text;

// Drop anything after a run of hyphens or em dashes
let regex = /-+/;
str = str.split(regex)[0];
regex = /—+/;
str = str.split(regex)[0];

// Split into verses wherever a capital letter begins a new one
regex = /(?=[A-Z])(?![a-z])/;
let verses = str.split(regex);

// Pad so there are always at least four entries to join
while (verses.length < 4) {
    verses.push('');
}
str = verses[0] + verses[1] + verses[2] + verses[3];

Display the haiku in the web page

Finally, we’ve arrived at the last step for this scenario. We call our helper function to update the content of the page with the haiku and let the user enjoy their custom poem!

updateDivContent("haiku", str);

If you are interested in seeing all of the code in a single HTML page, take a look at the GitHub repo.

Other considerations

In this contrived example, we hardcoded the endpoint and the LLM model choice. We could extend the example to pull down all the available LLMs and let the user choose; a hypothetical sketch follows.
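This sketch assumes the service exposes an OpenAI-style model listing; the /apis/v2/models path and the response shape are assumptions on our part, so check the API documentation for the real ones:

async function loadModels() {
    // Assumed endpoint: the completions path suggests an OpenAI-style API,
    // but verify the actual models route in the API documentation
    let response = await fetch("https://aisuite.cirrascale.com/apis/v2/models", {
        headers: { "Authorization": "Bearer " + API_KEY }
    });
    let data = await response.json();
    let select = document.getElementById("model"); // hypothetical <select> element
    for (let model of data.data) {                 // assumed response shape
        let option = document.createElement("option");
        option.value = option.textContent = model.id;
        select.appendChild(option);
    }
}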

To avoid having to hardcode the API key, we allowed the user to input their own. If we were going to create an app around this idea, we’d need to make sure all secrets are hidden away.

A useful pattern for creating apps based on a predefined prompt template is to create a simple microservice that acts as a single-purpose API. The JavaScript code here could run inside Node.js in a container to do just that (a sketch of this follows). Any programming language could be used, though, and in the next blog post we'll explore creating a microservice endpoint that calls the Qualcomm AI Inference Suite behind the scenes.
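A minimal sketch of such a service, assuming Node.js 18+ (so fetch is built in) and an ES module; the /haiku route, port and response shape are illustrative choices on our part, not part of the suite's API:

import http from "node:http";

const API_ENDPOINT = "https://aisuite.cirrascale.com/apis/v2/completions";
const API_KEY = process.env.API_KEY; // the secret stays on the server

const SYSTEM_ROLE = "You are a historical Japanese figure who is an expert at crafting haiku. ";
const USER_ROLE = "Please write a haiku about: ";
const GUARDRAILS = ". Respect the 5-7-5 syllable rule. Respond only with the haiku and no other text.";

http.createServer(async (req, res) => {
    if (req.method !== "POST" || req.url !== "/haiku") {
        res.writeHead(404);
        res.end();
        return;
    }

    // Read the request body and pull out the user's inspiration
    let body = "";
    for await (const chunk of req) { body += chunk; }
    const { inspiration } = JSON.parse(body);

    // Same call as the browser version, but the key never leaves the server
    const response = await fetch(API_ENDPOINT, {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_KEY
        },
        body: JSON.stringify({
            prompt: SYSTEM_ROLE + USER_ROLE + inspiration + GUARDRAILS,
            model: "Llama-3.1-8B",
            stream: false,
            max_tokens: 32
        })
    });
    const data = await response.json();

    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ haiku: data.choices[0].text }));
}).listen(3000);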

Try it out yourself

The process of using inference on a scalable platform like the Qualcomm AI Inference Suite is as straightforward as calling any other simple API. Try calling it from inside your own web project!

Like what you are seeing? Connect with fellow developers, get the latest news and prompt technical support by joining our Developer Discord.

Ray Stephenson
Developer Relations Lead, Cloud, Qualcomm Technologies
