Please write a script in Python that will display the interaction with this LLM in a simple browser.

```python
from transformers import LlamaTokenizer, LlamaForCausalLM
import torch

tokenizer = LlamaTokenizer.from_pretrained("Xianjun/PLLaMa-7b-base")
model = LlamaForCausalLM.from_pretrained("Xianjun/PLLaMa-7b-base").half().to("cuda")

instruction = "How to …"
batch = tokenizer(instruction, return_tensors="pt", add_special_tokens=False).to("cuda")

with torch.no_grad():
    output = model.generate(**batch, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(output[0], skip_special_tokens=True)
```
To display the interaction with the LLM in a simple browser interface, you can use Flask for the backend and a small HTML template for the frontend. Below is a simple script that integrates your code into a web application:
1. Install Flask if you haven’t already:
```bash
pip install flask
```
2. Create the following Python script for the backend (`app.py`):
```python
from flask import Flask, render_template, request
from transformers import LlamaTokenizer, LlamaForCausalLM
import torch

app = Flask(__name__)

# Load model and tokenizer once at startup
tokenizer = LlamaTokenizer.from_pretrained("Xianjun/PLLaMa-7b-base")
model = LlamaForCausalLM.from_pretrained("Xianjun/PLLaMa-7b-base").half().to("cuda")

@app.route("/", methods=["GET", "POST"])
def index():
    response = ""
    if request.method == "POST":
        instruction = request.form.get("instruction")
        if instruction:
            # Tokenize input and generate a response
            batch = tokenizer(instruction, return_tensors="pt", add_special_tokens=False).to("cuda")
            with torch.no_grad():
                output = model.generate(**batch, max_new_tokens=512, temperature=0.7, do_sample=True)
            response = tokenizer.decode(output[0], skip_special_tokens=True)
    return render_template("index.html", response=response)

if __name__ == "__main__":
    app.run(debug=True)
```
3. Create an `index.html` file inside a `templates` folder in the same directory as `app.py`. This file will be used to render the user interface in the browser:
```html
<!DOCTYPE html>
<html>
<head>
    <title>Interact with the LLM</title>
</head>
<body>
    <h1>Interact with the LLM</h1>
    <!-- POST back to the same route; the input name must match
         request.form.get("instruction") in app.py -->
    <form method="POST">
        <input type="text" name="instruction" placeholder="Enter an instruction" required>
        <button type="submit">Submit</button>
    </form>
    <h2>Response:</h2>
    <p>{{ response }}</p>
</body>
</html>
```
### How It Works:
- The backend script (`app.py`) uses Flask to serve a simple web application.
- The frontend (`index.html`) provides an input field for users to submit text.
- When the user submits the text, the backend processes the input with the `LlamaForCausalLM` model, generates a response, and displays it in the browser.
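One caveat worth knowing: `model.generate` returns the prompt tokens followed by the newly generated tokens, so decoding `output[0]` repeats the user's instruction at the start of the response. If you only want the model's reply, you can slice off the prompt first (in the code above that would be `output[0][batch["input_ids"].shape[1]:]`). The slicing idea, sketched here with plain Python lists standing in for token tensors (tensor indexing behaves the same way):

```python
# Toy stand-ins for token IDs: the generated sequence begins with
# a copy of the prompt, exactly as model.generate's output does.
prompt_ids = [101, 2054, 2003]      # tokenized instruction
new_ids = [2023, 2007, 102]         # tokens the model produced
output_ids = prompt_ids + new_ids   # what generate() returns

# Slice off the prompt to keep only the newly generated tokens
reply_ids = output_ids[len(prompt_ids):]

print(reply_ids)  # → [2023, 2007, 102]
```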
### Running the Application:
1. Ensure you have both the `app.py` and `index.html` files correctly set up.
2. Run the script with the following command:
```bash
python app.py
```
3. Visit `http://127.0.0.1:5000/` in your browser to interact with the model.
The browser will display the input field and show the response after you submit an instruction.
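If you prefer the command line to the browser form, the same route also accepts a plain form POST. A sketch with `curl`, assuming the server from `app.py` is running locally on Flask's default port 5000:

```shell
# Submit an instruction as form data; the rendered HTML page
# containing the model's response is printed to stdout
curl -X POST -d "instruction=How do plants fix nitrogen?" http://127.0.0.1:5000/
```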