Let’s try to work with starchat beta
Sure, we can work with StarChat beta. Please provide me with more information about what you have in mind, and I’ll do my best to assist you.
We need a streamlit app that uses these parameters: Here’s how you can run the model using the `pipeline()` function from 🤗 Transformers:
```python
import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceH4/starchat-beta", torch_dtype=torch.bfloat16, device_map="auto")

# We use a variant of ChatML to format each message
prompt_template = "<|system|>\n<|end|>\n<|user|>\n{query}<|end|>\n<|assistant|>"
prompt = prompt_template.format(query="How do I sort a list in Python?")

# We use a special token with ID 49155 to denote ends of a turn
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.2, top_k=50, top_p=0.95, eos_token_id=49155)
# You can sort a list in Python by using the sort() method. Here's an example:\n\n```\nnumbers = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]\nnumbers.sort()\nprint(numbers)\n```\n\nThis will sort the list in place and print the sorted list.
```
To create a Streamlit app that uses the Starchat model with the given parameters, you can follow these steps:
1. Install the necessary libraries by running the following command:
```
pip install streamlit torch transformers
```
2. Create a new Python file, for example, `starchat_app.py`, and import the required libraries:
```python
import torch
import streamlit as st
from transformers import pipeline
```
3. Define the Starchat pipeline using the `pipeline()` function:
```python
pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/starchat-beta",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
```
4. Create a Streamlit app using the `st.sidebar` and `st` functions:
```python
def main():
    st.sidebar.title("Starchat App")
    query = st.sidebar.text_area("Enter your query", value="", height=100)
    if st.sidebar.button("Generate Response"):
        generate_response(query)

def generate_response(query):
    prompt_template = "<|system|>\n<|end|>\n<|user|>\n{query}<|end|>\n<|assistant|>"
    prompt = prompt_template.format(query=query)
    outputs = pipe(
        prompt,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.2,
        top_k=50,
        top_p=0.95,
        eos_token_id=49155
    )
    response = outputs[0]["generated_text"].strip()
    st.text_area("Response", value=response, height=400)

if __name__ == "__main__":
    main()
```
5. Run the Streamlit app by executing the following command:
```
streamlit run starchat_app.py
```
6. A web page will open in your browser with the Starchat app. You can enter your query in the text area and click the “Generate Response” button to get a response generated by the Starchat model.
Note: Make sure you have a stable internet connection to download the Starchat model if it’s not already downloaded.
That’s it! You should now have a working Streamlit app that utilizes the Starchat model with the specified parameters.
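A note on the prompt format: the template above hard-codes a single user turn. If you later want multi-turn conversations, a small helper can assemble the same ChatML-style prompt from a list of messages. This is a sketch based on the special tokens in the starchat-beta model card; `build_prompt` is a hypothetical helper, not part of the model's API:

```python
def build_prompt(messages, system=""):
    """Assemble a StarChat-style prompt from (role, content) pairs.

    Roles are "user" or "assistant"; each turn is closed with the
    <|end|> token, and the prompt ends with an open assistant turn.
    """
    parts = ["<|system|>\n" + system + "<|end|>\n"]
    for role, content in messages:
        parts.append("<|" + role + "|>\n" + content + "<|end|>\n")
    parts.append("<|assistant|>")
    return "".join(parts)

# A single user turn reproduces the template used in the script above:
prompt = build_prompt([("user", "How do I sort a list in Python?")])
```

Passing previous (user, assistant) pairs in `messages` gives the model conversational context without changing anything else in the app.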
does this information change anything in the script?:

```json
{
  "_name_or_path": "data/starcoderplus-ift-v4.1",
  "activation_function": "gelu",
  "architectures": [
    "GPTBigCodeForCausalLM"
  ],
  "attention_softmax_in_fp32": true,
  "attn_pdrop": 0.1,
  "bos_token_id": 0,
  "embd_pdrop": 0.1,
  "eos_token_id": 0,
  "inference_runner": 0,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "max_batch_size": null,
  "max_sequence_length": null,
  "model_type": "gpt_bigcode",
  "multi_query": true,
  "n_embd": 6144,
  "n_head": 48,
  "n_inner": 24576,
  "n_layer": 40,
  "n_positions": 8192,
  "pad_key_length": true,
  "pre_allocate_kv_cache": false,
  "resid_pdrop": 0.1,
  "scale_attention_softmax_in_fp32": true,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.28.1",
  "use_cache": true,
  "validate_runner_input": true,
  "vocab_size": 49156
}
```
Yes, the information you provided about the model configuration will change the script. Here’s an updated version of the script considering the new model configuration:
```python
import torch
import streamlit as st
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="data/starcoderplus-ift-v4.1",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

def main():
    st.sidebar.title("Starchat App")
    query = st.sidebar.text_area("Enter your query", value="", height=100)
    if st.sidebar.button("Generate Response"):
        generate_response(query)

def generate_response(query):
    prompt_template = "<|system|>\n<|end|>\n<|user|>\n{query}<|end|>\n<|assistant|>"
    prompt = prompt_template.format(query=query)
    outputs = pipe(
        prompt,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.2,
        top_k=50,
        top_p=0.95,
        eos_token_id=0
    )
    response = outputs[0]["generated_text"].strip()
    st.text_area("Response", value=response, height=400)

if __name__ == "__main__":
    main()
```
Make sure to update the `model` parameter in the `pipeline()` function with the correct path to the `starcoderplus-ift-v4.1` model.
If the path points to a local directory, no download is needed; if it is a Hub model identifier instead, make sure you have a stable internet connection so the model can be downloaded.
Run the script using the command `streamlit run starchat_app.py`, and you should have an updated Streamlit app that uses the new model configuration for generating responses.
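One caveat with the script as written: a `text-generation` pipeline returns the prompt plus the continuation in `generated_text`, so the response shown in the text area will start with the formatted prompt. A small post-processing step can trim it; `extract_response` is a hypothetical helper added here for illustration:

```python
def extract_response(generated_text, prompt, end_token="<|end|>"):
    """Drop the echoed prompt and anything after the end-of-turn token."""
    # The pipeline echoes the prompt at the start of generated_text
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    # Cut at the first end-of-turn marker, if the model emitted one as text
    end = generated_text.find(end_token)
    if end != -1:
        generated_text = generated_text[:end]
    return generated_text.strip()
```

In `generate_response`, this would replace the bare `.strip()`: `response = extract_response(outputs[0]["generated_text"], prompt)`.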
I should run this in a venv?
Running the script in a virtual environment (venv) is a good practice to keep your project dependencies isolated. Here’s a step-by-step guide on how to set up and run the script in a virtual environment:
1. Create and activate a virtual environment:
– Open your terminal or command prompt.
– Navigate to the directory where you want to create your virtual environment.
– Run the following command to create a virtual environment named “myenv” (you can choose a different name if you prefer):
```
python3 -m venv myenv
```
– Activate the virtual environment:
– For Windows:
```
myenv\Scripts\activate
```
– For macOS/Linux:
```
source myenv/bin/activate
```
2. Install the necessary packages:
– Make sure your virtual environment is activated.
– Run the following command to install the required packages:
```
pip install streamlit torch transformers
```
3. Create a Python file named `starchat_app.py` and paste the updated script into it.
4. Run the Streamlit app:
– Ensure your virtual environment is activated.
– Navigate to the directory containing the `starchat_app.py` file.
– Run the following command to start the Streamlit app:
```
streamlit run starchat_app.py
```
By following these steps, you’ll have the Streamlit app running in your virtual environment, ensuring the dependencies are isolated and separate from your global Python installation.
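One addition to this workflow: once everything installs and runs, pinning the exact versions makes the environment reproducible. This is standard pip practice; `requirements.txt` is just the conventional filename:

```shell
# with the venv activated, record the exact installed versions
python -m pip freeze > requirements.txt
# on another machine (or after recreating the venv), restore them with:
#   python -m pip install -r requirements.txt
```

This matters here because transformers APIs change between releases; the config you are working with was saved with `transformers_version: 4.28.1`.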
what does the pipeline do?
The `pipeline` function in the Hugging Face Transformers library is a high-level API that provides a simple way to use pre-trained models for various NLP tasks, including text generation, text classification, question answering, and more.
When you call `pipeline(“text-generation”)`, you create a text generation pipeline. This pipeline is specifically designed for generating text based on a given prompt or input.
In the context of the Starchat app, the `pipeline(“text-generation”)` call sets up a text generation pipeline using the Hugging Face Starchat model. This pipeline allows you to generate responses based on user queries or prompts.
The pipeline takes care of the necessary preprocessing steps, such as tokenizing the input text, passing it through the model for inference, and generating the output text. It abstracts away many of the low-level details, making it easier to use the pre-trained models for text generation without having to handle the underlying model architecture directly.
You can further customize the behavior of the pipeline by providing additional parameters. In the provided script, parameters such as `max_new_tokens`, `do_sample`, `temperature`, `top_k`, `top_p`, and `eos_token_id` are used to control the response generation process, influencing factors like response length, diversity, and randomness.
Overall, the `pipeline` function simplifies the usage of pre-trained models and enables you to perform text generation tasks easily and efficiently.
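To make those knobs concrete, here is a toy, pure-Python illustration of what temperature and top-k/top-p do to a next-token distribution. This is a simplified sketch of the idea, not the actual Transformers implementation:

```python
import math

def adjust_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Toy version of temperature + top-k + top-p filtering over token logits."""
    # Temperature rescales logits: <1 sharpens the distribution, >1 flattens it
    scaled = [l / temperature for l in logits]
    exps = [math.exp(l - max(scaled)) for l in scaled]
    probs = [e / sum(exps) for e in exps]
    # Rank token indices by probability, highest first
    ranked = sorted(range(len(probs)), key=lambda i: -probs[i])
    keep = set(ranked)
    if top_k > 0:
        keep &= set(ranked[:top_k])       # keep only the k most likely tokens
    if top_p < 1.0:
        kept, cum = set(), 0.0
        for i in ranked:
            kept.add(i)
            cum += probs[i]
            if cum >= top_p:              # smallest set covering top_p mass
                break
        keep &= kept
    filtered = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]  # renormalize to sum to 1
```

With `temperature=0.2`, as in the script, the probability mass concentrates heavily on the most likely token, which is why the app's answers come out fairly deterministic despite `do_sample=True`.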
so the model is offsite?
Yes, in the context of using the `pipeline` function from the Hugging Face Transformers library, the pre-trained model is typically hosted off-site. When you specify the model name or identifier in the `model` parameter of the `pipeline` function, the library automatically downloads the model from a remote server if it is not already cached on your local machine.
The Hugging Face model hub is a central repository for a wide range of pre-trained models for natural language processing tasks. These models are trained and hosted by the Hugging Face community and various contributors. When you specify a model name or identifier, the library retrieves the corresponding model from the Hugging Face model hub.
Once the model is downloaded, it is loaded into memory and used by the pipeline for generating responses or performing other NLP tasks. The pipeline handles the interaction with the model, including tokenization, inference, and post-processing.
It’s worth noting that the specific model used in the script you provided, such as “HuggingFaceH4/starchat-beta” or “data/starcoderplus-ift-v4.1”, should correspond to a valid model identifier that can be resolved by the Hugging Face model hub. If the model identifier is incorrect or the model is no longer available, the pipeline may fail to download or load the model.
lots of errors. List each error first, then we will tackle them: streamlit run starchat_beta.py
You can now view your Streamlit app in your browser.
Local URL: http://localhost:8501
Network URL: http://192.168.254.74:8501
```
2023-06-27 23:31:42.455 Uncaught app exception
Traceback (most recent call last):
  File "F:\THEBLOKE\wizardcuda\model\starchat-beta\myenv\lib\site-packages\huggingface_hub\utils\_errors.py", line 259, in hf_raise_for_status
    response.raise_for_status()
  File "F:\THEBLOKE\wizardcuda\model\starchat-beta\myenv\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/data/starcoderplus-ift-v4.1/resolve/main/config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "F:\THEBLOKE\wizardcuda\model\starchat-beta\myenv\lib\site-packages\transformers\utils\hub.py", line 417, in cached_file
    resolved_file = hf_hub_download(
  File "F:\THEBLOKE\wizardcuda\model\starchat-beta\myenv\lib\site-packages\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "F:\THEBLOKE\wizardcuda\model\starchat-beta\myenv\lib\site-packages\huggingface_hub\file_download.py", line 1195, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "F:\THEBLOKE\wizardcuda\model\starchat-beta\myenv\lib\site-packages\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "F:\THEBLOKE\wizardcuda\model\starchat-beta\myenv\lib\site-packages\huggingface_hub\file_download.py", line 1541, in get_hf_file_metadata
    hf_raise_for_status(r)
  File "F:\THEBLOKE\wizardcuda\model\starchat-beta\myenv\lib\site-packages\huggingface_hub\utils\_errors.py", line 291, in hf_raise_for_status
    raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 404 Client Error. (Request ID: Root=1-649ba99e-57a8e0814dc71f4a787cfef2)
Repository Not Found for url: https://huggingface.co/data/starcoderplus-ift-v4.1/resolve/main/config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "F:\THEBLOKE\wizardcuda\model\starchat-beta\myenv\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 552, in _run_script
    exec(code, module.__dict__)
  File "F:\THEBLOKE\wizardcuda\model\starchat-beta\starchat_beta.py", line 5, in <module>
    pipe = pipeline(
  File "F:\THEBLOKE\wizardcuda\model\starchat-beta\myenv\lib\site-packages\transformers\pipelines\__init__.py", line 705, in pipeline
    config = AutoConfig.from_pretrained(model, _from_pipeline=task, **hub_kwargs, **model_kwargs)
  File "F:\THEBLOKE\wizardcuda\model\starchat-beta\myenv\lib\site-packages\transformers\models\auto\configuration_auto.py", line 944, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "F:\THEBLOKE\wizardcuda\model\starchat-beta\myenv\lib\site-packages\transformers\configuration_utils.py", line 574, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "F:\THEBLOKE\wizardcuda\model\starchat-beta\myenv\lib\site-packages\transformers\configuration_utils.py", line 629, in _get_config_dict
    resolved_config_file = cached_file(
  File "F:\THEBLOKE\wizardcuda\model\starchat-beta\myenv\lib\site-packages\transformers\utils\hub.py", line 433, in cached_file
    raise EnvironmentError(
OSError: data/starcoderplus-ift-v4.1 is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.
```
The error message suggests that the model identifier `”data/starcoderplus-ift-v4.1″` is not a valid model identifier on the Hugging Face model hub, and it cannot be resolved to download the model. It indicates that the model repository is not found.
To resolve this issue, there are a few possible steps you can take:
1. Verify the model identifier: Double-check the model identifier `”data/starcoderplus-ift-v4.1″` to ensure it is correct. Make sure there are no typos or incorrect names in the identifier.
2. Check the availability of the model: Visit the Hugging Face model hub website (https://huggingface.co/models) and search for the model identifier `”data/starcoderplus-ift-v4.1″`. Confirm if the model exists and is publicly available. If it is a private or gated repository, you may need to follow additional authentication steps to access it.
3. Use a different model: If the model you’re trying to access is not available or you are unable to resolve the issue with the current model, consider using a different pre-trained model from the Hugging Face model hub. You can browse the available models on the Hugging Face model hub website and choose one that suits your needs.
4. Update the model identifier: If you have a different model identifier that you intend to use, update the `model` parameter in the `pipeline` function with the correct identifier. Ensure that the model identifier corresponds to a valid model available on the Hugging Face model hub.
Once you have verified the model identifier and ensured the availability of the model, update the script with the correct model identifier and rerun the Streamlit app to see if the issue is resolved.
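A quick way to see which of these cases applies is to mirror the check transformers itself performs, as the final `OSError` hints: it first looks for a local folder, and only then treats the string as a Hub repo id. A small diagnostic sketch; the function name is made up for illustration:

```python
import os
import re

def classify_model_id(model_id):
    """Rough mirror of how transformers interprets a model string."""
    if os.path.isdir(model_id):
        return "local folder"                 # loaded directly from disk
    # Hub repo ids look like "namespace/name" (one slash, no path separators)
    if re.fullmatch(r"[\w.-]+/[\w.-]+", model_id):
        return "hub repo id"                  # will trigger a download
    return "invalid"

# "data/starcoderplus-ift-v4.1" happens to match the repo-id pattern, so
# when that folder does not exist relative to the working directory,
# transformers queries huggingface.co for it and gets a 404.
```

In this case the likely fix is to pass an absolute path to the folder that actually contains `config.json`, or to run `streamlit run` from the directory where `data/` lives.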
this is a script I found in a folder:

```python
from typing import Any, Dict

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftConfig, PeftModel

class EndpointHandler:
    def __init__(self, path=""):
        # load model and processor from path
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        try:
            config = PeftConfig.from_pretrained(path)
            model = AutoModelForCausalLM.from_pretrained(
                config.base_model_name_or_path,
                return_dict=True,
                load_in_8bit=True,
                device_map="auto",
                torch_dtype=torch.float16,
                trust_remote_code=True,
            )
            model.resize_token_embeddings(len(self.tokenizer))
            model = PeftModel.from_pretrained(model, path)
        except Exception:
            model = AutoModelForCausalLM.from_pretrained(
                path, device_map="auto", load_in_8bit=True, torch_dtype=torch.float16, trust_remote_code=True
            )
        self.model = model
        self.device = "cuda" if torch.cuda.is_available() else "cpu"

    def __call__(self, data: Dict[str, Any]) -> Dict[str, str]:
        # process input
        inputs = data.pop("inputs", data)
        parameters = data.pop("parameters", None)
        # preprocess
        inputs = self.tokenizer(inputs, return_tensors="pt").to(self.device)
        # pass inputs with all kwargs in data
        if parameters is not None:
            outputs = self.model.generate(**inputs, **parameters)
        else:
            outputs = self.model.generate(**inputs)
        # postprocess the prediction
        prediction = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        return [{"generated_text": prediction}]
```
The script you provided appears to be an endpoint handler for an API or server that utilizes a pre-trained language model for text generation. It defines a class `EndpointHandler` that handles the processing of input data and generates text based on the provided input using a pre-trained model.
Here’s an overview of the key components and functionality in the script:
1. Initialization: The `__init__` method is responsible for loading the pre-trained model and tokenizer from the specified path. It checks if the path corresponds to a custom model (`PeftModel`) or a standard model (`AutoModelForCausalLM`). It also configures the device to use CUDA if available.
2. Data Processing: The `__call__` method is the entry point for processing data. It takes a dictionary `data` containing input text and optional parameters. It extracts the input text and parameters from the dictionary.
3. Tokenization: The input text is tokenized using the tokenizer obtained during initialization. The tokenizer converts the input text into tokenized tensors suitable for model input.
4. Model Inference: The pre-trained model is used to generate text based on the tokenized inputs. It calls the `generate` method of the model, passing the tokenized inputs and any additional parameters.
5. Postprocessing: The generated text is obtained as the output from the model and decoded using the tokenizer to convert it back into human-readable text. The decoded text is returned as a list of dictionaries with the key `”generated_text”`.
To use this script, you would need to provide the necessary dependencies, such as the pre-trained model and tokenizer files. You can instantiate an instance of the `EndpointHandler` class and call it with appropriate input data to generate text using the pre-trained model.
Please note that this script assumes you have the required dependencies and the appropriate environment set up to run it. Make sure to provide the correct paths and adjust the code as necessary to match your specific use case.
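The input contract of `__call__` is worth spelling out, since it can be exercised without loading any model: the first two lines accept either a bare payload or an `{"inputs": ..., "parameters": ...}` envelope. A minimal reproduction of just that parsing logic:

```python
def parse_request(data):
    """Reproduce the input handling at the top of EndpointHandler.__call__."""
    inputs = data.pop("inputs", data)          # fall back to the whole payload
    parameters = data.pop("parameters", None)  # generation kwargs, if any
    return inputs, parameters

# Envelope form: the text and the generation kwargs travel separately,
# and the parameters dict is later splatted into model.generate(**parameters)
inputs, params = parse_request({"inputs": "Hello", "parameters": {"max_new_tokens": 64}})
```

So a caller can control generation per request, e.g. `{"inputs": "Hi", "parameters": {"max_new_tokens": 64, "temperature": 0.2}}`, without touching the handler code.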
all the files are there
If all the necessary files, including the pre-trained model and tokenizer, are available in the specified path, you should be able to run the script without any issues.
To run the script, you can follow these steps:
1. Make sure you have the required dependencies installed. You can use the following command to install the necessary packages:
```
pip install torch transformers
```
2. Save the script in a file, e.g., `endpoint_handler.py`, in the same directory where your pre-trained model and tokenizer files are located.
3. Update the `path` variable in the script to point to the correct location of the pre-trained model and tokenizer files.
4. Note that running `python endpoint_handler.py` directly will not start anything visible: the file only defines the `EndpointHandler` class. It follows the custom-handler convention used by Hugging Face Inference Endpoints, where the serving framework imports the class, instantiates it with the model path, and calls it once per request.
5. To exercise it locally, import the class from another script or an interactive session, instantiate it with the model path, and call it with a dictionary such as `{"inputs": "your prompt"}`.
Please ensure that the required files, such as the pre-trained model and tokenizer, are present in the specified path and that you have the necessary permissions to access them.
write a streamlit app that can query the endpoint
Certainly! Here’s an example of a Streamlit app that allows you to query the endpoint using the `EndpointHandler` script you provided:
```python
import streamlit as st
import requests
import json

# Define the endpoint URL
ENDPOINT_URL = "http://localhost:5000/generate"  # Replace with your actual endpoint URL

# Define the headers for the API request
headers = {"Content-Type": "application/json"}

# Function to query the endpoint
def query_endpoint(input_text):
    data = {"inputs": input_text}
    response = requests.post(ENDPOINT_URL, headers=headers, data=json.dumps(data))
    if response.status_code == 200:
        result = response.json()[0]["generated_text"]
        return result
    else:
        return None

# Streamlit app
def main():
    st.title("Text Generation App")
    input_text = st.text_input("Enter your input text:")
    if st.button("Generate"):
        if input_text:
            result = query_endpoint(input_text)
            if result:
                st.success("Generated Text:")
                st.text(result)
            else:
                st.error("Error occurred during text generation.")
        else:
            st.warning("Please enter some input text.")

if __name__ == "__main__":
    main()
```
To use this app, make sure you have Streamlit installed (`pip install streamlit`). Replace the `ENDPOINT_URL` variable with the actual URL of your endpoint.
Save the app code in a file, e.g., `app.py`, in the same directory. Open a terminal or command prompt, navigate to the directory, and run the following command:
```
streamlit run app.py
```
This will start the Streamlit app, and you can interact with it by entering text in the input box and clicking the “Generate” button. The generated text will be displayed below the button.
Make sure your endpoint is running and accessible at the specified URL. Adjust the code as needed to match the specific implementation of your endpoint and any additional requirements or configurations.
This code `handler.py` does not expose a url:

```python
from typing import Any, Dict

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftConfig, PeftModel

class EndpointHandler:
    def __init__(self, path=""):
        # load model and processor from path
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        try:
            config = PeftConfig.from_pretrained(path)
            model = AutoModelForCausalLM.from_pretrained(
                config.base_model_name_or_path,
                return_dict=True,
                load_in_8bit=True,
                device_map="auto",
                torch_dtype=torch.float16,
                trust_remote_code=True,
            )
            model.resize_token_embeddings(len(self.tokenizer))
            model = PeftModel.from_pretrained(model, path)
        except Exception:
            model = AutoModelForCausalLM.from_pretrained(
                path, device_map="auto", load_in_8bit=True, torch_dtype=torch.float16, trust_remote_code=True
            )
        self.model = model
        self.device = "cuda" if torch.cuda.is_available() else "cpu"

    def __call__(self, data: Dict[str, Any]) -> Dict[str, str]:
        # process input
        inputs = data.pop("inputs", data)
        parameters = data.pop("parameters", None)
        # preprocess
        inputs = self.tokenizer(inputs, return_tensors="pt").to(self.device)
        # pass inputs with all kwargs in data
        if parameters is not None:
            outputs = self.model.generate(**inputs, **parameters)
        else:
            outputs = self.model.generate(**inputs)
        # postprocess the prediction
        prediction = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        return [{"generated_text": prediction}]
```
If the `handler.py` script you provided doesn’t expose a URL, it means that it’s not designed to be used as a standalone web service. Instead, it appears to be a class (`EndpointHandler`) that encapsulates the logic for processing inputs and generating text using a language model.
To use this class from Streamlit, you have two options: wrap it in an HTTP server that exposes an API endpoint, or skip the HTTP layer entirely and call the class directly from the Streamlit script. Here's an example of the direct approach:
```python
import streamlit as st
from handler import EndpointHandler

# Instantiate the EndpointHandler
handler = EndpointHandler(path="path/to/model")

# Streamlit app
def main():
    st.title("Text Generation App")
    input_text = st.text_input("Enter your input text:")
    if st.button("Generate"):
        if input_text:
            # Call the EndpointHandler
            result = handler({"inputs": input_text})
            generated_text = result[0]["generated_text"]
            st.success("Generated Text:")
            st.text(generated_text)
        else:
            st.warning("Please enter some input text.")

if __name__ == "__main__":
    main()
```
Make sure to replace `”path/to/model”` with the actual path to your model. This modified code integrates the `EndpointHandler` class with a Streamlit app. When the user clicks the “Generate” button, it calls the `handler` instance and retrieves the generated text. The generated text is then displayed in the Streamlit app.
To run the Streamlit app, save the code in a file, e.g., `app.py`, in the same directory as the `handler.py` file. Open a terminal or command prompt, navigate to the directory, and run the following command:
```
streamlit run app.py
```
This will start the Streamlit app, and you can interact with it by entering text in the input box and clicking the “Generate” button. The generated text will be displayed below the button.
Note that this code assumes you have the necessary dependencies installed, including Streamlit and any required libraries for the `EndpointHandler` class. Make sure to adjust the code and dependencies based on your specific requirements and configuration.
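For completeness: if you do want `handler.py` behind a real URL, so the earlier `requests`-based app could talk to it, a minimal server is possible with only the standard library. This is a sketch with the `/generate` route and port chosen arbitrarily to match that app; in practice you would more likely use FastAPI or Flask:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def make_server(handler, port=5000):
    """Expose any EndpointHandler-style callable as POST /generate."""
    class GenerateRequestHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/generate":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            data = json.loads(self.rfile.read(length))
            result = handler(data)  # e.g. [{"generated_text": "..."}]
            body = json.dumps(result).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the console quiet

    return HTTPServer(("127.0.0.1", port), GenerateRequestHandler)

# Usage (assumes handler.py and a valid model path):
#   from handler import EndpointHandler
#   make_server(EndpointHandler(path="path/to/model")).serve_forever()
```

With this running, the `query_endpoint` function in the `requests`-based Streamlit app works unchanged, since the server accepts the same `{"inputs": ...}` payload and returns the same `[{"generated_text": ...}]` shape as the handler itself.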