How to call another batch file from within a batch file
To call another batch file from within a batch file, you can use the `CALL` command in Windows. The `CALL` command allows you to run another batch file and then return to the original batch file after the called batch file has finished executing. Here’s the syntax:
```batch
CALL path\to\another_batch_file.bat
```
Replace `path\to\another_batch_file.bat` with the actual path and filename of the batch file you want to call. If the batch file is in the same directory as the current batch file, you can simply provide the filename without the full path.
Here’s an example:
Let’s say you have two batch files, `main.bat` and `helper.bat`, and they are located in the same directory. The contents of `main.bat` might look like this:
```batch
@echo off
echo This is the main batch file.
echo Calling the helper batch file...
CALL helper.bat
echo Back to the main batch file.
```
The contents of `helper.bat` might look like this:
```batch
@echo off
echo This is the helper batch file.
echo Doing some work...
```
When you run `main.bat`, it will output:
```
This is the main batch file.
Calling the helper batch file...
This is the helper batch file.
Doing some work...
Back to the main batch file.
```
As you can see, the `CALL` command executes `helper.bat` and then returns control to `main.bat` once `helper.bat` has finished.
Traceback (most recent call last): File “C:SarahOneMetisobwobwwtext-generation-webuiserver.py”, line 68, in load_model_wrapper shared.model, shared.tokenizer = load_model(shared.model_name, loader) File “C:SarahOneMetisobwobwwtext-generation-webuimodulesmodels.py”, line 86, in load_model tokenizer = load_tokenizer(model_name, model) File “C:SarahOneMetisobwobwwtext-generation-webuimodulesmodels.py”, line 103, in load_tokenizer tokenizer = AutoTokenizer.from_pretrained( File “C:UsersSarah.condaenvsamberrlibsite-packagestransformersmodelsautotokenization_auto.py”, line 714, in from_pretrained return tokenizer_class_py.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs) File “C:UsersSarah.condaenvsamberrlibsite-packagestransformerstokenization_utils_base.py”, line 1796, in from_pretrained raise EnvironmentError( OSError: Can’t load tokenizer for ‘modelsorca_mini_13B-GPTQ’. If you were trying to load it from ‘https://huggingface.co/models’, make sure you don’t have a local directory with the same name. Otherwise, make sure ‘modelsorca_mini_13B-GPTQ’ is the correct path to a directory containing all relevant files for a LlamaTokenizer tokenizer.
The error traceback you provided seems to be related to an issue with loading a tokenizer from the Hugging Face Transformers library in Python. The error message indicates that the tokenizer cannot be loaded for the model named `’modelsorca_mini_13B-GPTQ’`.
Here are some possible reasons and steps to troubleshoot the issue:
1. **Check if the model exists:** Ensure that the model `’modelsorca_mini_13B-GPTQ’` actually exists and is accessible. If you are trying to load a pre-trained model from Hugging Face’s model hub, verify that the model name is correct and it’s available in their model repository.
2. **Internet connection:** If you are trying to download the model from the Hugging Face model hub, ensure that you have a working internet connection, as the library may need to download the tokenizer and model files.
3. **Tokenization Library:** Make sure that you have the required version of the Transformers library installed. If you have an outdated version, it may cause issues with loading the model and tokenizer. You can update the library using `pip`:
```bash
pip install --upgrade transformers
```
4. **Local Directory Conflict:** As per the error message, ensure that you don’t have a local directory named `’modelsorca_mini_13B-GPTQ’` in the current working directory where you are running the script. This local directory might be conflicting with the model loading process.
5. **Path Separators:** In the traceback, the model path appears as `modelsorca_mini_13B-GPTQ`, which suggests the path separator was lost somewhere. Note that using backslashes (`\`) in the path can cause issues on certain platforms (e.g., Unix-based systems). It's safer to use forward slashes (`/`) or `os.path.join()` to construct paths in a platform-independent way.
6. **Verify Tokenizer Compatibility:** Ensure that the model and tokenizer versions are compatible. Sometimes, a mismatch between the model version and tokenizer version can cause loading issues. Try using a compatible version of the tokenizer with the specific model version you are loading.
If you have checked the above points and are still facing issues, you may need to provide more context or code snippets for further assistance. Additionally, if the model you are trying to use is a custom model or not available in the Hugging Face model hub, make sure that all the relevant tokenizer files are actually present in the specified path; a quick check is sketched below.
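For example, a short script can confirm whether the directory the error message points at exists and what files it contains before handing it to `AutoTokenizer`. This is only a sketch; the directory name below is a guess based on the error message and should be adjusted to your actual layout:

```python
import os
from transformers import AutoTokenizer

# Hypothetical path guessed from the error message; adjust it to your setup.
model_dir = "models/orca_mini_13B-GPTQ"

if not os.path.isdir(model_dir):
    raise FileNotFoundError(f"{model_dir} is not a directory")

# A loadable Llama tokenizer directory normally contains tokenizer_config.json
# plus tokenizer.model (SentencePiece) and/or tokenizer.json.
print("Files found:", os.listdir(model_dir))

tokenizer = AutoTokenizer.from_pretrained(model_dir)
print("Tokenizer loaded:", type(tokenizer).__name__)
```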
Let's try this one with a prompt loop: import transformers
name = 'mosaicml/mpt-7b'
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 83968 # (input + output) tokens can now be up to 83968
model = transformers.AutoModelForCausalLM.from_pretrained(
name,
config=config,
trust_remote_code=True
)
The code you provided seems to be loading a pre-trained language model using the Hugging Face Transformers library and setting a custom `max_seq_len` for the model. It looks like you are using a causal language model (`AutoModelForCausalLM`) which can generate text given a prompt.
However, you mentioned that you want to add a prompt loop, so the model generates text multiple times based on different prompts. To do that, you can use a loop to repeatedly generate text from the model with different prompts. Here’s an example of how you can create a prompt loop:
```python
import transformers

name = 'mosaicml/mpt-7b'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 83968  # (input + output) tokens can now be up to 83968

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True
)

tokenizer = transformers.AutoTokenizer.from_pretrained(name)

while True:
    prompt = input("Enter your prompt (type 'exit' to quit): ")

    if prompt.lower() == 'exit':
        print("Exiting prompt loop.")
        break

    input_ids = tokenizer.encode(prompt, return_tensors="pt")

    with transformers.no_grad():
        output = model.generate(input_ids, max_length=200, num_return_sequences=1, pad_token_id=model.config.pad_token_id)

    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    print("Generated text:")
    print(generated_text)
```
This code will allow you to repeatedly enter prompts, and the model will generate text based on each prompt until you type ‘exit’ to quit the loop.
Make sure you have installed the required libraries by running `pip install transformers` before executing the code. Additionally, note that generating long sequences with a large `max_seq_len` might require substantial computational resources and may take some time to complete. You can adjust the `max_length` parameter in the `generate()` method to control the length of the generated text.
A new version of the following files was downloaded from https://huggingface.co/mosaicml/mpt-7b-storywriter:
– modeling_mpt.py
– flash_attn_triton.py
– meta_init_context.py
– param_init_fns.py
– attention.py
– blocks.py
– hf_prefixlm_converter.py
– custom_embedding.py
– adapt_tokenizer.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Downloading shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 11.63it/s]
Instantiating an MPTForCausalLM model from E:userssarah.cachehuggingfacemodulestransformers_modulesmosaicmlmpt-7b-storywritera5e85ae1941e31bb705adbcafce9b0dfd6f3a48bmodeling_mpt.py
You are using config.init_device='cpu', but you can also use config.init_device="meta" with Composer + FSDP for fast initialization.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [01:19<00:00, 39.95s/it]
Enter your prompt (type 'exit' to quit): hello
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:SarahOnelilithLangchainmos.py:30 in │
│ │
│ 27 │ │
│ 28 │ input_ids = tokenizer.encode(prompt, return_tensors=”pt”) │
│ 29 │ │
│ ❱ 30 │ with transformers.no_grad(): │
│ 31 │ │ output = model.generate(input_ids, max_length=200, num_return_sequences=1, pad_t │
│ 32 │ │
│ 33 │ generated_text = tokenizer.decode(output[0], skip_special_tokens=True) │
│ │
│ C:UsersSarah.condaenvsamberrlibsite-packagestransformersutilsimport_utils.py:1165 in │
│ __getattr__ │
│ │
│ 1162 │ │ │ module = self._get_module(self._class_to_module[name]) │
│ 1163 │ │ │ value = getattr(module, name) │
│ 1164 │ │ else: │
│ ❱ 1165 │ │ │ raise AttributeError(f”module {self.__name__} has no attribute {name}”) │
│ 1166 │ │ │
│ 1167 │ │ setattr(self, name, value) │
│ 1168 │ │ return value │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: module transformers has no attribute no_grad
The error message indicates that there is no attribute named `no_grad` in the `transformers` module, which is causing the `AttributeError` when the script is trying to access it. This is because the `no_grad()` context manager for disabling gradient computation is not available in the `transformers` module. Instead, this functionality is typically found in deep learning libraries like PyTorch or TensorFlow.
It looks like the issue is in this section of the code:
```python
with transformers.no_grad():
    output = model.generate(input_ids, max_length=200, num_return_sequences=1, pad_token_id=model.config.pad_token_id)
```
To fix this issue, you should use the corresponding functionality from the deep learning library that the Transformers library is built upon. If you are using PyTorch as the backend, you should use `torch.no_grad()` as follows:
```python
import torch

# ...

with torch.no_grad():
    output = model.generate(input_ids, max_length=200, num_return_sequences=1, pad_token_id=model.config.pad_token_id)
```
If you are using TensorFlow as the backend, no special context manager is needed for inference, because gradients are only recorded inside an explicit `tf.GradientTape` block. Since the script uses PyTorch tensors (`return_tensors="pt"`), I've assumed PyTorch in this case.
Enter your prompt (type ‘exit’ to quit): hello
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:SarahOnelilithLangchainmos.py:30 in │
│ │
│ 27 │ │
│ 28 │ input_ids = tokenizer.encode(prompt, return_tensors=”pt”) │
│ 29 │ │
│ ❱ 30 │ with transformers.no_grad(): │
│ 31 │ │ output = model.generate(input_ids, max_length=200, num_return_sequences=1, pad_t │
│ 32 │ │
│ 33 │ generated_text = tokenizer.decode(output[0], skip_special_tokens=True) │
│ │
│ C:UsersSarah.condaenvsamberrlibsite-packagestransformersutilsimport_utils.py:1165 in │
│ __getattr__ │
│ │
│ 1162 │ │ │ module = self._get_module(self._class_to_module[name]) │
│ 1163 │ │ │ value = getattr(module, name) │
│ 1164 │ │ else: │
│ ❱ 1165 │ │ │ raise AttributeError(f”module {self.__name__} has no attribute {name}”) │
│ 1166 │ │ │
│ 1167 │ │ setattr(self, name, value) │
│ 1168 │ │ return value │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: module transformers has no attribute no_grad
Apologies for the confusion. It appears that the code snippet is trying to use the `no_grad()` context manager from the `transformers` module, which doesn’t exist. The correct way to use the `no_grad()` context manager is through the deep learning library that Transformers is built upon (PyTorch or TensorFlow).
Assuming you are using PyTorch as the backend, you should use `torch.no_grad()` instead of `transformers.no_grad()`:
```python
import transformers
import torch

name = 'mosaicml/mpt-7b'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 83968  # (input + output) tokens can now be up to 83968

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True
)

tokenizer = transformers.AutoTokenizer.from_pretrained(name)

while True:
    prompt = input("Enter your prompt (type 'exit' to quit): ")

    if prompt.lower() == 'exit':
        print("Exiting prompt loop.")
        break

    input_ids = tokenizer.encode(prompt, return_tensors="pt")

    with torch.no_grad():  # Use torch.no_grad() instead of transformers.no_grad()
        output = model.generate(input_ids, max_length=200, num_return_sequences=1, pad_token_id=model.config.pad_token_id)

    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    print("Generated text:")
    print(generated_text)
```
Make sure you have the `torch` library installed (`pip install torch`) before running the code. This should resolve the `AttributeError` and allow the prompt loop to work as expected. If you are using TensorFlow as the backend, let me know, and I can provide the relevant code modification for that as well.
It is responding but it is not using CUDA – it is resetting elsewhere in the code to CPU
If your code is resetting to use the CPU instead of CUDA (GPU), it’s likely that the device is being set to CPU somewhere else in the code or there might be an issue with the device assignment. To ensure that the model and computation are performed on the GPU, you need to verify a few things:
1. **Device Assignment:** Make sure you are explicitly setting the model and data to be processed on the GPU. You can do this by using the `to()` method in PyTorch.
Before the prompt loop, add the following line to ensure the model and input tensors are on the GPU:
```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
```
In the prompt loop, update the `input_ids` tensor assignment to also be on the GPU:
```python
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
```
2. **Device Checking:** Before and during the prompt loop, print the device to ensure it is correctly set to CUDA:
```python
print(f"Using device: {device}")
```
3. **CUDA Availability:** Ensure that CUDA is available on your system and that you have a compatible GPU. Additionally, verify that you have installed the correct version of PyTorch that supports CUDA. You can check the availability of CUDA with `torch.cuda.is_available()`.
4. **GPU Memory:** Make sure you have enough free GPU memory for the model and data. If the model does not fit, moving it to the GPU raises a CUDA out-of-memory error rather than silently falling back to the CPU, so a quiet fallback usually means the device was never set to CUDA in the first place.
If the issue persists, it might be helpful to review the rest of the code to check if there are any other places where the device might be reset to CPU or where the model’s device is being changed inadvertently. If you still face problems, providing more context or code snippets would be beneficial for further assistance.
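One quick way to confirm where the weights actually ended up is to inspect the device of the model's parameters. A minimal sketch, assuming the `model` variable from the earlier script:

```python
import torch

# Sanity checks: is CUDA visible at all, and where do the model's weights live?
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

# After model.to(device), the parameters should report a 'cuda' device.
print("Model is on:", next(model.parameters()).device)
```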
Using device: cuda
Instantiating an MPTForCausalLM model from E:userssarah.cachehuggingfacemodulestransformers_modulesmosaicmlmpt-7b-storywritera5e85ae1941e31bb705adbcafce9b0dfd6f3a48bmodeling_mpt.py
You are using config.init_device=’cpu’, but you can also use config.init_device=”meta” with Composer + FSDP for fast initialization.
The message you are seeing indicates that you are using a configuration (`config`) with `init_device=’cpu’`, but it suggests that you can use `config.init_device=”meta”` with Composer + FSDP (FullyShardedDataParallel) for fast initialization.
This message is not an error, but rather a suggestion to potentially improve performance when using certain distributed training methods with the Hugging Face Transformers library.
If you are not explicitly setting `config.init_device` in your code, it means the library is using its default value, which is `’cpu’`. However, if you want to take advantage of faster initialization with Composer + FSDP, you can set `config.init_device` to `’meta’` when creating the model:
```python
import transformers

name = 'mosaicml/mpt-7b'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.init_device = 'meta'  # Set the init_device to "meta" for faster initialization
config.max_seq_len = 83968  # (input + output) tokens can now be up to 83968

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True
)

# ...
```
Please note that setting `config.init_device` to `’meta’` might be beneficial for distributed training scenarios using the Composer + FSDP setup, but it might not provide noticeable improvements for a single GPU setup or small models. If you are not using distributed training, you can leave the `config.init_device` as `’cpu’`, and it should not affect the performance of your code.
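Separately from the Composer + FSDP case, the MPT model card also shows passing a CUDA device string as `init_device` so that the weights are materialized directly on the GPU. Treat the following as a sketch based on that model card rather than a general Transformers feature; the exact behaviour depends on the repository's remote code, and the `torch_dtype` choice is an assumption:

```python
import transformers
import torch

name = 'mosaicml/mpt-7b'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.init_device = 'cuda:0'   # assumption from the MPT model card: materialize weights on GPU 0
config.max_seq_len = 83968

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # assumption: half precision to keep GPU memory use down
    trust_remote_code=True
)
```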
I only want the model using CUDA
Apologies for any confusion. If you want the model to use CUDA (GPU) for inference but not for initialization, you can explicitly move the model to the GPU using the `to()` method during inference. Here’s how you can modify your code to ensure the model runs on the GPU while generating text:
```python
import transformers
import torch

name = 'mosaicml/mpt-7b'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 83968  # (input + output) tokens can now be up to 83968

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True
)

tokenizer = transformers.AutoTokenizer.from_pretrained(name)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

while True:
    prompt = input("Enter your prompt (type 'exit' to quit): ")

    if prompt.lower() == 'exit':
        print("Exiting prompt loop.")
        break

    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)

    with torch.no_grad():
        output = model.generate(input_ids, max_length=200, num_return_sequences=1, pad_token_id=model.config.pad_token_id)

    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    print("Generated text:")
    print(generated_text)
```
In this updated version, we explicitly move the model to the GPU using `model.to(device)` before entering the prompt loop. This ensures that both the model and the input tensor (`input_ids`) are on the GPU, enabling the model to perform inference using CUDA.
By setting `device = torch.device(“cuda” if torch.cuda.is_available() else “cpu”)`, we allow the code to check whether CUDA is available on the system. If CUDA is available, the model will be moved to the GPU (`device=”cuda”`); otherwise, it will be moved to the CPU (`device=”cpu”`).
With this modification, the model will use the GPU for inference, but not for the model’s initialization. This is useful when you want to perform inference using a pre-trained model on the GPU but avoid GPU memory allocation during the initialization phase, which can be beneficial if you have limited GPU resources.
IndexError: index -1 is out of bounds for dimension 1 with size 0
The “IndexError: index -1 is out of bounds for dimension 1 with size 0″ error typically occurs when you are trying to access an element of an empty list or tensor or using an invalid index that is out of the allowed range. To troubleshoot and fix this issue, consider the following possibilities:
1. **Empty Input:** The error may occur if the `input_ids` tensor or the `output` tensor is empty. This happens when `tokenizer.encode()` returns an empty list, which becomes an empty tensor after conversion with `return_tensors="pt"`. Ensure that the `prompt` input is valid and that the tokenizer is encoding it correctly.
You can add a check to verify that `input_ids` is not empty before proceeding with the model’s inference:
```python
input_ids = tokenizer.encode(prompt, return_tensors="pt")

if input_ids.numel() == 0:
    print("Empty input. Please provide a valid prompt.")
    continue

input_ids = input_ids.to(device)
```
2. **Model Output:** The error can also be caused by an issue with the model’s output. If `output` is empty or has size 0, it can lead to this error when trying to access an element that does not exist. Make sure that the model is generating valid output for the given input.
3. **Prompt Loop:** Check the loop structure and ensure that the error is not related to unexpected behavior during the prompt loop, such as incorrectly handling the ‘exit’ command.
4. **Tokenizer Version:** Make sure you are using a compatible version of the tokenizer with the model. An outdated or incompatible tokenizer version may cause issues.
5. **Input Data Format:** Ensure that the `prompt` input is in the correct format expected by the tokenizer. Some tokenizers may require additional arguments or preprocessing, depending on the model’s architecture.
Without the complete code or more context, it's challenging to pinpoint the exact cause of the error. However, I hope the above points help you identify and resolve the issue. If the error persists, providing more code or context will help in diagnosing the problem further; the small debugging sketch below can also help narrow down where the empty dimension appears.
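A small debugging sketch, assuming the `model`, `tokenizer`, and `device` variables from the earlier prompt-loop script, that prints the tensor shapes right before and after `generate()`:

```python
import torch

prompt = "hello"  # example prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
print("input_ids shape:", input_ids.shape)  # expected: (1, n) with n > 0

if input_ids.shape[1] == 0:
    raise ValueError("The tokenizer produced no tokens for this prompt.")

with torch.no_grad():
    output = model.generate(input_ids, max_length=200)

print("output shape:", output.shape)  # expected: (1, m) with m >= n
```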
This model is too slow. Let’s try this one with all of the same code as before: TheBloke/orca_mini_3B-GGML
If the previous model was too slow for your needs, you can try using a smaller model to speed up the inference process. The `’TheBloke/orca_mini_3B-GGML’` model is a smaller model compared to `’mosaicml/mpt-7b’`, so it may perform faster while still providing useful results.
To use the `’TheBloke/orca_mini_3B-GGML’` model, you can simply replace the `name` variable in your existing code and re-run the script. Here’s the updated code with the new model:
```python
import transformers
import torch

name = 'TheBloke/orca_mini_3B-GGML'  # Use the new model name

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 83968  # (input + output) tokens can now be up to 83968

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True
)

tokenizer = transformers.AutoTokenizer.from_pretrained(name)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

while True:
    prompt = input("Enter your prompt (type 'exit' to quit): ")

    if prompt.lower() == 'exit':
        print("Exiting prompt loop.")
        break

    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)

    with torch.no_grad():
        output = model.generate(input_ids, max_length=200, num_return_sequences=1, pad_token_id=model.config.pad_token_id)

    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    print("Generated text:")
    print(generated_text)
```
This updated code points at the `'TheBloke/orca_mini_3B-GGML'` model instead of `'mosaicml/mpt-7b'`. One caveat: `-GGML` repositories ship llama.cpp-format quantized `.bin` files rather than standard PyTorch checkpoints, so `from_pretrained` may not find loadable weights there; in that case the model needs to be run with llama.cpp or compatible Python bindings instead.
Keep in mind that smaller models often sacrifice some generation quality and capacity compared to larger models like `'mosaicml/mpt-7b'`. The trade-off can still be worth it if you need faster response times or have limited computational resources.
one is MPT and the other GGML?
Apologies for the confusion in my previous response. You are correct; the models `’mosaicml/mpt-7b’` and `’TheBloke/orca_mini_3B-GGML’` are different in architecture and capacity.
The `'mosaicml/mpt-7b'` model is an MPT (MosaicML Pretrained Transformer) model with a much larger capacity, distributed as standard PyTorch weights that the Transformers library can load directly. `'TheBloke/orca_mini_3B-GGML'`, on the other hand, is a smaller Orca Mini model converted to the GGML format, a quantized format created for the llama.cpp runtime; it is meant to be run with llama.cpp (or compatible bindings) rather than loaded through `from_pretrained`, and being smaller it will generally not generate text as well as the larger model.
If you are looking for faster inference and can accept some loss in generation quality, the smaller GGML model (run through llama.cpp or similar tooling) might be more suitable. However, if generation quality is crucial, you may prefer to stick with the larger MPT model even though it's slower.
Ultimately, the choice between the two models depends on your specific use case, available computational resources, and the balance between generation quality and inference speed you desire. If the `’mosaicml/mpt-7b’` model is too slow but you still want reasonable generation quality, you could consider exploring other medium-sized models that offer a compromise between the two.
What are some parameters I can change to make the model faster?
To make the model faster, you can adjust several parameters and settings in the Hugging Face Transformers library. Here are some key parameters that can affect the model’s speed:
1. **Model Size:** Smaller models generally have fewer parameters and require less computational resources, leading to faster inference times. Consider using a smaller model like `’TheBloke/orca_mini_3B-GGML’` or other models with reduced capacity.
2. **Sequence Length:** Reducing the maximum sequence length can significantly impact inference speed, especially for models with attention mechanisms. Set `config.max_seq_len` to a smaller value to limit the input sequence length.
3. **Batch Size:** When running inference, using larger batch sizes can leverage parallel processing and potentially speed up the overall process. However, using large batch sizes may require more GPU memory. You can experiment with different batch sizes to find an optimal trade-off between speed and memory usage.
4. **Generation Parameters:** The `model.generate()` function has several parameters that can affect generation speed. Reducing `max_length`, `num_return_sequences`, and other generation parameters can lead to faster inference times.
5. **Tokenizer Padding and Truncation:** Tokenizer settings such as padding and truncation affect how many tokens are fed to the model. Avoid padding single prompts to a large fixed length; instead, use `truncation=True` with a sensible `max_length` so that overly long inputs are cut down rather than processed in full.
6. **Mixed Precision (FP16):** Running inference in half precision (FP16) can speed up computation on GPUs that support it. You can either load the model with `torch_dtype=torch.float16` or wrap the generation call in the `torch.autocast()` context manager.
7. **Device:** Ensure you are using the available GPU (CUDA) for inference by setting `device` correctly. Use `device = torch.device(“cuda” if torch.cuda.is_available() else “cpu”)` and move the model and tensors to the GPU using `model.to(device)`.
8. **Custom Implementation:** If you need fine-grained control over the model and don’t require all the functionalities provided by the Hugging Face library, you can create a custom implementation that simplifies the model’s forward pass for your specific use case.
9. **Quantization:** On certain hardware, quantizing the model can lead to faster inference with some trade-offs in precision. PyTorch has tools for quantization-aware training and post-training quantization.
It's essential to strike a balance between inference speed and output quality. Reducing model size, sequence length, and generation length generally speeds up inference, but it can also reduce the quality of the generated text, so experiment with these parameters for your specific use case and performance requirements. A short sketch combining a few of these ideas follows.
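As an illustration of points 4 and 6, the generation call from the earlier script could be tightened up as follows. This is a sketch with illustrative, untuned values, assuming the `model`, `tokenizer`, and `device` variables from the earlier scripts and a CUDA-capable GPU:

```python
import torch

prompt = "Write a short story about llamas."
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)

# Mixed precision plus a cap on newly generated tokens keeps inference time down.
with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
    output = model.generate(
        input_ids,
        max_new_tokens=100,   # bound the new tokens instead of the total length
        do_sample=True,
        temperature=0.7,
        num_return_sequences=1,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```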
how to run this? ./main -t 10 -ngl 32 -m orca-mini-3b.ggmlv3.q5_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### System:\nYou are an story writing assistant who writes very long, detailed and interesting stories\n\n### User:\nWrite a story about llamas\n\n### Input:\n{input}\n\n### Response:\n"
The command you provided is a command-line invocation of what appears to be llama.cpp's `main` example binary, with various options and parameters. It loads the `orca-mini-3b.ggmlv3.q5_0.bin` model file for story generation.
To run this command, you need to execute it in the terminal or command prompt. Here’s how you can run the command:
1. Open a terminal or command prompt on your computer.
2. Change the current directory to where the `main` script is located, assuming it’s in the current directory.
3. Copy and paste the entire command you provided and press Enter to run it.
The command contains several options:
- `-t 10`: Use 10 CPU threads for inference.
- `-ngl 32`: Offload 32 of the model's layers to the GPU (this requires a GPU-enabled build of llama.cpp; with a CPU-only build the flag has no effect).
- `-m orca-mini-3b.ggmlv3.q5_0.bin`: The GGML model file to load, in this case a quantized Orca Mini 3B.
- `--color`: Enable colored text output.
- `-c 2048`: Set the context window to 2048 tokens, i.e. the maximum number of prompt plus generated tokens the model keeps in view.
- `--temp 0.7`: Set the sampling temperature to 0.7. Higher values (e.g., 1.0) make the output more random, while lower values make it more focused.
- `--repeat_penalty 1.1`: Penalize recently generated tokens to discourage the model from repeating the same phrases too often.
- `-n -1`: The number of tokens to generate; `-1` means keep generating until the model emits an end-of-text token or the context fills up.
- `-p "### System:\nYou are a story writing assistant who writes very long, detailed, and interesting stories\n\n### User:\nWrite a story about llamas\n\n### Input:\n{input}\n\n### Response:\n"`: The prompt template. The `\n` sequences are newlines, and `{input}` is a placeholder meant to be replaced with the actual user input before the command is run.
Please note that the actual behavior of the `main` script and the available options may vary depending on how the script is implemented. Make sure you have the required dependencies and the correct version of the `main` script to run this command successfully.
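If you would rather drive the same GGML file from Python instead of the CLI, the llama-cpp-python bindings expose roughly the same options. This is a sketch under the assumption that a GGML-compatible version of the package is installed (`pip install llama-cpp-python`; newer releases expect GGUF files instead); the parameter values mirror the CLI flags above:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="orca-mini-3b.ggmlv3.q5_0.bin",  # the same file passed to -m above
    n_ctx=2048,        # -c 2048
    n_threads=10,      # -t 10
    n_gpu_layers=32,   # -ngl 32 (needs a GPU-enabled build)
)

prompt = (
    "### System:\nYou are a story writing assistant who writes very long, "
    "detailed and interesting stories\n\n"
    "### User:\nWrite a story about llamas\n\n"
    "### Response:\n"
)

result = llm(prompt, max_tokens=512, temperature=0.7, repeat_penalty=1.1)  # --temp / --repeat_penalty
print(result["choices"][0]["text"])
```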
Let's rewrite the script to load this model: # Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("TheBloke/orca_mini_3B-GGML")
Sure! The script can be rewritten as follows to load the `TheBloke/orca_mini_3B-GGML` model directly using the Hugging Face Transformers library:
```python
from transformers import AutoModel

model_name = "TheBloke/orca_mini_3B-GGML"
model = AutoModel.from_pretrained(model_name)

# Now the 'model' variable contains the loaded model
# You can use this model for text generation or other NLP tasks
```
With this code, `AutoModel.from_pretrained()` attempts to download and load the `TheBloke/orca_mini_3B-GGML` repository into the `model` variable. Two caveats: `-GGML` repositories contain llama.cpp-format quantized `.bin` files rather than standard PyTorch checkpoints, so this call may fail to find loadable weights (in which case the model should be run with llama.cpp, as in the `./main` command above); and for text generation you would normally load the model with `AutoModelForCausalLM` rather than the bare `AutoModel`, since only the former exposes the `generate()` method.
Before running this code, ensure that you have installed the required libraries, including the Transformers library. You can install it using `pip`:
```
pip install transformers
```
After installing the library, you can use the rewritten script to load the `TheBloke/orca_mini_3B-GGML` model directly and start using it for your NLP tasks.
That is not a prompt loop
Apologies for the oversight. You are right; the provided code snippet doesn’t include a prompt loop. To create a prompt loop and use the loaded model for text generation repeatedly, you can modify the script as follows:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "TheBloke/orca_mini_3B-GGML"

# AutoModelForCausalLM (rather than the bare AutoModel) is needed for .generate()
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

while True:
    prompt = input("Enter your prompt (type 'exit' to quit): ")

    if prompt.lower() == 'exit':
        print("Exiting prompt loop.")
        break

    input_ids = tokenizer.encode(prompt, return_tensors="pt")

    with torch.no_grad():
        output = model.generate(input_ids, max_length=200, num_return_sequences=1, pad_token_id=tokenizer.pad_token_id)

    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    print("Generated text:")
    print(generated_text)
```
In this modified script, a `while` loop creates the prompt loop: you can enter prompts repeatedly until you type 'exit', which breaks out of the loop and ends the script. The loaded model (`model`) and tokenizer (`tokenizer`) are used inside the loop to generate text for each prompt. Note that `AutoModelForCausalLM` is used instead of the bare `AutoModel` so that `generate()` is available, and `torch` is imported for `torch.no_grad()`.
Make sure the required libraries (Transformers and PyTorch) are installed before running the script. As noted above, if the `-GGML` repository does not contain Transformers-loadable weights, run the quantized `.bin` file with llama.cpp or its Python bindings instead.