This conversation is about PrivateGPT. PrivateGPT is a tool that takes input from the user in the form of PDFs and other files and 'ingests' them for later querying by the user. I want you to help me understand the model and the code, and suggest features and enhancements that we can add together. What file(s) would you first like to see, or what questions do you have?
Before diving into the code and features of PrivateGPT, I would like to ask some clarifying questions to better understand the scope and requirements of the model:
1. What is the intended use case for PrivateGPT? Is it for personal or business use?
2. Can you provide more details on what you mean by “ingests” files? Does the model analyze the contents of the files and extract relevant information?
3. What programming language is the code written in?
4. Can you provide a sample input file and the expected output from the model?
Once I have a better understanding of the requirements, I can provide suggestions on features and enhancements for the model. Additionally, sharing the relevant code files would be helpful in reviewing the current implementation of the model.
G:\PrivateGPT\privateGPT>python privategpt.py
llama.cpp: loading model from G:/PrivateGPT/models/ggml-model-q4_0.bin
llama.cpp: can’t use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format = 'ggml' (old version with low tokenizer quality and no mmap support)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 1000
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113748.20 KB
llama_model_load_internal: mem required = 5809.33 MB (+ 2052.00 MB per state)
………………………………………………………………………………………
.
llama_init_from_file: kv self size = 1000.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Using embedded DuckDB with persistence: data will be stored in: db
gptj_model_load: loading model from 'G:/PrivateGPT/models/ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx = 2048
gptj_model_load: n_embd = 4096
gptj_model_load: n_head = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot = 64
gptj_model_load: f16 = 2
gptj_model_load: ggml ctx size = 4505.45 MB
gptj_model_load: memory_size = 896.00 MB, n_mem = 57344
gptj_model_load: …………………………….. done
gptj_model_load: model size = 3609.38 MB / num tensors = 285
Thank you for sharing the output from running the `privategpt.py` script. It appears that the script is loading two different models, one from `G:/PrivateGPT/models/ggml-model-q4_0.bin` and the other from `G:/PrivateGPT/models/ggml-gpt4all-j-v1.3-groovy.bin`.
Without seeing the source code for `privategpt.py` and understanding the intended use case for PrivateGPT, it is difficult to suggest specific features and enhancements. However, based on the model loading output, it appears that PrivateGPT is using two different models for language generation. It may be worth exploring ways to combine these models or use them in a complementary manner to improve the quality of the generated text.
Additionally, it may be useful to add support for more file types beyond PDFs to increase the versatility of the model. Depending on the intended use case, it may also be worth exploring ways to improve the efficiency of file ingestion and text generation.
By selecting the right local models and leveraging the power of LangChain, you can run the entire pipeline locally, without any data leaving your environment, and with reasonable performance.
ingest.py uses LangChain tools to parse the document and create embeddings locally using LlamaCppEmbeddings. It then stores the result in a local vector database using Chroma vector store.
privateGPT.py uses a local LLM based on GPT4All-J or LlamaCpp to understand questions and create answers. The context for the answers is extracted from the local vector store using a similarity search to locate the right piece of context from the docs.
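The similarity-search step described above can be sketched in plain Python. This is an illustrative toy, not Chroma's actual implementation (real vector stores use approximate nearest-neighbor indexes over embedding vectors), but it shows the core idea: score every stored chunk against the query embedding and keep the best matches.

```python
import math

def cosine_similarity(a, b):
    # cosine of the angle between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    # return the indices of the k chunks most similar to the query
    scored = sorted(enumerate(doc_vecs),
                    key=lambda iv: cosine_similarity(query_vec, iv[1]),
                    reverse=True)
    return [i for i, _ in scored[:k]]
```

The retrieved chunks are then pasted into the LLM prompt as context for answering the question.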
GPT4All-J wrapper was introduced in LangChain 0.0.162.
Thank you for providing additional information about the PrivateGPT pipeline. It sounds like the pipeline is designed to operate locally and maintain data privacy by keeping data within the user’s environment.
It’s interesting that LangChain tools are being used to parse the documents and create embeddings locally using LlamaCppEmbeddings. This approach may be advantageous for users who are working with sensitive or proprietary data that cannot be shared outside their environment.
The use of a local LLM based on GPT4All-J or LlamaCpp to understand questions and create answers is also a good approach for maintaining data privacy. Using a similarity search to extract context from the local vector store is an efficient way to locate the relevant piece of information from the local database.
It would be useful to explore ways to further optimize the pipeline for performance without compromising data privacy. For example, implementing techniques like caching to reduce the number of times data needs to be reprocessed may help improve performance. Additionally, incorporating techniques like quantization and pruning to reduce the model size may also improve performance without sacrificing accuracy.
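As a concrete illustration of the caching idea, an in-memory memoization layer over the embedding function avoids recomputing embeddings for text that has already been seen. The `embed` function below is a hypothetical placeholder (a real pipeline would call LlamaCppEmbeddings), but the caching mechanism itself is standard library:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def embed(text: str) -> tuple:
    # hypothetical placeholder: a real pipeline would call
    # LlamaCppEmbeddings here; we fake a vector for illustration
    return tuple(ord(c) % 7 for c in text)

first = embed("hello")          # computed
second = embed("hello")         # served from the cache, no recomputation
hits = embed.cache_info().hits  # 1 after the repeated call
```

For ingestion pipelines that restart often, a disk-backed cache keyed on a hash of the chunk text would serve the same purpose across runs.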
I am not getting any reply, only this:
Enter a query: what is your max token length?
llama_print_timings: load time = 408.78 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 408.64 ms / 8 tokens ( 51.08 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 411.40 ms
gpt_tokenize: unknown token 'Γ'
gpt_tokenize: unknown token 'Ç'
gpt_tokenize: unknown token 'Ö'
gpt_tokenize: unknown token 'Γ'
gpt_tokenize: unknown token 'Ç'
gpt_tokenize: unknown token 'Ö'
Split into 15 chunks of text (max. 500 tokens each)
llama.cpp: loading model from G:/PrivateGPT/models/ggml-model-q4_0.bin
llama.cpp: can’t use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format = 'ggml' (old version with low tokenizer quality and no mmap support)
It looks like the model is encountering some unknown tokens that it cannot understand. This could be because the input query contains characters or symbols that are not recognized by the model’s tokenizer.
One thing you could try is to preprocess your input query to remove any non-standard characters or symbols before passing it to the model. Additionally, you may want to check if the model’s tokenizer supports the character encoding used in your query.
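A minimal preprocessing sketch (this is an assumption about what "remove non-standard characters" could look like, not part of privateGPT itself) might normalize the text and then drop anything outside printable ASCII. The `ΓÇÖ` sequence in the log is the classic mojibake of a curly apostrophe, which a filter like this would strip before tokenization:

```python
import unicodedata

def sanitize_query(text: str) -> str:
    # decompose accented characters, then keep only printable ASCII
    normalized = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in normalized if 32 <= ord(ch) < 127)

clean = sanitize_query("what is the max token length?\u2019")  # trailing curly quote dropped
```

A longer-term fix is to ensure the ingestion and query paths agree on UTF-8 end to end, so the mojibake never appears in the first place.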
It looks like the PrivateGPT pipeline is using a LlamaCpp model based on the GGML format, which is an older format that doesn’t support memory mapping. This may explain why the model cannot use mmap.
It’s worth noting that newer formats may provide better tokenizer quality and improved performance, so it may be worth exploring the possibility of updating the model to a newer format if feasible. This could potentially improve the accuracy and speed of the model.
How do I update the model?
Updating the model would depend on several factors, including the specific model format and how the model was trained. Here are some general steps that you could follow:
1. Identify the current model format: You mentioned that the current model is based on the GGML format, which is an older format. You could check if there is a newer version of the GGML format that supports memory mapping, or if there are other formats that could be used.
2. Determine if there is an updated version of the model: If there is an updated model available, you would need to obtain it and ensure that it is compatible with the PrivateGPT pipeline. This would involve checking if the input/output formats of the updated model are compatible with the existing code.
3. Retrain the model (if necessary): If there is no updated model available or if the updated model is not compatible with the existing code, you may need to retrain the model using new data. This would involve collecting and preprocessing data, training the model, and validating its performance.
4. Update the PrivateGPT pipeline: Once you have an updated model that is compatible with the pipeline, you would need to update the code to use the new model. This would involve modifying the code that loads and uses the model.
It’s worth noting that updating the model can be a complex task that requires knowledge of machine learning, natural language processing, and software development. It may be helpful to consult with a data scientist or software developer who has experience with these areas.
llama.cpp: loading model from G:/PrivateGPT/models/ggml-model-q4_0.bin
llama.cpp: can’t use mmap because tensors are not aligned; convert to new format to avoid this
The error message indicates that the current model based on the `ggml-model-q4_0.bin` file cannot use memory mapping (mmap) due to tensor alignment issues. To resolve this, one potential solution is to convert the model to a new format that supports memory mapping. Here are some steps you can follow:
1. Identify the target format: Research and identify a newer model format that supports both the desired features (such as memory mapping) and alignment requirements.
2. Convert the model: Look for tools or scripts provided by the framework or library you are using to convert the model from the current GGML format to the target format. Check the documentation or community resources for guidance on the conversion process.
3. Update the model loading code: Once the model is converted to the new format, modify the model loading code in `llama.cpp` to load the model from the new file or files associated with the updated format. Ensure that the code is compatible with the new model format, as there may be changes in the model loading and inference procedures.
4. Test and validate: After making the necessary modifications, thoroughly test the pipeline to ensure that the converted model is functioning correctly. Verify that the memory mapping is now working as expected without any alignment issues.
Note that the specific steps and tools required for the conversion process depend on the framework or library used for training and deploying the model. Consulting the documentation or seeking support from the framework’s community can provide more specific guidance tailored to your setup.
TypeError: 'ModelInfo' object is not subscriptable
Traceback:
File "C:\Users\Sarah\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 565, in _run_script
exec(code, module.__dict__)
File "G:\LCCHugger\hftest.py", line 39, in <module>
with st.expander(f"Model ID: {models[model_id]['modelId']}"): # Use the modelI
The error message you encountered indicates a `TypeError`: a `ModelInfo` object is not subscriptable. The error occurs on line 39 of the `hftest.py` script.
Based on the code snippet you provided, `models[model_id]` is evaluating to a `ModelInfo` object, and `ModelInfo` objects do not support square-bracket indexing. If this is the `ModelInfo` class returned by `huggingface_hub` (for example from `HfApi().list_models()`), its fields are exposed as plain attributes, so the fix is attribute access: `models[model_id].modelId` instead of `models[model_id]['modelId']`.
Here are a few steps you can take to troubleshoot and resolve the issue:
1. Check what `models` contains: confirm it is a list (or dict) of `ModelInfo` objects and that it is initialized and populated before the problematic line of code.
2. Verify the `model_id` variable: double-check that `model_id` holds a valid index (or key) into `models` before using it.
3. Access `modelId` as an attribute: since `ModelInfo` is not subscriptable, read the field with dot notation (`info.modelId`) or `getattr(info, 'modelId')` rather than with square brackets.
By following these steps, you should be able to resolve the `TypeError` and retrieve the `modelId` value from the `ModelInfo` object.
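A minimal sketch of the difference, using a hypothetical stand-in class rather than the real `huggingface_hub` one:

```python
class ModelInfo:
    # hypothetical stand-in: real huggingface_hub ModelInfo objects
    # likewise expose their fields as attributes, not dictionary keys
    def __init__(self, modelId):
        self.modelId = modelId

models = [ModelInfo("gpt2"), ModelInfo("bert-base-uncased")]
model_id = 0

# models[model_id]["modelId"] would raise:
#   TypeError: 'ModelInfo' object is not subscriptable
name = models[model_id].modelId               # attribute access works
name2 = getattr(models[model_id], "modelId")  # equivalent dynamic form
```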
from transformers.hub_api import HfApi
It seems that you are trying to import the `HfApi` class from the `transformers.hub_api` module. However, it’s important to note that as of my knowledge cutoff in September 2021, there is no `HfApi` class in the `transformers` library.
If you are using an outdated version of the `transformers` library, I recommend updating it to the latest version by running `pip install --upgrade transformers`.
If you are looking for functionality related to the Hugging Face model hub, such as retrieving model information, you can use the `HfApi` class from the `huggingface_hub` library. To install it, run `pip install huggingface_hub`.
Here’s an example of how to import the `HfApi` class from `huggingface_hub`:
```python
from huggingface_hub import HfApi
```
Make sure you have the `huggingface_hub` library installed, and then you should be able to use the `HfApi` class to interact with the Hugging Face model hub.
AttributeError: 'ModelInfo' object has no attribute 'as_dict'
Traceback:
File "C:\Users\Sarah\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 565, in _run_script
exec(code, module.__dict__)
File "G:\LCCHugger\hftest.py", line 14, in <module>
st.write(x.as_dict())
The error message states that the `ModelInfo` object does not have an attribute called `as_dict`. The error occurs on line 14 of the `hftest.py` script.
Based on the code snippet you provided, you are calling an `as_dict()` method on a `ModelInfo` object called `x`, but the class does not define such a method.
To resolve this issue, you have a few options:
1. Check the documentation: review the documentation for the `ModelInfo` class you are using and see whether an alternative method or property exposes the same information.
2. Update the code: if `ModelInfo` does not provide a method to serialize itself, you can read its attributes directly. For a plain Python object, `vars(x)` (equivalent to `x.__dict__`) returns the instance attributes as a dictionary; alternatively, build the dictionary manually from the attributes you need.
Here's an example that demonstrates accessing specific attributes of the `ModelInfo` object and creating a dictionary manually:
```python
model_dict = {
    'property1': x.property1,
    'property2': x.property2,
    # Add more properties as needed
}
st.write(model_dict)
```
Remember to replace `'property1'`, `'property2'`, and the other keys with the actual attributes of the `ModelInfo` object that you want to include in the dictionary.
By reviewing the documentation or modifying the code accordingly, you should be able to resolve the `AttributeError` and retrieve the desired information from the `ModelInfo` object.
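If `ModelInfo` is a plain Python object, the built-in `vars()` can do the manual-dictionary step in one call. The class below is a hypothetical stand-in for illustration:

```python
class ModelInfo:
    # hypothetical stand-in for the huggingface_hub class
    def __init__(self, modelId, pipeline_tag=None):
        self.modelId = modelId
        self.pipeline_tag = pipeline_tag

x = ModelInfo("gpt2", "text-generation")
model_dict = vars(x)  # same as x.__dict__: instance attributes as a dict
```

Note that `vars()` only captures instance attributes; properties and `__slots__`-based classes need the manual approach.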
How do I find all attributes of `ModelInfo`?
To find all the attributes of the `ModelInfo` object, you can use the `dir()` function in Python. The `dir()` function returns a list of names in the specified object, including attributes, methods, and other object details.
Here’s an example of how you can use `dir()` to list all attributes of a `ModelInfo` object:
```python
x = ModelInfo()  # Replace ModelInfo() with your actual ModelInfo object
attributes = dir(x)
for attribute in attributes:
    print(attribute)
```
Running this code will print out a list of all attributes and methods available in the `ModelInfo` object. You can then examine the output to identify the specific attributes you are interested in.
Keep in mind that `dir()` will return all attributes, including built-in attributes and methods inherited from parent classes. It’s helpful to refer to the documentation or the source code of the `ModelInfo` class to understand the purpose and usage of each attribute.
python privategpt.py
Traceback (most recent call last):
  File "G:\PrivateGPT\privategpt.py", line 21, in <module>
    from constants import CHROMA_SETTINGS
  File "G:\PrivateGPT\constants.py", line 11, in <module>
    CHROMA_SETTINGS = Settings(
  File "pydantic\env_settings.py", line 39, in pydantic.env_settings.BaseSettings.__init__
  File "pydantic\main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for Settings
persist_directory
  none is not an allowed value (type=type_error.none.not_allowed)
The error message indicates a validation error for the `Settings` object constructed in `constants.py`: the `persist_directory` field is `None`, which the validation rules do not allow.
To fix this, provide a valid value for `persist_directory` in `CHROMA_SETTINGS`. In privateGPT, this value is typically read from a `PERSIST_DIRECTORY` environment variable (often set in a `.env` file), so make sure that variable is defined and points to the directory where the persistence data should be stored.
Here’s an example of how you can modify the `constants.py` file to specify a valid `persist_directory` value:
```python
from pydantic import BaseModel

class Settings(BaseModel):
    persist_directory: str  # must not be None

CHROMA_SETTINGS = Settings(
    persist_directory="/path/to/persistence/directory"  # Set a valid directory path here
)
```
Replace `"/path/to/persistence/directory"` with the actual path where you want the persistence data to be stored.
After making these changes, try running `privategpt.py` again, and it should resolve the validation error.
Approximately how much VRAM would I need for a 13 billion parameter model?
Estimating the exact VRAM (Video Random Access Memory) requirements for a deep learning model can be challenging, as it depends on several factors, including the architecture, data type, and specific implementation details. However, I can provide a rough estimation to give you an idea.
As a rule of thumb, the VRAM requirements for a deep learning model are typically proportional to the number of parameters in the model. Assuming a conservative estimate of approximately 4 bytes per parameter, a 13 billion parameter model would require approximately 52 billion bytes of VRAM.
Converting this to a more commonly used unit, it amounts to approximately 52 GB of VRAM. Keep in mind that this is just an estimate, and the actual VRAM usage can vary depending on factors such as the model’s structure, the size of the input data, and the specific deep learning framework being used.
When working with large models, it’s crucial to ensure that your GPU has sufficient VRAM to accommodate the model and any additional memory requirements for data, gradients, and optimization algorithms. If your GPU does not have enough VRAM, you may need to consider alternative strategies such as model parallelism or distributed training to fit the model into memory.
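The arithmetic behind the estimate, extended to lower-precision storage (the quantized figures are rough rules of thumb, covering parameter memory only):

```python
def vram_gb(n_params: float, bytes_per_param: float) -> float:
    # parameter memory only; activations, KV cache, and optimizer
    # state add more on top of this
    return n_params * bytes_per_param / 1e9

fp32 = vram_gb(13e9, 4)    # 52.0 GB, full precision
fp16 = vram_gb(13e9, 2)    # 26.0 GB, half precision
q4 = vram_gb(13e9, 0.5)    # 6.5 GB, 4-bit quantization (e.g. Q4_0)
```

This is why quantized formats like the Q4_0 model in the logs above make a 13B model feasible on consumer hardware, and even on CPU RAM.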