Error: input 3 is none #7614
Comments
Hi @jds250, thanks for trying. I believe it should not be necessary to have root to run your model.
Hi, I am using the release/0.4 branch; here are my steps to reproduce. Exporting the pte file is handled inside the llama.py script in examples/qualcomm/oss_scripts/llama2, and the pte file is generated in the llama2_qnn folder.

**Step 1: Setup**

**Step 2: Prepare model**

Download and prepare the stories110M model:

```bash
# tokenizer.model & stories110M.pt:
wget "https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.pt"
wget "https://raw.githubusercontent.com/karpathy/llama2.c/master/tokenizer.model"

# tokenizer.bin:
python -m extension.llm.tokenizer.tokenizer -t tokenizer.model -o tokenizer.bin

# params.json:
echo '{"dim": 768, "multiple_of": 32, "n_heads": 12, "n_layers": 12, "norm_eps": 1e-05, "vocab_size": 32000}' > params.json
```

**Step 3: Run the default example**

The default example generates a story based on the given prompt, "Once". Here it is run with 16a4w post-training quantization (16-bit activations, 4-bit weights):

```bash
# 16a4w quant:
python llama.py -b /home/jds/executorch/build-android -s 1f1fa994 -m SM8650 --ptq 16a4w \
  --checkpoint stories110M.pt --params params.json \
  --tokenizer_model tokenizer.model --tokenizer_bin tokenizer.bin \
  --prompt "what is python?" \
  --pre_gen_pte /home/jds/executorch/examples/qualcomm/oss_scripts/llama2/llama2_qnn/
```
Got it. Let me clarify one thing.
Oh, I see. It's a bug in how the inputs are set. We have a fix in this PR. If possible, could you use the main branch?
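For reference, a minimal sketch of switching an existing checkout to main (the submodule step is an assumption about a typical ExecuTorch setup, which vendors dependencies as git submodules):

```bash
cd executorch
git checkout main
git pull origin main
# Resync submodules after switching branches, since pinned versions may differ.
git submodule sync
git submodule update --init --recursive
```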
Yes, I have compiled it first.
Thank you! I will try it again.
BTW, if you are interested in Llama 3.2, we have provided this script (examples/qualcomm/oss_scripts/llama3_2/llama.py) to export and run it. To improve the user experience, we will integrate our Llama script as soon as possible.
Hi @shewu-quic, I am experiencing a very similar issue where the model does not respond, and in logcat I see the same error.

**Environment:**

**Steps Tried:**

**Request for Help:**
Could you please advise if there are any additional fixes or specific steps to resolve these issues? Thank you for your support! I appreciate any guidance you can provide.
Hi @michaelk77, thanks for trying.

Feel free to let me know if you need any further assistance!
Hi @shewu-quic, thank you for your response and clarification!

**Runtime Environment:**

**Updated Status:**

**PTE Generation:**
I generated the pte with the following command:

```bash
python examples/qualcomm/oss_scripts/llama3_2/llama.py \
  -b build-android \
  -m SM8475 \
  --checkpoint "consolidated.00.pth" \
  --params "original_params.json" \
  --ptq 16a4w \
  --model_size 1B \
  --tokenizer_model "tokenizer.model" \
  --prompt "what is 1+1" \
  --temperature 0 \
  --model_mode kv \
  --prefill_seq_len 32 \
  --kv_seq_len 128 \
  --compile_only
```

**Model Source:**
I am using the model files from Meta Llama 3.2 1B Instruct on Hugging Face.

If you need additional logs or further details, please let me know. I appreciate your assistance!
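With `--compile_only`, the script should stop after exporting the pte rather than running it on the device. A quick way to sanity-check that the artifact was produced (the output directory name here is an assumption, by analogy with the `llama2_qnn` directory used in the llama2 example above):

```bash
# Hypothetical output location; use whatever directory the script reports writing to.
ls -lh examples/qualcomm/oss_scripts/llama3_2/llama3_2_qnn/*.pte
```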
Thanks for your information. Could you please use the following command to run the pte?
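Presumably this was the same export command rerun with `--pre_gen_pte` pointing at the generated artifacts instead of `--compile_only` (reconstructed from the reply below; the path is a placeholder):

```bash
python examples/qualcomm/oss_scripts/llama3_2/llama.py \
  -b build-android \
  -m SM8475 \
  --checkpoint "consolidated.00.pth" \
  --params "original_params.json" \
  --ptq 16a4w \
  --model_size 1B \
  --tokenizer_model "tokenizer.model" \
  --prompt "what is 1+1" \
  --temperature 0 \
  --model_mode kv \
  --prefill_seq_len 32 \
  --kv_seq_len 128 \
  --pre_gen_pte ${path_to_your_pte_directory}
```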
Thank you for providing the command to run the PTE. I have executed it with a minor addition to specify my device using the `-s` flag:

```bash
python examples/qualcomm/oss_scripts/llama3_2/llama.py \
  -b build-android \
  -m SM8475 \
  --checkpoint "consolidated.00.pth" \
  --params "original_params.json" \
  --ptq 16a4w \
  --model_size 1B \
  --tokenizer_model "tokenizer.model" \
  --prompt "what is 1+1" \
  --temperature 0 \
  --model_mode kv \
  --prefill_seq_len 32 \
  --kv_seq_len 128 \
  --pre_gen_pte ${path_to_your_pte_directory} \
  -s <device-serial>  # my device code from ADB
```

**Observations:**
**Log Details:**
Here is the relevant portion of the logcat output during execution:

**Performance Stats:**

Could you let me know if there’s any misconfiguration or additional step I should take? Thank you for your assistance!
Hi @michaelk77, sorry for the late reply.
**Title:**
Error: input 3 is none when running Llama example in QNN ExecuTorch on Android

**Description:**
I followed the instructions in the [Llama2 README](https://github.com/pytorch/executorch/blob/main/examples/qualcomm/oss_scripts/llama2/README.md) to run the `llama.py` script using QNN ExecuTorch on Android. The execution fails with the error `input 3 is none`, and metadata seems to be read from the model twice during execution.

**Steps to Reproduce:**
1. Environment setup:
2. Run the following command:

```bash
python llama.py -b executorch/build-android -s 112dhb -m SM8650 \
  --ptq 16a4w --checkpoint stories110M.pt --params params.json \
  --tokenizer_model tokenizer.model --tokenizer_bin tokenizer.bin \
  --prompt "what is python?" \
  --pre_gen_pte executorch/examples/qualcomm/oss_scripts/llama2/llama2_qnn/
```
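For reference, the value passed to `-s` is the device serial; attached devices can be listed with:

```bash
adb devices  # prints the serial of each connected device
```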
**LOG:**
So I found there is no output in my output file.

adb logcat:
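One way to narrow the log to the relevant runtime messages (the grep pattern is a guess at useful tags, not an exact list):

```bash
# Dump the current log and keep lines that look related to the ExecuTorch/QNN runtime.
adb logcat -d | grep -iE "executorch|qnn|fastrpc"
```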
BTW, I also noticed some fastrpc errors (maybe because I don't have root):

I wonder if it is necessary to have root to deploy our model?