Translation example does not work #1346

Open
3 of 6 tasks
ShankarRIntel opened this issue Jan 4, 2025 · 5 comments
Assignees
Labels
aitce bug Something isn't working

Comments

@ShankarRIntel

Priority

P1-Stopper

OS type

Ubuntu

Hardware type

Gaudi2

Installation method

  • Pull docker images from hub.docker.com
  • Build docker images from source

Deploy method

  • Docker compose
  • Docker
  • Kubernetes
  • Helm

Running nodes

Single Node

What's the version?

1.1

Description

The Translation example does not work even though docker compose reports that all containers have started.

Reproduce steps

sratnesh@fm2r81s1gaudi:~/workspace/GenAIExamples/Translation/docker_compose/intel/hpu/gaudi$ docker compose up -d
[+] Running 6/6
 ✔ Network gaudi_default Created 0.2s
 ✔ Container tgi-gaudi-server Started 1.4s
 ✔ Container llm-tgi-gaudi-server Started 1.9s
 ✔ Container translation-gaudi-backend-server Started 2.3s
 ✔ Container translation-gaudi-ui-server Started 2.7s
 ✔ Container translation-gaudi-nginx-server Started 3.2s
sratnesh@fm2r81s1gaudi:~/workspace/GenAIExamples/Translation/docker_compose/intel/hpu/gaudi$ curl http://${host_ip}:8008/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":64, "do_sample": true}}' \
  -H 'Content-Type: application/json'
curl: (7) Failed to connect to 192.168.1.11 port 8008 after 0 ms: Connection refused
sratnesh@fm2r81s1gaudi:~/workspace/GenAIExamples/Translation/docker_compose/intel/hpu/gaudi$ curl http://${host_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"query":"Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"}' \
  -H 'Content-Type: application/json'
Internal Server Error
sratnesh@fm2r81s1gaudi:~/workspace/GenAIExamples/Translation/docker_compose/intel/hpu/gaudi$ curl http://${host_ip}:8888/v1/translation -H "Content-Type: application/json" -d '{
     "language_from": "Chinese","language_to": "English","source_language": "我爱机器翻译。"}'
sratnesh@fm2r81s1gaudi:~/workspace/GenAIExamples/Translation/docker_compose/intel/hpu/gaudi$ curl http://${host_ip}:${NGINX_PORT}/v1/translation \
    -H "Content-Type: application/json" \
    -d '{"language_from": "Chinese","language_to": "English","source_language": "我爱机器翻译。"}'
sratnesh@fm2r81s1gaudi:~/workspace/GenAIExamples/Translation/docker_compose/intel/hpu/gaudi$

Raw log

sratnesh@fm2r81s1gaudi:~/workspace/GenAIExamples/Translation/docker_compose/intel/hpu/gaudi$ docker compose up -d
[+] Running 6/6
 ✔ Network gaudi_default                       Created                                                                                                                                                              0.2s
 ✔ Container tgi-gaudi-server                  Started                                                                                                                                                              1.4s
 ✔ Container llm-tgi-gaudi-server              Started                                                                                                                                                              1.9s
 ✔ Container translation-gaudi-backend-server  Started                                                                                                                                                              2.3s
 ✔ Container translation-gaudi-ui-server       Started                                                                                                                                                              2.7s
 ✔ Container translation-gaudi-nginx-server    Started                                                                                                                                                              3.2s
sratnesh@fm2r81s1gaudi:~/workspace/GenAIExamples/Translation/docker_compose/intel/hpu/gaudi$ curl http://${host_ip}:8008/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":64, "do_sample": true}}' \
  -H 'Content-Type: application/json'
curl: (7) Failed to connect to 192.168.1.11 port 8008 after 0 ms: Connection refused
sratnesh@fm2r81s1gaudi:~/workspace/GenAIExamples/Translation/docker_compose/intel/hpu/gaudi$ curl http://${host_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"query":"Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"}' \
  -H 'Content-Type: application/json'
Internal Server Error
sratnesh@fm2r81s1gaudi:~/workspace/GenAIExamples/Translation/docker_compose/intel/hpu/gaudi$ curl http://${host_ip}:8888/v1/translation -H "Content-Type: application/json" -d '{
     "language_from": "Chinese","language_to": "English","source_language": "我爱机器翻译。"}'
sratnesh@fm2r81s1gaudi:~/workspace/GenAIExamples/Translation/docker_compose/intel/hpu/gaudi$ curl http://${host_ip}:${NGINX_PORT}/v1/translation \
    -H "Content-Type: application/json" \
    -d '{"language_from": "Chinese","language_to": "English","source_language": "我爱机器翻译。"}'
sratnesh@fm2r81s1gaudi:~/workspace/GenAIExamples/Translation/docker_compose/intel/hpu/gaudi$
@ShankarRIntel ShankarRIntel added the bug Something isn't working label Jan 4, 2025
@yinghu5 yinghu5 self-assigned this Jan 6, 2025
@yinghu5
Collaborator

yinghu5 commented Jan 6, 2025

Hi ShankarRIntel,

Thank you for reporting these issues! We will investigate them.

We are currently refactoring the code, which may have introduced some unknown issues. Judging from the error messages
"Failed to connect to 192.168.1.11 port 8008 after 0 ms: Connection refused" and
"Internal Server Error",
it seems that either the docker image or the network is broken.

Could you please help gather some diagnostic information by running the following commands:

  1. docker ps -a

  2. docker logs tgi-gaudi-server -t

Note that downloading the LLM model may take some time, as mentioned in
https://opea-project.github.io/latest/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/how_to_validate_service.html

If possible, please also check the OPEA version:

  3. git status

Regarding the environment: are you using Intel IDC or a private cluster? Please also try:

  4. hl-smi

  5. hostname -i
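
The checks requested above could be gathered in one pass with a small shell helper. This is only a sketch: the function names `run_and_log` and `collect_diagnostics` and the output file name are made up for illustration, while the commands themselves are the five listed in this comment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a header, then run one command and capture its
# output. A missing or failing command is reported but never aborts the run.
run_and_log() {
  echo "===== $* ====="
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>&1 || echo "(exit status $?)"
  else
    echo "(command not found: $1)"
  fi
  echo
}

# Runs the five checks requested in this thread, in the same order.
collect_diagnostics() {
  run_and_log docker ps -a
  run_and_log docker logs tgi-gaudi-server -t
  run_and_log git status
  run_and_log hl-smi
  run_and_log hostname -i
}

# Example (assumed file name), run from the GenAIExamples checkout:
#   collect_diagnostics > opea-diagnostics.txt
```

Redirecting everything into a single file makes it easier to attach the result to the issue.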

Thanks

@yinghu5
Collaborator

yinghu5 commented Jan 9, 2025

Hi @ShankarRIntel,

Are there any updates on your machine configuration (for example, network access for LLM downloads or the Gaudi driver) and on the examples?

For your reference, to show how the OPEA examples work, we have released some videos on the AI Software Catalog (https://aiswcatalog.intel.com/explore):
• Code Generation: Intel® AI SW Catalog
• Audio Q&A: Intel® AI SW Catalog
• Visual Q&A: Intel® AI SW Catalog
• Code Translation: Intel® AI SW Catalog
• Content Summarization: Intel® AI SW Catalog
• Chat Q&A: https://aiswcatalog.intel.com/solution/aimodel-18c8e8c2-31d1-4af4-b0a2-ab8613a3c3b9
FAQ Gen and Multi-Lingual QnA will be available soon.

Thanks

@yinghu5 yinghu5 added the aitce label Jan 10, 2025
@ShankarRIntel
Author

ShankarRIntel commented Jan 13, 2025 via email

@yinghu5
Collaborator

yinghu5 commented Jan 13, 2025

Hi ShankarRIntel,

I will contact you via Teams so we can try them together. Thank you!

@xiguiw
Collaborator

xiguiw commented Jan 14, 2025

@ShankarRIntel

I built the docker images from source and ran Translation successfully on Gaudi2 1.19.
I cannot reproduce your issue on my side.

Did you pull the 1.1 docker images, or did you build the docker images from source?
Could you please provide detailed instructions to reproduce this issue?
For example: the documentation link you followed, your code commit IDs (GenAIExamples and GenAIComps), the steps used to set up the environment and to set the environment variables, the http_proxy used in your network, and so on.

Please also provide the logs. Thanks!

From your logs, the docker containers started within seconds.

✔ Network gaudi_default Created 0.2s
✔ Container tgi-gaudi-server Started 1.4s
✔ Container llm-tgi-gaudi-server Started 1.9s
✔ Container translation-gaudi-backend-server Started 2.3s
✔ Container translation-gaudi-ui-server Started 2.7s
✔ Container translation-gaudi-nginx-server Started 3.2s

"curl: (7) Failed to connect to 192.168.1.11 port 8008 after 0 ms: Connection refused"

  1. The server had not started yet when you sent the first curl command.
  2. The next request returned "Internal Server Error".
    The LLM cannot be ready within seconds, because it is a 13B model and the torch weights must first be converted to safetensors.
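
One way to avoid probing too early is to poll the TGI endpoint until it answers before running the other requests. This is a minimal sketch, assuming the same `${host_ip}` and port 8008 used in the report. `wait_until` and `tgi_generate_ok` are hypothetical helper names, and the 30-minute limit in the example is an arbitrary choice, not a documented figure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a probe command until it succeeds or the timeout
# (in seconds) expires. Returns 0 on success and 1 on timeout.
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
}

# Probe that reuses the /generate request from the issue report.
# -f makes curl return non-zero on HTTP errors; connection refused also fails.
tgi_generate_ok() {
  curl -s -f -o /dev/null --max-time 30 "http://${host_ip}:8008/generate" \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":64, "do_sample": true}}' \
    -H 'Content-Type: application/json'
}

# Example (not run here): probe every 10 s for up to 30 minutes.
#   wait_until 1800 10 tgi_generate_ok && echo "TGI is ready"
```

Only after this probe succeeds is it meaningful to test ports 9000 and 8888 and the nginx port, since those services depend on the TGI server.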

Here is my test environment:

+-----------------------------------------------------------------------------+
| HL-SMI Version:                              hl-1.19.0-fw-57.1.0.0          |
| Driver Version:                                     1.19.0-2427ed8          |
|-------------------------------+----------------------+----------------------+
| AIP  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncor-Events|
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | AIP-Util  Compute M. |
|===============================+======================+======================|
|   0  HL-225              N/A  | 0000:33:00.0     N/A |                 141  |
| N/A   26C   N/A  71W /  600W  |   768MiB /  98304MiB |     0%           N/A |
|-------------------------------+----------------------+----------------------+
|   1  HL-225              N/A  | 0000:9a:00.0     N/A |                  18  |
| N/A   27C   N/A  90W /  600W  |   768MiB /  98304MiB |     0%           N/A |
|-------------------------------+----------------------+----------------------+
...
|-------------------------------+----------------------+----------------------+
|   7  HL-225              N/A  | 0000:b4:00.0     N/A |                   2  |
| N/A   26C   N/A  80W /  600W  |   768MiB /  98304MiB |     0%           N/A |
|-------------------------------+----------------------+----------------------+
| Compute Processes:                                               AIP Memory |
|  AIP       PID   Type   Process name                             Usage      |
|=============================================================================|
|   0        N/A   N/A    N/A                                      N/A        |
|   1        N/A   N/A    N/A                                      N/A        |
...
|   7        N/A   N/A    N/A                                      N/A        |
+=============================================================================+
