Implementation of LLM-Assisted Translations Backend #3100
class Command(BaseCommand):
    help = "Refines machine translations using OpenAI with specified formality"
It's hard to fit "alternative" into a formality concept. Maybe just "specified characteristics"?
Yes, that makes more sense.
)
parser.add_argument("english_text", type=str, help="The original English text")
parser.add_argument(
    "slovenian_text",
We want to support more than Slovenian; shouldn't this code be locale-agnostic from the start?
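A locale-agnostic prompt can take the target language as a parameter instead of embedding "Slovenian" in the string. A minimal sketch, assuming a plain function (the name build_informal_prompt is hypothetical, not from the PR) and reproducing only the opening sentences of the prompt:

```python
def build_informal_prompt(target_language: str) -> str:
    """Return an (abbreviated) informal-style system message for any language.

    Hypothetical helper: only the opening sentences of the PR's prompt are
    reproduced; the guideline bullets would follow unchanged.
    """
    return (
        "You will be provided with a passage in English, along with its "
        f"machine-generated translation in {target_language}. "
        f"Your objective is to revise the {target_language} translation "
        "to ensure it utilizes simpler language."
    )


if __name__ == "__main__":
    for language in ("Slovenian", "German", "Japanese"):
        print(build_informal_prompt(language))
```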
pontoon/gpt/openai_connection.py
# Define system messages for different translation tasks
system_messages = {
    "informal": """You will be provided with a passage in English, along with its machine-generated translation in Slovenian. Your objective is to revise the Slovenian translation to ensure it utilizes simpler language. Please adhere to the following guidelines to achieve this:
- Simplify Vocabulary: Use common, everyday words that are easily understood by a broad audience, avoiding technical jargon, idiomatic expressions, and complex terms.
This feels like too much. Why should it avoid idiomatic expressions?
I was trying to make the text more accessible to a broader audience who may not understand specific idiomatic expressions because of the cultural nuances associated with them. But I see your point; the translation may be oversimplified.
pontoon/gpt/openai_connection.py
- Shorten and Simplify Sentences: Break down long, complex sentences into shorter, more manageable ones. Aim for clarity and conciseness in each sentence.
- Use Basic Grammar Structures: Avoid complex grammatical constructions. Stick to simple tenses and straightforward sentence structures.
- Clarity is Key: Ensure the translation conveys the original message in the clearest possible manner, without ambiguity or unnecessary complexity.
- Engage a Wide Audience: The simplified translation should be accessible and easily understandable to people with varying levels of Slovenian proficiency, including young readers and non-native speakers.
Similarly: do we really want language that simple?
pontoon/gpt/openai_connection.py
- Consistent Simplicity: Maintain a consistent level of simplicity throughout the translation. The aim is to make the text as accessible as possible without sacrificing accuracy or meaning.
The goal is to produce a translation that accurately reflects the original English text, but in a way that is more approachable and easier to understand for all Slovenian speakers, regardless of their language proficiency.""",
    "formal": """You will be presented with text in English, accompanied by its machine-generated translation in Slovenian. Your task is to refine the Slovenian translation to ensure it adheres to a higher level of formality. In doing so, consider the following guidelines:
- Upgrade Language Use: Elevate the language by selecting more sophisticated vocabulary and phrases that are appropriate for formal contexts.
I'd be tempted to drop this. It's one thing to ask for formal structure ("Sie" vs "Du" in German), but this goes one step further, asking for super-formal vocabulary.
I agree. Based on the feedback received from the prompt evaluation, I think this guideline may have caused the output to be extremely formal.
Nice work with the patch and the thorough instructions in the PR!
Please remove the hard-coded Slovenian from openai_connection.py. English can stay hard-coded.
Please also rebase against main to resolve the conflict in requirements.
We just landed the Python upgrade, so please rebase again if you already did. Note that
Branch updated from 662cfef to 83dde9f, then to 5f223e6, then to ad93cf0.
Nice work!
Please make sure you validate the target language.
I also left some notes for stylistic improvements, but nothing major.
class Command(BaseCommand):
    help = "Refines machine translations using OpenAI with specified characteristics"
Suggested change:
    help = "Refines machine translations using OpenAI ChatGPT with specified characteristics"
    "target_text", type=str, help="The machine-generated translation to refine"
)
parser.add_argument(
    "locale_name",
We're actually using language, not locale, so I'd replace all locale* variables with language*.
pontoon/gpt/openai_connection.py
self.client = OpenAI()

def get_translation(
    self, english_text, translated_text, characteristic, target_locale_name
We should raise a ValueError if target_locale_name (should be target_language_name) doesn't exist in the pontoon.base.models.Locale table.
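The requested check can be sketched as a small guard that runs before any prompt is built. This is a hypothetical helper (validate_target_language is not from the PR); to keep it runnable without Django, the set of valid names is passed in as any container, standing in for the Locale table:

```python
def validate_target_language(target_language, known_languages):
    """Return target_language unchanged, or raise ValueError if it is unknown.

    `known_languages` is any container of language names. In Pontoon it would
    presumably be backed by the pontoon.base.models.Locale table (assumption).
    """
    if target_language not in known_languages:
        raise ValueError(f"Unknown target language: {target_language!r}")
    return target_language


if __name__ == "__main__":
    known = {"Slovenian", "German", "French"}
    print(validate_target_language("Slovenian", known))
    try:
        validate_target_language("Klingon", known)
    except ValueError as error:
        print(error)
```

With Django available, the membership test could become a query such as Locale.objects.filter(name=target_language).exists(), assuming the Locale name field stores the language name in the same form it is passed on the command line.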
pontoon/gpt/openai_connection.py
def get_translation(
    self, english_text, translated_text, characteristic, target_locale_name
):
    system_messages = {
I find this hard to read. Please declare each of the three characteristics as variables and then use them like this:
system_messages = {
"informal": informal,
"formal": formal,
"alternative": alternative,
}
pontoon/gpt/openai_connection.py
- Clarity is Key: Ensure the translation conveys the original message in the clearest possible manner, without ambiguity or unnecessary complexity.
- Consistent Simplicity: Maintain a consistent level of simplicity throughout the translation.
The goal is to produce a translation that accurately reflects the original English text, but in a way that is more approachable and easier to understand for all {target_locale_name} speakers, regardless of their language proficiency.""",
    "formal": f"""You will be presented with text in English, accompanied by its machine-generated translation in {target_locale_name}. Your task is to refine the {target_locale_name} translation to ensure it adheres to a higher level of formality. In doing so, consider the following guidelines:
Why is this part of the message rephrased from the one used for informal in line 12? Can't we use the same message and only modify the vital part: "utilizes simpler language" vs "adheres to a higher level of formality"?
Are you suggesting to remove the specific guidelines in the formal prompt? Like the "Adjust the tone" and "Formal addressing"?
No, I'm only referring to the differences between lines 12 and 16:
- provided with vs presented with
- along with vs accompanied by
- Your objective vs Your task
- to revise the translation vs to refine the {target_locale_name} translation
- Please adhere to the following guidelines to achieve this vs In doing so, consider the following guidelines
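The reviewer's point can be sketched as one shared preamble plus a per-characteristic objective sentence, so the three prompts differ only in their vital part and can still be collected into a system_messages dict keyed by characteristic. This is a hypothetical restructuring (PREAMBLE, OBJECTIVES, GUIDELINES_INTRO and build_system_messages are not from the PR); the guideline bullet lists are passed in rather than reproduced, and the closing instruction is reworded to avoid repeating "adhere":

```python
# Hypothetical sketch: one shared preamble, one objective per characteristic.
PREAMBLE = (
    "You will be provided with text in English, along with its "
    "machine-generated translation in {language}. "
)

OBJECTIVES = {
    "informal": (
        "Your objective is to revise the {language} translation "
        "to ensure it utilizes simpler language. "
    ),
    "formal": (
        "Your objective is to revise the {language} translation "
        "to ensure it adheres to a higher level of formality. "
    ),
    "alternative": "Your objective is to provide an alternative translation. ",
}

GUIDELINES_INTRO = "Follow these guidelines to achieve this:\n"


def build_system_messages(target_language, guidelines):
    """Return {characteristic: system message} for one target language.

    `guidelines` maps each characteristic to its bullet-list text.
    """
    preamble = PREAMBLE.format(language=target_language)
    return {
        name: preamble
        + objective.format(language=target_language)
        + GUIDELINES_INTRO
        + guidelines[name]
        for name, objective in OBJECTIVES.items()
    }


if __name__ == "__main__":
    messages = build_system_messages(
        "German",
        {
            "informal": "- Simplify vocabulary.",
            "formal": "- Use formal addressing.",
            "alternative": "- Keep the original meaning.",
        },
    )
    print(messages["formal"])
```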
pontoon/gpt/openai_connection.py
# Extract the content attribute from the response
translation = (
    response.choices[0].message.content
    if response.choices[0].message.content
    else ""
)
return translation.strip()
Suggested change:
# Extract the content attribute from the response
translation = response.choices[0].message.content or ""
return translation.strip()
Actually, can we just use this?
return response.choices[0].message.content.strip()
In other words, is it possible that response.choices[0].message.content is not a string (e.g. None)?
Yes, this works as well.
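Per the type annotations in the openai Python SDK (v1.x), ChatCompletionMessage.content is Optional[str], so it can be None, for example when the model answers with a tool call instead of text. A defensive extraction along the lines of the 'or ""' suggestion is sketched below; types.SimpleNamespace stands in for the SDK's response objects, and fake_response is a hypothetical test helper:

```python
from types import SimpleNamespace


def extract_translation(response) -> str:
    """Return the first choice's text, or "" if the model returned no content."""
    content = response.choices[0].message.content
    # `content` may be None, so fall back to "" before calling .strip().
    return (content or "").strip()


def fake_response(content):
    """Build a minimal stand-in for a chat completion response (testing only)."""
    message = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


if __name__ == "__main__":
    print(repr(extract_translation(fake_response("  Zdravo!  "))))  # 'Zdravo!'
    print(repr(extract_translation(fake_response(None))))  # ''
```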
Thanks for the update. Almost there!
@@ -0,0 +1,37 @@
from django.core.management.base import BaseCommand
from pontoon.machinery.gpt.openai_connection import OpenAIService
I would drop the gpt folder and move the file a level up. I would also rename the file to openai_service.py to match the class name (OpenAIService).
formal = (
    f"You will be provided with text in English, along with its machine-generated translation in {target_language}. "
    "Your objective is to revise the {target_language} translation to ensure it adheres to a higher level of formality. Adhere to the following guidelines to achieve this:\n"
The verb adhere is now used twice. Maybe use "utilize" in the first case?
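In the excerpt above, only the first of the implicitly concatenated literals carries the f prefix, so the {target_language} in the second literal would be sent to the model verbatim, braces included. Python applies the prefix per literal, not to the concatenated result, as this small demonstration shows:

```python
target_language = "Slovenian"

# Only the first literal is an f-string; the second keeps its braces verbatim.
mixed = (
    f"Translation in {target_language}. "
    "Revise the {target_language} translation."
)

# Prefixing every literal that contains a placeholder interpolates both.
fixed = (
    f"Translation in {target_language}. "
    f"Revise the {target_language} translation."
)

if __name__ == "__main__":
    print(mixed)  # Translation in Slovenian. Revise the {target_language} translation.
    print(fixed)  # Translation in Slovenian. Revise the Slovenian translation.
```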
)

alternative = (
    f"You will provided text in English along with its machine-generated translation in {target_language}. "
"You will be provided with text in English, along with its machine-generated translation in {target_language}. "
alternative = (
    f"You will provided text in English along with its machine-generated translation in {target_language}. "
    "Your objective is to provide an alternative translation, Adhere to the following guidelines to achieve this:\n"
Replace comma before "Adhere" with a period.
Great work!
Addresses Issue #3068
This PR introduces the backend infrastructure for the LLM-Assisted Translations project in Pontoon. The implementation is divided into two components:
- openai_service.py: This module serves as the interface to the OpenAI API, using the gpt-4-0125-preview model. It defines prompts for three translation refinements (informal, formal, and alternative), tailoring the output to meet specific stylistic or formal requirements.
- refine_translation.py: A Django management command that lets users run the translation refinement feature directly from the command line. Users input an English source string along with a machine translation in any given language and specify the desired refinement type. The command returns a refined translation.

This backend setup lets users apply the text generation capabilities of GPT-4 to improve translation quality and adaptability.
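Django's BaseCommand passes an argparse parser to add_arguments(), so the command's interface can be previewed with plain argparse. The sketch below is hypothetical: english_text and target_text come from the review excerpts, the language argument is named language_name following the locale-to-language rename requested in review, and the positional order and the use of choices for characteristic are assumptions.

```python
import argparse

# Refinement types named in the PR description.
CHARACTERISTICS = ("informal", "formal", "alternative")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the command's add_arguments(); order is assumed."""
    parser = argparse.ArgumentParser(
        description="Refines machine translations using OpenAI ChatGPT "
        "with specified characteristics"
    )
    parser.add_argument("english_text", type=str, help="The original English text")
    parser.add_argument(
        "target_text", type=str, help="The machine-generated translation to refine"
    )
    parser.add_argument(
        "language_name", type=str, help="Name of the target language, e.g. Slovenian"
    )
    # Restricting to known values makes argparse reject typos with a usage error.
    parser.add_argument(
        "characteristic", choices=CHARACTERISTICS, help="The refinement to apply"
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["Hello, world!", "Pozdravljen, svet!", "Slovenian", "informal"]
    )
    print(vars(args))
```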
Usage Instructions:
To use the translation refinement feature, execute the following command in the terminal, substituting <characteristic> with informal, formal, or alternative based on the required refinement type:
Examples:
For an Alternative Translation:
For an Informal Translation:
For a Formal Translation:
Handling Placeholders and Special Characters:
When entering commands, it's essential to handle placeholders ({ placeholder }) and special characters correctly: use single quotes (') around texts containing special characters or when placeholders are used. If your text includes single quotes, escape them using '\''.
Example with Placeholders:
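The '\'' sequence ends the single-quoted string, adds an escaped quote, and reopens the quotes. As a sketch (not the PR's own example), the same rule can be applied in Python when assembling the command line; the helper name sh_single_quote is hypothetical, and the standard library's shlex.quote gives an equivalent result using a different escape sequence.

```python
import shlex


def sh_single_quote(text: str) -> str:
    """Wrap text in single quotes, escaping embedded quotes as '\\''."""
    return "'" + text.replace("'", "'\\''") + "'"


if __name__ == "__main__":
    source = "It's { placeholder } time"
    quoted = sh_single_quote(source)
    print(quoted)  # 'It'\''s { placeholder } time'
    # shlex.split parses like a POSIX shell, recovering the original string.
    print(shlex.split(quoted) == [source])  # True
```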
Next Steps:
The current backend implementation is the first phase of the LLM-Assisted Translations project. Future developments will focus on integrating this backend with the front-end interface, enabling input and output handling for translations across various locales supported by the feature.