Implementation of LLM-Assisted Translations Backend #3100

Merged 5 commits on Feb 26, 2024

Contributor

@ayanaar ayanaar commented Feb 15, 2024

Addresses Issue #3068

This PR introduces the backend infrastructure for LLM-Assisted Translations in Pontoon. The implementation is divided into two components:

  1. openai_service.py: This module serves as the interface to the OpenAI API, using the gpt-4-0125-preview model. It defines prompts for three translation refinements—informal, formal, and alternative—tailoring the output to meet specific stylistic or formal requirements.

  2. refine_translation.py: A Django management command that facilitates user interaction with the translation refinement feature directly from the command line. Users can input an English source string along with a machine translation in any given language and specify the desired refinement type. The command returns a refined translation.

This backend setup allows users to use the text generation capabilities of GPT-4 for enhancing translation quality and adaptability.

Usage Instructions:

To utilize the translation refinement feature, execute the following command in the terminal, substituting <characteristic> with informal, formal, or alternative based on the required refinement type:

python manage.py refine_translation <characteristic> "English text" "Translated text" "locale_name"
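
The four positional arguments above map directly onto argparse, which is what Django hands to a command's `add_arguments` hook. A minimal standalone sketch, assuming the argument names `characteristic`, `english_text`, `target_text` and `language_name` (the last is a guess based on the review discussion, not confirmed by the merged code):

```python
import argparse

CHARACTERISTICS = ("informal", "formal", "alternative")


def build_parser() -> argparse.ArgumentParser:
    # Stand-in for the parser Django passes to Command.add_arguments().
    parser = argparse.ArgumentParser(prog="refine_translation")
    parser.add_argument(
        "characteristic",
        choices=CHARACTERISTICS,  # rejects anything outside the three refinement types
        help="Refinement type to apply",
    )
    parser.add_argument("english_text", type=str, help="The original English text")
    parser.add_argument(
        "target_text", type=str, help="The machine-generated translation to refine"
    )
    # Hypothetical argument name; the PR discussion uses locale_name / language.
    parser.add_argument("language_name", type=str, help="Target language, e.g. Slovenian")
    return parser


args = build_parser().parse_args(
    ["formal", "We appreciate your feedback.", "Vaše povratne informacije cenimo.", "Slovenian"]
)
print(args.characteristic, args.language_name)
```

Using `choices` means an unsupported refinement type fails at argument parsing, before any request is sent.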

Examples:

For an Alternative Translation:

python manage.py refine_translation alternative 'Email resent. Add { $accountsEmail } to your contacts to ensure smooth delivery.' 'E-pošta ponovno poslana. Dodajte { $accountsEmail } med svoje stike in si zagotovite nemoteno dostavo.' "Slovenian"

For an Informal Translation:

python manage.py refine_translation informal "Your subscription has been confirmed." "Vaša naročnina je bila potrjena." "Slovenian"

For a Formal Translation:

python manage.py refine_translation formal "We appreciate your feedback." "Vaše povratne informacije cenimo." "Slovenian"

Handling Placeholders and Special Characters:

When entering commands, it's essential to handle placeholders ({ placeholder }) and special characters correctly:

  • Placeholders: Enclose any text containing placeholders in single quotes. Inside double quotes the shell still expands $name, so { $accountsEmail } would lose its variable name before it reaches the command.
  • Special Characters: Also use single quotes ' around texts containing other special characters. If your text includes single quotes, escape them using '\''.

Example with Placeholders:

python manage.py refine_translation alternative 'Please update your profile here: { $profileUrl }' 'Prosimo, posodobite svoj profil tukaj: { $profileUrl }' "Slovenian"
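
When the command line is assembled by a script rather than typed by hand, the quoting rules above can be delegated to Python's standard `shlex.quote`. This is only a sketch: `build_command` is a hypothetical helper, not part of the PR.

```python
import shlex


def build_command(characteristic, english_text, target_text, language_name):
    """Return a POSIX-shell-safe command line for refine_translation."""
    parts = [
        "python",
        "manage.py",
        "refine_translation",
        characteristic,
        english_text,
        target_text,
        language_name,
    ]
    # shlex.quote wraps each argument in single quotes when needed and
    # escapes embedded single quotes, so $placeholders are not expanded.
    return " ".join(shlex.quote(part) for part in parts)


cmd = build_command(
    "alternative",
    "Don't forget to update { $profileUrl }",
    "Ne pozabite posodobiti { $profileUrl }",
    "Slovenian",
)
print(cmd)
```

Round-tripping the result through `shlex.split` returns the original arguments, which is a quick way to confirm nothing was lost to quoting.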

Next Steps:

The current backend implementation is the first phase of the LLM-Assisted Translations project. Future developments will focus on integrating this backend with the front-end interface, enabling input and output handling for translations across various locales supported by the feature.

@ayanaar ayanaar requested a review from mathjazz February 15, 2024 21:12


class Command(BaseCommand):
    help = "Refines machine translations using OpenAI with specified formality"
Collaborator

It's hard to fit "alternative" into a formality concept. Maybe just "specified characteristics"?

Contributor Author

Yes, that makes more sense.

)
parser.add_argument("english_text", type=str, help="The original English text")
parser.add_argument(
    "slovenian_text",
Collaborator

We want to support more than Slovenian, this code should probably start locale agnostic?

# Define system messages for different translation tasks
system_messages = {
"informal": """You will be provided with a passage in English, along with its machine-generated translation in Slovenian. Your objective is to revise the Slovenian translation to ensure it utilizes simpler language. Please adhere to the following guidelines to achieve this:
- Simplify Vocabulary: Use common, everyday words that are easily understood by a broad audience, avoiding technical jargon, idiomatic expressions, and complex terms.
Collaborator

This feels too much? Why should it avoid idiomatic expressions?

Contributor Author

I was trying to make the text more accessible to a broader audience who may not understand specific idiomatic expressions because of the cultural nuances associated with them. But I see your point, the translation may be oversimplified.

- Shorten and Simplify Sentences: Break down long, complex sentences into shorter, more manageable ones. Aim for clarity and conciseness in each sentence.
- Use Basic Grammar Structures: Avoid complex grammatical constructions. Stick to simple tenses and straightforward sentence structures.
- Clarity is Key: Ensure the translation conveys the original message in the clearest possible manner, without ambiguity or unnecessary complexity.
- Engage a Wide Audience: The simplified translation should be accessible and easily understandable to people with varying levels of Slovenian proficiency, including young readers and non-native speakers.
Collaborator

Similarly: do we really want that simple of a language?

- Consistent Simplicity: Maintain a consistent level of simplicity throughout the translation. The aim is to make the text as accessible as possible without sacrificing accuracy or meaning.
The goal is to produce a translation that accurately reflects the original English text, but in a way that is more approachable and easier to understand for all Slovenian speakers, regardless of their language proficiency.""",
"formal": """You will be presented with text in English, accompanied by its machine-generated translation in Slovenian. Your task is to refine the Slovenian translation to ensure it adheres to a higher level of formality. In doing so, consider the following guidelines:
- Upgrade Language Use: Elevate the language by selecting more sophisticated vocabulary and phrases that are appropriate for formal contexts.
Collaborator

I'd be tempted to drop this. One thing is to ask for formal structure ("Sie" vs "Du" in German), but this is going one step further, asking for super-formal vocabulary.

Contributor Author

I agree. Based on the feedback received from the prompt evaluation I think this guideline may have caused the output to be extremely formal.

Collaborator

@mathjazz mathjazz left a comment

Nice work with the patch and thorough instructions in the PR!

Please de-hard-code Slovenian from openai_connection.py. For English it is OK to stay hard-coded.

Please also rebase against main in order to resolve conflict in requirements.

Collaborator

mathjazz commented Feb 22, 2024

We just landed the python upgrade, so please rebase again if you already did.

Note that typing-extensions is no longer a required black dependency in Python 3.10+, which should solve the problem of installing openai. You can remove the black upgrade in the lint.in file and run make requirements again.

Collaborator

@mathjazz mathjazz left a comment

Nice work!

Please make sure you validate the target language.

I also left some notes for stylistic improvements, but nothing major.



class Command(BaseCommand):
    help = "Refines machine translations using OpenAI with specified characteristics"
Collaborator

Suggested change
- help = "Refines machine translations using OpenAI with specified characteristics"
+ help = "Refines machine translations using OpenAI ChatGPT with specified characteristics"

    "target_text", type=str, help="The machine-generated translation to refine"
)
parser.add_argument(
    "locale_name",
Collaborator

We're actually using language, not locale, so I'd replace all locale* variables with language*.

        self.client = OpenAI()

    def get_translation(
        self, english_text, translated_text, characteristic, target_locale_name
Collaborator

We should raise a ValueError if target_locale_name (should be target_language_name) doesn't exist in the pontoon.base.models.Locale table.
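
A minimal sketch of the requested check, kept free of Django so it runs standalone. `known_languages` is a hypothetical stand-in for the names that would presumably be read from `pontoon.base.models.Locale`; how the merged code performs the lookup is not shown in this thread.

```python
def validate_target_language(target_language_name, known_languages):
    """Return the name unchanged, or raise ValueError if it is not known."""
    if target_language_name not in known_languages:
        raise ValueError(
            f"The target language '{target_language_name}' is not supported."
        )
    return target_language_name


# Hypothetical sample data; in Pontoon these names would come from the database.
known = {"Slovenian", "German", "French"}
print(validate_target_language("Slovenian", known))
```

Failing before the prompt is built keeps a mistyped language name from being sent to the model as if it were valid.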

def get_translation(
    self, english_text, translated_text, characteristic, target_locale_name
):
    system_messages = {
Collaborator

I find this hard to read. Please declare each of the three characteristics as variables and then use them like this:

system_messages = {
    "informal": informal,
    "formal": formal,
    "alternative": alternative,
}

- Clarity is Key: Ensure the translation conveys the original message in the clearest possible manner, without ambiguity or unnecessary complexity.
- Consistent Simplicity: Maintain a consistent level of simplicity throughout the translation.
The goal is to produce a translation that accurately reflects the original English text, but in a way that is more approachable and easier to understand for all {target_locale_name} speakers, regardless of their language proficiency.""",
"formal": f"""You will be presented with text in English, accompanied by its machine-generated translation in {target_locale_name}. Your task is to refine the {target_locale_name} translation to ensure it adheres to a higher level of formality. In doing so, consider the following guidelines:
Collaborator

Why is this part of the message rephrased from the one used for informal in line 12? Can't we use the same message and only modify the vital part: "utilizes simpler language" vs "adheres to a higher level of formality"?

Contributor Author

Are you suggesting to remove the specific guidelines in the formal prompt? Like the "Adjust the tone" and "Formal addressing"?

Collaborator

No, I'm only referring to the differences between lines 12 and 16:

  • provided with vs presented with
  • along with vs accompanied by
  • Your objective vs Your task
  • to revise the translation vs to refine the {target_locale_name} translation
  • Please adhere to the following guidelines to achieve this vs In doing so, consider the following guidelines
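
One way to keep the wording identical across characteristics is to share a single opening template and vary only the goal phrase. The sketch below is an illustration, not the merged code: `INTRO`, `GOALS` and `build_intro` are hypothetical names, and the per-characteristic guideline lists are left out.

```python
# Shared opening; only {goal} differs between characteristics.
INTRO = (
    "You will be provided with text in English, along with its "
    "machine-generated translation in {language}. Your objective is to "
    "revise the {language} translation to ensure it {goal}. "
    "Please adhere to the following guidelines to achieve this:\n"
)

# Goal phrases quoted from the review comment above.
GOALS = {
    "informal": "utilizes simpler language",
    "formal": "adheres to a higher level of formality",
}


def build_intro(characteristic, target_language):
    """Fill the shared template for one characteristic and language."""
    return INTRO.format(language=target_language, goal=GOALS[characteristic])


print(build_intro("formal", "Slovenian"))
```

With this layout, a wording change to the opening is made once and cannot drift between the informal and formal prompts.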

Comment on lines 45 to 51
# Extract the content attribute from the response
translation = (
    response.choices[0].message.content
    if response.choices[0].message.content
    else ""
)
return translation.strip()
Collaborator

Suggested change
- # Extract the content attribute from the response
- translation = (
-     response.choices[0].message.content
-     if response.choices[0].message.content
-     else ""
- )
- return translation.strip()
+ # Extract the content attribute from the response
+ translation = response.choices[0].message.content or ""
+ return translation.strip()

Collaborator

Actually, can we just use this?

return response.choices[0].message.content.strip()

In other words, is it possible that response.choices[0].message.content is not a string (e.g. None)?

Contributor Author

Yes this works as well
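
On the `None` question: in the OpenAI Python SDK's v1 type definitions, `message.content` is annotated as optional, so a guard costs little even if plain chat completions rarely return `None` in practice. A sketch using stand-in objects rather than a real API response (`extract_translation` and `fake_response` are hypothetical helpers):

```python
from types import SimpleNamespace


def extract_translation(response):
    """Return the stripped message content, or an empty string if it is None."""
    content = response.choices[0].message.content
    return (content or "").strip()


def fake_response(content):
    # Mimics only the attribute path used above; not a real SDK object.
    message = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


print(repr(extract_translation(fake_response("  Hvala za povratne informacije.  "))))
print(repr(extract_translation(fake_response(None))))
```

The unguarded `response.choices[0].message.content.strip()` would raise `AttributeError` on the second call.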

@ayanaar ayanaar requested a review from mathjazz February 26, 2024 19:13
Collaborator

@mathjazz mathjazz left a comment

Thanks for the update. Almost there!

@@ -0,0 +1,37 @@
from django.core.management.base import BaseCommand
from pontoon.machinery.gpt.openai_connection import OpenAIService
Collaborator

I would drop the gpt folder and move the file a level up. I would also rename the file to openai_service.py to match the class name (OpenAIService).


formal = (
    f"You will be provided with text in English, along with its machine-generated translation in {target_language}. "
    "Your objective is to revise the {target_language} translation to ensure it adheres to a higher level of formality. Adhere to the following guidelines to achieve this:\n"
Collaborator

The verb adhere is now used twice. Maybe use "utilize" in the first case?
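
One more thing worth checking in the snippet as quoted: only the first literal carries the `f` prefix. Python concatenates adjacent string literals, but each one is formatted on its own, so if the second line really lacks the prefix, `{target_language}` would reach the model verbatim. A small demonstration with shortened, hypothetical prompt text:

```python
target_language = "Slovenian"

# Second literal has no f prefix, so its placeholder is left untouched.
broken = (
    f"Machine-generated translation in {target_language}. "
    "Revise the {target_language} translation."
)

# Every literal that contains a placeholder needs its own f prefix.
fixed = (
    f"Machine-generated translation in {target_language}. "
    f"Revise the {target_language} translation."
)

print(broken)
print(fixed)
```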

)

alternative = (
    f"You will provided text in English along with its machine-generated translation in {target_language}. "
Collaborator

"You will be provided with text in English, along with its machine-generated translation in {target_language}. "


alternative = (
    f"You will provided text in English along with its machine-generated translation in {target_language}. "
    "Your objective is to provide an alternative translation, Adhere to the following guidelines to achieve this:\n"
Collaborator

Replace comma before "Adhere" with a period.

@ayanaar ayanaar requested a review from mathjazz February 26, 2024 19:50
Collaborator

@mathjazz mathjazz left a comment

Great work!

@mathjazz mathjazz merged commit 3e03b17 into mozilla:main Feb 26, 2024
4 checks passed