
Gemma Model Storing and Loading after Fine tuning #67

Open

Danish202Gupta

Hi there, I encountered a strange bug when trying to load the gemma-2b model using KerasNLP.

My fine-tuning code is the following:

```python
def fine_tune(self, X, y):
    data = generate_training_prompts(X, y)

    # Enable LoRA fine-tuning
    self.model.backbone.enable_lora(rank=self.config['lora_rank'])

    # Reduce the input sequence length to limit memory usage
    self.model.preprocessor.sequence_length = self.config['tokenization_max_length']

    # Use AdamW (a common optimizer for transformer models)
    optimizer = keras.optimizers.AdamW(
        learning_rate=self.config['learning_rate'],
        weight_decay=self.config['weight_decay'],
    )

    # Exclude layernorm and bias terms from decay
    optimizer.exclude_from_weight_decay(var_names=["bias", "scale"])

    self.model.compile(
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=optimizer,
        weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
        sampler=self.config['sampler'],
    )

    self.model.fit(data, epochs=self.config['epochs'], batch_size=self.config['batch_size'])

    # Define the directory name
    fine_tuned_dir_name = f'fine_tuned_{self.config["basemodel"]}_{datetime.now().strftime("%Y%m%d_%H%M%S")}'
    fine_tuned_dir_path = os.path.join('models', fine_tuned_dir_name)

    # Create the directory if it doesn't exist
    os.makedirs(fine_tuned_dir_path, exist_ok=True)

    # Save the model in the directory under a specific name
    # (note: despite the filename, .save() writes the full model, not just weights)
    weights_file_path = os.path.join(fine_tuned_dir_path, 'weights.keras')
    self.model.save(weights_file_path)

    # Save model configuration within the same directory
    model_config = create_model_config(self.config, np.unique(y).tolist())
    config_filename = os.path.join(fine_tuned_dir_path, 'model_config.json')
    with open(config_filename, 'w') as json_file:
        json.dump(model_config, json_file, indent=4)

    # Push model weights and config to wandb
    # Note: you may need to adjust this depending on how wandb expects files to be saved
    wandb.save(os.path.join(fine_tuned_dir_path, '*'))
```

The training completes as expected in Keras. However, when I try to load the model using the weights.keras file created by the script above, I get two unexpected behaviors; see the loading script below:

```python
import keras

loaded_model = keras.saving.load_model(
    "/data/host-category-classification/nlp/classification/Gemma/models"
    "/fine_tuned_gemma-2b_20240229_151158/weights.keras"
)

print(loaded_model.summary())
```

First, I observed that each call to the loading process generates an unknown set of files that occupy roughly 10 GB of disk space indefinitely. In addition, loading takes far longer than the gemma.load_preset method (I haven't timed it exactly, but it should not take more than 10 minutes). Do you have any suggestions? There seems to be no documentation in either KerasNLP or TensorFlow on model storage and loading for Gemma-related models.
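One workaround worth trying (a sketch under assumptions, not a verified fix for the Gemma classes): save only the weights with `save_weights`, then rebuild the architecture in code and restore them with `load_weights`, so loading never has to deserialize a full `.keras` archive. With plain Keras and a hypothetical toy architecture the pattern looks like:

```python
import os
import tempfile

import numpy as np
import keras

# Hypothetical toy architecture; for Gemma the equivalent rebuild step would
# presumably be keras_nlp.models.GemmaCausalLM.from_preset(...) followed by
# backbone.enable_lora(...) before restoring the fine-tuned weights.
def build_model():
    return keras.Sequential([
        keras.layers.Input(shape=(4,)),
        keras.layers.Dense(2),
    ])

model = build_model()

# Keras 3 requires weights-only files to end in ".weights.h5"
weights_path = os.path.join(tempfile.mkdtemp(), "model.weights.h5")
model.save_weights(weights_path)  # weights only; no architecture or optimizer state

restored = build_model()          # rebuild an identical architecture in code
restored.load_weights(weights_path)

x = np.ones((1, 4), dtype="float32")
assert np.allclose(model.predict(x, verbose=0), restored.predict(x, verbose=0))
```

The weights file stays small relative to a full archive because it contains only variable values; whether this also avoids the temporary-file accumulation seen with `load_model` on the Gemma checkpoint is an assumption to be tested.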

@github-actions github-actions bot added the Gemma label Mar 6, 2024