Change the example of model #8

Open
iamysy opened this issue Nov 8, 2022 · 6 comments

iamysy commented Nov 8, 2022

Hello, your work is great. Thank you for the inspiration.
I ran into some problems while running it (in a Docker environment).
First, in the cylinder example, even after 200 episodes the reward is still unstable and sometimes drops off a cliff.
Second, I want to change the reinforcement learning model of the cylinder example to SAC. I modified it based on the square column example, but the reward does not change.
May I ask you about these two questions?
Thank you again.

1900360 (Collaborator) commented Nov 8, 2022

Hi @iamysy!
I am very happy that you are using our platform :)
For the first question: since PPO requires a lot of hyperparameter tuning, we only found one reasonably suitable set of hyperparameters for training, and we ran multiple training runs to obtain the figure below. Unfortunately there is still some randomness, which is why we used the SAC algorithm for the square column training; SAC has very few hyperparameters to tune compared to PPO.

[Figure: reward curves from multiple training runs]

If the reward drops suddenly, I suggest restarting the training a few times to get results similar to the paper :)
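
For reference, here is a minimal, hypothetical sketch of what a SAC setup with Tianshou can look like; the real configuration lives in launch_multiprocessing_traning_square.py, and the probe count, network sizes, and learning rates below are illustrative assumptions only (Tianshou 0.4-series API assumed):

import torch
from tianshou.policy import SACPolicy
from tianshou.utils.net.common import Net
from tianshou.utils.net.continuous import ActorProb, Critic

state_shape = (100,)  # assumed number of velocity probes (illustrative)
action_shape = (1,)   # assumed single jet control variable (illustrative)
max_action = 1.5      # matches minmax_value = (-1.5, 1.5)
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# stochastic Gaussian actor
net_a = Net(state_shape, hidden_sizes=[512, 512], device=device)
actor = ActorProb(net_a, action_shape, max_action=max_action, device=device,
                  unbounded=True, conditioned_sigma=True).to(device)
actor_optim = torch.optim.Adam(actor.parameters(), lr=1e-3)

# twin Q-critics, as required by SAC
net_c1 = Net(state_shape, action_shape, hidden_sizes=[512, 512], concat=True, device=device)
net_c2 = Net(state_shape, action_shape, hidden_sizes=[512, 512], concat=True, device=device)
critic1 = Critic(net_c1, device=device).to(device)
critic2 = Critic(net_c2, device=device).to(device)
critic1_optim = torch.optim.Adam(critic1.parameters(), lr=1e-3)
critic2_optim = torch.optim.Adam(critic2.parameters(), lr=1e-3)

# only a handful of algorithm-level hyperparameters: tau, gamma, alpha
policy = SACPolicy(actor, actor_optim, critic1, critic1_optim,
                   critic2, critic2_optim, tau=0.005, gamma=0.99, alpha=0.2)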

For the second question: if you want to train the cylinder case with the SAC algorithm, compare these two files: launch_multiprocessing_traning_cylinder.py and launch_multiprocessing_traning_square.py. You need to port the cylinder-specific settings into launch_multiprocessing_traning_square.py, for example:

foam_params = {
    'delta_t': 0.0005,
    'solver': 'pimpleFoam',
    'num_processor': 5,
    'of_env_init': 'source ~/OpenFOAM/OpenFOAM-8/etc/bashrc',
    'cfd_init_time': 0.005,
    'num_dimension': 2,
    'verbose': False
}

entry_dict_q0 = {
    'U': {
        'JET1': {
            'q0': '{x}',
        },
        'JET2': {
            'q0': '{-x}',
        }
    }
}

entry_dict_q1 = {
    'U': {
        'JET1': {
            'q1': '{y}',
        },
        'JET2': {
            'q1': '{-y}',
        }
    }
}

entry_dict_t0 = {
    'U': {
        'JET1': {
            't0': '{t}'
        },
        'JET2': {
            't0': '{t}'
        }
    }
}

agent_params = {
    'entry_dict_q0': entry_dict_q0,
    'entry_dict_q1': entry_dict_q1,
    'entry_dict_t0': entry_dict_t0,
    'deltaA': 0.05,
    'minmax_value': (-1.5, 1.5),
    'interaction_period': 0.025,
    'purgeWrite_numbers': 0,
    'writeInterval': 0.025,
    'deltaT': 0.0005,
    'variables_q0': ('x',),
    'variables_q1': ('y',),
    'variables_t0': ('t',),
    'verbose': False,
    "zero_net_Qs": True,
}
state_params = {
    'type': 'velocity'
}

In essence, these are just changes to the jet settings in the OpenFOAM case, which will become clearer once you dig into it further. Good luck :)
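
To make the mapping concrete, here is a rough, conceptual sketch (not the library's actual implementation) of what these template dictionaries express: the agent's scalar action fills the '{x}' placeholder for JET1 and '{-x}' for JET2, so the two jets act antisymmetrically, consistent with zero_net_Qs: True:

import numpy as np

# conceptual illustration only; the real substitution happens inside DRLinFluids
entry_dict_q0 = {'U': {'JET1': {'q0': '{x}'}, 'JET2': {'q0': '{-x}'}}}  # as defined above

def fill_jet_entries(entry_dict, values):
    """Replace '{x}'-style placeholders with numeric values, e.g. '{-x}' -> -values['x']."""
    filled = {}
    for field, patches in entry_dict.items():        # e.g. 'U'
        filled[field] = {}
        for patch, entries in patches.items():       # 'JET1', 'JET2'
            filled[field][patch] = {
                key: eval(template.strip('{}'), {}, dict(values))
                for key, template in entries.items()
            }
    return filled

action = float(np.clip(0.8, -1.5, 1.5))              # one agent action, bounded by minmax_value
print(fill_jet_entries(entry_dict_q0, {'x': action}))
# -> {'U': {'JET1': {'q0': 0.8}, 'JET2': {'q0': -0.8}}}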

iamysy (Author) commented Nov 10, 2022

Thank you! @1900360
Your answer cleared up some of my confusion!
For both questions, I will follow your method and observe again. I hope we can have more exchanges in the future.
Thank you again! ^.^

iamysy (Author) commented Mar 15, 2023

Hello, I have recently been doing similar work, and I would like to ask: if I want to go beyond the limit of 100 steps per trajectory, where do I need to change it? I have tried changing step-per-epoch to 200 and max_episode_timesteps to 200, but the environment still keeps only 100 steps per trajectory.
In addition, I have another question. I want to save the case with the best reward in each parallel environment, so I modified the reset function in environment_tianshou.py (see below), but I can only save fluent.msh and nothing else. Do you have a solution for this?


    def reset(self):
        """Resets the environment to an initial state and returns the initial observation."""
        root_path = os.getcwd()
        env_name_list = sorted(envs for envs in os.listdir(root_path) if re.search(r'^env\d+$', envs))
        env_path_list = ['/'.join([root_path, name]) for name in env_name_list]
        saver = 'best_training_episode'
        saver_path = '/'.join([root_path, saver])
        if not os.path.exists(saver_path):
            os.makedirs(saver_path)
        if self.num_episode < 0.5:
            os.makedirs(self.foam_root_path + '/record')
        else:
            # Extract the optimal action in the entire episode, skip the first initialization process
            self.episode_reward_sequence.append(self.episode_reward)
            pd.DataFrame(
                self.episode_reward_sequence
            ).to_csv(self.foam_root_path + '/record/total_reward.csv', index=False, header=False)
            if self.episode_reward_sequence[-1] == np.max(self.episode_reward_sequence):
                pd.DataFrame(
                    self.actions_sequence
                ).to_csv(self.foam_root_path + '/record/best_actions.csv', index=False, header=False)
                pd.DataFrame(
                    self.history_force_Coeffs_df
                ).to_csv(self.foam_root_path + '/record/best_history_force_Coeffs_df.csv', index=False, header=False)
                with open(self.foam_root_path + '/record/info.txt', 'w') as f:
                    f.write(f'Current number of best reward episode is {self.num_episode}')
                # Recreate an empty best_training_episode directory, then copy the environment folders into it
                shutil.rmtree(saver_path)
                if not os.path.exists(saver_path):
                    os.makedirs(saver_path)
                training_num = 1
                for index, src in enumerate(env_path_list[0:training_num]):
                    os.makedirs('/'.join([saver_path, str(index)]))
                    des = f'{saver_path}/{index}'
                    # earlier attempt (commented out): move only top-level regular files
                    # for file in os.listdir(src):
                    #     full_file_name = os.path.join(src, file)
                    #     print(des, full_file_name)
                    #     if os.path.isfile(full_file_name):
                    #         shutil.move(full_file_name, des)
                    shutil.copytree(src, des)

I hope my questions don't seem stupid; please give me your advice.
Thank you again! ^.^

1900360 (Collaborator) commented Mar 17, 2023

Hi @iamysy!

Well, I think you still have the following parameter unchanged; it is the key setting that determines the total number of steps each episode runs in each environment. The code is here:

register(
    id="OpenFoam-v0",
    entry_point="DRLinFluids.environments_tianshou:OpenFoam_tianshou",
    max_episode_steps=100,
)

You need to change it to max_episode_steps=200, for example:
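
(Assuming this is the standard Gym registration call, only the last argument changes:)

from gym.envs.registration import register  # assuming Gym's registration API

register(
    id="OpenFoam-v0",
    entry_point="DRLinFluids.environments_tianshou:OpenFoam_tianshou",
    max_episode_steps=200,  # raised from 100 so each episode can run 200 steps
)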

I don't think you need to save the best episode during DRL training. You can use a DRL test function instead (i.e., with no exploration): the desired episode can be obtained by running the converged training policy in test mode. Thanks for the reminder, I will add this function later.
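
As a rough sketch of what such a test run might look like with Tianshou (hypothetical: make_env() and 'policy.pth' stand in for however you construct the environment and saved the trained policy in your launch script):

import torch
from tianshou.data import Collector

env = make_env()                                  # hypothetical helper: build one OpenFoam-v0 env as in training
policy.load_state_dict(torch.load('policy.pth'))  # restore the converged training policy
policy.eval()                                     # evaluation mode: no exploration
test_collector = Collector(policy, env)
result = test_collector.collect(n_episode=1)      # run one test episode
print(result)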

Also, if you want to save the entire environment calculation directory, I do not recommend doing it inside the environment_tianshou file. After all, a single environment has no way to compare the training results (such as reward) across environments. I think it is more appropriate to handle this at the level where training is launched, but that is related to the Tianshou platform, so please refer to its tutorials.

iamysy (Author) commented Mar 19, 2023

Your advice was very useful to me! I found that I had only modified one of the max_episode_steps settings in the init file and missed the other one. I am sorry for taking up your time with this problem. I also understand the other issue now. Thank you again for your suggestions!

1900360 (Collaborator) commented Mar 23, 2023

I'm very glad to see that you succeeded. The test functions for the square and cylinder cases have been updated. Please let me know right away if you have any questions.
