Change the example of model #8

Open
iamysy opened this issue Nov 8, 2022 · 6 comments

iamysy commented Nov 8, 2022

Hello, your work is great. Thank you for the inspiration.
I ran into some problems while running it (in a Docker environment).
First, in the cylinder example, even after 200 episodes the reward is still unstable and sometimes drops off a cliff.
Second, I want to change the reinforcement learning model of the cylinder example to SAC. I modified it based on the square column example, but the reward does not change.
May I ask you about these two questions?
Thank you again.

1900360 (Collaborator) commented Nov 8, 2022

Hi @iamysy!
I am very happy that you are using our platform :)
For the first question: since PPO requires a lot of hyperparameter tuning, we only found one reasonably suitable set of hyperparameters for training, and we ran multiple training runs to obtain the figure below. Unfortunately there is still some randomness, which is why we used the SAC algorithm for the square column training; SAC has very few hyperparameters to tune compared to PPO.

[Figure: reward curves from multiple training runs]

If the reward drops suddenly, I suggest restarting the training a few times to get results similar to the paper :)
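
For reference, here is a minimal, hypothetical sketch of what a SAC setup with Tianshou can look like; the real configuration lives in launch_multiprocessing_traning_square.py, and the probe count, network sizes, and learning rates below are illustrative assumptions only (Tianshou 0.4-series API assumed):

import torch
from tianshou.policy import SACPolicy
from tianshou.utils.net.common import Net
from tianshou.utils.net.continuous import ActorProb, Critic

state_shape = (100,)  # assumed number of velocity probes (illustrative)
action_shape = (1,)   # assumed single jet control variable (illustrative)
max_action = 1.5      # matches minmax_value = (-1.5, 1.5)
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# stochastic Gaussian actor
net_a = Net(state_shape, hidden_sizes=[512, 512], device=device)
actor = ActorProb(net_a, action_shape, max_action=max_action, device=device,
                  unbounded=True, conditioned_sigma=True).to(device)
actor_optim = torch.optim.Adam(actor.parameters(), lr=1e-3)

# twin Q-critics, as required by SAC
net_c1 = Net(state_shape, action_shape, hidden_sizes=[512, 512], concat=True, device=device)
net_c2 = Net(state_shape, action_shape, hidden_sizes=[512, 512], concat=True, device=device)
critic1 = Critic(net_c1, device=device).to(device)
critic2 = Critic(net_c2, device=device).to(device)
critic1_optim = torch.optim.Adam(critic1.parameters(), lr=1e-3)
critic2_optim = torch.optim.Adam(critic2.parameters(), lr=1e-3)

# only a handful of algorithm-level hyperparameters: tau, gamma, alpha
policy = SACPolicy(actor, actor_optim, critic1, critic1_optim,
                   critic2, critic2_optim, tau=0.005, gamma=0.99, alpha=0.2)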

For the second question: if you want to train the cylinder case with the SAC algorithm, compare these two files: launch_multiprocessing_traning_cylinder.py and launch_multiprocessing_traning_square.py. You need to port the cylinder-specific settings into launch_multiprocessing_traning_square.py, for example:

foam_params = {
    'delta_t': 0.0005,
    'solver': 'pimpleFoam',
    'num_processor': 5,
    'of_env_init': 'source ~/OpenFOAM/OpenFOAM-8/etc/bashrc',
    'cfd_init_time': 0.005,
    'num_dimension': 2,
    'verbose': False
}

entry_dict_q0 = {
    'U': {
        'JET1': {
            'q0': '{x}',
        },
        'JET2': {
            'q0': '{-x}',
        }
    }
}

entry_dict_q1 = {
    'U': {
        'JET1': {
            'q1': '{y}',
        },
        'JET2': {
            'q1': '{-y}',
        }
    }
}

entry_dict_t0 = {
    'U': {
        'JET1': {
            't0': '{t}'
        },
        'JET2': {
            't0': '{t}'
        }
    }
}

agent_params = {
    'entry_dict_q0': entry_dict_q0,
    'entry_dict_q1': entry_dict_q1,
    'entry_dict_t0': entry_dict_t0,
    'deltaA': 0.05,
    'minmax_value': (-1.5, 1.5),
    'interaction_period': 0.025,
    'purgeWrite_numbers': 0,
    'writeInterval': 0.025,
    'deltaT': 0.0005,
    'variables_q0': ('x',),
    'variables_q1': ('y',),
    'variables_t0': ('t',),
    'verbose': False,
    "zero_net_Qs": True,
}
state_params = {
    'type': 'velocity'
}

In essence, these are just changes to the jet settings in the OpenFOAM case, which will become clearer once you dig into it further. Good luck :)
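
To make the mapping concrete, here is a rough, conceptual sketch (not the library's actual implementation) of what these template dictionaries express: the agent's scalar action fills the '{x}' placeholder for JET1 and '{-x}' for JET2, so the two jets act antisymmetrically, consistent with zero_net_Qs: True:

import numpy as np

# conceptual illustration only; the real substitution happens inside DRLinFluids
entry_dict_q0 = {'U': {'JET1': {'q0': '{x}'}, 'JET2': {'q0': '{-x}'}}}  # as defined above

def fill_jet_entries(entry_dict, values):
    """Replace '{x}'-style placeholders with numeric values, e.g. '{-x}' -> -values['x']."""
    filled = {}
    for field, patches in entry_dict.items():        # e.g. 'U'
        filled[field] = {}
        for patch, entries in patches.items():       # 'JET1', 'JET2'
            filled[field][patch] = {
                key: eval(template.strip('{}'), {}, dict(values))
                for key, template in entries.items()
            }
    return filled

action = float(np.clip(0.8, -1.5, 1.5))              # one agent action, bounded by minmax_value
print(fill_jet_entries(entry_dict_q0, {'x': action}))
# -> {'U': {'JET1': {'q0': 0.8}, 'JET2': {'q0': -0.8}}}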

iamysy (Author) commented Nov 10, 2022

Thank you! @1900360
Your answer cleared up some of my confusion!
For both questions, I will follow your method and observe again. I hope we can have more exchanges in the future.
Thank you again! ^.^

iamysy (Author) commented Mar 15, 2023

Hello, I have recently been doing similar work, and I would like to ask: if I want to go beyond the limit of 100 steps per trajectory, where do I need to change it? I have tried changing step-per-epoch to 200 and max_episode_timesteps to 200, but the environment still keeps only 100 steps per trajectory.
In addition, I have another question. I want to save the case with the best reward in each parallel environment, so I modified the reset function in environment_tianshou.py (see below), but I can only save fluent.msh and nothing else. Do you have a solution for this?


    def reset(self):
        """Resets the environment to an initial state and returns the initial observation."""
        root_path = os.getcwd()
        env_name_list = sorted(envs for envs in os.listdir(root_path) if re.search(r'^env\d+$', envs))
        env_path_list = ['/'.join([root_path, name]) for name in env_name_list]
        saver = 'best_training_episode'
        saver_path = '/'.join([root_path, saver])
        if not os.path.exists(saver_path):
            os.makedirs(saver_path)
        if self.num_episode < 0.5:
            os.makedirs(self.foam_root_path + '/record')
        else:
            # Extract the optimal action in the entire episode, skip the first initialization process
            self.episode_reward_sequence.append(self.episode_reward)
            pd.DataFrame(
                self.episode_reward_sequence
            ).to_csv(self.foam_root_path + '/record/total_reward.csv', index=False, header=False)
            if self.episode_reward_sequence[-1] == np.max(self.episode_reward_sequence):
                pd.DataFrame(
                    self.actions_sequence
                ).to_csv(self.foam_root_path + '/record/best_actions.csv', index=False, header=False)
                pd.DataFrame(
                    self.history_force_Coeffs_df
                ).to_csv(self.foam_root_path + '/record/best_history_force_Coeffs_df.csv', index=False, header=False)
                with open(self.foam_root_path + '/record/info.txt', 'w') as f:
                    f.write(f'Current number of best reward episode is {self.num_episode}')
                # Recreate an empty best_training_episode directory, then copy the environment folders into it
                shutil.rmtree(saver_path)
                if not os.path.exists(saver_path):
                    os.makedirs(saver_path)
                training_num = 1
                for index, src in enumerate(env_path_list[0:training_num]):
                    os.makedirs('/'.join([saver_path, str(index)]))
                    des = f'{saver_path}/{index}'
                    # earlier attempt (commented out): move only top-level regular files
                    # for file in os.listdir(src):
                    #     full_file_name = os.path.join(src, file)
                    #     print(des, full_file_name)
                    #     if os.path.isfile(full_file_name):
                    #         shutil.move(full_file_name, des)
                    shutil.copytree(src, des)

I hope my questions don't seem stupid; please give me your advice.
Thank you again! ^.^

1900360 (Collaborator) commented Mar 17, 2023

Hi @iamysy!

Well, I think you still have the following parameter unchanged; it is the key setting that determines the total number of steps each episode runs in each environment. The code is here:

register(
    id="OpenFoam-v0",
    entry_point="DRLinFluids.environments_tianshou:OpenFoam_tianshou",
    max_episode_steps=100,
)

You need to change it to max_episode_steps=200, for example:
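
(Assuming this is the standard Gym registration call, only the last argument changes:)

from gym.envs.registration import register  # assuming Gym's registration API

register(
    id="OpenFoam-v0",
    entry_point="DRLinFluids.environments_tianshou:OpenFoam_tianshou",
    max_episode_steps=200,  # raised from 100 so each episode can run 200 steps
)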

I don't think you need to save the best episode during DRL training. You can use a DRL test function instead (i.e., with no exploration): the desired episode can be obtained by running the converged training policy in test mode. Thanks for the reminder, I will add this function later.
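
As a rough sketch of what such a test run might look like with Tianshou (hypothetical: make_env() and 'policy.pth' stand in for however you construct the environment and saved the trained policy in your launch script):

import torch
from tianshou.data import Collector

env = make_env()                                  # hypothetical helper: build one OpenFoam-v0 env as in training
policy.load_state_dict(torch.load('policy.pth'))  # restore the converged training policy
policy.eval()                                     # evaluation mode: no exploration
test_collector = Collector(policy, env)
result = test_collector.collect(n_episode=1)      # run one test episode
print(result)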

Also, if you want to save the entire environment calculation directory, I do not recommend doing it inside the environment_tianshou file. After all, a single environment has no way to compare the training results (such as reward) across environments. I think it is more appropriate to handle this at the level where training is launched, but that is related to the Tianshou platform, so please refer to its tutorials.

iamysy (Author) commented Mar 19, 2023

Your advice was very useful to me! I found that I had only modified one of the max_episode_steps settings in the init file and missed the other one. I am sorry for taking up your time with this problem. I also understand the other issue now. Thank you again for your suggestions!

1900360 (Collaborator) commented Mar 23, 2023

I'm very glad to see that you succeeded. The test functions for the square and cylinder cases have been updated. Please let me know right away if you have any questions.
