
Framework introduction – Tasks

Nikkel Mollenhauer edited this page Jul 25, 2022 · 31 revisions

This page gives an overview and explanation of the different tasks users can perform with the recommerce simulation framework. Tasks can be started using our custom CLI:

recommerce -c <task>

where <task> is one of the valid tasks explained in the sections below.
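If you want to start tasks from a Python script instead of typing the command by hand, a minimal sketch could look like the following. The helper `build_command` is hypothetical and not part of the framework; the task names are the ones this page lists with a CLI command.

```python
# Tasks that this page documents with a CLI command.
CLI_TASKS = ("training", "exampleprinter", "agent_monitoring")


def build_command(task):
    """Return the argument list for `recommerce -c <task>` (hypothetical helper)."""
    if task not in CLI_TASKS:
        raise ValueError(f"unknown task {task!r}, expected one of {CLI_TASKS}")
    return ["recommerce", "-c", task]
```

The returned list can be passed to `subprocess.run(build_command("training"), check=True)`, assuming the `recommerce` executable is installed and on your `PATH`.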

For further reading on the latter three tasks, which concern the monitoring of agents, refer to this thesis.

Diagram types

The following diagram shows all types of diagrams that are created by the various monitoring tools and the metrics they visualize:

Training

CLI command: recommerce -c training

For more information about the training task, also see the training example.

Description

During the training task, a reinforcement learning algorithm is used to train an artificial vendor (agent) to set optimal prices on a given market. For this, a large number of episodes are simulated, during which all competitors set prices. Over time, the RL-agent learns the behaviour of the market and its competitors, allowing it to improve its policy and increase profits. During the training process, models are saved, which contain the current pricing policy of the RL-agent. These models can then be used to further analyze the trained agent using one of the other tools outlined below.

Explanation

Before starting your training session, you need to decide on a combination of vendors and a marketplace to use. At least one (in most cases exactly one) of the vendors must be a reinforcement learning agent. You must also make sure that the vendors and marketplace fit together, meaning that only vendors built to work on circular economy models can function on a circular economy marketplace.
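The two requirements above (at least one RL-agent, and vendors that fit the marketplace) can be sketched as a small pre-flight check. The `Component` class, its fields and the model names `"linear"` and `"circular"` are hypothetical stand-ins for illustration; the framework's own classes are organised differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical stand-in for a marketplace or vendor class."""
    name: str
    market_model: str  # assumed labels, e.g. "linear" or "circular"
    is_rl_agent: bool = False


def check_setup(marketplace: Component, vendors: list[Component]) -> list[str]:
    """Return a list of problems; an empty list means the setup can be trained."""
    problems = []
    if not any(vendor.is_rl_agent for vendor in vendors):
        problems.append("at least one vendor must be an RL-agent")
    for vendor in vendors:
        if vendor.market_model != marketplace.market_model:
            problems.append(
                f"{vendor.name} does not fit a {marketplace.market_model} marketplace"
            )
    return problems
```

Returning a list of problems rather than raising on the first one lets you report every mismatch in a configuration at once.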

All of the chosen classes and other configuration items must be placed in their respective configuration files. To learn more about configuration files see this page.

Choosing a marketplace

For a training session, any of the (non-abstract) marketplaces available in our framework can be chosen. See this page for a list of all currently available marketplace classes.

Choosing an RL-agent

As mentioned above, at least one of the vendors in the simulation must be a reinforcement learning agent when running the training task. This will be the trained agent. The following scenarios are possible:

  • Standard training scenario: The RL-agent will be trained against the defined rule based competitors.
  • RL-vs-RL training: Multiple RL-agents will be trained against each other, each learning their respective policies.
  • Self-play: The RL-agent will play against itself, learning against its own policy. In this scenario, there are no rule based competitors.

In the following, we will assume the standard training scenario was chosen.

All of the RL-algorithms available in our framework can be trained on all marketplaces. See this page for a list of all currently available RL-agent classes.

Choosing the competitors

After choosing the RL-agent to be trained, its competitors in the market need to be defined. Currently, it is only possible to choose rule based vendors as competitors (see this issue, whose resolution should allow pre-trained RL-agents to be used as competitors).

You have two options:

  1. Using the default competitors: By simply not defining any competitors in your configuration file, the default competitors of the respective marketplace will be used.
  2. Defining your own competitors: When defining your own competitors, you need to supply the correct number of them. For example, a duopoly marketplace needs you to define exactly one competitor.

See this page for a list of all currently available rule based vendor classes.
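The two options above can be sketched as follows, assuming the standard training scenario with exactly one RL-agent. The function `resolve_competitors` and its parameters are hypothetical and only illustrate the rule; they are not the framework's API.

```python
def resolve_competitors(total_vendors, default_competitors, configured=None):
    """Pick the competitors for a marketplace with `total_vendors` vendors in total."""
    # One vendor slot belongs to the trained RL-agent (standard scenario assumed).
    expected = total_vendors - 1
    if not configured:
        # Option 1: nothing configured, fall back to the marketplace defaults.
        return list(default_competitors)
    if len(configured) != expected:
        # Option 2: own competitors must match the marketplace size exactly.
        raise ValueError(f"expected {expected} competitor(s), got {len(configured)}")
    return list(configured)
```

For a duopoly, `total_vendors` is 2, so exactly one configured competitor is accepted.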

Both the vendors (the RL-agent as well as the competitors) and the marketplace need to be defined in the environment_config_training.json file.

In addition, a market_config.json needs to be supplied, containing the parameters configuring the marketplace, as well as a rl_config.json, containing the parameters specific to the RL-agent.
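As a rough illustration of how the three files relate, the sketch below writes placeholder versions of them into a folder. Only the file names and the `argument` field come from this page; every other key and value is an assumption made for this example, so check the configuration page for the actual schema before using it.

```python
import json
from pathlib import Path

# NOTE: keys and values are illustrative assumptions, not the documented schema.
ENVIRONMENT_CONFIG = {
    "task": "training",
    "marketplace": "<marketplace class>",
    "agents": [
        {"name": "trained_agent", "agent_class": "<RL-agent class>", "argument": ""},
        {"name": "competitor", "agent_class": "<rule based vendor class>", "argument": ""},
    ],
}
MARKET_CONFIG = {"<market parameter>": 0}  # parameters configuring the marketplace
RL_CONFIG = {"<rl parameter>": 0}          # parameters specific to the RL-agent


def write_training_configs(target_dir):
    """Write the three placeholder configuration files and return their paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    files = {
        "environment_config_training.json": ENVIRONMENT_CONFIG,
        "market_config.json": MARKET_CONFIG,
        "rl_config.json": RL_CONFIG,
    }
    written = []
    for filename, content in files.items():
        path = target / filename
        path.write_text(json.dumps(content, indent=2), encoding="utf-8")
        written.append(path)
    return written
```

Leaving the competitor entry out of `agents` would correspond to using the marketplace's default competitors.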

Exampleprinter

CLI command: recommerce -c exampleprinter

For more information about the exampleprinter task, also see the exampleprinter example.

Description

The exampleprinter is a tool meant for quickly evaluating a market scenario in depth. When run, each action taken by the monitored agents is recorded, in addition to market states and events such as the number of customers arriving and the number of products thrown away. At the end of this quick simulation, an animated overview diagram is created, which shows all actions and their consequences for each step in the simulation. The diagram below shows the 17th step of one such diagram.

Explanation

Before starting the exampleprinter session, you need to decide on a combination of marketplace and agent(s), similar to the training task. The only difference here is that multiple RL-agents can be chosen to be monitored using the exampleprinter. These agents must, however, already be trained, and you need to supply the filenames of their trained models in the argument field of the respective agent in the environment_config_exampleprinter.json file. The trained models need to be placed in a folder called data in the datapath.
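Before running the exampleprinter, it can help to verify that every referenced model file is actually present in the data folder. The following is a minimal sketch of such a check; the function `find_missing_models` is hypothetical, and it assumes the agents are given as a list of dictionaries that each carry an `argument` entry.

```python
from pathlib import Path


def find_missing_models(datapath, agents):
    """Return the model filenames that are not present in <datapath>/data."""
    data_dir = Path(datapath) / "data"
    missing = []
    for agent in agents:
        model_file = agent.get("argument", "")
        # Vendors without a model file (empty argument) are skipped.
        if model_file and not (data_dir / model_file).is_file():
            missing.append(model_file)
    return missing
```

An empty result means all models referenced in the configuration were found.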

Agent-monitoring

CLI command: recommerce -c agent_monitoring

For more information about the agent-monitoring task, also see the agent-monitoring example.

Description

During the agent-monitoring task, the user can define a marketplace and a set of vendors that should compete against each other for a customizable number of episodes. Both pre-trained RL-agents and rule based vendors can be chosen to compete on the market. During the simulation, the tool records all actions and states, which it then uses to create a large number of diagrams that can be used to analyze the behaviour of the vendors.

Explanation

As with the other tasks, the agent-monitoring task requires a marketplace and a set of vendors to be defined. The vendors can be either pre-trained RL-agents or rule based vendors. You also need to define how many episodes the monitoring session should run for (note that more is not necessarily better), and whether the monitored agents should play against each other on the same marketplace or independently of each other on identical marketplaces. The latter is controlled by the separate_markets flag. All of these parameters must be defined in the environment_config_agent_monitoring.json file.

You must also supply a market_config.json file, containing the parameters for the marketplace.
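The effect of the separate_markets flag can be pictured with the sketch below. The function `assign_markets` is hypothetical and only shows how the monitored agents are grouped; it does not reflect how the framework implements the flag internally.

```python
def assign_markets(monitored_agents, separate_markets):
    """Return one list of agents per simulated marketplace."""
    if separate_markets:
        # Each agent plays independently on its own, identical marketplace.
        return [[agent] for agent in monitored_agents]
    # All agents compete against each other on one shared marketplace.
    return [list(monitored_agents)]
```

With two monitored agents, the flag therefore decides between two marketplaces with one monitored agent each and a single marketplace containing both.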

Policyanalyzer

No CLI command available as of yet.

For more information about the policyanalyzer task, also see the policyanalyzer example.

Description

The policyanalyzer is our only tool which does not simulate a marketplace. Instead, the tool can be used to monitor a vendor’s reaction to different market states. The user can decide on up to two different features to give as an input, such as a competitor’s new and refurbished prices, and the policyanalyzer will feed all possible input combinations to the vendor and record its reactions.

When initialising the policyanalyzer, the user defines a number of parameters: the vendor whose policy should be analysed, as well as the marketplace and the competitors that should be used, just as is done for all the other tools. Additionally, the user defines a template market state, that is, a market state containing all values that are passed to the analysed agent, such as the number of items currently in circulation and the prices of competitors. Lastly, a list of analysed features needs to be provided, which defines one or two features of the template market state that should be varied.

When the policyanalyzer is run, the values of these features are inserted into the template market state, overwriting the initial values and creating a new combination. This new market state is then passed on to the policy method of the analysed vendor, and its reactions are recorded and visualised.

The policyanalyzer is the monitoring tool which operates on the smallest scale out of all the tools we built for our framework. It allows users to define any market state they want and to then accurately monitor a vendor’s reactions to changes to this specific state. While the tool can just as well be utilised to test new rule-based strategies, it is very much meant to be used as a way to understand RL-agents better, as their policies are not immediately visible to the user and must therefore be discovered through tools such as the ones we built.
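The mechanism described above (a template market state in which one or two features are varied) can be sketched in a few lines. This is not the policyanalyzer's actual implementation: the function `analyze_policy`, the dictionary-based market state, the feature names and the rule-based `undercut_policy` are all assumptions made for this example.

```python
from itertools import product


def analyze_policy(policy, template_state, analyzed_features):
    """Feed every combination of the analysed feature values to `policy`.

    `analyzed_features` maps one or two feature names to the values to try.
    """
    if not 1 <= len(analyzed_features) <= 2:
        raise ValueError("provide one or two features to vary")
    names = list(analyzed_features)
    reactions = []
    for values in product(*(analyzed_features[name] for name in names)):
        state = dict(template_state)      # copy, so the template stays untouched
        state.update(zip(names, values))  # overwrite the varied features
        reactions.append((values, policy(state)))
    return reactions


def undercut_policy(state):
    """Hypothetical rule-based vendor: undercut the competitor by 1, never below 1."""
    return max(1, state["competitor_new_price"] - 1)


template = {
    "in_circulation": 50,
    "competitor_new_price": 5,
    "competitor_refurbished_price": 3,
}
reactions = analyze_policy(
    undercut_policy,
    template,
    {"competitor_new_price": range(1, 4), "competitor_refurbished_price": range(1, 3)},
)
```

Here `reactions` holds one entry per input combination, each pairing the inserted feature values with the price the vendor chose. Such a table is the kind of data that can then be visualised.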