Note

You can download this example as a Jupyter notebook or try it out directly in Google Colab.

4. Reinforcement learning tutorial#

This tutorial introduces users to ASSUME and the way it uses reinforcement learning (RL). The main objective of this tutorial is to ensure participants grasp the steps required to equip a new unit with RL strategies or to modify the action dimensions. Our emphasis lies on the bidding strategy, with less focus on the algorithm and the role; the latter are usable as a plug-and-play solution in the framework. The following coding tasks highlight the key aspects to be adjusted, as already outlined in the learning_strategies.py file.

The outline of this tutorial is as follows. We start with a basic summary of the implementation of reinforcement learning (RL) in ASSUME and its architecture (1. ASSUME & Learning Basics). If you need a refresher on RL in general, please visit our readthedocs (https://ASSUME.readthedocs.io/en/latest/). Afterwards, we install ASSUME in this Google Colab (2. Get ASSUME running) and then dive into the learning_strategies.py file to explain how we need to adjust conventional bidding strategies to incorporate RL (3. Make your agents learn).

As a whole, this tutorial covers the following coding tasks:

  1. How to define a step function in the ASSUME framework.

  2. How do we get observations from the simulation framework?

  3. How do we define actions based on the output of the actor neural network considering necessary exploration?

  4. How do we define the reward?

1. ASSUME & Learning Basics#

ASSUME in general is intended for researchers, planners, utilities and everyone seeking to understand the market dynamics of energy markets. It provides an easy-to-use toolbox as free software that can be tailored to the specific use case of the user.

The following figure depicts the architecture of the framework. It can be roughly divided into two parts: to the left of the world class are the markets, and to the right are the market participants, which are here called units. Both sides are connected via the orders that market participants place on the markets. The learning capability is sketched out with the yellow classes on the right, i.e. the units side.

[architecture.svg]

Let’s focus on the bright yellow part of the architecture, namely the learning algorithm, the actor and the critic. We start with some reinforcement learning background. In the current implementation of ASSUME, we model the electricity market as a partially observable Markov game, which is an extension of MDPs for multi-agent setups.

Multi-agent DRL is understood as the simultaneous learning of multiple agents interacting in the same environment. The Markov game for \(N\) agents consists of a set of states \(S\), a set of actions \(A_1, ..., A_N\), a set of observations \(O_1, ..., O_N\), and a state transition function \(P: S \times A_1 \times ... \times A_N \rightarrow \mathcal{P}(S)\) dependent on the state and actions of all agents. After taking action \(a_i \in A_i\) in state \(s_i \in S\) according to a policy \(\pi_i:O_i\rightarrow A_i\), every agent \(i\) is transitioned into the new state \(s'_i \in S\). Each agent receives a reward \(r_i\) according to the individual reward function \(R_i\) and a private observation correlated with the state \(o_i:S \rightarrow O_i\). As in an MDP, each agent \(i\) learns an optimal policy \(\pi_i^*(s)\) that maximizes its expected reward.

To enable multi-agent learning, some adjustments are needed within the learning algorithm to get from TD3 to the MATD3 algorithm, similar to how other authors extended DDPG into the multi-agent MADDPG algorithm. We start by explaining the learning for a single agent and then extend it to multi-agent learning.

Single-Agent Learning#

We use the actor-critic approach to train the learning agent. The actor-critic approach is a popular RL algorithm that uses two neural networks: an actor network and a critic network. The actor network is responsible for selecting actions, while the critic network evaluates the quality of the actions taken by the actor.

The actor and critic networks are trained simultaneously using the actor-critic algorithm, which updates the weights of both networks at each time step. The actor-critic algorithm is a form of policy iteration, where the policy is updated based on the estimated value function, and the value function is updated based on the temporal difference (TD) error.

Actor The actor network is trained using the policy gradient method, which updates the weights of the actor network in the direction of the gradient of the expected reward with respect to the network parameters:

\(\nabla_{\theta} J(\theta) = E[\nabla_{\theta} log \pi_{\theta}(a_t|s_t) * Q^{\pi}(s_t, a_t)]\)

where \(J(\theta)\) is the expected reward, \(\theta\) are the weights of the actor network, \(\pi_{\theta}(a_t|s_t)\) is the probability of selecting action \(a_t\) given state \(s_t\), and \(Q^{\pi}(s_t, a_t)\) is the expected reward of taking action \(a_t\) in state \(s_t\) under policy \(\pi\).

Critic The critic network is trained using the temporal difference (TD) learning method, which updates the weights of the critic network based on the difference between the estimated value of the current state and the estimated value of the next state:

\(\delta_t = r_t + \gamma * V(s_{t+1}) - V(s_t)\)

where \(\delta_t\) is the TD error, \(r_t\) is the reward obtained at time step \(t\), \(\gamma\) is the discount factor, \(V(s_t)\) is the estimated value of state \(s_t\), and \(V(s_{t+1})\) is the estimated value of the next state \(s_{t+1}\).

The weights of the critic network are updated in the direction of the gradient of the mean squared TD error:

\(L = E[(\delta_t)^2]\)

where L is the loss function.
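
To make these update rules concrete, the following minimal PyTorch sketch performs one critic and one actor update for a single agent. Since ASSUME builds on TD3-style methods with a deterministic actor (introduced below), the sketch uses a deterministic actor that maximizes the critic's Q-value; all network sizes, names and hyperparameters are illustrative and not the ASSUME implementation.

import torch as th
import torch.nn as nn

# illustrative dimensions, matching the strategy defined later in this tutorial
obs_dim, act_dim, gamma = 50, 2, 0.99

# deterministic actor and Q-critic (sizes and names are placeholders)
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt = th.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = th.optim.Adam(critic.parameters(), lr=1e-3)


def actor_critic_update(obs, action, reward, next_obs):
    # critic update: minimize the squared TD error
    with th.no_grad():
        next_q = critic(th.cat([next_obs, actor(next_obs)], dim=-1))
        target = reward + gamma * next_q
    q = critic(th.cat([obs, action], dim=-1))
    critic_loss = ((target - q) ** 2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # actor update: follow the gradient of the expected return by maximizing Q(s, pi(s))
    actor_loss = -critic(th.cat([obs, actor(obs)], dim=-1)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()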

Multi-Agent Learning#

While in a single-agent setup, the state transition and respective reward depend only on the actions of a single agent, the state transitions and rewards depend on the actions of all learning agents in a multi-agent setup. This makes the environment non-stationary for a single agent, which violates the Markov property. Hence, the convergence guarantees of single-agent RL algorithms are no longer valid. Therefore, we utilize the framework of centralized training and decentralized execution and expand upon the MADDPG algorithm. The main idea of this approach is to use a centralized critic during the training phase, which has access to the entire state \(\textbf{S}\), and all actions \(a_1, ..., a_N\), thus resolving the issue of non-stationarity, as changes in state transitions and rewards can be explained by the actions of other agents. Meanwhile, during both training and execution, the actor has access only to its local observations \(o_i\) derived from the entire state \(\textbf{S}\).

For each agent \(i\), we train two centralized critics \(Q_{i,\theta_{1,2}}(S, a_1, ..., a_N)\) together with two target critic networks. Similar to TD3, the smaller value of the two critics and the noisy target action \(\tilde{a}_{i,k}\) are used to calculate the target \(y_{i,k}\):

\(y_{i,k} = r_{i,k} + \gamma \, \min_{j=1,2} Q_{i,\theta'_j}(S'_k, a_{1,k}, ..., a_{N,k}, \pi'_i(o_{i,k}))\)

where \(r_{i,k}\) is the reward obtained by agent \(i\) at time step \(k\), \(\gamma\) is the discount factor, \(S'_k\) is the next state of the environment, and \(\pi'_i(o_{i,k})\) is the target policy of agent \(i\).

The critics are trained using the mean squared Bellman error (MSBE) loss:

\(L(Q_{i,\theta_j}) = E[(y_{i,k} - Q_{i,\theta_j}(S_k, a_{1,k}, ..., a_{N,k}))^2]\)

The actor policy of each agent is updated using the deterministic policy gradient (DPG) algorithm:

\(\nabla_{a_{i,k}} Q_{i,\theta_1}(S_k, a_{1,k}, ..., a_{N,k}, \pi_i(o_{i,k}))\big|_{a_{i,k}=\pi_i(o_{i,k})} \, \nabla_{\theta} \pi_i(o_{i,k})\)

The actor is updated using only one of the critic networks, \(Q_{\theta_1}\). These changes to the original DDPG algorithm increase the stability and convergence of the TD3 algorithm, which is especially relevant in the multi-agent RL setup described here.
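
As a rough sketch of the target calculation described above, the following function computes the clipped double-Q target \(y_{i,k}\) for one agent. All names (actor_target_i, critic_target_1, critic_target_2, ...) are placeholders and do not correspond to ASSUME's internal API.

import torch as th

def matd3_target(reward_i, next_state, next_actions_others, o_i_next,
                 actor_target_i, critic_target_1, critic_target_2,
                 gamma=0.99, noise_std=0.2, noise_clip=0.5):
    with th.no_grad():
        # target policy smoothing: add clipped noise to agent i's target action
        next_a_i = actor_target_i(o_i_next)
        noise = (th.randn_like(next_a_i) * noise_std).clamp(-noise_clip, noise_clip)
        next_a_i = (next_a_i + noise).clamp(-1, 1)

        # the centralized critics see the full state and the actions of all agents
        critic_input = th.cat([next_state, next_actions_others, next_a_i], dim=-1)
        q1 = critic_target_1(critic_input)
        q2 = critic_target_2(critic_input)

        # clipped double-Q learning: use the smaller of the two target critic estimates
        return reward_i + gamma * th.min(q1, q2)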

2. Get ASSUME running#

Here we simply install the ASSUME core package via pip. In general, the installation instructions can be found here: https://ASSUME.readthedocs.io/en/latest/installation.html. All the required steps are executed here, and since we are working in Colab, creating a virtual environment is not necessary.

As we will be working with learning agents, we need to install ASSUME with all learning dependencies, such as torch. For this, we use the [learning] extra.

You don’t need to execute the following code cell if you already have the ASSUME framework installed, including the learning dependencies.

[1]:
!pip install 'assume-framework[learning]'
Requirement already satisfied: assume-framework[learning] in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (0.4.1)
Requirement already satisfied: argcomplete>=3.1.4 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from assume-framework[learning]) (3.5.1)
Requirement already satisfied: nest-asyncio>=1.5.6 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from assume-framework[learning]) (1.6.0)
Requirement already satisfied: mango-agents-assume>=1.1.4-6 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from assume-framework[learning]) (1.1.4.post10)
Requirement already satisfied: numpy<2 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from assume-framework[learning]) (1.26.4)
Requirement already satisfied: tqdm>=4.64.1 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from assume-framework[learning]) (4.66.5)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from assume-framework[learning]) (2.9.0)
Requirement already satisfied: sqlalchemy>=2.0.9 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from assume-framework[learning]) (2.0.35)
Requirement already satisfied: pandas>=2.0.0 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from assume-framework[learning]) (2.2.3)
Requirement already satisfied: psycopg2-binary>=2.9.5 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from assume-framework[learning]) (2.9.9)
Requirement already satisfied: pyyaml>=6.0 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from assume-framework[learning]) (6.0.2)
Requirement already satisfied: pyyaml-include>=2.2a in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from assume-framework[learning]) (2.2a1)
Requirement already satisfied: torch>=2.0.1 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from assume-framework[learning]) (2.4.1.post100)
Requirement already satisfied: dill>=0.3.8 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from mango-agents-assume>=1.1.4-6->assume-framework[learning]) (0.3.9)
Requirement already satisfied: msgspec>=0.18.6 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from mango-agents-assume>=1.1.4-6->assume-framework[learning]) (0.18.6)
Requirement already satisfied: paho-mqtt>=2.1.0 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from mango-agents-assume>=1.1.4-6->assume-framework[learning]) (2.1.0)
Requirement already satisfied: protobuf==5.27.2 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from mango-agents-assume>=1.1.4-6->assume-framework[learning]) (5.27.2)
Requirement already satisfied: pytz>=2020.1 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from pandas>=2.0.0->assume-framework[learning]) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from pandas>=2.0.0->assume-framework[learning]) (2024.2)
Requirement already satisfied: six>=1.5 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from python-dateutil>=2.8.2->assume-framework[learning]) (1.16.0)
Requirement already satisfied: fsspec>=2021.04.0 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from pyyaml-include>=2.2a->assume-framework[learning]) (2024.9.0)
Requirement already satisfied: typing-extensions>=4.6.0 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from sqlalchemy>=2.0.9->assume-framework[learning]) (4.12.2)
Requirement already satisfied: greenlet!=0.4.17 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from sqlalchemy>=2.0.9->assume-framework[learning]) (3.1.1)
Requirement already satisfied: filelock in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from torch>=2.0.1->assume-framework[learning]) (3.16.1)
Requirement already satisfied: sympy in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from torch>=2.0.1->assume-framework[learning]) (1.13.3)
Requirement already satisfied: networkx in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from torch>=2.0.1->assume-framework[learning]) (3.3)
Requirement already satisfied: jinja2 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from torch>=2.0.1->assume-framework[learning]) (3.1.4)
Requirement already satisfied: setuptools in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from torch>=2.0.1->assume-framework[learning]) (75.1.0)
Requirement already satisfied: MarkupSafe>=2.0 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from jinja2->torch>=2.0.1->assume-framework[learning]) (3.0.0)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /home/docs/checkouts/readthedocs.org/user_builds/assume/conda/stable/lib/python3.12/site-packages (from sympy->torch>=2.0.1->assume-framework[learning]) (1.3.0)

And just like that, we have ASSUME installed. Now we can let it run. Please note, though, that we cannot use the functionalities tied to Docker and, hence, cannot access the predefined dashboards in Colab. For this, please install Docker and ASSUME on your personal machine.

Furthermore, we would like to access the predefined scenarios in ASSUME, which are stored in the Git repository. Hence, we clone the repository.

You don’t need to execute the following code cell if you already have the ASSUME repository cloned.

[2]:
!git clone --depth=1 https://github.com/assume-framework/assume.git assume-repo
Cloning into 'assume-repo'...
remote: Enumerating objects: 299, done.
remote: Counting objects: 100% (299/299), done.
remote: Compressing objects: 100% (267/267), done.
remote: Total 299 (delta 71), reused 134 (delta 26), pack-reused 0 (from 0)
Receiving objects: 100% (299/299), 8.85 MiB | 10.26 MiB/s, done.
Resolving deltas: 100% (71/71), done.

Let the magic happen. Now you can run your first ever simulation in ASSUME. The following code navigates to the respective ASSUME folder and starts the simulation example example_01b using the local database here in Colab.

When running locally, you can also just run assume -s example_01b -db "sqlite:///./examples/local_db/assume_db_example_01b.db" in a shell

[3]:
!cd assume-repo && assume -s example_01b -db "sqlite:///./examples/local_db/assume_db_example_01b.db"
INFO:assume.world:connected to db
INFO:assume.scenario.loader_csv:Starting Scenario example_01b/ from examples/inputs
INFO:assume.scenario.loader_csv:storage_units not found. Returning None
INFO:assume.scenario.loader_csv:industrial_dsm_units not found. Returning None
INFO:assume.scenario.loader_csv:forecasts_df not found. Returning None
INFO:assume.scenario.loader_csv:Downsampling demand_df successful.
INFO:assume.scenario.loader_csv:cross_border_flows not found. Returning None
INFO:assume.scenario.loader_csv:Downsampling availability_df successful.
INFO:assume.scenario.loader_csv:electricity_prices not found. Returning None
INFO:assume.scenario.loader_csv:price_forecasts not found. Returning None
INFO:assume.scenario.loader_csv:temperature not found. Returning None
INFO:assume.scenario.loader_csv:Adding markets
INFO:assume.scenario.loader_csv:Read units from file
INFO:assume.scenario.loader_csv:Adding power_plant units
INFO:assume.scenario.loader_csv:Adding demand units
INFO:assume.scenario.loader_csv:Adding unit operators and units
example_01b_base 2019-02-01 00:00:00: : 2678401.0it [02:00, 22151.72it/s]

Select input files path:

We also need to differentiate between the input file paths when using this tutorial in Google Colab and a local environment. The code snippets will include both options for your convenience.

[4]:
import importlib.util

# Check if 'google.colab' is available
IN_COLAB = importlib.util.find_spec("google.colab") is not None

colab_inputs_path = "assume-repo/examples/inputs"
local_inputs_path = "../inputs"

inputs_path = colab_inputs_path if IN_COLAB else local_inputs_path

3. Make your agents learn#

Now it is time to get your hands dirty and actually dive into coding in ASSUME. The main objective of this session is to ensure participants grasp the steps required to equip a new unit with RL strategies or to modify the action dimensions. Our emphasis lies on the bidding strategy, with less focus on the algorithm and the role. The coding tasks highlight the key aspects to be adjusted, as already outlined in the learning_strategies.py file. The subsequent sections present the tasks and provide the correct answers for the coding exercises.

We start by initializing the class of our learning strategy. This is very closely related to the general structure of a bidding strategy.

But first some imports:

[5]:
import logging
import os
from datetime import datetime, timedelta
from pathlib import Path

import numpy as np
import pandas as pd
import torch as th

from assume import World
from assume.common.base import LearningStrategy, SupportsMinMax
from assume.common.market_objects import MarketConfig, Orderbook, Product
from assume.reinforcement_learning.algorithms import actor_architecture_aliases
from assume.reinforcement_learning.learning_utils import NormalActionNoise
from assume.scenario.loader_csv import load_scenario_folder, run_learning
[6]:
class RLStrategy(LearningStrategy):
    """
    Reinforcement Learning Strategy
    """

    def __init__(self, *args, **kwargs):
        super().__init__(obs_dim=50, act_dim=2, unique_obs_dim=2, *args, **kwargs)

        self.unit_id = kwargs["unit_id"]

        # defines bounds of actions space
        self.max_bid_price = kwargs.get("max_bid_price", 100)
        self.max_demand = kwargs.get("max_demand", 10e3)

        # tells us whether we are training the agents or just executing pre-learned strategies
        self.learning_mode = kwargs.get("learning_mode", False)
        self.perform_evaluation = kwargs.get("perform_evaluation", False)

        # based on learning config define algorithm configuration
        self.algorithm = kwargs.get("algorithm", "matd3")
        actor_architecture = kwargs.get("actor_architecture", "mlp")

        # define the architecture of the actor neural network
        # if you use many time series inputs, you might want to use the LSTM instead of the MLP, for example
        if actor_architecture in actor_architecture_aliases.keys():
            self.actor_architecture_class = actor_architecture_aliases[
                actor_architecture
            ]
        else:
            raise ValueError(
                f"Policy '{actor_architecture}' unknown. Supported architectures are {list(actor_architecture_aliases.keys())}"
            )

        # sets the device of the actor network
        device = kwargs.get("device", "cpu")
        self.device = th.device(device if th.cuda.is_available() else "cpu")
        if not self.learning_mode:
            self.device = th.device("cpu")

        # future: add option to choose between float16 and float32
        # float_type = kwargs.get("float_type", "float32")
        self.float_type = th.float

        # for definition of observation space
        self.foresight = kwargs.get("foresight", 24)

        if self.learning_mode:
            self.learning_role = None
            self.collect_initial_experience_mode = kwargs.get(
                "episodes_collecting_initial_experience", True
            )

            self.action_noise = NormalActionNoise(
                mu=0.0,
                sigma=kwargs.get("noise_sigma", 0.1),
                action_dimension=self.act_dim,
                scale=kwargs.get("noise_scale", 1.0),
                dt=kwargs.get("noise_dt", 1.0),
            )

        elif Path(kwargs["trained_policies_save_path"]).is_dir():
            self.load_actor_params(load_path=kwargs["trained_policies_save_path"])

3.1 The “Step Function”#

The key function in an RL problem is the step that is taken in the so-called environment. It consists of the following parts:

  1. Get an observation

  2. Choose an action

  3. Get a reward

  4. Update your policy

In ASSUME, we do not have such a straightforward step function. Steps 1 and 2 are combined in the calculate_bids() function, which is called as soon as an offer is placed on the market. Step 3, however, can only happen after we get the market feedback from the simulation run and, hence, lives in the calculate_reward() function. Step 4 is solely handled by the learning_role, as it schedules the policy updates and manages the buffer. Hence, it is not included in this notebook, since we only focus on transforming the bidding strategy into a learning one.

Steps 1-3 will be implemented in the following sections 3.2 to 3.4. If there is a coding task for you, it will be marked accordingly.
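
For orientation, the following pseudo-code maps the classic step cycle onto the methods implemented in this notebook; the left-hand names are purely conceptual and not an actual ASSUME interface.

# Conceptual mapping of a classic RL step loop onto the ASSUME bidding strategy
# (pseudo-code for orientation only):
#
# obs = observe(env)           -> RLStrategy.create_observation()   # step 1, section 3.2
# action = policy(obs)         -> RLStrategy.get_actions()          # step 2, section 3.3
# bids = to_orderbook(action)  -> RLStrategy.calculate_bids()       # wraps steps 1 & 2
# reward = market_feedback()   -> RLStrategy.calculate_reward()     # step 3, section 3.4
# update(policy, buffer)       -> learning role                     # step 4, not covered here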

[7]:
# we define the class again and inherit from the initial class just to add the additional method to the original class
# this is a workaround to have different methods of the class in different cells
# which is good for the purpose of this tutorial
# however, you should have all functions in a single class when using this example in .py files


class RLStrategy(RLStrategy):
    def calculate_bids(
        self,
        unit: SupportsMinMax,
        market_config: MarketConfig,
        product_tuples: list[Product],
        **kwargs,
    ) -> Orderbook:
        """
        Calculate bids for a unit -> STEP 1 & 2
        """

        start = product_tuples[0][0]
        end = product_tuples[0][1]
        # get technical bounds for the unit output from the unit
        min_power, max_power = unit.calculate_min_max_power(start, end)
        min_power = min_power[start]
        max_power = max_power[start]

        # =============================================================================
        # 1. Get the Observations, which are the basis of the action decision
        # =============================================================================
        next_observation = self.create_observation(
            unit=unit,
            market_id=market_config.market_id,
            start=start,
            end=end,
        )

        # =============================================================================
        # 2. Get the Actions, based on the observations
        # =============================================================================
        actions, noise = self.get_actions(next_observation)

        bids = actions

        bids = self.remove_empty_bids(bids)

        return bids
[8]:
# we define the class again and inherit from the initial class just to add the additional method to the original class
# this is a workaround to have different methods of the class in different cells
# which is good for the purpose of this tutorial
# however, you should have all functions in a single class when using this example in .py files


class RLStrategy(RLStrategy):
    def calculate_reward(
        self,
        unit,
        marketconfig: MarketConfig,
        orderbook: Orderbook,
    ):
        """
        Calculate reward
        """

        return None

3.2 Get an observation#

The decision about the observations received by each agent plays a crucial role when designing a multi-agent RL setup. The following describes the task of learning agents representing profit-maximizing electricity market participants who either sell a generating unit’s output or optimize a storage unit’s operation. They are represented through their plants’ techno-economic parameters, such as minimal operational capacity \(P^{min}\), start-up \(c^{su}\), and shut-down \(c^{sd}\) costs. This information is all known by the unit itself and, hence, also accessible in the bidding strategy.

During the training phase, the centralized critic receives observations from all agents, resulting in an input size that grows linearly with the number of agents. This can lead to unstable training behavior of the critic networks, which limits the maximal number of agents in the simulation. This effect is known as the dimensionality curse, which likely contributed to the small number of learning agents in existing approaches. To address the dimensionality curse, we use a single observation that is the same for all agents and add a small set of unique observations for each agent to improve their performance. This modification allows the use of only one observation in the centralized critic, decoupled from the number of learning agents, significantly reducing the observation size and enabling simultaneous training of hundreds of learning agents with stable training behavior. The only limiting factor is the available working memory.

At time-step \(t\), agent \(i\) receives the observation \(o_{i,t}\) consisting of vectors \([L_{\mathrm{h},t}, L_{\mathrm{f},t}, M_{\mathrm{h},t}, M_{\mathrm{f},t}, mc_{i,t}]\). Here \(L_{\mathrm{h},t}, L_{\mathrm{f},t}\) and \(M_{\mathrm{h},t}, M_{\mathrm{f},t}\) are the past and the forecast residual loads and market prices, respectively. This information stems from the world, where an overall forecasting role generates it. The price forecast is calculated ahead of the simulation run using a simple merit order model based on the residual load forecast and the marginal cost of power plants. This part of the observation is the same for all agents. In addition, each agent receives its current marginal cost \(mc_{i,t}\). Information about the marginal cost is shared with the centralized critic during the training phase, but it is not shared with other agents during the execution phase. All the inputs are normalized to improve the performance of the training process.
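
Since the observation concatenates the residual load forecast and the price forecast over the foresight horizon plus the two unit-specific values, the obs_dim=50 passed to super().__init__() above follows directly from the default foresight of 24. The following small sanity check (not part of the strategy code) illustrates this.

# rough sanity check of the observation size, assuming the default foresight of 24 time steps
foresight = 24
unique_obs_dim = 2  # current capacity and marginal cost of the unit

# residual load forecast + price forecast + unit-specific observations
obs_dim = 2 * foresight + unique_obs_dim
print(obs_dim)  # 50, matching obs_dim=50 passed to super().__init__()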

Task 1#

Goal: With the help of the unit, the start time and the end time, we want to create the observations for the unit.

There are 4 different observations:

  • residual load forecast

  • price forecast

  • total capacity of the unit

  • marginal costs of the unit

For all observations we need scaling factors. Why do you think it is important to scale the input? How would you define the scaling factors?

[9]:
# we define the class again and inherit from the initial class just to add the additional method to the original class
# this is a workaround to have different methods of the class in different cells
# which is good for the purpose of this tutorial
# however, you should have all functions in a single class when using this example in .py files


class RLStrategy(RLStrategy):
    def create_observation(
        self,
        unit: SupportsMinMax,
        market_id: str,
        start: datetime,
        end: datetime,
    ):
        """
        Create observation
        """

        end_excl = end - unit.index.freq

        # get the forecast length depending on the time unit considered in the modelled unit
        forecast_len = pd.Timedelta((self.foresight - 1) * unit.index.freq)

        # =============================================================================
        # 1.1 Get the Observations, which are the basis of the action decision
        # =============================================================================
        # residual load forecast
        scaling_factor_res_load = self.max_demand

        # price forecast
        scaling_factor_price = self.max_bid_price

        # total capacity
        scaling_factor_total_capacity = unit.max_power

        # marginal cost
        scaling_factor_marginal_cost = self.max_bid_price

        # checks if we are at end of simulation horizon, since we need to change the forecast then
        # for residual load and price forecast and scale them
        if (
            end_excl + forecast_len
            > unit.forecaster[f"residual_load_{market_id}"].index[-1]
        ):
            scaled_res_load_forecast = (
                unit.forecaster[f"residual_load_{market_id}"].loc[start:].values
                / scaling_factor_res_load
            )
            scaled_res_load_forecast = np.concatenate(
                [
                    scaled_res_load_forecast,
                    unit.forecaster[f"residual_load_{market_id}"].iloc[
                        : self.foresight - len(scaled_res_load_forecast)
                    ],
                ]
            )

        else:
            scaled_res_load_forecast = (
                unit.forecaster[f"residual_load_{market_id}"]
                .loc[start : end_excl + forecast_len]
                .values
                / scaling_factor_res_load
            )

        if end_excl + forecast_len > unit.forecaster[f"price_{market_id}"].index[-1]:
            scaled_price_forecast = (
                unit.forecaster[f"price_{market_id}"].loc[start:].values
                / scaling_factor_price
            )
            scaled_price_forecast = np.concatenate(
                [
                    scaled_price_forecast,
                    unit.forecaster[f"price_{market_id}"].iloc[
                        : self.foresight - len(scaled_price_forecast)
                    ],
                ]
            )

        else:
            scaled_price_forecast = (
                unit.forecaster[f"price_{market_id}"]
                .loc[start : end_excl + forecast_len]
                .values
                / scaling_factor_price
            )

        # get the last accepted bid volume and the current marginal costs of the unit
        current_volume = unit.get_output_before(start)
        current_costs = unit.calc_marginal_cost_with_partial_eff(current_volume, start)

        # scale unit outputs
        scaled_total_capacity = current_volume / scaling_factor_total_capacity
        scaled_marginal_cost = current_costs / scaling_factor_marginal_cost

        # concat all observations into one array
        observation = np.concatenate(
            [
                scaled_res_load_forecast,
                scaled_price_forecast,
                np.array([scaled_total_capacity, scaled_marginal_cost]),
            ]
        )

        # transfer the array to the device (CPU/GPU) for NN processing
        observation = (
            th.tensor(observation, dtype=self.float_type)
            .to(self.device, non_blocking=True)
            .view(-1)
        )

        return observation.detach().clone()

Solution 1#

First why do we scale?

Scaling observations is a crucial preprocessing step in machine learning, including reinforcement learning. It involves transforming the features so that they all fall within a similar numerical range. This is important for several reasons. Firstly, it aids in numerical stability during training. Large input values can lead to numerical precision issues, potentially causing the algorithm to perform poorly or even fail to converge. By scaling the features, we mitigate this risk, ensuring a more stable and reliable learning process.

Additionally, scaling promotes uniformity in the learning process. Many optimization algorithms, like gradient descent, adjust model parameters based on the magnitude of gradients. When features have vastly different scales, some may dominate the learning process, while others receive less attention. This imbalance can hinder convergence and result in a suboptimal model. Scaling addresses this issue, allowing the algorithm to treat all features equally and progress more efficiently towards an optimal solution. This not only expedites the learning process but also enhances the model’s ability to generalize to new, unseen data. In essence, scaling observations is a fundamental practice that enhances the performance and robustness of machine learning models across a wide array of applications.

Accordingly, the scaling should ensure a similar range for all input parameters. You can achieve that by choosing the following scaling factors. If you add new observations, choose your scaling factors wisely.

[10]:
"""
#scaling factors for all observations
#residual load forecast
scaling_factor_res_load = self.max_demand

# price forecast
scaling_factor_price = self.max_bid_price

# total capacity
scaling_factor_total_capacity = unit.max_power

# marginal cost
scaling_factor_marginal_cost = self.max_bid_price
"""
[10]:
'\n#scaling factors for all observations\n#residual load forecast\nscaling_factor_res_load = self.max_demand\n\n# price forecast\nscaling_factor_price = self.max_bid_price\n\n# total capacity\nscaling_factor_total_capacity = unit.max_power\n\n# marginal cost\nscaling_factor_marginal_cost = self.max_bid_price\n'
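
As a small illustration of the effect of these scaling factors, dividing each input by a value close to its physical maximum maps all observations into a comparable range. The numbers below are made up purely for illustration.

import numpy as np

# illustrative values only
max_demand = 10e3    # MW, used as scaling factor for the residual load forecast
max_bid_price = 100  # EUR/MWh, used for the price forecast and the marginal cost

residual_load_forecast = np.array([8500.0, 9200.0, 7800.0])  # MW
price_forecast = np.array([45.0, 60.0, 38.0])                # EUR/MWh
marginal_cost = 36.0                                         # EUR/MWh

scaled_res_load = residual_load_forecast / max_demand  # -> [0.85, 0.92, 0.78]
scaled_price = price_forecast / max_bid_price          # -> [0.45, 0.60, 0.38]
scaled_mc = marginal_cost / max_bid_price              # -> 0.36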

3.3 Choose an action#

To differentiate between the inflexible and flexible parts of a plant’s generation capacity, we split the bids into two parts. The first bid part allows agents to bid a very low or even negative price for the inflexible capacity; this reflects the agent’s motivation to stay infra-marginal during periods of very low net load (e.g., in periods of high solar and wind power generation) to avoid the cost of a shut-down and subsequent start-up of the plant. The flexible part of the capacity can be offered at a higher price to provide chances for higher profits. The actions of agent \(i\) at time-step \(t\) are defined as \(a_{i,t} = [ep^\mathrm{inflex}_{i,t}, ep^\mathrm{flex}_{i,t}] \in [ep^{min},ep^{max}]\), where \(ep^\mathrm{inflex}_{i,t}\) and \(ep^\mathrm{flex}_{i,t}\) are bid prices for the inflexible and flexible capacities, and \(ep^{min},ep^{max}\) are minimal and maximal bid prices, respectively.

How do we learn how to make good decisions? Basically by trial and error, also known as exploration. Exploration is a fundamental concept in reinforcement learning, representing the strategy by which an agent interacts with its environment to gather information about the consequences of its actions. This is crucial because without exploration, the agent might settle for suboptimal policies based on its initial knowledge, limiting its ability to discover more rewarding states or actions.

In the initial stages of training, also often called initial exploration, it’s imperative to employ almost random actions. This means having the agent take actions purely by chance. This seemingly counterintuitive approach serves a critical purpose. Initially, the agent lacks any meaningful information about the environment, making it impossible to make informed decisions. By taking random actions, it can quickly gather a broad range of experiences, allowing it to grasp the fundamental structure of the environment. These random actions serve as a kind of “baseline exploration,” providing a starting point from which the agent can refine its policy through learning. With our domain knowledge we can even guide the initial exploration process, to enhance learning capabilities.

Following up on these concepts, the following tasks will: 1. obtain the action values from the neural net in the bidding strategy and 2. transform these values into the actual bids of an order.

Task 2.1#

Goal: With the observations and noise we generate actions

In the following task we define the actions for the initial exploration mode. As described before, we can guide it by not letting the agent choose purely random actions but by defining a base bid to which we add a good amount of noise. In this way, the initial strategy starts from a solution that we know works somewhat well. Define the respective base bid in the following code. Remember, we are defining bids for a conventional power plant bidding in an energy-only market with a uniform pricing auction.

[11]:
# we define the class again and inherit from the initial class just to add the additional method to the original class
# this is a workaround to have different methods of the class in different cells
# which is good for the purpose of this tutorial
# however, you should have all functions in a single class when using this example in .py files


class RLStrategy(RLStrategy):
    def get_actions(self, next_observation):
        """
        Get actions
        """

        # distinction whether we are in learning mode or not to handle exploration realised with noise
        if self.learning_mode:
            # if we are in learning mode the first x episodes we want to explore the entire action space
            # to get a good initial experience, in the area around the costs of the agent
            if self.collect_initial_experience_mode:
                # define current action as solely noise
                noise = (
                    th.normal(
                        mean=0.0, std=0.2, size=(1, self.act_dim), dtype=self.float_type
                    )
                    .to(self.device)
                    .squeeze()
                )

                # =============================================================================
                # 2.1 Get Actions and handle exploration
                # =============================================================================
                # ==> YOUR CODE HERE
                base_bid = next_observation[-1]  # = marginal_costs

                # add noise to the last dimension of the observation
                # needs to be adjusted if the observation space is changed, because it only makes sense
                # if the last dimension of the observation space is the marginal cost
                curr_action = noise + base_bid.clone().detach()

            else:
                # if we are not in the initial exploration phase we choose the action with the actor neural net
                # and add noise to the action
                curr_action = self.actor(next_observation).detach()
                noise = th.tensor(
                    self.action_noise.noise(), device=self.device, dtype=self.float_type
                )
                curr_action += noise
        else:
            # if we are not in learning mode we just use the actor neural net to get the action without adding noise

            curr_action = self.actor(next_observation).detach()
            noise = tuple(0 for _ in range(self.act_dim))

        curr_action = curr_action.clamp(-1, 1)

        return curr_action, noise

Solution 2.1#

So how do we define the base bid?

Assuming the described auction is an efficient market with full information and competition, we know that bidding the marginal costs of the power plant is the economically best bid. With the RL strategy we can recreate the abuse of market power and incomplete information, which enables us to model different market settings. Yet, starting off with the theoretically stylized optimal solution guides our RL agents properly. As the marginal costs of the power plant are part of the observations, we can define the base bid in the following way.

[12]:
"""
#base_bid = marginal costs
base_bid = next_observation[-1] # = marginal_costs
"""
[12]:
'\n#base_bid = marginal costs\nbase_bid = next_observation[-1] # = marginal_costs\n'

Task 2.2#

Goal: Define the actual bids with the outputs of the actors

The outputs of the actor neural network are bounded; in get_actions() they are clamped to the range \([-1, 1]\). These values need to be translated into the actual bids \(a_{i,t} = [ep^\mathrm{inflex}_{i,t}, ep^\mathrm{flex}_{i,t}] \in [ep^{min},ep^{max}]\). This can be done in a way that further helps the RL agent to learn, if we put some thought into it.

For this, we go back to the calculate_bids() function and, instead of just defining bids = actions, which was only a placeholder, we actually turn them into bids. Think about a smart way to transform them and fill the gaps in the following code. Remember:

  • bid_quantity_inflex represents the inflexible part of the bid, i.e. the minimum run capacity of the unit.

  • bid_quantity_flex represents the flexible part of the bid, i.e. the flexible capacity of the unit.

[13]:
# we define the class again and inherit from the initial class just to add the additional method to the original class
# this is a workaround to have different methods of the class in different cells
# which is good for the purpose of this tutorial
# however, you should have all functions in a single class when using this example in .py files


class RLStrategy(RLStrategy):
    def calculate_bids(
        self,
        unit: SupportsMinMax,
        market_config: MarketConfig,
        product_tuples: list[Product],
        **kwargs,
    ) -> Orderbook:
        """
        Calculate bids for a unit
        """

        bid_quantity_inflex, bid_price_inflex = 0, 0
        bid_quantity_flex, bid_price_flex = 0, 0

        start = product_tuples[0][0]
        end = product_tuples[0][1]
        # get technical bounds for the unit output from the unit
        min_power, max_power = unit.calculate_min_max_power(start, end)
        min_power = min_power[start]
        max_power = max_power[start]

        # =============================================================================
        # 1. Get the Observations, which are the basis of the action decision
        # =============================================================================
        next_observation = self.create_observation(
            unit=unit,
            market_id=market_config.market_id,
            start=start,
            end=end,
        )

        # =============================================================================
        # 2. Get the Actions, based on the observations
        # =============================================================================
        actions, noise = self.get_actions(next_observation)

        bids = actions

        # =============================================================================
        # 3.2 Transform Actions into bids
        # =============================================================================
        # ==> YOUR CODE HERE
        # actions are in the range [-1, 1]; we need to transform them into actual bids
        # we can use our domain knowledge to guide the bid formulation

        # calculate actual bids
        # rescale actions to actual prices
        bid_prices = actions * self.max_bid_price

        # calculate inflexible part of the bid
        bid_quantity_inflex = min_power
        bid_price_inflex = min(bid_prices)

        # calculate flexible part of the bid
        bid_quantity_flex = max_power - bid_quantity_inflex
        bid_price_flex = max(bid_prices)

        # actually formulate bids in orderbook format
        bids = [
            {
                "start_time": start,
                "end_time": end,
                "only_hours": None,
                "price": bid_price_inflex,
                "volume": bid_quantity_inflex,
            },
            {
                "start_time": start,
                "end_time": end,
                "only_hours": None,
                "price": bid_price_flex,
                "volume": bid_quantity_flex,
            },
        ]

        # store results in unit outputs as lists to be written to the buffer for learning
        unit.outputs["rl_observations"].append(next_observation)
        unit.outputs["rl_actions"].append(actions)

        # store results in unit outputs as series to be written to the database by the unit operator
        unit.outputs["actions"][start] = actions
        unit.outputs["exploration_noise"][start] = noise

        bids = self.remove_empty_bids(bids)

        return bids

Solution 2.2#

So how do we define the actual bid from the action?

We have one bid price for the minimum power (inflex) and one for the rest of the power. As the power plant needs to run at least at its minimum power in order to offer any generation at all, it makes sense to offer this part of the generation at a lower price than the rest. Hence, we can allocate the actions to the bid prices in the following way. In addition, the actions of course need to be rescaled.

[14]:
"""
#calculate actual bids
#rescale actions to actual prices
bid_prices = actions * self.max_bid_price

#calculate inflexible part of the bid
bid_quantity_inflex = min_power
bid_price_inflex = min(bid_prices)

#calculate flexible part of the bid
bid_quantity_flex = max_power - bid_quantity_inflex
bid_price_flex = max(bid_prices)
"""
[14]:
'\n#calculate actual bids\n#rescale actions to actual prices\nbid_prices = actions * self.max_bid_price\n\n#calculate inflexible part of the bid\nbid_quantity_inflex = min_power\nbid_price_inflex = min(bid_prices)\n\n#calculate flexible part of the bid\nbid_quantity_flex = max_power - bid_quantity_inflex\nbid_price_flex = max(bid_prices)\n'
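
As a quick numeric illustration of this transformation, assume made-up values of max_bid_price = 100 EUR/MWh, min_power = 200 MW and max_power = 1000 MW, and an (already clamped) action vector of [0.1, 0.55].

import numpy as np

max_bid_price = 100               # EUR/MWh
min_power, max_power = 200, 1000  # MW, illustrative values

actions = np.array([0.10, 0.55])           # actor output after clamping
bid_prices = actions * max_bid_price       # -> [10.0, 55.0] EUR/MWh

bid_price_inflex = bid_prices.min()        # 10 EUR/MWh for the must-run capacity
bid_quantity_inflex = min_power            # 200 MW
bid_price_flex = bid_prices.max()          # 55 EUR/MWh for the flexible capacity
bid_quantity_flex = max_power - min_power  # 800 MW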

3.4 Get a reward#

This step is done in the calculate_reward() function, which is called after the market is cleared and we get the market feedback, so we can calculate the profit. In RL, the design of the reward function is as important as the choice of the correct algorithm. During the initial phase of this work, a pure economic reward in the form of the agent’s profit was used. Typically, electricity market models consider only a single restart cost. Still, in the case of using RL, the split into shut-down and start-up costs allows the agents to better differentiate between these two events and learn a better policy.

\begin{equation}
\pi_{i,t} =
\begin{cases}
P^\text{conf}_{i,t} (M_t - mc_{i,t}) \, dt - c^{su}_i & \text{if } P^\text{conf}_{i,t} \geq P^{min}_i \text{ and } P_{i,t-1} = 0 \\
P^\text{conf}_{i,t} (M_t - mc_{i,t}) \, dt & \text{if } P^\text{conf}_{i,t} \geq P^{min}_i \text{ and } P_{i,t-1} \neq 0 \\
- c^{sd}_i & \text{if } P^\text{conf}_{i,t} \leq P^{min}_i \text{ and } P_{i,t-1} \neq 0 \\
0 & \text{otherwise}
\end{cases}
\end{equation}

In this equation, the variables are:

  • \(P^\text{conf}\) the confirmed capacity on the market

  • \(P^{min}\) the minimal stable capacity

  • \(M\) the market clearing price

  • \(mc\) the marginal generation cost

  • \(dt\) the market time resolution

  • \(c^{su}, c^{sd}\) the start-up and shut-down costs, respectively

The profit-driven reward function was sufficient for a few agents, but the learning performance decreased significantly with more agents. Therefore, we add an additional regret term \(cm\).

Task 3#

Goal: Define the reward guiding the learning process of the agent.

As the reward plays such a crucial role in learning, think of ways to integrate further signals beyond the pure monetary profit. One example could be integrating a regret term, namely the opportunity costs. Your task is to define the reward using the opportunity costs and to scale it.

[15]:
# we define the class again and inherit from the initial class just to add the additional method to the original class
# this is a workaround to have different methods of the class in different cells
# which is good for the purpose of this tutorial
# however, you should have all functions in a single class when using this example in .py files


class RLStrategy(RLStrategy):
    def calculate_reward(
        self,
        unit,
        marketconfig: MarketConfig,
        orderbook: Orderbook,
    ):
        """
        Calculate reward
        """

        # =============================================================================
        # 3. Calculate Reward
        # =============================================================================
        # function is called after the market is cleared and we get the market feedback,
        # so we can calculate the profit

        product_type = marketconfig.product_type

        profit = 0
        reward = 0
        opportunity_cost = 0

        # iterate over all orders in the orderbook, to calculate order specific profit
        for order in orderbook:
            start = order["start_time"]
            end = order["end_time"]
            end_excl = end - unit.index.freq

            # depending on the way the unit calculates marginal costs, we take the costs
            if unit.marginal_cost is not None:
                marginal_cost = (
                    unit.marginal_cost[start]
                    if len(unit.marginal_cost) > 1
                    else unit.marginal_cost
                )
            else:
                marginal_cost = unit.calc_marginal_cost_with_partial_eff(
                    power_output=unit.outputs[product_type].loc[start:end_excl],
                    timestep=start,
                )

            duration = (end - start) / timedelta(hours=1)

            # calculate profit as income - running_cost from this event
            price_difference = order["accepted_price"] - marginal_cost
            order_profit = price_difference * order["accepted_volume"] * duration

            # calculate opportunity cost
            # as the loss of income we have because we are not running at full power
            order_opportunity_cost = (
                price_difference
                * (
                    unit.max_power - unit.outputs[product_type].loc[start:end_excl]
                ).sum()
                * duration
            )

            # if our opportunity costs are negative, we did not miss an opportunity to earn money and we set them to 0
            order_opportunity_cost = max(order_opportunity_cost, 0)

            # collect profit and opportunity cost for all orders
            opportunity_cost += order_opportunity_cost
            profit += order_profit

        # consideration of start-up costs, which are evenly divided between the
        # upward and downward regulation events
        if (
            unit.outputs[product_type].loc[start] != 0
            and unit.outputs[product_type].loc[start - unit.index.freq] == 0
        ):
            profit = profit - unit.hot_start_cost / 2
        elif (
            unit.outputs[product_type].loc[start] == 0
            and unit.outputs[product_type].loc[start - unit.index.freq] != 0
        ):
            profit = profit - unit.hot_start_cost / 2

        # =============================================================================
        # =============================================================================
        # ==> YOUR CODE HERE
        # The straightforward implementation would be reward = profit, yet we would like to give the agent more guidance
        # in the learning process, so we add a regret term to the reward, which is the opportunity cost
        # define the reward and scale it

        scaling = 0.1 / unit.max_power
        regret_scale = 0.2
        reward = float(profit - regret_scale * opportunity_cost) * scaling

        # store results in unit outputs which are written to database by unit operator
        unit.outputs["profit"].loc[start:end_excl] += profit
        unit.outputs["reward"].loc[start:end_excl] = reward
        unit.outputs["regret"].loc[start:end_excl] = opportunity_cost

Solution 3#

So how do we define the actual reward?

We use the opportunity costs for further guidance, which quantify the foregone contribution margin, as defined by the following equation, with \(P^{max}\) as the maximal available capacity.

\begin{equation} cm_{i,t} = \max[(P^{max}_i - P^\text{conf}_{i,t}) (M_t - mc_{i,t}) dt, 0] \end{equation}

The regret term gives a negative signal to the agent when there is opportunity cost due to the unsold capacity, thus correcting the agent’s actions. This term also introduces an increased influence of the competition between agents in learning. By minimizing the regret, the agents drive the bid prices closer to the marginal generation cost, which drives the market price down.

The reward of agent \(i\) at time-step \(t\) is defined by the equation below.

\begin{equation} R_{i,t} = \pi_{i,t} - \beta \, cm_{i,t} \end{equation}

Here, \(\beta\) is the regret scaling factor to adjust the ratio between profit-maximizing and regret-minimizing learning.

The described reward function has proven to perform well even with many agents and to accelerate learning convergence. This is because minimizing the regret term drives the overall system to equilibrium. At a point close to the equilibrium point, the average reward of all agents would converge to a constant value since further policy changes would not lead to an additional reduction in regrets or an increase in profits. Therefore, the average reward value can also be a good indicator of learning performance and convergence.

[16]:
"""
scaling = 0.1 / unit.max_power
regret_scale = 0.2
reward = float(profit - regret_scale * opportunity_cost) * scaling
"""
[16]:
'\nscaling = 0.1 / unit.max_power\nregret_scale = 0.2\nreward = float(profit - regret_scale * opportunity_cost) * scaling\n'
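
As a short worked example with made-up numbers: suppose a 1000 MW unit earns a profit of 20,000 EUR in one interval and misses out on 5,000 EUR of opportunity (profitable capacity that was not sold).

max_power = 1000            # MW, illustrative
profit = 20_000.0           # EUR, income minus running and start-up costs
opportunity_cost = 5_000.0  # EUR, regret term cm

scaling = 0.1 / max_power   # keeps rewards in a numerically convenient range
regret_scale = 0.2          # beta, weight of the regret term

reward = (profit - regret_scale * opportunity_cost) * scaling
print(reward)  # (20000 - 0.2 * 5000) * 1e-4 = 1.9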

3.5 Start the simulation#

We are almost done with all the changes needed to actually make ASSUME learn here in Google Colab. If you would rather load our pretrained strategies, we also need a function for loading the actor parameters, which can be found below.

[17]:
# we define the class again and inherit from the initial class just to add the additional method to the original class
# this is a workaround to have different methods of the class in different cells
# which is good for the purpose of this tutorial
# however, you should have all functions in a single class when using this example in .py files


class RLStrategy(RLStrategy):
    def load_actor_params(self, load_path):
        """
        Load actor parameters
        """
        directory = f"{load_path}/actors/actor_{self.unit_id}.pt"

        params = th.load(directory, map_location=self.device)

        self.actor = self.actor_architecture_class(
            obs_dim=self.obs_dim,
            act_dim=self.act_dim,
            float_type=self.float_type,
            unique_obs_dim=self.unique_obs_dim,
            num_timeseries_obs_dim=self.num_timeseries_obs_dim,
        ).to(self.device)

        self.actor.load_state_dict(params["actor"])

        if self.learning_mode:
            self.actor_target = self.actor_architecture_class(
                obs_dim=self.obs_dim,
                act_dim=self.act_dim,
                float_type=self.float_type,
                unique_obs_dim=self.unique_obs_dim,
                num_timeseries_obs_dim=self.num_timeseries_obs_dim,
            ).to(self.device)
            self.actor_target.load_state_dict(params["actor_target"])
            self.actor_target.eval()
            self.actor.optimizer.load_state_dict(params["actor_optimizer"])

To control the learning process, the config file determines the parameters of the learning algorithm. As we want to tinker with these values in the notebook, we will overwrite the learning config in the next cell and then load it into our world.

[18]:
learning_config = {
    "continue_learning": False,
    "trained_policies_save_path": "null",
    "max_bid_price": 100,
    "algorithm": "matd3",
    "learning_rate": 0.001,
    "training_episodes": 10,
    "episodes_collecting_initial_experience": 3,
    "train_freq": "24h",
    "gradient_steps": -1,
    "batch_size": 256,
    "gamma": 0.99,
    "device": "cpu",
    "noise_sigma": 0.1,
    "noise_scale": 1,
    "noise_dt": 1,
    "validation_episodes_interval": 5,
}

In order to let the simulation run with the integrated learning, we need to adjust the main file that runs it in the following way.

In the following cell, we run case 1 of [1], in which a single large reinforcement learning power plant exists that can technically exert market power.

[1] Harder, N.; Qussous, R.; Weidlich, A. Fit for purpose: Modeling wholesale electricity markets realistically with multi-agent deep reinforcement learning. Energy and AI 2023. 14. 100295. https://doi.org/10.1016/j.egyai.2023.100295.

[19]:
log = logging.getLogger(__name__)

csv_path = "outputs"
os.makedirs("local_db", exist_ok=True)

if __name__ == "__main__":
    db_uri = "sqlite:///local_db/assume_db.db"

    scenario = "example_02a"
    study_case = "base"

    # create world
    world = World(database_uri=db_uri, export_csv_path=csv_path)

    # we register our custom learning bidding strategy class with the world's bidding strategies
    # in the provided example files, the learning bidding strategy in the input CSV is named "pp_learning"
    # hence we map this name to our learning strategy class
    world.bidding_strategies["pp_learning"] = RLStrategy

    # then we load the scenario specified above from the respective input files
    load_scenario_folder(
        world,
        inputs_path=inputs_path,
        scenario=scenario,
        study_case=study_case,
    )

    # run learning if learning mode is enabled
    # needed because we simulate the modelling horizon multiple times to train the reinforcement learning agents

    if world.learning_config.get("learning_mode", False):
        run_learning(
            world,
            inputs_path=inputs_path,
            scenario=scenario,
            study_case=study_case,
        )

    # after learning is done, we make a normal run of the simulation, which serves as a test run
    world.run()

In comparison, the following cell executes case 2 of [1], where the capacity of the reinforcement learning power plant from case 1 is split across five reinforcement learning power plants, which therefore can no longer exert market power.

[20]:
log = logging.getLogger(__name__)

csv_path = "outputs"
os.makedirs("local_db", exist_ok=True)

if __name__ == "__main__":
    db_uri = "sqlite:///local_db/assume_db.db"

    scenario = "example_02b"
    study_case = "base"

    # create world
    world = World(database_uri=db_uri, export_csv_path=csv_path)

    # we register our custom learning bidding strategy class with the world's bidding strategies
    # in the provided example files, the learning bidding strategy in the input CSV is named "pp_learning"
    # hence we map this name to our learning strategy class
    world.bidding_strategies["pp_learning"] = RLStrategy

    # then we load the scenario specified above from the respective input files
    load_scenario_folder(
        world,
        inputs_path=inputs_path,
        scenario=scenario,
        study_case=study_case,
    )

    # run learning if learning mode is enabled
    # needed because we simulate the modelling horizon multiple times to train the reinforcement learning agents

    if world.learning_config.get("learning_mode", False):
        run_learning(
            world,
            inputs_path=inputs_path,
            scenario=scenario,
            study_case=study_case,
        )

    # after learning is done, we make a normal run of the simulation, which serves as a test run
    world.run()

The following simulation runs case 3 of [1] in the same way.

[21]:
log = logging.getLogger(__name__)

csv_path = "outputs"
os.makedirs("local_db", exist_ok=True)

if __name__ == "__main__":
    db_uri = "sqlite:///local_db/assume_db.db"

    scenario = "example_02c"
    study_case = "base"

    # create world
    world = World(database_uri=db_uri, export_csv_path=csv_path)

    # we register our custom learning bidding strategy class with the world's bidding strategies
    # in the provided example files, the learning bidding strategy in the input CSV is named "pp_learning"
    # hence we map this name to our learning strategy class
    world.bidding_strategies["pp_learning"] = RLStrategy

    # then we load the scenario specified above from the respective input files
    load_scenario_folder(
        world,
        inputs_path=inputs_path,
        scenario=scenario,
        study_case=study_case,
    )

    # run learning if learning mode is enabled
    # needed because we simulate the modelling horizon multiple times to train the reinforcement learning agents

    if world.learning_config.get("learning_mode", False):
        run_learning(
            world,
            inputs_path=inputs_path,
            scenario=scenario,
            study_case=study_case,
        )

    # after learning is done, we make a normal run of the simulation, which serves as a test run
    world.run()

Result Plotting#

[22]:
!pip install matplotlib
[23]:
import os
from functools import partial

import matplotlib.pyplot as plt
import pandas as pd
from sqlalchemy import create_engine

os.makedirs("outputs", exist_ok=True)

db_uri = "sqlite:///local_db/assume_db.db"

engine = create_engine(db_uri)


sql = """
SELECT ident, simulation,
sum(round(CAST(value AS numeric), 2))  FILTER (WHERE variable = 'total_cost') as total_cost,
sum(round(CAST(value AS numeric), 2)*1000)  FILTER (WHERE variable = 'total_volume') as total_volume,
sum(round(CAST(value AS numeric), 2))  FILTER (WHERE variable = 'avg_price') as average_cost
FROM kpis
where variable in ('total_cost', 'total_volume', 'avg_price')
and simulation in ('example_02a_base', 'example_02b_base', 'example_02c_base')
group by simulation, ident ORDER BY simulation
"""


kpis = pd.read_sql(sql, engine)

kpis
[24]:
# sort the results so the three simulations appear in a consistent order
# (example_02a, example_02b, example_02c)

kpis = kpis.sort_values(
    by="simulation",
    #    key=lambda x: x.map({"example_02a": 1, "example_02b": 2, "example_02c": 3}),
)


kpis["total_volume"] /= 1e9
kpis["total_cost"] /= 1e6
savefig = partial(plt.savefig, transparent=False, bbox_inches="tight")

xticks = kpis["simulation"].unique()
plt.style.use("seaborn-v0_8")

fig, ax = plt.subplots(1, 1, figsize=(10, 6))

ax2 = ax.twinx()  # Create another axes that shares the same x-axis as ax.

width = 0.4

kpis.total_volume.plot(kind="bar", ax=ax, width=width, position=1, color="royalblue")
kpis.total_cost.plot(kind="bar", ax=ax2, width=width, position=0, color="green")

# set x-axis limits
ax.set_xlim(-0.6, len(kpis["simulation"]) - 0.4)

# set y-axis limits
ax.set_ylim(0, max(kpis.total_volume) * 1.1 + 0.1)
ax2.set_ylim(0, max(kpis.total_cost) * 1.1 + 0.1)

ax.set_ylabel("Total Volume (GWh)")
ax2.set_ylabel("Total Cost (M€)")

ax.set_xticklabels(xticks, rotation=45)
ax.set_xlabel("Simulation")

ax.legend(["Total Volume"], loc="upper left")
ax2.legend(["Total Cost"], loc="upper right")

plt.title("Total Volume and Total Cost for each Simulation")

plt.show()
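
Note that the cell above defines a savefig helper via functools.partial but never calls it. If you also want to write the bar chart to disk, one option is to add a call like the following just before plt.show(); the file name is a placeholder.

# optional: save the figure to the outputs folder (placeholder file name),
# placed before plt.show() so the current figure is still populated
savefig("outputs/kpi_total_volume_and_cost.png")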
[25]:
sql = """
SELECT
  product_start AS "time",
  price AS "Price",
  simulation AS "simulation",
  node
FROM market_meta
WHERE simulation in ('example_02a_base', 'example_02b_base', 'example_02c_base') AND market_id in ('EOM')
GROUP BY market_id, simulation, product_start, price, node
ORDER BY product_start, node

"""

df = pd.read_sql(sql, engine)

df
[26]:
# Convert the 'time' column to datetime
df["time"] = pd.to_datetime(df["time"])

# Plot the data
plt.figure(figsize=(14, 7))
# Loop through each simulation and plot
for simulation in df["simulation"].unique():
    subset = df[df["simulation"] == simulation]
    plt.plot(subset["time"], subset["Price"], label=simulation)

plt.title("Price over Time for Different Simulations")
plt.xlabel("Time")
plt.ylabel("Price")
plt.legend(title="Simulation")
plt.show()
[ ]: