paper-reading-group

Model-free conventions in MARL with heterogeneous preferences

By Köster et al.

link - https://arxiv.org/pdf/2010.09054.pdf

Key Takeaways -

What exactly are conventions -

Examples of MF conventions in real life?? -

The model -

A convention here is a problem of joint exploration and collective action

The game -

Formally -

imgs/Untitled.png

Heterogeneous preferences -

Formally -

imgs/Untitled%201.png

$G = \{g_1, g_2, \ldots, g_{|G|}\}$ is the set of unique reward profiles such that $g(I_k(s_t, a_t)) = R^k_t$, the reward to the $k$-th agent at the $t$-th time step

Let $\mu_E$ denote the distribution of states generated by playing out games in which each agent implements its part of the equilibrium

imgs/Untitled%202.png

Individual learning behavior -

Each agent follows a policy $\pi(a^i \mid o^i)$, based on its observation $o^i = O(s, i)$ and its reward $r^i(s, a^1, \ldots, a^N)$. Each agent's goal is to maximise the $\gamma$-discounted reward:

imgs/Untitled%203.png
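As a rough illustration of the objective, here is a minimal sketch that assumes the standard discounted sum $\sum_t \gamma^t r^i_t$ over one agent's reward sequence (the exact expression is the one in the figure above). The function name `discounted_return` and the sample rewards are made up for illustration.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma**t * rewards[t] for a single agent's rewards.

    Assumes the standard discounted-sum objective; not taken from the paper's code.
    """
    total = 0.0
    # Walk backwards so each step is one multiply and one add:
    # G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical reward sequence for one agent: 1 + 0.5*0 + 0.25*2
print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # 1.5
```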

Definition of convention -

The group produces a distribution of joint behavior. The degree of conventionality is how dissimilar this produced behavior is from the behavior expected in a maximally uncoordinated state

Formally -

Assume a signature function $\sigma(s)$ for measuring the degree of conventionality of a population of $N$ agents with reward profile set $G$ and a joint policy $\vec{\pi}$. Let $\mu_{\vec{\pi}, s_0}$ be the distribution of states reached by following $\vec{\pi}$ from $s_0$. The expected empirical signature would be

imgs/Untitled%204.png

Following the taste-seeking policy $\vec{\pi}_{TS}$, the empirical signature obtained would be

imgs/Untitled%205.png

The degree of conventionality can then be defined as $C = M(\Sigma_{TS}, \Sigma)$, where $M$ is any measure for comparing two distributions
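One way to make $C = M(\Sigma_{TS}, \Sigma)$ concrete is sketched below. It assumes each expected signature is estimated by averaging $\sigma(s)$ over sampled states, and it picks the L1 distance as $M$ purely for illustration, since the notes allow any measure. All function names and numbers here are hypothetical.

```python
from typing import Callable, List, Sequence

Signature = Sequence[float]


def expected_signature(signatures: Sequence[Signature]) -> List[float]:
    """Monte Carlo estimate: average sigma(s) over sampled states."""
    n = len(signatures)
    dim = len(signatures[0])
    return [sum(sig[k] for sig in signatures) / n for k in range(dim)]


def l1_distance(p: Signature, q: Signature) -> float:
    """Illustrative choice of M; any other comparison measure could be swapped in."""
    return sum(abs(a - b) for a, b in zip(p, q))


def degree_of_conventionality(
    ts_signatures: Sequence[Signature],
    signatures: Sequence[Signature],
    measure: Callable[[Signature, Signature], float] = l1_distance,
) -> float:
    """C = M(Sigma_TS, Sigma), with both expectations estimated from samples."""
    return measure(expected_signature(ts_signatures), expected_signature(signatures))


# Made-up samples: taste-seeking agents keep a 50/50 split,
# while the learned joint policy drifts towards the first type.
ts = [[0.5, 0.5], [0.5, 0.5]]
learned = [[0.9, 0.1], [0.7, 0.3]]
print(degree_of_conventionality(ts, learned))
```

A value near zero would mean the learned behavior is indistinguishable from the taste-seeking baseline under the chosen measure.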

What are the valid signature functions??

A valid signature used in the paper is $\sigma(s) = (b_1(s), b_2(s), \ldots, b_K(s))$, where $b_k(s)$ = (number of berries of type $k$) / (total number of berries)
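The berry-fraction signature can be sketched as follows, assuming a state is summarised by a list of per-type berry counts. The function name `berry_signature` and the handling of an empty field are assumptions, not something specified in the notes.

```python
from typing import List, Sequence


def berry_signature(counts: Sequence[int]) -> List[float]:
    """sigma(s) = (b_1(s), ..., b_K(s)), with b_k = count of type k / total berries."""
    total = sum(counts)
    if total == 0:
        # Assumption: with no berries on the field, report all-zero fractions.
        return [0.0] * len(counts)
    return [c / total for c in counts]


# Hypothetical state with 6, 3 and 1 berries of three types.
print(berry_signature([6, 3, 1]))  # [0.6, 0.3, 0.1]
```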

The GAME-

imgs/Untitled%206.png

m(s) - monoculture fraction

Agent architectural details -

Combinations tried out -

imgs/Untitled%207.png

Monoculture states

The Free rider problem -

Considering Setting 3 - Let $\mu_{\vec{\pi}}$ be the distribution of states following the policy $\vec{\pi}$; the expected monoculture fraction is $\bar{m} = \sum_s m(s)\,\mu_{\vec{\pi}}(s)$
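The expectation above is a probability-weighted sum, which can be sketched for a small discrete state set. The state labels, the $m(s)$ values and the probabilities below are invented for illustration; how $m(s)$ itself is computed is left to the paper's definition.

```python
from typing import Dict


def expected_monoculture_fraction(m: Dict[str, float], mu: Dict[str, float]) -> float:
    """m_bar = sum_s m(s) * mu(s) over a finite set of states.

    m maps each state label to its monoculture fraction m(s);
    mu maps each state label to its probability under the joint policy.
    """
    if abs(sum(mu.values()) - 1.0) > 1e-9:
        raise ValueError("mu must sum to 1")
    return sum(m[s] * p for s, p in mu.items())


# Hypothetical three-state example.
m_values = {"s1": 1.0, "s2": 0.5, "s3": 0.4}
state_probs = {"s1": 0.5, "s2": 0.25, "s3": 0.25}
print(expected_monoculture_fraction(m_values, state_probs))
```

When the state distribution is only available through rollouts, the same quantity could be estimated by averaging $m(s)$ over the visited states.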

The start-up problem -

Occurs in Setting 4 and Setting 2

imgs/Untitled%208.png

Long Term convention formation

Considering Setting 4, with 4 groups consisting of 11 (Red), 5, 4 and 4 players respectively

imgs/Untitled%209.png

Obs -

Discussion -