Kaige

Dota-2: Dota-2 with large scale reinforcement learning

Dota-2 is a multiplayer real-time strategy (RTS) game played on a square map, with the two teams located at diagonally opposite corners. Each team has 5 players, and each player controls a hero with specific skills. Each team also has a set of creeps, which cannot be controlled but attack opponents automatically. Players earn gold by killing the opponent's creeps and spend it to upgrade skills and items.

Dota-2 (source from original paper)

Main challenges of Dota-2 for RL

  1. Long time horizon: the game runs at 30 frames per second for about 45 minutes; with one action every 4 frames, that is roughly 20,000 steps per episode.
  2. Partially observed state: players only see the nearby environment.
  3. High-dimensional state space: the state vector holds about 16,000 values.
  4. High-dimensional action space: the number of valid actions ranges from 8,000 to 80,000 at each step.

Environment setup:

  1. 4 frames per action
  2. the RL policy returns discrete actions
  3. certain game mechanics are hand-scripted rather than controlled by the RL policy
  4. some properties of the environment were randomized to ensure sufficiently diverse training games for robustness
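
The figure of roughly 20,000 steps per episode follows from the frame rate, the game length and the 4 frames per action listed above. A minimal sketch of that arithmetic (the function and parameter names are illustrative, not from the paper):

```python
def steps_per_episode(fps=30, minutes=45, frames_per_action=4):
    """Number of policy decisions in one game under the stated settings."""
    total_frames = fps * 60 * minutes          # frames rendered in one game
    return total_frames // frames_per_action   # one action every N frames


print(steps_per_episode())  # 20250
```

A 45-minute game is only a typical length, so the real step count varies from episode to episode.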

State space and encoding:

  1. instead of using pixels on the screen, we organize the information into a set of data arrays
  2. all float values and booleans are normalized before being fed into the neural network; we also keep a running mean and standard deviation of all data ever observed
  3. after normalization by the mean and standard deviation, state values are clipped to [-5, 5]
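
The normalize-then-clip step can be sketched for a single scalar feature as follows. This is an illustration under assumptions, not the paper's code: the class name `RunningNormalizer`, the `eps` term and the use of Welford's online update for the running statistics are all choices made here.

```python
import math


class RunningNormalizer:
    """Running mean/std of one feature, then z-score and clip (illustrative)."""

    def __init__(self, clip=5.0, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0      # running sum of squared deviations from the mean
        self.clip = clip
        self.eps = eps     # assumed guard against division by zero

    def update(self, x):
        # Welford's online update: no need to store past observations.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        if self.count < 2:
            return 1.0     # not enough data yet; leave the scale unchanged
        return math.sqrt(self.m2 / self.count)

    def normalize(self, x):
        z = (x - self.mean) / (self.std + self.eps)
        return max(-self.clip, min(self.clip, z))


norm = RunningNormalizer()
for value in (0.0, 10.0):
    norm.update(value)
print(norm.normalize(5.0))     # the mean maps to 0.0
print(norm.normalize(1000.0))  # an outlier is clipped to 5.0
```

Clipping keeps a rare extreme observation from producing a very large network input.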
State space (source from original paper)

source from original paper

Action space:

  1. a primary action (30): noop, move, attack, activate spell, activate item, etc.
  2. a set of action parameters: delay (4), unit selection (189), offset (81)
  3. an action mask filters the valid actions at each step
  4. factored action space: 30 x 4 x 189 x 81 = 1,837,080
  5. Some actions are scripted. Following engineering practice, we start from a small set of actions for RL and increase it gradually.
  6. scripted actions: ability builds, item purchasing, item swaps and courier control
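
The size of the factored action space and the effect of an action mask can be checked with a short sketch. The factor sizes are the ones in the product above; the dictionary keys and the helper `masked_argmax` are hypothetical names, and greedy selection is used only for illustration (a trained policy would normally sample from the masked distribution).

```python
from math import prod

# Factor sizes taken from the product 30 x 4 x 189 x 81.
ACTION_FACTORS = {"primary": 30, "delay": 4, "unit_selection": 189, "offset": 81}


def factored_action_count(factors=ACTION_FACTORS):
    """Total number of (primary, delay, unit, offset) combinations."""
    return prod(factors.values())


def masked_argmax(logits, mask):
    """Index of the highest logit among the entries the mask marks as valid."""
    best_index = None
    for index, (logit, valid) in enumerate(zip(logits, mask)):
        if valid and (best_index is None or logit > logits[best_index]):
            best_index = index
    if best_index is None:
        raise ValueError("the mask leaves no valid action")
    return best_index


print(factored_action_count())                                # 1837080
print(masked_argmax([2.0, 5.0, 1.0], [True, False, True]))    # 0
```

In the second call the highest logit belongs to an invalid action, so the mask forces the choice onto the best valid one.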

Reward:

  1. winning the game
  2. we give rewards for a set of actions that human players generally consider good
  3. all rewards are zero-sum: the average of the opponents' rewards is subtracted from each hero's reward
  4. game-time weighting: rewards are much larger in magnitude in the later game phase than in the early phase, because heroes are stronger by then. Without correction, the policy would focus on learning the later phase and ignore the earlier stages. To avoid this, the reward is normalized according to the time step.
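
The zero-sum adjustment can be sketched as below, assuming each team's shaped rewards are given as a plain list with one entry per hero. The function name `zero_sum_rewards` is illustrative, and the time-based normalization is left out because its exact form is not stated here.

```python
def zero_sum_rewards(team_a, team_b):
    """Subtract the opposing team's average reward from each hero's reward."""
    mean_a = sum(team_a) / len(team_a)
    mean_b = sum(team_b) / len(team_b)
    adjusted_a = [reward - mean_b for reward in team_a]
    adjusted_b = [reward - mean_a for reward in team_b]
    return adjusted_a, adjusted_b


a, b = zero_sum_rewards([1.0, 2.0, 3.0], [4.0, 0.0, 2.0])
print(a)  # [-1.0, 0.0, 1.0]
print(b)  # [2.0, -2.0, 0.0]
```

With equally sized teams, the adjusted rewards of all ten heroes sum to zero, so one team can only gain what the other loses.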

Although τ = 1 (full team spirit, where only the team's result counts) is the ultimate goal, we find that a lower τ reduces gradient variance in early training, which gives a clearer reward signal for learning mechanical and tactical ability.
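
The definition of τ is not given in the text. The sketch below assumes it is the "team spirit" weight from the original paper, which blends each hero's own reward with the team average as r_i = (1 - τ) ρ_i + τ mean(ρ). The function name is illustrative.

```python
def blend_team_spirit(hero_rewards, tau):
    """Mix each hero's own reward with the team mean; tau is assumed in [0, 1]."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    team_mean = sum(hero_rewards) / len(hero_rewards)
    return [(1.0 - tau) * reward + tau * team_mean for reward in hero_rewards]


print(blend_team_spirit([1.0, 3.0], 0.0))  # [1.0, 3.0]  purely individual
print(blend_team_spirit([1.0, 3.0], 0.5))  # [1.5, 2.5]
print(blend_team_spirit([1.0, 3.0], 1.0))  # [2.0, 2.0]  purely shared
```

At τ = 0 each hero is credited only for its own actions, which is a less noisy signal; at τ = 1 every hero receives the same team-level reward.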

source from original paper

Neural network architecture

  1. 158,502,815 parameters in total (policy and value function), about 0.16B
  2. observations are processed and pooled into a single vector summarizing the state
  3. a single LSTM layer
  4. the LSTM output is passed through linear projections into the action heads and the value function head
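
The data flow in this list (pool, one LSTM step, linear heads) can be illustrated with a toy forward pass in plain Python. Everything concrete here is an assumption made for the sketch: the tiny layer sizes, the random initialization, the use of max-pooling over per-unit vectors, and all function names. It is not the network from the paper.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer; weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(rng, n_out, n_in):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def max_pool(unit_vectors):
    """Collapse a variable number of per-unit vectors into one fixed-size vector."""
    return [max(column) for column in zip(*unit_vectors)]


def lstm_step(params, x, h, c):
    """One step of a standard LSTM cell on the concatenated [x, h] input."""
    xh = x + h
    i = [sigmoid(z) for z in linear(*params["input"], xh)]
    f = [sigmoid(z) for z in linear(*params["forget"], xh)]
    o = [sigmoid(z) for z in linear(*params["output"], xh)]
    g = [math.tanh(z) for z in linear(*params["cell"], xh)]
    c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
    h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
    return h_new, c_new


def build_model(obs_dim=6, hidden=8, head_sizes=None, seed=0):
    rng = random.Random(seed)
    head_sizes = head_sizes or {"primary": 30, "delay": 4}
    gates = ("input", "forget", "output", "cell")
    return {
        "lstm": {gate: init_layer(rng, hidden, obs_dim + hidden) for gate in gates},
        "heads": {name: init_layer(rng, size, hidden) for name, size in head_sizes.items()},
        "value": init_layer(rng, 1, hidden),
    }


def forward(model, unit_vectors, h, c):
    summary = max_pool(unit_vectors)                 # single state vector
    h, c = lstm_step(model["lstm"], summary, h, c)   # single LSTM layer
    logits = {name: linear(*layer, h) for name, layer in model["heads"].items()}
    value = linear(*model["value"], h)[0]            # scalar value estimate
    return logits, value, h, c


model = build_model()
units = [[0.1] * 6, [0.5, -0.2, 0.0, 0.3, 0.9, -1.0]]
logits, value, h, c = forward(model, units, [0.0] * 8, [0.0] * 8)
print(len(logits["primary"]), len(logits["delay"]), len(h))  # 30 4 8
```

The action heads and the value head read the same LSTM output, so the policy and value function share almost all of their parameters.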
source from original paper

Training system:

  1. rollout workers: 51,200 CPUs for running the game
  2. 512 GPUs for network inference (forward pass)
  3. 512 GPUs for training
  4. 10 months of training time

source from original paper