Program2Tutor: Automatic Curriculum Generation with Bandits

Reading: Program2Tutor by Tong Mu, Karan Goel, and Emma Brunskill
Problem

Determining topic relationships and statistically modeling student learning is a complex process.

Prior Work

  • A prerequisite (knowledge) graph of the related material can be constructed automatically, given an algorithmic representation of the underlying material.
  • A student can be progressed automatically through a knowledge graph with minimal assumptions about a model of the underlying student learning process.

Contribution

  • Preliminary work on uniting the ideas of automatically generating a knowledge graph and progressing a student through it while learning.
  • A novel approach for identifying the initial background knowledge of the learner.

In this work, the authors unify the ideas of automatic curriculum generation from execution traces and automatic problem selection using reinforcement learning techniques. Specifically, they use a multi-armed bandit algorithm for problem selection, ZPDES (Clement et al., 2015), as it is less reliant than other methods on the underlying student learning model, which can be advantageous. Additionally, the authors build on prior work in probabilistically detecting the knowledge boundary of the student and present a novel approach for determining the student's initial knowledge state within the curriculum.

Automatic Curriculum Generation from Execution Traces

Let \( n \) be any positive integer. We say that a trace \( T_{1} \) is at least as complex as trace \( T_{2} \) if every n-gram of trace \( T_{2} \) is also present in trace \( T_{1} \).

Progressing Students Using Multi-Armed Bandits

At each timestep, this algorithm selects a problem from the set of problem types on the boundary of the student's knowledge, the zone of proximal development (ZPD), that it predicts will give the most reward, measured in terms of student learning progress. On initialization, all problems start with an initial unnormalized weight \( w_{a} = w_{i} \). The weights \( w_{a} \) of the problems in the ZPD are normalized to form a proper probability distribution, denoted \( w_{a}^{n} \), and problem \( a \in ZPD \) is selected with probability

$$ p_{a} = w_{a}^{n}(1 - \gamma) + \gamma \frac{1}{|ZPD|} $$

Once a problem is selected, it is presented to the student, and the correctness of the student's answer the \( i^{th} \) time the problem is presented is recorded as \( C_{i} \).

References

B. Clement, D. Roy, P.-Y. Oudeyer, and M. Lopes. Multi-armed bandits for intelligent tutoring systems. JEDM - Journal of Educational Data Mining, 7, 2015.
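For concreteness, here is a minimal Python sketch of the n-gram trace-complexity check and the ZPDES-style selection rule described above. The function names, example problem types, and values of \( w_{i} \) and \( \gamma \) are illustrative assumptions, not taken from the paper.

```python
import random


def ngrams(trace, n):
    """All length-n contiguous subsequences of a trace (a list of tokens)."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def at_least_as_complex(t1, t2, n):
    """True if every n-gram of trace t2 also appears in trace t1."""
    return ngrams(t2, n) <= ngrams(t1, n)


def select_problem(weights, zpd, gamma=0.1):
    """Sample a problem type from the ZPD.

    weights: dict mapping problem type -> unnormalized weight w_a
    zpd:     problem types currently on the student's knowledge boundary
    gamma:   exploration rate; a gamma fraction of the probability mass is uniform
    """
    total = sum(weights[a] for a in zpd)
    probs = [(weights[a] / total) * (1 - gamma) + gamma / len(zpd) for a in zpd]
    return random.choices(zpd, weights=probs, k=1)[0]


# Example: three hypothetical problem types in the ZPD, all starting from the same w_i.
w_i = 1.0
weights = {"add": w_i, "subtract": w_i, "multiply": w_i}
print(select_problem(weights, zpd=["add", "subtract", "multiply"], gamma=0.1))
```

The update of the weights \( w_{a} \) from the learning-progress reward is omitted here, since the excerpt above does not spell it out.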

RL Lecture Notes: Sequential Decision Making

Reinforcement Learning - Lecture 1 - Emma Brunskill

Learning to Make Good Sequential Decisions

Key aspects of reinforcement learning:

  • Optimization: Finding policies that maximize expected reward
  • Exploration: Balancing exploration vs exploitation
  • Generalization: Transferring knowledge to new situations
  • Delayed Consequences: Decisions have long-term ramifications

Challenges

When planning: Decisions involve reasoning about not just immediate benefit but also longer-term ramifications.

When learning: Temporal credit assignment is hard - what caused later high or low rewards?

Policy

A policy is a mapping from past experience to action.
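In symbols (notation mine, matching the Markov-assumption section below): a policy maps the history \( h_t \) of past states, actions, and rewards to the next action,

$$ \pi(h_t) = a_t $$

and under the Markov assumption a state-based policy \( \pi(s_t) = a_t \) suffices.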

Comparison with Other Paradigms

Supervised Learning: Typically making one decision instead of a sequence of decisions.

Imitation Learning: Learns from the experience of others; assumes access to demonstrations of good policies. Imitation + RL seems promising.

Sequential Decision Making Under Uncertainty

Goal: Maximize the total expected future reward (the world is stochastic so the agent maximizes rewards in expectation).
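One standard way to write this objective (the horizon \( H \) and the notation are mine; the lecture statement is just the sentence above):

$$ \max_{\pi} \; \mathbb{E}\left[ \sum_{t=0}^{H} r_{t} \right] $$

where \( r_t \) is the reward at time \( t \) and the expectation is taken over the stochastic dynamics induced by following policy \( \pi \).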

The Markov Assumption

State \(s_t\) is Markov if and only if:

$$p(s_{t+1} | s_t, a_t) = p(s_{t+1} | h_{t}, a_{t})$$

The current state is a sufficient statistic of history.

Problem Variants

  • Finite horizon vs. infinite horizon
  • Stationary vs. non-stationary

Connectionist Temporal Classification (CTC)

Labelling Unsegmented Sequence Data with Recurrent Neural Networks (Graves et al.)

The Problem

When training models for sequence-to-sequence tasks like speech recognition or handwriting recognition, we often face a challenge: the input and output sequences have different lengths, and we don't know the alignment between them.

For example, in speech recognition:

  • Input: Audio frames (e.g., 1000 frames)
  • Output: Text ("hello" - 5 characters)

Traditional approaches require frame-level alignments, which are expensive to obtain.

CTC Solution

CTC introduces a blank token (often denoted \(\epsilon\)) and defines a many-to-one mapping from frame-level predictions to output sequences. Multiple frame-level paths can map to the same output.

For example, these all map to "cat":

  • \(\epsilon c a a t \epsilon\)
  • \(c c \epsilon a t t\)
  • \(\epsilon \epsilon c a a t\)
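The mapping (written \(\mathcal{B}\) below) merges consecutive repeated labels and then removes blanks. A minimal Python sketch, with the function name and the use of "ε" as a string token chosen for illustration:

```python
def ctc_collapse(path, blank="ε"):
    """Apply the CTC many-to-one mapping: merge repeated labels, then drop blanks."""
    out = []
    prev = None
    for token in path:
        if token != prev and token != blank:
            out.append(token)
        prev = token
    return "".join(out)


# All of these frame-level paths collapse to "cat".
for path in [["ε", "c", "a", "a", "t", "ε"],
             ["c", "c", "ε", "a", "t", "t"],
             ["ε", "ε", "c", "a", "a", "t"]]:
    assert ctc_collapse(path) == "cat"
```

Note that a blank between two identical labels keeps them separate, so ["a", "ε", "a"] collapses to "aa" rather than "a".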

The CTC Loss

The CTC loss is the negative log probability of the correct output sequence, marginalized over all possible alignments:

\[ L_{CTC} = -\log \sum_{\pi \in \mathcal{B}^{-1}(y)} P(\pi | x) \]

where \(\mathcal{B}^{-1}(y)\) is the set of all paths that map to output \(y\).
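The set \(\mathcal{B}^{-1}(y)\) has exponentially many paths, but the sum can be computed with a dynamic-programming recursion (the forward half of the forward-backward algorithm mentioned below). Here is a minimal NumPy sketch of that recursion, assuming label index 0 is the blank and `log_probs` has shape (T, C); the names and conventions are mine, not from the paper.

```python
import numpy as np


def ctc_loss(log_probs, target, blank=0):
    """Negative log P(target | x) via the CTC forward algorithm.

    log_probs: array of shape (T, C), per-frame log-probabilities over C labels.
    target:    list of label indices with no blanks, e.g. [3, 1, 20] for "cat".
    """
    T = log_probs.shape[0]
    # Extended target: a blank around every label -> length S = 2U + 1.
    ext = [blank]
    for label in target:
        ext.extend([label, blank])
    S = len(ext)

    # alpha[s] = log-probability of all path prefixes over frames 0..t
    # that end in extended position s.
    alpha = np.full(S, -np.inf)
    alpha[0] = log_probs[0, ext[0]]
    if S > 1:
        alpha[1] = log_probs[0, ext[1]]

    for t in range(1, T):
        new_alpha = np.full(S, -np.inf)
        for s in range(S):
            candidates = [alpha[s]]
            if s - 1 >= 0:
                candidates.append(alpha[s - 1])
            # Skipping over the previous blank is allowed unless it would
            # merge two identical labels.
            if s - 2 >= 0 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = np.logaddexp.reduce(candidates) + log_probs[t, ext[s]]
        alpha = new_alpha

    # Valid paths end in the final label or the final blank.
    return -np.logaddexp(alpha[S - 1], alpha[S - 2] if S > 1 else -np.inf)
```

In practice one would use a library implementation with gradients, such as PyTorch's `torch.nn.CTCLoss`, rather than this sketch.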

Key Properties

  • Alignment-free: No need for pre-segmented training data
  • Differentiable: Uses dynamic programming (forward-backward algorithm) for efficient computation
  • Monotonic: Assumes input-output alignment is monotonic (left-to-right)
