User:Ob7/sandbox/Off-policy learning

In reinforcement learning, off-policy learning refers to the problem of learning about one policy (the target policy) from data generated by a different policy (the behavior policy). A typical case is policy evaluation, where the objective is to estimate the value function of a given stationary policy. In off-policy policy evaluation, the objective is to estimate that value function while a different policy is followed to generate the data.
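
One common way to carry out off-policy policy evaluation is ordinary importance sampling, which reweights each observed return by the ratio of the probabilities that the evaluated policy and the data-generating policy assign to the actions taken. The following is a minimal sketch of that idea, not a definitive implementation. The function name `importance_sampling_value`, the dictionary representation of policies, and the one-state problem with actions "left" and "right" are all assumptions made for illustration; none of them come from the draft above.

```python
import random


def importance_sampling_value(episodes, target, behavior, gamma=1.0):
    """Ordinary importance-sampling estimate of the start-state value.

    episodes: list of trajectories, each a list of (state, action, reward)
              tuples that were generated by following `behavior`.
    target, behavior: dicts mapping state -> {action: probability}.
    Assumes `behavior` gives nonzero probability to every action that
    `target` may take (otherwise the ratio is undefined).
    """
    if not episodes:
        raise ValueError("at least one episode is required")
    total = 0.0
    for episode in episodes:
        ratio = 1.0      # product of per-step probability ratios
        ret = 0.0        # discounted return of this episode
        discount = 1.0
        for state, action, reward in episode:
            ratio *= target[state][action] / behavior[state][action]
            ret += discount * reward
            discount *= gamma
        total += ratio * ret
    return total / len(episodes)


# Hypothetical one-state problem: "right" pays 1, "left" pays 0.
behavior = {"s": {"left": 0.5, "right": 0.5}}  # policy that generates the data
target = {"s": {"left": 0.1, "right": 0.9}}    # policy being evaluated
rewards = {"left": 0.0, "right": 1.0}

rng = random.Random(0)  # fixed seed so the run is repeatable
episodes = []
for _ in range(10_000):
    action = rng.choices(["left", "right"], weights=[0.5, 0.5])[0]
    episodes.append([("s", action, rewards[action])])

# The true value of the target policy in this toy problem is 0.9.
estimate = importance_sampling_value(episodes, target, behavior)
print(estimate)
```

In this sketch the data are collected by choosing both actions equally often, yet the estimate concerns a policy that picks "right" 90% of the time. The estimator is unbiased under the stated assumption, but its variance can grow quickly as episodes get longer, because the probability ratios multiply across steps.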
