<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>depth first search &#187; rl</title>
	<atom:link href="http://www.depthfirstsearch.net/blog/tag/rl/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.depthfirstsearch.net/blog</link>
	<description>“We can only see a short distance ahead, but we can see plenty there that needs to be done.”</description>
	<lastBuildDate>Sun, 05 Feb 2012 13:00:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Matrix Syntax for Sparse Programming</title>
		<link>http://www.depthfirstsearch.net/blog/2008/07/09/matrix-syntax-for-sparse-programming/</link>
		<comments>http://www.depthfirstsearch.net/blog/2008/07/09/matrix-syntax-for-sparse-programming/#comments</comments>
		<pubDate>Wed, 09 Jul 2008 18:57:03 +0000</pubDate>
		<dc:creator>JS</dc:creator>
				<category><![CDATA[computer science]]></category>
		<category><![CDATA[computing]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[rl]]></category>

		<guid isPermaLink="false">http://www.depthfirstsearch.net/blog/?p=479</guid>
		<description><![CDATA[In response to a recent Coding Horror piece on Spartan programming I&#8217;d like to humbly submit two examples from my own recent work. First, in C++ we have: for &#40;int i = 0; i &#60; nstates; i++&#41; &#123; for &#40;int a = 0; a &#60; nactions; a++&#41; &#123; Q&#91;i&#93;&#91;a&#93; += alpha * delta * e&#91;i&#93;&#91;a&#93;; [...]]]></description>
			<content:encoded><![CDATA[<p>In response to a recent <a href="http://www.codinghorror.com/blog/archives/001148.html">Coding Horror</a> piece on <a href="http://ssdl-wiki.cs.technion.ac.il/wiki/index.php/Spartan_programming">Spartan programming</a> I&#8217;d like to humbly submit two examples from my own recent work.</p>
<p>First, in C++ we have:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> i <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> nstates<span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
    <span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> a <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> a <span style="color: #000080;">&lt;</span> nactions<span style="color: #008080;">;</span> a<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
        Q<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span><span style="color: #008000;">&#91;</span>a<span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span><span style="color: #000080;">=</span> alpha <span style="color: #000040;">*</span> delta <span style="color: #000040;">*</span> e<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span><span style="color: #008000;">&#91;</span>a<span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
        e<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span><span style="color: #008000;">&#91;</span>a<span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span><span style="color: #000080;">=</span> gamma <span style="color: #000040;">*</span> lambda<span style="color: #008080;">;</span>
    <span style="color: #008000;">&#125;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>Next, in Python with NumPy:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #008000;">self</span>.<span style="color: black;">Q</span> += <span style="color: #008000;">self</span>.<span style="color: black;">alpha</span> <span style="color: #66cc66;">*</span> delta <span style="color: #66cc66;">*</span> <span style="color: #008000;">self</span>.<span style="color: black;">e</span>
<span style="color: #008000;">self</span>.<span style="color: black;">e</span> <span style="color: #66cc66;">*</span>= <span style="color: #008000;">self</span>.<span style="color: black;">gamma</span> <span style="color: #66cc66;">*</span> <span style="color: #008000;">self</span>.<span style="color: black;">ld</span></pre></div></div>

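<p>These are the same temporal difference (TD) update, taken from two versions of a Sarsa(λ) reinforcement learning agent: each Q-value moves by <code>alpha * delta</code> times its eligibility trace, and then every trace decays by a factor of <code>gamma * lambda</code>. Having reasonable syntax and semantics for vector and matrix operations is essential for sparse programming in any kind of numerically intensive setting.</p>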
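<p>For anyone who wants to check that the two forms really agree, here is a minimal, self-contained sketch of one Sarsa(λ) step with accumulating traces, where delta is the TD error r + γQ(s′, a′) − Q(s, a). The table sizes, parameter values, and the helper name <code>sarsa_lambda_step</code> are illustrative assumptions, not code from either agent. It applies the update once with nested loops (as in the C++) and once with whole-array NumPy operations, then compares the results.</p>

```python
import numpy as np

# Illustrative sizes and parameters (assumptions, not taken from the post).
nstates, nactions = 5, 3
alpha, gamma, lam = 0.1, 0.9, 0.8  # "lambda" is reserved in Python, hence "lam" (the post uses "ld")


def sarsa_lambda_step(Q, e, s, a, r, s2, a2, vectorized=True):
    """One Sarsa(lambda) step with accumulating traces; returns new (Q, e)."""
    Q, e = Q.copy(), e.copy()
    delta = r + gamma * Q[s2, a2] - Q[s, a]  # TD error
    e[s, a] += 1.0  # accumulate the trace of the visited state-action pair
    if vectorized:
        # Whole-array form, as in the NumPy snippet.
        Q += alpha * delta * e
        e *= gamma * lam
    else:
        # Element-by-element form, as in the C++ loops.
        for i in range(nstates):
            for b in range(nactions):
                Q[i, b] += alpha * delta * e[i, b]
                e[i, b] *= gamma * lam
    return Q, e


rng = np.random.default_rng(0)
Q0 = rng.normal(size=(nstates, nactions))
e0 = np.zeros((nstates, nactions))

Qv, ev = sarsa_lambda_step(Q0, e0, s=1, a=2, r=1.0, s2=3, a2=0, vectorized=True)
Ql, el = sarsa_lambda_step(Q0, e0, s=1, a=2, r=1.0, s2=3, a2=0, vectorized=False)
print(np.allclose(Qv, Ql) and np.allclose(ev, el))  # True
```

<p>The vectorized branch is essentially the two NumPy lines from the post; the loop branch is the C++ version transliterated.</p>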
]]></content:encoded>
			<wfw:commentRss>http://www.depthfirstsearch.net/blog/2008/07/09/matrix-syntax-for-sparse-programming/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Revenue Management and RL</title>
		<link>http://www.depthfirstsearch.net/blog/2008/05/08/reinforcement-learning-and-revenue-management/</link>
		<comments>http://www.depthfirstsearch.net/blog/2008/05/08/reinforcement-learning-and-revenue-management/#comments</comments>
		<pubDate>Thu, 08 May 2008 17:41:09 +0000</pubDate>
		<dc:creator>JS</dc:creator>
				<category><![CDATA[computer science]]></category>
		<category><![CDATA[computing]]></category>
		<category><![CDATA[rl]]></category>

		<guid isPermaLink="false">http://www.depthfirstsearch.net/blog/?p=441</guid>
		<description><![CDATA[We&#8217;ve had some interesting discussion recently on the future goals of reinforcement learning. Penetration into industry seems to be a hot topic within the community, with the consensus being that reinforcement learning is currently too difficult to apply to problems without expertise. A related issue is that RL has no standard toolkit, unlike machine learning, [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve had some interesting discussion recently on the future goals of reinforcement learning. Penetration into industry seems to be a hot topic within the community, with the consensus being that reinforcement learning is currently too difficult to apply to problems without expertise. A related issue is that RL has no standard toolkit, unlike machine learning, which now has <a href="http://www.cs.waikato.ac.nz/~ml/index.html">Weka</a> among others.</p>
<p>Then, of course, there is the search for the killer app, a reinforcement learning application that demonstrably revolutionizes an industry. The killer app must of course be preceded by a killer problem. Allow me to humbly submit one such potential area of application: Revenue Management.</p>
<p>I am most familiar with airline revenue management, the method by which airlines choose prices for the seats they sell. Most of the interesting dynamics of airline revenue management policies come from two key properties.</p>
<ol>
<li>Customer behavior is difficult to predict.</li>
<li>An empty seat is perishable.</li>
</ol>
<p>Modern airlines have next to no capital. Everything, from terminals and offices to runways, baggage carts, and planes, is rented or leased. An airline is essentially an orchestration of the revenue that flows through these combined resources, revenue that starts in your wallet.</p>
<p>The optimization problem is to pick prices for seats that maximize profit. Airlines have a number of nifty tricks to do this, like observing that certain kinds of travelers (business travelers with expense budgets) book late whereas recreational flyers book early. This allows for price discrimination.</p>
<p>Of course, this optimization problem has a natural reward signal (profit), a reasonably tractable set of actions (setting prices), and a nice stochastic state transition (customers purchasing seats). Enter reinforcement learning.</p>
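<p>To make that framing concrete, here is a toy sketch under loudly stated assumptions: a single flight, three candidate fares, at most one customer arriving per booking period, and late arrivals more likely to be high-paying business travelers. Every number and name (<code>PRICES</code>, <code>step</code>, <code>train</code>, <code>greedy_revenue</code>) is hypothetical. The state is (booking period, seats left), the action is a fare, and the reward is the fare whenever a seat sells; seat costs are treated as sunk, so revenue stands in for profit, and seats still empty at departure are worth nothing. The agent is plain tabular Q-learning, which learns from simulated sales without being told the demand model.</p>

```python
import numpy as np

# Toy single-flight pricing problem. Every number here is an illustrative
# assumption, not airline data.
PRICES = np.array([100.0, 200.0, 400.0])  # candidate fares (the actions)
CAPACITY = 5    # seats on the flight
HORIZON = 20    # booking periods before departure
P_ARRIVE = 0.6  # probability that a customer shows up in a period


def willingness_to_pay(t, rng):
    """Later arrivals are more likely to be business travelers with bigger budgets."""
    business = t / HORIZON > rng.random()
    return rng.uniform(250.0, 500.0) if business else rng.uniform(50.0, 250.0)


def step(t, seats, action, rng):
    """Offer PRICES[action] in period t; return (reward, seats_left)."""
    if P_ARRIVE > rng.random() and willingness_to_pay(t, rng) >= PRICES[action]:
        return float(PRICES[action]), seats - 1
    return 0.0, seats


def train(episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning over (period, seats_left), undiscounted."""
    rng = np.random.default_rng(seed)
    # The row for period HORIZON and the column for 0 seats are never updated and
    # stay 0: at departure or when sold out nothing more can be earned (seats perish).
    Q = np.zeros((HORIZON + 1, CAPACITY + 1, len(PRICES)))
    for _ in range(episodes):
        seats = CAPACITY
        for t in range(HORIZON):
            if seats == 0:
                break
            if epsilon > rng.random():
                a = int(rng.integers(len(PRICES)))  # explore a random fare
            else:
                a = int(np.argmax(Q[t, seats]))  # exploit the current estimate
            r, next_seats = step(t, seats, a, rng)
            target = r + Q[t + 1, next_seats].max()
            Q[t, seats, a] += alpha * (target - Q[t, seats, a])
            seats = next_seats
    return Q


def greedy_revenue(Q, seed=1):
    """Simulate one booking horizon using the learned greedy fare policy."""
    rng = np.random.default_rng(seed)
    seats, revenue = CAPACITY, 0.0
    for t in range(HORIZON):
        if seats == 0:
            break
        r, seats = step(t, seats, int(np.argmax(Q[t, seats])), rng)
        revenue += r
    return revenue


Q = train()
print(Q.shape)  # (21, 6, 3)
print(greedy_revenue(Q))
```

<p>The learned policy is read off by taking the best fare in each (period, seats left) state, which is what <code>greedy_revenue</code> does.</p>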
<p>UPDATE: A quick search of Google Scholar shows that some work has already been done in this area: <a href="http://citeseer.ist.psu.edu/gosavi04reinforcement.html">one example</a> and <a href="http://www.google.com/url?sa=t&amp;ct=res&amp;cd=1&amp;url=http%3A%2F%2Fwww.springerlink.com%2Findex%2FD81H827HFU4ANCCG.pdf&amp;ei=HTwjSIWtL4KuigHCgIGCDA&amp;usg=AFQjCNEJw310jXGLJRltNsFs_WVmkJ9MyQ&amp;sig2=ZtEHjy9rXeerb69HNwc8Ug">another</a>. It&#8217;s nice to know that I&#8217;m on the curve, even if I&#8217;m behind on it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.depthfirstsearch.net/blog/2008/05/08/reinforcement-learning-and-revenue-management/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What I&#039;m Reading</title>
		<link>http://www.depthfirstsearch.net/blog/2008/04/02/what-im-reading-2/</link>
		<comments>http://www.depthfirstsearch.net/blog/2008/04/02/what-im-reading-2/#comments</comments>
		<pubDate>Thu, 03 Apr 2008 01:40:51 +0000</pubDate>
		<dc:creator>JS</dc:creator>
				<category><![CDATA[computer science]]></category>
		<category><![CDATA[cmac]]></category>
		<category><![CDATA[rl]]></category>

		<guid isPermaLink="false">http://www.depthfirstsearch.net/blog/2008/04/02/what-im-reading-2/</guid>
		<description><![CDATA[A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller by J. S. Albus (1975). It is an old but good paper on a novel perceptron architecture that is widely used for reinforcement learning these days. I&#8217;ve created a bare-bones implementation in Python for those who are interested. You&#8217;ll need NumPy and Matplotlib. Here&#8217;s a [...]]]></description>
			<content:encoded><![CDATA[<p><em>A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller</em>, by J. S. Albus (1975)</p>
<p>It is an old but good paper on a novel perceptron architecture, the CMAC, which is widely used for function approximation in reinforcement learning these days. I&#8217;ve created a bare-bones <a href="http://www.depthfirstsearch.net/blog/cmac/">implementation</a> in Python for those who are interested. You&#8217;ll need NumPy and Matplotlib.</p>
<p>Here&#8217;s a sine curve we all know and love:</p>
<p><a href="http://www.depthfirstsearch.net/blog/wp-content/uploads/2008/04/sin.png" title="Sin Wave"><img src="http://www.depthfirstsearch.net/blog/wp-content/uploads/2008/04/sin.thumbnail.png" alt="Sin Wave" /></a></p>
<p>And here&#8217;s the CMAC&#8217;s training error over time:</p>
<p><a href="http://www.depthfirstsearch.net/blog/wp-content/uploads/2008/04/error.png" title="CMAC Error"><img src="http://www.depthfirstsearch.net/blog/wp-content/uploads/2008/04/error.thumbnail.png" alt="CMAC Error" /></a></p>
<p>And finally a picture of the function that the CMAC outputs after training:</p>
<p><a href="http://www.depthfirstsearch.net/blog/wp-content/uploads/2008/04/cmac.png" title="CMAC Function"><img src="http://www.depthfirstsearch.net/blog/wp-content/uploads/2008/04/cmac.thumbnail.png" alt="CMAC Function" /></a></p>
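<p>For a quick feel for how a CMAC works without downloading anything, here is a separate, simplified sketch; it is not the linked implementation and not Albus&#8217;s exact scheme. It assumes a 1-D input, several tilings each shifted by a fraction of a tile, and one weight per tile. The output is the sum of the single active weight in each tiling, and training spreads an LMS-style correction evenly over those active weights. The class name, tiling sizes, and learning rate <code>beta</code> are illustrative choices.</p>

```python
import numpy as np


class CMAC:
    """Minimal 1-D CMAC: n_tilings offset tilings, one weight per tile."""

    def __init__(self, n_tilings=8, n_tiles=16, lo=0.0, hi=2 * np.pi, beta=0.2):
        self.n_tilings, self.n_tiles, self.lo, self.beta = n_tilings, n_tiles, lo, beta
        # Tile width chosen so that every shifted tiling still covers [lo, hi].
        self.width = (hi - lo) / (n_tiles - 1)
        self.offsets = np.arange(n_tilings) / n_tilings  # fraction-of-a-tile shifts
        self.rows = np.arange(n_tilings)
        self.w = np.zeros((n_tilings, n_tiles))

    def active(self, x):
        """Column index of the single active tile in each tiling."""
        cols = np.floor((x - self.lo) / self.width + self.offsets).astype(int)
        return np.clip(cols, 0, self.n_tiles - 1)

    def predict(self, x):
        """The output is the sum of the active weights, one per tiling."""
        return self.w[self.rows, self.active(x)].sum()

    def update(self, x, target):
        """LMS-style step: share the correction equally among the active tiles."""
        cols = self.active(x)
        error = target - self.w[self.rows, cols].sum()
        self.w[self.rows, cols] += self.beta * error / self.n_tilings
        return error


rng = np.random.default_rng(0)
xs = np.linspace(0.0, 2 * np.pi, 200)
cmac = CMAC()
errors = []  # root-mean-square error over the training points, once per epoch
for epoch in range(100):
    for x in rng.permutation(xs):
        cmac.update(x, np.sin(x))
    preds = np.array([cmac.predict(x) for x in xs])
    errors.append(float(np.sqrt(np.mean((np.sin(xs) - preds) ** 2))))

print(errors[0], errors[-1])
```

<p>Plotting <code>errors</code> against the epoch number with Matplotlib gives a training-error curve, and evaluating <code>cmac.predict</code> on a grid of inputs gives the learned function.</p>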
]]></content:encoded>
			<wfw:commentRss>http://www.depthfirstsearch.net/blog/2008/04/02/what-im-reading-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Applications of Machine Learning</title>
		<link>http://www.depthfirstsearch.net/blog/2007/12/08/applications-of-machine-learning/</link>
		<comments>http://www.depthfirstsearch.net/blog/2007/12/08/applications-of-machine-learning/#comments</comments>
		<pubDate>Sun, 09 Dec 2007 01:31:26 +0000</pubDate>
		<dc:creator>JS</dc:creator>
				<category><![CDATA[computer science]]></category>
		<category><![CDATA[ml]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[rl]]></category>

		<guid isPermaLink="false">http://www.depthfirstsearch.net/2007/12/08/applications-of-machine-learning/</guid>
		<description><![CDATA[It turns out that optimizing warehouse tasks is hard.]]></description>
			<content:encoded><![CDATA[<p>It turns out that <a href="http://pasquinade.blogspot.com/2007/10/optimization.html">optimizing warehouse tasks</a> is <a href="http://www.cs.ualberta.ca/~mgh/PUBLICATIONS/icml01.pdf">hard</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.depthfirstsearch.net/blog/2007/12/08/applications-of-machine-learning/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Debugging and Machine Learning</title>
		<link>http://www.depthfirstsearch.net/blog/2007/12/08/debugging-and-machine-learning/</link>
		<comments>http://www.depthfirstsearch.net/blog/2007/12/08/debugging-and-machine-learning/#comments</comments>
		<pubDate>Sat, 08 Dec 2007 17:00:25 +0000</pubDate>
		<dc:creator>JS</dc:creator>
				<category><![CDATA[computer science]]></category>
		<category><![CDATA[ml]]></category>
		<category><![CDATA[rl]]></category>
		<category><![CDATA[robots]]></category>

		<guid isPermaLink="false">http://www.depthfirstsearch.net/2007/12/08/debugging-and-machine-learning/</guid>
		<description><![CDATA[As I near completion on my final project for a course on reinforcement learning, I came across the following from Sutton&#8217;s page on tile coding: With the code described so far, there is a small probability that unrelated inputs will hash into some of the same tiles. In a group of tilings, usually there will [...]]]></description>
			<content:encoded><![CDATA[<p>As I near completion of my final project for a course on reinforcement learning, I came across the following from Sutton&#8217;s page on <a href="http://www.cs.ualberta.ca/~sutton/tiles2.html">tile coding</a>:</p>
<blockquote><p>With the code described so far, there is a small probability that unrelated inputs will hash into some of the same tiles. In a group of tilings, usually there will be no more than one such &#8220;collision&#8221;, so that it is not a big problem; the learning process will sort it out. There will not be a big effect on performance unless the memory is too small or the hash functions are poorly designed. Nevertheless, the possibility of such a problem is annoying. When one&#8217;s program doesn&#8217;t work, there is a tendency, deserved or not, to suspect a failure of the hashing function.</p></blockquote>
<p>I did, in fact, discover that my memory size was too small, resulting in a number of collisions. It was not the only problem with my agent, just one of many.</p>
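<p>The collision problem is easy to reproduce in a small experiment. The sketch below is a stand-in under stated assumptions, not Sutton&#8217;s tiles2 code: it uses a 2-D grid tiling over the unit square with uniformly offset tilings, and Python&#8217;s built-in <code>hash</code> in place of Sutton&#8217;s hashing function. All function names are hypothetical. It counts how many distinct tiles land in a memory slot already claimed by another tile; whenever the memory has fewer slots than there are tiles, collisions are guaranteed by the pigeonhole principle.</p>

```python
def active_tiles(x, y, n_tilings=8, tiles_per_dim=10):
    """The single active (tiling, ix, iy) tile in each tiling for a point in [0, 1)^2."""
    tiles = []
    for k in range(n_tilings):
        shift = k / n_tilings  # each tiling is offset by a fraction of one tile
        tiles.append((k, int(x * tiles_per_dim + shift), int(y * tiles_per_dim + shift)))
    return tiles


def all_tiles(n_tilings=8, tiles_per_dim=10):
    """Every tile active_tiles can return; the shifts add one extra index per axis."""
    side = range(tiles_per_dim + 1)
    return [(k, ix, iy) for k in range(n_tilings) for ix in side for iy in side]


def hashed_slots(tiles, memory_size):
    """Map tiles into a fixed-size weight table (built-in hash as a stand-in)."""
    return [hash(t) % memory_size for t in tiles]


def count_collisions(tiles, memory_size):
    """How many distinct tiles land in a slot already claimed by another tile."""
    slots = hashed_slots(tiles, memory_size)
    return len(slots) - len(set(slots))


tiles = all_tiles()  # 8 * 11 * 11 = 968 distinct tiles
for memory_size in (256, 512, 1024, 4096, 65536):
    print(memory_size, count_collisions(tiles, memory_size))

# The weights read for one input: one hashed slot per tiling.
print(hashed_slots(active_tiles(0.3, 0.7), 512))
```

<p>Each colliding pair means two unrelated regions of the input space share a weight, which is exactly the interference the quoted passage warns about when memory is too small.</p>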
<p>On a related note, the UTCS/ART autonomous vehicle team did not make the finals of the DARPA Urban Challenge. One of the technical problems the team faced was a bad Ethernet cable that delayed critical sensor readings by as much as five seconds. The common thread is that debugging, in the classic sense of a programmer&#8217;s craft, does not apply easily to systems that exhibit some degree of homeostasis or non-determinism (e.g., the Ethernet protocol, TD learning).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.depthfirstsearch.net/blog/2007/12/08/debugging-and-machine-learning/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

