Funded PhD Opportunity: Contextual Probability-Based Approach to Reinforcement Learning

This opportunity is now closed.

Subject: Computer Science and Informatics


Reinforcement learning is a machine learning paradigm based on the Markov decision process, in which an agent tries to maximise the accumulated reward it receives while interacting with a dynamic and uncertain environment. The agent is not told how to behave; instead it must learn, through trial-and-error interaction, the policy (rule of actions) that yields the most reward. Reinforcement learning has various applications, including games, robotics, computer vision, algorithmic trading and portfolio management. The last few years have seen a renaissance of reinforcement learning, and its combination with deep neural networks is behind many breakthrough technologies. One striking example is AlphaGo, the first computer Go program to defeat human Go masters. Nevertheless, there is still room for improvement when applying this paradigm to problems where, for example, the environment behaves as Brownian motion and the reward, a key component of reinforcement learning, is not easy to define. One example of such an environment is (financial) automated trading, which in 2014 generated orders for more than 75 percent of the shares traded on United States exchanges (Wikipedia).
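
As a rough illustration of the trial-and-error loop described above, the following minimal sketch runs tabular Q-learning on a hypothetical five-state chain. The environment, the reward scheme and the hyperparameters (alpha, gamma, epsilon) are assumptions made for this example only; they are not part of the project description.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain (illustrative only).

    States are 0..n_states-1; actions are 0 (left) and 1 (right).
    Reaching the right-most state ends the episode with reward 1;
    every other step gives reward 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy choice; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # The goal state is never updated, so its values stay at 0.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the higher-valued action in each state (1 = right on ties)."""
    return [0 if left > right else 1 for left, right in q]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

In this toy setting the reward is easy to state. The difficulty raised in the paragraph above is that, in noisy environments such as financial markets, no equally obvious reward definition is available.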

An automated trading system (ATS) is a computer program that automatically creates orders based on predefined rules and submits them to an exchange. The rules determine when to enter a position, when to exit a position and how much to trade. ATSs can be designed to trade stocks, options, futures and foreign exchange products, and they can execute repetitive tasks at speeds orders of magnitude greater than any human equivalent.
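
To make the idea of predefined entry, exit and sizing rules concrete, here is a minimal sketch of a rule-based order generator. The moving-average crossover rule, the fixed order quantity and the names Order and generate_orders are hypothetical choices for illustration; the project description does not specify any particular trading rule, and nothing here is submitted to a real exchange.

```python
from dataclasses import dataclass


@dataclass
class Order:
    step: int       # index of the price that triggered the order
    side: str       # "BUY" or "SELL"
    quantity: int
    price: float


def moving_average(prices, window, end):
    """Mean of the `window` prices ending at index `end` (inclusive)."""
    chunk = prices[end - window + 1:end + 1]
    return sum(chunk) / window


def generate_orders(prices, short=2, long=4, quantity=10):
    """Apply a hypothetical crossover rule to a price series.

    Entry rule: buy when the short average rises above the long average.
    Exit rule:  sell when the short average falls below the long average.
    Sizing:     a fixed quantity per order.
    """
    orders = []
    in_position = False
    for t in range(long - 1, len(prices)):
        fast = moving_average(prices, short, t)
        slow = moving_average(prices, long, t)
        if not in_position and fast > slow:
            orders.append(Order(t, "BUY", quantity, prices[t]))
            in_position = True
        elif in_position and fast < slow:
            orders.append(Order(t, "SELL", quantity, prices[t]))
            in_position = False
    return orders


if __name__ == "__main__":
    series = [10.0, 10.0, 10.0, 10.0, 11.0, 12.0, 13.0, 12.0, 10.0, 9.0, 8.0]
    for order in generate_orders(series):
        print(order)
```

A reinforcement learning approach would replace the hand-written rule inside generate_orders with a learned policy, which is where the question of how to define the reward arises.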

This project will study policy optimisation and reward formulation in Brownian motion environments from the contextual probability perspective. Contextual probability (Wang, Murtagh 2008) is a secondary probability defined systematically in terms of a given primary probability; the two have a simple linear relationship. Accepting the principle of indifference, contextual probability can be estimated from a data sample through neighbourhood counting, which is a kernel function. The interaction between primary and secondary probabilities can be iterated until equilibrium is reached. The project will formulate policy optimisation as the interaction between primary and secondary probabilities under various constraints, in order to optimise the accumulated reward. The project will evaluate the findings in a prototype automated trading system.
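
The following sketch gives one simple reading of neighbourhood counting: two data points are treated as more similar the more neighbourhoods contain both of them. It assumes integer-valued attributes with known ranges and takes neighbourhoods to be intervals (hyper-rectangles in several dimensions). These assumptions and the function names are illustrative and may differ from the exact formulation in the cited paper.

```python
def count_common_intervals(x, y, lo, hi):
    """Count integer intervals [a, b], lo <= a <= b <= hi, containing x and y.

    The left end a can be any of lo..min(x, y) and the right end b any of
    max(x, y)..hi, so the count is the product of the two numbers of choices.
    """
    left_choices = min(x, y) - lo + 1
    right_choices = hi - max(x, y) + 1
    return left_choices * right_choices


def count_common_intervals_brute(x, y, lo, hi):
    """Same count by explicit enumeration, used only as a cross-check."""
    return sum(
        1
        for a in range(lo, hi + 1)
        for b in range(a, hi + 1)
        if a <= x <= b and a <= y <= b
    )


def neighbourhood_similarity(p, q, bounds):
    """Number of hyper-rectangles covering both p and q.

    `bounds` holds one (lo, hi) pair per attribute; the per-attribute
    interval counts multiply because each axis is chosen independently.
    """
    total = 1
    for x, y, (lo, hi) in zip(p, q, bounds):
        total *= count_common_intervals(x, y, lo, hi)
    return total


if __name__ == "__main__":
    bounds = [(0, 9), (0, 9)]
    print(neighbourhood_similarity((3, 3), (3, 5), bounds))
    print(neighbourhood_similarity((3, 3), (9, 0), bounds))
```

Points that are close together share many covering neighbourhoods and points that are far apart share few, which is what allows the count to act as a similarity measure.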

Hui Wang, Fionn Murtagh (2008) A Study of the Neighborhood Counting Similarity, IEEE Transactions on Knowledge and Data Engineering, 449-461.

Essential Criteria

  • Upper Second Class Honours (2:1) Degree from a UK institution (or overseas award deemed equivalent via UK NARIC)

Desirable Criteria

If the University receives a large number of applicants for the project, the following desirable criteria may be applied to shortlist applicants for interview.

  • First Class Honours (1st) Degree
  • Master's degree at 70%


    Vice Chancellor's Research Scholarships (VCRS)

    The scholarships will cover tuition fees and a maintenance award of £14,777 per annum for three years (subject to satisfactory academic performance). Applications are invited from UK, European Union and overseas students.


    The scholarship will cover tuition fees at the Home rate and a maintenance allowance of £14,777 per annum for three years. EU applicants will only be eligible for the fees component of the studentship (no maintenance award is provided). For non-EU nationals, the candidate must be "settled" in the UK.

Other information

The Doctoral College at Ulster University

Launch of the Doctoral College

Current PhD researchers and an alumnus shared their experiences, career development and the social impact of their work at the launch of the Doctoral College at Ulster University.


Key Dates

Submission Deadline: Monday 19 February 2018
Interview Date: 9 to 23 March 2018

Contact Supervisor

Professor Hui Wang

Other Supervisors

Apply online

Quote reference number #237808 when applying for this PhD opportunity.