# PhD Statistics

Graduate study in Statistics is essential because it leads to specific business positions in areas including portfolio analysis, design studies, statistical analysis, computer simulation and software design (data analytics), testing, and other areas of operations research. Many laboratories, both government and private, maintain independent research staffs that include statisticians. Their work often deals with the development of new technology, including the design and analysis of experiments, software development, and numerical simulation, such as weather and climate forecasting, which depends heavily on the use of supercomputers. The PhD programme in Statistics prepares students for managerial positions and for research work leading to peer-reviewed publications, as well as for employment as faculty in research institutions. The diversity of applications of Statistics is exciting and challenging, which is one reason why the demand for well-trained statisticians continues to be strong.

The programme emphasizes the teaching of the theory and principles of mathematics, statistical theory and methodology, and applications, to provide the basis for meaningful practical work. Option I (Mathematical Statistics) requires candidates to undertake research involving rigorous mathematical and statistical theories to advance knowledge in the area of Mathematical Statistics, while Option II (Applied Statistics) requires candidates to develop innovative techniques to solve real-life problems. This is in line with the Department of Mathematics’ graduate training of students to formulate abstract mathematical models for real-world problems and to design and apply appropriate computer-based solutions to them.

**Aims and Objectives**

The aim of the PhD programme is to produce high-calibre graduates with rigorous research and analytical skills, who are well equipped to go on to postdoctoral research, or to employment in industry and private or public service that requires the application of high-level statistical concepts to solve problems. Graduates of the programme will:

- Be competent in mainstream advanced statistical theory and modelling;
- Be exposed to modern developments in Statistics;
- Have the ability to design and conduct research in academic/industrial settings;
- Have the ability to serve as better bridges between the academic and corporate worlds;
- Have the appreciation and preparation necessary to undertake postdoctoral research in Statistics.

**Content of courses for each semester**

**YEAR ONE SEMESTER ONE**

**STAT 751: Probability and Measure Theory (3, 0, 3)**

Topics include: Discrete and continuous random variables and their probability distributions; Construction of Lebesgue measure on R: extension of the length of an interval; Extension of measures (from rings of subsets to algebras, uniqueness of extension, Carathéodory method using outer measure of general sets); Measure spaces and measurable functions (definition, vector lattice properties of the space of measurable, real-valued functions on a measure space); Integration (definition of integrability, integration of real-valued, measurable functions defined on a general space, monotone convergence theorem, dominated convergence theorem, Fatou’s Lemma).

**STAT 753: Advanced Epidemiology (2, 2, 3)**

Topics include: causal inference, missing data, directed acyclic graphs, effect modification, measurement error, validity and reliability, study design, confounding and bias, diagnostic testing. The methods will be illustrated through studies of the epidemiology of both infectious and non-infectious diseases.

**STAT 755: Advanced Statistical Quality Control (2, 2, 3)**

Topics include: the quality of limits for process behaviour charts, autocorrelated data and process behaviour charts, degrees of freedom for process behaviour charts, process behaviour charts and chaos theory, power function for control charts, CUSUM and EWMA techniques, the role of the normal distribution, the central limit theorem, precontrol and charts, the analysis of means, manufacturing specification setting, using small amounts of data for limits. Statistical software will be used to apply the techniques to real-life case studies from manufacturing and service industries.

**STAT 757: Advanced Stochastic Processes (2, 2, 3)**

Topics include: Random processes, spectral representation of random processes, Poisson processes, birth-death processes and renewal processes, discrete-time Markov chains, semi-Markov processes and continuous-time Markov chains, hidden Markov models, filtering and prediction of random processes, queueing and loss models.

**STAT 759: Algorithm for Data Science (2, 2, 3)**

Methods for organizing data (hashing, trees, queues, lists, priority queues). Streaming algorithms for computing statistics on the data. Sorting and searching. Graph models and algorithms for searching (shortest paths and matching). Neural networks (DNNs, CNNs, and RNNs) with TensorFlow 2.0. Dynamic programming. Linear and convex programming. Floating point arithmetic and stability of numerical algorithms. Eigenvalues, singular values, PCA, gradient descent, stochastic gradient descent, and block coordinate descent. Conjugate gradient, Newton and quasi-Newton methods. Large-scale applications from signal processing, collaborative filtering, recommendation systems, etc.

**STAT 761: Advanced Categorical Data Analysis (2, 2, 3)**

Topics include: Linear mixed models, estimation in Gaussian mixed models, model diagnostics and variable selection, generalized linear mixed models (GLMMs) for binary outcomes, GLMMs for multi-category nominal outcomes and for ordinal outcomes, GLMMs for counts, likelihood-based inference, generalized estimating equations, generalized least squares, GLMM diagnostics and variable selection. Statistical software will be used to apply the techniques to real-life data.

**STAT 763: Advanced Statistical Inference (3, 0, 3)**

Topics include: Review of likelihood functions, maximum likelihood estimation of functions of the parameter, Wald tests and confidence intervals, likelihood ratio tests and confidence intervals, algorithms for maximising the likelihood (Newton-Raphson, expectation-maximization), score function and Fisher information, quasi-likelihood, generalized estimating equations, generalized least squares, weighted least squares, model selection using likelihood.

**STAT 765: Advanced Econometrics (2, 2, 3)**

Topics include: Review of ordinary least squares, method of moments, maximum likelihood, heteroscedasticity and autocorrelation. Lagged dependent variable model, econometrics of panel data, generalized least squares, distributed lag model, vector error correction model. Statistical software will be used to apply the techniques to real-life data.

**STAT 767: Computational Statistics and Data Science (2, 2, 3)**

Topics include: Data science concepts and processes, data wrangling, data visualization, Web analytics, predictive modelling and assessment techniques, kernel and local polynomial nonparametric regression, basis expansion and spline regression, generalized additive models, classification and regression trees, bootstrap resampling and inference, cross-validation.

**YEAR ONE SEMESTER TWO**

**STAT 752: Advanced Survival Analysis (2, 2, 3)**

Topics include: Review of survival analysis, classical survival analysis, Cox proportional hazards model with time-dependent covariates, frailty models, cure models, Poisson models, competing risks survival analysis, multivariate survival data, joint modelling.

**STAT 754: Advanced Bayesian Statistics (2, 2, 3)**

Topics include: Review of modelling in the classical approach, Bayesian linear models, Bayesian generalized linear models for binary, count and ordinal data, Bayesian linear mixed models and survival models, Bayesian parameter estimation and uncertainty, model validation and variable selection. Statistical software will be used to apply the techniques to real-life data.

**STAT 756: Advanced Time Series Analysis (2, 2, 3)**

Topics include: Review of nonstationary time series models, nonlinear time series models (TAR, SETAR, STAR models), conditional volatility models (ARCH, GARCH models), stochastic volatility models, state-space models. Model building, estimation, model validation and forecasting. Multivariate time series models and cointegration techniques. Statistical software will be used to apply the techniques to real-life data.

**STAT 758: Advanced Spatial Statistics (2, 2, 3)**

Topics include: Spatial data manipulation and mapping, spatial descriptive statistics, spatial regression models for heterogeneous categorical outcomes, spatial principal component analysis, point pattern analysis, spatial interpolation using geostatistics. The statistical software *R* will be used to apply the techniques to real-life data.

**STAT 760: Advanced Multivariate Analysis (2, 2, 3)**

Topics include: Review of multivariate normal distribution, bivariate linear regression model, bivariate logistic regression model, extension to multivariate linear regression, multivariate logistic regression model, exploratory factor analysis, path analysis, structural equation modelling, repeated measures. Statistical software will be used to apply the techniques to real-life data.

**STAT 762: Advanced Sampling Methods (2, 2, 3)**

Topics include: Review of probability and non-probability sampling techniques, unequal probability sampling, two-stage sampling, two-dimensional sampling, capture-recapture sampling, randomized response sampling, nonresponse correction techniques, weighting adjustment.

**STAT 764: Advanced Experimental Design and Analysis (2, 2, 3)**

Topics include: Review of design of experiments, sample size planning, statistical power, within-subject designs, mixed models, categorical outcomes, nested designs, partially nested designs, repeated measures and related designs, two-level factorial and fractional factorial designs, multi-level designs and analysis of variance, missing data. Statistical software will be used to apply the techniques to real-life data.

**STAT 766: Artificial Neural Networks (2, 2, 3)**

Introduction to artificial neural networks: Biological neural networks, Pattern analysis tasks: Classification, Regression, Clustering, Computational models of neurons, Structures of neural networks, Learning principles. Linear models for regression and classification: Polynomial curve fitting, Bayesian curve fitting, Linear basis function models, Bias-variance decomposition, Bayesian linear regression, Least squares for classification, Logistic regression for classification, Bayesian logistic regression for classification. Feed-forward neural networks: Pattern classification using perceptron, Multilayer feed-forward neural networks (MLFFNNs), Pattern classification and regression using MLFFNNs, Error back propagation learning, Fast learning methods: Conjugate gradient method, Auto-associative neural networks, Bayesian neural networks. Radial basis function networks: Regularization theory, RBF networks for function approximation, RBF networks for pattern classification. Kernel methods for pattern analysis: Statistical learning theory, Support vector machines for pattern classification, Support vector regression for function approximation, Relevance vector machines for classification and regression. Self-organizing maps: Pattern clustering, Topological mapping, Kohonen’s self-organizing map. Feedback neural networks: Pattern storage and retrieval, Hopfield model, Boltzmann machine, Recurrent neural networks.

**STAT 768: Advanced Statistical Computing (2, 2, 3)**

Topics include: Programming in *R*, writing functions and scripts, optimizing functions, creating *R* packages with documentation, debugging, generating pseudo-random numbers, Monte Carlo simulation in parameter inference, bootstrapping techniques, Bayesian methods and Markov chain Monte Carlo simulation.

**YEAR TWO SEMESTER ONE**

**STAT 851: Thesis Work and Report Writing I (0, 15, 0)**

Research proposal writing and presentation.

**STAT 853: Seminar I (0, 15, 0)**

This is the first of four seminars organized in the Department. All students in the programme are expected to attend all seminars. Each student is expected to make his/her own presentation on a project proposal. The topic must relate to statistical issues including insurance, medicine, mortality and mobility, health outcomes, economics, policy, pension, social phenomena, mathematical finance, statistics, and other related fields with particular reference to the advancement of the statistics profession.

**STAT 855: Capstone (2, 2, 3)**

The capstone project will be an analysis, using any statistical software tool, that answers a specific scientific or business question. A large and complex dataset will be provided to learners, and the analysis will require the application of a variety of methods and techniques introduced in previous courses, including Computational Statistics I & II and statistical modelling, as well as interpretation of the results in the context of the data and the research question. The project report is to include the following sections: motivation, problem definition, and existing approaches; proposed solution and details of implementation; results, conclusion, and directions for future work.

**YEAR TWO SEMESTER TWO**

**STAT 852: Thesis Work and Report Writing II (0, 15, 0)**

Thesis report writing and presentation.

**STAT 854: Seminar II (0, 15, 0)**

This is the second in the sequence of seminar presentations. All students in the programme are expected to attend all seminars. Each student is expected to make his/her own presentation on the progress made on his/her research.

**YEAR THREE SEMESTER ONE**

**STAT 951: Thesis Work and Report Writing III (0, 15, 0)**

Thesis report writing and presentation.

**STAT 953: Seminar III (0, 15, 0)**

This is the third in the sequence of seminar presentations. All students in the programme are expected to attend all seminars. Each student is expected to make his/her own presentation on the progress made on his/her research.

**YEAR THREE SEMESTER TWO**

**STAT 952: Thesis Work and Report Writing IV (0, 15, 0)**

Thesis report writing and presentation.

**STAT 954: Seminar IV (0, 15, 0)**

This is the fourth in the sequence of seminar presentations. All students in the programme are expected to attend all seminars. Each student is expected to make his/her own presentation to discuss the findings of his/her research.

**YEAR FOUR SEMESTER ONE**

**STAT 1051: Thesis Work and Report Writing V (0, 15, 0)**

Thesis report writing and presentation.

**STAT 1053: Seminar V (0, 15, 0)**

This is the fifth and final in the sequence of seminar presentations. All students in the programme are expected to attend all seminars. Each student is expected to make his/her own presentation to discuss the findings of his/her research.

**YEAR FOUR SEMESTER TWO**

**STAT 1052: Final Thesis Report (0, 15, 0)**

The research is undertaken in either an applied area or the theoretical development of statistical methods, after presentation of the proposal as specified in STAT 851. The final write-up of the thesis should be submitted by the end of the fourth academic year of study.

**STAT 1054: Final Seminar (0, 15, 0)**

Presentation of the final thesis (viva voce).