# PhD Statistics

Graduate study in Statistics is valuable because it leads to specialised positions in business, including portfolio analysis, study design, statistical analysis, computer simulation and software design (Data Analytics), testing, and other areas of operations research. Many laboratories, both government and private, maintain independent research staffs that include statisticians. Their work often deals with the development of new technology, including design and analysis of experiments, software development, and numerical simulation, such as weather and climate forecasting, which depends heavily on the use of supercomputers. The PhD programme in Statistics prepares students for managerial positions and for research work leading to peer-reviewed publications, as well as for employment as faculty in research institutions. The diversity of applications of Statistics is exciting and challenging, which is one reason why the demand for well-trained statisticians continues to be strong.

The programme emphasizes the teaching of the theory and principles of mathematics, statistical theory and methodology, and applications, to provide the basis for meaningful practical work. Option I (Mathematical Statistics) requires candidates to undertake research involving rigorous mathematical and statistical theories to advance knowledge in the area of Mathematical Statistics, while Option II (Applied Statistics) requires candidates to develop innovative techniques to solve real-life problems. This is in line with the Department of Mathematics’ graduate training, which prepares students to formulate abstract mathematical models for real-world problems and to design and apply appropriate computer-based solutions to them.

**Aims and Objectives**

The aim of the PhD programme is to produce high-calibre graduates with rigorous research and analytical skills who are well equipped to go on to postdoctoral research, or to employment in industry and private or public service that requires the application of high-level statistical concepts to solve problems. Graduates of the programme will:

- Be competent in mainstream advanced statistical theory and modelling;
- Be exposed to modern developments in Statistics;
- Have the ability to design and conduct research in academic/industrial settings;
- Have the ability to serve as better bridges between the academic and corporate worlds;
- Have the appreciation and skills needed to undertake postdoctoral research in Statistics.

**Content of courses for each semester**

**YEAR ONE SEMESTER ONE**

**STAT 701: Mathematical Statistics (4, 0, 4)**

Topics to be covered include the following: Order statistics; Theory of estimation: criteria of estimation, sufficiency, completeness, uniqueness and the exponential class of probability density functions, Cramér-Rao inequality and methods of estimation; Statistical hypothesis testing: review of significance tests, power function, losses and risks, most powerful, generalised likelihood ratio, conditional and sequential tests; Decision theory: basic concepts, decision criteria, minimax and Bayesian estimation criteria; Non-parametric statistics: various estimation methods based on kernels, smoothing splines, local polynomials, etc. will be considered.

**STAT 751: Probability and Measure Theory (2, 4, 4)**

The topics include: Discrete random variables and their probability distributions; Continuous random variables and their distributions; Multivariate probability distributions; Construction of Lebesgue measure on R: extension of the length of an interval; Extension of measures (from rings of subsets to algebras, uniqueness of extension, Carathéodory method using outer measure of general sets); Measure spaces and measurable functions (definition, vector lattice properties of the space of measurable, real-valued functions on a measure space); Integration (definition of integrability, integration of real-valued, measurable functions defined on a general measure space, monotone convergence theorem, dominated convergence theorem, Fatou’s Lemma).

**STAT 753: Demographic Methods (2, 4, 4)**

This course introduces the basic techniques of demographic analysis. Students will become familiar with the sources of data available for demographic research. Population composition and change measures will be presented. Measures of mortality, fertility, marriage and migration levels and patterns will be defined. Life table, standardization and population projection techniques will also be explored.

**STAT 755: Statistical Quality Control (2, 4, 4)**

Topics include the definition of quality, its measurement through statistical techniques, variable and attribute control charts, CUSUM charts, multivariate control charts, process capability analysis, design of experiments, and classical and Bayesian acceptance sampling. Development of statistical concepts and theory underlying procedures used in quality control applications. Sampling inspection procedures, the sequential probability ratio test, continuous sampling procedures, process control procedures, and experimental design. Statistical software will be used to apply the techniques to real-life case studies from manufacturing and service industries.

**STAT 757: Stochastic Process (2, 4, 4)**

Review of probability theory, regularity of stochastic processes, convergence of random processes. Random processes, special processes, Poisson process, stationarity, continuity, differentiation and integration, ergodicity, special classes of random processes (Markov sequences and processes), point processes, analysis of queues, Brownian motion and its martingales, diffusion processes.

**STAT 759: Algorithm for Data Science (2, 4, 4)**

Methods for organizing data (hashing, trees, queues, lists, priority queues). Streaming algorithms for computing statistics on the data. Sorting and searching. Graph models and algorithms for searching (shortest paths and matching). Neural networks (DNNs, CNNs, and RNNs), with TensorFlow 2.0. Dynamic programming. Linear and convex programming. Floating point arithmetic, stability of numerical algorithms, eigenvalues, singular values, PCA, gradient descent, stochastic gradient descent, and block coordinate descent. Conjugate gradient, Newton and quasi-Newton methods. Large-scale applications from signal processing, collaborative filtering, recommendation systems, etc.

**STAT 761: Analysis of Categorical Data (2, 4, 4)**

This course introduces methods for analyzing response data that are categorical, rather than continuous. Topics include: categorical response data and contingency tables. Generalized linear models; Linear models for binary data, Generalized linear models for counts, moments and likelihood for generalized linear models, Inference for generalized linear models, fitting generalized linear models, Quasi-likelihood and generalized linear models, Generalized additive models. Log-linear and logit models, Poisson regression, model diagnostics, ordinal data, estimation procedures. Procedures in statistical packages that can handle generalized linear models will be covered.

**STAT 763: Operation Research (2, 4, 4)**

Dynamic programming and heuristics. Project scheduling; probability and cost considerations in project scheduling; project control. Critical path analysis. Reliability problems, replacement and maintenance costs; discounting; group replacement, renewal process formulation, application of dynamic programming. Queuing theory in practice: obstacles in modeling queuing systems, data gathering and testing, queuing decision models, case studies. Game theory, matrix games; minimax strategies, saddle points, mixed strategies, solution of a game. Behavioural decision theory, descriptive models of human decision making; the use of decision analysis in practice.

**STAT 765: Theory of Econometrics (2, 4, 4)**

Topics include: Review: mathematical expectation, sampling distributions and inference, regression basics. Multivariate regression: matrix form, dummy variables and interactions; testing linear restrictions using F-tests; inference problems: heteroscedasticity and autocorrelation. Instrumental variables and 2SLS; simultaneous equations models; measurement error. Panel data models. Volatility models: ARCH and GARCH family models, and multivariate volatility models. Practical work using EViews and R software.

**STAT 767: Applied Machine Learning Techniques for Industry (2, 4, 4)**

This course relies entirely on open-source implementations available in scikit-learn, TensorFlow, R and Python. Steps in Data Analytics and Missing Data Imputation (data cleaning and preparation, model selection and evaluation techniques, missing data imputation such as rpart, MICE imputation, mean imputation, KNN imputation). Machine Learning Algorithms (applications of supervised, unsupervised and reinforcement machine learning algorithms), gradient boosting techniques. Machine Learning Algorithms and Usage in Applications (spam filtering with regression and naïve Bayes; data wrangling: APIs and other tools for scraping the Web). Feature Generation and Feature Selection (extracting meaning from data for (customer) retention, brainstorming for feature generation, role of domain expertise, and place for imagination), feature selection algorithms: filters, wrappers, decision trees, random forests. Recommendation Systems (building a user-facing data product, algorithmic ingredients of a recommendation engine, dimensionality reduction, singular value decomposition, principal component analysis). Data Visualization (basic principles, ideas and tools for data visualization, building of dashboards). Data Science and Ethical Issues (privacy, security, ethics).

**YEAR ONE SEMESTER TWO**

**STAT 752: Survival Analysis (2, 4, 4)**

Survival distributions, types of censored data, estimation for various survival models, non-parametric estimation of survival distributions, the proportional hazards and accelerated lifetime models for covariate data, regression analysis with lifetime data. Practical aspects; statistical models for transfers between multiple states (e.g., alive, ill, dead), the multi-state Markov model, relationship between probabilities of transfer and transition intensities, estimation for the parameters in these models; the binomial and Poisson models of mortality.

**STAT 754: Bayesian Statistics (2, 4, 4)**

This course introduces the concepts of Bayesian inference and the analysis of data using Bayesian methods. Topics include: the concept of prior and posterior distributions; connections with the classical approach; estimation and loss; hypothesis testing and the Bayes factor; Bayesian computation and Markov Chain Monte Carlo.

**STAT 756: Advanced Time Series Analysis (2, 4, 4)**

Univariate time series: stationarity, autocorrelation function, trends, ARIMA processes, unit roots, fractional ARIMA processes, forecasting, distributed lags, maximum likelihood estimation (MLE), model selection criteria, regression models with ARIMA errors and spectral analysis. Multivariate time series: stable vector autoregressive (VAR) models, cointegration techniques, vector error correction models (VECM), structural VARs and VECMs, unit roots and cointegration in panels. Threshold models: TAR, STAR, ESTAR and LSTAR models.

**STAT 758: Spatial Statistics (2, 4, 4)**

Types of spatial data: geostatistical, lattice or areal, and point process data. Specification and fitting of probability models for spatial data using geostatistical techniques of kriging and point process methods for spatial case control and area-level analysis. Use of stochastic processes and hierarchical models to represent the complex dependencies that often arise. Clustering and cluster detection of events. Use of a specific statistical package such as R (open source), ArcGIS or ISATIS to implement spatial data analysis.

**STAT 760: Multivariate Analysis (2, 4, 4)**

Multivariate Normal distribution; distribution of sample mean and covariance; multiple, partial and canonical correlation; multivariate regression; tests on means and covariances; MANOVA; principal components analysis; factor analysis; discriminant analysis and classification; cluster analysis; multidimensional scaling.

**STAT 762: Advanced Sample Survey Methods (2, 4, 4)**

Sample survey designs: basic concepts of sampling; sampling with varying probabilities; stratified, systematic, multistage and multiphase designs; selection with probability proportional to size (PPS); probability sampling procedures; estimation of population total, mean and proportion. Non-probability sampling procedures; jackknife and bootstrap procedures for resampling. Complex surveys. Ratio and regression estimation; panel design; model-based sampling; survey errors; and resampling methods. Use of appropriate software to calculate standard errors (variance estimation).

**STAT 764: Design and Analysis of Experiments (2, 4, 4)**

An introduction to the design and analysis of experiments. Topics include the design and analysis of completely randomized designs, randomized block designs, Latin square designs, incomplete block designs, factorial designs, fractional factorial designs, nested designs, split-plot designs and response surface designs. Students will complete and present a research project on an advanced topic in experimental design. Applications involve the use of a statistical software package. Experimental design and analysis: basic concepts of planning and designing experiments, multiple comparisons, randomized block designs, factorial designs, nested and split-plot designs, Latin square designs; analysis of covariance and confounding; application and use of statistical computing packages (such as SPSS, R, Genstat, Excel, etc.).

**STAT 766: Artificial Neural Networks (2, 4, 4)**

Introduction to artificial neural networks: biological neural networks, pattern analysis tasks (classification, regression, clustering), computational models of neurons, structures of neural networks, learning principles. Linear models for regression and classification: polynomial curve fitting, Bayesian curve fitting, linear basis function models, bias-variance decomposition, Bayesian linear regression, least squares for classification, logistic regression for classification, Bayesian logistic regression for classification. Feed-forward neural networks: pattern classification using perceptron, multilayer feed-forward neural networks (MLFFNNs), pattern classification and regression using MLFFNNs, error back propagation learning, fast learning methods (conjugate gradient method), auto-associative neural networks, Bayesian neural networks. Radial basis function networks: regularization theory, RBF networks for function approximation, RBF networks for pattern classification. Kernel methods for pattern analysis: statistical learning theory, support vector machines for pattern classification, support vector regression for function approximation, relevance vector machines for classification and regression. Self-organizing maps: pattern clustering, topological mapping, Kohonen’s self-organizing map. Feedback neural networks: pattern storage and retrieval, Hopfield model, Boltzmann machine, recurrent neural networks.

**STAT 768: Data Science Computational Models for Social Mining (2, 4, 4)**

Mining Social-Network Graphs (social networks as graphs, clustering of graphs, direct discovery of communities in graphs, partitioning of graphs, neighborhood properties in graphs). Sentiment Analysis: automatic detection of people’s sentiment towards a topic, event, product, or person. Practical applications in various domains will be discussed (e.g., predicting stock market prices or presidential elections). Emotion and Mood Analysis: automatic detection of people’s emotions (angry, sad, happy) by analyzing various media such as books, emails, lyrics and online discussion forums. Practical applications in various domains (such as predicting depression or categorizing songs). Belief Analysis and Hedging: automatic detection of people’s beliefs (committed and non-committed beliefs) from social media. Analysis of the use of hedging as a communicative device in various media: online discussions, scientific writing or legal discussions. Deception Detection (e.g., detecting fake reviews online, or deceptive speech in court proceedings). Argumentation Mining: automatic detection of arguments from text, such as online discussions or persuasive essays. Practical applications in various domains (e.g., political, legal or educational, such as improving students’ skills in writing persuasive essays). Social Power: automatic detection of power structure in organizations by analyzing people’s communications such as emails. Extracting Social Networks from text, such as networks of characters from novels, or networks from social media (e.g., people holding particular opinions, or networks of friends). Personality and Interpersonal Stance.

**YEAR TWO SEMESTER ONE**

**STAT 851: Seminar I (2, 4, 9)**

This is the first of four seminars organized in the department. Each student in the Department or Programme is expected to attend all scheduled seminars. Each student is expected to make his/her own presentation on a project proposal. Topics must relate to statistical issues in areas such as insurance, medicine, mortality and mobility, health outcomes, economics, policy, pension, social phenomena, mathematical finance, statistics, and other related fields, with particular reference to the advancement of the statistics profession.

**YEAR TWO SEMESTER TWO**

**STAT 852: Seminar II (2, 4, 9)**

This is the second in the series of seminar presentations. Each student in the Department or Programme is expected to attend all scheduled seminars. Each student is expected to make his/her own presentation on the experiential learning and progress made in his/her research.

**YEAR THREE SEMESTER ONE**

**STAT 953: Seminar III (2, 4, 9)**

This is the third in the series of seminar presentations. Each student in the Department or Programme is expected to attend all scheduled seminars. Each student is expected to make his/her own presentation on the progress made in his/her research.

**YEAR THREE SEMESTER TWO**

**STAT 954: Seminar IV (2, 4, 4)**

This is the fourth and final in the series of seminar presentations. Each student in the Department or Programme is expected to attend all scheduled seminars. Each student is expected to make his/her own presentation to discuss the findings of his/her research.

**YEAR FOUR SEMESTER ONE**

**STAT 1053: Seminar V (2, 4, 4)**

A statistics project is undertaken in either an applied area or theoretical development of statistical methods, after presenting a proposal as specified in STAT 710. The final write-up of the project should be submitted by the end of the fourth academic year of study.

**STAT 1054: Final Seminar (2, 4, 4)**

Oral examination on submitted thesis.