# MSc. Applied Statistics

The MSc. Applied Statistics programme is designed to give students and researchers a comprehensive training that will equip them for careers as professional statisticians who can work efficiently in business, consulting firms, education, medicine, government, research institutions, financial institutions and any other establishment that requires the use of statistical concepts and applications. The emphasis is on developing each student’s ability to solve substantial real-life problems, communicate the findings effectively and develop new techniques. The programme also prepares students well for entry into a Ph.D. programme. Students are required to write a project work, which substitutes for two courses. The project defense usually replaces the master's final exam. In most applications, students will be assigned to groups and will be required to take oral examinations.

**Aims and Objectives**

On successful completion of the MSc. Applied Statistics programme, students will have:

- Acquired the knowledge and skill of applying and adapting statistical theory and modeling techniques to real world problems in both observational and designed studies.
- Acquired skill in dimensionality reduction, classification and multivariate inferential methods for multivariate data.
- Acquired, through the data analysis part of the course and the project work, both analytical and computational skills in the use of statistical computer packages, enabling them to understand and solve a broad range of statistical analysis problems and to communicate the results effectively.
- Gained some experience of communicating mathematical and statistical arguments and conclusions.
- Developed the ability to model real-world situations and assist in the development of solutions to a variety of mathematical and/or statistical problems.

**Content for Courses for each Semester**

**Year 1: Semester 1**

**STAT 501: Probability and Statistical Theory (4, 0, 4)**

Probability theory: General concepts of Probability Theory; Random Variables and Random Vectors and their properties; Expectation and independence, convergence concepts, Law of Large Numbers, Central Limit Theorem, Characteristic functions, multivariate transformations of variables. Distribution Functions: Univariate Discrete and Continuous Probability Distributions: Binomial, Geometric, Negative Binomial, Hypergeometric, Negative Multinomial, Poisson, Uniform, Exponential, Chi-square, Gamma, Beta, Normal, Pareto, Lognormal and Weibull; Multivariate Probability Distributions including Bivariate Normal distribution and Order Statistics, Multivariate Hypergeometric, Multivariate Logarithmic Series.

**STAT 503: Statistical Inference (4, 0, 4)**

Formulation of the principles of statistical decisions as an aspect of the theory of games; Bayes, minimax and admissible rules. The main theorems of statistical decision theory. Invariant and equivalent decision rules. Methods of solving for minimax, admissible minimax and invariant and equivalent rules. Particular applications to location parameter problems. Decision theoretic approach to the theory of hypothesis testing. Bayes tests. Neyman-Pearson and the generalized Neyman-Pearson Lemma. Uniformly most powerful, unbiased, invariant and locally most powerful tests. Invariant and minimax tests. Maximum likelihood estimation and asymptotic theory.

**STAT 505: Categorical Data Analysis**

Quick review of discrete probability distributions: binomial, multinomial, Poisson. Introduction to the concept of likelihood. Tests for one-way tables using Pearson’s *X*² and likelihood-ratio *G*² statistics. Introduction to contingency tables. 2 × 2 and *r* × *c* tables, tests for independence and homogeneity of proportions, Fisher’s exact test, odds ratio and logit, other measures of association. Introduction to 3-way tables, full independence and conditional independence, collapsing and Simpson’s paradox. Introduction to generalized linear models. Poisson regression. Logistic regression for dichotomous response, including interpretation of coefficients, main effects and interactions, model selection, diagnostics, and assessing goodness of fit. Log-linear models (and graphical models) for multi-way tables.

**STAT 507: Further Regression Analysis (3, 1, 4)**

Simple Linear Regression Model: Model for E(Y|x), model for distribution of errors, least squares estimation of model for E(Y|x), Estimation of variance, Regression through the origin. Inferences for Simple Linear Model: Inferences concerning the slope (confidence intervals and t-test), Confidence interval estimate of the mean Y at a specific X, Prediction interval for a new Y, Analysis of Variance partitioning of variation in Y, R-squared calculation and interpretation. Diagnostic procedures for aptness of model: Residual analyses, Plots of residuals versus fits, residuals versus x, residuals versus new x, Tests for normality of residuals, Lack of Fit test, Pure Error, Lack of Fit concepts, Transformations as a solution to problems with the model, Weighted Least Squares as a solution for variance problems. Matrix Notation and Literacy for Regression Models: X matrix, β vector, matrix formula for estimating coefficients, Linear dependence issues, and Variance-covariance matrix of sample coefficients. Multiple Regression Models and Estimation: Multiple predictor variables, Basic estimation and statistical inference within multiple regression, Interaction terms and the interpretation of interactions. General Linear *F*-test for testing hypotheses: Reduced and Full models associated with hypotheses about the model’s coefficients, *F*-test for general linear hypotheses, Assessing and interpreting the effect of a single predictor variable within a multiple regression, Properly interpreting the t-test, Sequential Sums of Squares, Partial correlation between y and an x-variable. Examining All Possible Regressions to Identify the Potential Models: R², Adjusted R², MSE, Cp, AIC, and BIC criteria, Stepwise algorithms for identifying models. Incorporating Categorical Predictor Variables: Indicator Variables, Interpretation of models containing indicator variables.
More Diagnostic Measures and Remedial Measures for Lack of Fit: Variance Inflation Factors, Deleted Residuals, Influence statistics (Hat matrix, Cook's D and related measures). Models not of full rank.

**STAT 509: Applied Stochastic Models**

Review of Probability Theory, Regularity of Stochastic Processes, Convergence of Random Walks to Brownian Motion. Brownian Motion and its Martingales, Diffusion Processes, Stochastic integrals, Ito’s Formula, Stochastic Differential Equations. Stochastic models are used to represent random processes which evolve over time. The course will focus on Stochastic Processes and their applications to real-world situations for students to be able to choose appropriate models, exercise these models using appropriate mathematical and computer techniques and interpret the results in plain language. The topics include the following: Basic knowledge of Probability Theory including Random Variables, Independence and Limit Theorems and Theory of Stochastic Processes; Discrete-Time Markov Chains as applied to Random Walks, Reliability, Branching Processes, Markov Chain Monte Carlo, etc.; Continuous-Time Markov Chains as applied to Queuing Theory, Risk Theory, Population Processes, etc.; Counting Processes, for example, Poisson Processes in time space.

**STAT 511: Statistical Quality Control**

Basic Concepts of Quality Control and Deming’s principles of management. Statistical Process Control: Control Charts with Variables (X-bar charts, S charts, P charts) and Attributes; Process Characterization and Capability Analysis, Operations Strategy; CUSUM and EWMA, Short Production Runs. Acceptance Sampling: Sampling Plans, Operating Characteristic Curves, Effects of lot size. Role of Design of Experiments in Quality Improvement; Total Quality Management (TQM).

**STAT 511: Applied Time Series Analysis**

Time Series Analysis concerns random quantities that evolve over time. Practical examples include stock market prices, interest rates, unemployment levels, consumer demand, temperature, rainfall and river flows. The course covers the basic concepts of time series modelling and discrete time series trends. The classical models: AR, MA, ARMA and ARIMA. Stationary and non-stationary models, estimation and identification. Forecasting, univariate and multivariate procedures; prediction theory. Spectral theory, the spectral density function; Fourier analysis and harmonic decompositions; periodogram analysis; spectral analysis, effects of linear filters; estimation of spectra; confidence intervals for the spectrum. Use of a statistical package for time series data.

**STAT 517: Operations Research I (3, 0, 3)**

Getting Started, Linear Programming (Model Formulation and Graphical Method), Simplex Method, Duality and Sensitivity Testing, Transportation Problem, Assignment Problem, Integer Programming, Goal Programming, Game Theory, Waiting Line Models, Inventory Control Models, Dynamic Programming, Replacement Models, Sequencing Models, Nonlinear Programming, Simulation Models.

**STAT 519: Sample Survey Theory and Methods (3, 0, 3)**

Review of basic sampling theories and designs (SRS, stratified, systematic, multistage). Techniques of sample design: multiphase designs; selection with probability proportional to size (PPS); sampling with varying probabilities; optimal allocation strategies; general aspects of replicated and successive sampling (bootstrap and jackknife); panel design; model-based sampling. Bias and nonresponse: sources of survey errors, non-coverage, nonresponse. Ecological sampling methods, inferential problems of finite populations. Scope: types of surveys undertaken, sampling techniques used, issues and problems. Use of appropriate software to calculate standard errors (variance estimation).

**STAT 521: Demographic Methods (3, 0, 3)**

This course introduces the basic techniques of demographic analysis. Students will become familiar with the sources of data available for demographic research. Population composition and change measures will be presented. Measures of mortality, fertility, marriage and migration levels and patterns will be defined. Life table, standardization and population projection techniques will also be explored.

**Year 1: Semester 2**

**STAT 502: Non-Parametric Methods (4, 0, 4)**

Statistical procedures based on ranks, order statistics, signs, permutations and runs (simple one-sample tests; order statistics, empirical distribution function, ranks and runs; general nature of nonparametric tests, allocation of scores, confidence intervals; efficiency and robustness considerations; dealing with tied observations. Goodness of fit tests. General two-sample and c-sample problems; linear rank tests; Wilcoxon's rank sum test; Kruskal-Wallis test; Friedman test). Testing for randomness, symmetry and independence (measures and tests for association; analysis of contingency tables; Kendall's tau, Spearman's rank correlation; coefficient of concordance). Efficiency of rank tests in the sense of Pitman and Bahadur. Smoothing and spline techniques, nonparametric regression.

**STAT 512: Design and Analysis of Experiments (3, 0, 3)**

Basic ideas and assumptions. Randomization, multiple comparisons, Randomized blocks and Latin squares, Balanced Incomplete block designs, Alpha design. Design of factorial experiments, confounding and fractional replication. Random effects; components of variance; mixed model – maximum likelihood, BLUE and BLUP. Concept of design resolution; Response surface methodology and model building applications. Analysis of Covariance. Application and use of Statistical Computing packages.

**STAT 504: Multivariate Data Analysis (4, 0, 4)**

Basic knowledge of Multivariate Calculus, Linear Algebra and Probability/Statistics is assumed. The topics will include: Multivariate Normal distribution, its properties and related distributions; Wishart and Hotelling's distributions; Application to estimation of parameters and tests of means and covariance matrices; Multivariate Linear Regression Models; Multivariate Analysis of Variance (MANOVA); Principal Components, Factor Analysis, Canonical Correlation Analysis, Discrimination and Classification, Cluster Analysis; Use of Statistical Computing packages to analyze real-life data.

**STAT 510: Applied Stochastic Processes II (3, 0, 3)**

Brownian Motion and its application in Industry and Finance. Renewal Theory, Stochastic Algorithms in Optimization, Markov Decision Processes. Concepts of Simulation, with emphasis on Computer Simulation methods using Spreadsheets and Computer Algebra Systems (MAPLE, MATHEMATICA, MATHCAD). Random Time Change and 1-dimensional diffusions, Brownian Motion on the half line. Convergence of Markov Chains to Diffusions, Reflected Processes in Higher Dimensions.

**STAT 514: Concepts of Bayesian Inference**

Elements of decision theory: Statistical games; the no-data problem. Loss and regret, mixed actions, the minimax principle, Bayes actions; decision with sample data; decision rules, risk function, Bayes decision rules. Bayesian inference: Problems associated with the classical approach; Bayes' approach: prior and posterior distributions; specification of prior distribution; Bayesian estimation, properties of Bayes' estimators; Bayesian tests and confidence sets; examples of situations where Bayesian and classical approaches give equivalent or nearly equivalent results. One-parameter and multiparameter models, predictive checking and sensitivity analysis. Simulation of probability distributions. Sequential methods: Sequential probability ratio test; Stein fixed-width confidence intervals. Current methodological issues in Statistics.

**STAT 504: Introduction to Longitudinal Data Analysis (4, 0, 4)**

The goals of the course are to develop the skills necessary to identify an appropriate technique, estimate models, and interpret results for independent research, and to critically evaluate contemporary social research using advanced quantitative methods. The course is applied in the sense that it focuses on estimating models and interpreting the results, rather than on understanding in detail the mathematics behind the techniques. Introduction to Longitudinal Data Analysis, Fixed effects models, Random effects models, GEE models and GLS models with complex error structures, Missing data analysis, Huynh-Feldt (H-F) and Greenhouse-Geisser (G-G) adjustments of the F-statistic, Profile, Polynomial, mean and other transformations. Applications in ecology and health sciences.

**STAT 518: Advanced Topics in Operations Research (3, 0, 3)**

Dynamic programming and heuristics. Project scheduling; probability and cost considerations in project scheduling; project control. Critical path analysis. Reliability problems; replacement and maintenance costs; discounting; group replacement, renewal process formulation, application of dynamic programming. Queuing theory in practice: obstacles in modelling queuing systems, data gathering and testing, queuing decision models, case studies. Game theory, matrix games; minimax strategies, saddle points, mixed strategies, solution of a game. Behavioural decision theory, descriptive models of human decision making; the use of decision analysis in practice.

**STAT 506: Data Management and Statistical Computing**

This course aims to give an overview of techniques in numerical analysis that are useful in the advanced practice of statistics. The course is roughly divided into three parts: evaluation of special functions and numerical linear algebra (linear solvers, matrix factorizations, eigenvalue problems); optimization (unconstrained methods, simplex method, active set methods, penalty function methods, combinatorial optimization); and simulation (importance and rejection sampling, Markov chain methods, exact methods). Application of these procedures in solving problems in statistics. The course will cover some theoretical issues, but will primarily focus on the design and implementation of algorithms.