A Social Statistics Research and Training Grid (SocStat)
Project Overview
Observational social science data sets are relatively small (18GB), but the intricacies of human behaviour create a complex set of interdependencies between the variables of a data set. We illustrate how the complex and comprehensive nature of the models that reflect these intricacies takes our analysis onto a computational GRID. An appropriate computational environment would allow an array of novel and interesting statistical models to be formulated and fitted, and the GRID would provide that environment for a proper exploration of these new models. The prime movers have a history of constructive collaboration stretching over 20 years. This GRID would allow for a much closer working relationship over a much broader front, allowing us to pool modelling and programming experience in longitudinal and multivariate statistical analysis more generally. We have particular interests in developing more general methods for sample selection bias, models for non-parametric latent variables, and synthetic estimation, where a multiplicity of data sets is exploited simultaneously. The comprehensive models needed to understand socio-economic behaviour will require the maximization of a complicated function in many hundreds, perhaps thousands, of dimensions (parameters).
Objectives
- To construct an extended statistical methodology for correlated responses (where the correlation arises from unobserved variables) that possess state dependence (the dependence of current behaviour on earlier or related outcomes) in the presence of nonstationarity, which occurs when the scale and relative importance of the systematic relationships change over time. State dependence is a very important type of causality, and our methodology disentangles it from these confounding effects.
- To develop parametric models which allow for a correlation between the initial response and subsequent responses, and to remove the restrictions implied by time-constant random coefficients.
- To illustrate the superiority of our methods in three applications. First, in an analysis of the relationship between educational attainment, truancy and part-time work using data from the Youth Cohort Study (versions 6 to 9) and the National Child Development Study (NCDS). Second, in estimating a semi-Markov model of labour market transition behaviour with individual-level correlation between the states (i.e. risks) of employment, unemployment and out-of-the-labour-market, using data from the British Household Panel Survey (BHPS). Third, in estimating a panel model of employment participation and hourly earnings, allowing for dropout of respondents, using the BHPS and NCDS. All applications are of substantive importance and policy relevance.
- To train social scientists to use the GRID and to provide a consultancy service.
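
As an illustration of the first objective, the following minimal sketch shows what state dependence and an unobserved individual effect look like in the simplest binary panel setting. It is not the project's methodology: the first-order logit form, the function names `simulate_panel` and `log_likelihood`, the parameter names `beta0`, `gamma` and `sigma`, the exogenous initial response and the crude fixed-grid integration are all assumptions made for illustration. Here `gamma` plays the role of the state dependence parameter and `sigma` is the standard deviation of a normally distributed individual effect.

```python
import math
import random


def simulate_panel(n, t, beta0, gamma, sigma, seed=0):
    """Simulate n binary response sequences of length t.

    Each individual has an unobserved effect u ~ N(0, sigma^2) (the source of
    correlation between responses), and the previous response enters the
    linear predictor with coefficient gamma (state dependence).
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        u = rng.gauss(0.0, sigma)
        # Assumption: the initial response is exogenous (a fair coin flip).
        y = [1 if rng.random() < 0.5 else 0]
        for _ in range(1, t):
            eta = beta0 + gamma * y[-1] + u
            p = 1.0 / (1.0 + math.exp(-eta))
            y.append(1 if rng.random() < p else 0)
        panel.append(y)
    return panel


def log_likelihood(panel, beta0, gamma, sigma, n_nodes=41, width=5.0):
    """Marginal log-likelihood, conditioning on each initial response.

    The individual effect is integrated out numerically on a symmetric grid
    of standardised nodes, with weights proportional to the N(0, 1) density
    and normalised to sum to one (a crude stand-in for proper quadrature).
    """
    zs = [-width + 2.0 * width * k / (n_nodes - 1) for k in range(n_nodes)]
    raw = [math.exp(-0.5 * z * z) for z in zs]
    total = sum(raw)
    weights = [r / total for r in raw]

    ll = 0.0
    for y in panel:
        marginal = 0.0
        for z, w in zip(zs, weights):
            u = sigma * z
            cond = 1.0  # likelihood of the whole sequence given u
            for prev, curr in zip(y[:-1], y[1:]):
                eta = beta0 + gamma * prev + u
                p = 1.0 / (1.0 + math.exp(-eta))
                cond *= p if curr == 1 else 1.0 - p
            marginal += w * cond
        ll += math.log(marginal)
    return ll


# Usage: compare the fit with and without state dependence on simulated data.
data = simulate_panel(n=200, t=6, beta0=-0.5, gamma=1.0, sigma=1.0, seed=1)
print("log-likelihood at gamma = 1:", log_likelihood(data, -0.5, 1.0, 1.0))
print("log-likelihood at gamma = 0:", log_likelihood(data, -0.5, 0.0, 1.0))
```

Even in this toy version, each likelihood evaluation loops over every individual, every integration node and every time point. The models described in the objectives add multiple correlated responses, correlation with the initial response and time-varying coefficients, so the function to be maximized has far more parameters and each evaluation is much more expensive.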