Good afternoon, everyone.
The first (and so far the last) book on this topic that I have read is Mair P. Modern Psychometrics with R (Use R!). 2018,
https://doi.org/10.1007/978-3-319-93177-7 . The book has a figure (attached) and examples of obtaining PCA results with Gifi on interval data. The explanation of the mathematics there is brief:
"First of all, Gifi models involve dimension reduction, just as principal component analysis (PCA) and correspondence analysis (CA). Let p be the number of dimensions which needs to be fixed a priori. Let H be an n × m data matrix. Correspondingly, hj represents the column vector for variable j with kj as the number of categories. For each variable we define an indicator matrix Gj of dimension n×kj, consisting of 0s and 1s in the case of categorical data. These indicator matrices can be then collected in an indicator supermatrix G = (G1| . . . |Gm). Each variable is associated with a matrix Yj of dimension kj × p containing the category quantifications. The final component we need is the matrix X. It contains the so-called object scores and is of dimension n × p. At the end of the day, each person gets a score in the p-dimensional space, and each category of variable j gets an optimally scaled category quantification in p dimensions. Since we scale both the objects and the variables, these methods are sometimes referred to as dual scaling methods. Putting all these ingredients together, Gifi establishes the following loss function:
$$\sigma(X, Y_1, \ldots, Y_m) = \sum_{j=1}^{m} \operatorname{tr}\,(X - G_j Y_j)'(X - G_j Y_j)$$
The right-hand side of the equation represents a sum-of-squares (SS) expression that needs to be minimized. This can be achieved by an alternating least squares (ALS) algorithm. This loss formulation is very general, and, depending on the particular Gifi model we fit, it simplifies correspondingly, or, for some versions, it can even get more complicated (see De Leeuw and Mair, 2009a)."
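To make the quoted loss function a bit more concrete, here is a minimal NumPy sketch of the simplest case (purely categorical data, every category quantified freely). It is not the code of the {Gifi} package. The function names (`indicator`, `gifi_loss`, `normalize`, `quantify`, `update_scores`) and the toy data are made up for illustration. The normalization X'X = nI with centered columns is an assumption taken from the homals literature, not from the quote above.

```python
import numpy as np

def indicator(column):
    """Build the n x k_j indicator matrix G_j (0/1) for one categorical variable."""
    column = np.asarray(column)
    categories, codes = np.unique(column, return_inverse=True)
    G = np.zeros((column.shape[0], categories.shape[0]))
    G[np.arange(column.shape[0]), codes] = 1.0
    return G

def gifi_loss(X, Gs, Ys):
    """sigma(X, Y_1, ..., Y_m) = sum_j tr (X - G_j Y_j)'(X - G_j Y_j)."""
    return sum(np.sum((X - G @ Y) ** 2) for G, Y in zip(Gs, Ys))

def normalize(X):
    """Center the columns and orthonormalize them so that X'X = n * I (assumed constraint)."""
    X = X - X.mean(axis=0)
    Q, _ = np.linalg.qr(X)
    return np.sqrt(X.shape[0]) * Q

def quantify(X, Gs):
    """For fixed X the least-squares Y_j is (G_j'G_j)^{-1} G_j'X, i.e. category centroids."""
    return [np.linalg.solve(G.T @ G, G.T @ X) for G in Gs]

def update_scores(Gs, Ys):
    """For fixed Y_j, average the G_j Y_j and re-impose the normalization."""
    return normalize(sum(G @ Y for G, Y in zip(Gs, Ys)) / len(Gs))

# Toy data matrix H: n = 8 objects, m = 3 categorical variables (invented values).
H = np.array([
    ["a", "x", "low"],
    ["a", "y", "low"],
    ["b", "x", "mid"],
    ["b", "y", "high"],
    ["c", "x", "mid"],
    ["c", "y", "high"],
    ["a", "x", "low"],
    ["c", "y", "high"],
])
n, m = H.shape
p = 2  # number of dimensions, fixed a priori

Gs = [indicator(H[:, j]) for j in range(m)]

# Random normalized start for the object scores X (n x p).
rng = np.random.default_rng(0)
X = normalize(rng.normal(size=(n, p)))
Ys = quantify(X, Gs)
loss_start = gifi_loss(X, Gs, Ys)

# Alternate between the two least-squares updates.
for _ in range(100):
    X = update_scores(Gs, Ys)
    Ys = quantify(X, Gs)
loss_end = gifi_loss(X, Gs, Ys)

print(f"loss at start: {loss_start:.4f}, after ALS: {loss_end:.4f}")
```

Here `X` holds the object scores and each element of `Ys` holds the category quantifications of one variable. Ordinal or numeric restrictions on the quantifications, which the real package supports, are left out of this sketch.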
In my experience, the engine in the Gifi package does optimal scaling better than the lineals function from the {aspect} package. It has many settings, and the best transformation has to be found by trial and error. Unfortunately, the package offers no separate way to get just the transformed data without then running PCA. Because of this, the transformed data have to be extracted from the large result object. There are also the packages {optiscale}, {bestNormalize} and {smacof}, which may be useful for data transformation.
References from the book:
De Leeuw, J. (1988). Multivariate analysis with linearizable regressions. Psychometrika, 53, 437–454.
De Leeuw, J., & Mair, P. (2009a). Gifi methods for optimal scaling in R: The package homals. Journal of Statistical Software, 31(4), 1–21.
https://www.jstatsoft.org/index.php/jss/article/view/v031i04
De Leeuw, J., & Mair, P. (2009b). Simple and canonical correspondence analysis using the R package anacor. Journal of Statistical Software, 31(5), 1–18.
http://www.jstatsoft.org/v31/i05/
De Leeuw, J., Mair, P., & Groenen, P. J. F. (2017). Multivariate analysis with optimal scaling.
http://gifi.stat.ucla.edu/gifi/_book/
Gifi, A. (1990). Nonlinear multivariate analysis. Chichester: Wiley.
Haegeli, P., Gunn, M., & Haider, W. (2012). Identifying a high-risk cohort in a complex and dynamic risk environment: Out-of-bounds skiing – An example from avalanche safety. Prevention Science, 13, 562–573.
Hoyle, R. H., Stephenson, M. T., Palmgreen, P., Pugzles Lorch, E., & Donohew, R. L. (2002). Reliability and validity of a brief measure of sensation seeking. Personality and Individual Differences, 32, 401–414.
Jacoby, W. G. (1991). Data theory and dimensional analysis. Thousand Oaks: Sage.
Jacoby, W. G. (1999). Levels of measurement and political research: An optimistic view. American Journal of Political Science, 43, 271–301.
Koller, I., Levenson, M. R., & Glück, J. (2017). What do you think you are measuring? A mixed-methods procedure for assessing the content validity of test items and theory-based scaling. Frontiers in Psychology, 8(126), 1–20.
Linting, M., Meulman, J. J., Groenen, P. J. F., & van der Kooij, A. J. (2007). Nonlinear principal components analysis: Introduction and application. Psychological Methods, 12, 336–358.
Profiles of the {Gifi} package authors, with their publications:
https://www.researchgate.net/profile/Jan_De_Leeuw
https://www.researchgate.net/profile/Patrick_Mair
Maybe this will be useful to someone.