Semantic Factor Analysis of Language Model Embeddings.
semanticfa performs exploratory factor analysis on
language model embeddings of psychological scale items, recovering
latent factor structure from the item text alone, with no human
response data required.
Sign-flip by keying, L2-normalize, cosine similarity
"atomic": L2-normalize, cosine similarity (no sign-flip)
"squid": Subtract questionnaire-mean embedding, then cosine similarity
"mean_centered_pearson": Mean-center, then cosine similarity (equal to the Pearson correlation)
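
The similarity variants listed above can be illustrated with a minimal, standard-library Python sketch. This is not semanticfa's API: the function names, the `keys` argument (+1 for regular items, -1 for reverse-keyed items), and the label "sign_flip" for the keying-based variant are hypothetical and chosen only for illustration. It also assumes that "mean-center" in the Pearson variant means centering each embedding across its own dimensions, which is what makes the cosine equal to the Pearson correlation between two embedding vectors.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity: dot product of the L2-normalized vectors."""
    u, v = l2_normalize(u), l2_normalize(v)
    return sum(a * b for a, b in zip(u, v))


def similarity_matrix(embeddings, method="atomic", keys=None):
    """Item-by-item similarity matrix under one of the listed variants.

    embeddings: list of equal-length numeric lists, one per item.
    keys: +1/-1 per item, used only by the hypothetical "sign_flip" label.
    """
    rows = [list(e) for e in embeddings]
    if method == "sign_flip":
        # Assumption: reverse-keyed items have their embedding negated.
        rows = [[k * x for x in e] for e, k in zip(rows, keys)]
    elif method == "squid":
        # Subtract the mean embedding taken over all items of the questionnaire.
        dim = len(rows[0])
        mean = [sum(e[j] for e in rows) / len(rows) for j in range(dim)]
        rows = [[x - m for x, m in zip(e, mean)] for e in rows]
    elif method == "mean_centered_pearson":
        # Assumption: center each embedding across its own dimensions.
        rows = [[x - sum(e) / len(e) for x in e] for e in rows]
    elif method != "atomic":
        raise ValueError(f"unknown method: {method}")
    return [[cosine(u, v) for v in rows] for u in rows]


# Toy three-dimensional "embeddings" for three items (made-up numbers).
items = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [0.0, 1.0, 5.0]]
for name in ("atomic", "squid", "mean_centered_pearson"):
    print(name, similarity_matrix(items, name)[0])
```

Each variant changes only the preprocessing applied before the cosine step, so the resulting matrices share the same shape and a unit diagonal and can feed the same downstream factor extraction.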
References
Milano, N., Luongo, M., Ponticorvo, M., & Marocco, D. (2025).
Semantic analysis of test items through large language model embeddings
predicts a-priori factorial structure of personality tests. Current
Research in Behavioral Sciences, 8, 100168.
doi:10.1016/j.crbeha.2025.100168
Casella, M., Luongo, M., Marocco, D., Milano, N., & Ponticorvo,
M. (2024). LLM embeddings on test items predict post hoc loadings in
personality tests. Ital-IA 2024, CEUR Workshop
Proceedings.
Guenole, N., D’Urso, E. D., Samo, A., Sun, T., & Haslbeck, J. M.
B. (preprint). Enhancing Scale Development: Pseudo Factor Analysis of
Language Embedding Similarity Matrices. OSF: https://osf.io/3mpzb/
Pellert, M., Lechner, C. M., Sen, I., & Strohmaier, M. (2026).
Neural network embeddings recover value dimensions from psychometric
survey items on par with human data (SQuID). Findings of the ACL:
EACL 2026, 5738–5752.
Pokropek, A. (2026). From keyword-based text measures to latent
variables: Confirmatory factor analysis with word embeddings. EPJ
Data Science. doi:10.1140/epjds/s13688-026-00654-1
Kmetty, Z., Koltai, J., & Rudas, T. (2021). The presence of
occupational structure in online texts based on word embedding NLP
models. EPJ Data Science, 10, 55.
doi:10.1140/epjds/s13688-021-00311-9
Christensen, A. P., Garrido, L. E., & Golino, H. (2023). Unique
Variable Analysis: A network psychometrics method to detect local
dependence. Multivariate Behavioral Research, 58(6), 1165–1182.
doi:10.1080/00273171.2023.2194606
Golino, H. (2026). Optimizing the landscape of LLM embeddings with
Dynamic Exploratory Graph Analysis for generative psychometrics.
arXiv:2601.17010.
Wulff, D. U., & Mata, R. (2025). Semantic embeddings reveal and
address taxonomic incommensurability in psychological measurement.
Nature Human Behaviour, 9(5), 944–954.
doi:10.1038/s41562-024-02089-y
Wulff, D. U., & Mata, R. (2026). Escaping the jingle-jangle
jungle: Increasing conceptual clarity in psychology using large language
models. Current Directions in Psychological Science, 35(2),
59–65. doi:10.1177/09637214251382083
Hommel, B. E., & Arslan, R. C. (2025). Language models
accurately infer correlations between psychological items and scales
from text alone. Advances in Methods and Practices in Psychological
Science, 8(4). doi:10.1177/25152459251377093
License
GPL (>= 3)