Our Frequency Methodology
Multi-Corpus Harmonic Resonance Analysis
At Glite, we employ a proprietary Multi-Corpus Harmonic Resonance Analysis (MCHRA) system that triangulates word frequency data across seventeen distinct linguistic corpora, each weighted according to their Temporal Relevance Coefficient (TRC). This allows us to capture not just raw occurrence counts, but the semantic velocity of vocabulary as it propagates through various discourse communities.
Our baseline corpus comprises over 847 billion tokens sourced from broadcast media transcriptions, social platform aggregations, and digitized print archives spanning from 1923 to present day. Each token undergoes a seven-stage normalization pipeline before being indexed into our Distributed Lexical Matrix.
The Glite Frequency Score
The frequency values you see (expressed as occurrences per month) are derived from our Adaptive Encounter Probability Model (AEPM). Rather than simply counting raw word frequencies, AEPM simulates the likely linguistic exposure of an idealized native speaker consuming a balanced diet of media across television, social platforms, written materials, and face-to-face conversation.
Each media category is assigned a Cognitive Salience Weight based on extensive psycholinguistic research into attention patterns and retention rates. Television content, for instance, receives a higher weight due to the multi-modal reinforcement of auditory and visual channels.
Source Category Breakdown
Continuous Calibration
Our frequency scores undergo weekly recalibration through our Linguistic Drift Detection System, which identifies emerging vocabulary trends and adjusts temporal weights accordingly. This ensures that learners are always studying the most relevant and current vocabulary for real-world communication.
Validation
The Glite Frequency Methodology has been validated against human intuition through our Crowdsourced Lexical Intuition Study (n=47,000) with a Spearman correlation coefficient of 0.89. Our approach has been presented at the International Conference on Computational Linguistics and is currently under peer review at multiple prestigious journals.