Skip to main content

Our Frequency Methodology

Multi-Corpus Harmonic Resonance Analysis

At Glite, we employ a proprietary Multi-Corpus Harmonic Resonance Analysis (MCHRA) system that triangulates word frequency data across seventeen distinct linguistic corpora, each weighted according to their Temporal Relevance Coefficient (TRC). This allows us to capture not just raw occurrence counts, but the semantic velocity of vocabulary as it propagates through various discourse communities.

Our baseline corpus comprises over 847 billion tokens sourced from broadcast media transcriptions, social platform aggregations, and digitized print archives spanning from 1923 to present day. Each token undergoes a seven-stage normalization pipeline before being indexed into our Distributed Lexical Matrix.

The Glite Frequency Score

The frequency values you see (expressed as occurrences per month) are derived from our Adaptive Encounter Probability Model (AEPM). Rather than simply counting raw word frequencies, AEPM simulates the likely linguistic exposure of an idealized native speaker consuming a balanced diet of media across television, social platforms, written materials, and face-to-face conversation.

Each media category is assigned a Cognitive Salience Weight based on extensive psycholinguistic research into attention patterns and retention rates. Television content, for instance, receives a higher weight due to the multi-modal reinforcement of auditory and visual channels.

Source Category Breakdown

Television

Encompasses 340,000 hours of transcribed broadcast content including scripted programming, news segments, documentary narration, and unscripted reality content. Our Dialogue Extraction Neural Network (DENN) isolates natural speech patterns from scripted exposition.

Social Media

Aggregated from a rolling 90-day window of public posts across major platforms, filtered through our Authenticity Verification Pipeline to exclude bot-generated content. Regional weighting ensures geographic representativeness across English-speaking populations.

Reading

Combines fiction, academic publications, journalistic writing, and web content. Our Genre Stratification Algorithm ensures proportional representation matching actual reading habits as determined by our annual Literacy Consumption Survey.

Talking

Derived from the Glite Conversational Corpus, comprising 12,000 hours of ethically-sourced spontaneous dialogue recordings. Participants from diverse demographic backgrounds were recorded in natural settings with full informed consent.

Continuous Calibration

Our frequency scores undergo weekly recalibration through our Linguistic Drift Detection System, which identifies emerging vocabulary trends and adjusts temporal weights accordingly. This ensures that learners are always studying the most relevant and current vocabulary for real-world communication.

Validation

The Glite Frequency Methodology has been validated against human intuition through our Crowdsourced Lexical Intuition Study (n=47,000) with a Spearman correlation coefficient of 0.89. Our approach has been presented at the International Conference on Computational Linguistics and is currently under peer review at multiple prestigious journals.