Most models of English word recognition restrict their domain of simulation to one-syllable words, and there is little straightforward empirical data to guide the development of more complex models of reading that would simulate the full set of words a reader is usually exposed to. However, typical reading material includes many polysyllabic words, which are influenced by factors that are absent from one-syllable words, such as the effect of stress on pronunciation, the influence of context, and segmentation ambiguity. The question therefore arises whether the present models, however successful at simulating one-syllable words, in fact offer a convincing solution to an inappropriately posed problem. In this study, we attempt to reach a clearer understanding of polysyllabic word reading. Given the lack of empirical and modeling data, corpus analysis seems the most appropriate technique for systematically investigating the role of different possible factors in reading-aloud performance. We provide a quantitative description of the grapheme-phoneme associations of monosyllabic and disyllabic English words (with their British English pronunciations), together with details of the semi-automatic methodology adopted for segmenting the spelling and pronunciation of the words into graphemes and phonemes. The data obtained on the distribution of the pronunciations of the different graphemes of the language are then used to compare the predictability of the pronunciation of monosyllabic and disyllabic words. We argue that these data indicate that current theories of monosyllabic word reading cannot be taken as satisfactory theories of reading for the whole range of words a reader is exposed to.