

Use this form to search and/or filter the Helsinki Corpus. Please note that, because of the large size of the corpus and the extensive set of parameters, a search may take a while.

When a text has more than one set of parameters (different sets referring to different divisions of the text), every set is searched, and a match is reported if the search succeeds in any of them. However, old parameter sets that have been superseded by errata-corrected ones are not searched. For more on parameter sets, see the manual's entry on the textClass element. For general information about the parameters used, consult the Reference Codes section of the original manual.
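The matching rule for parameter sets can be sketched as follows. This is only an illustration of the rule described above, not the search engine's actual code: the ParameterSet class, the text_matches function, and all parameter names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ParameterSet:
    """One set of parameters attached to a text (hypothetical structure)."""
    values: dict
    superseded: bool = False  # True if an errata-corrected set replaces it


def text_matches(parameter_sets, wanted):
    """Return True if any non-superseded set satisfies every selected filter.

    `wanted` maps a parameter name to the set of values selected in the form.
    """
    for ps in parameter_sets:
        if ps.superseded:
            continue  # superseded sets are never searched
        if all(ps.values.get(name) in allowed for name, allowed in wanted.items()):
            return True
    return False


# Placeholder parameter names and values, invented for this example.
text = [
    ParameterSet({"text_type": "A", "dialect": "X"}, superseded=True),
    ParameterSet({"text_type": "B", "dialect": "X"}),
    ParameterSet({"text_type": "C", "dialect": "Y"}),
]

print(text_matches(text, {"text_type": {"C"}}))  # a current set matches
print(text_matches(text, {"text_type": {"A"}}))  # only a superseded set matches
```

In this sketch a text is reported only when a single current parameter set satisfies all selected filters at once; whether the real search combines filters in exactly this way is an assumption.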

Reset strings and parameters

Search strings
Search string (text): case-insensitive
Search string (title): case-insensitive
Search string (author): case-insensitive

The first field searches through the corpus texts, the second one through text titles, the third one through authors' names.

You may use Perl-style regular expressions. For example:

  • soul will match all occurrences of the string ‘soul’, including those inside longer words such as ‘souls’ and ‘ensoul’
  • \bsoul will match all words beginning with ‘soul’: thus ‘soul’ and ‘souls’ but not ‘ensoul’
  • (s|f)oul\b will match all words ending in either ‘soul’ or ‘foul’
  • chaucer|gower will match all occurrences of ‘chaucer’ and ‘gower’
  • \bhe?art\b will match ‘heart’ and ‘hart’
  • \b(\w+)\b\s+\b\1\b will match all sequences in which a word is repeated once (a maximum of 5 backreferences (\1, \2, ...) is supported)
  • let\s*te will match the form ‘lette’, as well as ‘let te’ and other forms where any amount of white space separates ‘let’ and ‘te’
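To try such patterns before submitting a search, they can be tested locally. The sketch below uses Python's re module as a stand-in for the Perl-style engine behind the form; the two syntaxes agree for the patterns listed above, but that they agree in every detail is an assumption. The sample sentence is invented for illustration and is not taken from the corpus, and find_all is a hypothetical helper.

```python
import re

# Invented sample text, not a corpus excerpt.
SAMPLE = ("His soul and their souls ensoul a foul hart; "
          "Chaucer and Gower let te heart heart lette go.")


def find_all(pattern, text):
    """Return every matched substring, ignoring case like the form's option."""
    return [m.group(0) for m in re.finditer(pattern, text, flags=re.IGNORECASE)]


# The example patterns from the list above.
PATTERNS = [
    r"soul",
    r"\bsoul",
    r"(s|f)oul\b",
    r"chaucer|gower",
    r"\bhe?art\b",
    r"\b(\w+)\b\s+\b\1\b",
    r"let\s*te",
]

for pattern in PATTERNS:
    print(pattern, "->", find_all(pattern, SAMPLE))
```

Note that the helper returns only the matched substring, so soul reports ‘soul’ three times here (once each for ‘soul’, ‘souls’ and ‘ensoul’), while \bsoul reports it twice.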

Ash (æ, Æ), e caudata (ę), eth (ð, Ð), yogh (ȝ, Ȝ), thorn (þ, Þ), crossed thorn (ꝥ, Ꝥ) and l with stroke (ł) may be escaped as follows, respectively: @a, @A, @e, @d, @D, @g, @G, @t, @T, @tt, @TT and @l. Alternatively, if your system supports this, you may enter Unicode characters directly.
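The escape table above can be expressed as a small lookup. The following is a minimal sketch of how such escapes might be expanded into Unicode characters; expand_escapes is a hypothetical helper, not the form's own code. It assumes that the longest escape wins, so that @tt is read as crossed thorn rather than as thorn followed by ‘t’; how the search form itself resolves that ambiguity is not stated here.

```python
import re

# Escape sequences and their characters, as listed above.
ESCAPES = {
    "@a": "æ", "@A": "Æ",    # ash
    "@e": "ę",               # e caudata
    "@d": "ð", "@D": "Ð",    # eth
    "@g": "ȝ", "@G": "Ȝ",    # yogh
    "@t": "þ", "@T": "Þ",    # thorn
    "@tt": "ꝥ", "@TT": "Ꝥ",  # crossed thorn
    "@l": "ł",               # l with stroke
}

# Try longer escapes first so '@tt' is not consumed as '@t' plus 't'.
_ESCAPE_RE = re.compile(
    "|".join(re.escape(key) for key in sorted(ESCAPES, key=len, reverse=True))
)


def expand_escapes(query):
    """Replace each @-escape in the query with its Unicode character."""
    return _ESCAPE_RE.sub(lambda m: ESCAPES[m.group(0)], query)


print(expand_escapes("@t@at"))  # thorn, ash, t
print(expand_escapes("@ge"))    # yogh, e
```

Under the longest-match assumption, a thorn immediately followed by ‘t’ cannot be written with escapes alone; entering the Unicode character directly avoids the problem.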

You may also leave the search string fields empty, in which case the corpus is simply filtered by the parameters set below.


Select (or deselect) multiple options by holding down the Ctrl or the Shift key.

Note that OX does not include O1, O2, O3 and O4, but is used to mark those texts for which no more exact dating than ‘–1150’ is available. The same applies to MX and EX. Please refer to the original manual for detailed information on the parameterisation.

Date of original
Date of manuscript
'Dialect' Reference Code Values
'Verse' or 'Prose' Reference Code Values
'Text Type' Reference Code Values
'Relationship to Foreign Original' Reference Code Values
'Language of Foreign Original' Reference Code Values
'Relationship to Spoken Language' Reference Code Values
'Sex of the Author' Reference Code Values
'Age of the Author' Reference Code Values
'Social Rank of Author' Reference Code Values
'Audience Description' Reference Code Values
'Participant Relationship' Reference Code Values
'Interaction' Reference Code Values
'Setting' Reference Code Values
'Prototypical Text Category' Reference Code Values

Sort search results
by author
by title
by date