User:Chocolateboy/Dashes
From Wikipedia talk:Manual of Style (dates and numbers):
(The file normalized.txt referenced below is the 20040727 cur table dump with talk pages removed; the script is available on request. A few of the longer articles are also excluded from these statistics because they overflowed the stack in Perl's recursive regular expression engine.)
Globally, spaced hyphens are more than 15 times as common as en dashes and em dashes combined (494,155 occurrences against 31,189):
- grep '–' normalized.txt | perl -pe '$_ = join ($/, /–/g) . $/' | wc -l
- > 14663
- grep '—' normalized.txt | perl -pe '$_ = join ($/, /—/g) . $/' | wc -l
- > 16526
- grep ' - ' normalized.txt | perl -pe '$_ = join ($/, / - /g) . $/' | wc -l
- > 494155
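Each pipeline above counts occurrences rather than matching lines: grep selects the lines, and the Perl one-liner then emits one output line per match for wc -l to count. A shorter equivalent, assuming a grep that supports the -o option (GNU and BSD grep both do, although it is not part of POSIX), might look like the sketch below. The helper name count_matches is made up for illustration and is not part of the original script.

```shell
# count_matches PATTERN FILE
# Print the total number of occurrences of the literal string PATTERN in FILE.
# Several matches on the same line are counted separately.
count_matches() {
    # -o prints every match on its own line, -F treats PATTERN as a fixed
    # string rather than a regex, and -- ends option parsing so that a
    # pattern beginning with '-' is not mistaken for an option.
    # tr strips the padding that some wc implementations add.
    grep -oF -- "$1" "$2" | wc -l | tr -d ' '
}
```

With this helper, count_matches ' - ' normalized.txt should give the same figure as the third pipeline, since both count non-overlapping matches from left to right. Because -F disables regex syntax, the link patterns need no backslashes: count_matches ']] - [[' normalized.txt.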
Likewise, between linked dates, hyphens are roughly 40 times as common as en dashes for date ranges (220,277 occurrences against 5,297):
- grep '\]\] – \[\[' normalized.txt | perl -pe '$_ = join ($/, /\]\] – \[\[/g) . $/' | wc -l
- > 2698
- grep '\]\]–\[\[' normalized.txt | perl -pe '$_ = join ($/, /\]\]–\[\[/g) . $/' | wc -l
- > 2599
- grep '\]\]-\[\[' normalized.txt | perl -pe '$_ = join ($/, /\]\]-\[\[/g) . $/' | wc -l
- > 59366
- grep '\]\] - \[\[' normalized.txt | perl -pe '$_ = join ($/, /\]\] - \[\[/g) . $/' | wc -l
- > 160911
As those stats also show (they exclude some date ranges and include some non-date-ranges: patches welcome!), spaced hyphens are used about 2.7 times as often as unspaced hyphens (160,911 occurrences against 59,366).
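The ratios quoted here can be rechecked from the raw counts reported above. The following is only a convenience sketch. The ratio helper is a hypothetical name, and awk is used because shell arithmetic is integer-only.

```shell
# ratio A B
# Print A divided by B, rounded to one decimal place.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Spaced hyphens against en dashes plus em dashes, whole dump.
ratio 494155 $((14663 + 16526))             # prints 15.8
# Hyphens (unspaced + spaced) against en dashes (spaced + unspaced) between links.
ratio $((59366 + 160911)) $((2698 + 2599))  # prints 41.6
# Spaced against unspaced hyphens between links.
ratio 160911 59366                          # prints 2.7
```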
chocolateboy 23:39, 16 Sep 2004 (UTC)