Elixir/HTML dump scraper: Difference between revisions

typo
retitle
Line 22: Line 22:
For those interested in the results, please skip ahead to the [https://phabricator.wikimedia.org/T332032#9011167 raw summary results], but much more detail and analysis will be published in the future.
For those interested in the results, please skip ahead to the [https://phabricator.wikimedia.org/T332032#9011167 raw summary results], but much more detail and analysis will be published in the future.


=== Challenging to find references in wikitext ===
=== Obstacles to finding references in wikitext ===
A raw reference is straightforward in wikitext and looks like: <code><nowiki><ref>This footnote.</ref></nowiki></code>.  If this were the end of the story, it would be simple to parse references.  What makes it more complicated is that many references are produced using reusable templates, for example: <code><nowiki>{{sfn|Hacker|Grimwood|2011|p=290}}</nowiki></code>.
A raw reference is straightforward in wikitext and looks like: <code><nowiki><ref>This footnote.</ref></nowiki></code>.  If this were the end of the story, it would be simple to parse references.  What makes it more complicated is that many references are produced using reusable templates, for example: <code><nowiki>{{sfn|Hacker|Grimwood|2011|p=290}}</nowiki></code>.