Elixir/HTML dump scraper
Readers interested in the results can skip ahead to the [https://phabricator.wikimedia.org/T332032#9011167 raw summary results]; much more detail and analysis will be published in the future.
=== Obstacles to finding references in wikitext ===
A raw reference is straightforward in wikitext and looks like: <code><nowiki><ref>This footnote.</ref></nowiki></code>. If this were the end of the story, it would be simple to parse references. What makes it more complicated is that many references are produced using reusable templates, for example: <code><nowiki>{{sfn|Hacker|Grimwood|2011|p=290}}</nowiki></code>.
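
To make the obstacle concrete, here is a minimal sketch of a naive counter that looks only for literal <code><nowiki><ref></nowiki></code> tags. It is written in Python purely for illustration and is not taken from the scraper itself; the pattern <code>RAW_REF</code>, the function <code>count_raw_refs</code>, and the sample text are all hypothetical names and data. The sample contains one raw reference and one template-produced reference, and the naive counter sees only the first.

```python
import re

# Hypothetical pattern: matches literal <ref>...</ref> pairs and
# self-closing <ref ... /> tags, and nothing else.
RAW_REF = re.compile(r"<ref\b[^>/]*(?:/>|>.*?</ref>)", re.DOTALL | re.IGNORECASE)


def count_raw_refs(wikitext: str) -> int:
    """Count references written directly as <ref> tags in the wikitext."""
    return len(RAW_REF.findall(wikitext))


# Illustrative sample: one raw footnote plus one footnote made by a template.
sample = (
    "First claim.<ref>This footnote.</ref> "
    "Second claim.{{sfn|Hacker|Grimwood|2011|p=290}}"
)

# The template-produced footnote never appears as a <ref> tag in the
# wikitext, so this purely textual count misses it.
raw_count = count_raw_refs(sample)
print(raw_count)
```

Extending the pattern to recognise specific template names would cover this one case, but only for templates known in advance.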