Wikipedia:Database queries

related topics
{math, number, function}
{system, computer, user}
{style, bgcolor, rowspan}
{work, book, publish}

You can download the database from download.wikipedia.org and use MySQL to query.

Asking developers to run queries on the live database

Some developers have the ability to run queries on the live database. You may request they do this at tswiki:Query service.

Users with both technical access to the database and the willingness to perform queries may be found by viewing the list of query-service users.

Using an external tool

It was once possible to run an sql query tool under http://www.wikisign.org, which supported Wikipedias of major languages etc. Currently inactive.

Notes

  • When copying a query from this page, some browsers copy a '*' in front of the SELECT statement. This has to be removed!
  • Most of these queries return article titles. These do not contain the namespace! The namespaces have numeric codes in the separate cur_namespace field. 0 is the regular article namespace, others can be found on Wikipedia:Namespace

To avoid time-consuming or malformed queries, here are some useful suggestions (please add more - some requests might be found on this page's talk page):

Note: to convert the returned list in mysql to wiki-format use: s/(\ |\|)//g;s/^/*\[\[/;s/$/\]\]/

Quick instructions for running your own MySQL queries

This entire process can be completed in ~1 hour on a broadband connection. <screenshots coming soon>

Example: (sample output)

mysql> SELECT c.cur_title FROM wikipedia.cur c
    -> WHERE c.cur_text LIKE "%directory.google.com%";
  • Also consider installing the MySQL Query Browser, which is a very pretty graphical user interface for MySQL queries. To use it, you start the MySQL server on the command line, and then run this program on top of it. Of the options it gives you, you need only to type in 'localhost' for the server address and push enter. (You will likely need to restart if installing both the MySQL server and query browser at the same time.)<screenshots coming soon>

[edit] Modifiers

To limit results:

"SELECT ... LIMIT 20".

You can also add an offset;

"SELECT ... LIMIT 100, 20" will give you 20 records starting at the 101st.

[edit] Queries to find new users

Here's a couple of queries to find the 20 most newly created users - useful to find people to give welcome messages. It only counts users that have made edits (most users who create a login never make a single contribution and are just passing by, so it's not really worth welcoming them).

  • SELECT user_name, COUNT(*) FROM user, cur WHERE user_id=cur_user GROUP BY user_id ORDER BY user_id DESC LIMIT 20

The above query gives the user names of the 20 most recently created users and the number of edits they have made. The query is not perfect - only edits that have not been 'overwritten' by other edits are counted here. To do the same thing looking only at 'overwritten' edits use:

  • SELECT user_name, COUNT(*) FROM user, old WHERE user_id=old_user GROUP BY user_id ORDER BY user_id DESC LIMIT 20

[edit] Queries to find stub articles

For short pages containing "see" (takes about 2 seconds):

  • SELECT cur_title FROM cur WHERE cur_is_redirect=0 AND LENGTH(cur_text)<50 AND cur_text LIKE '%see%'

finds all pages with less than 50 chars.

[edit] Queries to find articles with one edit (new)

SELECT cur_title FROM cur WHERE cur_is_new AND cur_namespace=0;

Finds all pages that have only been edited once.

[edit] Queries to find inappropriate links

[edit] Double Redirects

Article namespace:

  • SELECT ca.cur_namespace, ca.cur_title FROM cur ca, cur cb, links l WHERE ca.cur_is_redirect=1 AND cb.cur_is_redirect=1 AND l.l_from=ca.cur_title AND l.l_to=cb.cur_id AND ca.cur_namespace=0 LIMIT 250

Talk namespace:

  • SELECT ca.cur_namespace, ca.cur_title, FROM cur ca, cur cb, links l WHERE ca.cur_is_redirect=1 AND cb.cur_is_redirect=1 AND l.l_from=CONCAT("Talk:",ca.cur_title) AND l.l_to=cb.cur_id AND ca.cur_namespace=1 LIMIT 250

These queries are slow. You may have to reduce the limit for them to work.

Full article ▸

related documents
Wikipedia:Browser notes
Client-side scripting
Denormalization
Initialization vector
Serial number
Code coverage
World file
Oberon programming language
Occam (programming language)
True BASIC
SNOBOL
AutoLISP
Bourne shell
Pliant
Curl (programming language)
Tiny BASIC
Computer algebra system
Server-side scripting
Wikipedia:MediaWiki FAQ
Wikipedia:Free On-line Dictionary of Computing/I - K
Asterisk
Java API for XML Processing
Wikipedia:Federal Standard 1037C terms/computer programming terms
Accumulator (computing)
PureBasic
Java applet
Rich Text Format
Erlang unit
Flyweight pattern
Dynamic HTML