Wikipedia:Statistieken/Bestbezocht/December 2007

This week for the first time, Domas Mituzas has made visitor statistics available. In the last 5 days (120 hours), the following were the 300 most visited pages on the Dutch Wikipedia:

It would be interesting to know how these statistics can best be used. --LA2 15 dec 2007 01:47 (CET)

Those paid developers are hard at work again, by the looks of it. We didn't have anything like this at all yet. - Berkoet (formerly Dammit) 15 dec 2007 01:52 (CET)
You can say that, Berkoet, but it turns out Leon has just completely changed his wikicharts, so that it only half works. Statistics, by the way, do not (by definition) always give the same results. I also notice that those tools are practically impossible to find when you need them. The Main page of tools.wikimedia.de is inadequate, and so is the search function; tools are grouped in users' subpages, so you have to know which user a given tool comes from; and above all, most pages are not categorized and so just drift around loose somewhere. The only place where they are all listed together turns out to be this list by Interiot, which is sorted by date, so you have to go through all of it when you are looking for something specific. The search just cost me an hour and a half. Perhaps one of the regular users there could draw some attention to this. - Art Unbound 15 dec 2007 13:19 (CET)
Could this overview perhaps be placed in a collapsible template, and also in columns for a better overview? --VanBuren 15 dec 2007 14:01 (CET)
Done! I broke the list into 6 columns of 50 items each. But I doubt that this makes the information more comprehensible. The big problem is that this is the top 300 items from a list of 1 million entries (including special pages, images, talk pages, etc.). How do we make the full list useful? Consider these logarithmically distributed entries: 1st place Speciaal:Zoeken (1902929 page views), 2 Hoofdpagina (664182), 5 Speciaal:Volglijst (27329), 10 Wiki (14913), 20 Verenigde Staten (8043), 50 Amsterdam (4644), 100 Zuid-Afrika (3213), 200 HDTV (2287), 500 Obesitas (1552), 1000 Badr Hari (1086), 2000 Le Corbusier (736), 5000 Hematocriet (414), 10000 Antidiuretisch hormoon (248), 20000 Engelse buldog (141), 50000 Amstrad GX4000 (59), 100000 Afbeelding:ChakraDiag.jpg (27), 200000 Yvon Durelle (10), 500000 Altiphrynoides (3), 1000000th place Afbeelding:E-mail Otterlo.PNG (1 page view). --LA2 15 dec 2007 16:11 (CET)
Interesting to see that, apart from a few 'obvious' pages (search, main page, etc.), almost all of the most visited articles relate to news events (Ike Turner, Arne Jansen, etc.). Wikipedia is apparently seen as a good source of quick background information. Husky (talk) 15 dec 2007 16:45 (CET)
I guess everything below 10,000 should be discarded altogether to begin with. Nr. 4 (Arne Jansen) is a recent death but doesn't even show up on the Wikicharts list of December (= last 15 days), so low numbers are too untrustworthy. The most interesting question should be: what are the main ways for visitors to access Wikipedia? Speciaal:Zoekpagina is by far the most visited page, but you can only access it from within, of course. Hoofdpagina is second, meaning that lots of visitors start by entering wikipedia.nl right away and search from there (roughly, with each Main page visit 3 search ops are performed). The other main access is by external search. "Arne Jansen" may be accessed from outside, but nr. 54, Foekje Dillema (another recent death), will most probably be accessed from the Main page (too old for anyone to remember this athlete). Illnesses like Tuberculose (actuality) and Leukemie are most likely external search results. It's a lot of guesswork, but I should say most visitors turn to the Wikipedia Main page and start browsing from there. Now I wonder how Speciaal:Willekeurig can be third? Lots of these visits must be out of utter boredom ;) - Art Unbound 15 dec 2007 17:19 (CET)
Is this list actually correct? I don't see De Kroeg in it (but perhaps I need reading glasses by now). Peter boelens 15 dec 2007 18:05 (CET)

De Kroeg is at position 81 (use ctrl+f if needed), but after a test it turns out that Leon's tool still works fine. So I stand by my view that the core developers could better spend their time on more important things. - Berkoet (formerly Dammit) 15 dec 2007 18:10 (CET)

I guess that the vast majority find Wikipedia after a Google search. This is the sum of all page views for specific topics, such as Arne Jansen, Ike Turner, HDTV, Obesitas, Badr Hari, Le Corbusier and thousands of other pages. Then there are a few million page views from people who go to the Main page and start searching from there. Among all pages, the Main page (and Search) are the most popular, but the sum of the other pages is far greater. --LA2 15 dec 2007 18:25 (CET)
At no. 88: Afbeelding:Red pog.svg (3450). This probably means that quite a few visitors click on a red dot on a location map (where of course there is nothing to find but the image description). Off-topic: any idea how to prevent such useless clicks? Michiel1972 15 dec 2007 19:05 (CET)
Technically I have no idea whether it is possible, but it seems best to me if clicking the red dot simply takes you to the coordinates page (i.e. the same page you reach when you click the coordinates at the top right). Erwin1990 16 dec 2007 00:28 (CET)
LA2, this is what I meant. Maybe you could do the additions in an instant; I couldn't. I would seriously like to know which is greater: access from outside, or access from within. Your sum isn't conclusive there, as most of these topics will be found from within. What we really want to know is which of the two is true: most views come from outside, or from within. These stats don't give the answer, and not even a clue yet. - Art Unbound 16 dec 2007 00:37 (CET)
I don't think you can answer that question from the data we have here. You would need the full web server logs with referer URLs and client IP addresses. --LA2 16 dec 2007 03:29 (CET)

Day by day trend

Below is an attempt at finding the daily trends in visitor statistics. Each bullet starts with the page's rank on that day. In parentheses is an indicator of how far the rank advanced compared to the previous day, on a base-10 logarithmic scale: +2.0 means the old rank was a 10^2.0 = 100 times higher number. After the semicolon is the number of visits to the page on the given day. Only pages that moved by more than +/- 0.5 are listed, in order to filter out the "usual suspects". For each day, 30 entries are listed. --LA2 16 dec 2007 05:09 (CET)

11 dec 12 dec 13 dec 14 dec 15 dec
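The rank-advancement indicator described above can be sketched in a few lines of Python. This is only an illustration of the stated definition (the base-10 logarithm of old rank divided by new rank); the function names and the `threshold` parameter are assumptions made for this example, not the script that produced the lists.

```python
import math


def rank_advancement(old_rank: int, new_rank: int) -> float:
    """Return log10(old_rank / new_rank).

    A positive value means the page moved up the list; +2.0 means the
    old rank was a 100 times higher number than the new one.
    """
    if old_rank < 1 or new_rank < 1:
        raise ValueError("ranks start at 1")
    return math.log10(old_rank / new_rank)


def is_mover(old_rank: int, new_rank: int, threshold: float = 0.5) -> bool:
    """True if the page moved by more than +/- threshold (hypothetical helper)."""
    return abs(rank_advancement(old_rank, new_rank)) > threshold
```

With this definition, a page going from rank 1000 to rank 10 scores +2.0, while a page drifting from rank 100 to rank 90 stays well under the 0.5 cut-off and would be filtered out as one of the "usual suspects".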

First of all, big kudos to all people involved; reliable usage data is something we've been doing without for far too long. However, from a web analysis standpoint, this list, and for that matter also the list above, are not really interesting. They represent a far too small snapshot of the site usage, and only tell you something about current affairs in the language area of that specific Wikipedia. To really gauge page popularity, a much longer period is necessary, I'd say at least 4-6 months. This is also the reason why so many Special pages (Watchlist, Random etc.) show up: the regular users of the site, the editors themselves, are now overrepresented in your sample.

But even beyond page popularity, there is a much more interesting area of usability of the site, which is not a result of page hits, but rather of usage paths through the site. Is it in some way possible to keep session data of users, perhaps not all visitors but only a small sample of users, and see what the most used 3- or 4-step paths are? Again, keep in mind that only long sample periods will give you an accurate average. So I know it's a lot of fun to juggle around all these nice statistics, perhaps you can even come up with colorful graphs, but perhaps you can spend your precious time more effectively. Regards, Mhaesen 17 dec 2007 10:25 (CET)

How glad I am that I have never donated a cent to Wiki(m/p)edia, because I don't need to sponsor this kind of pointless list nonsense :). CaAl (talk) 17 dec 2007 11:51 (CET)
Currently all we have is an hourly summary of the page view counts. All I have done is to add these numbers up by day and week. So it's not possible to track navigation paths from this data set. And these data have been available only since December 9, 2007, so in four months' time we will have data for four months. Apart from the top lists above, the current data can be used as an indication (nb: not an exact science) of relative popularity among a set of articles. For example, how do articles about Maastricht compare? On Monday December 10, we had by rank 232 Maastricht (480 page views), 6117 Maastricht Aachen Airport (78), 6320 Verdrag van Maastricht (76), 8447 Flikken Maastricht (62), 9084 Sint-Servaasbasiliek (Maastricht) (58), 10396 Basiliek van Onze Lieve Vrouwe Tenhemelopneming (Maastricht) (53), 10713 Vrijthof (Maastricht) (51), 20455 Station Maastricht (30), 20486 Servatius van Maastricht (30), 26095 Universiteit Maastricht (24). It might come as a surprise to an encyclopedia editor that the airport is more popular than the treaty (verdrag), or that people interested in the airport (probably looking for timetables and driving directions) end up in an encyclopedia, rather than a travel website. This is an example of facts that can be extracted from the data, but not presented in a simple list. --LA2 17 dec 2007 14:18 (CET)
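Adding hourly counts up by day and then ranking the pages, as described above, could look roughly like the following Python sketch. The row layout `(day, hour, page, views)`, the function names and the sample rows are all assumptions made for illustration; the actual log format is not given in this discussion, and the numbers below are made up.

```python
from collections import Counter, defaultdict


def daily_totals(hourly_rows):
    """Sum hourly page view counts per day.

    hourly_rows: iterable of (day, hour, page, views) tuples
    (a hypothetical layout, not the real log format).
    Returns {day: Counter({page: total_views})}.
    """
    totals = defaultdict(Counter)
    for day, _hour, page, views in hourly_rows:
        totals[day][page] += views
    return totals


def ranked(counter):
    """Return [(rank, page, views)], most viewed first, ties broken by name."""
    ordered = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, page, views)
            for rank, (page, views) in enumerate(ordered, start=1)]


# Made-up sample rows, for illustration only.
sample_rows = [
    ("2007-12-10", 0, "Example_A", 12),
    ("2007-12-10", 1, "Example_A", 8),
    ("2007-12-10", 0, "Example_B", 30),
    ("2007-12-11", 0, "Example_A", 5),
]

per_day = daily_totals(sample_rows)
monday_ranking = ranked(per_day["2007-12-10"])
```

Comparing a set of related articles, as in the Maastricht example, then amounts to filtering one day's ranking for the page titles of interest.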

Some preliminary analysis conclusions

When you plot the numbers in a graph with a logarithmic Y axis, thereby skipping the top 10, the result is more or less a straight line for the second half of the plot. This means that you can use e.g. the last 100 entries (i.e. rank 200 and up) for a reasonably safe extrapolation. With those extrapolated numbers, it is possible to calculate the surface area below the entire graph, resulting in total page views. The result is:

  • A cross-over point (i.e. 1 page view) somewhere between rank 4,000 and 5,000, implying that only about 1% of the (380k) articles are viewed regularly (more than once per week).
  • A total of some 4.5 million page views in 5 working days. After correcting for somewhat lower traffic during weekends, that results in, say, 5.5 million per week. That represents 0.35% of the page views for all projects in all languages (= 7 billion a month).

A few other interesting conclusions:

  • About half of the visitors of the main page get there via one of the main portals wikipedia.nl or wikipedia.be. The other half apparently remembers the somewhat complicated url and/or has wikipedia in a favourite list or similar.
  • The next logical step for somebody arriving at the main page is to use the search function in the main menu. The three times higher rate for special:search confirms my opinion that Wikipedia's search function is not very well adapted to the behaviour of the average visitor. By the way, this is a conclusion I drew long ago; see also [1]. - Rgds RonaldB 17 dec 2007 18:19 (CET)
I'm not sure exactly how you calculated this. For the full week December 10–16 (Mon–Sun), the logfiles for nl.wp count 25.29 million page views of 1.17 million distinct URLs. These page views include Special:Search/Xyz and talk pages, so the number of URLs is not limited to the 380k articles. However, things like w/index.php?action=edit are not included, only URLs starting with /wiki/. For individual days, the number of page views varies between 2.69 million (Friday) and 4.11 million (Monday). The number of distinct URLs in a given day varies between 444 thousand (Friday) and 513 thousand (Wednesday). --LA2 18 dec 2007 02:54 (CET)

This is a sample of (part of) the Excel sheet:

Rank   Page                    Count          Cum. count
1      Speciaal:Zoeken         1902929        1902929
2      Hoofdpagina             664182         2567111
3      Speciaal:Willekeurig    79729          2646840
etc.
300    Marketingmix            1898           3755778
400    (extrapolated)          1585.27941     3914305.941
500    (extrapolated)          1320.446022    4046350.543
1000   (extrapolated)          529.4132666    4387728.625
2000   (extrapolated)          85.10266404    4472831.289
3000   (extrapolated)          13.68016989    4486511.459
4000   (extrapolated)          2.199073909    4488710.533
5000   (extrapolated)          0.353498977    4489064.032

The formula for the "Cum. count" column is obvious (a running total). Beyond rank 300 I applied the following formula (this one is for the Excel row that holds rank 400):

=GROWTH(C$200:C$300;A$200:A$300;A301).

This computes the value for x = 400 based on the values for ranks 199-300, assuming that part follows an exponential function. In the last column you can see the value creeping asymptotically towards some 4.5 million.
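For readers without a spreadsheet at hand, the same kind of extrapolation can be sketched in plain Python: fit a straight line to the logarithm of the counts and evaluate the resulting exponential at a new rank, which is in the spirit of what the GROWTH formula does. The sketch below is an illustration under stated assumptions, not the original sheet. It uses only two anchor points quoted on this page (rank 200 with 2287 views and rank 300 with 1898 views) and fills the ranks in between with a perfectly exponential curve, so its results will differ somewhat from the sheet, which was fitted to the real counts.

```python
import math


def fit_exponential(xs, ys):
    """Least-squares fit of ln(y) = ln(b) + x * ln(m); returns (b, m)."""
    n = len(xs)
    logs = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (lg - mean_log) for x, lg in zip(xs, logs))
    slope = sxy / sxx
    intercept = mean_log - slope * mean_x
    return math.exp(intercept), math.exp(slope)


def growth(known_ys, known_xs, new_x):
    """Extrapolate y = b * m**x at new_x from the fitted exponential."""
    b, m = fit_exponential(known_xs, known_ys)
    return b * m ** new_x


def crossover_rank(known_ys, known_xs, level=1.0):
    """Rank at which the fitted curve drops to `level` page views."""
    b, m = fit_exponential(known_xs, known_ys)
    return math.log(level / b) / math.log(m)


# Synthetic stand-in for ranks 200..300: an exact exponential through the
# two quoted points (rank 200: 2287 views, rank 300: 1898 views).
ranks = list(range(200, 301))
ratio = (1898 / 2287) ** (1 / 100)
counts = [2287 * ratio ** (r - 200) for r in ranks]

views_at_400 = growth(counts, ranks, 400)
one_view_rank = crossover_rank(counts, ranks)
```

On this synthetic curve the point where the estimate drops to a single page view falls between rank 4,000 and 5,000, which matches the cross-over range mentioned earlier in this section.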

You could argue that the extrapolation is sensitive to my assumption. In the worst case there are very many articles with just one page view. That would mean a maximum of a bit less than 380k hits with a single page view, turning the 4.5 million into 5 million.

The difference could be explained if my assumption that the values were given for 120 hours is wrong, and they were instead measured over 120 hours but calculated back to 24 hours. That would explain the difference of about a factor 5. It would also mean that the share of nl:w in total page views goes up to some 1.75%, which is closer to my gut feeling. - Rgds RonaldB 18 dec 2007 03:48 (CET)
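The share figures in this exchange can be checked with a little arithmetic. The Python sketch below uses only numbers quoted on this page (5.5 million estimated weekly views, 25.29 million logged weekly views, 7 billion monthly views for all projects); converting weeks to months with 52/12 is an assumption made here, so the results are rough indications rather than exact figures.

```python
WEEKS_PER_MONTH = 52 / 12  # assumption: average number of weeks in a month


def monthly_share_percent(weekly_views, all_projects_monthly=7e9):
    """Share (in %) of all-project monthly page views, from a weekly count."""
    return 100 * weekly_views * WEEKS_PER_MONTH / all_projects_monthly


estimated_share = monthly_share_percent(5.5e6)   # from the extrapolated estimate
logged_share = monthly_share_percent(25.29e6)    # from the logfile count
ratio = 25.29e6 / 5.5e6                          # logged versus estimated weekly views
```

Under these assumptions the estimate comes out a little under 0.35%, the logged count at roughly 1.5 to 1.6%, and the two weekly totals differ by a factor between 4.5 and 5, which is consistent with the "about a factor 5" mentioned above.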