Thursday, February 22, 2007

A Simple Model to Estimate Daily Visits to a Web Site

yeeyan.com, a translation 2.0 web site, was launched (actually re-launched, since we had previously run a blog at the same URL) around the middle of December 2006. A month later, on January 21, 2007, I developed the following model to help our team forecast the growth in daily visits to our web site.


List of Symbols:

V(t): number of daily visitors at time t
Vnew(t): number of daily new visitors at time t
Vold(t): population size of old (sustained) visitors at time t
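p: the probability that an old visitor visits the site on a given day
rnew: drop rate of new visitors (the fraction of a day's new visitors who do not join the pool of old visitors)
rold: drop rate of old visitors (the fraction of the old-visitor pool lost each day)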

Note that a new visitor is defined as one who has not visited the web site within a certain period, for example, the past month.

The model consists of two equations:

(1) V(t) = Vnew(t) + p * Vold(t)

(2) Vold(t) = (1-rold)*Vold(t-1) + (1-rnew)*Vnew(t-1)
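The two equations can be iterated directly, one day at a time. Below is a minimal Python sketch, assuming Vnew, p, rnew and rold are held constant (as Discussion point (2) suggests) and that the old-visitor pool starts empty at launch. The function name `simulate` and the parameter values are hypothetical, not figures from yeeyan.com.

```python
def simulate(v_new, p, r_new, r_old, days, v_old0=0.0):
    """Iterate equations (1) and (2) with constant parameters (an assumption).

    Returns the list of daily visits V(t) for t = 0 .. days-1.
    """
    v_old = v_old0  # assumed initial old-visitor pool (empty at launch)
    visits = []
    for _ in range(days):
        # Equation (1): today's visits = new visitors + returning old visitors
        visits.append(v_new + p * v_old)
        # Equation (2): old visitors who stay + new visitors who come back
        v_old = (1 - r_old) * v_old + (1 - r_new) * v_new
    return visits


if __name__ == "__main__":
    # Hypothetical parameters for illustration only
    v = simulate(v_new=500, p=0.3, r_new=0.8, r_old=0.05, days=60)
    print(round(v[0]), round(v[-1]))
```

Starting from an empty pool, V(0) equals Vnew and then climbs toward the steady-state value as the old-visitor pool fills up.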

In the steady state (all quantities constant over time), equation (2) gives:

Vnew*(1-rnew) = Vold*rold => Vold = Vnew*(1-rnew)/rold
Substituting into equation (1): V = [1+p*(1-rnew)/rold]*Vnew
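As a sanity check on the steady-state formula, the sketch below computes V from the closed form and compares it with the value reached by iterating equations (1) and (2) for a long time. It assumes constant parameters; the function names and the numbers used are hypothetical.

```python
def steady_state(v_new, p, r_new, r_old):
    """Closed form: Vold = Vnew*(1-rnew)/rold, then V = Vnew + p*Vold."""
    v_old = v_new * (1 - r_new) / r_old
    return v_new + p * v_old


def iterate(v_new, p, r_new, r_old, days):
    """Run equation (2) from an empty old-visitor pool, then apply equation (1)."""
    v_old = 0.0
    for _ in range(days):
        v_old = (1 - r_old) * v_old + (1 - r_new) * v_new
    return v_new + p * v_old


if __name__ == "__main__":
    # Hypothetical: 500 new visitors/day, p = 0.3,
    # 80% of new visitors never return, 5% of old visitors lost per day.
    print(steady_state(500, 0.3, 0.8, 0.05))   # approximately 1100
    print(iterate(500, 0.3, 0.8, 0.05, 1000))  # approximately 1100
```

With these numbers the old-visitor pool settles near 2000, so returning visitors contribute about 600 of the roughly 1100 daily visits.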

Static Analysis:

V is proportional to Vnew, which represents how effectively the web site is promoted to potential users.

V also increases linearly with p, which reflects how often the content of the web site is updated.

V is roughly inversely proportional to rold (strictly, the returning-visitor term p*(1-rnew)*Vnew/rold is), which reflects how "sticky" the web site is: the smaller rold, the stickier the site. rold is often seen as the most critical factor, and some people believe it should be the only factor of concern. In theory, if rold were zero, V would grow without bound.
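To make the static analysis concrete, the sketch below evaluates the steady-state formula while varying only rold. The parameter values and the function name `steady_visits` are hypothetical; the point is the shape of the curve, not the numbers.

```python
def steady_visits(v_new, p, r_new, r_old):
    # V = [1 + p*(1-rnew)/rold] * Vnew; only the second term depends on rold
    return (1 + p * (1 - r_new) / r_old) * v_new


if __name__ == "__main__":
    # Hypothetical parameters; only rold varies
    for r_old in (0.2, 0.1, 0.05, 0.02, 0.01):
        print(f"rold={r_old:<5} V={steady_visits(500, 0.3, 0.8, r_old):8.1f}")
```

The Vnew term stays fixed at 500, while the returning-visitor term doubles every time rold is halved (150, 300, 600, ...), which is why stickiness dominates once the site has a loyal audience.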

Dynamic Analysis:

p plays no role in the system's dynamics. It does not appear in the recursive equation (2), so any change in p passes directly into V with no transient.
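To see that p only scales the returning-visitor term, the hypothetical sketch below doubles p on day 30 (say, after a burst of content updates) and compares the result with a run in which p stays constant. The function `run`, the day-30 step and all parameter values are illustrative assumptions.

```python
def run(v_new, p_of_t, r_new, r_old, days):
    """Iterate equations (1)-(2); p_of_t(t) gives p on day t. Returns (V, Vold)."""
    v_old = 0.0
    vs, olds = [], []
    for t in range(days):
        olds.append(v_old)
        vs.append(v_new + p_of_t(t) * v_old)               # equation (1)
        v_old = (1 - r_old) * v_old + (1 - r_new) * v_new  # equation (2), no p
    return vs, olds


if __name__ == "__main__":
    base_v, base_old = run(500, lambda t: 0.3, 0.8, 0.05, 60)
    step_v, step_old = run(500, lambda t: 0.3 if t < 30 else 0.6, 0.8, 0.05, 60)
    print(base_old == step_old)        # Vold trajectory is unaffected by p
    print(base_v[29] == step_v[29])    # identical before the change
    print(step_v[30] - base_v[30])     # jump of 0.3 * Vold(30) on day 30
```

The Vold trajectories are identical, and V jumps by exactly 0.3*Vold on the day p changes, with no gradual approach.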

rold, again, is the major factor affecting the system dynamics. Anyone familiar with control theory and the Z-transform will recognize that (1-rold) is the only pole of the system: with constant parameters, the gap between Vold and its steady-state value shrinks by a factor of (1-rold) every day. As rold approaches zero, the pole approaches 1, and the system takes longer to reach its steady state.
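Because the gap to the steady state is multiplied by the pole (1-rold) each day, the settling time can be computed directly. The sketch below finds the number of days until that gap falls below 5% of its initial size; the 5% tolerance and the function name `days_to_settle` are assumptions chosen for illustration.

```python
import math


def days_to_settle(r_old, tol=0.05):
    """Smallest t with (1 - rold)**t <= tol, assuming constant parameters.

    The gap in Vold (and hence in V) shrinks by the pole (1 - rold) per day.
    """
    return math.ceil(math.log(tol) / math.log(1 - r_old))


if __name__ == "__main__":
    for r_old in (0.2, 0.1, 0.05, 0.02, 0.01):
        print(f"rold={r_old:<5} pole={1 - r_old:.2f} days: {days_to_settle(r_old)}")
```

As a rule of thumb, reaching 95% of the steady state takes about 3/rold days, so a site that loses 5% of its regulars per day needs roughly two months to settle, while one that loses only 1% needs almost ten.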

Discussion:

(1) All parameters can be obtained by using Google Analytics;

(2) Although all parameters vary over time, they change slowly within a given period and can therefore be treated as constants;

(3) The estimate / forecast is valid only in the short or intermediate term. I have been tracking the daily visits to our web site, and so far they have been very close to my estimates.