Thursday, January 10, 2013

A perl script for gathering blogger stats and showing them on a terminal

I decided to experiment and write this post in English. I recently uploaded a small perl script, bloggerstats, to GitHub (see here). The script helps to gather statistics for a blog hosted on blogger.com. The stats can be printed on a terminal or redirected to a file. Pretty charts in PNG format can be created as well. See the images below.


The usage of the script is shown with the -h or --help option. The script requires several mandatory and optional perl modules: JSON, URI::Escape, Encode, Data::Dumper (for dumping received JSON data in debug mode), Text::TabularView (for printing results in pretty tables), Chart::Bars and Time::Local (for making charts). All modules are either normally shipped with perl distributions or available in the system repository (I found all of them in the standard Fedora 17 rpm repository). The script also requires several external programs: sed, curl (for network communication with blogger.com) and ngrep (for sniffing sensitive data required by blogger.com, see below).
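
Before the first run it may be handy to check that everything listed above is installed. The following is a minimal shell sketch, not part of bloggerstats: the helper names missing_programs and missing_perl_modules are made up for illustration, and the program and module lists in the usage comment are copied from the paragraph above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of bloggerstats.

# Print every program from the argument list that is not found in PATH.
missing_programs() {
    for prog in "$@"; do
        command -v "$prog" >/dev/null 2>&1 || echo "$prog"
    done
}

# Print every perl module from the argument list that cannot be loaded
# (all of them are reported if perl itself is not installed).
missing_perl_modules() {
    for mod in "$@"; do
        perl -M"$mod" -e 1 >/dev/null 2>&1 || echo "$mod"
    done
}

# Usage:
#   missing_programs perl sed curl ngrep
#   missing_perl_modules JSON URI::Escape Encode Data::Dumper \
#       Text::TabularView Chart::Bars Time::Local
```

Both helpers print nothing when everything is found. On some systems ngrep lives in /usr/sbin, which may not be in an ordinary user's PATH, so a report that ngrep is missing deserves a second look.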

To start using it, one should provide basic configuration settings in the file $HOME/.bloggerstatrc. I put a template file, bloggerstatrc.tmpl, in the GitHub repository as a starting point. Here are its contents:
$blogID      = 'put-here-your-blog-id';
$bloghost    = 'blogger.com';

$statsurl    = "http://www.$bloghost/blogger_rpc?blogID=$blogID";

$start_year  = 2010;
$start_month = 5;       # months start from 1, so January is 1

# AUTOGENERATED (do not delete this line!)
In fact, .bloggerstatrc is an ordinary perl source file that bloggerstats loads, so you can put any statements there that perl can compile, but the 5 variables defined in this template are mandatory. Of course, you have to replace put-here-your-blog-id with your real blog ID (this is the value of the blogID parameter in any related HTTP GET request to blogger.com: you can find it in your browser's address bar). You also have to set proper values for $start_year and $start_month: they must correspond to the first bin of the all-time visits chart of your blog. The comment line starting with AUTOGENERATED must stay below your regular settings: the sniffer uses it to update other sensitive variables (cookies, headers and the xsrf token), which it writes below that line. Every time you run bloggerstats -w, all content below this line is removed!
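
The role of the AUTOGENERATED marker can be illustrated with a short shell sketch. It is only an assumption about how such a marker may be processed, not the code that bloggerstats actually runs, and the function name keep_regular_settings is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not taken from bloggerstats.
# Print the configuration file up to and including the AUTOGENERATED marker
# line; everything below the marker is dropped.
keep_regular_settings() {
    # sed prints each line as it reads it; on the first line matching the
    # marker, the q command prints that line too and then quits.
    sed '/^# AUTOGENERATED/q' "$1"
}

# Usage (preview which lines would survive a new sniffing run):
#   keep_regular_settings "$HOME/.bloggerstatrc"
```

Any setting placed below the marker is absent from the output, which is why your own settings have to go above it.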

Why do I mention a sniffer here? Unfortunately, the blogger.com API seems to be closed at the moment. The only thing known is that it uses GWT (Google Web Toolkit) and JSON for sending data to the client. GWT produces very large obfuscated Javascript code which eventually creates an xsrf token and sends it to the server. The algorithm that creates the token is unknown. A generated token is valid for approximately 24 hours. Besides the token, blogger.com checks sessionID cookies (which can also become invalid, but not as often as the token) and GWT-related HTTP headers. All these data are collected by the sniffer and put below the comment line starting with AUTOGENERATED in .bloggerstatsrc. Since sniffing network interfaces requires root privileges, the script must be able to obtain them. To achieve this, log in as root, open a new sudoers file (say /etc/sudoers.d/users) with visudo and put the following lines there:
Cmnd_Alias       NETTASKS = /usr/sbin/ngrep
<your-login-id>  ALL = NOPASSWD: NETTASKS
where <your-login-id> is your system login name. The word NOPASSWD in the second line is important: it tells sudo not to ask for a password when starting ngrep.
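
Whether the rule took effect can be checked from your ordinary account. This is a minimal sketch, assuming ngrep is installed as /usr/sbin/ngrep as in the rule above. The function name and the SUDO override variable are hypothetical and exist only to make the check easy to try out. The result also depends on the sudoers defaults of your system, so treat it as a quick sanity check rather than a proof.

```shell
#!/bin/sh
# Hypothetical helper, not part of bloggerstats.
# Succeeds if sudo, without prompting for a password, reports the given
# command (default: /usr/sbin/ngrep) as permitted for the current user.
#   -n  never prompt for a password, fail instead
#   -l  with a command argument: only check whether it is allowed
can_sudo_without_prompt() {
    ${SUDO:-sudo} -n -l "${1:-/usr/sbin/ngrep}" >/dev/null 2>&1
}

# Usage:
#   can_sudo_without_prompt && echo "the ngrep rule is active"
```

As root, visudo -c additionally checks the sudoers files for syntax errors.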

Now that you have configured the basic parameters in .bloggerstatsrc, launch the sniffer by running bloggerstats -w in a terminal (it will wait until a stats request is sent to blogger.com from a browser), then open your blogger.com stats page in a browser or just refresh it. The sniffer should then exit (though it may fail to exit under very heavy network load: refresh the stats page several times in this case), and bloggerstats is ready to gather blogger statistics. When the saved xsrf token or cookies become invalid (in a day or so) you will see errors; running the sniffer once again will bring bloggerstats back to life.
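
Since the token lives for about a day, a wrapper can decide when the sniffer is due by looking at the age of the configuration file. This is a sketch under assumptions: it treats the file's modification time as the time of the last sniffing run (editing the file by hand resets it), the 24-hour limit is the approximate figure given above, and the function name rc_is_stale is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of bloggerstats.
# Succeeds if the given file is missing or was last modified more than
# 24 hours (1440 minutes) ago. Note that -mmin is a GNU/BSD find
# extension, not plain POSIX.
rc_is_stale() {
    if [ ! -f "$1" ]; then
        return 0
    fi
    # find prints the path only when the age test matches.
    [ -n "$(find "$1" -mmin +1440 2>/dev/null)" ]
}

# Usage, with rc set to the path of your configuration file:
#   rc_is_stale "$rc" && echo "run 'bloggerstats -w' and refresh the stats page"
```

The check is only a hint: cookies may expire on a different schedule, so errors from bloggerstats remain the reliable signal that the sniffer has to be run again.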

Finally, I want to show the settings for tabular highlighting as seen in the first image above (look here for details on hl and Term::Highlight). File .hlrc:
snippet bstats  -b -82 '^(?:\+-+)+\+$' '^\|\s+' '\s+\|$' \
                '\s+\|\s+(?=.*\s+\|$)'-67 \
                '(?<=^\|)\s+(?:Overview|Page|Keyword|Site|URL|Country)\s+' \
                '(?<=^\|)\s+(?:Browser|OS)\s+' -215 '(?<=\|)\s+\d+\s+(?=\|$)' \
                -rb -48 '(?<=^\|)\s+Today\s+(?=\|)'
File .hl_functions:
function bloggerstats
{
    # "env which" finds the real executable, bypassing this function
    "$(env which bloggerstats)" "$@" | hl -sbstats
}
Vielleicht schreibe ich nächstes Mal auf Deutsch :) (Maybe next time I will write in German.)
