Posts Tagged Piwik

Piwik 1.9.2 does not honor DoNotTrack by default — Blame IE10?

Today Piwik 1.9.2 was released, and the first point of the release notes contains a bold move on the part of the Piwik development team: To NOT honor DoNotTrack by default for IE10. Why? Because IE10 has DoNotTrack enabled by default.

This is arguably a step in the wrong direction for the Piwik team, and for DoNotTrack in general, as Piwik is ignoring the setting just for the sake of collecting analytics data. It’s actually a step backwards from Piwik’s own 1.9.1 release, in which honoring DoNotTrack was enabled by default and users were advised not to change it.

Instead, what Piwik should have done is simple: allow the user to choose at install or configuration time whether or not to honor the setting, with a clear explanation of what each choice means, such as the following example text:

Respect DoNotTrack Preference?

We encourage you to respect your visitors’ privacy choices by honoring the DoNotTrack browser setting. Please be aware, though, that IE10 enables it by default, so some traffic may not be recorded.

How do you want to handle this?


[X] Respect DoNotTrack in all situations (recommended)
[ ] Respect DoNotTrack for all browsers except IE10 (for more accurate stats)
[ ] Do not respect DoNotTrack at all (not recommended)
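To make the three choices concrete, here is a rough sketch of the decision logic in shell. This is not Piwik's code; the policy names (respect_all, respect_except_ie10, ignore) and the function name are made up for illustration. It assumes the browser sends the header "DNT: 1" when DoNotTrack is on, and that IE10 identifies itself with "MSIE 10.0" in its user agent string.

```shell
#!/bin/sh
# Hypothetical sketch: decide whether a request should be tracked.
# $1 = policy (illustrative names), $2 = value of the DNT header, $3 = user agent
# Returns 0 (track), 1 (do not track), or 2 (unknown policy).
should_track() {
    policy=$1
    dnt=$2
    ua=$3
    case $policy in
        respect_all)
            # Track only when the visitor has not sent DNT: 1
            [ "$dnt" != "1" ] ;;
        respect_except_ie10)
            # Ignore the header for IE10 only, since it is on by default there
            case $ua in
                *"MSIE 10.0"*) return 0 ;;
            esac
            [ "$dnt" != "1" ] ;;
        ignore)
            return 0 ;;
        *)
            echo "unknown policy: $policy" >&2
            return 2 ;;
    esac
}
```

For example, should_track respect_all 1 "Mozilla/5.0" returns 1, so the visit would not be recorded.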

I have opened a ticket against Piwik here: http://dev.piwik.org/trac/ticket/3531#ticket

This is just my thought. What are yours? Please feel free to share in the comments below.


How to automatically purge Piwik logs using a cron job

If you’re using Piwik Analytics, you may know that your Piwik database will continue to grow over time. The Piwik log tables, as well as the archive tables, will keep growing until you purge them.

The Piwik developers are aware of this and a ticket has been in their system for a while now to develop a way to automatically purge the Piwik logs. The FAQ describes a SQL query you can run to purge the Piwik logs manually, but you may want a way to do this automatically. Fortunately, here’s the solution, with a few caveats.

UPDATE: As of Piwik 1.5, the following method is deprecated. Piwik now has log purging implemented. See Piwik FAQ #42.

First, before proceeding, check the FAQ link for the correct SQL query to run. The SQL query has changed at least once that I’m aware of. The guide below reflects the query as it stood on the day this article was posted.

Second, this method uses a script which will contain a SQL password, so it does have some security implications. You can avoid these by making sure the script has the correct permissions. This guide will assume that your SQL user’s name is ‘piwik’ and you are running the script as unix user ‘www-data’. Please adjust for your individual configuration.

Third, and lastly, it’s suggested to run this query monthly. You will want to make sure your logs have been archived for the preceding month. In most normal installations this is the case. The only exception is if you are archiving using a cron job and that cron job has not been running.

So here goes:

Step one:

Find a place where your cron user will have access to the SQL file. A suggested place is the misc/cron directory within your Piwik installation, right next to the archive.sh file. Let’s call this file purge.sql. It doesn’t need to be kept out of web-accessible directories, as it won’t have any sensitive information in it, just the SQL query. Create this file and paste the following SQL query into it:

DELETE piwik_log_visit, piwik_log_link_visit_action
FROM piwik_log_visit INNER JOIN piwik_log_link_visit_action
WHERE piwik_log_visit.idvisit = piwik_log_link_visit_action.idvisit
AND visit_first_action_time <= DATE_SUB(CURDATE(), INTERVAL 30 DAY);
OPTIMIZE TABLE piwik_log_visit, piwik_log_link_visit_action;
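If your tables use a different prefix, or you want to keep more than 30 days of logs, a small shell function can generate purge.sql for you. This is only a sketch built around the query above; the function name and its two parameters are my own invention, and you should still compare the output against the query in the FAQ.

```shell
#!/bin/sh
# Hypothetical helper: print the purge query for a given table prefix
# and retention period.
# $1 = table prefix (defaults to piwik_), $2 = days to keep (defaults to 30)
make_purge_sql() {
    prefix=${1:-piwik_}
    days=${2:-30}
    # Refuse anything that is not a whole number, so nothing odd ends up in the SQL
    case $days in
        ''|*[!0-9]*)
            echo "days must be a whole number" >&2
            return 1 ;;
    esac
    cat <<EOF
DELETE ${prefix}log_visit, ${prefix}log_link_visit_action
FROM ${prefix}log_visit INNER JOIN ${prefix}log_link_visit_action
WHERE ${prefix}log_visit.idvisit = ${prefix}log_link_visit_action.idvisit
AND visit_first_action_time <= DATE_SUB(CURDATE(), INTERVAL ${days} DAY);
OPTIMIZE TABLE ${prefix}log_visit, ${prefix}log_link_visit_action;
EOF
}
```

Running make_purge_sql piwik_ 30 > purge.sql should reproduce the file described in this step.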

Step two:

Create the script which will call the SQL query. This script will contain your SQL password, so it’s a good idea to make sure it is mode 0700 and outside a web-accessible directory. We will call this purge.sh. Assuming your database is called piwik, your SQL username is piwik, and your password is piwik123, create the file with this command:

mysql piwik -upiwik -p'piwik123' < /var/www/piwik/misc/cron/purge.sql

For your reference, the general form of the mysql command is:

mysql DATABASE -uUSER -p'PASSWORD' < mysql.sql
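A password given on the command line may be visible to other users on the same machine through the process list. One way around that, sketched here as an alternative and not as part of the original method, is to put the credentials in a MySQL option file that only the cron user can read and point mysql at it with --defaults-extra-file. The function names are illustrative.

```shell
#!/bin/sh
# Sketch of a purge.sh variant that keeps the password off the command line.
# Function names and any paths you pass in are illustrative assumptions.

# Write a MySQL option file readable only by its owner.
# $1 = option file path, $2 = SQL user, $3 = SQL password
# Note: passwords containing characters such as # may need quoting in the file.
write_option_file() {
    (
        umask 077
        printf '[client]\nuser=%s\npassword=%s\n' "$2" "$3" > "$1"
    )
    chmod 600 "$1"
}

# Run the purge query. --defaults-extra-file has to be the first option.
# $1 = option file path, $2 = database name, $3 = SQL file path
run_purge() {
    mysql --defaults-extra-file="$1" "$2" < "$3"
}
```

You would call write_option_file once by hand, then have purge.sh call something like run_purge /path/to/piwik.cnf piwik /var/www/piwik/misc/cron/purge.sql, where /path/to/piwik.cnf is a placeholder for wherever you keep the option file.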

Step three:

Time to schedule the cron job. You can either edit the crontab yourself from the terminal ( crontab -e ), or use your favorite web-GUI scheduler such as cPanel, Webmin, etc.

For the terminal users, you’ll want the crontab line to read:

1 0 1 * *     sh /path/to/purge.sh

This will schedule the purge.sh script to run at 12:01am on the first day of the month.
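If you would rather script the crontab change than edit it by hand, something along these lines should work. It is a sketch, not a tested installer: the function name is made up, and it assumes your cron accepts a new crontab on standard input via "crontab -", which common implementations do.

```shell
#!/bin/sh
# Hypothetical helper: read an existing crontab on stdin and print it with
# the given line present exactly once, at the end.
# $1 = the crontab line to ensure is present
add_cron_line() {
    # Drop any existing exact copy of the line; grep exits non-zero when it
    # prints nothing (for example on an empty crontab), which is fine here.
    grep -Fxv -- "$1" || true
    printf '%s\n' "$1"
}

# Fields: minute 1, hour 0, day-of-month 1, every month, any weekday.
# Example use (commented out so nothing is changed by accident):
# crontab -l 2>/dev/null | add_cron_line '1 0 1 * *     sh /path/to/purge.sh' | crontab -
```

Because the function removes an existing identical line before appending, running it twice does not create a duplicate entry.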

For the web-GUI users, refer to your GUI’s documentation.

Once this is done, just relax. Auto-purging will take place as scheduled.

Questions or comments about this? See anything that could be improved? Have a better way of doing this? Your comments are welcome, as always.


Clicky Web Analytics

I had written a previous post roughly comparing a few web analytics programs, using some criteria that were important to me, and I had purchased a license for Mint to use on one site. Having two sites, though, I wanted them to use the same analytics package, so I shelled out the extra $30 for a second Mint site license.

That was the easy part. The real pain came when I actually set it up. Because Mint is licensed per install and per domain, I had to install a second copy, copy over all my plug-ins, and configure it to match the first install, making sure I used the same login and password so I wouldn’t get them mixed up.

Then I realized something: I had to go to each domain’s Mint installation to view stats. I couldn’t view the stats for both sites in the same view. (There is a plug-in for that, but it gets installed to a single domain install. I’d have to install it to both installations and mirror the setup again. What a pain!)

So, out $30 for the new license and realizing after the fact it wasn’t a good fit for my setup, I went to Piwik, which is an Open-Source, self-installed web analytics package. You install it to a single location and set up tracking for all your websites from it. It’s a fairly good piece of software, but I ran into several nasty show-stopping bugs: zeroing visitors in the database and an issue with PHP and the archive.sh cron job (not even mentioning the still-unresolved ever-growing database issue). I want an analytics package I don’t have to fight with to get good information out of. I want to spend my time using the information I can gather, not spend the time fighting with my analytics software.

Then I tried out Clicky Web Analytics. I have to say I am extremely happy with the service, and the pricing. No software to have to think about (or keep up to date), pricing is extremely fair (in fact, the best I’ve seen with 1 site being completely free), and the feature set is unparalleled. Real-time stats, including content, search terms, referrers, individual actions, a customizable dashboard, even iPhone and Android-specific mobile versions. A full API, RSS feeds, and site widgets round off the service offering, and that’s just at the free level. Paid versions (starting at $5/mo or $30/yr for 3 sites) get even more features, such as advanced data segmentation and the ability to name visitors using either the web interface or a CMS plugin.

The real icing on the cake with Clicky? They provide a non-JavaScript tracking code (in the form of a 1×1 transparent pixel) that you can insert on sites that don’t support JavaScript (like WordPress.com, Craigslist, eBay, MySpace, etc.) so you can track pageviews even there!

I really recommend that you check out Clicky, even if (especially if) you only have one site to manage — it’s free.

Clicky Web Analytics


Web Analytics Reviews

I wrote this piece after going through several analytics packages in search of the best fit for my sites and needs. Here’s what I came up with…

What I’m looking for in an analytics tool:

Ability to track:

  • Page views / Visits and visitor counts (the usual)
  • Referrers / Searched terms from search engines
  • IP addresses (for access spam control)
  • Logged in user names (What my users are looking at)
  • Near-realtime stats (as close to realtime as possible, not having to wait until the next day)
  • Integration with Drupal/WordPress as a maintained module (less important)

The tools:


Google Analytics

What it does:

  • Tracks page views / visitor data / search terms

What it doesn’t do:

  • IP addresses
  • Logged-in user name
  • Near-realtime stats

Cost: FREE

Traffic limit: 1 million page views per day.

Other thoughts: Some think that stats collected are used by Google to adjust search results.


Woopra

What it does:

  • IP addresses
  • Logged-in user names++
  • Page views / visits and visitor counts
  • Chat with your users##
  • Real-time stats
  • Referrers / search terms
  • Integration with Drupal

What it doesn’t do:

  • Clicking on the ‘chat’ function would crash the app.
  • Clicking on the ‘analytics’ once crashed my app for several hours

Cost: Free to Expensive$$

Traffic Limit: Depends on pricing tier

Other thoughts: Requires desktop app install (Windows / Mac / Linux [Java-based]). At 3,000 monthly page views for the free package and 10,000 for the $5/mo package, this is the pricey end of analytics packages. Considering all the bugs I had with it, I would consider something else. Unlimited sites, pricing plan is per-site.


Clicky

What it does:

  • IP addresses
  • Page views / visits and visitor counts
  • Real-time stats**
  • Referrers / search terms
  • Integration with Drupal

What it doesn’t do:

  • Logged-in user names

Cost: Free to expensive$$

Traffic Limit: Depends on pricing tier

Other thoughts:

Stats are near-realtime as they are displayed on a webpage. A refresh takes care of loading the newest stats. Free plan is good for 3,000 page views a day. $5/mo plan is good for a lot more. Pricing plan is tied to your account. Limit to the number of websites you can track.


Piwik

What it does:

  • IP addresses
  • Page views / visits and visitor counts
  • Real-time stats (via the Live! plugin)
  • Referrers / search terms
  • Integration with Drupal

What it doesn’t do:

  • Logged-in user names

Cost: FREE

Traffic Limit: As much as your webserver/database server can stand.

Other thoughts: This is self-hosted software, which means you have to install it on your webserver or hosting account. Free / Open source. Though the usability and feature set are impressive, there are a number of serious bugs which can be show-stoppers for the non-technical or easily-frustrated user. Database grows over time and requires manual purging. Can put a serious load on DB servers on high-traffic sites. No limit to the number of sites you can track. You can set up login/password access for others to view stats.


Mint

What it does:

  • IP addresses
  • Page views / visits and visitor counts
  • Real-time stats**
  • Referrers / search terms
  • Integration with Drupal
  • Logged-in user names++

What it doesn’t do:

  • Come free.

Cost: $30/site

Traffic Limit: As much as your webserver/database server can stand.

Other Thoughts: At $30/site this can be expensive in multi-site operations, but this is a very well polished software package. Database growth is kept in check as detailed stats are only held for 6 weeks. Totals are kept forever. This happens to be my software package of choice.


** There’s a catch.

$$ Cost varies according to traffic / number of monitored websites

++ Requires Drupal module

## Though this is a feature, I never got it to work correctly.
