Advertisements

How to automatically purge Piwik logs using a cron job

If you’re using Piwik Analytics, you may know that over time your Piwik database will continue to grow over time. The piwik log tables as well as the archive tables will continue to grow until you purge them.

The Piwik developers are aware of this and a ticket has been in their system for a while now to develop a way to automatically purge the Piwik logs. The FAQ describes a SQL query you can run to purge the Piwik logs manually, but you may want a way to do this automatically. Fortunately, here’s the solution, with a few caveats.

UPDATE: As of Piwik 1.5, the following method is depreciated. Piwik now has log purging implemented. See Piwik FAQ #42.

First, before proceeding, check the FAQ link for the correct SQL query to run. The SQL query has changed at least once that I’m aware of. The below guide reflects the SQL query that was posted as of the day this article was posted.

Second, this method uses a script which will contain a SQL password, so it does have some security implications. You can avoid these by making sure the script has the correct permissions. This guide will assume that your SQL user’s name is ‘piwik’ and you are running the script as unix user ‘www-data’. Please adjust for your individual configuration.

Third, and lastly, it’s suggested to run this query monthly. You will want to make sure your logs have been archived for the proceeding month. In most normal installations this is the case. The only time this would not be, is if you are archiving using a cron job and your cron job has not been running.

So here goes:

Step one:

Find a place where your cron user will have access to the SQL file. A suggested place is the misc/cron directory within your piwik installation, right next to the archive.sh file. Let’s call this file purge.sql. It’s not necessary for it to not be accessible from the web, as it won’t have any sensitive information in it — just the SQL query. Create this file and paste the following SQL query into it:

DELETE piwik_log_visit, piwik_log_link_visit_action
FROM piwik_log_visit INNER JOIN piwik_log_link_visit_action
WHERE piwik_log_visit.idvisit = piwik_log_link_visit_action.idvisit
AND visit_first_action_time <= DATE_SUB(CURDATE(), INTERVAL 30 DAY);
OPTIMIZE TABLE piwik_log_visit, piwik_log_link_visit_action;

Step two:

Create the script which will call the SQL query. This script will contain your SQL password, so it’s a good idea to make sure it is mode 0700 and outside a web-accessible directory. We will call this purge.sh. Assuming your database is called piwik, your SQL username is piwik, and your password is piwik123, create the file with this query:

mysql piwik -upiwik -p'piwik123' < /var/www/piwik/misc/cron/purge.sql

For your reference, the mysql command sequence is:

mysql -u -p < mysql.sql

Step three:

Time to schedule the cron job. You can either edit the crontab yourself from the terminal ( crontab -e ), or use your favorite web-GUI scheduler such as cPanel, Webmin, etc.

For the terminal users, you’ll want the crontab line to read:

1 0 1 * *     sh /path/to/purge.sh

This will schedule the purge.sh script to run at 12:01am on the first day of the month.

For the web-GUI users, refer to your GUI’s documentation.

Once this is done, just relax. Auto-purging will take place as scheduled.

Questions or comments about this? See anything that could be improved? Have a better way of doing this? Your comments are welcome, as always.

Advertisements

, , ,