Privacy-First E-Commerce Analytics with Magento 2 + Metabase
Introduction
In e-commerce you need to analyse the data collected by the shop software to identify sales trends, customer purchase behaviour and product performance. The challenge is to do this without affecting the live data, so a third-party tool is often used. However, if customer data is exported from the e-commerce platform, it must be anonymized first so that the export does not violate the GDPR.
I was faced with this situation and decided to export the data from a Magento 2 instance I manage into Metabase, so that I could build visualizations and dashboards for the customer.
Docker setup
As with many things, I found that it was best to create a docker setup to manage the services used. This allowed me to spin up a quick test environment locally, then easily deploy the stack to a production environment.
I did not want Metabase to access the Magento database directly, and as I have said, the data it works on should not contain any personally identifiable information (PII). So I decided to add a MySQL database service to the stack, into which the anonymised data is imported. That gives me three services: Metabase, PostgreSQL (Metabase's own application database) and MySQL.
I named the MySQL container stats because it holds only the anonymised analytics database — separate from both the Magento production DB and Metabase itself.
Below is the compose.yaml I used:
volumes:
  metabase_postgres_data:
  metabase_metabase_data:
  metabase_stats_data:

services:
  metabase:
    image: metabase/metabase
    ports:
      - "3333:3000"
    environment:
      - MB_DB_TYPE=postgres
      - MB_DB_DBNAME=${POSTGRES_DB}
      - MB_DB_PORT=5432
      - MB_DB_USER=${POSTGRES_USER}
      - MB_DB_PASS=${POSTGRES_PASSWORD}
      - MB_DB_HOST=postgres
    depends_on:
      - postgres
      - stats
    restart: always
    volumes:
      - metabase_metabase_data:/metabase-data

  postgres:
    image: postgres:14
    environment:
      - POSTGRES_DB=${POSTGRES_DB}
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    volumes:
      - metabase_postgres_data:/var/lib/postgresql/data
    restart: always

  stats:
    image: mysql:8.0.43
    environment:
      - MYSQL_ROOT_PASSWORD=${STATS_ROOT_PASSWORD}
      - MYSQL_DATABASE=${STATS_DB}
      - MYSQL_USER=${STATS_USER}
      - MYSQL_PASSWORD=${STATS_PASSWORD}
    volumes:
      - metabase_stats_data:/var/lib/mysql
    restart: always
These variables (POSTGRES_DB, POSTGRES_USER, etc.) should be defined in a .env file in the same directory as compose.yaml, so that passwords stay out of the version-controlled file.
# .env
POSTGRES_DB=metabase
POSTGRES_USER=metabase
POSTGRES_PASSWORD=ChangeMe!
STATS_ROOT_PASSWORD=AnotherStrongPw!
STATS_DB=anonymised_stats
STATS_USER=stats_user
STATS_PASSWORD=stats_password
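A missing variable in .env only shows up later as a container that will not start, so a small pre-flight check can save some head-scratching. The following is a minimal sketch, not part of my original setup: the function name missing_env_vars is made up for illustration, and the variable list simply mirrors the ${...} placeholders in the compose.yaml above.

```shell
# Sketch: report variables that compose.yaml references but .env does not define.
# The list below mirrors the ${...} placeholders used in compose.yaml.
required_vars="POSTGRES_DB POSTGRES_USER POSTGRES_PASSWORD STATS_ROOT_PASSWORD STATS_DB STATS_USER STATS_PASSWORD"

# missing_env_vars is a hypothetical helper name.
# Usage: missing_env_vars [path/to/.env]
missing_env_vars() {
  env_file=${1:-.env}
  for name in $required_vars; do
    # A variable counts as defined when a line starts with NAME= followed by a value.
    grep "^${name}=." "$env_file" > /dev/null || printf '%s\n' "$name"
  done
}
```

Running missing_env_vars .env prints one line per missing or empty variable and nothing when the file is complete. Once it is, docker compose config shows the resolved file and docker compose up -d starts the stack.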
Export anonymized Magento data
I looked into several tools available on GitHub for anonymizing Magento data:
- Smile-SA/gdpr-dump
- mpchadwick/dbanon
- DivanteLtd/anonymizer
dbanon
Written in Go, it can be used by piping the output of mysqldump through the binary:
mysqldump mydb | dbanon -config=myconfig.yml | gzip > mydb.sql.gz
The configuration requires quite a lot of manual work to cover all the tables in a Magento installation.
An example configuration file is available, but it does not cover the whole database, and it was not clear from the documentation whether a configuration is needed for every table or only for those that require anonymisation. As I wanted to get up and running quickly, I abandoned my research into this option.
As it is written in Go, I imagine it is quite fast. I might come back to it at a later date; if I do, I will report back.
A run anywhere database anonymizer
DivanteLtd/anonymizer
This is written in Ruby. After reading the documentation, which I found a little confusing, I decided not to use it.
It seems quite capable, but suffers from unclear (IMO) documentation.
Universal tool to anonymize database. GDPR (General Data Protection Regulation) data protection act supporting tool.
Smile-SA/gdpr-dump
I finally chose this option, as I found the documentation easy to understand and could work out how to modify the exporter configuration quite easily. There is a template configuration for Magento 2 which covered the basics, but did not cover all of my requirements.
Utility that creates anonymized database dumps (MySQL only). Provides default config templates for Magento, Drupal and Shopware.
I wanted to keep the customer email address as a unique identifier in all tables, as it can be used to analyse sales-related information, for example whether customers purchase items repeatedly. The template's default method of anonymizing the email address produced a different value for every occurrence. I needed a method that anonymises the email addresses consistently, so that the same email address always results in the same anonymised address, which allows records to be linked across the various entities.
I managed to do this by combining converters using the ‘chain’ converter. I prepended a salt value to the email address, then hashed the result using md5 (to keep the email prefix short). Then I appended a domain to the hashed value to keep the appearance of an email address. Here is an example of the configuration:
tables:
  _shared_converters:
    # YAML parser will define the anchor here
    converters:
      salted_hash: &salted_hash
        converter: 'chain'
        parameters:
          converters:
            - converter: 'prependText'
              parameters:
                value: "super_secret_salt_2025::"
            - converter: 'hash'
              parameters:
                algorithm: 'md5'
            - converter: 'appendText'
              parameters:
                value: "@example.com"
Because I am lazy and didn’t want to repeat all that typing for each email field in every table, I decided to use a YAML anchor, &salted_hash, to make it easy to reuse. I tried adding this at the root of the YAML, but the parser expects only certain nodes at root level, and the extra node caused an error. Tables, however, are ignored if they do not exist, so I created the fake table _shared_converters to hold the definition of the reusable anchor &salted_hash.
Here are some examples of how I used it in my configuration:
extends: 'magento2'
version: '2.4.8'
# ...
tables:
  # ...
  customer_entity:
    converters:
      email: *salted_hash
  sales_order:
    converters:
      customer_email: *salted_hash
  # ...
Now all email addresses are anonymised, and the same address maps to the same value in every table, so it can still be used as an identifier.
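To make the effect of the chain concrete, here is a rough shell equivalent of what it computes: md5 of the salt plus the email, with the domain appended. It is only a sketch for illustration. It assumes the hash converter hashes the salted string as-is, it relies on md5sum from GNU coreutils, and the function name pseudonymise_email is my own invention, not something gdpr-dump provides.

```shell
# Sketch of what the converter chain computes: md5(salt + email) + "@example.com".
# SALT and DOMAIN mirror the values used in the configuration above.
SALT='super_secret_salt_2025::'
DOMAIN='@example.com'

# pseudonymise_email is a hypothetical helper name.
pseudonymise_email() {
  # printf avoids a trailing newline, which would change the hash.
  hash=$(printf '%s%s' "$SALT" "$1" | md5sum | cut -d ' ' -f 1)
  printf '%s%s\n' "$hash" "$DOMAIN"
}
```

Calling pseudonymise_email 'jane@shop.test' twice gives the same 32-character prefix both times, which is exactly what keeps customer_entity.email and sales_order.customer_email linkable. Note that the hash is case-sensitive, so Jane@shop.test and jane@shop.test would end up as two different customers.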
The other fields were satisfactorily anonymised for me with the default settings. I did have to add configuration for the additional tables which 3rd party modules had installed.
To use gdpr-dump, you first have to download the phar:
wget https://github.com/Smile-SA/gdpr-dump/releases/latest/download/gdpr-dump.phar
chmod +x gdpr-dump.phar
./gdpr-dump.phar --version
You then have to create your configuration file; see the project wiki.
Using your configuration file, you can execute the command:
./gdpr-dump.phar /path/to/your-config.yaml > dump-$(date +%Y%m%d%H%M%S).sql
I added the database connection info into the configuration file, to keep the command options to a minimum.
It is also possible to compress the dump by piping it through gzip or bzip2:
./gdpr-dump.phar /path/to/your-config.yaml | gzip -9 > dump-$(date +%Y%m%d%H%M%S).sql.gz
./gdpr-dump.phar /path/to/your-config.yaml | bzip2 -c -9 > dump-$(date +%Y%m%d%H%M%S).sql.bz2
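Before the dump leaves the production host, it is worth checking that values you know to be real, such as the email address of your own test account, no longer appear in it. The helper below is a minimal sketch under that assumption. The name assert_absent_from_dump is hypothetical, and it expects a gzip-compressed dump like the one produced above.

```shell
# Sketch: fail if any known real value still appears in a gzip-compressed dump.
# assert_absent_from_dump is a hypothetical helper name.
# Usage: assert_absent_from_dump dump.sql.gz 'me@myshop.example' 'Another Value'
assert_absent_from_dump() {
  dump=$1
  shift
  status=0
  for needle in "$@"; do
    # gunzip -c streams the dump without unpacking it to disk;
    # grep -F treats the needle as a fixed string, not a regex.
    if gunzip -c < "$dump" | grep -F -- "$needle" > /dev/null; then
      printf 'LEAK: %s found in %s\n' "$needle" "$dump" >&2
      status=1
    fi
  done
  return "$status"
}
```

The function returns a non-zero status if any of the values is found, so it can guard the download step in a script. It only proves the absence of the values you pass in. It is a spot check, not a substitute for reviewing the converter configuration.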
Hooray, now we have an anonymised SQL dump ready for import into the Metabase stack.
Load the anonymized dump into Metabase
You will have generated the anonymised dump file on the production host because, you know, PII should not leave the production environment 😉 so you will need to download it to wherever you have set up your Docker stack.
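gunzip -c /path/to/your-dump.sql.gz | docker compose exec -T stats \
  sh -c 'exec mysql -uroot -p"$MYSQL_ROOT_PASSWORD" "$MYSQL_DATABASE"'
The dump is decompressed on the host and streamed into the mysql client inside the stats container. Note that stats is the name of the service, not of the database: the database is whatever you set as STATS_DB in .env (anonymised_stats in my example). The command therefore reads the root password and the database name from the MYSQL_ROOT_PASSWORD and MYSQL_DATABASE variables that compose.yaml already sets on the container. This also avoids the interactive password prompt, which cannot be relied on here because stdin is occupied by the dump.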
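After the import, a quick query can confirm that the anonymised emails still link orders to customers. This is a sketch only. It assumes both customer_entity and sales_order were included in the dump, that the compose service is called stats, and that the container exposes MYSQL_ROOT_PASSWORD and MYSQL_DATABASE as set in compose.yaml. The function name count_linked_orders is made up for illustration.

```shell
# Sketch: count orders that can be joined to a customer via the anonymised email.
linked_orders_sql='SELECT COUNT(*) AS linked_orders FROM sales_order o JOIN customer_entity c ON c.email = o.customer_email;'

# count_linked_orders is a hypothetical helper name.
count_linked_orders() {
  # The query is sent over stdin; password and database name are read from
  # the environment variables already present inside the stats container.
  printf '%s\n' "$linked_orders_sql" |
    docker compose exec -T stats \
      sh -c 'exec mysql -uroot -p"$MYSQL_ROOT_PASSWORD" "$MYSQL_DATABASE"'
}
```

A result greater than zero means the join on the hashed email works. Guest orders have no matching row in customer_entity, so the number will normally be lower than the total order count.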
Now you can set up your queries and dashboards in Metabase as you wish, still keeping any associations based on customer email in a GDPR-friendly manner.
Summary
With this setup, you can safely analyse Magento 2 sales data inside Metabase without exposing any personally identifiable information.
By combining GDPR-compliant anonymization using gdpr-dump with a lightweight Docker stack, you get a repeatable, privacy-first analytics environment.
It’s a practical balance between insight and compliance — one that can be adapted easily for any other system requiring anonymized reporting.
Next up: how to automate this process to keep your Metabase dashboards up to date with fresh, anonymized data.