7 min read · André Flitsch

Privacy-First E-Commerce Analytics with Magento 2 + Metabase

Introduction

E-commerce businesses need to analyse the data their shop software collects to identify sales trends, customer purchase behaviour, and product performance. The challenge is to do this without affecting the live data, so a third-party tool is often used. However, any customer data exported from the e-commerce platform must be anonymized first, so that the export does not violate GDPR.

I was faced with this situation and decided to export the data from a Magento 2 instance I manage into Metabase, so that I can build visualizations and dashboards for the customer.

Docker setup

As with many things, I found it best to create a Docker setup to manage the services used. This allowed me to spin up a quick test environment locally and then easily deploy the stack to a production environment.

I do not want Metabase to access the Magento database directly, and as I have said, the data it works with should not contain any personally identifiable information (PII). So I decided to add a MySQL database service to the stack, into which the anonymised data is imported. That gives me three services: Metabase, PostgreSQL (Metabase's own application database) and MySQL.

I named the MySQL container stats because it holds only the anonymised analytics database, separate from both the Magento production DB and Metabase's own database.

Below is the compose.yaml I used:

volumes:
  metabase_postgres_data:
  metabase_metabase_data:
  metabase_stats_data:

services:
  metabase:
    image: metabase/metabase
    ports:
      - "3333:3000"
    environment:
      - MB_DB_TYPE=postgres
      - MB_DB_DBNAME=${POSTGRES_DB}
      - MB_DB_PORT=5432
      - MB_DB_USER=${POSTGRES_USER}
      - MB_DB_PASS=${POSTGRES_PASSWORD}
      - MB_DB_HOST=postgres
    depends_on:
      - postgres
      - stats
    restart: always
    volumes:
      - metabase_metabase_data:/metabase-data

  postgres:
    image: postgres:14
    environment:
      - POSTGRES_DB=${POSTGRES_DB}
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    volumes:
      - metabase_postgres_data:/var/lib/postgresql/data
    restart: always

  stats:
    image: mysql:8.0.43
    environment:
      - MYSQL_ROOT_PASSWORD=${STATS_ROOT_PASSWORD}
      - MYSQL_DATABASE=${STATS_DB}
      - MYSQL_USER=${STATS_USER}
      - MYSQL_PASSWORD=${STATS_PASSWORD}
    volumes:
      - metabase_stats_data:/var/lib/mysql
    restart: always

These variables (POSTGRES_DB, POSTGRES_USER, etc.) should be defined in a .env file in the same directory as compose.yaml, which keeps passwords out of the version-controlled file.

# .env
POSTGRES_DB=metabase
POSTGRES_USER=metabase
POSTGRES_PASSWORD=ChangeMe!
STATS_ROOT_PASSWORD=AnotherStrongPw!
STATS_DB=anonymised_stats
STATS_USER=stats_user
STATS_PASSWORD=stats_password
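
Because compose.yaml silently substitutes an empty string for any variable that is missing from .env, it can help to check the file before starting the stack. The following is a minimal sketch, not part of the original setup: check_env_file is a hypothetical helper name, and the variable list is simply copied from the compose.yaml above.

```shell
# Hypothetical helper: verify that an env file defines every variable
# that compose.yaml references, before running `docker compose up -d`.
check_env_file() {
  local env_file="${1:-.env}"
  local missing=0
  local name
  for name in POSTGRES_DB POSTGRES_USER POSTGRES_PASSWORD \
              STATS_ROOT_PASSWORD STATS_DB STATS_USER STATS_PASSWORD; do
    # Require "NAME=" at the start of a line, followed by at least one character.
    if ! grep -Eq "^${name}=.+" "$env_file"; then
      echo "missing or empty: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

It could be used as `check_env_file .env && docker compose up -d`, so the stack only starts when all seven values are present.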

Export anonymized Magento data

I looked into several tools available on GitHub for anonymizing Magento data:

  • Smile-SA/gdpr-dump
  • mpchadwick/dbanon
  • DivanteLtd/anonymizer

dbanon

Written in Go, it can be used by piping the output of mysqldump through the binary:

mysqldump mydb | dbanon -config=myconfig.yml | gzip > mydb.sql.gz

Preparing the configuration for all the tables in a Magento installation requires quite a lot of manual work.

An example configuration file is available, but it does not cover the whole database, and the documentation did not make clear whether a configuration is needed for every table or only for the ones that require anonymisation. As I wanted to get up and running quickly, I abandoned my research into this option.

As it is written in Go, I imagine it would be quite fast. I might come back to investigate it at a later date; if I do, I will report back.

mpchadwick/dbanon: A run anywhere database anonymizer (Go)

DivanteLtd/anonymizer

This is written in Ruby. It seems quite capable, but after reading the documentation, which I found a little confusing, I decided not to use it.

DivanteLtd/anonymizer: Universal tool to anonymize database. GDPR (General Data Protection Regulation) data protection act supporting tool. (Ruby)

Smile-SA/gdpr-dump

In the end I chose this option, as I found the documentation easy to understand and could quite easily work out how to modify the exporter's configuration. There is a template configuration for Magento 2 which covers the basics, but it did not cover all of my requirements.

Smile-SA/gdpr-dump: Utility that creates anonymized database dumps (MySQL only). Provides default config templates for Magento, Drupal and Shopware. (PHP)

I wanted to keep the customer email address as a unique identifier in all tables, as it can be used to analyse sales-related information, for example whether customers purchase items repeatedly. The original method of anonymizing the email address gave every occurrence a different value. I needed the anonymised addresses to be consistent, so that the same email address always results in the same anonymised address, which allows records to be linked across the various entities.

I managed to do this by combining converters using the 'chain' converter. I prepended a salt value to the email address, hashed the result using md5 (to keep the email prefix short), and then appended a domain to the hash to keep the appearance of an email address. Here is an example of the configuration:

tables:

  _shared_converters:
    # YAML parser will define the anchor here
    converters:
      salted_hash: &salted_hash
        converter: 'chain'
        parameters:
          converters:
            - converter: 'prependText'
              parameters:
                value: "super_secret_salt_2025::"
            - converter: 'hash'
              parameters:
                algorithm: 'md5'
            - converter: 'appendText'
              parameters:
                value: "@example.com"

Because I am lazy and didn't want to repeat all that typing for each email field in every table, I used a YAML anchor, &salted_hash, to make the definition reusable. I tried adding it at the root of the YAML, but the parser expects only certain nodes at root level, and the extra node caused an error. Tables that do not exist are ignored, however, so I created the fake table _shared_converters to hold the definition of the anchor.

Here are some examples of how I used it in my configuration:

extends: 'magento2'
version: '2.4.8'

...

tables:

...

  customer_entity:
    converters:
      email: *salted_hash

  sales_order:
    converters:
      customer_email: *salted_hash
...

Now all email addresses are anonymised, and the same address maps to the same value in every table, so it can still be used as an identifier.
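
To see why the same address always produces the same result, the prepend, hash, append chain can be approximated in shell. This is only an illustrative sketch: anon_email is a hypothetical function name, it assumes GNU coreutils md5sum is available, and the output is not guaranteed to be byte-for-byte identical to what gdpr-dump writes.

```shell
# Sketch of the chain: prependText -> hash (md5) -> appendText.
# Assumes GNU coreutils `md5sum`; salt and domain mirror the example config.
anon_email() {
  local salt="super_secret_salt_2025::"
  local digest
  # printf avoids the trailing newline that echo would add to the hashed input.
  digest=$(printf '%s%s' "$salt" "$1" | md5sum | cut -d' ' -f1)
  printf '%s@example.com\n' "$digest"
}
```

Calling `anon_email jane@shop.test` twice prints the same 32-character hex prefix followed by @example.com, while a different input address gives a different prefix. Changing the salt changes every result, which is why the salt has to stay the same between dumps if dashboards should keep linking old and new records.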

For the other fields, the default settings anonymised the data to my satisfaction. I did have to add configuration for the additional tables that third-party modules had installed.

To use gdpr-dump you have to install the phar:

wget https://github.com/Smile-SA/gdpr-dump/releases/latest/download/gdpr-dump.phar
chmod +x gdpr-dump.phar
./gdpr-dump.phar --version

You then have to create your configuration file; see the gdpr-dump wiki for details.

Using your configuration file, you can execute the command:

./gdpr-dump.phar /path/to/your-config.yaml > dump-$(date +%Y%m%d%H%M%S).sql

I added the database connection info to the configuration file to keep the command options to a minimum.

It is also possible to compress the dump by piping it through gzip or bzip2:

./gdpr-dump.phar /path/to/your-config.yaml | gzip -9 > dump-$(date +%Y%m%d%H%M%S).sql.gz

./gdpr-dump.phar /path/to/your-config.yaml | bzip2 -c -9 > dump-$(date +%Y%m%d%H%M%S).sql.bz2

Hooray, now we have an anonymised SQL dump, ready for import into the stats database that Metabase will read.
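
Before the dump leaves the production host, a quick sanity check for leftover real email addresses can be worthwhile. The sketch below is an assumption-laden example rather than part of gdpr-dump: find_leaked_emails is a hypothetical helper, and the default allowed pattern assumes that every anonymised address ends in example.com, example.net or example.org, which may not match your configuration.

```shell
# Hypothetical check: print email-like strings in an uncompressed dump whose
# domain is not in the allowed (anonymised) set. Returns 1 if any are found.
find_leaked_emails() {
  local dump_file="$1"
  local allowed="$2"
  # Assumption: anonymised addresses use the reserved example.* domains.
  [ -n "$allowed" ] || allowed='@example\.(com|net|org)$'
  local leaked
  leaked=$(grep -Eo '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' "$dump_file" \
    | grep -Ev "$allowed" | sort -u)
  if [ -n "$leaked" ]; then
    printf '%s\n' "$leaked"
    return 1
  fi
  return 0
}
```

Running `find_leaked_emails dump.sql` prints each suspicious address once. Hits are not necessarily customer data (store contact addresses in configuration tables would also show up), but each one is worth a look before the file is downloaded.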

Load the anonymized dump into Metabase

You will have generated the anonymised dump file on the production host because, you know, PII should not leave the production environment 😉 So you will need to download it to wherever you have set up your Docker stack, and then import it into the stats container:

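gunzip -c /path/to/your-dump.sql.gz | docker compose exec -T stats \
  sh -c 'exec mysql -uroot -p"$MYSQL_ROOT_PASSWORD" "$MYSQL_DATABASE"'

The file is unzipped on the host and streamed into the MySQL client inside the container. The container already has MYSQL_ROOT_PASSWORD and MYSQL_DATABASE set from compose.yaml, so the dump lands in the anonymised database (anonymised_stats in the example .env) without a password prompt. An interactive -p prompt is not reliable here, because standard input is occupied by the dump.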
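
Since the dump may be plain, gzipped or bzip2-compressed depending on which export command was used, a small wrapper can pick the right decompressor from the file extension. This is a sketch only; stream_dump is a hypothetical helper name and it assumes gunzip and bunzip2 are installed on the host.

```shell
# Hypothetical helper: write a dump to stdout regardless of compression,
# choosing the tool from the file extension.
stream_dump() {
  case "$1" in
    *.sql.gz)  gunzip -c "$1" ;;
    *.sql.bz2) bunzip2 -c "$1" ;;
    *.sql)     cat "$1" ;;
    *) echo "unsupported dump file: $1" >&2; return 1 ;;
  esac
}
```

Its output can be piped into the same `docker compose exec -T stats` command shown above, in place of `gunzip -c`.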

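In Metabase, add the stats MySQL service as a database (host stats, port 3306, with the STATS_DB, STATS_USER and STATS_PASSWORD values from .env). Now you can set up your queries and dashboards as you wish, still keeping any associations based on customer email in a GDPR-friendly manner.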

Summary

With this setup, you can safely analyse Magento 2 sales data inside Metabase without exposing any personally identifiable information.

By combining GDPR-compliant anonymization using gdpr-dump with a lightweight Docker stack, you get a repeatable, privacy-first analytics environment.

It’s a practical balance between insight and compliance, and one that can be adapted easily for any other system requiring anonymized reporting.

Next up: how to automate this process to keep your Metabase dashboards up to date with fresh, anonymized data.