How to sideload E-Commerce data

Why you should use the Google sideloading mechanism. How to set up the Google sideloading mechanism.

Written by Admetrics
Updated over 3 weeks ago

1. Google Sheets

The Google Sheets sideloading mechanism lets customers integrate and update their data in the Admetrics dashboard.

To use it, customers share a Google Drive folder with Admetrics. The Admetrics system frequently scans this shared folder and bulk loads every sheet that has been changed or added since the last refresh cycle. Customers can either update existing sheets with new data or add new files to the Drive folder.

File names are not critical, but the sheet names and column names within each file must match the specification exactly. See the example sheet on Google Spreadsheets.

We expect 5 sheets in each file:

  • Orders

  • Lineitems

  • Products

  • Customers

  • Refunds

The schema of each sheet is explained on the schema page.

Setup

Share the Google Drive folder containing your data with the email address shown in the datasource admin.

Google Sheet Limit

Google Sheets has a limit of 10 million cells.

If this limit is reached, or is expected to be reached, while adding data to any of the sheets, customers need to create a separate Sheets file for the additional data.
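As a rough check against this limit, the cell count of a file can be estimated as rows times columns, summed over its sheets. The sketch below assumes the limit applies to the file as a whole (all sheets together); the function names and the row and column counts are made up for illustration.

```python
CELL_LIMIT = 10_000_000  # Google Sheets limit mentioned above


def cells_used(sheet_shapes):
    """Sum rows * columns over all sheets of one file.

    sheet_shapes maps a sheet name to a (rows, columns) tuple.
    """
    return sum(rows * cols for rows, cols in sheet_shapes.values())


def needs_new_file(sheet_shapes, sheet_name, extra_rows):
    """Return True if appending extra_rows to sheet_name would exceed the limit."""
    _, cols = sheet_shapes[sheet_name]
    projected = cells_used(sheet_shapes) + extra_rows * cols
    return projected > CELL_LIMIT


# Hypothetical sizes: 100,000 orders with 20 columns, 300,000 line items with 25 columns.
shapes = {"Orders": (100_000, 20), "Lineitems": (300_000, 25)}
print(cells_used(shapes))
print(needs_new_file(shapes, "Lineitems", 30_000))
```

If the second call reports True, it is time to start a new file before adding the next batch.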

  • If the limit is reached with Orders or Lineitems data, split the data between files using a cut-off based on the order creation date, e.g. all orders / lineitems until 30.09.2024 in the first file and all orders / lineitems from 01.10.2024 onwards in the subsequent file. This avoids the same order appearing in multiple files.

  • If the limit is reached with Refunds data, split the data between files using a cut-off based on the return date, e.g. all returns until 30.09.2024 in the first file and all returns from 01.10.2024 onwards in the subsequent file. This avoids the same return appearing in multiple files.

  • If the limit is reached with Customers data, split the data between files using a cut-off based on the customer creation date, e.g. all customers created until 30.09.2024 in the first file and all customers created from 01.10.2024 onwards in the subsequent file. This avoids the same customer appearing in multiple files.
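The cut-off rule described in these bullets can be sketched as follows. This is a minimal illustration, assuming each record is a dictionary with a date field; the key name created_at and the sample records are hypothetical and not part of the Admetrics schema.

```python
from datetime import date, datetime


def parse_day(text):
    """Parse a DD.MM.YYYY string, the date style used in this article."""
    return datetime.strptime(text, "%d.%m.%Y").date()


def split_by_cutoff(rows, date_key, cutoff):
    """Split rows into (up to and including cutoff, after cutoff).

    Each row lands in exactly one of the two lists, so no record
    appears in both files.
    """
    first_file, second_file = [], []
    for row in rows:
        target = first_file if row[date_key] <= cutoff else second_file
        target.append(row)
    return first_file, second_file


# Hypothetical order records.
orders = [
    {"id": 1, "created_at": date(2024, 9, 30)},
    {"id": 2, "created_at": date(2024, 10, 1)},
]
first, second = split_by_cutoff(orders, "created_at", parse_day("30.09.2024"))
print(len(first), len(second))
```

For line items, the date to compare would be the creation date of the order they belong to, so that an order and its line items stay in the same file.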

For easier maintenance, customers can also split data between files by date interval instead of by cut-off date, e.g.:

  • Monthly (01.01.2024 - 31.01.2024)

  • Quarterly (01.01.2024 - 31.03.2024)

  • Half-yearly (01.01.2024 - 30.06.2024)

  • Yearly (01.01.2024 - 31.12.2024)

Note: With both approaches, each file must contain complete data for every day it covers.
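To decide which interval file a record belongs to, the interval bounds can be derived from the record's date. The helper names below are illustrative assumptions; the sketch covers only the monthly and quarterly cases and uses whole calendar days, so no day is split across files.

```python
import calendar
from datetime import date


def month_interval(day):
    """Return the first and last day of the month containing day."""
    last_day = calendar.monthrange(day.year, day.month)[1]
    return date(day.year, day.month, 1), date(day.year, day.month, last_day)


def quarter_interval(day):
    """Return the first and last day of the calendar quarter containing day."""
    first_month = 3 * ((day.month - 1) // 3) + 1
    last_month = first_month + 2
    last_day = calendar.monthrange(day.year, last_month)[1]
    return date(day.year, first_month, 1), date(day.year, last_month, last_day)


def label(interval):
    """Format an interval in the DD.MM.YYYY style used above."""
    start, end = interval
    return f"{start:%d.%m.%Y} - {end:%d.%m.%Y}"


print(label(month_interval(date(2024, 1, 15))))    # 01.01.2024 - 31.01.2024
print(label(quarter_interval(date(2024, 2, 10))))  # 01.01.2024 - 31.03.2024
```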

2. Amazon S3

Similar to the Google Sheets mechanism, the Amazon S3 sideloading process enables customers to seamlessly integrate and update their data with the Admetrics dashboard using a shared Amazon S3 bucket.

Customers must share an S3 bucket with Admetrics by applying the appropriate bucket policy (see Setup). The bucket must contain one folder per data delivery.

Folder structure

Name each folder within your bucket as follows:

your-bucket-name/REPORTDATE_INTERVAL_STARTDATE_INTERVAL_ENDDATE

For example, if you deliver data for the period 2024-01-01 to 2024-01-31 on 4 February 2024, the folder is named:

your-bucket-name/2024-02-04_2024-01-01_2024-01-31

This naming lets us identify how recent the dataset is and which time frame it covers.
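The folder name can be generated rather than typed by hand. The following is a minimal sketch of the naming pattern above; the function names are illustrative, and the bucket name is left out because it depends on your setup.

```python
from datetime import date


def delivery_folder(report_date, interval_start, interval_end):
    """Build REPORTDATE_INTERVAL_STARTDATE_INTERVAL_ENDDATE from three dates."""
    if interval_start > interval_end:
        raise ValueError("interval start must not be after interval end")
    parts = (report_date, interval_start, interval_end)
    return "_".join(d.isoformat() for d in parts)


def parse_delivery_folder(name):
    """Split a folder name back into (report date, interval start, interval end)."""
    report, start, end = (date.fromisoformat(p) for p in name.split("_"))
    return report, start, end


name = delivery_folder(date(2024, 2, 4), date(2024, 1, 1), date(2024, 1, 31))
print(name)  # 2024-02-04_2024-01-01_2024-01-31
```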

Dataset files

The dataset in this folder must consist of 5 separate files in TSV format:

  • orders.tsv

  • lineitems.tsv

  • products.tsv

  • customers.tsv

  • refunds.tsv

The schema of each file is explained on the schema page.
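Before uploading a delivery, it can help to confirm that all five files are present. The check below is a sketch that works on a plain list of file names, however you obtain it (a local directory listing, for instance); the function name is an assumption, and the check does not validate the schema itself.

```python
REQUIRED_FILES = {
    "orders.tsv",
    "lineitems.tsv",
    "products.tsv",
    "customers.tsv",
    "refunds.tsv",
}


def missing_files(file_names):
    """Return the required dataset files absent from file_names, sorted by name."""
    return sorted(REQUIRED_FILES - set(file_names))


# Hypothetical delivery folder that lacks the refunds file.
print(missing_files(["orders.tsv", "lineitems.tsv", "products.tsv", "customers.tsv"]))
```

An empty result means the folder contains every expected file.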

Setup

To share an S3 bucket with Admetrics, add a bucket policy that grants specific permissions to our AWS account ID 292397901148. Below is an example policy that you can apply to your bucket. This policy allows Admetrics to list the bucket contents and get objects from the bucket, enabling the sideloading process:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AdmetricsAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::292397901148:root"
      },
      "Action": [
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}

Replace your-bucket-name with the name of your actual S3 bucket. After applying this policy, ensure the bucket and files are properly configured according to the Admetrics specifications.
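To avoid editing the bucket name by hand in two places, the policy can be generated from a small script. This sketch reproduces the example policy above using only the Python standard library; the function name is an assumption, and the script only prints the policy without applying it.

```python
import json

ADMETRICS_ACCOUNT_ID = "292397901148"  # AWS account ID given above


def build_bucket_policy(bucket_name):
    """Return the example read-only bucket policy for the given bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AdmetricsAccess",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{ADMETRICS_ACCOUNT_ID}:root"},
                "Action": ["s3:ListBucket", "s3:GetObject"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # the bucket itself (for ListBucket)
                    f"arn:aws:s3:::{bucket_name}/*",  # objects in the bucket (for GetObject)
                ],
            }
        ],
    }


print(json.dumps(build_bucket_policy("your-bucket-name"), indent=2))
```

The printed JSON can then be pasted into the bucket policy editor in the AWS console.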
