1+1=3: how to find additional value with a data warehouse in Snowflake
- Bluerank & Avon
Working with the data of a huge international organization can be challenging: it requires drawing conclusions from data scattered across many different places. Thanks to its cooperation with Bluerank, our client Avon can make key business decisions based on consistent data, integrated in a warehouse built on Snowflake technology.
Challenges
Creating a data warehouse is a process
Even at the beginning of this process, the partner's data was scattered across many sources: all of the partner's platforms had their own databases, often built on different technologies, ranging from relational databases, NoSQL stores and other data warehouses, through Google Analytics, to manual exports to Excel files. Such dispersion made this data hard to reach for the "average" employee who was expected to make important decisions for their own area. Compiling the list of sources and laying out the target structure in Snowflake kicked off the whole project.
Data availability
In addition, every decision-maker in the company needed access to all of the tools. In an organization as large as Avon, this involves managing access to information, and merely organizing access to multiple sources is a difficult task. The owner of a platform or process had to:
- first obtain access to the individual data sources from the different people responsible for each platform,
- understand all the technologies used to store the data,
- manually combine that data with files from other sources.
For a team member focused on a single platform, using, for example, one database together with GA, this is not an issue. For people tasked with evaluating the activities of a specific area, or of the entire company, in a holistic way, however, it can be a real challenge.
Differences in individual markets
On top of the above comes one more layer: the issue of different markets and data integrity. The partner operates globally in several dozen markets, and each of them could create its own data sources and reports based on definitions that differed between markets. This created the need to standardize certain assumptions and metrics within a single "source of truth", which Snowflake was to become. For this reason, we knew from the very beginning of the process how important documentation would be on our side.
Implementation
The whole process is a collaboration within an interdisciplinary, international team of specialists from various fields: an Indian team specializing in integration processes and in Boomi, a Hungarian team of data engineers, and a team of Bluerank specialists involved as consultants at all stages of the project. The entire process is supervised by a Project Manager who, during periods of intensive sprint work, organizes daily half-hour status meetings to discuss current progress as well as any challenges emerging in the project.
We divided the complex process into several stages:
Step 1 - Basic questions
Recognizing the needs of the markets, i.e. answering the questions: what data do they need? What conclusions do they want to draw? In what form do they want to consume it? In a large organization, the answers to even very basic questions are never simple or obvious, so it is important to ask them periodically and revise the initial answers.
Step 2 - Development of the language
The next step is to translate those needs into a technical specification that the IT department can easily work with. Keep in mind that these people have no business background and do not know Google Analytics or the other tools, but they do have programming and development expertise. Our task was to prepare clear instructions; in the case of GA, this meant a list of views, dimensions and metrics, as well as the filters and segments, that could be downloaded via the API, as sketched below.
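To make this concrete, here is a minimal, hypothetical example of what such a specification can look like once expressed as a Google Analytics Reporting API v4 request body; the view ID, dimensions, metrics and filter are illustrative placeholders, not the actual setup used in the project.

```python
# Illustrative GA Reporting API v4 request body, matching a specification like
# "view X, daily sessions/transactions/revenue by country and source/medium,
# organic traffic only". All identifiers below are placeholders.
GA_REPORT_REQUEST = {
    "reportRequests": [
        {
            "viewId": "XXXXXXXX",  # placeholder GA view ID
            "dateRanges": [{"startDate": "2022-01-01", "endDate": "2022-01-31"}],
            "dimensions": [
                {"name": "ga:date"},
                {"name": "ga:country"},
                {"name": "ga:sourceMedium"},
            ],
            "metrics": [
                {"expression": "ga:sessions"},
                {"expression": "ga:transactions"},
                {"expression": "ga:transactionRevenue"},
            ],
            "filtersExpression": "ga:medium==organic",  # example filter
        }
    ]
}
```

A specification in this form leaves no room for interpretation: a developer can pass it to the API client as-is, without needing to understand what the individual dimensions or metrics mean in business terms.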
Step 3 - Scripts
Next, we moved on to the scripts that download the data. In this case they were written in Python and R, and the whole operation was additionally supported by a dedicated tool, Boomi, used to download raw data in CSV or JSON format. The downloaded files were then loaded into the appropriate staging tables in Snowflake, along the lines of the sketch below.
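The loading step might look roughly like the following. This is a minimal sketch, assuming the snowflake-connector-python package, placeholder credentials, a hypothetical internal stage ga_stage, and a staging table staging.ga_raw with a single VARIANT column.

```python
# Minimal sketch: load a downloaded JSON file into a Snowflake staging table.
# Connection parameters, stage and table names are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",   # placeholder
    user="etl_user",        # placeholder
    password="***",         # placeholder; prefer key-pair auth in practice
    warehouse="ETL_WH",
    database="RAW",
    schema="STAGING",
)
cur = conn.cursor()

# Upload the local file to an internal stage...
cur.execute("PUT file:///tmp/ga_export.json @ga_stage AUTO_COMPRESS=TRUE")

# ...and copy it into a staging table with a single VARIANT column.
cur.execute("""
    COPY INTO staging.ga_raw
    FROM @ga_stage
    FILE_FORMAT = (TYPE = 'JSON')
""")
```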
Step 4 - Data modeling
Modeling raw data into a form useful for the business. In other words, step 4 uses a dedicated tool and SQL scripts to convert data from JSON format, which is difficult for non-technical people to work with, into the familiar tabular form. This makes it easy to connect reporting applications such as Tableau or Data Studio (now Looker Studio) and, if necessary, to write your own queries.
In addition to this conversion, the modeling process also makes the incoming data consistent with what is already in the warehouse and adds further useful metrics and dimensions to help with subsequent analysis, as the sketch below illustrates.
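A minimal, hypothetical sketch of such a modeling step, continuing the example above: raw JSON rows stored in a VARIANT column are turned into a typed, business-friendly view, including one derived metric. All table, view and field names are illustrative.

```python
# Modeling sketch: flatten raw JSON (VARIANT column `raw`) into a typed view.
# Reuses the cursor `cur` from the loading sketch above.
MODEL_GA_SESSIONS = """
CREATE OR REPLACE VIEW analytics.ga_sessions AS
SELECT
    raw:date::DATE                  AS session_date,
    raw:country::STRING             AS market,
    raw:sourceMedium::STRING        AS source_medium,
    raw:sessions::NUMBER            AS sessions,
    raw:transactionRevenue::FLOAT   AS revenue,
    -- derived metric added during modeling:
    raw:transactionRevenue::FLOAT
        / NULLIF(raw:transactions::NUMBER, 0) AS avg_order_value
FROM staging.ga_raw
"""
cur.execute(MODEL_GA_SESSIONS)
```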
Step 5 - Audit
An audit of data correctness in the views that users will work with. Comparing the data in Snowflake with the data in the source aims to find differences, identify their causes and provide recommendations for corrections in the data model; one such check is sketched below.
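As an example, an automated audit check might compare a daily metric in the warehouse against the same metric pulled from the source. This is a hypothetical sketch: it assumes the view from the previous step, that the source data has already been loaded into a pandas DataFrame source_df with SESSION_DATE and SESSIONS columns, and an arbitrary 1% tolerance threshold.

```python
# Audit sketch: flag days where warehouse and source session counts diverge.
# Reuses the cursor `cur`; Snowflake returns column names in upper case.
warehouse_df = cur.execute(
    "SELECT session_date, SUM(sessions) AS sessions "
    "FROM analytics.ga_sessions GROUP BY session_date"
).fetch_pandas_all()

merged = warehouse_df.merge(source_df, on="SESSION_DATE", suffixes=("_WH", "_SRC"))
merged["REL_DIFF"] = (
    (merged["SESSIONS_WH"] - merged["SESSIONS_SRC"]).abs() / merged["SESSIONS_SRC"]
)
print(merged[merged["REL_DIFF"] > 0.01])  # report days off by more than 1%
```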
Step 6 - Documentation
Before giving the green light, the dataset has to be documented: what sits in the views must be translated into business language and potential problems identified. This is the foundation without which using the warehouse would not be possible. In this step we also created sample data analysis scripts for end users, to inspire them to use the tool creatively; one such sample is sketched below.
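As an illustration, a sample script for end users could be as simple as pulling a ready-made view into pandas for ad-hoc analysis; the query and names below are hypothetical.

```python
# Sample end-user script: top markets by revenue over the last three months.
# Reuses the cursor `cur` and the illustrative view from the modeling sketch.
top_markets = cur.execute("""
    SELECT market, SUM(revenue) AS revenue
    FROM analytics.ga_sessions
    WHERE session_date >= DATEADD(month, -3, CURRENT_DATE)
    GROUP BY market
    ORDER BY revenue DESC
    LIMIT 10
""").fetch_pandas_all()
print(top_markets)
```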
Step 7 - Transfer to production
Transferring processes, tables and views from the test environment to production, announcing the changes to all users, and running dedicated training. Amid the multitude of other tasks this element is easy to overlook, yet it is crucial to the success of the entire process.
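For the technical side of the promotion, Snowflake's zero-copy cloning is one possible mechanism; treat the line below as an illustrative option rather than the project's actual deployment procedure, with placeholder database names.

```python
# One way to promote a whole test database to production in Snowflake:
# a zero-copy clone. Reuses the cursor `cur`; names are placeholders.
cur.execute("CREATE OR REPLACE DATABASE ANALYTICS_PROD CLONE ANALYTICS_TEST")
```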
Step 8 - Reporting
The last step is reporting: preparing reports and dashboards in Tableau and supporting end users with the questions that arise as they use them.
Results
The single-source data integration project was important for several reasons. The most significant of them was the availability of data that had previously been stored in different places, which hindered quick analysis, required extensive knowledge of the many technologies involved, and forced users to combine data from many sources on their own, such as the company's CRM, various client websites or Google Analytics.
Based on the data in Snowflake (alternatively, we also recommend BigQuery), Avon's analysts create reports in Tableau or write SQL scripts directly in the warehouse to extract the data they are interested in.
Teams can use the sample scripts created by our team. Moving the data into a single source gives you confidence that it is reliable and internally consistent, and that it can be used to make essential business decisions. Having the data in one place saves a lot of time and resources, and also makes cooperation between departments easier. Combining data from different sources also makes it possible to create more in-depth analyses and draw conclusions about the business as a whole, not just its individual elements.