
Modern Data Warehouse & Analytics Modernization: Navigating Your Data Landscape
  • April 29, 2024

The data analytics space has grown considerably over the past decade. Companies invest in their technology, adding new features and functionality that strive to improve the user experience and make users more productive. As technology advances, new tools enter the market promising to modernize the analytics and business intelligence experience, claiming to do something better or more intelligently than their predecessors. With the rise of modern cloud data warehouses, the modern data stack, and generative AI, companies are naturally curious about what is out there and are questioning whether their current analytics platform meets all of their needs.

This curiosity about analytics modernization can be a healthy exercise, but it must be approached carefully. Buying a new tool just because it is more modern and has more features is not a proper justification in itself. And when putting together and justifying the budget, it is not just a matter of comparing the licensing costs of the old and new tools. There are also costs to migrate existing artifacts to the new system, upskill developers on the platform, and train business users and drive adoption. All of these costs can certainly be justified, however, if the new platform delivers enough value to the organization in terms of business productivity and ROI and lowers the IT/infrastructure burden. In this blog, I am going to outline the primary questions that organizations’ BI teams should ask themselves when considering an analytics modernization.

  • Are your end users currently using your analytics tools? Why or why not?
  • Is your data volume creating performance issues for your dashboards or refresh schedules?
  • Is your logic (business and/or transformation) stored in your BI tool or your data warehouse?
  • Do AI features like auto-generated visualizations and dashboards add value to your analytics workflow?
  • Will moving to a new analytics platform decrease licensing costs and infrastructure burden?

 

Are your end users currently using your analytics tools? Why or why not?

This is a tough question for a lot of companies to answer. It means facing the fact that they may have made a significant investment of money, time, and resources only for adoption to never take off. It happens more often than you might think, and it is a tough pill to swallow. However, the gut reaction of “a new tool will drive more adoption” is not the right answer. It is better to first understand why your business users are choosing not to engage with your analytics platform. Are they getting their business questions answered another way, such as through Excel spreadsheets or static reports? Is the analytics platform too difficult for them to navigate or interact with? Do they trust the data represented in their analytics dashboards, and is the dashboard even showing the most up-to-date data?

Asking these questions and understanding why users are not engaging with your platform needs to come first in the analytics modernization process. It may very well be that the reasons they list for not using the tool can be solved by a more thorough understanding of your platform’s capabilities. At that point, what is needed is not analytics modernization but analytics and data literacy, which is vital for the rollout and implementation of any new tool. In addition, assigning stewards or champions who take ownership of the platform and any questions related to it provides a reliable point of contact for end users. These stewards also carry a technical responsibility: vetting and scheduling platform upgrades and communicating the new features, functionality, and sometimes UI changes that come with each update. Upgrading from a version of your analytics platform that is multiple years or even months old could resolve many of the concerns end users raise (and can ensure that your analytics software is patched against known security vulnerabilities).

As mentioned, this question needs to be answered first because it serves as the basis for the ones that follow. Listening to your user base and capturing their concerns will drive what type of project you embark on. If they are fully utilizing the current platform but running into technical challenges or spending more time on tasks than they think is necessary, a new platform may help. But a new analytics tool will not be the silver bullet for poor processes, data governance and quality issues, missing stewardship, or a lack of roadmap. Those require data strategy and literacy initiatives because they run deeper than the analytics layer.





Is your data volume creating performance issues for your dashboards or refresh schedules?

The data your organization managed ten years ago is almost certainly a fraction of your data today. Assuming you don’t purge historical data, transactional systems alone will multiply the amount of data in your lake/warehouse on a consistent basis. Pair that with the introduction of new data sources and use cases, and your gigabytes of data have probably grown to a couple of terabytes or even petabytes and aren’t slowing down anytime soon. As volume increases, systems that interact with this data will more than likely experience a performance hit if not scaled appropriately. When this happens, you may experience longer data load and refresh times in the data analytics tool or ETL process, which can hamper SLAs. Your users may also experience delayed visualization responsiveness or extended load screens. Just like when you get frustrated that a YouTube video takes more than a couple of seconds to load, your end users will feel the same when they are constantly waiting for their dashboard to render.

When these volume and performance issues occur, companies may implement temporary, band-aid solutions like limiting the scope of the data being analyzed. This may preserve performance, but it diminishes the quality of analytics insights when only certain time periods or data sources are considered. At this point, it makes sense to look into how you can improve the performance of the data processing, whether at the analytics tool level or the ETL/ELT and data warehouse level. This can be a lengthy exercise in itself, as doubling the horsepower of the VM that your analytics tool runs on is typically the last course of action you want to take. Routes like logic review and query optimization are more appropriate first steps but can take time to implement properly.

At the end of the day, maintaining performance sometimes comes down to the technical architecture and specifications of the analytics tool itself. For example, a tool that performs all operations in memory may be fantastic for small data volumes. However, trying to do in-memory operations on terabytes of data on an 8GB RAM VM is neither feasible nor recommended. Modern analytics tools have shifted in the live-query direction, passing queries to the data warehouse/lakehouse to execute before visualizing the results. This shifts the computing and cost burden to the data warehouse, which nowadays has more flexibility to scale both vertically and horizontally. Some tools even offer a hybrid approach, allowing both live-query and in-memory connections in the same dashboard. The live-query model assumes, however, that the data processing can be performed efficiently and cost-effectively in the data warehouse, and that the data is modeled in the warehouse so it can be queried efficiently by the analytics tool. As your data and use cases evolve, it is key to know at what point your current analytics tool will fail to perform at the level expected of it, and what your options are to optimize it before considering a replacement.
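To make the in-memory versus live-query trade-off concrete, here is a minimal sketch. The table, columns, and connection are hypothetical, and SQLite stands in for a cloud warehouse purely to keep the example self-contained: the in-memory pattern pulls every row to the client tier and aggregates there, while the live-query pattern pushes the aggregation down and transfers only summary rows.

```python
# Hypothetical example contrasting in-memory and live-query patterns.
# SQLite stands in for a cloud data warehouse so the sketch runs on its own.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (region TEXT, amount REAL);
    INSERT INTO orders VALUES ('EMEA', 120.0), ('AMER', 80.0), ('EMEA', 45.5);
""")

# In-memory pattern: pull every row to the BI/client tier, then aggregate locally.
rows = conn.execute("SELECT region, amount FROM orders").fetchall()
in_memory_totals = {}
for region, amount in rows:
    in_memory_totals[region] = in_memory_totals.get(region, 0.0) + amount

# Live-query pattern: push the aggregation down and transfer only summary rows.
live_query_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
)

print(in_memory_totals, live_query_totals)  # same answer, very different data movement
```

The answers are identical; the difference is where the work happens and how much data moves, which is exactly what drives dashboard responsiveness and warehouse compute cost at scale.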

 

Is your logic (business and/or transformation) stored in your BI tool or your data warehouse?

As mentioned previously, analytics tools used to function primarily by loading data into memory and performing the processing and calculations needed to prepare and visualize the data in dashboards. In a lot of cases, this logic is proprietary to the software (and its vendor family) and lives inside the tool. For example, in Power BI, data transformation is done in Power Query with the help of the M language, while front-end calculations and aggregates are handled by DAX. Power BI also has the capability to load data via native connectors prior to working with it in Power Query. On the other hand, a tool like ThoughtSpot cannot perform large-scale data transformation or loading but instead pushes that responsibility to the ELT and data warehouse technologies. ThoughtSpot’s front-end calculations are done inside “worksheets”, which are semantic representations of the data model. If one were to migrate from Power BI to ThoughtSpot, two sets of logic would need to be migrated and converted: the ETL logic and the aggregation logic.
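To illustrate what those two sets of logic can look like in practice, here is a hedged sketch in which a simple Power Query filter and a simple DAX measure are re-expressed as warehouse SQL issued from Python. All table, column, and measure names are hypothetical, and SQLite stands in for the warehouse so the example runs on its own; a real migration would map M and DAX constructs case by case rather than one-for-one.

```python
# Hypothetical migration sketch: BI-tool logic re-expressed as warehouse SQL.
# All table, column, and measure names are made up for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (order_id INTEGER, status TEXT, amount REAL);
    INSERT INTO sales VALUES (1, 'Shipped', 100.0), (2, 'Cancelled', 50.0), (3, 'Shipped', 25.0);
""")

# 1) Transformation logic. Power Query (M) equivalent, roughly:
#      Table.SelectRows(Source, each [Status] = "Shipped")
#    In the warehouse, this becomes a view owned by the ELT layer:
conn.execute("""
    CREATE VIEW shipped_sales AS
    SELECT order_id, amount FROM sales WHERE status = 'Shipped';
""")

# 2) Aggregation logic. DAX equivalent, roughly:
#      Total Sales = SUM(Sales[Amount])
#    In a live-query tool, this becomes a query against the governed view:
total_sales = conn.execute("SELECT SUM(amount) FROM shipped_sales").fetchone()[0]
print(total_sales)  # 125.0
```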

There are numerous benefits to having data logic reside and execute in the data warehouse layer as opposed to the data analytics layer. The primary one is that the data warehouse layer uses SQL or Python to perform these operations. Compared to proprietary scripting languages or interfaces, this provides greater flexibility, prevents vendor lock-in, supports more governed version control, and makes it easier to hire qualified talent. It is also valuable for potential future data warehouse migrations. Unless something drastic happens in the next decade or so, future data warehouses will still use some form of SQL or Python, allowing artifacts and data pipelines to be recreated with relative ease. Furthermore, the logic is executed once inside the modern data warehouse and can serve ready-to-go data to any downstream BI, ML, or AI tool, as opposed to each tool needing to execute the logic itself. While those tools will have their own unique use cases and requirements that involve individual processing, there is room to eliminate redundancy.

If your logic resides in your analytics tool, migrating to a live-query tool and moving your logic to a modern data warehouse can definitely be more performant and resilient. However, significant effort goes into this migration, especially if the current analytics tool is well established and has been operating as a pseudo-data warehouse. You will need a team of people who can translate the logic from one language to another and understand the lineage associated with each of the scripts, reports, and data sources. On top of that, once the transformation logic migration is complete, there is the migration of the calculations performed inside the dashboards and visualizations. These are typically straightforward for simple sums and counts but can get more complex for custom dimensions and advanced aggregations. When you factor in the size of your current deployment and the need to recreate the dashboards and visualizations once the logic is in place, you could be looking at a multi-year effort to migrate completely.

 

Do AI features like auto-generated visualizations and dashboards add value to your analytics workflow?

Moving away from developer responsibilities and into the user-focused side of modernization, companies will want to consider who is currently utilizing their analytics platform, how involved those users are in the creation of new artifacts, and the current backlog of analytics requests. In more “legacy” analytics applications, building dashboards and content took a considerable amount of skill and knowledge. These users were considered BI developers and would build according to end-user requests. Many times, these requests go through a prioritization and queueing process as the backlog forms. Over time, analytics platforms have lowered the learning curve for building dashboards and visualizations, promoting “self-service” for business users. The ability of business users to create their own content decreased the volume of requests for BI developers but increased the need for proper data governance and literacy around published and shared content.

Generative AI and related technologies have lowered the learning curve for analytics creation even further. A simple text prompt can return multiple KPIs, visualizations, and dashboards that attempt to answer the question at hand as well as questions related to the original prompt. This can be an incredible time saver for creation, modification, and customization. Once again, the quality of the output will rely heavily on the quality of the underlying data and how it is modeled. And while the generated content may not be 100% of what the end user is asking for, it can serve as a fantastic starting point that needs only minor tweaks or customizations to complete.

While the AI-assisted self-service direction is promising, the real question is how much additional value it gives the organization. A couple of prompts can create dozens of visualizations, but if none of those visualizations end up getting published, is there really a need for those features? Maybe the business logic is too complex to visualize without developer assistance, though this may only apply to a select number of use cases. With that in mind, generative AI is not meant to replace analytics creation and consumption entirely; human interaction and interpretation will still be needed. As you consider an analytics modernization, remember that AI should be evaluated as a set of features that assist the overall analytics workflow and increase the productivity of the users driving it.

 

Will moving to a new analytics platform decrease licensing costs and infrastructure burden?

In the previous sections, I spoke a lot about the costs associated with migrating analytics platforms. These costs can sometimes be difficult to estimate upfront, as requirements may change or complexities uncovered during the migration can extend the timeline. Conversely, licensing costs and the savings from downsizing or eliminating infrastructure are clearer and firmer. Licensing structures vary from tool to tool but can be broken into a few main categories: user-based, platform-based, and capacity-based, with some technologies mixing the three. Nowadays, perpetual licensing is less common due to the rise of SaaS and subscription models. However, one of the benefits of using SaaS technology is not needing to host, manage, and secure a server or VM for the software. This shifts infrastructure management to the software vendor, which reduces the internal management burden but still requires oversight on things like security.

If migrating to a new licensing model, it is crucial to review your company’s current usage and user count while also accounting for expected growth over the next few years. Licensing is usually discounted for longer subscription terms, so if you are expecting more widespread adoption or more use cases in production, it may make sense to plan for expansion. It is also important to account for any new peripheral technology costs associated with migrating to a more modern system. For example, if you are migrating to a live-query analytics platform, each dashboard refresh will send queries to the modern data warehouse. If the data warehouse charges based on usage, you should expect an increase in your data warehouse compute costs. Another consideration is the potential addition of more advanced use cases, such as embedded analytics.
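A back-of-the-envelope model can help frame these trade-offs before you get formal quotes. The sketch below is purely illustrative: every price, user count, and query volume is a hypothetical placeholder, not vendor pricing, and the shape of your own costs will depend on the licensing category and warehouse billing model you land on.

```python
# Back-of-the-envelope TCO sketch. Every figure below is a hypothetical
# placeholder; substitute your own vendor quotes and usage telemetry.

users = 250                       # expected licensed users after growth
price_per_user_per_month = 20.0   # hypothetical per-user subscription price
warehouse_cost_per_query = 0.002  # hypothetical average compute cost per live query
queries_per_dashboard_view = 8    # visuals that each fire a query on load
dashboard_views_per_month = 40_000

licensing = users * price_per_user_per_month * 12
live_query_compute = (
    queries_per_dashboard_view * dashboard_views_per_month
    * warehouse_cost_per_query * 12
)

print(f"Annual licensing:             ${licensing:,.0f}")
print(f"Added warehouse compute/year: ${live_query_compute:,.0f}")
print(f"Combined annual run cost:     ${licensing + live_query_compute:,.0f}")
```

Swapping in your own telemetry and quoted rates turns this from an illustration into a rough first pass at the new platform’s annual run cost.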

All of these costs should be factored into the total cost of ownership of the analytics platform. While some of them, like infrastructure, may permanently decrease compared to the system you are migrating off of, others, like licensing, might be even higher because of the new use cases, additional users and activity, and extra functionality the new platform brings. I will once again harp on the project costs associated with the actual migration to the new platform, from planning, strategy, and road mapping all the way down to QA, testing, and release. While the costs may be higher, the expected ROI and business justifications should guide your company’s decision.

Closing Thoughts on Modern Data Warehousing & Analytics Modernization 

Analytics modernization can bring a lot of benefits to an organization. It is crucial, however, not to fall into the “shiny new tool” trap and put features and functionality ahead of practical business use cases. Understanding the value that moving to a more modern system can bring requires proper strategy sessions and road mapping that detail the effort the project will take, who will be part of it, how long it will take to implement, and the estimated time to adoption. In the end, the goal is to enable more people with data and to grow the number of use cases that can be solved with data. As more tools enter the ecosystem, analytics modernization isn’t going anywhere, so make sure you can answer the questions that come along with it before embarking on the journey.

 

 

 

 
