
Making Data Pipelines Faster

Case Study

At a glance

This case study describes how SME Solutions Group delivered end-to-end data pipeline services, from ingestion at the data sources through algorithm development. The customer in this case study operates in the sports and entertainment industry.

10x

faster data processing using the "Velocity DataFrame"

99%

uptime, versus constant on-premises issues

60%

smaller codebase for future maintainability

Challenges

Facing obstacles posed by outdated legacy systems, this customer encountered a range of challenges that impeded their ability to fully utilize their data assets. This wasn't just a small setback in their operations; it served as a major hurdle on their journey towards operational excellence and innovation.

These challenges, deeply rooted in their technological infrastructure, required immediate and strategic action. Here is a brief summary of the main obstacles that SME Solutions Group was brought in to address.

  • The issues with their on-premises architecture were multifaceted and deeply entrenched, posing significant hurdles to the organization's ambitions for digital transformation. The virtual machines, once the backbone of their IT infrastructure, had become an incessant source of frustration due to their unreliability and propensity for breakdowns. These were not isolated incidents but a chronic pattern that disrupted daily operations and impeded productivity. The technology, in which they had invested heavily years ago, had aged not like fine wine but like a gallon of milk forgotten in the back of the fridge: soured and far past its prime.

  • Relying heavily on a single developer for coding expertise left the organization in a vulnerable position that needed to be addressed. The bespoke coding language, meticulously crafted by this individual, was a testament to their skill but posed a real challenge to operational resilience: a career move or unexpected departure would leave the organization without a crucial team member and with code no one else could maintain. This highlighted the urgent need to diversify coding practices and democratize technological knowledge within the organization. Moving toward standardized programming languages and practices, easily understood and adopted by a wider developer community, was crucial to mitigating this risk and fostering a more collaborative and agile technological environment.

  • The system's disarray wasn't just a minor inconvenience; it was a significant impediment that severely restricted department analysts' access to vital data. This restriction was a chokepoint that undermined productivity daily and fostered a narrow approach to problem-solving. Analysts found themselves trapped in silos, unable to access, share, or cross-reference data efficiently. The situation was akin to a high-stakes game of telephone, where each message is taken at face value with no opportunity for verification or challenge: data that should have been crystal clear arrived muddled and distorted by the time it reached decision-makers, producing a cascade of misinterpretations and misinformed decisions. The disarray not only slowed insight generation but also made tunnel vision the norm rather than the exception, forcing teams to navigate a fog of uncertainty and incomplete information and diminishing the potential for innovation and strategic foresight. A solution that could break down these barriers and restore the free flow of information was not just desirable but critical to revitalizing the organization's operational dynamics.


Solution Overview

To address the above challenges, SME Solutions Group formed a team consisting of a Data Engineer, a Project Manager, a Solutions Architect, and a Technical Lead. The team took a phased approach to the customer's challenges. Listed below are the key initiatives:

  • We spearheaded the transition from on-premises to cloud-based solutions, specifically leveraging Snowflake.

  • Our team crafted an efficient ingestion pipeline tailored to this new cloud environment.

  • We decoded the proprietary algorithms and rewrote them in standard languages that end users could easily read and maintain.

  • We provided support on automating processes and integrating best practices, using tools such as Airflow and Astronomer to streamline operations (a minimal sketch of such a pipeline follows this list).
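To make the orchestration concrete, here is a minimal sketch of what a daily ingestion DAG of this kind might look like in Airflow. The task names, staged path, and target table are hypothetical placeholders, not the customer's actual code:

    # Illustrative sketch of a daily ingestion DAG; the task names, staged
    # path, and target table are hypothetical, not the customer's code.
    from datetime import datetime

    from airflow.decorators import dag, task


    @dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
    def ingestion_pipeline():
        @task
        def extract_from_source() -> str:
            # Pull the latest batch from the upstream system, stage it in
            # cloud storage, and return the staged path.
            return "stage/latest_batch.parquet"

        @task
        def load_to_snowflake(staged_path: str) -> None:
            # Load the staged file into Snowflake, e.g., via a COPY INTO
            # statement issued through the Snowflake connector.
            print(f"COPY INTO raw.events FROM @{staged_path}")

        load_to_snowflake(extract_from_source())


    ingestion_pipeline()

Astronomer then provides the managed environment in which DAGs like this are deployed, scheduled, and monitored.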

A highlight of the solution is something the Data Engineer aptly named the "Velocity DataFrame," which achieved speeds 10 to 14 times faster than traditional methods. This wasn't about merely applying a pre-existing, out-of-the-box solution. The "Velocity DataFrame" leveraged Python, the de facto language of data science, and in particular its widely used pandas library. Pandas is convenient for development, but its default patterns can be slow, which is fine while prototyping but less than ideal for production runs. That was the reason for going into the codebase itself and recreating the base architecture within Python to speed it up, resulting in the "Velocity DataFrame," which became an integral drop-in module for the customer's DevOps team.
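The internals of the "Velocity DataFrame" are the customer's own, but the general pattern, replacing row-wise Python execution with vectorized pandas operations, illustrates how gains of this magnitude are possible. A minimal, hypothetical example:

    # General illustration only: the "Velocity DataFrame" internals are
    # proprietary; this shows how replacing row-wise pandas execution
    # with vectorized operations yields order-of-magnitude speedups.
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "price": np.random.rand(1_000_000),
        "qty": np.random.randint(1, 10, 1_000_000),
    })

    # Slow path: a Python-level function call per row.
    slow = df.apply(lambda row: row["price"] * row["qty"], axis=1)

    # Fast path: one vectorized expression over whole columns, commonly
    # 10x or more faster at this scale.
    fast = df["price"] * df["qty"]

    assert np.allclose(slow, fast)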

The feedback from the customer was overwhelmingly positive, praising not only the design and implementation but also the results achieved.

 

Key Initiatives:

CLOUD MIGRATION

In migrating from on-premises infrastructure to Snowflake, a thorough assessment of options was crucial. Understanding the unique needs of the organization and the capabilities of Snowflake ensured a seamless transition. The decision to move to Snowflake was propelled by its reputation for stability and reliability. Unlike the on-premises setup, which frequently encountered disruptions and breakdowns, Snowflake consistently delivers uninterrupted data ingestion and processing. This enhanced stability not only minimizes downtime but also ensures a continuous flow of critical data, driving operational efficiency and decision-making. Thus, the migration to Snowflake not only marks a technological shift but also signifies a strategic advancement towards a more robust and dependable data infrastructure.
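As an illustration of the ingestion pattern this migration enables, the following hypothetical snippet bulk-loads staged files into Snowflake with the Python connector; the account, credentials, stage, and table names are all placeholders:

    # Hypothetical sketch of bulk-loading staged files into Snowflake with
    # the Python connector; all names and credentials are placeholders.
    import os

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="your_account",
        user="your_user",
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse="INGEST_WH",
        database="ANALYTICS",
        schema="RAW",
    )

    with conn.cursor() as cur:
        # COPY INTO loads staged files in parallel and skips files that
        # were already loaded, so re-runs are safe.
        cur.execute("""
            COPY INTO raw.events
            FROM @ingest_stage/events/
            FILE_FORMAT = (TYPE = PARQUET)
            MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
        """)

    conn.close()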

 

FINDING BALANCE BETWEEN USER ACCESSIBILITY AND PERFORMANCE

Navigating the complex ecosystem of their business posed a significant challenge that demanded a solution built for speed and tailored to the needs of technical staff and analysts alike. This collaborative effort was supported by the previously mentioned "Velocity DataFrame." The creation of the module enhanced both the accessibility and the speed of data processing. Achieving a harmonious balance between user-friendliness and high performance became a cornerstone of our strategy.

 

ACCESSIBILITY TO THE DATA AND BEYOND

We changed the way data and algorithms were handled, moving away from an outdated, monolithic system to a more flexible, modular architecture. This pivotal shift was not just about breaking down the digital walls that confined our client's operational potential; it was about constructing a more adaptable and scalable framework. By transitioning to this dynamic, modular setup, we unlocked unprecedented processing power across the board. This wasn't merely a technical upgrade; it was an empowerment movement for data accessibility and collaboration.

This new architecture ushered in an era of enhanced interoperability and efficiency, ensuring that different departments and teams could interact with the codebase in ways that were previously impractical.

Moreover, the modular design meant that updates or changes could be implemented in one module without disrupting the overall system's functionality. It’s akin to upgrading the engine of a car without needing to redesign the entire vehicle. This flexibility ensured that our client's technological infrastructure could evolve alongside their business needs, without the typical growing pains associated with scaling up.
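A hypothetical sketch of that pattern in Python: the pipeline depends only on a small interface, so one module can be swapped (the "engine upgrade") without touching the rest. The Transformer protocol and scorer classes below are illustrative, not the customer's code:

    # Hypothetical sketch of the modular pattern; the Transformer protocol
    # and scorer classes are illustrative, not the customer's code.
    from typing import Protocol

    import pandas as pd


    class Transformer(Protocol):
        def transform(self, df: pd.DataFrame) -> pd.DataFrame: ...


    class LegacyScorer:
        def transform(self, df: pd.DataFrame) -> pd.DataFrame:
            df["score"] = df["raw"] * 0.5
            return df


    class VelocityScorer:
        # Drop-in replacement: same interface, faster internals.
        def transform(self, df: pd.DataFrame) -> pd.DataFrame:
            df["score"] = df["raw"].mul(0.5)
            return df


    def run_pipeline(df: pd.DataFrame, step: Transformer) -> pd.DataFrame:
        # The pipeline depends only on the interface, so swapping one
        # module (the "engine upgrade") never touches the rest.
        return step.transform(df)


    print(run_pipeline(pd.DataFrame({"raw": [1.0, 2.0]}), VelocityScorer()))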

 

MODERNIZE TEAM STRUCTURE

Under the initiative to modernize team structure, our support facilitated the seamless separation of departments, notably transitioning a data engineer out of the analyst bullpen and into a distinct DevOps department. This strategic move not only expanded the team but also optimized operational efficiency. Leveraging our expertise, we provided guidance on best practices and fostered interdepartmental communication, ensuring a smooth transition process.

REDUCE CODEBASE

The codebase underwent a significant streamlining, being pared down by at least 60%. In software development, it is quite common to incur what is known as technical debt: the implied cost of additional rework caused by choosing an easy, quick solution now instead of a better approach that would take longer. Technical debt accumulates over time as temporary fixes, outdated technologies, and hurriedly written code pile up. Our team diligently worked to reduce this debt for our customer. We undertook a systematic review and reengineering of the codebase, refactoring verbose, obfuscated, and "spaghetti" code, removing temporary fixes, and updating technologies. This proactive approach not only enhanced the performance and reliability of the customer's software but also erased the technical debt, providing a more robust and efficient product for our customer.
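The customer's code is proprietary, but a generic before-and-after example shows the flavor of this refactoring: duplicated, hard-coded branches collapse into one parameterized helper:

    # Generic before-and-after illustration; the customer's original code
    # is proprietary and not shown here.
    import pandas as pd

    # Before: duplicated, hard-coded branches, the classic shape of
    # technical debt.
    def clean_before(df: pd.DataFrame) -> pd.DataFrame:
        out = df.copy()
        if "name" in out.columns:
            out["name"] = out["name"].str.strip()
            out["name"] = out["name"].str.lower()
        if "team" in out.columns:
            out["team"] = out["team"].str.strip()
            out["team"] = out["team"].str.lower()
        return out

    # After: one parameterized helper replaces every duplicated branch.
    def clean_after(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
        out = df.copy()
        for col in (c for c in cols if c in out.columns):
            out[col] = out[col].str.strip().str.lower()
        return out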

Business Impacts

This case study illustrates a successful migration from on-premises infrastructure to Snowflake, driven by the organization's commitment to accessibility and operational efficiency. The decision to transition was underpinned by the need to address the frequent disruptions and bottlenecks inherent in the on-premises system, and the move eliminated the "game of telephone" dynamics that had distorted data on its way to decision-makers. Leveraging Snowflake's advanced capabilities, the organization redesigned existing open-source code and architectures, particularly harnessing Python, to accelerate the data ingestion pipeline. This re-engineering effort resulted in a remarkable performance improvement, making data processing 10 to 14 times faster. With a clear focus on enhancing accessibility and efficiency, the business successfully distributed the workload and democratized access to data and algorithms. Through this strategic migration, the organization not only achieved its technical objectives but also positioned itself for greater agility and competitiveness in the data-driven landscape.


Want to know more?

About Us

Connect with our team to discuss your organization's data strategy.

Contact Us

Our Subject Matter Experts can help you understand how we can meet your business goals.