November 2, 2022

5 Steps to Get Started with Enterprise Analytics

For companies looking to digitize business operations, custom data analytics can distill key financial and behavioral trends that lead to improved executive decision making. Common examples include trend graphs and automated reporting, Natural Language Processing (NLP) to extract business entities from reviews, or heatmaps describing population densities in physical business locations. However, getting started with analytics can be daunting for several reasons: many companies have little previous experience with analytics beyond Microsoft Excel, data sources can be messy or hard to come by, and some companies fear that the maintenance costs and complexity of a cloud analytics platform may be too expensive. Fortunately, all of these concerns can be overcome.

The following five steps can help resolve these key tensions and launch an analytics program:

1. Prioritize short- and long-term analytics goals

An analytics platform should be purpose-driven, which is why it is important to have a clear sense of the business needs the platform is designed to address. Distinguishing between short-term and long-term goals is key, as iteration is required to build a strong analytics platform. Short-term goals can deliver relatively low-cost, quick wins and build the basis for later development of advanced analytics.

Examples of short-term goals include simple dashboards showing activity metrics related to core business drivers, such as productivity per employee, time of day, or stage of production, to help highlight bottlenecks. These sorts of analytics can be prototyped quickly and then migrated to the cloud once a company is ready to commit to a cloud service.
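As a minimal sketch of this kind of short-term metric, the snippet below aggregates a hypothetical activity log (the log contents, field layout, and stage names are all illustrative assumptions, not data from any real system) to surface productivity per employee and flag the lowest-throughput production stage as a candidate bottleneck:

```python
from collections import defaultdict

# Hypothetical activity log: (employee, production_stage, units_completed)
activity_log = [
    ("alice", "assembly", 12),
    ("bob", "assembly", 9),
    ("alice", "packaging", 20),
    ("carol", "packaging", 7),
    ("bob", "inspection", 15),
]

def units_by(key_index, log):
    """Sum units completed along one dimension (employee or stage)."""
    totals = defaultdict(int)
    for row in log:
        totals[row[key_index]] += row[2]
    return dict(totals)

per_employee = units_by(0, activity_log)  # productivity per employee
per_stage = units_by(1, activity_log)     # throughput per production stage

# The stage with the lowest total throughput is a candidate bottleneck.
bottleneck = min(per_stage, key=per_stage.get)
```

Numbers like these can feed a simple dashboard directly; no cloud service is required at this stage.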

Longer-term goals typically focus on generating insights that wouldn’t be possible without advanced analytical methods, such as monitoring equipment health in real time, predicting future demand, or mapping user sentiment. These may require machine learning, computer vision, network analysis, or other advanced techniques.
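Even a longer-term goal like demand prediction usually starts from a trivially simple baseline. The sketch below (the demand history is hypothetical) forecasts the next period as a moving average of recent observations; real projects would replace this with a proper forecasting model, but a baseline like this is the yardstick that model must beat:

```python
# Hypothetical monthly demand history (units sold), oldest to newest.
history = [120, 132, 128, 141, 150, 149, 158, 163]

def moving_average_forecast(series, window=3):
    """Naive baseline: forecast the next period as the mean of the last
    `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)

forecast = moving_average_forecast(history)
```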

2. Prototype

Prototyping naturally follows the process of defining analytics goals: it takes the short-term goals established in the previous step and translates them into quick, business-driven wins. Prototyping can involve work both inside and outside of the cloud. For example, in recent years the process of creating dashboards has been accelerated by open-source frameworks built in Python or R. These dashboards can be very useful even in so-called “small data” settings, where a small prototype dashboard is hosted on an analyst’s machine and draws data from informal sources such as a shared company drive.

Dashboards also integrate easily with cloud data as a first-line mechanism for visualizing and conveying insights. For most projects, there is a relatively low-cost path to exploring these prototypes, which makes them appealing to implement. Data goals are abstract and nebulous; a prototype is a tangible asset that stakeholders can evaluate and that can lead to real-world traction.
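A “small data” prototype can be as humble as a static HTML page regenerated from a file on a shared drive. The sketch below (region names and revenue figures are invented for illustration) aggregates a tiny extract and renders it as a throwaway report an analyst can open locally, before any dashboard framework or cloud service is involved:

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical extract pulled from a spreadsheet on a shared company drive:
# (region, sale_amount)
rows = [("north", 1200.0), ("south", 950.0), ("north", 800.0)]

# Aggregate revenue by region.
revenue = defaultdict(float)
for region, amount in rows:
    revenue[region] += amount

# Render a minimal HTML "dashboard" for local viewing.
html = "<h1>Revenue by region</h1><table>" + "".join(
    f"<tr><td>{region}</td><td>{total:.2f}</td></tr>"
    for region, total in sorted(revenue.items())
) + "</table>"
Path("prototype_dashboard.html").write_text(html)
```

When the prototype earns traction, the same aggregation logic can be lifted into a proper dashboard framework and pointed at cloud data.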

3. Hire for data skills

Staffing a tech team to power a company's data-related capabilities is a difficult, deceptively technical step. Effective staffing first requires the soft skill of translating high-level data goals into technical objectives, which must then be mapped to roles capable of executing the technical requirements.

If no one in-house is capable of making those judgments, staffing decisions will likely involve hiring a lead technical person who has business acumen. A technical lead should make wider personnel assessments for the technical needs of the project while also exercising sensitivity to budget constraints.

Generally, necessary skills for an analytics team include:

  • Familiarity with the major programming languages associated with analytics (Python and R)
  • Familiarity with plotting and other forms of data visualization
  • Experience with table cleaning and manipulation, and/or statistical analysis

Additionally, a technical lead in charge of a data analytics project should be able to perform implementation-related work while staffing the team.

Rather than hiring talent in-house, a viable alternative is to staff a project through consulting services. The benefit of this approach is that a well-chosen consultancy outsources the complexity of the hiring process, since the technical experts already exist within the firm. This in turn can lead to quicker movement on short-, medium-, and long-term goals, depending on the length of the consulting engagement.

It is important for businesses that take this route to carefully consider future maintenance beyond the duration of work with the consulting agency. However, a good consulting agency may also provide guidance on these considerations, particularly assisting with hiring staff to maintain the solutions.

Lastly, it is important that the need for hiring and staffing be continually reassessed as new data goals become apparent.

4. Build a robust technology stack

Given the importance of prototyping for assessing the value of technological ideas, a robust technology stack is needed to facilitate both prototype and production development. Initially, short-term goals and prototypes can be achieved using technology and data held in a purely local environment (i.e., “small data” prototyping). However, a more sophisticated technology stack becomes the most efficient option as the organization’s data maturity and demand for analytics increase.

Choosing a cloud platform at this stage is often necessary. In general, AWS, Azure, Google Cloud, and other "big players" will likely be comparably effective for most cases. Since these services are typically pay-as-you-go, there is very little risk to choosing one cloud platform over another at this stage; the differences amount to AWS having the most services and being the most mature platform, while Azure is also relatively mature and somewhat cheaper on average than AWS or Google.

Beyond cost, more granular differences between cloud platforms are apparent when deciding on the major elements of a technology stack. While not an exhaustive list, some core elements of a technology stack include:

  • The data management layer: The database or other data storage software that will house the data. Will it be a relational database such as PostgreSQL, a NoSQL database such as MongoDB, a graph database such as Neo4j, or some combination of the three?
  • The code-based ETL/ELT pipelines: These pipelines are closely connected to the data management layer and involve operations to extract, transform, and load data (or extract, load, then transform it) so that it is in a more digestible format for analysts to use. One common ETL tool is Spark.
  • The analytics engine: This layer involves the statistical or programmatic components that live on top of the data and distill the data into insights. This layer might involve the use of machine learning, automated reporting, or a combination of both. Additionally, technology might involve the cloud platform’s API service to enable interacting with the analytics engine, and Python or R performing the statistical calculations behind the API.
  • Front-end components/user interfaces: This layer determines how the data will be presented to end users. Common technologies include Shiny, R’s dashboarding framework, and Plotly, a JavaScript visualization library with bindings in both Python and R.
  • Access control: This layer involves permissions around both external application use, and internal data use.
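To make the ETL idea above concrete, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for the data management layer (the order records, field names, and the currency-stripping rule are all hypothetical; a production pipeline would use a tool like Spark against a real database):

```python
import sqlite3

# Extract: hypothetical raw order records, amounts as currency strings.
raw_orders = [("o1", "$100.50"), ("o2", "$75.00"), ("o1", "$100.50")]  # note the duplicate

# Transform: parse amounts to floats and de-duplicate on order id.
clean = {order_id: float(amount.lstrip("$")) for order_id, amount in raw_orders}

# Load: insert into the data management layer (an in-memory SQLite database here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", clean.items())

# The analytics engine can now query tidy data.
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
```

The same extract-transform-load shape scales up: swap the list for a source system, the dictionary comprehension for Spark transformations, and SQLite for PostgreSQL or a cloud warehouse.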

5. Implement data governance and data quality procedures

Data governance is extremely important because proper permissions and policies help to ensure the quality, integrity, and retention of a company’s data. Failure to institute data governance and quality procedures can create vulnerabilities that degrade the user experience and contribute to network security failures, data breaches, and other unnecessary business risks.

Access control is typically considered a key component of data governance. There are several models for data access control, but a common one is Role-Based Access Control (RBAC). In RBAC, data access is built around predefined roles and their permissions. For example, a company might establish a “data scientist” role with access to read and modify data and raw code, while a “consumer” role might only be able to read a few specific tables of data on a platform’s front end.
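The core of RBAC is small enough to sketch directly. Assuming the two hypothetical roles described above, a role-to-permissions mapping and a single check function capture the model (real platforms would delegate this to the cloud provider's IAM service rather than hand-rolling it):

```python
# Hypothetical role → permission mapping mirroring the example roles above.
ROLE_PERMISSIONS = {
    "data_scientist": {"read_data", "modify_data", "read_code", "modify_code"},
    "consumer": {"read_reports"},
}

def is_allowed(role, permission):
    """Grant access only if the role's predefined permission set includes it;
    unknown roles get no access at all."""
    return permission in ROLE_PERMISSIONS.get(role, set())
```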

Other governance controls may be added as the organization matures and addresses data quality. Data quality checks often include uniqueness and consistency checks. For example, a uniqueness check might ensure that no user is duplicated in a table of users within a back-end database, while consistency checks ensure that select fields always satisfy defined rules (such as no periods appearing in a username, or database usernames matching other existing records).
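Both kinds of check are straightforward to express in code. The sketch below (the user records and the no-period username rule are illustrative assumptions) flags duplicated usernames and usernames that violate a consistency rule:

```python
import re

# Hypothetical user records from a back-end table.
users = [
    {"id": 1, "username": "adaLovelace"},
    {"id": 2, "username": "alan.turing"},   # violates the no-period rule
    {"id": 3, "username": "adaLovelace"},   # duplicate username
]

def duplicate_usernames(records):
    """Uniqueness check: return the set of usernames appearing more than once."""
    seen, dupes = set(), set()
    for rec in records:
        name = rec["username"]
        (dupes if name in seen else seen).add(name)
    return dupes

def invalid_usernames(records, pattern=re.compile(r"^[A-Za-z0-9_]+$")):
    """Consistency check: flag usernames with periods or other disallowed characters."""
    return [rec["username"] for rec in records if not pattern.match(rec["username"])]
```

Checks like these can run inside the ETL pipeline so bad records are caught before they reach analysts or models.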

As the adage goes, “garbage in, garbage out”: machine learning models need a “diet” of very clean data.

Iterate

A useful analytics platform will be continually refined as existing data-related goals are completed and new goals are brought into focus. For example, if initial analytics goals involved prototyping an exploratory analysis to visualize revenue drivers in detail, the next step might be transferring the interface to the cloud and delivering credentials directly to stakeholders so they can explore the platform at their convenience and monitor changes and trends. As previous generations of prototypes are completed and productionized, new data-related goals should be defined for the next generation and scaled toward strategic business goals accordingly.

With each iteration, new data may require additional governance mechanisms, and depending on the level of effort required, new staffing and deadlines should be carefully considered to meet stakeholder expectations.

An incremental, iterative approach is often the best way to build a new analytics platform, and it lays an important foundation for a company beginning its tech transformation.

Want to work with Plainspoken?


Plainspoken has extensive experience designing and implementing analytics capabilities using the above steps. Our technological transformations have taken clients to the next level, driving entry to new markets and enabling new organizational capabilities. We’d love to do the same for your business.

Get in touch with us today if you’d like to learn how we can help your organization implement custom data analytics or other data tools.

About the author:

Em Beaman is an experienced Data Science practitioner with 5 years of experience in both predictive modeling and data engineering. Her predictive modeling knowledge ranges from traditional statistics and reporting to Deep Learning and Computer Vision. When she is waiting for models to train she likes taking long walks on the beach. She graduated from Georgetown with a Masters in Statistics in 2018.

Discuss a Project with Plainspoken
