The Client: A large integrated steel manufacturer.

Business Problem: The steel being produced must meet a set of composition tolerances, and by optimizing/adjusting the operating parameters inside the blast furnace the composition of steel can be controlled. The client wanted to predict the composition of the output in near real-time using operating parameters as an input to the model.

Our Approach: INSOFE built a model to predict the composition of the steel (carbon and silicon content) from a large number of operating parameters. Our model-building process included understanding the chemistry of steel production and the nature of the relationships involved, selecting a model structure to suit the underlying dependencies, and training the models.

Results:  For 85% of the cases, the prediction was within the RMSE desired by the client.


The Client: A North American clothing retailer. The retailer had brick-and-mortar stores across the country in addition to an online store.

Business Problem: Shipping costs have a significant impact on the profit margins of e-tailers. Retailers who have a physical presence as well as an online presence have the option of shipping items from a store in addition to shipping from one of the central warehouses. The decision of how best to fulfill an order is complex due to the number of considerations involved.

In clothing, it is common for as much as half the orders to have more than one item. Also, given the variety of items available online, it is frequently the case that no single store has all the items in an order in stock. In such a case, the order needs to be split across multiple stores.

There are multiple considerations:

  1. Reduction of the time for the goods to reach the customer.
  2. Shipping costs, which depend on distance and package size for each shipment.
  3. Minimizing the number of separate shipments: receiving multiple packages at different times for the same order tends to annoy customers.
  4. Inventory optimization – Shipping items from stores that have excess inventory reduces inventory costs and reduces the chances of out-of-stock situations in stores.

There are naturally trade-offs. Reducing transit time might imply shipping the order in 3 shipments from 3 nearby stores, but the shipping cost will be high and the customer may be annoyed by 3 separate deliveries for a single order. At the other extreme, shipping the entire set of goods in one shipment from a central warehouse may be ideal to reduce the number of shipments, but it does not allow store inventory to be best utilized. Arriving at the best solution requires allowing the business to decide the trade-offs by setting weights for the different objectives.

Our Approach: Our approach involved:

  1. Developing a suitable objective function via simulation. We developed a simulation framework that allowed the business to test-drive the optimization and experiment with different relative weights for the various objectives.
  2. An interface to allow the user to tune the weights of different components of the objective function.
  3. Application of business constraints and search for feasible solutions.

The business objective was a combination of the number of separate shipments, shipping time, shipping cost, order margins, and store inventory levels.
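
To make the idea of a weighted objective concrete, here is a minimal sketch in Python. It is not the algorithm that was deployed: the field names, the weights, the sign conventions and the two candidate plans are all hypothetical, chosen only to show how changing the weights changes which fulfillment plan is preferred.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """One candidate way of fulfilling an order (all fields are illustrative)."""
    name: str
    shipments: int               # number of separate packages
    transit_days: float          # time for the goods to reach the customer
    shipping_cost: float         # total shipping cost for the order
    order_margin: float          # margin earned on the order
    excess_inventory_used: int   # items drawn from overstocked stores


def plan_cost(plan, weights):
    """Weighted sum to minimize: penalties are added, benefits are subtracted."""
    return (
        weights["shipments"] * plan.shipments
        + weights["transit_days"] * plan.transit_days
        + weights["shipping_cost"] * plan.shipping_cost
        - weights["order_margin"] * plan.order_margin
        - weights["inventory"] * plan.excess_inventory_used
    )


def best_plan(plans, weights):
    """Pick the feasible plan with the lowest weighted cost."""
    return min(plans, key=lambda p: plan_cost(p, weights))


# Hypothetical candidates mirroring the trade-off described above.
plans = [
    Plan("three nearby stores", 3, 1.0, 18.0, 40.0, 3),
    Plan("central warehouse", 1, 3.0, 9.0, 40.0, 0),
]

# Two hypothetical weight settings a business user might try.
speed_first = {"shipments": 1.0, "transit_days": 10.0, "shipping_cost": 0.5,
               "order_margin": 0.1, "inventory": 1.0}
fewest_packages = {"shipments": 10.0, "transit_days": 1.0, "shipping_cost": 0.5,
                   "order_margin": 0.1, "inventory": 1.0}

print(best_plan(plans, speed_first).name)
print(best_plan(plans, fewest_packages).name)
```

With a heavy weight on transit time the split shipment from nearby stores scores best, while a heavy weight on the number of shipments favors the single warehouse shipment. A real system would also need to enumerate feasible plans under the business constraints, which this sketch leaves out.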

Deployment: The algorithm processed a batch of new orders every 30 minutes to an hour. No human decision-making was involved.

Results: About US$1M in savings per year through a combination of improved order margin and shipping costs while minimizing the number of split orders (separate shipments to a customer).


The Client: A Fortune 100 pharmaceutical manufacturer.

Business Problem: Honesty in performing and reporting data and results from clinical trials is key to the advancement of medical knowledge. In clinical trials, volunteers are given a treatment whose effectiveness and risks are to be ascertained. This requires carefully controlled experimentation, close monitoring of the volunteers and diligent recording of measurements. There is a huge dependency on honesty and trust in the individuals or organizations that are tasked with performing these clinical trials. The article : ( ) describes some publicized cases of fraud in clinical trials involving falsification of data ranging from minor alterations to complete fabrication of the data!

To identify potential falsifications of the data, one has to rely on the presence of some regular patterns in genuine data. Various types of patterns exist. These concern the distributions of each variable, correlations among variables, the multivariate structure of the data and patterns across time. Typically, a clinical trial is run across multiple centres with identical protocols. When data is falsified at one centre, some of those natural patterns are likely to break down unless the data is very carefully falsified. Sometimes, the data may not exhibit the natural randomness one would expect to see in genuine data. For example, in one well-publicized case that is mentioned in the article referred to above, a Norwegian physician and researcher was found to have completely fabricated data: 250 out of the 908 subjects in his data had the same date of birth!
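
As an illustration of the simplest kind of check described above (a variable that lacks natural randomness), the sketch below measures how concentrated a field is on a single value at each centre. It is only a toy example and not the client's model: the function names, the threshold and the sample data are hypothetical.

```python
from collections import Counter


def top_value_share(values):
    """Return (most common value, share of records carrying that value)."""
    if not values:
        return None, 0.0
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)


def flag_centres(records_by_centre, threshold=0.1):
    """Flag centres where one value accounts for more than `threshold` of records.

    The threshold is an illustrative assumption; a sensible value depends on
    the variable and on the sample size at each centre.
    """
    flagged = {}
    for centre, values in records_by_centre.items():
        value, share = top_value_share(values)
        if share > threshold:
            flagged[centre] = (value, share)
    return flagged


# Hypothetical dates of birth recorded at two centres.
records_by_centre = {
    "A": [f"1950-01-{d:02d}" for d in range(1, 21)],  # 20 distinct dates
    "B": ["1948-03-12"] * 6 + [f"1951-02-{d:02d}" for d in range(1, 15)],
}

flagged = flag_centres(records_by_centre)
print(flagged)
```

For scale, in the case mentioned above 250 of 908 subjects shared a date of birth, a share of about 27.5%. Checks on correlations, multivariate structure and patterns across time need more elaborate methods than this.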

The client desired an improved alert system for a specific type of fraud that can happen during clinical trials. The existing model was detecting only a fraction of the fraud, and the client wanted a more comprehensive system.

Our Approach: We performed an initial study, which identified that the current model and evaluation metrics used by the firm were most likely inadequate and could not reveal a substantial portion of the fraud. Work was then done to:

  1. Redefine the evaluation metrics to be more effective and custom to the problem
  2. Prototype and test an ML model
  3. Build the entire production system
  4. Develop a full-fledged visualization and reporting system for decision-makers

The models had to comply with legal regulations that excluded the use of certain historical data and certain types of analysis. In our case, we could not make use of patterns over time, because these would have disclosed whether a subject was in the placebo group or in the test group.

Deployment: We delivered and deployed an E2E system that pulled data from the database, ran ML models to generate alerts, and pushed the alerts back to the database. A visualization layer gave users (including auditors) the ability to perform interactive analysis of the data in the context of an alert.

The complete project lasted a little over 6 months, with the client rolling it out across the entire organization.

Results: A complete production system was delivered and implemented on the client’s assets. The IP and knowledge were fully transferred.


The Client: A large children’s education and play toys company.

Business Problem: In the consumer products industry, customer loyalty is a key driver of long term profitability. It is much more expensive to acquire new customers than to retain existing customers.  Competitors are constantly trying to make customers jump ship by offering attractive incentives.

With businesses constantly trying to steal share from competitors, it is key for businesses to know who their most valuable customers are, so that they can keep them satisfied and make appropriate interventions when there are indications that a customer may be changing loyalties. It would be foolish to try to retain customers who are not very profitable, because the cost of the intervention may not justify the return.

The hard part is distinguishing customers who are going to be valuable in the long term (say, 2 years) from those whose profitability over two years would not justify attempting to retain them.

The concept of customer lifetime value (CLV) is intended to capture the long term profitability of a customer.  For customers with higher CLV, a higher amount of spending on loyalty programs and interventions is justified.

Estimating the CLV of a customer is non-trivial. The data that is available is recent spending history (i.e. their transactions). However, particularly for new customers, recent spending alone may not indicate their long-term spending potential. In the case of education and play toys, the ability to provide age-appropriate toys from a child's infancy through adolescence means that loyalty can translate to significant profitability. Here, the customer is the parent, and the arrival of a second or subsequent child multiplies the profit potential.

A key to estimating customer lifetime value is to identify features from transaction data that are key indicators of a customer's long-term spend potential.

Our Approach: Our approach involved going beyond the industry-standard general-purpose features of Recency-Frequency-Monetary value to identify additional features that matter in the play toys category and which are indicative of long term spend potential. For this, we used customer data spanning multiple years to predict long term spend.
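
For readers unfamiliar with the Recency-Frequency-Monetary baseline mentioned above, the sketch below computes the three features per customer from a list of transactions. The tuple layout, the function name and the sample data are hypothetical. The additional category-specific features developed in the project are not described here, so they are not reproduced.

```python
from collections import defaultdict
from datetime import date


def rfm_features(transactions, as_of):
    """Compute baseline RFM features per customer.

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    as_of: the reference date from which recency is measured.
    """
    by_customer = defaultdict(list)
    for customer_id, purchase_date, amount in transactions:
        by_customer[customer_id].append((purchase_date, amount))

    features = {}
    for customer_id, rows in by_customer.items():
        last_purchase = max(d for d, _ in rows)
        features[customer_id] = {
            "recency_days": (as_of - last_purchase).days,  # days since last purchase
            "frequency": len(rows),                        # number of purchases
            "monetary": sum(a for _, a in rows),           # total spend
        }
    return features


# Hypothetical transaction history.
transactions = [
    ("c1", date(2020, 1, 10), 25.0),
    ("c1", date(2020, 3, 5), 40.0),
    ("c2", date(2019, 11, 20), 15.0),
]

features = rfm_features(transactions, as_of=date(2020, 3, 31))
print(features["c1"])
```

Features like these describe only recent behavior, which is why they can understate the potential of a new customer whose spending is likely to grow as a child ages.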

We built a state-of-the-art machine learning model that bucketed customers into seven loyalty categories. The key steps involved:

  1. Working with the business to identify potential features.
  2. Feature refinement to finalize key features that are strong predictors of long-term customer spend.
  3. Building an ML model to perform the prediction.
  4. Integrating it into their Big Data environment and Tableau visualization engine.

Deployment: We delivered and deployed an E2E system with a visualization layer.  We also generated rules out of the ML model which provided extremely interesting insights to the business.

Results: Some really interesting features were designed. Some of them were game changers and increased accuracy substantially. Based on the learnings of this project, INSOFE was able to design a feature engineering framework, which it now teaches to business managers.
