Using statistical models to predict future sales

BrainTech, LLC: Using Statistics to build Models of Customer Behavior


Use Multiple Regression to develop models that forecast future sales


We use weather forecasts and time-series and seasonal trends for data analysis


Exploratory Analysis using an Iterative Process
Weather, Sales Trend demand forecasting
Exploratory analysis has different meanings depending on the context in which it is used. For instance, the terms "exploratory analysis" and "confirmatory analysis" are frequently associated with a statistical technique known as factor analysis. Don't be misled by this common association! When we refer to exploratory analysis, we are referring to a process used to select combinations of variables that are likely to be useful in creating a model that closely reproduces the actual (sales) data. Here is the problem:
  1. There are hundreds of variables that might combine to produce a model that closely reproduces the actual (sales) data. Variables include weather-related data, time trends, holiday proximity, day-of-week variables, and 2-way, 3-way, and 4-way interactions between all of the above (to name just a few!).
  2. Each variable may be correlated with many other variables. For instance, wind speed is correlated with wind gust speed, and may also be correlated with rain and warmer or cooler temperatures.
  3. At this point, some statisticians may be thinking, "Use structural equation modeling or factor analysis to extract a measure of 'bad weather'."  However, this approach is not appropriate, because...
  4. While there are hundreds of possible predictor variables, the number of data points available for analysis is often small (usually in the hundreds). Therefore, it is critically important that we avoid simply capitalizing on chance. In other words, models must be logical, meaningful, and likely to contain actual predictor variables (as opposed to containing a mishmash of variables that happen to be correlated with sales by chance).
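
To make the "capitalizing on chance" problem concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not part of our tool: both the "sales" series and every predictor are pure random noise, and the function names are invented for this example. It compares the strongest correlation found among 5 noise predictors with the strongest found among 500, using 100 data points in both cases.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def best_chance_correlation(n_points, n_predictors, seed=0):
    """Largest |r| between pure-noise predictors and a pure-noise target."""
    rng = random.Random(seed)
    # Hypothetical "sales": random numbers with no real structure.
    sales = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_predictors):
        noise = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(noise, sales)))
    return best

few = best_chance_correlation(n_points=100, n_predictors=5)
many = best_chance_correlation(n_points=100, n_predictors=500)
print(f"best |r| among 5 noise predictors:   {few:.2f}")
print(f"best |r| among 500 noise predictors: {many:.2f}")
```

Because both calls share a seed, the 500-predictor run contains the same first 5 predictors, so its best correlation can only be equal or larger. None of these predictors has any real relationship with the target, which is why a variable that merely "correlates with sales" is not enough.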


Our Solution:
  1. The exploratory process we use relies on large amounts of computational power and the computer's ability to quickly generate semi-random and random permutations of variable subsets from the full list of possible variables (see diagram below).
  2. The process we use is considered intellectual property of BrainTech, LLC and cannot be disclosed (but it shouldn't be hard to figure out for those who understand mathematics and statistics).
  3. After exploratory analysis, we are left with a frequency distribution of subsets of variables.
  4. Using the principle of maximum likelihood, we select subsets of variables that are likely to combine to produce a model that closely fits the actual (sales) data. * Some individuals have suggested using correlations and residuals to iteratively select the variables that produce the model that best fits the data. However, this approach would be similar to performing exploratory factor analysis with hundreds of variables using a tiny set of data (not a good thing)!
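
Since the actual process is not disclosed, the following is only a generic sketch of the idea in steps 1 to 3: draw random subsets of variables, score each subset, keep the promising ones, and count how often each variable appears among them. Everything here is an assumption made for illustration: the scoring rule (hold-out error of an ordinary least-squares fit), the synthetic data, the variable names `x0` to `x9`, and all function names.

```python
import random
from collections import Counter

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes the system is non-singular (no perfectly collinear predictors).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

def holdout_mse(data, y, subset, split):
    """Fit on the first `split` rows using `subset`; return MSE on the rest."""
    rows = [[row[v] for v in subset] for row in data]
    coef = fit_ols(rows[:split], y[:split])
    errors = [
        (coef[0] + sum(c * x for c, x in zip(coef[1:], r)) - t) ** 2
        for r, t in zip(rows[split:], y[split:])
    ]
    return sum(errors) / len(errors)

def variable_frequencies(data, y, names, n_draws=600, subset_size=3, n_keep=40, seed=0):
    """Count variable occurrences among the n_keep best-scoring random subsets."""
    rng = random.Random(seed)
    split = int(len(y) * 0.7)
    scored = []
    for _ in range(n_draws):
        subset = tuple(sorted(rng.sample(names, subset_size)))
        scored.append((holdout_mse(data, y, subset, split), subset))
    scored.sort()
    return Counter(v for _, subset in scored[:n_keep] for v in subset)

# Synthetic data (an assumption): only x0 and x1 actually drive "sales".
rng = random.Random(1)
names = [f"x{i}" for i in range(10)]
data = [{n: rng.gauss(0, 1) for n in names} for _ in range(120)]
sales = [3 * row["x0"] - 2 * row["x1"] + rng.gauss(0, 0.5) for row in data]

freq = variable_frequencies(data, sales, names)
print(freq.most_common())
```

In this toy setup the two real predictors dominate the counts, while the noise variables show up only as occasional passengers in subsets that happen to contain a real predictor. That is the kind of frequency distribution step 3 refers to.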

Frequency distribution of variable occurrence in promising subsets.

If you're lost or confused, you're probably not alone -- here is an example and a diagram that should (somewhat) clarify the process:
  • Perhaps we have the following list of variables: HighTemp, LowTemp, WindSpeed, Rain Amount, Day-of-week (DOW).
    Some 2-way Interactions: LowTemp*WindSpeed, LowTemp*DOW, HighTemp*DOW, HighTemp*Rain.
    Some 3-way Interactions: LowTemp * Rain * WindSpeed, LowTemp * Rain * DOW.
  • Simplified version: After exploratory analysis, we may be left with a distribution that looks like the image to the right.
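
The candidate list in the example above can be generated mechanically. The sketch below enumerates every 2-way and 3-way interaction for the five example variables and evaluates one of them for a single observation. The identifier `RainAmount` (written without a space), the helper names, and the sample values are assumptions for illustration. Treating day-of-week as a single numeric column is also a simplification; in practice it would usually be coded as several indicator variables.

```python
from itertools import combinations

base = ["HighTemp", "LowTemp", "WindSpeed", "RainAmount", "DOW"]

def interaction_terms(variables, max_order):
    """Return every product term of order 2..max_order as an 'A*B*...' label."""
    terms = []
    for order in range(2, max_order + 1):
        terms.extend("*".join(combo) for combo in combinations(variables, order))
    return terms

def interaction_value(term, row):
    """Evaluate one interaction term for a single observation (dict of values)."""
    value = 1.0
    for name in term.split("*"):
        value *= row[name]
    return value

terms = interaction_terms(base, max_order=3)
print(len(terms))  # 10 two-way + 10 three-way = 20

# Hypothetical observation, values chosen only to show the arithmetic.
day = {"HighTemp": 55.0, "LowTemp": 40.0, "WindSpeed": 10.0, "RainAmount": 0.5, "DOW": 3.0}
print(interaction_value("LowTemp*WindSpeed", day))  # 400.0
```

Even with five base variables the list grows to 25 candidates once interactions are included, which is why the number of possible predictors quickly outruns the number of data points.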

After this point, we select combinations of the most frequently selected variables and perform the process again. This time, however, the initial weighting of each variable and its order of insertion and tuning are "seeded" (partially determined, partially random) by the variable's frequency of occurrence. With fewer variables to consider, we build thousands of "mini-models", each fit to a subset of the data and validated on the remaining portion (cross-validation). Based on how often each variable is selected during this process (specifically, the algorithm looks for repeated convergence on the same set of variables), the final output is one reasonably sized set of variables (on a good day) that is likely to produce a relevant, logical, potentially predictive model.
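
The second pass can be sketched in the same generic way. This is not the proprietary algorithm, only one plausible reading of the paragraph above. Subsets are drawn with probability weighted by each variable's earlier frequency (using the Efraimidis-Spirakis weighted-sampling trick), each "mini-model" is an ordinary least-squares fit on a random 70% of the rows and scored on the remaining 30%, and "convergence" is approximated by counting which subset wins most often. The frequencies, the synthetic data, and all names are made up for the example.

```python
import random
from collections import Counter

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination (assumes a is non-singular)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def validation_mse(data, y, cols, train, valid):
    """Fit OLS on the `train` rows using `cols`; return MSE on the `valid` rows."""
    design = [[1.0] + [data[i][c] for c in cols] for i in train]
    k = len(cols) + 1
    xtx = [[sum(d[p] * d[q] for d in design) for q in range(k)] for p in range(k)]
    xty = [sum(d[p] * y[i] for d, i in zip(design, train)) for p in range(k)]
    coef = solve(xtx, xty)
    errors = [
        (coef[0] + sum(w * data[i][c] for w, c in zip(coef[1:], cols)) - y[i]) ** 2
        for i in valid
    ]
    return sum(errors) / len(errors)

def weighted_subset(frequencies, size, rng):
    """Draw `size` distinct variables, favouring frequently selected ones.

    Each variable gets the key u ** (1 / weight); the largest keys are kept.
    """
    keys = {name: rng.random() ** (1.0 / w) for name, w in frequencies.items()}
    return frozenset(sorted(keys, key=keys.get, reverse=True)[:size])

def seeded_search(data, y, frequencies, size=2, rounds=40, candidates=8, seed=0):
    """Count how often each candidate subset wins a round of mini-models."""
    rng = random.Random(seed)
    rows = list(range(len(y)))
    wins = Counter()
    for _ in range(rounds):
        rng.shuffle(rows)  # new random train/validation split each round
        cut = int(len(rows) * 0.7)
        train, valid = rows[:cut], rows[cut:]
        drawn = [weighted_subset(frequencies, size, rng) for _ in range(candidates)]
        best = min(drawn, key=lambda s: validation_mse(data, y, sorted(s), train, valid))
        wins[best] += 1
    return wins

# Made-up first-pass frequencies and synthetic data in which x0 and x1 matter.
frequencies = {"x0": 30, "x1": 20, "x2": 5, "x3": 5, "x4": 4}
rng = random.Random(2)
data = [{n: rng.gauss(0, 1) for n in frequencies} for _ in range(120)]
sales = [3 * row["x0"] - 2 * row["x1"] + rng.gauss(0, 0.5) for row in data]

wins = seeded_search(data, sales, frequencies)
print(sorted(wins.most_common(1)[0][0]))
```

Seeding by frequency means the search spends most of its effort on variables that already looked promising, while the random component still gives rarely selected variables an occasional chance to prove themselves on held-out data.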

The offline version of our exploratory analysis tool repeats this process several hundred more times. (The online tool repeats the process three times, due to CPU constraints.) Each repetition yields a single potential set of variables. These sets are passed on to the next stage: Fine-Tuning Potential Models.

Copyright © 2006 BrainTech, LLC. All Rights Reserved