We have seen a surge of interest in machine learning over the last couple of years. It would seem that everybody is interested in the current and future potential of this technology.
Artificial intelligence (AI) is a field of computer science concerned with using computational power to solve advanced problems in science and society through intelligent algorithms. It can be broken down into several subfields, namely: brain-inspired AI (Artificial Neural Networks), community-inspired AI (Swarm Intelligence), evolution-inspired AI (Evolutionary Computation), Fuzzy Logic, and Immunological Computation.
Machine learning, a subfield of AI, leverages intelligent techniques developed in AI to adapt models to specific objectives. These objectives include predictive and prescriptive analysis. This article discusses a novel machine learning approach for making predictions about the cash flow process.
This article introduces the cash flow model; later articles will go into more detail on the algorithms, implementation, and accuracy metrics.
Most businesses aim to make a profit through sound financial management of income and expenses. Cash flow management is the ability to understand or estimate how much cash the company will have on hand at any time. The ability to predict cash inflows and outflows, and to evaluate whether a shortfall or surplus will occur, is therefore an important part of organisational finance, and payment behaviours are driven by this knowledge. Optimizing the activities around cash flow management is key to efficient cash utilization, balanced cash reserves, and early identification of potential cash shortfalls.
The need to predict cash flow correctly at any stage is the basis for integrating artificial intelligence. AI helps financial subject matter experts understand all the moving parts of the process, including payment behaviour and payment trends per customer at product level. Accurate cash flow prediction leads to better financial management and investment decisions, improved payment tracking and payment behaviour, and ultimately increased profits.
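To make the idea of evaluating a shortfall or surplus concrete, here is a minimal Python sketch. It assumes we already have a list of predicted, dated cash flows (inflows positive, outflows negative). The function names, the opening balance, and the figures are all hypothetical and serve only as an illustration; this is not our production model.

```python
from datetime import date


def project_balance(opening_balance, flows):
    """Return (day, balance) pairs after applying dated cash flows in date order.

    `flows` is a list of (date, amount) tuples; inflows are positive,
    outflows are negative. All values here are illustrative assumptions.
    """
    balance = opening_balance
    projection = []
    for day, amount in sorted(flows, key=lambda flow: flow[0]):
        balance += amount
        projection.append((day, balance))
    return projection


def first_shortfall(projection):
    """Return the first day on which the projected balance is negative, or None."""
    for day, balance in projection:
        if balance < 0:
            return day
    return None


# Hypothetical predicted flows for one month.
flows = [
    (date(2024, 3, 5), -4000.0),   # supplier payment
    (date(2024, 3, 12), 2500.0),   # customer payment
    (date(2024, 3, 20), -3000.0),  # expenses
    (date(2024, 3, 28), 6000.0),   # customer payment
]

projection = project_balance(4000.0, flows)
shortfall_day = first_shortfall(projection)
print(shortfall_day)  # the first day the projected balance dips below zero
```

With these made-up numbers the balance ends the month in surplus but dips below zero on one day in between, which is exactly the kind of gap that better payment predictions help to spot early.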
In the first iteration of our cashflow prediction model we focus on four (4) areas:
We aim to answer the following questions with our model:
When we look at cash flow from a company's perspective, each customer has a certain culture or habit associated with how they pay their invoices. Some have payment patterns that are easily observable, and some have patterns that are quite hard to observe. Between these two classes of customers lie some whose patterns are somewhat fuzzy. These customers exhibit patterns sophisticated enough that a human being would take a long time to predict them. If the expertise needed for such a task can be built into an algorithm, a computer can perform the same task independently, much faster and with far fewer errors than a human being can, thereby augmenting the human's performance in the task.
The remainder of this article is organized as follows: we outline the data platform that our algorithms interact with, followed by the algorithms used, and then the standards and frameworks surrounding the implementation.
We currently use more than 30 technologies in our data platform, so we give only a brief overview of the platform as it relates to the cash flow process.
Our analyses are based on data extracted from Sage Business Cloud Accounting (formerly Sage One). Sage data is pulled into our scalable Hadoop cluster, where it is stored. Hadoop is a distributed and scalable data platform with its own storage mechanism, the Hadoop Distributed File System (HDFS), which has been designed to offer optimized performance in storage, extraction, and computation. HDFS allows for storage of both structured and unstructured data.
We use a self-service data solution as an intelligent ETL tool for data extraction from the Hadoop environment into a form that can be fed into multiple data engines such as Qlik, Power BI, Tableau, and other visualization technologies. The ETL tool is similar to Apache Hive but offers some additional benefits, especially around complex and unstructured data processing. The tool provides a federated data layer to Qlik Sense directly.
Predictive analysis is a form of analysis where some known variable(s) are used to predict the value(s) of some unknown variable(s). For example, given the previous quarter's statistics, one can predict this quarter's statistics, provided that some consistency is observed.
In cash flow, our focus is to predict customer payment, supplier payment, and expense payment behaviours. When an invoice is issued to a customer, a due date is given; however, most customers do not pay their invoices on the due date. Therefore, given enough invoices previously issued to the same customer, we can filter these invoices by sale or item type and project the dynamic pattern that emerges when we look at how many days it took the customer to settle each invoice. This pattern, which can be seen as an average payment behaviour, is used to predict when newly issued invoices will be paid. Many methods could be applied; however, one with minimal error remains key to excellent cash flow predictions, which is what we have achieved. A similar technique works for supplier invoices. Expenses follow different trends per expense account, and therefore require predictions at account description level. A unique model has been developed to predict expense payment dates and amounts, taking into account company growth rates.
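The simplest version of the "average payment behaviour" idea can be sketched in a few lines of Python. This is only a baseline under stated assumptions, not the model described in this article: it takes the plain mean of days-to-pay per customer and item type, and the customer names, item types, and dates below are invented for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

# Hypothetical invoice history: (customer, item_type, issued_on, paid_on).
history = [
    ("acme", "services", date(2024, 1, 2), date(2024, 2, 6)),
    ("acme", "services", date(2024, 2, 1), date(2024, 3, 12)),
    ("acme", "services", date(2024, 3, 1), date(2024, 4, 15)),
    ("acme", "hardware", date(2024, 1, 10), date(2024, 1, 20)),
]


def average_days_to_pay(invoices):
    """Group invoices by (customer, item_type) and average the days to settle."""
    delays = defaultdict(list)
    for customer, item_type, issued_on, paid_on in invoices:
        delays[(customer, item_type)].append((paid_on - issued_on).days)
    return {key: mean(values) for key, values in delays.items()}


def predict_payment_date(averages, customer, item_type, issued_on):
    """Predict a payment date as issue date plus the average delay.

    Returns None when there is no history for this customer and item type.
    """
    key = (customer, item_type)
    if key not in averages:
        return None
    return issued_on + timedelta(days=round(averages[key]))


averages = average_days_to_pay(history)
predicted = predict_payment_date(averages, "acme", "services", date(2024, 5, 1))
print(predicted)
```

A plain mean treats every past invoice equally, so it reacts slowly when a customer's behaviour drifts. That limitation is one reason a more careful fit with an explicit error measure is needed.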
In a classification task the error is easily measurable; in a predictive task it is not as easy. Therefore, whenever a hypothesis is produced, a relative error metric has to accompany it. With invoice payment date predictions, the error is an interval rather than a single number or target. We model customer invoice payment behaviour as a system whose mean range remains invariant over time and, using properties from the ergodic theory of dynamical systems with an invariant measure, we were able to derive a fit that better approximates the pattern. This fit is built using machine learning (parameter optimization) techniques that originate in numerical analysis, a field of applied mathematics used heavily in artificial intelligence and machine learning tasks.
Our analytics work is done mainly in Python; however, we are able to build equally competent solutions in Java, Scala, R, C#, and C++. Python was chosen because it allows complex algorithms to be implemented with minimal code and coding time. Moreover, most of the work is I/O bound, and therefore depends less on the language's performance.
Our analyses are performed with Python 3, and the results are pushed into the Hadoop cluster, where our self-service data product is configured to extract the data and make it available for visualization, which is implemented with Qlik Sense.
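To illustrate what it means for the error to be an interval, the sketch below builds a deliberately simple interval around the average payment delay and checks how often later invoices fall inside it. This is a stand-in technique (mean plus or minus the mean absolute deviation), not the ergodic-theory fit described above, and the delay values are made up.

```python
from statistics import mean


def delay_interval(delays):
    """Return (low, high) as mean delay plus/minus the mean absolute deviation."""
    centre = mean(delays)
    spread = mean(abs(d - centre) for d in delays)
    return centre - spread, centre + spread


def coverage(interval, delays):
    """Fraction of observed delays that fall inside the interval (inclusive)."""
    low, high = interval
    inside = sum(1 for d in delays if low <= d <= high)
    return inside / len(delays)


# Hypothetical days-to-pay for one customer: older invoices, then newer ones.
training_delays = [30, 34, 38, 42, 36]
later_delays = [35, 40, 33, 38]

interval = delay_interval(training_delays)
hit_rate = coverage(interval, later_delays)
print(interval, hit_rate)
```

Reporting the interval together with its hit rate on later invoices gives a prediction and its error measure side by side, which is the pairing the paragraph above calls for.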
“In God we trust, all others bring data.” — W Edwards Deming