A commercial ‘scan and store’ document service provider in the UK approached Fountech Solutions with the idea of improving the speed and efficiency of its services, enhancing profitability by reducing labour costs and time.
The organisation received paper and electronic files from a variety of commercial organisations that are obliged to store documentation for legal purposes for at least seven years. Businesses use third-party digital storage solutions to avoid having to store paper records, which are a fire hazard and take up a lot of physical space. Furthermore, the third party can retrieve a document for a client much more efficiently than someone looking through archives manually.
The client’s solution would automatically index all documents passing through its system. However, some documents couldn’t have their class predetermined, so a human would need to manually read and classify each of those ‘strays’. This exposed the system to human error, meaning that not every document would necessarily receive equal consideration. Furthermore, the cost of classifying manually was unsustainably high.
Traditional (non-AI) software couldn’t provide an adequate solution because it requires a given document’s structure to be rigidly defined; this is inherently problematic for ‘fuzzy’ documents, whose classifications might well be inaccurate. As the number of features that may combine to define a class is effectively unlimited, the problem becomes ever more complex. There is also no industry-wide standard taxonomy, so each company has its own classification structure. An identical document might be classified differently depending upon which organisation owns it.
AI-driven software was required to deal with the complexities of so many database (DB) queries in short spaces of time. To keep it flexible, the solution was developed in Fountech’s labs, with each theory tested in practical terms at the client’s normal places of activity. This proved to be an invaluable strategy, ironing out problems as they arose rather than on the first day the new solution was installed.
Production of a model - The classifier’s role is first to train itself on all the documents already known to be in a given class, so that it can model weighted features within them. AI is adaptive: it doesn’t just model class features as they are, it can retrain to incorporate changes that may occur over time, whereas a non-AI solution remains static.
AI is Perceptive - Even the most minute changes are detected and modelled. These are likely to go unnoticed by humans, yet a combination of small changes can influence the outcome of a close-call classification. A weight is calculated for each class feature depending on the likelihood of finding that feature across all classes; a feature carries more weight if it appears frequently in its own class and rarely in others. This list of feature weights for each class is referred to as a data model.
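To make the idea of a data model concrete, here is a minimal Python sketch. It is not Fountech’s actual weighting scheme, which the case study does not disclose; it assumes a TF-IDF-style formula in which a feature’s frequency within its own class is scaled down the more classes it appears in. The name `build_data_model` and the input layout are hypothetical.

```python
import math
from collections import Counter


def build_data_model(docs_by_class):
    """Return {class: {feature: weight}} from {class: [list of token lists]}.

    Assumed weighting (illustrative only): in-class relative frequency
    multiplied by a log factor that shrinks as the feature appears in
    more classes.
    """
    n_classes = len(docs_by_class)
    # Count how often each feature occurs inside each class.
    counts = {
        label: Counter(tok for doc in docs for tok in doc)
        for label, docs in docs_by_class.items()
    }
    # Number of classes in which each feature appears at least once.
    class_freq = Counter(tok for label in counts for tok in counts[label])

    model = {}
    for label, counter in counts.items():
        total = sum(counter.values())
        model[label] = {
            tok: (n / total) * math.log((1 + n_classes) / class_freq[tok])
            for tok, n in counter.items()
        }
    return model
```

With this formula, a word such as “invoice” that occurs often in one class and nowhere else receives a higher weight for that class than a word such as “date” that occurs in every class.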
Database Querying - The solution connects to a DB with multiple connections so that batches of documents can be retrieved simultaneously.
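One way to picture batch retrieval over multiple connections is the sketch below. The case study does not name the database product or schema, so SQLite, the `documents` table and its `id` and `body` columns are stand-in assumptions, as are the function names.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def fetch_batch(db_path, offset, limit):
    """Fetch one batch of documents over its own connection."""
    conn = sqlite3.connect(db_path)  # one connection per batch request
    try:
        return conn.execute(
            "SELECT id, body FROM documents ORDER BY id LIMIT ? OFFSET ?",
            (limit, offset),
        ).fetchall()
    finally:
        conn.close()


def fetch_all_batches(db_path, total, batch_size, connections=4):
    """Retrieve all batches, using up to `connections` connections at once."""
    offsets = range(0, total, batch_size)
    with ThreadPoolExecutor(max_workers=connections) as pool:
        # pool.map returns the batches in the same order as the offsets.
        return list(
            pool.map(lambda off: fetch_batch(db_path, off, batch_size), offsets)
        )
```

Opening a separate connection per batch keeps the worker threads from sharing state; a production system would more likely draw connections from a pool.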
Text Pre-processing - Like humans, AI will get confused if meaningful text is hidden amongst irrelevant or ‘noisy’ text. ‘Stop-words’, the words that link meaningful words (e.g. “and” or “the”), are the most common form of noise. Thus, a document needs to be stripped of all ‘noise’ before processing.
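Stop-word removal can be sketched in a few lines of Python. The stop-word list below is a tiny illustrative sample rather than the list used in the solution, and `preprocess` is a hypothetical name.

```python
import re

# Illustrative sample only; a real list would be much longer.
STOP_WORDS = frozenset(
    {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}
)


def preprocess(text, stop_words=STOP_WORDS):
    """Lower-case the text, split it into word tokens and drop stop-words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [tok for tok in tokens if tok not in stop_words]
```

For example, `preprocess("The invoice and the receipt")` keeps only the two meaningful words, `invoice` and `receipt`.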
Document Deconstruction - There are countless ways to re-interpret text and derive meaning. The most valuable approach is to split the documents of each class into their constituent words so that a vocabulary is generated per class. Capturing more high-quality features during deconstruction is vital for training an accurate model from a given lexicon.
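The per-class vocabulary described above might be built along these lines. This is a sketch under assumptions: whitespace tokenisation stands in for whatever tokeniser the solution uses, and the optional word-pair features (`n_max=2`) are one illustrative way of capturing more features, not something the case study specifies. Both function names are hypothetical.

```python
from collections import Counter


def extract_features(tokens, n_max=1):
    """Return all word n-grams of length 1..n_max as space-joined strings."""
    feats = []
    for n in range(1, n_max + 1):
        feats.extend(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return feats


def build_vocabularies(docs_by_class, n_max=1):
    """Return {class: Counter of features} from {class: [document strings]}."""
    vocab = {}
    for label, docs in docs_by_class.items():
        counter = Counter()
        for text in docs:
            counter.update(extract_features(text.lower().split(), n_max))
        vocab[label] = counter
    return vocab
```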
Activity Logging - Log all actions taken and results generated for trace-back purposes.
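As an illustration of trace-back logging, the sketch below uses Python’s standard `logging` module to write one line per classification to a file. The logger name, the log format and the recorded fields (`doc_id`, `label`, `score`) are assumptions chosen for the example.

```python
import logging


def make_audit_logger(path):
    """Create (or reuse) a logger that appends audit lines to `path`."""
    logger = logging.getLogger("classifier.audit")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on reuse
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def log_classification(logger, doc_id, label, score):
    """Record which class a document was assigned and with what score."""
    logger.info("classified doc_id=%s label=%s score=%.3f", doc_id, label, score)
```

Each entry carries a timestamp, so a disputed classification can later be traced back to the moment it was made.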
Real-world testing of the solution - During the development cycle we held multiple functionality test demos for the client. Due to data-privacy issues, the client couldn’t provide us with direct access to their databases without one of their personnel in attendance. As the tests were performed using the client’s hardware, we found that initial processing was far too slow and taxing on the servers, which required a redesign. Once we streamlined the classification verification querying system, the situation was alleviated immediately, and results quickly improved.
Our client reported that, as time went on, the solution became ever more efficient. Fewer human interventions to classify documents were required, and retrieval times became faster as the accuracy of document classification into categories improved.
Many more enhancements are planned for the next generation of the Fountech solution, after further client feedback is received. These improvements may include AI-derived text patterns and a resource-aware classifier that obviates the need for the client’s personnel to be in attendance to oversee data privacy. Most interesting of all is the facility to split document sets into ‘batches’, so that the batches can be processed in parallel, utilising all of a server’s available processor resources.
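The planned batch facility could look something like the following Python sketch. Since the feature is still on the roadmap, everything here is an assumption: the function names, the fixed batch size, and the use of `concurrent.futures.ProcessPoolExecutor` to spread batches across processor cores.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_batches(items, batch_size):
    """Split a list into consecutive batches of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def process_in_parallel(items, worker, batch_size,
                        executor_cls=ProcessPoolExecutor, max_workers=None):
    """Run `worker` on each batch in parallel and flatten the results.

    `worker` takes one batch (a list) and returns a list of results. With
    the default process pool it must be a top-level, picklable function;
    max_workers=None lets the pool size itself to the available CPUs.
    """
    batches = split_into_batches(items, batch_size)
    results = []
    with executor_cls(max_workers=max_workers) as pool:
        # map() yields batch results in submission order.
        for batch_result in pool.map(worker, batches):
            results.extend(batch_result)
    return results
```

Passing `executor_cls=ThreadPoolExecutor` instead swaps in threads, which can be a better fit when the per-batch work is dominated by database or disk waits rather than computation.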
The vCAIO will conduct a situational assessment to identify the needs of, and opportunities for, the organisation. They will provide a report outlining a gap analysis and the baseline from which to start defining the AI strategy going forward.
Create a vision of where the organisation should be going
Set standards for strategy implementation through establishing the correct management structures and considering ethical issues
Make things happen by operating at the right level, knowing what to outsource and what to achieve in-house for optimum delivery
Build capacity in the organisation by recruiting the right talent and identifying appropriate vendors or consultants to solve key problems
Become the public face of the new technology, both internally and externally
Understanding your needs and identifying opportunities to exploit
Evaluate your data
Knowledge Gaps | Costs v Benefits | Your Competitors
Latest Industry Tech and how we can enhance it just for you
Our key recommendations and suggested next steps