case study

Document Classification


How Fountech Solutions automated and improved a file classification system for a commercial document storage organisation

sector 

Document Management

the client

A commercial ‘scan and store’ document service provider in the UK approached Fountech Solutions to improve the speed and efficiency of its services, and so raise profitability by reducing labour costs and time.

The organisation receives paper and electronic files from a variety of commercial organisations that are obligated to retain documentation for legal purposes for at least seven years. Businesses use third-party digital storage to avoid keeping paper records, which are a fire hazard and take up a lot of physical space. Furthermore, the third party can retrieve a document for a client much more efficiently than anyone searching through archives manually.

the problem

The client’s system automatically indexed every document passing through it. However, some documents could not have their class predetermined, so a human had to read and classify each of those ‘strays’ manually. This exposed the system to human error, meaning that not every document was guaranteed equal consideration. Furthermore, the cost of manual classification was unsustainably high.

Traditional (non-AI) software couldn’t provide an adequate solution because it requires a given document’s structure to be rigidly defined; this is inherently problematic because ‘fuzzy’ classifications, where a document does not match a rigid definition, might well be inaccurate. As the number of features that may combine to define a class is effectively unlimited, the problem becomes ever more complex. There is also no industry-wide standard taxonomy, so each company has its own classification structure: an identical document might be classified differently depending on which organisation owns it.

the solution

AI-driven software was required to deal with the complexity of so many database (DB) queries in short spaces of time. To keep it flexible, the solution was developed in Fountech’s labs, with each theory tested in practice at the client’s normal places of activity. This proved to be an invaluable strategy, as problems were ironed out as they arose rather than on the first day the new solution was installed.

Production of a model - The classifier’s first role is to train itself on all the documents already known to be in a given class, so that it can model weighted features within them. AI is adaptive: it doesn’t just model class features as they are, it can retrain to incorporate changes that may occur over time, whereas non-AI input remains static.

AI is Perceptive - Even the most minute changes are detected and modelled. These are likely to go unnoticed by humans, and a combination of small changes can influence the outcome of a close-call classification. A weight is calculated for each class feature depending on the likelihood of finding that feature across all classes; a feature carries more weight if it appears frequently in its own class and rarely in others. The list of feature weights for each class is referred to as a data model.
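The case study does not give the weighting formula, so the following is only a minimal sketch of the idea, assuming a TF-IDF-style score computed over classes rather than documents. The function name `feature_weights`, the smoothing term and the sample counts are all hypothetical.

```python
import math
from collections import Counter

def feature_weights(class_vocab):
    """Return {class: {word: weight}} using a TF-IDF-style score over classes.

    Assumption: weight = (share of the word within its class)
                       * log((1 + number of classes) / classes containing the word).
    """
    n_classes = len(class_vocab)
    # Number of classes in which each word occurs at least once.
    class_freq = Counter(word for counts in class_vocab.values() for word in counts)
    model = {}
    for label, counts in class_vocab.items():
        total = sum(counts.values())
        model[label] = {
            word: (count / total) * math.log((1 + n_classes) / class_freq[word])
            for word, count in counts.items()
        }
    return model

# Hypothetical word counts per class.
class_vocab = {
    "invoice": Counter({"invoice": 4, "total": 2}),
    "contract": Counter({"agreement": 3, "total": 1}),
}
data_model = feature_weights(class_vocab)
for label, weights in data_model.items():
    print(label, {word: round(w, 3) for word, w in weights.items()})
```

Under this assumed scoring, "invoice" outweighs "total" within the invoice class because "total" also appears in the contract class, which matches the behaviour described above.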

Database Querying - The solution connects to a DB with multiple connections so that batches of documents can be retrieved simultaneously.

Text Pre-processing - Like humans, AI gets confused if meaningful text is hidden amongst irrelevant or ‘noisy’ text. ‘Stop-words’, the words that link meaningful words (e.g. “and” or “the”), are the most common form of noise. A document therefore needs to be stripped of all ‘noise’ before processing.
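As an illustration only, stop-word removal could look like the sketch below. The `STOP_WORDS` set is a deliberately tiny, hypothetical list, and `preprocess` is an assumed helper name; the case study does not say which list or tokeniser was used.

```python
import re

# Hypothetical, deliberately small stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}

def preprocess(text):
    """Lower-case the text, split it into word tokens and drop stop-words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]

print(preprocess("The invoice and the receipt for the order"))
# → ['invoice', 'receipt', 'order']
```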

Document Deconstruction - There are countless ways to interpret text and derive meaning. The most valuable approach is to split the documents of each class into their constituent words so that a vocabulary is generated per class. Capturing more high-quality features during deconstruction is vital for training an accurate model from a given lexicon.
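A per-class vocabulary of this kind can be sketched as follows, assuming each document has already been reduced to a list of word tokens and carries a known class label. The function name `build_vocabularies` and the sample documents are hypothetical.

```python
from collections import Counter, defaultdict

def build_vocabularies(labelled_docs):
    """Map each class label to a Counter of the words seen in its documents."""
    vocab = defaultdict(Counter)
    for label, tokens in labelled_docs:
        vocab[label].update(tokens)
    return dict(vocab)

# Hypothetical training documents: (class label, word tokens).
docs = [
    ("invoice", ["invoice", "total", "vat"]),
    ("invoice", ["invoice", "payment", "total"]),
    ("contract", ["agreement", "party", "total"]),
]
vocabularies = build_vocabularies(docs)
print(sorted(vocabularies["invoice"].items()))
# → [('invoice', 2), ('payment', 1), ('total', 2), ('vat', 1)]
```

Keeping counts rather than a plain set of words preserves how often each word occurs in a class, which a later weighting step can draw on.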

Activity Logging - All actions taken and results generated are logged for trace-back purposes.
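A minimal sketch of such an audit trail, using Python's standard `logging` module, is shown below. The logger name, message format and logged fields are assumptions made for illustration; an in-memory buffer stands in for a real log file.

```python
import io
import logging

def make_audit_logger(stream):
    """Create a logger that writes one line per action to the given stream."""
    logger = logging.getLogger("classifier.audit")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep audit lines out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger

buffer = io.StringIO()  # stands in for a log file
audit = make_audit_logger(buffer)
audit.info("classified doc_id=%s as class=%s weight=%.2f", 17, "invoice", 0.73)
print(buffer.getvalue(), end="")
# → INFO classified doc_id=17 as class=invoice weight=0.73
```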

Real-world testing of the solution - During the development cycle we held multiple functionality test demos for the client. Due to data-privacy constraints, the client couldn’t give us direct access to their databases unless a member of their staff was in attendance. As the tests were performed on the client’s hardware, we found that initial processing was far too slow and too taxing on the servers, which required a redesign. Once we streamlined the classification verification querying system, the situation was alleviated immediately and results quickly improved.

the outcome

Our client reported that, as time went on, the solution became ever more efficient. Fewer human interventions were required to classify documents, and retrieval became faster as classification accuracy improved.

Many more enhancements are planned for the next generation of the Fountech solution once further client feedback is received. These improvements may include AI-derived text patterns and a resource-aware classifier that obviates the need for the client’s personnel to be in attendance to oversee data privacy. Also of particular interest is the ability to split document sets into ‘batches’ so that each batch can be processed in parallel, utilising all of a server’s available processor resources.
