Although digitizing government has become easier, the amount of unstructured data that agencies hold remains a steep barrier to full transparency. Artificial intelligence could be the answer.
Despite years of investing in better storage and analytics, many organizations still struggle to make use of their data. Too often, agencies have an abundance of “dark data” — data that is undiscovered, underutilized or otherwise untapped. Even if these organizations have fully embraced digitization, one of the challenges is that much of their valuable data is trapped in documents such as contracts, invoices, policies and meeting minutes, and they have no effective way of getting it out and making use of it.
Government agencies have long struggled with how to make use of the unstructured data found in most documents. There are two general ways to address this issue, and both have serious shortcomings.
One option is to extract data directly from traditional electronic documents, such as PDFs, Word files or HTML documents. For structured data, such as the amount owed on an invoice, extraction may be simple and even automated. But for unstructured data, it is less straightforward and usually manual. For example, an invoice may include a description of the services provided. To process this information, project managers must verify that the services align with work on an approved contract and that they describe work that was actually performed. This likely involves reviewing multiple other documents, all of which also contain unstructured data, and may require the specialized skills of additional government workers, such as lawyers or procurement officials.
The other option is to use structured documents, meaning electronic documents in which the various elements have meaningful labels. The most common method is to use a standard like XML. In XML, the creator of a document can use a schema that defines the elements in the document, the data types of those elements, and any defaults or attributes of those elements. Unfortunately, creating structured documents can be tedious and technical, and changes to schemas must be closely monitored and validated; otherwise, the systems that depend on those documents may stop working.
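To make the idea of a structured document concrete, here is a minimal Python sketch that reads labeled fields from a small XML invoice. The element names, attribute names and values are invented for illustration and do not come from any real agency schema. The standard library parser used here does not check a document against a schema, so the type conversion and the default currency are handled by hand.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical invoice markup; element and attribute names are illustrative only.
INVOICE_XML = """
<invoice id="INV-001">
  <vendor>Example Paving Co.</vendor>
  <amountOwed currency="USD">1250.00</amountOwed>
  <serviceDescription>Resurfaced the north parking lot.</serviceDescription>
</invoice>
"""


def read_invoice(xml_text):
    """Pull labeled fields out of a structured invoice document."""
    root = ET.fromstring(xml_text)
    amount = root.find("amountOwed")
    return {
        "id": root.get("id"),
        "vendor": root.findtext("vendor"),
        # Convert the text to a numeric type, as a schema's data type would imply.
        "amount_owed": Decimal(amount.text),
        # Fall back to a default when the attribute is absent.
        "currency": amount.get("currency", "USD"),
        "description": root.findtext("serviceDescription"),
    }


invoice = read_invoice(INVOICE_XML)
print(invoice["amount_owed"], invoice["currency"])
```

Because every value carries a label, the amount owed can be read without guessing where it sits on the page. The free-text service description is labeled too, but its meaning still has to be interpreted by a person or another tool.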
Artificial intelligence is creating a new option for organizations to make better use of data in their documents. Using natural language processing, deep learning and other methods, AI can help recognize and categorize data in documents and then mark up that data to create a structured document.
The challenge is not just extracting data from documents, but obtaining both data and metadata to create meaning, so that information can be understood in context.
Using AI might help solve this problem. For example, if analysts are searching through thousands of unstructured medical documents for the word “penicillin,” AI-generated markup would let them distinguish between instances where the drug is listed in reference to an allergy and instances where it is listed as a prescription.
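The penicillin example can be sketched in Python, assuming an AI tagging step has already marked up each record. The sketch does not perform any AI itself. The records, the `<allergy>` and `<prescription>` element names, and the `find_mentions` helper are all hypothetical, and show only how a search might use such labels once they exist.

```python
import xml.etree.ElementTree as ET

# Hypothetical records after an AI tagging step; the <allergy> and
# <prescription> element names are assumptions made for illustration.
RECORDS = {
    "record-a": "<record><allergy>penicillin</allergy><note>Patient reports hives.</note></record>",
    "record-b": "<record><prescription>penicillin</prescription><note>Ten-day course.</note></record>",
    "record-c": "<record><prescription>ibuprofen</prescription></record>",
}


def find_mentions(records, term):
    """Group record IDs by the labeled context in which `term` appears."""
    hits = {}
    for record_id, xml_text in records.items():
        root = ET.fromstring(xml_text)
        for element in root.iter():
            text = (element.text or "").strip().lower()
            if text == term.lower():
                # The element's tag is the context the tagging step assigned.
                hits.setdefault(element.tag, []).append(record_id)
    return hits


mentions = find_mentions(RECORDS, "penicillin")
print(mentions)
```

A plain keyword search would return both penicillin records together. Grouping by label separates the allergy mention from the prescription, which is the distinction the analysts need.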
For government agencies, this opens up new possibilities. Richer semantic data could help an agency better manage a wide variety of documents, such as invoices, contracts and proposals, and eventually use the technology to answer questions from the data contained within them.
Summarized from www.govtech.com