In a nutshell, Cognition adds word meaning to text-oriented computer applications, providing actionable content based upon semantic knowledge. Cognition identifies “tank” in “tanks used in WWII” as a type of military vehicle, and “tank” in “20 gallon salt-water tanks” as a container with synonyms such as “aquarium”. This is accomplished without significant increases in text processing time or in the size of any indexed content stored on disk. Please see What We Do for more information.
Cognition's Semantic Natural Language Processing™ technologies can be applied to virtually any text-oriented computer application to provide the added power, precision and recall that come from using word meanings as opposed to keywords or pattern matches.
Existing Cognition Applications
- Semantic Search
  A suite of tools to create, manage, search and leverage a concept (meaning) based index of inter/intranet content.
- Concept driven Topic Discovery and Document Categorization
  Tools to automatically identify key document content directly in terms of concepts in Cognition's Semantic Map or mediated by a 3rd party topic/vocabulary resource.
- Concept driven Document Foldering
  Tools and process to sort documents into folders given conceptual relevance to a set of topics or queries, culling out the likely irrelevant documents and reducing manual review time and cost, typically by 50%.
- Concept driven Document Similarity
  Tools to identify and rank documents/text similar in conceptual content to a target query or text.
- Concept driven Text Analytics
  A suite of reports Cognition generates detailing the frequency, statistical significance and conceptual saliency of word strings, stems, phrases, word meanings, synonym sets and ontological categories (e.g., “dog” for “poodle”, “rottweiler” etc.) in a document base.
- Semantic Map Customization
  Cognition professional services and consultation in tailoring the Semantic Map to the particular vocabulary/concept needs of a specialized project.
Applying Cognition to Your Technology
Companies using Cognition's Semantic NLP win because:
- Their technology products are made smarter, more interactive, and provide a richer user experience.
- They can differentiate their user experience through simultaneous gains in precision and recall when retrieving text and documents.
- They will increase their revenue when a richer user experience retains users, thus increasing the lifetime value of the end user.
- They will be able to lower attorney review costs by nearly 50% and minimize the legal risk of missing relevant data.
- They will significantly reduce customer care call center response times, maximize self-service support and increase customer satisfaction.
- They will increase the quality of systems that center on ensuring an organization is compliant with regulatory agencies.
- They will nearly double the quality of machine translation.
- They will increase their revenue when Cognition's Semantic NLP is applied to e-commerce applications and monetization strategies.
- They will increase their revenue when Semantic NLP is applied to ad placement, increasing click-through rates by matching ads more often, and matching ads to more relevant queries (or retrieved documents).
CognitionAPI™ – CognitionAPI™ gives technology products a simple yet capable way to integrate Cognition’s Semantic NLP software. It is also used by enterprise customers who have an internally developed data repository or content management system. Cognition offers Application Programming Interfaces (APIs) for the central components of the software, including the Search, Index, and Review functionality. Sample scripts and API libraries are available for C++, Python, Perl, Ruby, Java, VB/ASP/ActiveX, and PHP4.
CognitionAPI comes in four flavors:
- ClientIndexAPI.h and its corresponding dynamic library for C++ programmers (client server)
- DirectIndexAPI.h and its corresponding dynamic library for C++ programmers (direct, non-client server)
- IndexAgent.ocx for Windows users who desire a COM object interface (client server)
- CognitionSolo.ocx for Windows users who desire a COM object interface (direct, non-client server)
The client server APIs function similarly to the direct indexing APIs, but they interact with one or more instances of Cognition applications on servers in your network to generate indices.
Semantic Search
CognitionINDEXER™ – The Cognition Indexer creates searchable ‘indices’ of concepts extracted from user source documents. Supported document types include HTML, XML, plain text, Word, WordPerfect, RTF, PDF, PowerPoint, and many other common document formats.
The CognitionINDEXER™ is one of the most advanced indexers in the world. It reads each sentence, phrase, and word in the dataset. It assigns meaning to each word based upon context, and interprets capital letters, acronyms and ambiguous words. As many as 15 meaning attributes or values can be assigned to each word.
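To make the idea of assigning meaning from context concrete, here is a minimal sketch in Python. It is not Cognition's algorithm: it simply picks the sense whose cue words overlap most with the surrounding text. The `SENSES` table, its cue words, and the `choose_sense` function are assumptions invented for illustration.

```python
# Toy sense inventory. The cue words are illustrative assumptions,
# not entries from Cognition's lexicon.
SENSES = {
    "tank": {
        "military vehicle": {"wwii", "army", "armored", "battle"},
        "container": {"gallon", "salt-water", "fish", "aquarium", "fuel"},
    }
}

def choose_sense(word, context):
    """Pick the sense whose cue words overlap most with the context."""
    tokens = {t.lower().strip(".,") for t in context.split()}
    senses = SENSES[word]
    return max(senses, key=lambda s: len(senses[s] & tokens))

print(choose_sense("tank", "tanks used in WWII"))          # military vehicle
print(choose_sense("tank", "20 gallon salt-water tanks"))  # container
```

A production system would draw on far richer evidence (syntax, capitalization, ontology), but the sketch shows why the same string can resolve to different meanings.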
CognitionSEARCHER™ – Cognition’s Semantic Search engine gives users access to the indexed data using complex natural language queries, standard Boolean, advanced “Linguistic Boolean”, fuzzy, pattern, and Soundex name search queries. It also enables relevance ranking, search term highlighting and hit-to-hit navigation.
CognitionBROKER™ – The CognitionBROKER™ allows users to optimize the network performance of Cognition’s Semantic Search engine software in a large, multi-server environment. It automatically load balances, provides fault tolerance by automatically compensating in case of server failure, and automatically discovers network layout and any new server. CognitionBROKER™ also allows you to start or stop servers at any time without modifying any configuration files.
CognitionSPIDER™ – Enterprise customers and technology companies may wish to include external, typically Web-based, data in their internal data repository. CognitionSpider is an application which enables the user to crawl external data and absorb it into their internal repository.
CognitionINTERFACE™ – This is a sample web page that serves as an interface to CognitionSearch. It can be used as is or modified by the client as desired. It includes methods for sending queries to CognitionSearch and displaying results with the matching words and phrases highlighted in the retrieved documents. It also includes methods for displaying the senses and definitions that CognitionSearch chose for the query terms, accepting changes to those word senses from the user, displaying spell-checking information, and accepting spelling choices from the user.
Concept driven Topic Discovery and Document Categorization
CognitionTOPICDISCOVERY™ – The Cognition Topic Discovery module provides a document-by-document analysis of the significant conceptual content in a project, essentially providing an overview of what each document is about. This module identifies every concept in a document and calculates a salience score (how important the concept is to the document). Reports can be configured to show all concepts ranked by salience score, or just those scoring above some predetermined threshold. Since this module leverages Cognition's Semantic Map, words related via synonymy (e.g., “deed” and “title”) or ontology (e.g., “heart” as a “circulatory organ”) can be automatically grouped together in determining a concept's salience score.
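The grouping and threshold behavior described above can be sketched as follows. The actual salience formula is not given here, so this example assumes a simple stand-in: a concept's share of all concept mentions in the document. The `CONCEPT_OF` map and both function names are hypothetical.

```python
from collections import Counter

# Hypothetical word-to-concept map standing in for the Semantic Map.
CONCEPT_OF = {
    "deed": "ownership document",
    "title": "ownership document",
    "heart": "circulatory organ",
}

def concept_salience(tokens):
    """Return each concept's share of all concept mentions in the document."""
    counts = Counter(CONCEPT_OF[t] for t in tokens if t in CONCEPT_OF)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}

def above_threshold(scores, threshold):
    """Keep concepts scoring at or above the threshold, highest first."""
    kept = [(c, s) for c, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda cs: cs[1], reverse=True)

doc = "the deed and the title were filed before the heart surgery".split()
scores = concept_salience(doc)
print(above_threshold(scores, 0.5))
```

Because “deed” and “title” map to the same concept, their mentions are pooled, and only that concept clears the 0.5 threshold in this toy document.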
CognitionCATEGORIZATION™ – The Cognition Document Categorization module functions similarly to the Topic Discovery module, but rather than automatically identifying topics in project documents, it scores each document with respect to a predefined set of topics. Topics can be framed using the vocabulary in Cognition's Semantic Map or the vocabulary defined in an independent 3rd-party resource. Either way, topic salience is scored using Cognition's Semantic Map, so conceptual groupings based on synonymy and ontology within the Map are automatically leveraged.
Concept driven Document Foldering
CognitionFOLDERING™ – Cognition Document foldering assumes that there has been a prior study of the issues in the documents (usually a legal case) and the kinds of documents that are likely to be relevant. Once the issues have been identified, foldering queries are formed. A foldering query is a complex concept Boolean that expresses the complexity of the issue and the various concepts surrounding it. For example, in the Enron case a foldering query around fraud might be
“(Enron executives) AND ((lie to the investing public) OR (cover up information) OR (shred documents) OR (inflate profits in SEC filings))”
Each part of the Boolean is interpreted linguistically and automatically enhanced with alternative expressions. For example, “Enron executives” would find “Kenneth Lay”, etc. The foldering queries are submitted to CognitionSearch and the results are saved in concept-oriented folders.
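As a rough illustration of how a foldering query combines Boolean structure with alternative expressions, the sketch below evaluates a nested AND/OR query against document text. Plain substring matching stands in for Cognition's linguistic interpretation, and the `EXPANSIONS` table, the tuple-based query format, and the sample memos are all assumptions made for the example.

```python
# Hypothetical alternative expressions for each concept phrase.
EXPANSIONS = {
    "enron executives": ["enron executive", "kenneth lay"],
    "shred documents": ["shred documents", "shredded documents"],
    "inflate profits": ["inflate profits", "inflated profits"],
}

def matches(query, text):
    """Evaluate a nested ("AND"|"OR", ...) query against lower-cased text."""
    if isinstance(query, str):
        # A leaf matches if any of its alternative expressions appears.
        return any(alt in text for alt in EXPANSIONS.get(query, [query]))
    op, *parts = query
    results = (matches(p, text) for p in parts)
    return all(results) if op == "AND" else any(results)

fraud = ("AND", "enron executives", ("OR", "shred documents", "inflate profits"))

docs = {
    "memo1": "kenneth lay asked staff whether anyone had shredded documents",
    "memo2": "quarterly cafeteria menu for enron employees",
}
folder = [name for name, text in docs.items() if matches(fraud, text)]
print(folder)  # ['memo1']
```

Only the first memo lands in the fraud folder, even though it never contains the literal words “Enron executives”.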
Concept driven Document Similarity
CognitionADMATCHER™ – The Cognition Ad Matcher interprets the meaning of queries, documents, ad phrases and ad copy. Using meaning as the medium, this ad placement technology maximizes the number of ad phrases matched, and minimizes the number of poor matches to ad phrases.
CognitionMORELIKETHIS™ – Cognition More-Like-This interprets the meaning of documents and can find documents with similar content. The basis document can either be a document in a Cognition index or any document identified by URL. Unlike other document-similarity systems, Cognition More-Like-This is based on conceptual similarity rather than straight similarity of the strings of texts in the documents.
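The difference between string similarity and conceptual similarity can be illustrated with a small sketch. It uses Jaccard overlap as a stand-in measure, which is an assumption rather than the method Cognition More-Like-This uses. The `CONCEPT_OF` map is hypothetical and treats each word as already disambiguated.

```python
# Hypothetical word-to-concept map; unmapped words are ignored.
CONCEPT_OF = {
    "aquarium": "container for fish",
    "tank": "container for fish",
    "fish": "fish",
    "guppies": "fish",
}

def jaccard(a, b):
    """Jaccard overlap of two sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def string_similarity(doc_a, doc_b):
    """Overlap of the literal word strings in two documents."""
    return jaccard(set(doc_a.split()), set(doc_b.split()))

def concept_similarity(doc_a, doc_b):
    """Overlap of the concepts the words map to."""
    def to_concepts(doc):
        return {CONCEPT_OF[w] for w in doc.split() if w in CONCEPT_OF}
    return jaccard(to_concepts(doc_a), to_concepts(doc_b))

a = "aquarium fish"
b = "tank guppies"
print(string_similarity(a, b), concept_similarity(a, b))  # 0.0 1.0
```

The two texts share no words, so string overlap is zero, yet they express the same concepts and score as fully similar at the concept level.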
Concept driven Text Analytics
CognitionTEXTANALYTICS™ – Cognition Text Analytics are a collection of reports documenting particular statistics on the content of a project. Content is reported from the level of the string (keyword / pattern match) up through the most comprehensive conceptual groupings (synonymy and ontology).
- Content:
  - words – word strings (e.g., tanks)
  - stems – word strings reduced to their morphological bases (e.g., tank)
  - senses – individual meanings of ambiguous words (e.g., tank2 : container)
  - phrases – multiword sequences (e.g., fuel tank)
  - concepts – synonym sets and ontological groupings (e.g., fish tanks, aquariums, vessels, containers)
- Statistics:
  - frequency – number of times a reported element occurred
  - significance – the occurrence of a reported element factoring out chance (applies primarily to phrases)
  - salience – the occurrence of a reported element versus its expected occurrence given a baseline project
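The frequency and salience statistics listed above can be approximated with a short sketch. The exact formulas are not specified here, so this example assumes salience is the ratio of an element's relative frequency in the project to its relative frequency in a baseline. The function names and the `floor` parameter (used for elements absent from the baseline) are hypothetical.

```python
from collections import Counter

def relative_freq(tokens):
    """Map each element to its share of all tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {t: n / total for t, n in counts.items()}

def salience(project_tokens, baseline_tokens, floor=1e-6):
    """Ratio of observed to expected relative frequency per element."""
    observed = relative_freq(project_tokens)
    expected = relative_freq(baseline_tokens)
    return {t: f / expected.get(t, floor) for t, f in observed.items()}

project = ["tank", "tank", "fuel", "the"]
baseline = ["the", "the", "tank", "fuel"]

print(Counter(project)["tank"])  # frequency of "tank" in the project: 2
scores = salience(project, baseline)
print(scores)
```

Under this assumption, a score above 1.0 means the element occurs more often than the baseline predicts, and a score below 1.0 means it occurs less often.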
Semantic Map Customization
CognitionLEXICON™ – The lexicon is the heart of Cognition's Semantic NLP technology. It includes:
A Lexical Dictionary which defines the meanings of each word using morphological, syntactic, taxonomic and semantic features, enabling the software to select word meanings, recognize various forms of a given word and to parse phrases. Alternate spellings and common misspellings are also evaluated. This dictionary includes over 510,000 stems, covering all of the lower-case words of English, as well as tens of thousands of proper nouns, and 540,000 concepts. In combination with the morphology algorithm, the software recognizes over 1 million word forms.
An Ontology or vast “Tree of English” broadens the search due to knowledge of the linkages between general word senses and specific word senses. This technology enables the computer to “reason downward”, allowing a search for the general term like ‘money’ to also find information about specific terms like ‘dollars’ ‘euros’ or ‘yen’. While very complete, covering over 540,000 concepts, the Ontology (or taxonomy) is extensible to accommodate a specific customer’s terminology.
The Meaning Thesaurus groups word senses that are loosely synonymous. For example, ‘column’ (in one meaning), ‘file’ (in one meaning), ‘line’ (in one meaning) and ‘queue’ (in one meaning) are concepts in the same concept group. The thesaurus has over 75,000 such groupings.
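The “reason downward” behavior of the Ontology can be sketched as a recursive walk from a general term to its more specific terms. The `CHILDREN` table below is a tiny hypothetical fragment built from examples in the text, not Cognition's actual data structure.

```python
# Tiny hypothetical ontology: general term -> more specific terms.
CHILDREN = {
    "money": ["dollars", "euros", "yen"],
    "dog": ["poodle", "rottweiler"],
}

def expand_downward(term):
    """Return the term plus every more specific term beneath it."""
    found = [term]
    for child in CHILDREN.get(term, []):
        found.extend(expand_downward(child))
    return found

print(expand_downward("money"))  # ['money', 'dollars', 'euros', 'yen']
```

A search for “money” expanded this way also retrieves text that mentions only “dollars”, “euros” or “yen”, while a search for “yen” stays specific.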
CognitionCUSTOMIZATION™ – Lexical customization includes a) text analysis tools; b) semi-automated lexical acquisition (a professional service); and c) client-controlled lexical customization. Clients with fewer than one hundred words to add or modify can use client customization files to add words and phrases to a taxonomy, select the preferred sense of a word in their domain, and prefer a name or non-name interpretation of a given word. Clients with a larger vocabulary to add or modify can order a service from Cognition Technologies that analyzes their text for words unknown to the Cognition lexicon and adds those words to the lexicon in a semi-automated way.
Other Existing Applications
CognitionPARSER™ – The Cognition Parser assigns grammatical structure to sentences, improving the interpretation of word meaning through grammatical and semantic relationships in the sentence. This is also used to rate the semantic and syntactic plausibility of the output of hypothesis-generating software such as machine translation or speech recognition software (See Cognition Ranker).
CognitionRANKER™ – Software that translates foreign languages into English uses statistical algorithms that can only guess at a translation from its statistical similarity to known translations. Such software produces hypothesis translations in rank order. The Cognition Ranker attempts to parse the hypotheses and ranks them for semantic and syntactic plausibility, improving the final choice of translated sentence. In like manner, hypotheses produced by statistically based speech recognition software for English are ranked for semantic and syntactic plausibility, improving the final choice of interpretation for speech.