I'm in a somewhat unusual position where I need to recognize and extract numbers (integers and decimals) separated by interspersed hyphens.
e.g. "1-0-1", "2.5-0-1", "0-0.5-4"
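For reference, these target strings follow a simple pattern. A minimal sketch in Python (purely illustrative of the expected output shape; the actual tokenization happens inside the recognition engine, and the regex here is my own, not part of any SDK):

```python
import re

# Hyphen-separated numbers: integers or decimals joined by "-",
# e.g. "1-0-1", "2.5-0-1", "0-0.5-4" (the examples above).
HYPHEN_NUMBERS = re.compile(r'^\d+(?:\.\d+)?(?:-\d+(?:\.\d+)?)+$')

for s in ["1-0-1", "2.5-0-1", "0-0.5-4", "tO"]:
    print(s, bool(HYPHEN_NUMBERS.match(s)))
```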
That said, I also need to be able to recognize regular text/words alongside these sequences.
The difficulty I'm currently facing is that many of the hyphens end up being clustered with the adjacent digit and incorrectly recognized as a letter. For example, "1-0" might become "tO" or "fO".
I've created a fairly large compilation of the most common variations of these sequences and fed them into the custom lexicon generator. However, it doesn't seem to be making any difference whatsoever.
Is punctuation stripped when the lexicon is created? And if so, what would be a reasonable workaround?
Thank you for contacting us with your question.
Indeed, a lexicon will not address your use case.
For this purpose, you would need what we call a LUDEF, which is not available in the current iink SDK. We therefore have no solution at present.
Would it be possible to re-purpose the math SDK to recognize these patterns?
For the math case, you could indeed create a custom grammar that would allow recognizing such patterns.
Nevertheless, as you seem to have regular text as well, the math recognizer will not handle that text accurately, and you may not get an acceptable result overall.
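For illustration only, a custom math grammar restricted to digits, the decimal point, and the hyphen might look roughly like the sketch below. This is an assumption about the general shape of such a grammar, not a verified configuration; the exact symbol set, rule names, and syntax must be checked against the MyScript math grammar documentation:

```
symbol = 0 1 2 3 4 5 6 7 8 9 . -
character ::= identity(symbol)
expression ::= hpair(expression, expression) | identity(character)
start(expression)
```

Restricting the symbol set this way would at least prevent the recognizer from producing letters like "tO" for "1-0", at the cost of not recognizing any surrounding text.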