Alphanumeric conversion

Hi,

We are facing an issue with conversion when using the normal text editor.

When we write 1, it is converted to l, and sometimes when we write 2, it is converted to Z.

When we write 10, it is sometimes converted to 10, but when we write 10 mg, it is converted to "long".

Is there some way we can improve the transcription here?

Thanks,

Piyush


Best Answer

Dear Piyush,


This would require a special development in our core technology, which is not planned at present. Nevertheless, I will keep this in mind.


Regarding the use of special characters, this is not possible either.


Best regards,


Olivier


Hi Piyush,


Thank you for contacting us.


As indicated by Olivier, this is likely to be an edge case.

To help you better, could you please share some details about the text content you want to recognize:

  • Does your text only contain posologies (dosage instructions)?
  • Does your text contain a mix of posologies and other text?


Best regards,


Gwenaëlle

Dear Gwenaëlle,

It is supposed to contain a mix of posologies and other text. Basically, it can contain any English word.

The problem comes when we start using the two together.

For example, if we write something like 10 mg, it tries to convert it to "long", and 20 mg to "song".

Most of the time it tries to convert 1 to l.

Also, we have noticed that the system converts 1 to 1, but if we write something after the 1, it tries to merge the two together to create a word.

What would be the best way to prevent this? Is there a way to define the grammar/resource file so that this works?

Regards,

Piyush


Dear Piyush,


Thank you for the update.


Currently, as said, we are in an edge case that is not easy to solve. I am afraid not much can be done to improve accuracy.


Best regards,


Olivier

Hi Olivier, 

Thanks for your response. In order to improve this, I have tried something else.

I have created my own dictionary of English words (230000+) and I have used the en_US-lk-grm.res grammar.

This is the grammar that you once shared with me so that my own words could be recognized in a sentence.

For example, if my word list consists of "Piyush" and "Agarwal", then only "Piyush" and "Agarwal" would be recognized, but they could also be recognized together in a sentence such as "Piyush Agarwal".

This has improved the recognition, but we have ended up with another problem.

When we write a sentence, the system interprets the first 2-3 words correctly. While we write the next word, the previous words, which had been transcribed correctly, get changed into some other words. Is there a way to prevent this, so that words that have already been transcribed correctly don't get messed up?

Thanks and Regards,

Piyush




Dear Piyush,


Thank you for the update.


Unfortunately, not much can be done to prevent the previous words from being changed. One solution could be to export the text and import it again. This should help in getting a "more stable" result.


Can you please try to proceed as below, and see if it helps:

String exportResult = editorView.getEditor().export_(null, MimeType.TEXT);
editorView.getEditor().clear();
editorView.getEditor().import_(MimeType.TEXT, exportResult, null);


Best regards,


Olivier

Hi Olivier,

Where do I put this code?

Piyush

Dear Piyush,


Can you please try in the contentChanged function?


Nevertheless, please note that, as the contentChanged function is called a lot, this may slow down the app.


Let us know if it works.


Best regards,


Olivier

Dear Olivier,

I put the code inside the contentChanged function. However, I found that the contentChanged function was being called twice, and the second time the string was blank, so my screen was getting cleared.

Is there any reason for this?

Regards,

Piyush
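
One possible way to avoid the symptom described above is to guard the export/clear/import round trip, so that notifications triggered by the round trip itself, blank exports, and unchanged text are all ignored. The sketch below is only an illustration under stated assumptions: TextEditor, RoundTripGuard and InMemoryEditor are hypothetical names, not part of the SDK, and the three TextEditor methods merely stand in for the export_, clear and import_ calls quoted earlier in the thread. Whether the real contentChanged callback is delivered synchronously or later on another thread is not stated in the thread, which is why the sketch also remembers the last text it imported.

```java
/** Hypothetical stand-in for the editor calls quoted in the thread (not the SDK's own interface). */
interface TextEditor {
    String exportText();          // stands in for export_(null, MimeType.TEXT)
    void clear();                 // stands in for clear()
    void importText(String text); // stands in for import_(MimeType.TEXT, text, null)
}

/** Performs the export/clear/import round trip only when it is safe and useful to do so. */
final class RoundTripGuard {
    private boolean inProgress = false;
    private String lastImported = null;

    /** Call from the change callback; returns true if a round trip was actually performed. */
    boolean onContentChanged(TextEditor editor) {
        if (inProgress) {
            return false; // notification caused by our own clear()/importText()
        }
        String text = editor.exportText();
        if (text == null || text.isEmpty()) {
            return false; // a blank export must never lead to clearing the screen
        }
        if (text.equals(lastImported)) {
            return false; // late notification for text we just imported ourselves
        }
        inProgress = true;
        lastImported = text;
        try {
            editor.clear();
            editor.importText(text);
        } finally {
            inProgress = false;
        }
        return true;
    }
}

/** In-memory editor used only to exercise the guard; it notifies its listener on every change. */
final class InMemoryEditor implements TextEditor {
    private String content;
    Runnable listener = () -> {};
    int notifications = 0;

    InMemoryEditor(String initial) { content = initial; }

    public String exportText() { return content; }
    public void clear() { content = ""; notifyChanged(); }
    public void importText(String text) { content = text; notifyChanged(); }

    private void notifyChanged() { notifications++; listener.run(); }
}
```

In a real app, the body of contentChanged would delegate to a single long-lived RoundTripGuard instance through a thin adapter around the editor. Since the callback fires often, it may also be worth running the round trip less frequently, for example only after a pause in writing.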

Also, is there a way for the engine to understand that if we leave a certain amount of space between two characters, then it should not merge those two characters to make a new word?

Dear Piyush,


Currently, I am a bit puzzled that previously written words are updated when new words are added. Which configuration do you have? Are you still attaching the TEXT resource with your own lexicon? If so, can you please try without the TEXT resource?


Also, is there a way that the engine can understand that if we leave a certain amount of space between two characters then we don't merge those two characters to make a new word?

>>If spaces are "properly" written, words should not be merged. Indeed, we have an algorithm to determine spaces that has been finely tuned over the years. So, this behavior should not occur if spaces are properly written. Would you have some ink samples (x and y coordinates, no image) that would allow us to reproduce this?


Best regards,


Olivier

Dear Olivier,

Thank you for your response.

1. I am leaving a good amount of space between two characters. The engine is still trying to connect the two letters. How do I get the ink samples (x and y coordinates)?

2. "Currently, I am a bit puzzled that previously written words are updated when new words are added. Which configuration do you have? Are you still attaching the TEXT resource with your own lexicon? If so, can you please try without the TEXT resource?"

By this I mean that if I add more characters, the previously recognized words change automatically. I don't know why. This happens even if we leave a decent amount of space between two words.

If I remove the TEXT resource and use only my vocabulary, recognition goes down, because then the case of the letters comes into play and causes a problem.

Now I am mainly using the TEXT resource with a few extra words added.

We are getting 80-90% accuracy. The main problem that we are now facing is the one I mentioned in point 1.

3. The contentChanged function did not work, as it was clearing the whole data.

Thank you for your help. This is the last step in our application before we can start doing some testing.

Thanks and Regards,

Piyush


1. For point 1, I got some details.

I wrote "I" (x = 97, y = 235), then tried to write a lowercase "t": (211, 214) for the vertical stroke and (246, 152) for the horizontal stroke.

The engine created the word "It".

Piyush
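
For collecting ink samples as x and y coordinates, one option is to record the pointer positions the app already receives and dump them as text, stroke by stroke. This is a minimal sketch, assuming the app can forward pen down/move/up coordinates from its own touch handling. InkSampleRecorder and its methods are hypothetical names, and the output format is arbitrary, not one required by the support team.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Collects pointer coordinates stroke by stroke so they can be pasted into a support request. */
final class InkSampleRecorder {
    private final List<List<float[]>> strokes = new ArrayList<>();
    private List<float[]> current; // null while the pen is up

    /** Starts a new stroke at the given position. */
    void penDown(float x, float y) {
        current = new ArrayList<>();
        strokes.add(current);
        current.add(new float[] {x, y});
    }

    /** Appends a point to the stroke in progress; ignored if no stroke was started. */
    void penMove(float x, float y) {
        if (current != null) {
            current.add(new float[] {x, y});
        }
    }

    /** Records the final point and closes the stroke. */
    void penUp(float x, float y) {
        penMove(x, y);
        current = null;
    }

    int strokeCount() {
        return strokes.size();
    }

    /** Returns one line per stroke, listing its points in writing order. */
    String dump() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < strokes.size(); i++) {
            sb.append("stroke ").append(i).append(':');
            for (float[] p : strokes.get(i)) {
                sb.append(String.format(Locale.ROOT, " (%.1f, %.1f)", p[0], p[1]));
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Calling penDown, penMove and penUp from the same touch handler that feeds the editor keeps the recorded strokes aligned with what the engine saw, and the text from dump() can then be attached to a post.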

Hi Olivier,

I can give you one more example.

For example, I started writing

"1 said"

Initially the engine figured out that I wrote 1. But as soon as I wrote "said", it converted it to "I said".

It looks like English takes precedence over the numbers.


Dear Piyush,


Indeed, when writing "1 said", our TEXT resource is using a model that takes into account the context. Basically, as "I said" is more frequent than "1 said", it will give more weight to "I said".

It is not possible to disable this feature of the TEXT resource. The only solution would be to not use the TEXT resource, but this may cause more issues than it solves.
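
To make the context effect concrete, here is a toy sketch of how weighting by context frequency can flip a result. It is not the model used by the TEXT resource, whose internals are not described in this thread. ContextRescorer is a hypothetical name, and the idea of multiplying a shape score by a context frequency is an assumption chosen purely for illustration.

```java
import java.util.Map;

/** Toy rescoring: combines how well the ink matches a candidate with how common the candidate is. */
final class ContextRescorer {
    /**
     * Returns the candidate with the highest (shape score x context frequency).
     * Candidates missing from contextFreq are treated as having frequency 0.
     */
    static String best(Map<String, Double> shapeScore, Map<String, Double> contextFreq) {
        String bestCandidate = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : shapeScore.entrySet()) {
            double freq = contextFreq.getOrDefault(e.getKey(), 0.0);
            double score = e.getValue() * freq;
            if (score > bestScore) {
                bestScore = score;
                bestCandidate = e.getKey();
            }
        }
        return bestCandidate;
    }
}
```

With made-up numbers, suppose the ink matches "1 said" slightly better (shape score 0.6) than "I said" (0.4). If "I said" is given a context frequency of 0.2 and "1 said" only 0.001, the products are 0.08 and 0.0006, so "I said" wins even though the shape alone favored the digit.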


Regarding the spaces, I am still puzzled: if these are neatly written, they should be properly recognized. Can you please attach a screenshot of a case where a space is not properly recognized?


Best regards,


Olivier