How to improve Chinese OCR recognition accuracy in Raw Content mode : MyScript Developer Support


Shandongfuwei2008

started a topic about 1 year ago

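Hello, I'm using the SDK on Android, version 2.3.2. After calling editor.export in Raw Content, characters such as 一 and 二 ("one", "two") are not recognized, even though I have disabled shape recognition. I expect 一 and 二 to be returned correctly. Is there a good solution?

On my own, I tried calling the following before export:

editor.setSelectionType(contentSelection, "Text", true)

With this, 一 and 二 are recognized, but the approach has two problems: first, it adds a new entry to the undo history; second, I can no longer lasso-select a single character and have it recognized. Is there a better solution?

Looking forward to your reply!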


Gwenaelle @MyScript

said about 1 year ago

Dear Shandongfuwei2008,

Thank you for your question.

Are you using iink SDK to perform your stroke rendering? Do you use gesture recognition in your application?

Can you reproduce the issue with our Demo example?

Sharing a video showing your ink and the export result would help us better understand your issue and help you find a solution.

In addition, you may also share with us your ink strokes.

Best regards,

Gwenaëlle


Shandongfuwei2008

said about 1 year ago

Hello,

I use iink SDK for rendering, and the application does not use gesture recognition. The issue can also be reproduced in your Demo.

Looking forward to your reply.

Attachment: Screenrecord... (mp4, 3.45 MB)


Gwenaelle @MyScript

said about 1 year ago

Dear Shandongfuwei2008,

Thank you for your update.

Could you please upgrade to the latest iink SDK version (3.0.2) and let us know whether you still face the same issue?

Best regards,

Gwenaëlle


Shandongfuwei2008

said about 1 year ago

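Hello, I upgraded iink SDK from 2.3.2 to 3.0.2 and updated the Chinese language pack at the same time, but the issue persists. Please help me find a solution.

Notes:

1. The RawContentConversion configuration is as follows: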

fun Configuration.enableRawContentConversion() {
  // Display grid background. Possible values are grid and none
  setString("raw-content.line-pattern", "none")

  // Activate handwriting recognition for text only
  setBoolean("raw-content.recognition.text", true)

  // Allow conversion of text
  setBoolean("raw-content.convert.text", true)
  setBoolean("raw-content.convert.node", false)
  setBoolean("raw-content.convert.edge", false)

  // Allow converting shapes by holding the pen in position
  setBoolean("raw-content.convert.shape-on-hold", false)

  // Configure interactions
  setString("raw-content.interactive-items", "converted-or-mixed")
  setBoolean("raw-content.tap-interactions", true)
  setBoolean("raw-content.auto-connection", true)

  // Show alignment guides and snap to them
  setBoolean("raw-content.guides.enable", true)
  setBoolean("raw-content.guides.snap", true)

  // Allow gesture detection
  setBoolean("gesture.enable", false) // If true, MyScript iink SDK tries to detect gestures while writing.

  setString("lang", "zh_CN") // Language used by the editor.
  setBoolean("raw-content.highlight-text", true) // If true, semantic highlighting can be shown on recognized text.
  setBoolean("convert.convert-on-double-tap", false) // If true, double-tapping a block converts it.

  setNumber("export.image-resolution", Utils.getApp().resources.displayMetrics.xdpi)

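  setBoolean("raw-content.auto-connection", true) // If true, shapes and connectors can be connected automatically.
  setBoolean("raw-content.eraser.dynamic-radius", false) // If false, the eraser has a fixed size given by its radius; if true, the eraser size is dynamic and grows with speed.
  setBoolean("raw-content.recognition.shape", false) // If true, non-text is recognized as shapes and the results are available in the JIIX export.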

  // Allow shape & image rotation
  setStringArray("raw-content.rotation", arrayOf("shape", "image"))
}

2. The JIIX export configuration is as follows:

Engine engine = editor.getEngine();
exportParams = engine.createParameterSet();
exportParams.setBoolean("export.jiix.text.words", true);
exportParams.setBoolean("export.jiix.text.chars", true);
exportParams.setBoolean("export.jiix.text.structure", false);
exportParams.setBoolean("export.jiix.strokes", false);
exportParams.setBoolean("export.jiix.bounding-box", true);
exportParams.setBoolean("export.jiix.glyphs", false);
exportParams.setBoolean("export.jiix.primitives", false);

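3. Attached are the Raw Content file after save, the saved image, and the OCR recognition result. Please use this file to reproduce the issue.

Looking forward to your reply.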

Attachments: Screenrecord... (mp4, 3.61 MB); _E6_89_8B_E5... (zip, 41.6 KB)


Gwenaelle @MyScript

said about 1 year ago

Dear Shandongfuwei2008,

Thank you for your update.

If you want to force your lasso selection to be recognized as Text, we do recommend using

editor.setSelectionType(contentBlock, "Text", true);

You will then get the characters correctly recognized. See, for instance, the JIIX corresponding to the ink sample you shared with us:

{
  "type": "Raw Content",
  "bounding-box": {
    "x": 14.6158619,
    "y": -5.20541954,
    "width": 91.3114014,
    "height": 10.153266
  },
  "elements": [ {
    "id": "raw-content/203",
    "type": "Text",
    "bounding-box": {
      "x": 14.6158619,
      "y": -5.20541954,
      "width": 91.3114014,
      "height": 10.153266
    },
    "label": "一二三四五六七八九",

    ....

}
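In the JIIX above, the recognized string sits in the `label` field of the Text element. As a minimal sketch of reading those labels back on the application side, the Java below scans an exported JIIX string with a regular expression instead of a JSON library. `JiixLabels` and `labels` are hypothetical names, and the pattern does not unescape JSON escape sequences or cover every JSON edge case, so a real JSON parser (for example `org.json`, which ships with Android) is the safer choice in production.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects "label" values from a JIIX export string.
public class JiixLabels {
    // Matches "label": "..." pairs; escaped characters such as \" are
    // kept verbatim in the result (they are not unescaped).
    private static final Pattern LABEL =
            Pattern.compile("\"label\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Returns every "label" value found in the JIIX string, in document order. */
    public static List<String> labels(String jiix) {
        List<String> result = new ArrayList<>();
        Matcher m = LABEL.matcher(jiix);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Trimmed-down version of the JIIX excerpt quoted in this thread.
        String jiix = "{ \"type\": \"Raw Content\", \"elements\": [ { "
                + "\"id\": \"raw-content/203\", \"type\": \"Text\", "
                + "\"label\": \"一二三四五六七八九\" } ] }";
        System.out.println(labels(jiix));
    }
}
```

With `export.jiix.text.words` and `export.jiix.text.chars` enabled, as in the export parameters earlier in the thread, word- and character-level entries would presumably carry their own `label` fields as well, so the returned list may contain more than the block-level string.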

Please let us know whether this fixes your issue.

Best regards,

Gwenaëlle


Shandongfuwei2008

said about 1 year ago

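Hello, do you mean this is the only way to improve Chinese OCR recognition accuracy? This is the workaround I described when I first raised the question, and it has two problems:

Precondition: editor.setSelectionType(contentBlock, "Text", true); is called before export.

1. It adds a new entry to the undo history. I don't want an extra history entry; is there a way to avoid it?

2. When I lasso-select only some of the characters to recognize them separately, the result is empty. Is there a solution for this?

What I'd like is a parameter that directly restricts Raw Content to recognizing text only. Is there such a parameter?

Looking forward to your reply.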
