OCR + SwiftUI + Japanese. Quite a training project! 😅

Hi! It’s been a while! I’ve been learning a lot of new things lately. I came up with some OCR- and AI-related projects at work, and have been exploring SwiftUI and Flutter in more depth in my spare time. As I’m reaching a big milestone in the development of One a Day (my gratitude and positivity platform), I decided it’s time for a mini-project-sized break! :D

I like to code useful things, so rather than create yet another to-do list app, I wondered what I could make in a short time that would provide value. I’ve had fun with AI and OCR at work, and wanted to dive deeper into them, whilst also sticking to mobile apps. I remembered I have a very real pain point I haven’t been able to resolve with any existing solutions.

It's a struggle

When I (try to) read Japanese books or manga, I often can’t fully understand certain kanji, words, or sentences. I can easily translate them using Google Translate, but that doesn’t give me the kanji-by-kanji breakdown I need to fully grasp the sentence structure. I want to learn, not just translate. I have access to great dictionaries and grammar resources, but they’re not very useful when I can’t even look up the problematic kanji because I don’t know its reading. What I often end up doing is translating with the Google app, switching to the original text, copying the kanji, pasting it into a dictionary, gathering all the necessary information, and adding it to flash cards (if I get that far). That’s quite a lot of steps, and they quickly kill any bit of motivation I have!

A shotgun to kill a fly

Here comes my project idea then! Create an app that will help me with reading native Japanese texts without the need for all those different tools. 

Feature requirements:

  • Take a picture / upload a picture of Japanese writing
  • Implement OCR technology to read text from the images
  • Display this generated text with furigana to support me with reading
  • Implement a translation feature for selected words
  • Add option to save vocabulary for future study
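
To make these requirements a bit more concrete, here is a rough sketch of how they could map onto Swift types. Every name below (TextSegment, VocabularyEntry, ReadingPipeline and the three protocols) is made up for illustration and does not come from any existing library. The idea is simply to hide each stage behind a protocol, so an OCR or furigana provider can be swapped out later without touching the rest of the app.

```swift
// Hypothetical types for illustration; none of these names come from a real library.

/// One run of recognised text, with an optional kana reading to show above it.
struct TextSegment: Equatable {
    let surface: String   // text as printed, e.g. "日本語"
    let reading: String?  // furigana, e.g. "にほんご"; nil for kana or punctuation
}

/// A word saved for later study.
struct VocabularyEntry: Equatable {
    let word: String
    let reading: String
    let meanings: [String]
}

/// OCR stage: raw image bytes in, plain text out.
protocol TextRecogniser {
    func recognise(imageData: [UInt8]) throws -> String
}

/// Furigana stage: plain text in, segments with readings out.
protocol FuriganaAnnotator {
    func annotate(_ text: String) throws -> [TextSegment]
}

/// Dictionary stage: look up a selected word.
protocol WordDictionary {
    func lookUp(_ word: String) -> VocabularyEntry?
}

/// Ties the first two stages together: image in, annotated text out.
struct ReadingPipeline {
    let recogniser: any TextRecogniser
    let annotator: any FuriganaAnnotator

    func process(imageData: [UInt8]) throws -> [TextSegment] {
        let text = try recogniser.recognise(imageData: imageData)
        return try annotator.annotate(text)
    }
}
```

With this shape, a stub recogniser that returns a fixed string is enough to build and test the UI before any real OCR service is chosen.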

Some quick designs to solidify the concept and kick off the project:

[Images: three design mockups]

Initial investigation / Expected problems

  1. Even though the OCR APIs I’m familiar with (such as Pen to Print) were more than capable of reading single words, they were not able to read vertical, right-to-left writing. This is going to be trickier than I initially assumed, and I will have to test more solutions. I’ve shortlisted Azure AI Vision and Google Vision, though the setup and pricing might be overkill for a small side project. OCR.Space seems promising, and the first few requests returned good results, though I did notice some small mistakes.
  2. There seem to be plenty of Japanese text analysers, dictionary packages, and learning software, so I naively assumed I would have no problem finding a furigana generator API. Wrong! An initial search left me with a few excellent web tools, which I could scrape; some Python-based open-source API projects, which I could translate and host myself; and a very capable, albeit slow, ChatGPT solution I quickly put together. I’ll need to research this some more, although I’m leaning heavily towards AI at the moment, as it would allow me to reach a proof of concept (POC) very quickly.
  3. Furigana has to be rendered as ruby text (small readings placed alongside the base characters), which complicates displaying it in a mobile app. From what I’ve seen, this can be achieved using attributed strings with ruby annotations, though not without issues. Moreover, SwiftUI doesn’t support this by default, so I’ll have to look into some custom code magic to bring it to life.
  4. The translation feature should be achievable using openly available XML dictionaries (JMdict for word meanings and KanjiDic for more in-depth information on kanji). It’s honestly such a relief to be able to access dictionary data so easily, especially after the struggle of having to implement Easy Polish News (my other language-learning app) through web scraping!
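
As a rough idea of what the dictionary part (point 4) might look like, here is a minimal sketch that reads JMdict-style XML with Foundation’s XMLParser. The element names (entry, keb, reb, gloss) follow the JMdict format, but DictionaryEntry, JMdictParser and parseJMdict are names I made up for illustration, and the sample is a heavily trimmed, hand-written entry rather than real file contents. The real file also declares custom entities (for part-of-speech tags and similar) in its DTD, which this sketch doesn’t exercise.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

/// A simplified view of one JMdict <entry> (hypothetical type).
struct DictionaryEntry {
    var kanjiForms: [String] = []  // from <keb>
    var readings: [String] = []    // from <reb>
    var glosses: [String] = []     // from <gloss>
}

/// Collects entries while XMLParser streams through the document.
final class JMdictParser: NSObject, XMLParserDelegate {
    private(set) var entries: [DictionaryEntry] = []
    private var current: DictionaryEntry?
    private var buffer = ""

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        if elementName == "entry" { current = DictionaryEntry() }
        buffer = ""  // start collecting text for this element
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        buffer += string  // text may arrive in several chunks
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        let text = buffer.trimmingCharacters(in: .whitespacesAndNewlines)
        switch elementName {
        case "keb": current?.kanjiForms.append(text)
        case "reb": current?.readings.append(text)
        case "gloss": current?.glosses.append(text)
        case "entry":
            if let entry = current { entries.append(entry) }
            current = nil
        default: break
        }
        buffer = ""
    }
}

/// Parses a JMdict-style XML string into simplified entries.
func parseJMdict(_ xml: String) -> [DictionaryEntry] {
    let parser = XMLParser(data: Data(xml.utf8))
    let delegate = JMdictParser()
    parser.delegate = delegate
    parser.parse()
    return delegate.entries
}

// Hand-written sample in the JMdict shape, not copied from the real file.
let sample = """
<JMdict>
  <entry>
    <k_ele><keb>猫</keb></k_ele>
    <r_ele><reb>ねこ</reb></r_ele>
    <sense><gloss>cat</gloss></sense>
  </entry>
</JMdict>
"""

let entries = parseJMdict(sample)
let match = entries.first { $0.kanjiForms.contains("猫") }
print(match?.glosses ?? [])
```

For the full dictionary, it would probably make more sense to parse the file once and store the result in a local database, rather than scanning an array on every lookup.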

Not giving up

That’s pretty much it for now! I should have everything I need to get this working, although it will be a bit more difficult than I initially assumed. Still, it’s been fun researching all these concepts and thinking about a language from a technical point of view again. I’m looking forward to working on this. Hopefully I’ll be able to make something that will finally help me tackle those books!