This article explains how to deal with JSON files that fail to load or parse, yet for some reason start working once they are beautified in Atom.
I regularly use a translation tool called QTranslate, which is excellent, but I won’t go into that here. It stores its translation history as JSON, which I make use of, but one thing puzzled me: before I could parse that JSON in Bash or Ruby, I had to open it in Atom and run JSON beautify.
The cause is hard to spot: it was the BOM (byte order mark). It had been about a year since I last ran into it; the previous time, it caused me trouble with AutoHotkey.
Beautifying in Atom apparently fixes the problem because it strips the BOM automatically. The following shows whether the BOM is present before and after beautifying in Atom.
yuis ASUS /mnt/c/pg$ file /mnt/e/_QTranslate/History.json
/mnt/e/_QTranslate/History.json: UTF-8 Unicode (with BOM) text, with very long lines, with no line terminators
yuis ASUS /mnt/c/pg$ file /mnt/e/_QTranslate/History.json
/mnt/e/_QTranslate/History.json: UTF-8 Unicode text, with very long lines
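The check that `file` performs here can also be scripted. Below is a minimal Python sketch, assuming the only BOM of interest is the 3-byte UTF-8 one (EF BB BF); the function name `has_utf8_bom` is made up for illustration.

```python
import codecs


def has_utf8_bom(path):
    """Return True if the file at `path` starts with the UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as f:
        # Read only as many bytes as the BOM is long (3 for UTF-8).
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8
```

Called on the history file, this would be expected to return True for the first `file` result above and False for the second.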
Before the BOM was removed, parsing failed like this:
yuis ASUS /mnt/c/pg$ cat /mnt/e/_QTranslate/History.json | parsejson '[0]'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.5/json/__init__.py", line 268, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.5/json/__init__.py", line 315, in loads
    s, 0)
json.decoder.JSONDecodeError: Unexpected UTF-8 BOM (decode using utf-8-sig): line 1 column 1 (char 0)
With the BOM removed, the same command now parses the file:
yuis ASUS /mnt/c/pg$ cat /mnt/e/_QTranslate/History.json | parsejson '[0]'
['evading', [[5, 17, 33, '回避\r\n\r\nVerbs:\r\n 回避 (avoid, around, avoidance, circumvent, evade, avert, evasion, evading)\r\n 脱税 (tax evasion, evaded, evading, evaders, evasions)\r\n 逃れ (dodging, fled, evaded, fleeing, shirking, evading)\r\n\r\n']], False]
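The error message itself hints at another way out: decode with `utf-8-sig` instead of editing the file. The following is only a sketch of that idea, not how `parsejson` is actually implemented (its source is not shown here); `parse_json_bytes` is a hypothetical helper name.

```python
import json


def parse_json_bytes(raw):
    """Parse JSON from raw bytes, tolerating an optional leading UTF-8 BOM."""
    # "utf-8-sig" strips a leading BOM if present; without one it decodes as plain UTF-8.
    return json.loads(raw.decode("utf-8-sig"))
```

For piped input like the `cat ... |` usage above, the bytes could come from `sys.stdin.buffer.read()`. The file on disk keeps its BOM this way, so other tools that choke on it would still need it removed.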
Once you know the cause, the rest is easy. Let’s fix it programmatically from Bash with nkf.
nkf --overwrite --oc=UTF-8-BOM hoge.txt # Add BOM
nkf --overwrite --oc=UTF-8 hoge.txt # Remove BOM
yuis ASUS /mnt/c/pg$ file /mnt/e/_QTranslate/History.json
/mnt/e/_QTranslate/History.json: UTF-8 Unicode (with BOM) text, with very long lines, with no line terminators
yuis ASUS /mnt/c/pg$ nkf --overwrite --oc=UTF-8 /mnt/e/_QTranslate/History.json
yuis ASUS /mnt/c/pg$ file /mnt/e/_QTranslate/History.json
/mnt/e/_QTranslate/History.json: UTF-8 Unicode text, with very long lines, with no line terminators
Nice.
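Where nkf is not available, the same BOM removal could probably be done in a few lines of Python. This is a rough sketch rather than a drop-in replacement for nkf: it assumes the file is already UTF-8 and small enough to read into memory, and `strip_utf8_bom` is a name invented for this example.

```python
import codecs


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM from the file at `path`, in place.

    Returns True if a BOM was removed, False if the file had none.
    """
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Rewrite the file without its first three bytes; the rest is left untouched.
    with open(path, "wb") as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True
```

Unlike `nkf --oc=UTF-8`, this does no encoding conversion at all; it only drops the three BOM bytes when they are present.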
