Feature #1161

Autodetect encoding of Morrowind installation files

Added by Pieter van der Kloet over 3 years ago. Updated 13 days ago.

Status: New
Priority: Normal
Assignee: -
Category: General
Target version:
Start date: 02/07/2014
% Done: 0%
Severity: Normal

Description

OpenMW and the ini-importer should be able to detect the encoding of the Morrowind installation based on the contents of Morrowind.ini or Morrowind.esm.
Determining the correct encoding should be done by comparing a generated hash of a (portion of a) string against a list of known hashes.

Morrowind is available in the following languages:

windows-1252:
  • English
  • French
  • German
  • Italian
  • Spanish
windows-1250:
  • Polish
windows-1251:
  • Russian

For more information, see this forum topic: http://forum.openmw.org/viewtopic.php?f=6&t=2018&start=10#p22016
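
The language list above can be written down as a lookup table. This is only a minimal sketch: the function name `encodingForLanguage` is hypothetical, and the `win1250`/`win1251` spellings are an assumption modelled on the `win1252` argument that appears in comment #6.

```cpp
#include <map>
#include <optional>
#include <string>

// Language -> encoding table copied from the list in the description.
// Assumption: the encoding names follow the "win1252" spelling from comment #6.
std::optional<std::string> encodingForLanguage(const std::string& language)
{
    static const std::map<std::string, std::string> table = {
        {"English", "win1252"}, {"French", "win1252"}, {"German", "win1252"},
        {"Italian", "win1252"}, {"Spanish", "win1252"},
        {"Polish", "win1250"},
        {"Russian", "win1251"},
    };
    const auto it = table.find(language);
    if (it == table.end())
        return std::nullopt; // unknown language: leave the decision to the caller
    return it->second;
}
```

Returning an empty optional for an unknown language keeps the "could not detect" case explicit instead of silently picking a default.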

History

#1 Updated by Lars Söderberg over 3 years ago

  • Target version set to openmw-0.31

#2 Updated by Marc Zinnschlag about 3 years ago

  • Target version changed from openmw-0.31 to openmw-0.32

#3 Updated by Marc Zinnschlag about 3 years ago

  • Target version changed from openmw-0.32 to openmw-0.33

#4 Updated by Marc Zinnschlag almost 3 years ago

  • Target version changed from openmw-0.33 to openmw-0.34

#5 Updated by Marc Zinnschlag almost 3 years ago

  • Target version changed from openmw-0.34 to openmw-0.35

#6 Updated by Scott Howard over 2 years ago

English md5sum:
ab4549bc8c5a0fc915411dbb81310ca2 Morrowind.ini

Still need Russian and Polish values for Morrowind.ini.

Example implementation in mainwizard.cpp, which already uses Qt:
QFile file("Morrowind.ini"); // get from mGameSettings dataDir, or
file.open(QIODevice::ReadOnly); // readAll() returns nothing unless the file is open
QByteArray hashData = QCryptographicHash::hash(file.readAll(), QCryptographicHash::Md5).toHex();
if (hashData == "ab4549bc8c5a0fc915411dbb81310ca2") {
    arguments.append(QLatin1String("win1252")); // matches the English Morrowind.ini
} else if ...

#7 Updated by Marc Zinnschlag over 2 years ago

  • Target version changed from openmw-0.35 to openmw-0.35.1

#8 Updated by Marc Zinnschlag over 2 years ago

  • Target version changed from openmw-0.35.1 to openmw-0.36

#9 Updated by Marc Zinnschlag over 2 years ago

  • Target version changed from openmw-0.36 to openmw-0.37

#10 Updated by Marc Zinnschlag almost 2 years ago

  • Category set to General
  • Target version changed from openmw-0.37 to openmw-0.38

#11 Updated by Marc Zinnschlag over 1 year ago

  • Target version changed from openmw-0.38 to openmw-0.39

#12 Updated by Marc Zinnschlag over 1 year ago

  • Target version changed from openmw-0.39 to openmw-0.40

#13 Updated by Marc Zinnschlag about 1 year ago

  • Target version changed from openmw-0.40 to openmw-0.41

#14 Updated by Paul McElroy 10 months ago

I think it would be safer to just import the ini assuming encoding 1252, check for errors, and if there are any, retry with encoding 1250 and then with encoding 1251.

Alternatively, you should be safe assuming the encoding from the language of the installation.

#15 Updated by Chris Robinson 10 months ago

I think it would be safer to just import the ini assuming encoding 1252, check for errors

That's the issue with encodings. It's not (necessarily) technically an error to use the wrong one. The encodings simply map char values to specific characters. Before UTF-8, each region would have its own encoding to handle its language's characters, but the encodings determine what characters the values represent. So encoding-1252 may say value 212 is one character, and encoding-1251 may say value 212 is another character, both valid in their own contexts, and there's no way to detect which one is correct without actually doing string matching (or asking the user which one looks correct given a string).
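
To make that concrete, here is a small sketch that decodes a single byte under two of the code pages. The function names are made up for illustration, and both functions only handle bytes in the range 0xC0..0xFF, where windows-1252 matches Latin-1 and windows-1251 holds the Cyrillic letters U+0410..U+044F.

```cpp
#include <cstdint>

// Decode one byte in the range 0xC0..0xFF to a Unicode code point.
// In windows-1252 this range is identical to Latin-1, so the value is unchanged.
char32_t decodeHighWin1252(std::uint8_t byte)
{
    return static_cast<char32_t>(byte);
}

// In windows-1251 the same range holds the Cyrillic letters U+0410..U+044F.
char32_t decodeHighWin1251(std::uint8_t byte)
{
    return static_cast<char32_t>(0x0410 + (byte - 0xC0));
}
```

For byte 212 (0xD4) the first function gives U+00D4 (Ô) and the second gives U+0424 (Ф). Neither call can fail, so a try-and-fall-back import has no error to react to.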

#16 Updated by Marc Zinnschlag 10 months ago

  • Target version changed from openmw-0.41 to openmw-0.42

#17 Updated by Marc Zinnschlag 5 months ago

  • Target version changed from openmw-0.42 to openmw-0.43

#18 Updated by Andrei Kortunov 5 months ago

Still need Russian and Polish values for Morrowind.ini.

I don't think that taking a hash of a whole file is a good idea, especially for Morrowind.ini (since it may have been modified if an OpenMW user uses an existing Morrowind installation).

Can't we just read any string GMST (sMonthMorningstar for example) from Morrowind.esm and search for known characters in that string?
For example, "cc e5 f1 ff f6" in hex is the Russian translation of the word "Month" in windows-1251.
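
A minimal sketch of that idea, assuming the raw (undecoded) bytes of the GMST value are already available as a std::string. The function name `guessEncoding` is hypothetical, the `win1251` spelling is assumed by analogy with `win1252` in comment #6, and only the Russian byte sequence quoted above is included because no Polish sequence has been posted here.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Guess the encoding of a raw GMST value by searching for known byte sequences.
std::optional<std::string> guessEncoding(const std::string& rawBytes)
{
    // Only the Russian signature ("Month" in windows-1251) comes from this ticket.
    // A Polish entry would have to be added the same way once a sequence is known.
    static const std::vector<std::pair<std::string, std::string>> signatures = {
        {"\xcc\xe5\xf1\xff\xf6", "win1251"},
    };
    for (const auto& [bytes, encoding] : signatures)
        if (rawBytes.find(bytes) != std::string::npos)
            return encoding;
    return std::nullopt; // no known signature found
}
```

An empty result does not prove the text is windows-1252; it only means none of the listed signatures matched.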

#19 Updated by Alexei Dobrohotov about 1 month ago

Can't we just read any string GMST from Morrowind.esm

And what if there is no Morrowind.esm and instead there's an omwgame file, which may not even have such a GMST and may even be encoded in Unicode, yet Morrowind.ini is still required for some reason and is still encoded in, say, Win-1252? All cases must be covered.

#20 Updated by scrawl . about 1 month ago

I would suggest using one of the fallback strings in Morrowind.ini (e.g. the class creation questions) and scan for a string in any given language. Easier to do than opening the ESM file.

Also, IIRC some Morrowind.ini files have a Language entry (not all do, but if there is one we can use it).
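
Checking for such an entry could look roughly like this. It is a sketch under assumptions: the helper name `findIniValue` is made up, and the exact key spelling ("Language") and the form of its value are not confirmed anywhere in this ticket.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Return the value of the first "key=value" line whose key matches exactly.
std::optional<std::string> findIniValue(const std::string& iniText, const std::string& key)
{
    std::istringstream stream(iniText);
    std::string line;
    const std::string prefix = key + "=";
    while (std::getline(stream, line))
    {
        if (!line.empty() && line.back() == '\r')
            line.pop_back(); // tolerate Windows line endings
        if (line.compare(0, prefix.size(), prefix) == 0)
            return line.substr(prefix.size());
    }
    return std::nullopt; // entry not present
}
```

Calling `findIniValue(iniText, "Language")` returns an empty optional when the entry is missing, in which case the string-scanning approach would still be needed as a fallback.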

#21 Updated by Alexei Dobrohotov 13 days ago

  • Target version changed from openmw-0.43 to openmw-1.0
