[#404] UnicodeDecodeError with windows-1250 shapefile when generating classes

Date: 2007-05-22 11:02
Priority: 3
State: Open
Submitted by: Jachym Cepicky (jachym)
Assigned to: Bram de Greve (bramz)
Category: none
Version: none
Resolution: none
Summary: UnicodeDecodeError with windows-1250 shapefile when generating classes

Detailed description
Hi,

My shapefile contains several text values in a non-UTF-8 encoding (mainly windows-1250). When I try to generate classes in the properties window based on such a column, I get this error:

An unhandled exception occurred:
'utf8' codec can't decode bytes in position 3-6: invalid data
(please report to http://thuban.intevation.org/bugtracker.html)

Traceback (most recent call last):
  File "/usr/lib/thuban/Thuban/UI/classgen.py", line 762, in _OnRetrieve
    i = self.list_avail.InsertStringItem(index, str(v))
  File "/usr/lib/python2.5/site-packages/wx-2.8-gtk2-unicode/wx/_controls.py", line 4698, in InsertStringItem
    return _controls_.ListCtrl_InsertStringItem(*args, **kwargs)
  File "encodings/utf_8.py", line 16, in decode
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 3-6: invalid data

thuban svn
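
The failure above can probably be reproduced outside Thuban. The following is only a minimal sketch, written for a current Python 3 interpreter rather than the Python 2.5 shown in the traceback. The sample word is a hypothetical value and is not taken from the attached geo.dbf. It encodes a Czech word as windows-1250 (Python codec name "cp1250") and then tries to decode the resulting bytes as UTF-8, which is presumably what happens to str(v) inside the unicode build of wx.

```python
# Hypothetical attribute value; not taken from the attached geo.dbf.
value = "\u017eula"              # the Czech word "zula" with a caron on the z

# Bytes as they would be stored in a windows-1250 encoded .dbf column.
raw = value.encode("cp1250")

try:
    raw.decode("utf-8")          # treat the bytes as UTF-8, as the traceback suggests
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print("decode failed:", exc.reason)
```

Pure ASCII values decode without trouble under either encoding, so the error only shows up for columns that contain accented characters.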
Message
Date: 2007-05-29 07:06
Sender: Bernhard Reiter

Bram wrote that his code can do it. :)
Jachym, if you can try his branch it would be wonderful.

Date: 2007-05-25 15:09
Sender: Didrik Pinte

Here is a simple patch for the current version, to be used until Bram's version is incorporated into the trunk:

Index: thuban/Thuban/UI/classgen.py
===================================================================
--- thuban/Thuban/UI/classgen.py (revision 2772)
+++ thuban/Thuban/UI/classgen.py (working copy)
@@ -759,7 +759,8 @@
index = 0
for v in list:
self.dataList.append(v)
- i = self.list_avail.InsertStringItem(index, str(v))
+ i = self.list_avail.InsertStringItem(index,
+ str(v).decode('iso-8859-1'))
self.list_avail.SetItemData(index, i)

self.list_avail_data.append(v)

I don't have time to test and commit it now, but I will do so when I am back after the 31st of May.
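
One thing to keep in mind about the patch above: decoding as iso-8859-1 never raises, because every byte value maps to some character, but windows-1250 data may then be shown with the wrong letters. A minimal sketch of the difference, assuming a current Python 3 interpreter and a hypothetical sample value (not taken from geo.dbf):

```python
# Hypothetical value: the Czech word for "slate" encoded as windows-1250.
# Byte 0xF8 is "r with caron" in windows-1250 but "o with stroke" in iso-8859-1.
raw = b"b\xf8idlice"

as_latin1 = raw.decode("iso-8859-1")   # succeeds, but yields the wrong letter
as_cp1250 = raw.decode("cp1250")       # yields the intended text

print(repr(as_latin1))
print(repr(as_cp1250))
```

So the patch avoids the exception, while the text shown in the list is only correct if the data really is iso-8859-1 or plain ASCII.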

Date: 2007-05-22 18:25
Sender: Bernhard Reiter

This seems similar to
[#118] problem with utf8 systems trying to see non utf8 table

You could try one of the patches from there or
check out the WIP-pyshapelib-bramz/ branch which
has new code for opening other encodings.
I think Bram would love feedback if this works with his code. ;)

Date: 2007-05-22 13:58
Sender: Jachym Cepicky

file added, try the column "HORNINA"

Date: 2007-05-22 13:50
Sender: Bernhard Reiter

Jachym,
thanks for reporting.
Do you happen to have an example file that you could
publish and attach to this issue?

The problem is that, ideally, we need some way to detect the charset.
As a hack, you could add some code around "str(v)" to decode your encoding into unicode.
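
The hack described above might look roughly like the sketch below. It assumes a current Python 3 interpreter, where raw table values would arrive as bytes. The helper name to_unicode and its list of candidate encodings are hypothetical and are not part of Thuban. Trying UTF-8 first is only a heuristic, not real charset detection: non-ASCII windows-1250 text rarely happens to be valid UTF-8, so a failed UTF-8 decode is taken as a hint to try the next candidate.

```python
def to_unicode(value, encodings=("utf-8", "cp1250")):
    """Hypothetical helper: turn a table value into text.

    Byte strings are decoded by trying each candidate encoding in order.
    Non-byte values (numbers, text) are passed through str().
    """
    if not isinstance(value, bytes):
        return str(value)
    for encoding in encodings:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: never raises, undecodable bytes become U+FFFD.
    return value.decode("utf-8", "replace")


print(to_unicode(b"\x9eula"))   # windows-1250 bytes
print(to_unicode(42))           # non-string value
```

In classgen.py the call would then take the place of str(v), with the candidate encodings chosen to match the data at hand.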

Attachments:
Size     Name     Date              By              Download
436 KiB  geo.dbf  2007-05-22 13:58  Jachym Cepicky  geo.dbf
892 KiB  geo.shp  2007-05-22 13:58  Jachym Cepicky  geo.shp
7 KiB    geo.shx  2007-05-22 13:58  Jachym Cepicky  geo.shx
Field        Old Value                                  Date              By
assigned_to  none                                       2007-05-29 07:06  Bernhard Reiter
summary      can not retrieve dat from attribute table  2007-05-22 18:25  Bernhard Reiter
File Added   207: geo.shp                               2007-05-22 13:58  Jachym Cepicky
File Added   208: geo.shx                               2007-05-22 13:58  Jachym Cepicky
File Added   206: geo.dbf                               2007-05-22 13:58  Jachym Cepicky