<?xml version="1.0"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>CKAN: Ticket #2232: Unicode Exception when rebuilding the search index</title>
    <link>http://localhost/ticket/2232</link>
    <description>&lt;p&gt;
In some cases this exception is fired when building the search index. As explained in &lt;a class="closed ticket" href="http://localhost/ticket/1616" title="defect: Catch exceptions when rebuilding the search index (closed: fixed)"&gt;#1616&lt;/a&gt; this makes the whole process stop.
&lt;/p&gt;
&lt;pre class="wiki"&gt;
  File "/var/lib/ckan/pdeu/pyenv/lib/python2.6/site-packages/solr/core.py", line 326, in wrapper
    return self._update(content, query)
  File "/var/lib/ckan/pdeu/pyenv/lib/python2.6/site-packages/solr/core.py", line 550, in _update
    rsp = self._post(selector, request, self.xmlheaders)
  File "/var/lib/ckan/pdeu/pyenv/lib/python2.6/site-packages/solr/core.py", line 639, in _post
    return check_response_status(self.conn.getresponse())
  File "/var/lib/ckan/pdeu/pyenv/lib/python2.6/site-packages/solr/core.py", line 1096, in check_response_status
    raise ex
SolrException: HTTP code=400, reason=ParseError at [row,col]:[1,2354] Message: An invalid XML character (Unicode: 0x1) was found in the element content of the document.
Traceback (most recent call last):
  File "/var/lib/ckan/pdeu/pyenv/bin/paster", line 9, in &amp;lt;module&amp;gt;
    load_entry_point('PasteScript==1.7.3', 'console_scripts', 'paster')()
  File "/var/lib/ckan/pdeu/pyenv/lib/python2.6/site-packages/paste/script/command.py", line 84, in run
    invoke(command, command_name, options, args[1:])
  File "/var/lib/ckan/pdeu/pyenv/lib/python2.6/site-packages/paste/script/command.py", line 123, in invoke
    exit_code = runner.run(args)
  File "/var/lib/ckan/pdeu/pyenv/lib/python2.6/site-packages/paste/script/command.py", line 218, in run
    result = self.command()
  File "/var/lib/ckan/pdeu/pyenv/src/ckan/ckan/lib/cli.py", line 298, in command
    rebuild()
  File "/var/lib/ckan/pdeu/pyenv/src/ckan/ckan/lib/search/__init__.py", line 134, in rebuild
    {'id': pkg.id}
  File "/var/lib/ckan/pdeu/pyenv/src/ckan/ckan/lib/search/index.py", line 54, in insert_dict
    return self.update_dict(data)
  File "/var/lib/ckan/pdeu/pyenv/src/ckan/ckan/lib/search/index.py", line 79, in update_dict
    self.index_package(pkg_dict)
  File "/var/lib/ckan/pdeu/pyenv/src/ckan/ckan/lib/search/index.py", line 153, in index_package
    raise SearchIndexError(e)
ckan.lib.search.common.SearchIndexError: HTTP code=400, reason=ParseError at [row,col]:[1,2354] Message: An invalid XML character (Unicode: 0x1) was found in the element content of the document.
&lt;/pre&gt;</description>
    <language>en-us</language>
    <image>
      <title>CKAN</title>
      <url>http://assets.okfn.org/p/ckan/img/ckan_logo_shortname.png</url>
      <link>http://localhost/ticket/2232</link>
    </image>
    <generator>Trac 0.12.3</generator>
    <item>
      
        <dc:creator>amercader</dc:creator>

      <pubDate>Mon, 19 Mar 2012 14:29:55 GMT</pubDate>
      <title>milestone changed</title>
      <link>http://localhost/ticket/2232#comment:1</link>
      <guid isPermaLink="false">http://localhost/ticket/2232#comment:1</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;milestone&lt;/strong&gt;
                changed from &lt;em&gt;ckan-v1.7&lt;/em&gt; to &lt;em&gt;current-ckan-sprint-2012-04-02&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>amercader</dc:creator>

      <pubDate>Mon, 19 Mar 2012 14:30:17 GMT</pubDate>
      <title>status changed; owner set</title>
      <link>http://localhost/ticket/2232#comment:2</link>
      <guid isPermaLink="false">http://localhost/ticket/2232#comment:2</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;owner&lt;/strong&gt;
              set to &lt;em&gt;amercader&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;status&lt;/strong&gt;
                changed from &lt;em&gt;new&lt;/em&gt; to &lt;em&gt;assigned&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>amercader</dc:creator>

      <pubDate>Wed, 21 Mar 2012 12:07:40 GMT</pubDate>
      <title>status changed; resolution set</title>
      <link>http://localhost/ticket/2232#comment:3</link>
      <guid isPermaLink="false">http://localhost/ticket/2232#comment:3</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;status&lt;/strong&gt;
                changed from &lt;em&gt;assigned&lt;/em&gt; to &lt;em&gt;closed&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;resolution&lt;/strong&gt;
                set to &lt;em&gt;fixed&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Fixed in 6b4c6c9
&lt;/p&gt;
&lt;p&gt;
This was due to some characters not being supported by XML. We now strip them before indexing.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item>
 </channel>
</rss>