Personal tools
You are here: Home Leocornus Leocornus Buildout Config Repository. Solr for WordPress

Solr for WordPress

— filed under: ,

Using Solr for Wordpress.

WordPress Database Schema

WordPress post short URL pattern:
http://WP_BLOGS.domain/WP_BLOGS.path/?p=WP_XXX_POSTS.ID

select 
blog.domain as domain, blog.path as path, post.id as postid,
blog.blog_id as blogid, post.post_author, post.post_title,
post.post_excerpt, post.post_content, post.post_type
from 
wp_blogs blog, wp_886_posts post
where 
blog.blog_id = 886 and post.post_status='publish'

Post Category and Tag

Both category and tag are saved in table wp_xx_terms. We will use the name column as the keyword. The table wp_xx_term_taxonomy will tell this term is a category for tag. The table wp_xx_term_relationships will associate a post/page to a term.

select
term.name
from
wp_886_terms term, wp_886_term_taxonomy taxonomy, wp_886_term_relationships relat
where
relat.object_id = postid and relat.term_taxonomy_id = taxonomy.term_taxonomy_id
and taxonomy.term_id = term.term_id

Post Comments

All about table wp_xx_comments

select
comment_content as comment
from wp_4_comments where 
comment_post_id = postid and comment_approved = 1

Media files in WordPress Post

All media files are saved in table wp_xxx_posts under post_type: attachment. The status of the attachement post will always be '''inherit''', which means it depends on the status of its parent post.

Database Importer Configuration

    <entity name="post4"
        processor="SqlEntityProcessor"
        query="select
        blog.domain as domain, blog.path as path, post.id as postid,
        blog.blog_id as blogid, post.post_author as postAuthor,
        post.post_title as postTitle, post.post_excerpt as postExcerpt,
        post.post_content as postContent, post.post_type
        from wp_blogs blog, wp_4_posts post
        where blog.blog_id = 4 and post.post_status='publish'"
        transformer="TemplateTransformer"
    >
      <!-- column is the database table column name,
           name is the solr schema field name.
      -->
      <field column="id" name="id" template="${post4.blogid}-${post4.postid}"/>
      <field column="postTitle" name="title" />
      <field column="postExcerpt" name="description" />
      <field column="postContent" name="content" />
      <entity name="post4_authors"
          processor="SqlEntityProcessor"
          query="select display_name from wp_users where id = ${post4.postAuthor}"
      >
        <field column="display_name" name="authors" />
      </entity>
      <field column="url" name="url"
           template="http://${post4.domain}${post4.path}?p=${post4.postid}" />
      <!-- trying to get the keywords -->
      <entity name="post4_cats"                                                                                                         processor="SqlEntityProcessor"
          query="select
          term.name as keyword
          from                                                                                                                          wp_4_terms term, wp_4_term_taxonomy taxonomy, wp_4_term_relationships relat
          where
          relat.object_id = ${post4.postid} and
          relat.term_taxonomy_id = taxonomy.term_taxonomy_id and
          taxonomy.term_id = term.term_id"
      >
        <field column="keyword" name="keywords"/>
      </entity>
    </entity>

A Little Script to Automate Things

mysql_connect($server, $username, $password) or
    die("Could not connect: " . mysql_error());
mysql_select_db($database);

$blogIds = mysql_evaluate_array('select blog_id from wp_blogs');

echo "<pre>";
for ($i = 0; $i < count($blogIds); $i++) {

    echo <<<EOT
    <entity name="post$blogIds[$i]"
        processor="SqlEntityProcessor"
        query="select blog.domain as domain, blog.path as path, post.id as postid,
blog.blog_id as blogid, post.post_author, post.post_title,
post.post_excerpt, post.post_content, post.post_type
from wp_blogs blog, wp_$blogIds[$i]_posts post
where blog.blog_id = $blogIds[$i] and post.post_status='publish'"
        transformer="TemplateTransformer"
    >
      <field column="id" name="id" template="\${post$blogIds[$i].blogid}-\${post$blogIds[$i].postid}"/>
      <field column="post_title" name="title" />
      <field column="post_excerpt" name="description" />
      <field column="post_content" name="content" />
      <field column="url" name="url" template="http://\${post$blogIds[$i].domain}\${post$blogIds[$i].path}?p=\${post$blogIds[$i].postid}" />
    </entity>
EOT;
    echo "\n";
}
echo "</pre>";

Script Transformer for Description

The idea candidate for description of a WordPress psot will be the excerpt field of a post. However, the exerpt field has not always been filled. So here comes the following script:

    // parse the post content and extract the description for a post.
    function processPost(row) {

        orgContent = new String(row.get('postContent'));
        // strip out all HTML tags.
        content = orgContent.replace(/<[^>]*>/g, "");
        row.put('postContent', content);

        // decide the excerpt as the description.
        excerpt = new String(row.get('postExcerpt'));
        if (excerpt.trim().length < 1) {

            if (content.length > 255) {
                excerpt = content.substring(0, 255);
            } else {
                excerpt = content;
            }
            row.put('postExcerpt', excerpt);
        }

        return row;
    }

References

Document Actions