Educating the world

Our blog has over 10,000 readers a month

Useful B2Evolution SQL

September 20th, 2011

While B2Evolution’s search and back office interfaces are excellent they cater for the general use case. Search is a good example of this as there are many things to search for but a limited amount of search criteria in the interface. There are also tasks you might want to perform that would normally mean spending hours going through the web interface.

SQL is the way to go. It allows you to update your blogs structure underneath in a couple of well crafted statements. So I thought I’d share them with you.

Deleting a category with many dependencies
I came across a problem with deleting categories. If there are lots of posts using the category you want to delete, then you have to go into each post by hand and change the dependency on the category.
http://www.bigsoft.co.uk/blog/index.php/2008/09/28/b2evolution-deleting-a-category-with-man

Finding articles without any tags
I wrote a related article plug-in that relies on tags so I wanted to find out which of my live blog articles don’t have enough tags. This piece of SQL gives you a list of blog article titles followed by the number of tags that are associated with each article. I have added a filter to ignore articles that have more than a set number of tags.

SELECT @mintags := 2;
SELECT post_ID, post_title, count(itag_tag_ID) ntags FROM evo_items__item
LEFT JOIN evo_items__itemtag ON itag_itm_ID = post_ID
WHERE post_status = ‘published’
GROUP BY post_ID
HAVING ntags < @mintags
LIMIT 20
;

You can pull out the post_id and use it in this SQL that displays the full set of tags for an article.

SELECT @postid := 30;
SELECT itag_itm_ID, tag_name
FROM evo_items__itemtag, evo_items__tag
WHERE itag_itm_ID = @postid
AND itag_tag_ID = tag_id
;

Viewing comments and deleting spam

SELECT comment_ID, comment_date, comment_author, comment_author_IP, SUBSTRING(TRIM(comment_content),1,150)
FROM evo_comments
WHERE comment_status="draft"
ORDER BY comment_author_IP
;

Then delete them all:

DELETE FROM evo_comments
WHERE comment_status="draft"
;

Or delete by IP address:

DELETE FROM evo_comments
WHERE comment_status="draft"
AND comment_author_IP IN (’xxx,xxx,xxx,xxx’, ‘yyy,yyy,yyy,yyy’)
;

Come back Eclipse all is forgiven

August 19th, 2011

I have complained about Eclipse IDE in the past because it can be a little tricky to set things up especially when it comes to web development and JEE. That all stops now! Recently, I have been using C-sharp (c#) with Visual Studio 10 and IIS 7 and have found it really buggy with in comprehensible documentation.

The C# API documentation is really annoying. Take a look at System.Threading.Timer for example. In the list of constructors if gives the type of the object but not what it is for! Generally it’s the other way around, if you know what it is for you can infer the type yourselves. In contrast take a look at the equivalent in Java. The constructor list of java.util.Timer gives you the type, what it is for and a very clear definition of what its purpose is. The MSDN documentation has you hopping all over the place trying to find out what various options are for and how to use them. If you can’t find what it is for or how to use it then you can’t even use the source because there isn’t any.

The nice thing about open technology is that everyone is thrown in together. We all know and understand that there isn’t much documentation and the support is generally provided at the expense of someone’s free time. The side effect of this is that there are loads of blogs and publicly available articles which detail how to do things. This is the opposite of what I have experienced while using Microsoft technologies. It could be the fact that Microsoft have written loads of documentation that makes people want to push through and understand it rather than thinking “this is rubbish, I’ll look for something else on the internet". Microsoft’s documentation is just that: documentation. You have to start from the beginning and plough through it until you get to the end. There is almost nothing on how to do a particular job or a Frequently Asked Question. Sure there are forums but you can spend your life in there and there are so many questions from thicky people who haven’t done *any* reading and don’t know or understand any of the concepts.

I had started this article with the title Debugging web applications under IIS 7 using Visual Studio 10 but have already changed the subject enough to make the article about something else. So I will finish off with a couple of Eclipse features that I’m really missing in Visual Studio 10 and maybe a few gripes with VS10.

  1. I’ve been trying to run an IIS module in debug mode. All the documentation says that the development environment comes with its own web server that is comparable with the normal IIS Web Server. Well try as I might I couldn’t get this to work. The options that looked like what I needed were greyed out in the web application properties and when I did finally get the debugger to launch it moaned about not being able to find the configuration files. The only way I could get the debugger to hit my breakpoint was to ditch the built in web server, publish to a different folder and set up IIS to have an application under a virtual folder.
  2. But it didn’t stop there! There is a conflict between IIS and Visual Studio because when you close Visual Studio down it forgets the setup so you have do it all again each time you start. VS10 opens your project and sees you are using a the local web server and have a project url. It checks the IIS configuration and says that there is already a virtual directory. The project loads in but most of the artefacts are missing. You have to delete the virtual directory in the IIS manager, reload the project, get VS10 to create the virtual folder again, go back to IIS manager and change the location of the directory it maps onto.
  3. In Eclipse, your can store you projects anywhere. We use backed up network drives (and Clearcase) to hold our projects. Visual studio doesn’t seem to like projects on remote file systems. You have to set up all sorts of additional trust arrangements with different hosts which is a massive pain.
  4. If the worse comes to the worse and you can’t find a problem you can download all the source code for all the libraries and step through every line of code. Not true with IIS - there is no code.
  5. You have to explicitly add files to the project. You can’t say everything in the folder is in my project, refresh and pick up the new files I’ve added. If you remove a file from the directory, you can’t delete it from the project. VS10 sees that it can’t delete the file and so won’t let you delete it from the project. Hitting refresh has no effect.
  6. Probably because you have to explicitly add all the files to the project there are loads of configuration files that have to be maintained.
  7. Can’t run the web application in situ.
  8. Visual Studio tries to manage all the artifices of the project so that when you change a bit of it it updates all the dependencies, but it doesn’t. There are loads of places where it doesn’t and they are not immediately obvious. You’ll only find then if you are trying to run in debug mode or something.
  9. Visual Studio never seems to tell you anything. Every time you do something in Eclipse it talks to you through the output window errors are easy to see.
  10. Switching on debug inside IIS or Visual Studio or the publisher is almost impossible.
  11. IIS and Visual Studio are only concerned with using other Microsoft technologies. So if you think that Microsoft SQLServer is a colossal waste of time and effort and you want to use MySQL instead then there is no help with setting that up.
  12. The debugger is, frankly, rubbish in comparison to Eclipse. Eclipse gives you instance id and all sorts of extra typing information which is linked into the “go to definition of” menus.
  13. Visual Studio wouldn’t let me write code and hot deploy, so you are constantly stopping editing, building, publishing and starting.
  14. Context menu while over the source is rubbish too. Eclipse seems to have a greater understanding of the make up of the language so it lets you jump to base classes, type definitions just by context clicking on a variable, class or a definition.
  15. When the command completion helper pops up, it only contains entries that are provided by the “using” statements you have used. So it’s not much help. Eclipse gives you a selection of commands that are available in related packages. When you select on of those commands Eclipse adds the import to the top of the file. With Visual Studio you have to hunt around for the interface or class you want, then find out which package it belongs to, then add the “using” statement to the top of the files and only then can you get it to auto-complete. Useless really!
  16. It is not possible to resize the auto-completion pop-up dropdown box. If there are loads of things to look at then it can take a while to search through using such a small viewport.
  17. When you hover over a call in the source, the tool tips pops up and then disappears after a second, usually before I’ve finished reading it. Eclipse lets you hoover over the tool tip and as a result to will change to something that stays open (until you click away from it). The most useful thing about this feature is that while the tool tip is in this semi-permanent state you can select the text contained within it. This is really useful when you are trying to get the assembly or class path to a class for a using statement. With VS10 you have to remember as much as you can and keep going back to re-read the tool tip.
  18. When you press control and space to pop up the auto-completion, Visual Studio draws a box around the option it thinks you want but it doesn’t highlight it. It usually gets the correct option, but because it isn’t highlighted you have to press an up or down arrow key to select it then you can press return to choose it. In Eclipse it highlights it so you just do a control+space then hit return. I can’t seem to get used to this useless extra step so I’m constantly adding carriage returns to my code which is breaking my flow.
  19. Each time there is network slowness then the whole of Visual Studio locks up. I think it rechecks every minute or so because it is ok for about a second then it will lock up for another minute. If you try and kill it, it ignores you. Whereas if you log out, it shuts down straight away. I’ve been so close to hitting the reset button. I wouldn’t mind but most of the time I’m not actually doing anything. Visual Studio must be constantly checking something in the background. Eclipse would be a bit sluggish but nowhere near as bad as Visual Studio - it’s absolutely unusable.
  20. When you are running something, both IDE’s debug messages are directed to the Output panel at the bottom of the screen. In Eclipse this panel is editable and in Visual Studio it is not. The most useful part of this feature is that it allows you to add a couple of blank lines to help break up all the text. When you are re-running an area of code the debug messages repeat and it is quite difficult on the eye to track through all that text looking for the beginning of your test. If you add a couple of blank lines then it is obvious where the start and end are.
  21. If you have lots of tabs in Visual Studio and you want to close some of them. You must click the tab, move to the other side of the screen to click the ‘X’ button, then back to where all the tabs are, then back to the ‘X’ button. Alternatively you can click the tab you want, context-click on the tab, move to second option ‘Close’. If you are using Eclipse, just click the ‘X’ on the tab - no dragging the mouse across the screen and back or phaffing with right-clicks.
  22. In Visual Studio when you try to view/update the project or solution properties, the window that appears is tiny. Almost every option goes off the end of the panel and most annoyingly the panel is not expandable. You find you are constantly going back and fore expanding one thing to see what the values should be. Eclipse lets you expand the options panel as large as you want. Eclipse even has a little search box that lets you search the option pages for the option you are looking for.
  23. Don’t even get me started on trying to compile or run a Visual Studio project from a network share. It is so buggy that it is unusable. The simple solution is to copy your files from the share to the local hard drive and work on it from there. Ironically there is a Clearcase integration plug-in for Visual Studio, but you can’t use it on dynamic views because Visual Studio can’t cope with network shares. There is a load of settings related to running or compiling projects on network shares but they don’t make any difference. The Microsoft Connect forums are full of bug reports about this where the last entry is “Microsoft are looking into it". Snapshot views reside on the local filesystem so they work fine. Eclipse will let you put your projects anywhere.
  24. Half of the internet is taken up with Visual Studio questions relating to “No native symbols in symbol file” and “Cannot find or open the PDB file” and most of them go unanswered. When you get the same problem under Eclipse, you just right-click on the warning popup and select specify location, go to the location where the symbols file is and load it in. Job done. Eclipse doesn’t care that the symbol file might not be next to the executable file. We can save literally thousands of hours.
  25. This is the one that grates me the most. In Visual Studio, when you create files using the right click on project name add, then Visual Studio fills the project file with all kinds of flags relating to the type of that file. So if you create the file separately and drop it in or change the extension of an existing file to a different type which might have flags then the file won’t behave like the other files of that type in the project. So I create a .cs file and rename it to .ashx.cs then add the missing .ashx file so that it looks like the other handlers but when I publish my new files don’t get published like the others do. There’s no way to find out what is missing, because all the icons look the same and the code is the same. So you spend all your time trying to figure out why one file is not working like all the others and the only thing you can do is create a new project and cut and paste all the source code. In Eclipse you right-click the project and select “Refresh". The Eclipse project can be recreated from all the artefacts in the project where as Visual Studio can not. Now I have to find out why my Session_Start() is not being called even though everything appears the same as the last project I did. Useless!

Sorry Eclipse all is forgiven.

Just as a foot note Mirco$oft have produced a free Visual Studio 10 plugin called “Productivity Power Tools” which goes some way to provide a feature set similar to Eclipse. It’s miles off the mark but it’s a start.

Fixing problems caused by updating Plesk

August 16th, 2011

Permanently deleting MediaWiki pages and revisions

August 3rd, 2011

Spammers target wiki’s because many of them are mis-configured. They use robot’s to submit page updates and page additions. I could understand it if they were adding advertising, redirecting to other sites or adding Trojans but they’re not. The page updates are nonsense like:

AQqGJP , [url=http://cqvbgfwzwypk.com/]cqvbgfwzwypk[/url], [link=http://pfjbydaloorn.com/]pfjbydaloorn[/link], http://bkxerafkfcgr.com/

or random sentences like:

Thank God! Someone with barnis speaks!

It serves no purpose other than giving me more work to do. Mind you on this occasion I got a blog article out of it, so it’s not all bad!

In spite of having my configuration set the way I want it they have some how managed to by pass them; there is probably a bug in MediaWiki, so the next job is to install the latest version.

The central table inside MediaWiki is mw_revision. It holds all the versions of each page. The history of a document is a timestamped ordered list of revision records.

MediaWiki uses hard coded numbers to represent the namespace column value in the mw_page table, so we will create a new table to hold the values. These will make it easier to see which namespace our articles are in:

DROP TABLE IF EXISTS cats;
CREATE TABLE cats
(
id INT PRIMARY KEY
,name VARCHAR(20)
);
INSERT INTO cats(id, name) VALUES
(’0′,'’)
,(’-1′,’Special:’)
,(’-2′,’Medi:’)
,(’1′,’Talk:’)
,(’2′,’User:’)
,(’3′,’User_talk:’)
,(’4′,’Project:’)
,(’5′,’Project_talk:’)
,(’6′,’File:’)
,(’7′,’File_talk:’)
,(’8′,’MediaWiki:’)
,(’9′,’MediaWiki_talk:’)
,(’10′,’Template:’)
,(’11′,’Template_talk:’)
,(’12′,’Help:’)
,(’13′,’Help_talk:’)
,(’14′,’Category:’)
,(’15′,’Category_talk:’)
;

If we look at one of our pages:

SELECT page_id, CONCAT(name,page_title) full_name
FROM mw_page, cats
WHERE id = page_namespace
AND page_title like ‘%How_to_re%’;

We can see that there are 2 pages: the original and an extra page with the same name but in a different namespace.

+---------+-----------------------------------------+
| page_id | full_name                               |
+---------+-----------------------------------------+
|       9 | How_to_report_problems_effectively      |
|      79 | Talk:How_to_report_problems_effectively |
+---------+-----------------------------------------+

We will investigate each page in turn.

First page 9 “How to report problems effectively". Here is the SQL to look at the revision history of the page:

SELECT @page:=9;
SELECT rev_id, rev_page, rev_text_id,
SUBSTR(rev_comment,1,15) comment, rev_user,
rev_user_text, rev_timestamp, rev_len, rev_parent_id
FROM mw_revision
WHERE rev_page=@page
ORDER BY rev_timestamp;

We can see from the results below that the last 3 entries are spam.

+--------+----------+-------------+-----------------+----------+-----------------+----------------+---------+---------------+
| rev_id | rev_page | rev_text_id | comment         | rev_user | rev_user_text   | rev_timestamp  | rev_len | rev_parent_id |
+--------+----------+-------------+-----------------+----------+-----------------+----------------+---------+---------------+
|     10 |        9 |          10 | New page:  In a |        1 | Davidnewcomb    | 20071210142814 |    4057 |             0 |
|     11 |        9 |          11 |                 |        1 | Davidnewcomb    | 20071210142854 |    4022 |            10 |
|     13 |        9 |          13 |                 |        1 | Davidnewcomb    | 20071211110321 |    4584 |            11 |
|     25 |        9 |          25 |                 |        1 | Davidnewcomb    | 20071216172547 |    4603 |            13 |
|     79 |        9 |          79 |                 |        1 | Davidnewcomb    | 20071220185710 |    6291 |            25 |
|     80 |        9 |          80 |                 |        1 | Davidnewcomb    | 20071220192557 |    6938 |            79 |
|    770 |        9 |         770 |                 |        1 | Davidnewcomb    | 20100322103154 |    6939 |            80 |
|    833 |        9 |         833 | wIojbsxYCEMAiHX |        0 | 195.251.148.31  | 20110608224510 |      45 |           770 |
|    841 |        9 |         841 | FAjVBqBMj       |        0 | 50.19.100.18    | 20110609091850 |      59 |           833 |
|    854 |        9 |         854 | yrZXYJobVab     |        0 | 201.248.150.100 | 20110609141813 |     135 |           841 |
+--------+----------+-------------+-----------------+----------+-----------------+----------------+---------+---------------+

We must delete the text records associated with those revisions:

DELETE FROM mw_text
WHERE old_id IN (833, 841, 854);

Next we need to delete the revision entries.

DELETE FROM mw_revision
WHERE rev_id IN (833, 841, 854);

Now that the page is clean we must solder up the pages table to make that page point to the latest version. Firstly record the timestamp of the most recent revision. Using the timestamp and the page number as an index collect the latest revision number and the latest revision’s length.

SELECT @most_recent := MAX(rev_timestamp)
FROM mw_revision
WHERE rev_page = @page;

SELECT @latest := rev_id, @latest_len := rev_len
FROM mw_revision
WHERE rev_timestamp = @most_recent;

When we have everything we need, update the page table:

UPDATE mw_page SET
page_latest = @latest,
page_len = @latest_len
WHERE page_id = @page;

Next we’ll check out the other page with the same name:

SELECT @page:=79;
SELECT rev_id, rev_page, rev_text_id, substr(rev_comment,1,15) comment,
rev_user, rev_user_text, rev_timestamp, rev_len, rev_parent_id
FROM mw_revision
WHERE rev_page=@page
ORDER BY rev_timestamp DESC;

We can see there is only one entry and that is a spam entry.

+--------+----------+-------------+---------------+----------+---------------+----------------+---------+---------------+
| rev_id | rev_page | rev_text_id | comment       | rev_user | rev_user_text | rev_timestamp  | rev_len | rev_parent_id |
+--------+----------+-------------+---------------+----------+---------------+----------------+---------+---------------+
|    825 |       79 |         825 | QorQCGouPrXjN |        0 | 74.119.98.5   | 20110608164848 |      26 |             0 |
+--------+----------+-------------+---------------+----------+---------------+----------------+---------+---------------+

We must delete the text records associated with those revisions:

DELETE FROM mw_text WHERE old_id IN (825);
DELETE FROM mw_revision WHERE rev_id IN (825);

This is a page we want to get rid of so we will just delete the entry from the page table.

DELETE FROM mw_page WHERE page_id = @page;

If you are going to be doing lots of this then here are some procedures to help. This SQL will produce a report to help you identify pages that have been attacked.

SELECT rev_id, rev_page, SUBSTR(CONCAT(name, page_title), 1, 10) full_name, substr(rev_comment, 1, 10) comment, rev_user_text, replace(substr(old_text, 1, 50), “\n","") txt
FROM mw_revision, mw_text, mw_page, cats
WHERE old_id = rev_text_id
AND rev_page = page_id
AND id = page_namespace
ORDER BY rev_page, rev_timestamp;

This procedure helps to delete all the artefacts for a page.

DELIMITER //
CREATE PROCEDURE delete_page (IN p_id INT)
BEGIN
DELETE FROM mw_text WHERE old_id IN
(SELECT rev_text_id
FROM mw_revision
WHERE rev_page = p_id);

DELETE FROM mw_revision WHERE rev_page = p_id;
DELETE FROM mw_page WHERE page_id = p_id;
END //
DELIMITER ;

It can be called by:

CALL delete_page(123);

Where as this SQL deletes all the revisions after a point. To find the identifier of the last good version, use the web interface and go to the page that has been altered. Click the history link for the page. Hover over the revision’s “addition time” link. The number you want is in the oldid part of the URL. In this case, we don’t need the page id because a revision can only be part of one page. You can compare the selected revisions to make sure you have the correct one.

DELIMITER //
CREATE PROCEDURE delete_after_rev
(IN p_last_good_rev INT)
BEGIN

SELECT @page:= rev_page
FROM mw_revision
WHERE rev_id = p_last_good_rev;

DELETE FROM mw_text WHERE old_id IN
(SELECT rev_text_id FROM mw_revision
WHERE rev_page = @page AND rev_id > p_last_good_rev);

DELETE FROM mw_revision
WHERE rev_page = @page AND rev_id > p_last_good_rev;

SELECT @most_recent := MAX(rev_timestamp)
FROM mw_revision WHERE rev_page = @page;

SELECT @latest := rev_id, @latest_len := rev_len
FROM mw_revision
WHERE rev_timestamp = @most_recent;

UPDATE mw_page SET
page_latest = @latest,
page_len = @latest_len
WHERE page_id = @page;

END //
DELIMITER ;

It can be called by:

CALL delete_after_rev(1234);

When the page is over written by the spammer it severs all the Category and Section links. A new page is generated and cached. The side effects of this will manifest themselves as incomplete Category pages, pages not appearing and your searches not returning the correct results. Click the Special:LonelyPages page to display the Orphan pages; this special page can act as a guide to help you track down all the pages that have been tampered with.

To remove the page cache run this SQL:

TRUNCATE mw_objectcache;

As pages are requested they will be rebuilt and cached so there will be a short lag as this happens.

To rebuild the search indexes drop to the command line and run the rebuildall command. There is an .htaccess to prevent you from running this file via the web. I tried removing this but the program wouldn’t run, so it must be run from the command line.

cd <wiki-root>
php maintenance/rebuildall.php

My wiki is less that a hundred pages so the rebuild only took a second. The output is here for your reference:

PHP Warning:  xxx
** Rebuilding fulltext search index (if you abort this will break searching; run this script again to fix):
Dropping index...
Rebuilding index fields for 83 pages...
0

Rebuild the index...
Done.


** Rebuilding recentchanges table:
Loading from page and revision tables...
$wgRCMaxAge=7862400 (91 days)
Updating links and size differences...
Loading from user, page, and logging tables...
Flagging bot account edits...
Flagging auto-patrolled edits...
Deleting feed timestamps.
Done.


** Rebuilding links tables -- this can take a long time. It should be safe to abort via ctrl+C if you get bored.
Refreshing redirects table.
Starting from page_id 1 of 83.
Refreshing links table.
Starting from page_id 1 of 83.
Retrieving illegal entries from pagelinks... 0..0
Retrieving illegal entries from imagelinks... 0..0
Retrieving illegal entries from categorylinks... 0..0
Retrieving illegal entries from templatelinks... 0..0
Retrieving illegal entries from externallinks... 0..1
Done.

If you don’t have access to the command line then running a web spider across your site would force the pages to rebuild and update their search caches but it would take much longer.

When you are all done, you ought to clean up the procedures and tables that we have created because they are no longer needed:

DROP TABLE cats;
DROP PROCEDURE delete_page;
DROP PROCEDURE delete_after_rev;

Happy repairing !

Thanks to David Aldridge for giving me a starting point.

Want to remove a file, but something is locking it?

August 1st, 2011

Tired of not being able to delete a file or folder because something is locking it? Want to move a folder but for some reason you can’t? Did you close down an application and for some reason you still can’t delete/move/rename files it was using? Do you miss your Unix box? Well LockHunter is the application for you!

It works as an application or via the context menu to tell you which processes are locking files that are preventing you from being happy. You can explicitly remove the locks owned by the application or delete the file which would be really handy for trying to clean up a machine infected with malware.

I gave it a go and it was pretty good.
http://www.lockhunter.com/