
| What's New | The Latest Postings for F. Andy Seidl | |
|
| June 26, 2008 Excerpt from: FAS Talk | | Search over 1400 Ann Arbor, Michigan area business and organization web sites with this new vertical search engine. | Google is a killer search engine. Everybody knows that. Google searches billions and billions of web pages, allowing you to find things with amazingly little effort. But sometimes, the fact that Google searches so many pages is actually a problem. Sometimes, you're not interested in the entire web, but just some narrow (but still big) slice of it. For example, suppose you're researching a specific company and its trading partners; or doing research on a specific drug; or looking for real estate in a given area. You don't want results from the entire web. You want results about exactly what you are looking for. The trouble is, Google does not know what you are really interested in; it only knows the words you typed in the search box. Google understands this. That's why they created a programmable interface to let developers build custom, "vertical" search engines that focus on specific subsets of the web. At MyST, we've built on this idea by seamlessly integrating Google custom search engine technology into the MyST platform. As a result, we have extended our MyST Blogsite and MyST Enterprise RSS service offerings to include MyST Vertical Search services. As an example of a vertical search site built on the MyST platform, I recently launched a site called Search Ann Arbor. This site lets visitors search over 1400 Ann Arbor area business and organization web sites. The benefits of a vertical search engine are obvious as soon as you try using one. For example, go to Google and search for "pizza," or "movies," or "clothing." How likely are you to find a local result? Now, go to Search Ann Arbor and try the same thing.
While the results here may be of little use to someone in California, they are of great use to someone in Ann Arbor. Do you have an idea for a vertical search engine? Would you like a unified, multi-domain search that covers your various company web sites, blog sites, e-commerce sites, etc.? Leave a comment here or contact me directly. | | |
| April 02, 2008 Excerpt from: FAS Talk | | Somewhere in the shadowy underbelly of the Web, there is an intelligent, and slimy, hacker exploiting WordPress security holes. | I manage dozens of web servers, most of which run commercial advertorial sites such as blogsite.com. Over the past few days, I had been noticing that my company site at myst-technology.com was periodically showing signs of traffic stress but that there was no apparent traffic increase (i.e., neither Google Analytics nor VisiStat was reporting increased activity). Then, yesterday, I happened to be looking at the real-time Apache server status for that site and I serendipitously noticed a bunch of POST requests to "/mysmartchannels/sign-up". Aha! Slime balls at play! For years, we hosted a public MySmartChannels service that anyone could sign up for and use for free. And, guess what the URL of the sign-up form was? Yep, as it turns out, there is still someone out there trying to hack their way into a free service that is no longer available. (Why would they do that? I suppose to use the service anonymously for some nefarious purpose.) Several years ago, I developed a web server security system called SlimeGate which has been protecting our servers from a wide range of hackers and spammers. Once I realized what was happening, it was a trivial matter to augment the SlimeGate rule set to detect and block these "sign-up" probes. It's now been about 14 hours since I deployed the rule updates and in that time, SlimeGate has identified and blocked 181 unique servers (i.e., IP addresses) attempting this probe. [21 Apr 2008, fas: as of today, we're up to 828 unique servers.] SlimeGate maintains a database of slime balls, making it easy to analyze patterns and trends.
Looking at these 181 new entries revealed that all but two were WordPress servers as identified by their user agent string of "Incutio XML-RPC -- WordPress/<version>". Of course, user agent strings are easily spoofed, so to confirm, I randomly selected 25 of the 181 IP addresses and actually visited them with a web browser. Of these, 18 were confirmed WordPress sites, five presented a generic Apache server page, and two were unreachable. The bottom line is that it appears that there are hundreds of compromised WordPress servers running some type of zombie process on behalf of hackers. Armed with the new data collected by SlimeGate, I was able to revisit the server loading issue. I found that in the previous 96 hours, these compromised WordPress blogs had generated over 200,000 unwanted requests! I'm not quite sure of the best way to notify the compromised site owners. But from my end, SlimeGate is nicely managing the problem, deflecting this new class of slime at the firewall. | | |
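The post does not describe how SlimeGate rules are written, so the following is only a minimal Python sketch of the kind of check described above: flag POST requests to the retired sign-up URL, count them per IP address, and note which IPs present the WordPress user agent prefix quoted in the post. The `Request` record and the function names are hypothetical, not SlimeGate's actual interface.

```python
from collections import Counter
from dataclasses import dataclass

PROBE_PATH = "/mysmartchannels/sign-up"  # URL named in the post
WORDPRESS_UA_PREFIX = "Incutio XML-RPC -- WordPress/"  # prefix quoted in the post


@dataclass(frozen=True)
class Request:
    """Hypothetical record for one logged HTTP request."""
    ip: str
    method: str
    path: str
    user_agent: str


def is_signup_probe(req: Request) -> bool:
    """Assumed rule: any POST to the retired sign-up URL counts as a probe."""
    return req.method.upper() == "POST" and req.path.rstrip("/") == PROBE_PATH


def summarize_probes(requests):
    """Return (ips_to_block, probe_count_per_ip, ips_claiming_wordpress)."""
    per_ip = Counter()
    wordpress_ips = set()
    for req in requests:
        if not is_signup_probe(req):
            continue
        per_ip[req.ip] += 1
        # The user agent is only a claim; as the post notes, it is easily spoofed.
        if req.user_agent.startswith(WORDPRESS_UA_PREFIX):
            wordpress_ips.add(req.ip)
    return set(per_ip), per_ip, wordpress_ips
```

Keeping the per-IP counts alongside the block list is what would make a follow-up question like "how many requests did these servers generate?" cheap to answer.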
| January 09, 2008 Excerpt from: FAS Talk | | Even major players like Technorati don't get it right. | My company hosts hundreds of commercial blogsites, so we spend a lot of time dealing with the underbelly of the Web: spammers, hackers, orange alligators, and just plain bad programmers. I'm used to seeing poorly designed and/or poorly implemented web sites and web services. Just about anybody can toss something onto the web. That's actually a great thing; it's what has made the Web succeed. But tossing something onto the Web is a far cry from deploying a robust, scalable, maintainable, secure, and well-performing web application. As far back as 1975, in his book The Mythical Man-Month, Fred Brooks described a fundamental issue in software development that is as true today as it was then. The general idea is that once a piece of software appears to work, it is still a long, long way from being a solid, commercial-grade product. In fact, depending on the nature of the application, you should plan on investing 3-9 times more to take the application the rest of the way to commercial-grade. One of the reasons the Web is so messy today is that a very large number of applications are presented as commercial-grade applications long before they actually are. This is not surprising, but as someone who deals daily with the dynamic world of Web 2.0 (web services, integrations, content syndication, etc.), it surely can get frustrating. Case in point... I was trying to figure out why some of our clients were having trouble "claiming" their blogsite under Technorati. The basic concept is simple: claiming your blogsite lets you tell Technorati which blogs are yours so that Technorati can better serve its purpose as a massive cross-blog directory/search engine.
As a rudimentary security measure, the process for claiming a blog in Technorati requires that you place a special key in your blog content so that Technorati can confirm that you, indeed, have authoring permission to the blog (and therefore, probably are its owner). So far, so good. This is a simple yet reasonably reliable security check, much like the credit card company calling your home phone to see if you know the account number of the new card they mailed you. But where this gets messy is that the Technorati program (known as a spider) that visits your blog is sloppily programmed and makes requests indistinguishable from those of many undesirable applications (often called spambots) that inhabit the Web. MyST Blogsite servers are protected against spambot traffic by software that automatically detects and manages spambots (and various other ne'er-do-wells). With Technorati's spider looking like a spambot, it was being prevented from accessing the servers. The solution is trivially easy and should have been done by Technorati engineers years ago. Specifically, Technorati could identify its spider by passing a simple piece of data known as a user agent string. This is a well-documented, trivial-to-implement Web standard (the HTTP User-Agent header) that has long been accepted as a best practice for spider developers. But web programmers can be sloppy, and many, in fact, are. After much time trying to communicate with Technorati's engineering staff (without response), we deployed a simple workaround to make allowances for Technorati's poor programming. I know our own software is not perfect; none ever is. But if more web developers would spend just a little time cleaning up sloppy little messes in their own applications, just imagine how many millions of lines of "workaround" code would become unnecessary. | | |
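To show how small the fix described above is, here is a minimal Python sketch of a spider that declares itself through the User-Agent header, along with the kind of allowance a server could make for it. The spider name, the info URL, and both function names are made up for illustration; they are not Technorati's or MyST's actual values.

```python
import urllib.request

# Hypothetical spider name and info URL, used only for illustration.
SPIDER_USER_AGENT = "ExampleBlogSpider/1.0 (+https://example.com/spider-info)"


def build_spider_request(url: str) -> urllib.request.Request:
    """Build a GET request that announces the spider in its User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": SPIDER_USER_AGENT})


def looks_like_declared_spider(user_agent: str,
                               allowed_prefixes=("ExampleBlogSpider/",)) -> bool:
    """Server-side allowance: does the user agent start with a known spider name?"""
    return user_agent.startswith(tuple(allowed_prefixes))
```

A user agent string is only a self-declaration and can be spoofed, so a check like this is best treated as one signal among several rather than as proof of identity.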