Just One More Layer of Indirection
(Trying to achieve stable orbit with sufficient architecture)

Missing Step Zero

December 9th, 2009

    Why do instructions seem to miss the step before the first? I will call this missing step “Missing Step Zero”, if you will.

    Maybe I am just bad at Googling. Maybe I have a problem with directions. Or, maybe I have a non-standard setup, but the missing step zero has plagued me several times in my life.

    Now, I am too forgetful to remember all the Missing Step Zeros I had to hunt down, but here are two:

    “How do I write a file in Oracle’s PL/SQL?”

    But none of them work!

    After hours of investigation I found the missing step zero:

    Make sure sysdba grants you permission to execute the UTL_FILE package

    GRANT EXECUTE ON UTL_FILE TO <username>;

    I wish I had read *this* first.

    “How do I turn on WCF logging?”

    And there are hosts of other sites with various logging options. But none of them work!

    After a few days, and help from a friend, I found the sneaky step zero:

    Only the app.config file in the executable subproject is used to configure logging. All other subproject config files are ignored.

    Seriously? How about generating an error when an app.config file is not going to be used?

    Posted in Coding | No Comments »

Are Ad Servers Bogging Down the Web?

November 30th, 2009

    Slashdot brings up a point I complain about: Ad servers are slowing down the web.

    I do not use web applications because they are slow. I do not know what people do to pass the time when they wait for each page to load. Using web mail and adding an attachment makes you feel like you wasted precious time.

    The web is mostly slow because of server latency, especially when “waiting for …” whatever ad server has been bogged down. I particularly dislike the sites that also use the slow Google Analytics servers.

    Posted in Economy, Languages, Rants, Technology | No Comments »

Type Transformation Library. In Java!

November 28th, 2009

    I have just read an interesting post on LtU, which asks for a type-class transformation library. And it reminded me of wanting the same thing. I had not considered making these features into a stand-alone project. This is perfect for a project:

    1. The features are definitely useful: I have had to build some portions of this library for myself. I would have been happy to have a library that did this for me.
    2. The project has a finite size: As long as we limit the number of forms we can transform between, the number of transformations is finite. Certainly, choosing the top four or five common forms will make a useful library.
    3. Adding forms is perfect for the open source community to contribute: The overall structure of the API would be clearly defined, and people can add their own transformations without knowing the details of the bigger project. This limits the scope of each task and makes it manageable.
    4. Much of the heavy lifting has been done: In various personal libraries of code, and in the open source community, these transformations exist already. All that remains is patching the disparate parts into a normalized, clean API.
    5. The type transform API should be normalized and complete (any type to any other type) so it is easy to learn. This may require us to implement transformations that are not useful or, worse, to annotate forms that cannot support the richness that other forms can.
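
    The list above suggests an API where each transformation is a small, independent contribution. Here is a minimal sketch of that shape, assuming a registry keyed by (source type, target type). The class name TransformRegistry and its methods are hypothetical, made up for illustration, and are not taken from the LtU post or from any existing library.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical registry: one converter per (source type, target type) pair. */
final class TransformRegistry {
    // The key is "from->to"; a real library would likely use a proper pair type.
    private final Map<String, Function<Object, Object>> converters = new HashMap<>();

    private static String key(Class<?> from, Class<?> to) {
        return from.getName() + "->" + to.getName();
    }

    /** A contributor adds one transformation without touching the rest. */
    <A, B> void register(Class<A> from, Class<B> to, Function<A, B> fn) {
        converters.put(key(from, to), value -> fn.apply(from.cast(value)));
    }

    /** Looks up the converter registered for the value's exact runtime class. */
    <B> B transform(Object value, Class<B> to) {
        Function<Object, Object> fn = converters.get(key(value.getClass(), to));
        if (fn == null) {
            throw new IllegalArgumentException(
                "No transformation from " + value.getClass().getName() + " to " + to.getName());
        }
        return to.cast(fn.apply(value));
    }

    public static void main(String[] args) {
        TransformRegistry registry = new TransformRegistry();
        registry.register(String.class, Integer.class, text -> Integer.valueOf(text));
        registry.register(Integer.class, String.class, number -> Integer.toString(number));

        Integer parsed = registry.transform("42", Integer.class);
        String printed = registry.transform(7, String.class);
        System.out.println(parsed + 1);
        System.out.println(printed + "!");
    }
}
```

    A missing pair fails loudly here, which makes the "complete" goal in item 5 easy to check: every (from, to) combination either has a converter or raises an error.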

    Posted in Coding, Java | No Comments »

Try an Index instead of Changing Your Infrastructure

November 15th, 2009

    One thing that disturbs me is the proliferation of evil agents who love key-value stores. Especially those that love key-value stores in a latency-infested cloud. What upsets me more are the infinitely confused people who believe a database is *worse* than their key-value storage.

    Here is one where Ian prefers Cassandra over proper database indexes:

    For some reason, Ian has compared his terrible query to his optimized Cassandra implementation. The query (and schema) are so bad, I suspect it’s a straw man.

    Ian does not provide the SQL that makes him conclude that “Computing the intersection with a JOIN is much too slow in MySQL, so we have to do it in PHP.”. Any statement that implies a join is done faster outside the database should set off warning bells: The database should have all the information required to make your queries fast. If this is not the case, then something is seriously wrong with your indexes.

    An all-database solution, even if it is a stored procedure, will be faster than a networked solution just because of latency. Personally, I have found returning a few hundred extra rows from a single “close enough” query significantly faster than issuing two queries with perfect results: Latency is your biggest enemy.

    Let’s look at the Digg schema provided:

    CREATE TABLE `Diggs` (
      `id`      INT(11),
      `itemid`  INT(11),
      `userid`  INT(11),
      `digdate` DATETIME,
      PRIMARY KEY (`id`),
      KEY `user`  (`userid`),
      KEY `item`  (`itemid`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    CREATE TABLE `Friends` (
      `id`           INT(10) AUTO_INCREMENT,
      `userid`       INT(10),
      `username`     VARCHAR(15),
      `friendid`     INT(10),
      `friendname`   VARCHAR(15),
      `mutual`       TINYINT(1),
      `date_created` DATETIME,
      PRIMARY KEY                (`id`),
      UNIQUE KEY `Friend_unique` (`userid`,`friendid`),
      KEY        `Friend_friend` (`friendid`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    Some changes to the indexes would help:

    1. KEY ‘user’ (‘userid’) – does not help much when a user has dugg many items: The index points to all the matching ‘Diggs’ records, but the database still has to load every one of those blocks from disk to read them (very likely one block per record). I would have suggested UNIQUE KEY ‘user’ (‘userid’, ‘itemid’, ‘digdate’) – this would have allowed the query to use the index alone, without going back to the massive, unsorted ‘Diggs’ table.
    2. UNIQUE KEY ‘Friend_unique’ (‘userid’,‘friendid’) – seems to be the correct index for “Query Friends for all my friends.”; this should be a single block lookup. There is no reason this should take 1.5 seconds.
    3. KEY ‘Friend_friend’ (‘friendid’) – maybe instead, Ian intended to have a list of all users that made ‘me’ a friend, rather than all users ‘I’ have befriended. This certainly explains the 1.5 second response time. In this case, the index should be expanded so the table blocks do not need to be loaded: UNIQUE KEY ‘Friend_friend’ (‘friendid’, ‘userid’).
    4. Maybe MySQL is a poor database and loads the original records during a query even if the columns are not needed.
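
    To illustrate why a composite index can answer the query without touching the table, here is a toy model in Java. It is only a sketch: it treats the index as a sorted set of (userid, itemid) entries, leaves out digdate, and assumes non-negative ids. The class CompositeIndexSketch is hypothetical and says nothing about how MySQL actually stores its indexes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Toy model of a composite index on (userid, itemid); not how MySQL stores it. */
final class CompositeIndexSketch {
    // Each entry packs (userid, itemid) into one long with userid in the high
    // bits, so entries sort by userid first and itemid second.
    private final TreeSet<Long> index = new TreeSet<>();

    // Assumes non-negative ids, as with typical auto-increment keys.
    private static long key(int userId, int itemId) {
        return ((long) userId << 32) | (itemId & 0xFFFFFFFFL);
    }

    void addDigg(int userId, int itemId) {
        index.add(key(userId, itemId));
    }

    /** One user's entries are contiguous, so this is a single range scan. */
    List<Integer> itemsDuggBy(int userId) {
        List<Integer> items = new ArrayList<>();
        for (long entry : index.subSet(key(userId, 0), true, key(userId, Integer.MAX_VALUE), true)) {
            items.add((int) entry); // the low 32 bits hold the itemid
        }
        return items;
    }

    /** "Which of my friends dugg this item?" answered by point lookups only. */
    List<Integer> friendsWhoDugg(List<Integer> friendIds, int itemId) {
        List<Integer> result = new ArrayList<>();
        for (int friendId : friendIds) {
            if (index.contains(key(friendId, itemId))) {
                result.add(friendId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CompositeIndexSketch diggs = new CompositeIndexSketch();
        diggs.addDigg(1, 100);
        diggs.addDigg(2, 100);
        diggs.addDigg(2, 200);
        diggs.addDigg(3, 300);
        System.out.println(diggs.itemsDuggBy(2));
        System.out.println(diggs.friendsWhoDugg(Arrays.asList(1, 3), 100));
    }
}
```

    Both methods read only the sorted entries. Nothing here ever fetches a full row, which is the point of widening the index.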

    Anyone who may be complaining about the extra disk space required to write these indexes should note that Ian’s Cassandra implementation consumes much more space than I am advocating here.

    Even *IF* the Digg database is so big that the index lookups take too long, we should realize that we can pre-compute query results in the database, just like in Ian’s Cassandra implementation. If the database does not have materialized views, we can always add triggers to do the job ourselves. The former is still a limited technology, and the latter is quite messy, but both are better than changing your whole platform.

    Finally, it seems Ian is trying to optimize for the worst case: “Kevin Rose, for example, has 40,000 followers”. I disagree with changing your infrastructure for a single use case for a minority of users, but that is a business decision that involves more issues than Ian’s blog entry can be expected to consider.

    In conclusion, I am angry that the human race has lost another soul to the legion of key-value fanatics. I am further incensed that apparently 298 other nameless souls have followed Ian into the pits of hell. (298 diggs at time of writing).

    Posted in Coding, Rants | No Comments »

RMS vs Miguel

November 13th, 2009

    Introduction

    Last month RMS and Miguel had a disagreement. Only now have I had the time to write out my thoughts.

    Miguel is Overly Optimistic

    First, I can agree with Miguel when he says

    “I know that there are great people working for the company,…”

    but I take issue with the second half of his statement,

    “…and I know many people inside Microsoft that are steering the company towards being a community citizen.”.

    I have no doubt that Microsoft’s employees are trying to steer the company towards being a community citizen. But Miguel has an implicit trust that shareholders will not take back that steering wheel and drive in the opposite direction. That is where I oppose Miguel’s optimism.

    Microsoft shareholders have been sitting on a goldmine for the last 20 years. Sure, Microsoft has been making reasonable products over the years, but its profitability is primarily due to the great waves of money in the world economy, generated by ever-increasing public and private debt. The population spent money they did not have for any nifty software feature. Microsoft developers benefit in this environment of free money because the shareholders find it easy to be altruistic when profits are high. I even contend that free software has had a hard time competing because money itself is (apparently) free.

    The good times for Microsoft will not last. I believe the next decade will show how cruel the shareholder can be to “open source”. There are two main forces at work which will make the Microsoft shareholders much more ruthless, and probably more shortsighted.

    1. A poorer user base: Money will be tight, either because of domestic inflation, or lack of liquidity. Microsoft faces an unrelenting barrage of competition from Free and Open Source software. People and corporations will have a greater incentive to use free software to reduce their spending.

    2. Profitable innovation is reaching its limit: The success stories of the last ten years depend on massive user bases: 10 million, 100 million or more. Each individual user only contributes pennies, if that, to overall revenue. Revenue per user is only going to go down further. I do not want to go into detail about why I believe this is true, but generally the software industry has matured: Software for the commoners has been built, and software niches are filled.

    Microsoft is stuck between this innovation limit, and Free software’s relentless catch-up. Microsoft will feel the squeeze and start acting like most corporations that see their business model die: Sue.

    I suspect that this fear of mine is just like Stallman’s, and I do not consider it irrational. Microsoft has every right to protect its patents. From the shareholder perspective, it must protect its patents when net-losses threaten the company.

    Miguel gets Distracted

    Miguel says:

    “Working at CodePlex is a great way of helping steer Microsoft in the right direction. But to Richard, this simply does not compute.”

    Miguel has fallen for the classic work-with-them-instead-of-against-them pitch. He is just like the environmentalist employed by a big oil corporation, who is told that he will help the company along the right path. But really, his employment/involvement is spin for advertisement, and for government tax rebates. The company would do the same without the environmentalist’s help, only now the environmentalists have one less advocate.

    Microsoft would have done fine without Miguel. But now Microsoft can advertise Miguel to the Open Source community, and hopefully Microsoft has distracted Miguel enough from being a competitive threat.

    Miguel is motivated by Profit

    Open Source, which I define as Open Source *not* including Free Software, is sold to the public as a compromise between the GPL and proprietary licensing. Really, Open Source is an advertising scheme used to acquire important tech-savvy users who install software on the majority of our machines. Open Source has the secondary goal of gaining some free debugging. Both goals include not giving back.

    Open Source Profiteering is pragmatic, effective, and efficient at bringing products to market, but this is a shortsighted goal. Open Source has its place, it is necessary, does some good, and it is what I would do if I ever released software people wanted. That does not mean I have to like it: I like steak, but I don’t like the thought of chopping up cows.

    “Richard Stallman frequently conjures bogeymen to rally his base. Sometimes it is Microsoft, sometimes he makes up facts and sometimes he even attacks his own community”.

    First, Stallman is sometimes wrong, after all he is only human. But to say he is conjuring bogeymen is misleading. Stallman is only issuing warnings of possible problems. He advocates actions that should be taken to avoid those possible problems.

    Stallman is thinking long term, where it is necessarily hard to be accurate. Stallman may misidentify the benign as threats (like with .Net, maybe), or he may identify threats as benign (I personally wanted something like GPL v3 back in the 90’s). Miguel does not attempt to see the long term, nor appreciate the difficulty in doing so. When Miguel hears warnings about .Net and Microsoft, but “knows” there is no danger over the next year, he simply assumes Stallman is fear mongering.

    Stallman is not motivated by profit, and Miguel does not understand this. Miguel assumes his goals are shared by all others. When Miguel says:

    “Looking at opportunities where others see hopelessness. … I rather work on constructive solutions to problems than moan and complain.”

    Miguel assumes the opportunities he finds, and the constructive solutions he invents would be lauded by any reasonable person. Miguel is wrong. Opportunities are defined by goals. Constructive solutions, any solutions really, are defined by goals. Miguel’s goals are profit. Miguel’s found opportunities and creative solutions are of no interest to Stallman.

    Stallman is not a salesman. If Stallman were sent to Africa he would not see the shoeless as a “hopeless” situation, nor as an “opportunity”, because both perspectives require a profit goal. Stallman would probably walk shoeless with the natives, and eat some good food, and maybe teach them to make their own shoes.

    Conclusion

    Miguel’s perspective is that of a shortsighted, pragmatic profiteer. As such, he makes a few wrong statements and conclusions:

    1. Microsoft’s employees can control the company direction – No, shareholders control the company direction.
    2. Stallman is pessimistic because he does not laud the “opportunities” and “constructive solutions” – No, Stallman simply does not share Miguel’s profit motivation, so to him they are neither opportunities nor constructive solutions.
    3. Stallman is fear mongering – No, Stallman is simply warning others of possible threats to Free Software.

    Posted in Rants | No Comments »

PI

October 29th, 2009

    I have a small programming project, called YAY, which I work on occasionally.  The objective of YAY is to be a type-safe and easy to use parser-generator-and-compiler.  The parser-generator is the easy part.  The compiler portion is more difficult.  Specifically, I am adding namespace processing so that the parsers specified are able to generate general graphs, and not just trees.  In theory, YAY should be able to parse simple “languages” like XML and HTML, including URLs and XML namespaces, with no post-processing.

    For programming languages like Java, YAY should require macro definitions to become a full compiler. Unfortunately, macro definitions have not been added to YAY yet.

    Today I have discovered π (PI), which is very much like YAY.  This is good news because I do not particularly enjoy building YAY, I only like using it.  If π (PI) can replace YAY, then I can have someone else do the hard work, while I play at a higher abstraction level.  π (PI) looks interesting because it seems to have avoided YAY’s intermediate parse tree representation, and seems to go directly to macro (re)writing.

    But, I am suspicious whether it works. 

    Now, I could be wrong, but YAY is complicated for a reason:  It must allow identical language constructs inside different contexts to mean different things.  For example:

      A: for (Object o : MyList){
        for (Object p : MyOtherList){
          if (something) continue A;
        }//for
      }//for

    In this case the continue A; refers to a scope that is ‘far’ from itself, and the same sequence of bytes can also refer to a different exit point of another loop later in the program. I am also thinking that exception handling scope can be more complicated.
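
    The point about identical bytes meaning different things can be shown with a small runnable sketch. Java allows a label to be reused once the earlier labeled statement has ended, so the same continue A; below targets two different loops. The class name LabelScopeDemo and the loop bounds are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class LabelScopeDemo {
    /** Records which loop each textually identical "continue A;" resumes. */
    static List<String> trace() {
        List<String> log = new ArrayList<>();

        A: for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++) {
                if (j == 1) {
                    log.add("first A, i=" + i);
                    continue A; // resumes the first outer loop
                }
            }
        }

        // The label A is reused: the same statement now targets a different loop.
        A: for (int k = 0; k < 2; k++) {
            for (int j = 0; j < 2; j++) {
                if (j == 0) {
                    log.add("second A, k=" + k);
                    continue A; // resumes the second outer loop
                }
            }
        }
        return log;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

    A parser specification has to resolve each continue A; against the nearest enclosing statement labeled A, which is exactly the kind of non-local lookup a plain tree rewrite does not give you for free.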

    It is not obvious how π (PI) achieves this non-local syntax specification.

    In any case, language specification is one of my favorite subjects.  I am compelled to review the π (PI) implementation despite it being in a pre-alpha state.

    Posted in Coding, Languages | No Comments »

Static Pages for Blogging Are Nicer

October 22nd, 2009

    I moved to WordPress under a year ago.  Since then posting has become an incredible chore, if not outright impossible.  Static pages seem to be the best, although I lose comments.

    Not that there are many legitimate comments anyway: the amount of spam is insane! I would install a captcha, but something tells me I would need master PHP skills to insert it into my custom pages. I think it would be easier to make a daemon on my machine to update pages when an email has come in. Then I can get spam filtering for free.

    Here is a guy that lists why static pages are better:

    Posted in Personal | No Comments »

SQL Databases Could Scale

October 4th, 2009

    Adam Wiggins says that SQL Databases Don’t Scale.   Some of the comments there mention Oracle RAC being able to scale quite well, but I do not know if RAC still has a scaling limit, albeit higher.   Maybe Oracle’s RAC limit is effectively infinity (much like no one will need more than 640K ram), in which case RAC solves our problems, and the discussion is complete.

    But let’s assume SQL databases do not scale now.  I believe they can be scaled without changes to the application logic.

    The solution can be found in sharding, which means partitioning the data between servers according to access patterns.  I propose automatic sharding which will take the database requests, at the client end, and redirect those requests to the machine(s) with the required shard of data.

    Automatic sharding should be completely possible:  Database constraints reveal the strongly connected data, but also reveal the natural break lines in that data.  Application access patterns (from profiling) can provide evidence about which tables do not change often, and what data is ripe for replication.

    For example, the automatic sharder should “see” that partitioning by user id is effective because the data dependencies between users are quite small.   Furthermore, the mutually dependent portion of the database will consist of lookup tables, and other rarely changed data, which can be replicated given the few times it changes.
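
    The client-side redirection described above might look something like the following. This is a rough sketch under stated assumptions: the shard is chosen by user id modulo the number of servers, and the set of replicated tables is handed in rather than discovered from constraints and profiling. The class ShardRouter, the server names and the table names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical client-side router; server and table names are made up. */
final class ShardRouter {
    private final List<String> shards;          // one entry per database server
    private final Set<String> replicatedTables; // rarely changed lookup tables

    ShardRouter(List<String> shards, Set<String> replicatedTables) {
        this.shards = new ArrayList<>(shards);
        this.replicatedTables = new HashSet<>(replicatedTables);
    }

    /** Users are nearly independent of each other, so the user id picks the shard. */
    String shardForUser(long userId) {
        // floorMod keeps the result in range even for negative ids.
        return shards.get((int) Math.floorMod(userId, (long) shards.size()));
    }

    /** Writes to a replicated table go to every shard; other writes go to one. */
    List<String> writeTargets(String table, long userId) {
        if (replicatedTables.contains(table)) {
            return Collections.unmodifiableList(shards);
        }
        return Collections.singletonList(shardForUser(userId));
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(
            Arrays.asList("db0", "db1", "db2"),
            Collections.singleton("countries"));
        System.out.println(router.shardForUser(7));
        System.out.println(router.writeTargets("diggs", 7));
        System.out.println(router.writeTargets("countries", 7));
    }
}
```

    Reads never need to leave the user's own shard, because every shard holds a copy of the lookup tables. The cost is that a write to a lookup table fans out to all servers, which is acceptable only because those tables rarely change.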

    The relational database was designed around the independence of rows.  This row independence is necessary for highly parallel operations, which is exactly what sharding needs.   If it is true that the database community has been “trying to solve a problem for twenty years and still haven’t managed to come up with an obvious solution”, then I am dismayed:  After sharding a database or two, it should be obvious how to automate the sharding.

    Posted in Coding, Oracle | No Comments »

Chronic Radiation is Good?

September 21st, 2009

Equal Temperament is a Poor Approximation

August 19th, 2009