<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Your Catchphrase Here! &#187; Java</title>
	<atom:link href="http://blog.christopherschultz.net/index.php/category/Tech/java/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.christopherschultz.net</link>
	<description>Rantings of a Lunatic</description>
	<lastBuildDate>Tue, 31 Aug 2010 19:07:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Properly Handling Pooled JDBC Connections</title>
		<link>http://blog.christopherschultz.net/index.php/2009/03/16/properly-handling-pooled-jdbc-connections/</link>
		<comments>http://blog.christopherschultz.net/index.php/2009/03/16/properly-handling-pooled-jdbc-connections/#comments</comments>
		<pubDate>Mon, 16 Mar 2009 22:46:44 +0000</pubDate>
		<dc:creator>Christopher</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://blog.christopherschultz.net/?p=68</guid>
		<description><![CDATA[I&#8217;m an active member of the Tomcat-users mailing list and I see lots of folks who post questions about not being able to get a new database connection. The answer is simple: you have exhausted your JDBC connection pool. The answer is not so simple because the reasons for the situation can often be different, [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m an active member of the <a title="Apache Tomcat Mailing Lists" href="http://tomcat.apache.org/lists.html">Tomcat-users</a> mailing list and I see lots of folks who post questions about not being able to get a new database connection. The answer is simple: you have exhausted your JDBC connection pool. The answer is not so simple because the reasons for the situation can often be different, but most likely, your application is not properly handling pooled connections. Read on for information on how to code your app to properly handle such pooled connections.</p>
<p><span id="more-68"></span></p>
<p>I won&#8217;t go on and on about what a good idea connection pools are. That is either self-evident or covered elsewhere. What is <em>not</em> properly covered elsewhere is how to write your code correctly. The examples out there are all throwaway code that works under the best circumstances but falls apart when anything goes wrong.</p>
<p>When using a pooled connection, the idea is that you shouldn&#8217;t have to go out of your way to treat it specially. Technically, the techniques shown in this post are not just useful for pooled connections: it&#8217;s simply how you <em>should</em> code JDBC interactions, pooled or otherwise. The basics are simple: acquire your connection, and make sure you close everything before your method completes. Unfortunately, folks often forget that exceptions can be thrown and your cleanup code might not actually get called.</p>
<p>Below are two annotated code samples that should demonstrate everything you should be doing when handling JDBC connections. I hope this helps some folks out there.</p>
<p>The first code snippet is for when you are not engaging in a SQL transaction. Sure, you <em>could</em> use the transactional code in all cases, but the non-transactional one is simpler and will make your code easier to read and understand. It also won&#8217;t give you any &#8220;cannot rollback connection that isn&#8217;t in a transaction&#8221; errors. My code throws application-specific exceptions to demonstrate how to shield calling code from the &#8220;complexities&#8221; of JDBC.</p>
<pre>// It's important that these references are /outside/ the try/catch
// and that they are set to null. The compiler will enforce this; I'm
// just being explicit.
Connection conn = null;
PreparedStatement ps = null; // I prefer PreparedStatements; use Statement if you want
ResultSet rs = null;

try
{
    conn = ...; // Get your connection however you want

    // Issue your queries and get your results.
    // Remember: if you want to issue multiple queries and/or
    // work with multiple result sets, either declare more Statement
    // and ResultSet locals above and duplicate all cleanup logic for them,
    // or make sure you close each object along the way before you
    // try to re-use the reference.

}
catch (SQLException sqle)
{
    // NEVER swallow exceptions. At least log them using "sqle.printStackTrace()"

    // Note that the root exception is being passed-along to the
    // application-specific exception. This allows error messages to include
    // the /full/ stack trace.
    throw new ApplicationException("Error in database code", sqle);
}
finally
{
    // Anyone who has done any real work in Java will know that a 'finally'
    // block will be run after the 'try' block regardless of any exception
    // activity.

    // A couple of things to note, here:
    //   1. Close objects in the proper order: result, then statement,
    //      then connection.
    //   2. Each close gets its own try/catch block. You don't want
    //      the connection to be leaked just because the result set
    //      failed to close properly.
    //   3. Don't throw any exceptions in a finally block. If there is
    //      already an exception "in the air", you'll shoot it down
    //      and replace it with a new one. The original exception is
    //      almost certainly more useful.
    //   4. NEVER swallow an exception. At least log the error.
    //   5. This cleanup code has whitespace removed for brevity.
    //   6. This cleanup code lends itself to being put into a separate
    //      method. I usually have a 'close' method that takes 3 arguments:
    //      Connection, Statement, ResultSet and does the same thing.

    if(null != rs) try { rs.close(); } catch (SQLException sqle)
        {  sqle.printStackTrace(); }
    if(null != ps) try { ps.close(); } catch (SQLException sqle)
        {  sqle.printStackTrace(); }
    if(null != conn) try { conn.close(); } catch (SQLException sqle)
        {  sqle.printStackTrace(); }
}</pre>
<p>That&#8217;s it. Not really that hard, but if you don&#8217;t have your try/catch block straight and the proper code in the finally block, then you are asking to leak connections or other stuff. Java might protect you from a lot of things by collecting its own garbage, but you can still bring down your database server by opening way too many connections and then leaking them all.</p>
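<p>Note 6 in the snippet above mentions a three-argument &#8216;close&#8217; method. Here is a minimal sketch of what such a helper might look like. The class name <code>JdbcCleanup</code> is made up for illustration, and logging with <code>printStackTrace()</code> just mirrors the snippet above; swap in your own logger.</p>

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper class; the name is an assumption, not part of JDBC.
final class JdbcCleanup {
    private JdbcCleanup() { }

    /**
     * Closes the three JDBC objects in the proper order: result set,
     * statement, then connection. Each close has its own try/catch so a
     * failure in one cannot leak the others. Nothing is ever thrown.
     */
    static void close(Connection conn, Statement stmt, ResultSet rs) {
        if (null != rs) {
            try { rs.close(); }
            catch (SQLException sqle) { sqle.printStackTrace(); }
        }
        if (null != stmt) {
            try { stmt.close(); }
            catch (SQLException sqle) { sqle.printStackTrace(); }
        }
        if (null != conn) {
            try { conn.close(); }
            catch (SQLException sqle) { sqle.printStackTrace(); }
        }
    }
}
```

<p>With something like that in place, the whole <code>finally</code> block shrinks to a single call: <code>JdbcCleanup.close(conn, ps, rs);</code></p>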
<p>Transactions bring another problem to the table because you have to do a rollback unless you want the transaction to commit. Most connection pools default to &#8220;auto-commit&#8221; mode, and the JDBC spec states that calling setAutoCommit(true) commits a transaction if one was in progress. That means that you need to <em>ensure</em> that your transaction is rolled-back&#8230; otherwise it will be committed for you, and that&#8217;s probably not what you want.</p>
<pre>Connection conn = null;
PreparedStatement ps = null;
ResultSet rs = null;

try
{
    conn = ...; // Get your connection however you want
    conn.setAutoCommit(false); // BEGIN

    // Issue your queries and get your results.

    conn.commit();             // COMMIT
}
catch (SQLException sqle)
{
    // DO NOT allow the call to rollback to throw an exception. See the
    // notes in the 'finally' block in the last example. You could even
    // create your own 'rollback' method to simply do this to clean-up
    // your code a bit.
    if(null != conn) try { conn.rollback(); } catch (SQLException sqle1)
        { sqle1.printStackTrace(); }

    throw new ApplicationException("Error in database code", sqle);
}
catch (ApplicationException ae)
{
    // Yes, catch ApplicationException. Anything that the current method can
    // throw /must/ be caught and re-thrown.

    if(null != conn) try { conn.rollback(); } catch (SQLException sqle)
        { sqle.printStackTrace(); }

    throw ae; // Re-throw the same exception
}
catch (RuntimeException rte)
{
    // You wouldn't want a NullPointerException to end up committing your
    // partial transaction, would you?

    if(null != conn) try { conn.rollback(); } catch (SQLException sqle)
        { sqle.printStackTrace(); }

    throw rte; // Re-throw the same exception
}
catch (Error e)
{
    // Errors, too!

    if(null != conn) try { conn.rollback(); } catch (SQLException sqle)
        { sqle.printStackTrace(); }

    throw e; // Re-throw the same error
}
finally
{
    // Here, I'm taking my own advice to put the cleanup code into a
    // separate method. See how nice this looks?
    this.close(conn, ps, rs);
}</pre>
<p>Following the code samples above will keep your DBAs very happy, and you&#8217;ll never have to post a message to a mailing list and admit that you really didn&#8217;t know what you were doing when it comes to JDBC programming.</p>
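<p>The transactional sample repeats the same rollback-and-log line in four catch blocks, and its comments suggest pulling that into a &#8216;rollback&#8217; method. A minimal sketch of one, under the same assumptions as the samples (log with <code>printStackTrace()</code>, never throw); the class name <code>JdbcRollback</code> is hypothetical:</p>

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical helper class; the name is an assumption for illustration.
final class JdbcRollback {
    private JdbcRollback() { }

    /**
     * Rolls back the connection if there is one. A failed rollback is
     * logged but never thrown, so it cannot replace the exception that
     * triggered the rollback in the first place.
     */
    static void rollback(Connection conn) {
        if (null != conn) {
            try { conn.rollback(); }
            catch (SQLException sqle) { sqle.printStackTrace(); }
        }
    }
}
```

<p>Each catch block then becomes two lines: <code>JdbcRollback.rollback(conn);</code> followed by the re-throw.</p>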
<p>I have a few more general tips for JDBC, connection pools, etc.:</p>
<ol>
<li>In development, set your connection pool to a <em>fixed</em> size of 1 connection. This will let you know immediately if you have any deadlock potential in your code when trying to request more than one connection at a time from the pool. If you <em>do</em> request more than one connection at a time, then there is a possibility that you could deadlock your application waiting on database connections. Just do it. It won&#8217;t hurt, and you might be surprised to find out that you forgot to &#8216;close&#8217; a connection (and therefore return it to the pool). Using this technique will find the error <em>much</em> faster than others.</li>
<li>Use a &#8220;validation query&#8221;. When you request a connection from the pool, you can have the pool validate that the connection is still valid (i.e. connected, etc.) by issuing a simple query to the database. <em>Make this a simple query</em>. Queries like &#8220;SELECT * FROM purchase&#8221; are bad ideas. Choose something like &#8220;SELECT 1 FROM DUAL&#8221; (for you Oracle folks) or &#8220;SELECT 1&#8221; for databases that don&#8217;t require a table name for trivial SELECT queries. If you&#8217;re using MySQL, use &#8220;/* ping */ SELECT 1&#8221; since recent versions of Connector/J can detect the &#8220;/* ping */&#8221; part of the query and issue a super-cheap connection test that doesn&#8217;t even involve the query parser. Why issue a trivial query when you don&#8217;t have to?</li>
</ol>
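<p>To see why a pool of size 1 exposes problems so quickly, here is a toy model. It is not a real connection pool: <code>TinyPool</code> is a made-up class that treats each semaphore permit as one &#8220;connection&#8221;, which I&#8217;m assuming is close enough to how a blocking pool hands out connections to make the point.</p>

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Toy stand-in for a connection pool: each permit is one "connection".
final class TinyPool {
    private final Semaphore permits;

    TinyPool(int size) {
        permits = new Semaphore(size);
    }

    /** Returns true if a "connection" was obtained within the timeout. */
    boolean borrow(long timeoutMillis) {
        try {
            return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns a "connection" to the pool, like close() on a pooled connection. */
    void giveBack() {
        permits.release();
    }
}
```

<p>With <code>new TinyPool(1)</code>, the first <code>borrow</code> succeeds and a second one from the same code path times out until <code>giveBack</code> is called. With a real pool configured to wait forever, that second request is your deadlock, and a forgotten <code>close</code> shows up on the very next request.</p>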
]]></content:encoded>
			<wfw:commentRss>http://blog.christopherschultz.net/index.php/2009/03/16/properly-handling-pooled-jdbc-connections/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Character Assassination</title>
		<link>http://blog.christopherschultz.net/index.php/2005/11/18/character-assassination/</link>
		<comments>http://blog.christopherschultz.net/index.php/2005/11/18/character-assassination/#comments</comments>
		<pubDate>Fri, 18 Nov 2005 17:35:56 +0000</pubDate>
		<dc:creator>Christopher</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://blog.christopherschultz.net/?p=35</guid>
		<description><![CDATA[Using a programming language touted for its strong internationalization support along with an application server that should do the same, one would think that "international" characters would be easier to deal with. It turns out that the world is out to get me.]]></description>
			<content:encoded><![CDATA[<p>
    At the dawn of (computer) time, someone decided that computers being able     to deal with letters as well as numbers would be a great idea. And it turned     out to be a big &#8216;ole mess. </p>
<p>
    The problem is that you have to decide how to <i>encode</i> these letters (or characters) into numbers, which is the only thing that computers can handle. <a href="http://en.wikipedia.org/wiki/EBCDIC">EBCDIC</a> and <a href="http://en.wikipedia.org/wiki/ASCII">ASCII</a> were two of the first, and while EBCDIC has effectively died, ASCII has turned into a few (relatively compatible) standards such as US-ASCII and ISO-8859-1 (also called &#8220;Latin-1&#8221;). These jumbles of letters are called <i>character sets</i>, and they describe how to take the concept of a letter and turn it into one or more 8-bit bytes for processing within the computer. </p>
<p>
    One of the most flexible character sets is called <a href="http://en.wikipedia.org/wiki/UTF-8">UTF-8</a>, and represents an efficient packing of bytes by only using the minimum necessary. For example, there are jillions of characters out there in human language if you take into account written languages like Chinese, Sanskrit, etc. We would need many bytes to represent all character possibilities (maybe 4 or 5), but UTF-8 has a trick up its sleeve that helps reduce the number of bytes taken up by common (read: ASCII) characters. It&#8217;s also completely backward-compatible with ASCII, which makes it super-handy to use in places where ASCII was already being used and it&#8217;s time to add support for international characters. </p>
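<p>
    A quick way to see the variable-length trick is to count the bytes Java produces when it encodes a string as UTF-8. This is only a sketch; the class and method names are made up, and the characters are written as Unicode escapes so the source file&#8217;s own encoding can&#8217;t get in the way. </p>

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: counts the bytes a string occupies in UTF-8.
final class Utf8Lengths {
    private Utf8Lengths() { }

    /** Returns the number of bytes needed to encode the string as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

<p>
    <code>utf8Length("A")</code> is 1, <code>utf8Length("\u00A1")</code> (the upside-down exclamation mark) is 2, and <code>utf8Length("\u4E2D")</code> (a Chinese character) is 3. As UTF-8 is standardized today, no single character needs more than 4 bytes. </p>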
<p>
    Now that the history lesson is over, it&#8217;s time to complain. </p>
<p>
    I&#8217;m writing an application in the Java programming language, which is generally highly touted as having excellent internationalization (or <a href="http://en.wikipedia.org/wiki/I18n">i18n</a>) support: it has encoding and decoding capability for a number of different character sets (ASCII, UTF-8, Big5, Shift_JIS, any number of ISO-xyz-pdq encodings, etc.), natively uses Unicode (actually, UTF-16, which is a specific encoding of Unicode), and has some really sexy ways to localize (that&#8217;s the process of managing translations of your stuff into non-native languages &#8212; such as Spanish being non-native to me, an English speaker) content. </p>
<p>
    I was trying to do something very simple: get my application to accept a &#8220;funny&#8221; (or &#8220;international&#8221; or non-ASCII&#8230; I&#8217;ll just say &#8220;funny&#8221;, since I don&#8217;t use those characters very often) character. I love the Spanish use of open-exclamation and open-question characters. They&#8217;re upside-down versions of ! and ? and precede questions and exclamations. It makes sense when you think about it. Anyhow, I was trying to successfully take the string &#8220;¡Bienvenidos!&#8221;, put it into my database, and get it back out successfully, using a web browser as the client and my own software to move the data back and forth. </p>
<p>
    It wasn&#8217;t working. Repeated submissions/views/re-submissions were resulting     in additional characters being inserted before the &#8220;¡&#8221;. Funny stuff that I had     clearly not entered. </p>
<p>
    I&#8217;ve done this before, but the mechanics are miserable and I pretty much block out the painful memories each time it happens. </p>
<p>
    The problem is that many pieces of code get their grubby little hands on the data between the time you type it on your keyboard and the time it gets into my database. Here is a short list of code that handles those characters, and where opportunities for cock-ups occur. </p>
<ul>
<li>Keyboard controller. Your keyboard has to be able to &#8220;type&#8221; these characters correctly so that the operating system can read them. I can&#8217;t type a &#8220;¡&#8221; on my keyboard, so I need to take other steps.</li>
<li>Your operating system. MS-DOS in its default configuration in the US isn&#8217;t     going to handle Kanji characters very well.</li>
<li>Your web browser. The browser has to take your characters and submit     them in a request to the web server. Guess what? There&#8217;s a character encoding     that is used in the request itself, which can complicate matters.</li>
<li>The web server, which may or may not perform any interpretation of the     bytes being sent from the web browser.</li>
<li>The application server, which provides the code necessary to convert     incoming request data into Java strings.</li>
<li>My database driver, which shuttles data back and forth between Java     and the database server.</li>
<li>The database itself, which has to store strings and retrieve them.</li>
</ul>
<p>
    I can pretty much absolve the keyboard and operating system at this point. If I can see the &#8220;¡&#8221; on the screen, I&#8217;m pretty happy. I can also be reasonably sure that the web browser knows what character I&#8217;m talking about, since it&#8217;s being displayed in the text area where I&#8217;m entering this stuff. My web server is actually ignoring request content and just piping it through to my app server. The database and driver should be okay, as I have specified that I want UTF-8 to be used both as the storage format of characters in the database, and for communication between the Java database driver and the database server. </p>
<p>
    That leaves 2 possibilities: the request itself (made by the web browser) or     the application server (converts bytes into Java strings). </p>
<p>
    The first step in determining the problem is research: what happens when     the web browser submits the form, and how is it accepted and converted into     a Java string? </p>
<ol>
<li>The web browser creates a request by converting all the data in a form into     bytes. It does this by using the content-type     &#8220;application/x-www-form-urlencoded&#8221; and some character encoding. You can     ignore the content-type for now.</li>
<li>The request is sent to the server.</li>
<li>The application uses the <code>ServletRequest.getParameter</code>    method to get a String value for a request parameter.</li>
<li>The application server reads the parameter out of the request using some     character encoding, and converts it into a String.</li>
</ol>
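<p>
    To make step 1 concrete, here is a sketch of what the bytes on the wire look like under two different character encodings. I&#8217;m using <code>java.net.URLEncoder</code> as a stand-in for the browser; a real browser may escape a slightly different set of punctuation, so treat the exact output as illustrative. The class name <code>FormEncodingDemo</code> is made up. </p>

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class: URLEncoder stands in for the browser here.
final class FormEncodingDemo {
    private FormEncodingDemo() { }

    /**
     * Encodes a form value as application/x-www-form-urlencoded using the
     * given charset name (for example "UTF-8" or "ISO-8859-1").
     */
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException uee) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, uee);
        }
    }
}
```

<p>
    Encoding <code>"\u00A1Bienvenidos!"</code> with UTF-8 gives <code>%C2%A1Bienvenidos%21</code>, while ISO-8859-1 gives <code>%A1Bienvenidos%21</code>. Same character, different bytes, and nothing in the encoded value itself says which charset produced it. </p>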
<p>
    So, it looks like the possibilities for confusion are where the character sets are chosen. The <a href="http://www.w3.org/TR/REC-html40/interact/forms.html#edef-FORM">W3C</a> says that &lt;form&gt; elements can specify their preferred character set by using the <code>accept-charset</code> attribute. The default value for that attribute is &#8220;UNKNOWN&#8221;, which means that the browser is free to choose an arbitrary character set. A semi-tacit recommendation is that the browser use the character encoding that was used to provide the form (i.e. the charset of the current page) as the charset to use to make the request. </p>
<p>
    That seems relatively straightforward. My responses are currently using UTF-8     as their only charset, so the forms ought to be submitted as UTF-8. Perfect!     &#8220;¡&#8221; ought to successfully be transmitted in UTF-8 format, and go     straight-through to my database without ever being mangled. Since this wasn&#8217;t     happening, there was obviously a problem. What character set *was* the     browser using? A quick debug log message ought to help: </p>
<pre class="code">
DEBUG - request charset=null </pre>
<p>
    Uh, oh. A <code>null</code> charset means that the app server has to do some of its own thinking, and that usually spells trouble. </p>
<p>
    Time to take a look at the &#8216;ole API specification. First stop,     <a href="http://java.sun.com/j2ee/1.4/docs/api/javax/servlet/ServletRequest.html#getParameter(java.lang.String)">ServletRequest.getParameter()</a>,     which is the first place my code gets a crack at reading data. There&#8217;s no mention     of charsets, but it does mention that if you&#8217;re using POST (which I am),     that calling <code>getInputStream</code> or <code>getReader</code>    before calling <code>getParameter</code> might cause problems. That&#8217;s     a tip-off that one of those methods gets called in order to read the parameter     values themselves. Since <code>InputStream</code>s don&#8217;t care about     character sets (they deal directly with bytes), I can ignore that one.     <a href="http://java.sun.com/j2ee/1.4/docs/api/javax/servlet/ServletRequest.html#getReader()">ServletRequest.getReader()</a>    claims to throw <code>UnsupportedEncodingException</code> if the encoding     is (duh) unsupported, so it must be applying the encoding itself. There is no     indication of how the API determines the charset to use. </p>
<p>
    The HTTP specification has a header field which can be used to communicate the charset to be used to decode the request. The header is &#8220;content-type&#8221;, and has the form: &#8220;Content-Type: major/minor; charset=[charset]&#8221;. I already mentioned that the content-type of a form submission was &#8220;application/x-www-form-urlencoded&#8221;, so I should expect something like &#8220;Content-Type: application/x-www-form-urlencoded; charset=UTF-8&#8221; to be included in the headers from the browser. Let&#8217;s have a look: </p>
<pre class="code">
DEBUG - Header['host']=[deleted]
DEBUG - Header['user-agent']=Mozilla/5.0 [etc...]
DEBUG - Header['accept']=text/xml, [etc...]
DEBUG - Header['accept-language']=en-us,en;q=0.5
DEBUG - Header['accept-encoding']=gzip,deflate
DEBUG - Header['accept-charset']=ISO-8859-1,utf-8;q=0.7,*;q=0.7
DEBUG - Header['keep-alive']=300
DEBUG - Header['connection']=keep-alive
DEBUG - Header['referer']=[deleted]
DEBUG - Header['cookie']=JSESSIONID=[deleted]
DEBUG - Header['content-type']=application/x-www-form-urlencoded
DEBUG RequestDumper- Header['content-length']=121
</pre>
<p>
    Huh? The Content-Type line doesn&#8217;t contain a charset. That means that     the application server is free to choose one arbitrarily. Again, the unspecified     charset comes back to haunt me. </p>
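<p>
    For reference, pulling the charset out of a Content-Type value is not hard. The following is my own simplified sketch, not what Tomcat or any other container actually does: it splits on semicolons and looks for a <code>charset=</code> parameter, and it ignores corner cases like semicolons inside quoted values. </p>

```java
// Hypothetical helper; a simplified parser, not a container's real logic.
final class ContentTypeCharset {
    private static final String KEY = "charset=";

    private ContentTypeCharset() { }

    /**
     * Returns the charset parameter of a Content-Type header value,
     * or null if the header is missing or carries no charset.
     */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.regionMatches(true, 0, KEY, 0, KEY.length())) {
                String value = param.substring(KEY.length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }
}
```

<p>
    Fed the header from the dump above, <code>charsetOf("application/x-www-form-urlencoded")</code> returns <code>null</code>, which lines up with the <code>request charset=null</code> debug message. </p>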
<p>
    So, the implication is that the web browser is submitting the form using UTF-8, but that the app server is choosing its own character set. Since things aren&#8217;t working, I&#8217;m assuming that it&#8217;s choosing incorrectly. Since the Servlet spec doesn&#8217;t say what to do in the absence of a charset in the request, only reading the code can help you figure out what&#8217;s going on. Unfortunately, Tomcat&#8217;s code is so byzantine, you don&#8217;t get very far into the request wrapping and facade classes before you go crazy. </p>
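<p>
    Here is a small sketch of what that kind of mismatch does to my upside-down exclamation mark. I&#8217;m assuming, for illustration only, that the wrong guess is ISO-8859-1; the class name <code>MojibakeDemo</code> is made up. </p>

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: simulates a sender/receiver charset mismatch.
final class MojibakeDemo {
    private MojibakeDemo() { }

    /** Encodes text as UTF-8, then (wrongly) decodes the bytes as ISO-8859-1. */
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }
}
```

<p>
    <code>misdecode("\u00A1")</code> returns two characters: a capital A with a circumflex (<code>\u00C2</code>) followed by the original character. Run the result through again, as a view-and-resubmit cycle would, and you get four characters. That looks a lot like the junk that kept piling up in front of my text. </p>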
<p>
    So, you try other things. Maybe the app server is using the default file encoding for the environment (it happens to be &#8220;ANSI_X3.4-1968&#8221; for me). Setting the &#8220;file.encoding&#8221; system property changes the file encoding used in the system, so I tried that. No change. The last-ditch effort was to simply smack the request into submission by explicitly setting the character encoding in the request if none was provided by the client (in this case, the browser). </p>
<p>
    The best way to do this is with a servlet <i>filter</i>, which gets ahold of the     request before it is processed by any servlet. I simply check for a     <code>null</code> charset and set it to UTF-8 if it&#8217;s missing. </p>
<pre class="code">
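import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class EncodingFilter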
    implements Filter
{
    public static final String DEFAULT_ENCODING = "UTF-8";

    private String _encoding;

    /**
     * Called by the servlet container to indicate to a filter that it is
     * being put into service.
     *
     * @param config The Filter configuration.
     */
    public void init(FilterConfig config)
    {
	_encoding = config.getInitParameter("encoding");
	if(null == _encoding)
	    _encoding = DEFAULT_ENCODING;
    }

    protected String getDefaultEncoding()
    {
	return _encoding;
    }

    /**
     * Performs the filtering operation provided by this filter.
     *
     * This filter performs the following:
     *
     * Sets the character encoding on the request to that specified in the
     * init parameters, but only if the request does not already have
     * a specified encoding.
     *
     * @param request The request being made to the server.
     * @param response The response object prepared for the client.
     * @param chain The chain of filters providing request services.
     */
    public void doFilter(ServletRequest request,
			 ServletResponse response,
			 FilterChain chain)
	throws IOException, ServletException
    {
	request.setCharacterEncoding(getCharacterEncoding(request));

	chain.doFilter(request, response);
    }

    protected String getCharacterEncoding(ServletRequest request)
    {
	String charset=request.getCharacterEncoding();

	if(null == charset)
	    return this.getDefaultEncoding();
	else
	    return charset;
    }

    /**
     * Called by the servlet container to indicate that a filter is being
     * taken out of service.
     */
    public void destroy()
    {
    }
}
</pre>
<p>
    This filter has been written before: at least <a href="http://wiki.apache.org/tomcat/Tomcat/UTF-8">here</a> and     <a href="http://java.sun.com/products/servlet/Filters.html#72673">here</a>. </p>
<p>
    It turns out that adding this filter solves the problem. It&#8217;s very odd that browsers are not notifying the server about the charset they used to encode their requests. Remember the &#8220;accept-charset&#8221; attribute from the HTML &lt;form&gt; element? If you specify that to be &#8220;ISO-8859-1&#8221;, <a href="http://www.mozilla.org/products/firefox">Mozilla Firefox</a> will happily submit using ISO-8859-1 and not tell the server which encoding was used. Same thing with <a href="http://www.microsoft.com/windows/ie">Microsoft Internet Explorer</a>. </p>
<p>
    I can understand why the browser might choose not to include the charset in the content type header because the server ought to &#8220;know&#8221; what to expect, since the browser is likely to re-use the charset from the page containing the form. But what if the form comes from one server and submits to another? Neither of these two browsers provides the charset if the form submits to a different page, so it&#8217;s not just an &#8220;optimization&#8221;&#8230; it&#8217;s an oversight. </p>
<p>
    There&#8217;s actually a <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=241540">bug</a>    in Mozilla related to this. Unfortunately, the fix for it was removed because     of incompatibilities that the addition of the charset to the content type was     causing. Since Mozilla doesn&#8217;t want to get the reputation that their browser     doesn&#8217;t work very well, they decided to drop the charset. :( </p>
<p>
    The bottom line is that, due to some bad implementations out there that ruin     things for everyone, I&#8217;m forced to use this awful forced-encoding hack.     Fortunately, it &#8220;degrades&#8221; nicely if and when browsers start enforcing the     HTTP specification a little better. My interpretation is that &#8220;old&#8221;     implementations always expect ISO-8859-1 and can&#8217;t handle the &#8220;charset&#8221;     portion of the header. Fine. But, if a browser is going to submit data in any     format <i>other than</i> ISO-8859-1, then they should include the charset     in the header. It&#8217;s the only thing that makes sense. </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.christopherschultz.net/index.php/2005/11/18/character-assassination/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>How old are you, really?</title>
		<link>http://blog.christopherschultz.net/index.php/2005/08/07/how-old-are-you-really/</link>
		<comments>http://blog.christopherschultz.net/index.php/2005/08/07/how-old-are-you-really/#comments</comments>
		<pubDate>Mon, 08 Aug 2005 01:09:00 +0000</pubDate>
		<dc:creator>Christopher</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://blog.christopherschultz.net/?p=32</guid>
		<description><![CDATA[An investigation into time delta computation.]]></description>
			<content:encoded><![CDATA[<blockquote>
<p>
    When a man sits with a pretty girl for an hour, it seems like a minute. But let him sit on     a hot stove for a minute&#8211;and it&#8217;s longer than any hour. That&#8217;s relativity.     </p>
<p>
    -Albert Einstein     </p>
</blockquote>
<p>
    Reckoning time has always been a problem for humans, it seems. We have argued over <a href="http://www.hermetic.ch/cal_stud.htm">which calendar to use</a> for quite a long time. Even worse is trying to figure out how long ago something happened. </p>
<p>
    Many &#8220;how long ago&#8221; questions can be answered with a certain degree of slop. For example, &#8220;how long ago was <a href="http://en.wikipedia.org/wiki/Jesus">Jesus of Nazareth</a> born?&#8221; could be answered, &#8220;about 2000 years ago&#8221;. &#8220;When was peace declared at the end of <a href="http://en.wikipedia.org/wiki/World_War_II">World War II</a>?&#8221;, &#8220;60 years ago&#8221;. But what about a question to which the answer should be more specific, such as &#8220;how long ago was I born?&#8221;. I want to know the years, months, and days for that figure, and here&#8217;s why. </p>
<p>
    As part of my continuing work with <a href="http://www.childhealthcare.org">The Center for Promotion of Child Development Through Primary Care</a>, I have to be able to display ages for patients that our doctors will be treating. More often than not, these patients are young, so we&#8217;re talking about newborns through adolescents. For the newborns, the number of months and days is very important, while the ages of adolescent patients are okay to round-off to years and months, and maybe just years. </p>
<p>
    It turns out that it&#8217;s somewhat difficult to answer the question &#8220;how old are you?&#8221;. It doesn&#8217;t really seem all that hard, until you actually try to do it. The problem is that people disagree about a lot of things. For example, you won&#8217;t get much argument that there are 10 days separating 2000-01-01 and 2000-01-11, or that there is 1 month separating 2000-01-01 and 2000-02-01. But what about the date difference between 2000-01-31 and 2000-03-01? Is that 30 days or is it 1 month? </p>
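<p>
    The end of January is exactly where &#8220;one month&#8221; and &#8220;30 days&#8221; part ways. Here is a sketch using <code>java.time</code> (Java 8 and later, so not something I had available when I first hit this problem); the class and method names are made up for illustration. </p>

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical demo class showing month arithmetic at the end of January.
final class MonthEndDemo {
    private MonthEndDemo() { }

    /** Adds one calendar month; java.time clamps to the last valid day. */
    static LocalDate oneMonthAfter(LocalDate start) {
        return start.plusMonths(1);
    }

    /** Counts whole days from start (inclusive) to end (exclusive). */
    static long daysFromTo(LocalDate start, LocalDate end) {
        return ChronoUnit.DAYS.between(start, end);
    }
}
```

<p>
    One month after 2000-01-31 comes out as 2000-02-29, which is only 29 days later, while 30 days after 2000-01-31 is 2000-03-01. Both answers are defensible, which is the whole problem. </p>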
<p>
    <a href="http://www.boyet.com/">Julian Bucknall</a> is a guy who studies algorithms,     at least as a hobby. He has a <a href="http://www.boyet.com/Articles/PublishedArticles/Calculatingthenumberofmon.html">discussion     of time reckoning in software</a> including a sample implementation in C#. Although I appreciate     his discussion (and created a few new unit tests based upon some of the problematic date ranges     he presents), I don&#8217;t entirely agree with how he did his implementation. I happen to be using     Java for my purposes, but I did my own implementation because I needed to, not because I&#8217;m     just a Java wonk. </p>
<p>
    Before I start, those without a programming background have to realize that most programming     languages have very poor tools for handling dates. Mostly they center around counting milliseconds     since a certain date (usually <a href="http://en.wikipedia.org/wiki/Unixtime">1970-01-01</a>).     This is great for quick calculations of numbers of days between events, since a day has a fixed     number of milliseconds (1000 ms/sec * 60 sec/min * 60 min/hr * 24 hr/day = 86400000 ms/day). </p>
<p style="font-style:italic;">
    For those of you who are too smart for your own good, I&#8217;m going to be ignoring <a href="http://tycho.usno.navy.mil/leapsec.html">leap seconds</a> and things like that for the time being, since computers generally don&#8217;t handle those, anyway. If you want your computer&#8217;s time to be correct to the nearest leap second mandated by the <a href="http://hpiers.obspm.fr/">IERS</a>, you should just manually adjust your clock whenever it&#8217;s convenient&#8230; no date library is going to worry about keeping a list of all leap seconds ever added to civil time. </p>
<p>
    So, back to dates in software. Since the number of milliseconds in a day is fixed, and computers     often represent dates as a number of milliseconds from a fixed date (generally known as the epoch),     it&#8217;s very easy to calculate the difference between two dates as a number of days. For example, I     was born on 1977-10-27. That means that I am 10146 days old (wow, that doesn&#8217;t seem like a lot&#8230;).     But how many years, months, and days old am I? </p>
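<p>
    The day count is the easy part, and a short sketch shows why. The names here (EpochDays, utcDate, daysBetween) are hypothetical, and I&#8217;m assuming both dates are taken at midnight UTC so that daylight-saving shifts can&#8217;t make a &#8220;day&#8221; longer or shorter than 86400000 ms. </p>

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical demo class: whole days between two dates via epoch milliseconds.
final class EpochDays {
    static final long MS_PER_DAY = 1000L * 60 * 60 * 24; // 86,400,000

    // Midnight UTC on the given date; month is a Calendar.* constant (0-based).
    static Calendar utcDate(int year, int month, int day) {
        Calendar c = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        c.clear();
        c.set(year, month, day);
        return c;
    }

    // Difference in milliseconds since the epoch, divided by the length of a day.
    static long daysBetween(Calendar earlier, Calendar later) {
        return (later.getTimeInMillis() - earlier.getTimeInMillis()) / MS_PER_DAY;
    }

    public static void main(String[] args) {
        Calendar birth = utcDate(1977, Calendar.OCTOBER, 27);
        Calendar today = utcDate(2005, Calendar.AUGUST, 7);
        System.out.println(daysBetween(birth, today)); // 10146
    }
}
```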
<p>
    Fortunately, for discussion purposes, I&#8217;m writing this entry on 2005-08-07, a date whose month and day-of-month are both less than the corresponding numbers in my birth date (that is, 8 is less than 10, and 7 is less than 27). That&#8217;s good for a demonstration because it makes the math harder. If I had been born on 1977-08-01, then you could count on your fingers that I am 28 years, 0 months, and 6 days old. Since I was born later in the month and later in the year, there are all kinds of fun things that have to happen. </p>
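<p>
    A tiny sketch of why the second case is harder: subtract the fields one by one and see what falls out. RawDelta and subtract are hypothetical names, and months are written 1-based here purely for readability. </p>

```java
import java.util.Arrays;

// Hypothetical demo class: naive field-by-field subtraction of two dates.
final class RawDelta {
    // Returns {years, months, days} with no borrowing between fields.
    static int[] subtract(int y1, int m1, int d1, int y2, int m2, int d2) {
        return new int[] { y2 - y1, m2 - m1, d2 - d1 };
    }

    public static void main(String[] args) {
        // Born 1977-08-01: every field is already non-negative.
        System.out.println(Arrays.toString(subtract(1977, 8, 1, 2005, 8, 7)));   // [28, 0, 6]
        // Born 1977-10-27: months and days go negative and need borrowing.
        System.out.println(Arrays.toString(subtract(1977, 10, 27, 2005, 8, 7))); // [28, -2, -20]
    }
}
```

<p>
    Turning [28, -2, -20] into a sensible age means borrowing a month&#8217;s worth of days and a year&#8217;s worth of months, and the size of the month you borrow depends on which month it is. </p>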
<p>
    If you were to perform these calculations on your fingers, you&#8217;d probably start with the birth date     and keep adding years until you couldn&#8217;t add them anymore without going over. You&#8217;d easily get to     27 and stop (if you had that many fingers). But then, you have to figure out what the differences     are between the months and days. Exactly 27 years after my birth would be 2004-10-27. In order to     get yourself to 2005-08-07, you need to add a bunch of months. If you add 10 months, you&#8217;ll get     2005-08-27, which is too much. So, you have to add 9 months instead, and then figure the days.     Exactly 27 years and 9 months after my birth would be 2005-07-27. In order to get to today, you have     to add days. If you add 11 days, you&#8217;ll get to 2005-08-07. Ta-da! </p>
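<p>
    The finger-counting procedure above can be sketched roughly as follows. This is my own illustration of the idea, not anybody&#8217;s published implementation; FingerCount, date, plus and countUp are hypothetical names, and the dates are assumed to be midnight UTC. </p>

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical demo class: count whole years, then months, then days, never overshooting.
final class FingerCount {
    // Midnight UTC on the given date; month is a Calendar.* constant (0-based).
    static Calendar date(int year, int month, int day) {
        Calendar c = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        c.clear();
        c.set(year, month, day);
        return c;
    }

    // A copy of start moved forward by whole years, then whole months.
    static Calendar plus(Calendar start, int years, int months) {
        Calendar c = (Calendar) start.clone();
        c.add(Calendar.YEAR, years);
        c.add(Calendar.MONTH, months);
        return c;
    }

    // Returns {years, months, days} from birth up to today.
    static int[] countUp(Calendar birth, Calendar today) {
        int years = 0;
        while (!plus(birth, years + 1, 0).after(today)) {
            years++; // one more year still fits
        }
        int months = 0;
        while (!plus(birth, years, months + 1).after(today)) {
            months++; // one more month still fits
        }
        Calendar cursor = plus(birth, years, months);
        int days = 0;
        while (cursor.before(today)) {
            cursor.add(Calendar.DAY_OF_MONTH, 1);
            days++;
        }
        return new int[] { years, months, days };
    }

    public static void main(String[] args) {
        int[] age = countUp(date(1977, Calendar.OCTOBER, 27), date(2005, Calendar.AUGUST, 7));
        System.out.println(age[0] + " years, " + age[1] + " months, " + age[2] + " days");
    }
}
```

<p>
    Every probe clones a Calendar and recomputes its fields, so the cost grows with the size of the gap. </p>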
<p>
    Now, that didn&#8217;t seem too bad, did it? Actually, an implementation which basically follows this on-your-fingers calculation is the one proposed by Julian Bucknall as well as many others on the web. I don&#8217;t like this implementation because it is computational overkill (you have to do <i>lots</i> of looping, and most Date object implementations that exist out there will re-calculate a bunch of stuff whenever you update a single field, such as the year or month). I actually wrote mine before I read his article, and I don&#8217;t have a C# compiler handy to run his algorithm through my test cases, so I can&#8217;t be sure that they yield the same results. At any rate, I have an implementation that should be a little more efficient and meets my needs. </p>
<p>
    Oh, one last note: we had been using a Java library called <a href="http://mindprod.com/jgloss/bigdate.html">BigDate</a> to do our date calculations. I knew it was going to be a pain in the neck to write our own, so we found a library that would do it for us. Unfortunately, it fails with Java Date objects representing dates before 1970-01-01. The author claims that his library handles dates prior to 1970 in contrast to Java&#8217;s Date, but it appears that he is wrong on two counts: Java&#8217;s Date class does, in fact, handle dates before 1970, and his library trips over them. I was able to use his library by passing in the year, month, and date separately, but that required me to use deprecated methods in the Date API, and I was already starting to look down my nose at it, slightly. Just for the heck of it, I tried to use BigDate to calculate the date delta between a BCE date and today, and BigDate ignored the era, so I got the wrong answers there, too. </p>
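<p>
    For what it&#8217;s worth, here is a small sketch showing that java.util.Date is happy with instants before 1970 (they are just negative millisecond counts) and that a Calendar will tell you the era. BeforeEpoch, fields and dateOf are hypothetical names, and everything is read in UTC to keep the output independent of the machine&#8217;s time zone. </p>

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical demo class: Dates before the 1970 epoch, including BCE dates.
final class BeforeEpoch {
    // Returns {era, year, month, day} for a Date, read in UTC (month is 0-based).
    static int[] fields(Date when) {
        Calendar c = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        c.setTime(when);
        return new int[] {
            c.get(Calendar.ERA), c.get(Calendar.YEAR),
            c.get(Calendar.MONTH), c.get(Calendar.DAY_OF_MONTH)
        };
    }

    // Builds a Date for midnight UTC on a date in the given era.
    static Date dateOf(int era, int year, int month, int day) {
        Calendar c = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        c.clear();
        c.set(Calendar.ERA, era);
        c.set(year, month, day);
        return c.getTime();
    }

    public static void main(String[] args) {
        // One day before the epoch is simply a negative millisecond count.
        int[] f = fields(new Date(-86400000L));
        System.out.println(f[1] + "-" + (f[2] + 1) + "-" + f[3]); // 1969-12-31

        // A BCE date keeps its era when read back.
        Date ides = dateOf(GregorianCalendar.BC, 44, Calendar.MARCH, 15);
        System.out.println(ides.getTime() < 0);                          // true
        System.out.println(fields(ides)[0] == GregorianCalendar.BC);     // true
    }
}
```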
<p>
    So, I wrote my own implementation (in Java) that quickly calculates deltas for all three fields (I&#8217;m not concerned with time, just the date), possibly adjusts them for BCE dates, and then runs a fairly simple algorithm to move the day, then the month and year, to their correct values. We use a class called DiffDate which just stores a year, month, and date as a return value. I have one method that accepts a pair of Date objects, and one that accepts a pair of Calendars. The Calendar version avoids deprecation warnings during compilation, and having both methods makes the code easier to use in situations that call for either Dates or Calendars. </p>
<blockquote>
<pre>
    //
    // Copyright and licence notice: I intend for this code to be freely copied, edited, improved, etc.
    // Please give me (Chris Schultz, http://www.christopherschultz.net/) credit as the source of
    // this code, and let me know if you find ways to improve it.
    //
    public static DiffDate diffDates(Date earlier, Date later)
    {
      Calendar c_e = Calendar.getInstance();
      c_e.setTime(earlier);
      Calendar c_l = Calendar.getInstance();
      c_l.setTime(later);
      return diff(c_e, c_l);
    }

    public static DiffDate diff(Calendar earlier, Calendar later)
    {
      int y1 = earlier.get(Calendar.YEAR);
      int m1 = earlier.get(Calendar.MONTH);
      int d1 = earlier.get(Calendar.DATE);
      int y2 = later.get(Calendar.YEAR);
      int m2 = later.get(Calendar.MONTH);
      int d2 = later.get(Calendar.DATE);

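      // Adjust years across eras. GregorianCalendar reports 1 BC as
      // YEAR == 1 with ERA == BC, and there is no year zero, so map BC
      // years onto a continuous scale: 1 BC becomes 0, 2 BC becomes -1, etc.
      if(java.util.GregorianCalendar.BC == earlier.get(Calendar.ERA))
        y1 = 1 - y1;
      if(java.util.GregorianCalendar.BC == later.get(Calendar.ERA))
        y2 = 1 - y2;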

      int d_y = y2 - y1;
      int d_m = m2 - m1;
      int d_d;

      // Now that we've got deltas, start with the days and work backward
      // changing any negatives into positives, and rippling up to larger
      // fields.
      if(d2 >= d1)
      {
        d_d = d2 - d1; // Easy
      }
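      else
      {
        // The work calendar tells us how long each month is. Pin it to the
        // 1st so that changing its month can never roll over into the
        // following month (e.g. "February 30" silently becoming March 1).
        Calendar work = (Calendar)later.clone();
        work.set(Calendar.DAY_OF_MONTH, 1);
        while(d1 > d2)
        {
          // Move backward through the months, adding a whole month
          // until we have enough days to cover the deficit.
          --m2;  // To track our progress through the months
          --d_m; // Now, there's one less month between dates
          if(0 > m2)
          {
            // We crossed into the previous year. d_m is negative at this
            // point, so the month/year adjustment below borrows the year;
            // decrementing d_y here as well would count it twice.
            work.add(Calendar.YEAR, -1);
            m2 = Calendar.DECEMBER;
          }

          work.set(Calendar.MONTH, m2);
          d2 += work.getActualMaximum(Calendar.DAY_OF_MONTH);
        }

        d_d = d2 - d1;
      }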

      // Adjust the months and years
      while(0 > d_m)
      {
        d_m += 12;
        d_y -= 1;
      }

      return new DiffDate(d_y, d_m, d_d);
    }
</pre>
</blockquote>
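<p>
    The listing returns a DiffDate, which isn&#8217;t shown. A minimal sketch of what such a class might look like follows; only the class name and the three-int constructor come from the code above, while the field names, accessors and toString() are assumptions for illustration. </p>

```java
// A guess at the shape of DiffDate: an immutable holder for a date delta.
// Only the name and the (years, months, days) constructor appear in the
// original listing; everything else here is hypothetical.
final class DiffDate {
    private final int years;
    private final int months;
    private final int days;

    DiffDate(int years, int months, int days) {
        this.years = years;
        this.months = months;
        this.days = days;
    }

    int getYears()  { return years; }
    int getMonths() { return months; }
    int getDays()   { return days; }

    @Override
    public String toString() {
        return years + " years, " + months + " months, " + days + " days";
    }

    public static void main(String[] args) {
        System.out.println(new DiffDate(27, 9, 11)); // 27 years, 9 months, 11 days
    }
}
```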
<p>
    The whole thing is very straightforward, with the notable exception of the big &#8220;else&#8221; block in the middle of the code. This is where we handle cases in which the earlier date&#8217;s day-of-month is greater than the later date&#8217;s. In that case, we need to count backwards, enlisting the help of a Calendar object to give us the lengths of the months we pass through. That &#8216;work&#8217; calendar exists only to report month lengths, which mostly comes down to leap-year determination. I suppose I could have used the old &#8220;years evenly divisible by 4, except every 100, except every 400&#8221;, but that would have complicated my code even further, and, I think, been inaccurate for old dates because of changes to the calendar. Then again, I think that GregorianCalendar (the default calendar in my locale) had those same rules, so I&#8217;d get the same results in both cases. If you want to calculate dates in October of 1582, <a href="http://www.wisegeek.com/what-happened-to-the-calendar-in-october-1582.htm">you&#8217;re on your own</a>. </p>
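<p>
    Here is a small sketch comparing the two ways of getting a month length: asking a Calendar, and applying the divisibility rule by hand. MonthLengths, daysInMonth and divisibilityRule are hypothetical names. One thing it shows is that java.util.GregorianCalendar, with its default cutover in October 1582, applies the Julian every-four-years rule to earlier years, so the two approaches agree for modern dates but can differ for old ones. </p>

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical demo class: month lengths from a Calendar versus the hand-written leap rule.
final class MonthLengths {
    // Length of a month according to GregorianCalendar; month is a Calendar.* constant.
    static int daysInMonth(int year, int month) {
        Calendar c = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        c.clear();
        c.set(year, month, 1); // the 1st exists in every month, so nothing rolls over
        return c.getActualMaximum(Calendar.DAY_OF_MONTH);
    }

    // The "divisible by 4, except every 100, except every 400" rule.
    static boolean divisibilityRule(int year) {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }

    public static void main(String[] args) {
        System.out.println(daysInMonth(2000, Calendar.FEBRUARY)); // 29
        System.out.println(daysInMonth(1900, Calendar.FEBRUARY)); // 28
        System.out.println(daysInMonth(2005, Calendar.JULY));     // 31

        // Before the 1582 cutover the two notions of "leap year" part ways.
        System.out.println(new GregorianCalendar().isLeapYear(1500)); // true (Julian rule)
        System.out.println(divisibilityRule(1500));                   // false
    }
}
```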
<p>
    You may have noticed that this implementation does not handle time zones in any way. The reason is that this is intended to be for age calculation. If you were born in Sydney on 2000-01-01, then it might still have been 1999-12-31 in New York. However, you&#8217;re certainly not going to maintain your birthday to be 1999-12-31 when you&#8217;re in the US and 2000-01-01 when you&#8217;re in Sydney. Or, at least, we won&#8217;t ;) </p>
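<p>
    If you want to see the Sydney/New York effect for yourself, here is a quick sketch. SameInstantTwoDates, midnightIn and localDate are hypothetical names, and the zone IDs are the usual tz database ones that ship with the JDK. </p>

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class: one instant in time, two different calendar dates.
final class SameInstantTwoDates {
    // The instant (epoch milliseconds) of local midnight on a date in the given zone.
    static long midnightIn(String zoneId, int year, int month, int day) {
        Calendar c = new GregorianCalendar(TimeZone.getTimeZone(zoneId));
        c.clear();
        c.set(year, month, day);
        return c.getTimeInMillis();
    }

    // The calendar date (yyyy-MM-dd) that an instant falls on in the given zone.
    static String localDate(long instant, String zoneId) {
        Calendar c = new GregorianCalendar(TimeZone.getTimeZone(zoneId));
        c.setTimeInMillis(instant);
        return String.format(Locale.ROOT, "%04d-%02d-%02d",
            c.get(Calendar.YEAR), c.get(Calendar.MONTH) + 1, c.get(Calendar.DAY_OF_MONTH));
    }

    public static void main(String[] args) {
        long birth = midnightIn("Australia/Sydney", 2000, Calendar.JANUARY, 1);
        System.out.println(localDate(birth, "Australia/Sydney")); // 2000-01-01
        System.out.println(localDate(birth, "America/New_York")); // 1999-12-31
    }
}
```

<p>
    For ages, the sensible thing is to treat a birthday as a plain year, month and day and never convert it between zones. </p>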
<p>
    It occurs to me that I&#8217;d like to write an entirely new Date implementation for Java, to handle things like bizarre missing dates (like October 1582) and a few other things that bother me about the Date class, but it&#8217;s just not going to happen. There are too many APIs that already use Date (or Calendar) and they&#8217;re not likely to change. Also, one of the things that I haven&#8217;t liked about the APIs is that they can neither calculate nor store date deltas. I have solved both with a delta date implementation and a simple delta date class. </p>
<p>
    So, how old are you, exactly? My code says that I&#8217;m 27 years, 9 months, and 11 days old.     But I feel much younger than that. </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.christopherschultz.net/index.php/2005/08/07/how-old-are-you-really/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
