Hi HN, author here.
I wrote this post after experimenting a lot with AI coding tools. While code generation is helpful, I found a potentially more interesting pattern emerging: actually replacing complex or brittle functions (think intricate parsers, rule engines) with direct, validated LLM calls.
This seems viable now mainly due to the massive drop in inference costs over the last couple of years. Tools like Instructor AI (which I use in the example) also help bridge the gap by enforcing structured output and handling validation/retries, making the LLM calls more reliable.
The post explores this "LLM as function" idea and includes a practical Python example replacing a brittle date parser with an LLM call using Instructor.
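To make the pattern concrete, here is a minimal sketch of what such a call can look like with Instructor and Pydantic. It is not the post's exact code: the model name and the NaturalDate schema are illustrative.

```python
# Minimal sketch of the "LLM as function" pattern: a natural-language
# date phrase goes in, a validated date comes out.
from datetime import date

import instructor
from openai import OpenAI
from pydantic import BaseModel, Field


class NaturalDate(BaseModel):
    resolved: date = Field(description="The concrete date the user meant")


client = instructor.from_openai(OpenAI())


def parse_date(text: str, today: date) -> date:
    result = client.chat.completions.create(
        model="gpt-4o-mini",          # illustrative model choice
        response_model=NaturalDate,   # Instructor validates the output against this schema
        max_retries=2,                # re-prompt the model if validation fails
        messages=[
            {
                "role": "system",
                "content": f"Today is {today.isoformat()}. "
                           "Resolve the user's phrase to a single date.",
            },
            {"role": "user", "content": text},
        ],
    )
    return result.resolved


# Example: parse_date("last day of February next year", date(2025, 4, 21))
```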
Curious to hear if others are exploring similar patterns, what challenges you've faced, or what tools/techniques you're using to add reliability to LLM outputs within your applications. Thoughts and feedback welcome!
Can't it generate code using a calendar library instead and run it? That would perhaps be easier to debug and would allow caching/reuse. A small LLM might otherwise fail at things like leap years (the user can ask for the last day of February).
You're absolutely right that using a robust calendar/date library (like dateutil in Python) is the standard approach and often the best one. It's definitely easier to debug and test in the traditional sense.
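For comparison, the leap-year case you mention is easy to handle with a library alone. A quick sketch using dateutil's relativedelta, which clamps day=31 to the month's last valid day:

```python
from datetime import date
from dateutil.relativedelta import relativedelta

# relativedelta clamps an out-of-range day to the end of the month,
# so day=31 lands on Feb 29 in leap years and Feb 28 otherwise.
last_day_feb_2024 = date(2024, 2, 1) + relativedelta(day=31)  # 2024-02-29
last_day_feb_2025 = date(2025, 2, 1) + relativedelta(day=31)  # 2025-02-28
```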
The potential value of the LLM approach comes in when:
The natural language input is very fuzzy or varied, potentially going beyond what standard libraries parse easily out-of-the-box.

You want to simplify the application code that interprets the user's intent before calling a date library function. The LLM handles the interpretation step.

Regarding edge cases like leap years: That's a valid concern for any date logic. However, capable LLMs (even models like GPT-3.5-Turbo or GPT-4o mini), when given the proper context (like today's date, as shown in the example prompt), handle standard date calculations including leap years quite well. More importantly, the validation layer (using Pydantic/Instructor in the example) is crucial here.
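As a rough illustration of that validation layer, a Pydantic validator can reject out-of-range answers so that Instructor feeds the error back and re-prompts the model. The bounds below are illustrative, not from the post:

```python
from datetime import date

from pydantic import BaseModel, field_validator


class NaturalDate(BaseModel):
    resolved: date

    @field_validator("resolved")
    @classmethod
    def within_reasonable_range(cls, value: date) -> date:
        # Reject wildly wrong answers; with Instructor's max_retries,
        # this error message is sent back to the model for another attempt.
        if not (date(2000, 1, 1) <= value <= date(2100, 12, 31)):
            raise ValueError("resolved date is outside the expected range")
        return value
```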
Caching is also possible for the LLM responses (based on input text + date context) if needed. It's definitely a different set of trade-offs compared to using a library directly!
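A caching sketch along those lines, keyed on the input text plus the date context (this reuses the parse_date sketch from earlier):

```python
from datetime import date
from functools import lru_cache


@lru_cache(maxsize=1024)
def parse_date_cached(text: str, today: date) -> date:
    # The cache key is (text, today), so "next Friday" hits the LLM at most
    # once per day and is served from the cache for repeated requests.
    return parse_date(text, today)
```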
I meant that the LLM would still be used, but not to compute the date directly: it would generate Python code from the user input. That code doesn't need to be thrown out of the cache every day.
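A rough sketch of that code-generation variant: the generated resolver is cached per phrase, so only the code-generation call costs tokens. The generate_resolver_source helper is hypothetical (it stands in for the LLM call), and running LLM-generated code would of course need sandboxing:

```python
from datetime import date

_code_cache: dict[str, str] = {}  # phrase -> generated Python source


def resolve(phrase: str, today: date) -> date:
    source = _code_cache.get(phrase)
    if source is None:
        # Hypothetical helper: asks the LLM to write a resolve_date(today)
        # function for this phrase. Only this step costs tokens.
        source = generate_resolver_source(phrase)
        _code_cache[phrase] = source  # reused across days, not evicted daily
    namespace: dict = {}
    exec(source, namespace)  # executing generated code requires sandboxing
    return namespace["resolve_date"](today)
```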