I would love to hear more about how, specifically, you used Beyond Better. Did you still need to code using Xcode, or did you use Beyond Better to help actually write the iPhone code?
I’m the founder of Beyond Better, so I won’t speak to the specific Xcode use case. But I can talk about how Beyond Better (BB) works.
BB works alongside Xcode. Use BB to plan and implement changes; it will directly modify files, and then Xcode is used to compile the app. BB can also control the process by calling the Xcode CLI tools, which is great for a tight feedback loop: watching for errors and immediately fixing them.
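That feedback loop can be sketched in plain shell. This is just an illustration of the idea, not BB’s actual mechanism; the project and scheme names below are hypothetical placeholders:

```shell
# Hypothetical build-and-check loop. "MyApp" is a placeholder project name.
# Compile, capturing both stdout and stderr into a log we can inspect.
xcodebuild -project MyApp.xcodeproj -scheme MyApp build 2>&1 | tee build.log

# Surface compiler errors so the agent (or you) can read and fix them.
if grep -q "error:" build.log; then
  echo "Build failed; feed build.log back in for another editing pass."
fi
```

In practice BB reads the build output itself, but the shape is the same: compile, capture errors, edit, repeat.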
BB works best with “spec-driven development”: discuss and plan the desired outcomes to create clear specifications, then let BB implement the plan.
Note, BB isn’t just for coding projects; it’s useful across a range of project types: research, creative writing, etc. The key features are how BB uses multiple “data sources” (your knowledge documents, including cloud providers) and works with all popular models (Claude, ChatGPT, etc.), including local-only models via Ollama.
A full article about Beyond Better is still in the works. But in brief, I’ve created (explicitly not saying “written”) a native iPhone app with Beyond Better modifying files in an Xcode project. I can’t quite remember if I used Beyond Better to help me set up Xcode as well, but it was either that or ChatGPT. I know nothing about Swift or Xcode.
Part of why I haven’t written more is that I haven’t finished the app, so I’m a little embarrassed at its state. It’s a customized stopwatch app for backup timing in running races. It has three modes at the moment: Simple, Camera, and Video.
- Simple works well, and I was able to hand it to a random volunteer who had never seen it before and she recorded 540 times at our Turkey Trot on Thanksgiving. It’s designed to be used without looking at the screen and offers options for deleting extra times and marking times as “weird” when we know something happened at a certain spot.
- Camera is designed to record a time and take a photo on every tap. I’ve mostly bogged down on getting rotation to work right. It’s devilishly difficult to get the viewfinder right.
- Video is designed for mounting on a tripod, pointing at the finish line. I haven’t gotten it working yet, but the idea is that it will record video with a timestamp overlay. On every press of a Bluetooth selfie-stick button, it records a time and saves a photo. After the race, if we can’t figure out what happened at a certain spot, we can scroll through the video to the right time and watch the finish line again. (I have another app that does this and it’s quite helpful for sorting out confusions.)
None of this is rocket science, but I’ve never found an app that looks and works the way I want. Since I time about 15 races per year, I’m extremely opinionated about how the app should work. And I’m right.
Overall, using Beyond Better is the closest I’ve come to feeling like I did with HyperCard back in the day. Not in the hands-on sense, but in the way it makes me feel when I get something working.
A friend just sent me the code for a bitcoin trading bot that he’d written (well, emended) with Claude. He is an innocent (but not when it comes to trading). I was dumbfounded at the sophistication of the Python Claude produced.
But, I was horrified by the lack of error checking and just, well, strange oddities that I was unwilling to spend the time to unravel.
He says it works but has been “adjusting” things for days.
This is pretty powerful stuff coming from these LLMs but, my, I don’t think I want my tools to be built by people who have no idea how they’re built.
Dave
Interesting. Do you only have this opinion because you see the code? What if, like most apps we use, you didn’t get to see the code?
When I was at Apple I saw the code behind one particular first-party iOS app: it was horrifying. But everybody used the app, until it was retired and then everybody missed the app.
This is an interesting point in two ways:

- First, it sounds like it doesn’t really fall into the category of “my tools.” It’s something your friend wrote largely for himself. The stuff I’ve written is largely for myself and my direct community as well. As with HyperCard, I strongly suspect that most AI-generated apps that we non-programmers are writing are meant for extremely limited audiences.

- Second, do any of us have any idea how any of our digital tools are built? We certainly hope that the developers pay attention to error handling and security and the like, but all evidence points to the contrary. I’m not saying that’s good, just that it’s the reality of the world we live in. Even with open-source apps, where the code is by definition public, what percentage of people have the time, interest, and knowledge to evaluate whether the code is well-constructed? Vanishingly few, so we trust our experiences using it (you may not hit the egregious bugs) and what others say.
Yes, the app was working, but I did see the code and was concerned about how well the app would work over time. Roughly similar to happily seeing water come out of the faucet but then looking under the sink and seeing all the joints wrapped with duct tape: not a good look for the long term.
Dave
As a software developer I’m also keenly aware that my quest for perfection is unobtainable, and my desire for balance in my code would take infinite time, so I have to set reasonable scope.
I’m currently working on 2 games, and have 7 apps in various states of “most people would consider these done already” a handful of which are ready to launch. I need to take myself and my id out of the equation.
It’s a complex issue. And there’s not necessarily a correlation between good code and a wide/popular user base.
I’ve worked on massive projects, used daily by many thousands of people, that were so poorly designed that the support team was 4x larger than any other team in the org, simply to handle all the code failures.
I’ve worked on small open-source projects that were unarguably works of art, but never reached massive adoption.
I completely agree with your statement “I don’t think I want my tools to be built by people who have no idea how they’re built” - there are too many nuances for error handling, scaling, and more. But the bigger issue for me would be “I don’t want my tools built by people who don’t care how they are built.”
After decades in the industry, my experience is that the number of people who “know how” things are built but “don’t care” about the quality far outweighs those who both “know how” and “care about quality.”
Bringing that back to LLM-generated code, the latest generation of models are creating code that is far superior to the majority of code I’ve seen in large scale projects in the past. So which do I prefer:
- superior code managed by people who don’t understand it
- inferior code managed by people who do understand it
Honestly, today, I’m not sure which I prefer - there are very real pros/cons for both. But moving forward, as models become more capable, my choice will be very clear and easy: I prefer superior code managed by people who don’t understand it. The code will be so much better that the human managing it doesn’t need to understand it.
Claude Opus 4 writes code that is “far superior” to what I would ever spend time to write. But that is still for specific solutions; a human is still needed for the project-wide understanding.
Claude Opus 4.5 is approaching the ability to have project-wide understanding (some argue Opus 4.5 is already there). So handing over a complete project to the LLM (using Beyond Better in my case) is becoming a very real solution.
Unfortunately, that situation is way too common in the modern software world.
Most people believe that the software making their lives easier is elegant and refined. That is rarely true.
Well said.
I personally have very high standards for code quality. But upholding that standard against economic and market pressures is an ongoing and very real tension. It’s the experienced developer who intuitively understands where to make the tradeoffs. There will always be technical debt in any sufficiently large project; the question is knowing where that debt is acceptable, and where the cost (interest) will be too high.
@ace, @gingerbeardman, @cngarrison, you all have really good points, but I don’t think I made my point clearly enough.
What concerns me is that LLMs can produce sophisticated code, but when the person asking for it has no idea whether the code is good, that’s asking for potentially serious trouble. The widely reported farce of the lawyers (who should have checked the results, being lawyers an’ all) who submitted utterly false briefs to the court is a fine case in point. They assumed it was right. It’s bot-seduction. Gosh, it looks great! Must be right! When you’re producing an app that has real-world consequences—money gained or lost, health tracking, production control and so on—ignorance is a problem. It’s so easy to do when you’re sitting there at the end of an evening fiddling around with an LLM that produces an amazing-looking app that you forget the consequences of it being wrong. And because you’ve never written code since that Java class in high school, 20 years ago, you have no idea what to look for, so you just say, “Cool! It works!”
Dave
That is the critical point. This discussion is mixing together personal apps and apps intended for important or wide distribution.
I completely agree that an app with real world consequences needs human oversight.
But a personal app (or app with small/limited audience) can be a success without human oversight of all the generated code.
A large/distributable app can be written by an LLM, but the final product still needs human review. (At least with the current generation of models; I expect that to change within a year.) Also, that review isn’t just code quality; it’s the over-arching architectural design decisions. I expect those decisions to be made by someone with deep knowledge and expertise, both with the technology and with the business domain.
I don’t review all LLM-generated code. But in those cases, I do expect a robust test suite, which I will review. I do trust quality LLM-generated code combined with comprehensive test suite. In fact, I trust that a lot more than I trust quality human-generated code without a test suite.
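As a concrete illustration of why a reviewable test suite earns that trust: a test is usually shorter, and easier to audit, than the code it covers. A minimal hypothetical sketch in Python, where `parse_duration` stands in for any LLM-generated function (the function and its format are made up for this example, not from the thread):

```python
# Hypothetical: suppose an LLM generated parse_duration() for a race-timing app.
# Reviewing tests like these is faster than auditing the implementation itself.

def parse_duration(text: str) -> float:
    """Convert an 'MM:SS.s' race time into seconds (stand-in for LLM output)."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)

def test_parse_duration():
    # Each assertion pins down one observable behavior of the generated code.
    assert parse_duration("01:30.5") == 90.5
    assert parse_duration("00:09.2") == 9.2

test_parse_duration()
print("ok")
```

If the assertions capture the behaviors that matter, the implementation behind them can be trusted without a line-by-line read.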
That’s just being lazy. The same applies whether it’s legal or coding. It’s not a shortcoming of the LLM, it’s a shortcoming of the human who is ignoring the responsibility to actually produce valid results.
My point in an earlier comment: there are many human software developers who ignore that responsibility and produce shoddy work regardless. That problem existed long before LLMs.
Changing the focus slightly: where is the line of responsibility? As a developer with decades of experience, I write quality code. I take pride in, and responsibility for, that code. But to what extent?
The code I write doesn’t run on hardware literally as I wrote it - it gets compiled (or interpreted) into code that can execute at the hardware level. I don’t review that code. Even if I did, I’m not skilled enough to judge its quality. I shift the responsibility to the compiler/interpreter to generate the expected code at the same level of quality. When I push that code out to run on a remote server (e.g. in AWS systems), that pushes the responsibility even further away from me.
So let’s move in the opposite direction. I “write code” in the English language, which the LLM compiles/interprets into Python or TypeScript. That is the exact same process of shifting responsibility to a lower level. What makes it OK to shift that responsibility to a machine-language interpreter, but not OK to shift it to an LLM interpreter? In both cases, the final output quality correlates directly with the quality of my input.
When I want the LLM to write code for me, I don’t just provide a two-sentence prompt and hope for the best. I will have a long conversation with the LLM, often taking a couple of hours (or more), to analyse requirements, generate a plan, review implementation details, confirm testing methodology, and finally proceed with the actual work of writing code. With all of that preparation, the results I get back are almost always production quality. The results are better than what I would produce if I spent a few days (or weeks) writing the code myself. But it’s my decades of experience that allow me to work with the LLM, understand all the nuances of the planning stage, and make the critical decisions that will result in a quality outcome.
Exactly. But who is going to have that experience 20 years from now?
I’m reminded of another industry that has gone through similar knowledge loss, without apparent consequences - structural engineers. Over 20 years ago, the engineers had to understand all the finer details of stresses, materials, etc - along with all the mathematical equations to ensure compliance was met. Today, the engineers are taught how to use software that handles that responsibility.
The knowledge isn’t completely lost (at least I hope not) since the software writers must have the deeper understanding. But the actual bridge builders have lost that knowledge. The same will happen with software engineers too.
But what about combining those two: LLMs writing software to design bridges - who has the knowledge then? In reality, it will be LLMs (or more likely LWMs) that design the bridges, skipping the software in between completely.
Large World Model
Well, great. Now I’m going to have to avoid bridges for the next few decades.
I think Xcode is still required to test and deploy your app. If you want to make an app with AI, you can use the Cursor IDE along with Claude Code. You can find hundreds of tutorials on making apps with Xcode and Cursor.
Yes, if you want to make a native iOS app, you have to work in Xcode. Beyond Better makes that possible by editing the Xcode project files on disk directly, but I still run the code from Xcode after every pass.
Xcode 26 introduced AI integrations with ChatGPT and Claude (you can also add your own model provider). I’ve been using the ChatGPT integration, and have to say that model has come a long way since I first started asking it coding questions a couple years ago. Being integrated within Xcode makes it easy to say “take a look at this entire project, and point out areas for improvement.” But in my experience, it still takes some (actually, quite a bit of) fluency in Swift/SwiftUI to be able to end up with a useful product. Give you a leg up? Suggest an approach to a problem you hadn’t thought of? Absolutely. But it still gets lost in the weeds pretty routinely.