He built fault-tolerant infrastructure for 100,000+ users in finance and healthcare before the first user arrived. Here is the sequencing framework that kept those systems running, and what enterprise AI teams are getting wrong by doing it in reverse.
Most enterprise AI systems do not fail because the model is wrong. They fail because the infrastructure underneath the model was never designed for the conditions production actually creates. A Gartner study of 783 infrastructure and operations leaders found that only 28% of enterprise AI initiatives fully meet ROI expectations, and 20% fail outright. What failed was the operational layer underneath the model: infrastructure work that was underfunded, deferred, and encountered for the first time after real users had already arrived.
Abduaziz Abdukhalimov spent a decade solving a problem most teams do not discover until it is too late. At Barso LLC, he built fault-tolerant, cloud-native infrastructure for more than 100,000 active users across finance, healthcare, and telecommunications, the kind of systems where a deployment failure is not a support ticket but a regulatory exposure. He designed event-driven platforms on Apache Kafka and RabbitMQ, automated CI/CD pipelines that cut deployment windows by 60%, and restructured system architecture for a 40% performance improvement under sustained production load. What he built matters less than when and in what order he built it. That sequence is what this piece is about.
What breaks first when infrastructure is secondary?
In synchronous microservice architectures, one slow dependency exhausts shared thread pools under load, collapsing the entire system regardless of model performance. To prevent this, Abduaziz chose Apache Kafka and RabbitMQ for inter-service communication at Barso LLC. Event-driven messaging decouples producers from consumers: a service publishes an event to a queue and continues; the consumer processes it independently. When a consuming service slows or fails, the queue absorbs the backlog. The failure stays contained. The rest of the platform continues operating.
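The mechanics are simple enough to sketch. The snippet below shows the decoupling pattern using RabbitMQ's Python client, pika; the article names the brokers but not the client libraries or message schemas, so the queue name and payload here are illustrative assumptions, not Barso's actual design.

```python
# A minimal sketch of producer/consumer decoupling over RabbitMQ (pika).
# Queue name and payload are illustrative assumptions.
import json
import pika

# --- Producer side: publish the event and move on ---
conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()
channel.queue_declare(queue="payments.settled", durable=True)  # survives broker restart

channel.basic_publish(
    exchange="",
    routing_key="payments.settled",
    body=json.dumps({"payment_id": "p-123", "amount": 4200}),
    properties=pika.BasicProperties(delivery_mode=2),  # persist the message
)
conn.close()  # the producer never waits on the consumer

# --- Consumer side: process independently, at its own pace ---
def handle(ch, method, properties, body):
    event = json.loads(body)
    # ... slow work happens here; a backlog accumulates in the queue,
    # not in the producer's thread pool ...
    ch.basic_ack(delivery_tag=method.delivery_tag)  # ack only after success

conn2 = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch2 = conn2.channel()
ch2.queue_declare(queue="payments.settled", durable=True)
ch2.basic_consume(queue="payments.settled", on_message_callback=handle)
ch2.start_consuming()
```

The detail worth noticing is the acknowledgment at the end: the consumer confirms a message only after the work succeeds, so a crash mid-processing means redelivery rather than data loss, which is what keeps a consumer failure bounded instead of contagious.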
The tradeoff is real and worth stating explicitly. By introducing event-driven messaging, Abduaziz accepted eventual consistency as a design constraint. He mapped workflows upfront, identifying which could tolerate eventual consistency and which required synchronous guarantees, and structured the data model accordingly. A financial transaction updating both a ledger and a notification record, for example, required explicit atomicity decisions before a single line of code was written. That conversation cannot happen after the architecture has hardened around synchronous assumptions.
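For the ledger-and-notification case, one standard way to make that atomicity decision concrete is the transactional outbox pattern: commit the ledger row and the pending event in the same database transaction, and let a relay drain the outbox into the queue afterward. The article does not say Barso used this exact pattern, and the table and column names below are hypothetical; sqlite3 is used only to keep the sketch self-contained.

```python
# Illustrative sketch of the transactional outbox pattern (an assumption,
# not confirmed as Barso's approach). Table and column names are hypothetical.
import json
import sqlite3

db = sqlite3.connect("ledger.db")
db.execute("CREATE TABLE IF NOT EXISTS ledger (txn_id TEXT PRIMARY KEY, amount INTEGER)")
db.execute("CREATE TABLE IF NOT EXISTS outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)")

def record_transaction(txn_id: str, amount: int) -> None:
    # The ledger write and the pending event commit or roll back together.
    # A relay process later drains the outbox into Kafka/RabbitMQ, so the
    # notification is eventually consistent by design, never silently lost.
    with db:  # one atomic transaction
        db.execute("INSERT INTO ledger VALUES (?, ?)", (txn_id, amount))
        db.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"event": "txn.recorded", "txn_id": txn_id}),),
        )

record_transaction("t-001", 150_00)
```

The point of the sketch is the shape of the decision: the ledger stays strictly consistent, the notification is declared eventually consistent, and that split is written into the data model rather than discovered in an incident review.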
“Fault tolerance isn’t something you add later,” Abduaziz explains. “When you’re building for 100,000 users in finance or healthcare, every architectural decision either contains failure or spreads it. You have to make that call at the beginning, not after the first incident hits.”
The second failure mode Abduaziz addressed was deployment fragility. Teams that invest heavily in model capability and lightly in deployment automation cannot push critical fixes without risk; a security patch requires a manual deployment, a maintenance window, and downtime coordination across teams. In regulated industries, that gap between discovering a vulnerability and patching it is a compliance exposure, not a scheduling inconvenience.
At Barso, Abduaziz built CI/CD pipelines on Jenkins and GitHub Actions, containerized applications with Docker, and orchestrated deployments through Kubernetes, reducing deployment windows by approximately 60%. More significantly, he configured rolling deployments so updated containers replaced running ones gradually, with automatic rollback if health checks failed. Critical fixes could reach production without taking the platform offline. What appeared to be an efficiency gain was a risk management decision.
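The automatic-rollback behavior hangs on a health signal the orchestrator can probe: Kubernetes keeps replacing containers only while the new ones pass their checks. Below is a minimal sketch, written with Flask, of the kind of endpoint a readiness probe would hit; the /healthz path and the two checks are illustrative stand-ins, not Barso's actual configuration.

```python
# Minimal sketch of a health endpoint for Kubernetes probes (Flask).
# The /healthz path and check_db()/check_queue() are illustrative stand-ins.
from flask import Flask, jsonify

app = Flask(__name__)

def check_db() -> bool:
    return True  # placeholder: e.g. run SELECT 1 against the database

def check_queue() -> bool:
    return True  # placeholder: e.g. verify the broker connection is open

@app.route("/healthz")
def healthz():
    healthy = check_db() and check_queue()
    # A non-200 response fails the readiness probe: Kubernetes stops routing
    # traffic to this pod and halts (or rolls back) the rollout.
    return jsonify(status="ok" if healthy else "degraded"), (200 if healthy else 503)

if __name__ == "__main__":
    app.run(port=8080)
```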
The third failure mode Abduaziz tackled was performance degradation under real load, a problem that rarely surfaces in pre-launch testing because test environments underrepresent concurrent production traffic. He restructured the architecture to move workloads that did not require a synchronous response into background processing via the event queue, and optimized database queries for sustained concurrency. The result was an approximately 40% improvement in overall system responsiveness under load. The underlying principle he applied: not every operation needs to block a user-facing request. Identifying which ones can be deferred is a pre-launch design decision. Discovering under load that they cannot is a post-launch incident.
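In code, the principle looks like this: the request handler performs only the write the user is actually waiting on, publishes an event for everything else, and returns. The route, queue name, and payload below are illustrative assumptions, and a real service would reuse the broker connection rather than open one per request.

```python
# Sketch of "defer what doesn't need to block": the handler does the one
# synchronous essential, then hands the rest to the queue. Route, queue
# name, and payload are illustrative assumptions.
import json
import pika
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/orders", methods=["POST"])
def create_order():
    order = request.get_json()
    order_id = save_order(order)  # the one write the user must wait for

    # Receipt emails, audit trails, analytics: none of these need to block
    # the response, so they ride the event queue instead.
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()
    ch.queue_declare(queue="orders.created", durable=True)
    ch.basic_publish(exchange="", routing_key="orders.created",
                     body=json.dumps({"order_id": order_id}))
    conn.close()

    return jsonify(order_id=order_id), 202  # accepted; background work pending

def save_order(order) -> str:
    return "o-42"  # placeholder for the synchronous database write
```

Returning 202 rather than 200 makes the contract explicit to the caller: the order exists, and the follow-on work is pending rather than complete.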
Why is the infrastructure layer underfunded?
Deloitte’s State of AI in the Enterprise 2026 report, based on a survey of 3,235 business and IT leaders across 24 countries conducted between August and September 2025, found that only 25% of organizations have moved 40% or more of their AI pilots into production.
The data gap points to a deeper measurement problem. Most organizations track what is visible: features shipped, deadlines met, and model accuracy benchmarks. What rarely gets tracked is system behavior at three times expected load, six months after launch, without a maintenance window. Abduaziz encountered this gap directly at Barso LLC: the performance problems that his database optimizations and background processing architecture eventually solved did not appear in pre-launch testing; they appeared when concurrent production traffic hit a system that had only ever been tested at a fraction of real load. These are the conditions that determine whether a system is a production platform or a prototype that has not yet met its stress test, and they are rarely part of pre-launch evaluation criteria.
“Anyone can build a distributed system,” Abduaziz notes. “The real test is keeping it running, under real load, with tight deadlines, when taking it offline isn’t an option. That’s when most teams get a very honest look at what they actually built.”
What do emergency deployments reveal that planned rollouts never do?
The most reliable test of infrastructure sequencing is not a planned rollout. It is an unplanned one, where the gap between design assumptions and production conditions collapses immediately, and the system either holds or it does not.
During the early weeks of the COVID-19 pandemic, universities across Uzbekistan required functional remote learning infrastructure within weeks: no staged rollout, no iterative hardening, no margin for failure. Abduaziz led the deployment of a Moodle-based e-learning platform under those conditions: full production load from day one, thousands of concurrent users, and no acceptable downtime. The effort was recognized by the Ministry of Higher Education.
What made it possible was not improvisation. The decisions Abduaziz had already made at Barso (containerization, automated deployment pipelines, event-driven architecture, and database optimization for concurrent load) transferred directly to the emergency deployment. The crisis compressed the timeline, but it did not change the architectural requirements. A system that can absorb an emergency deployment was already designed for load it had not yet seen. One that cannot was never designed for production in the first place.
“During COVID, we’re talking weeks, not months,” Abduaziz recalls. “Universities had to get online right away, and there was no margin for anything to go wrong. That’s when it clicked for me: scalable infrastructure isn’t a nice-to-have. It’s literally the only thing between your users and a completely broken service.”
Four decisions must be made before production, not after. Before writing model code, define whether inter-service communication will be synchronous or event-driven. Before the first deployment, build the CI/CD pipeline and rollback configuration. Before load testing, identify which operations can move to background processing. Before launch, embed authentication at the architecture level. The question is not whether the model is ready. It is whether the infrastructure was.

