After much thought, I chose my first Azure AI application: a project that I felt would give me real-life experience in the dynamics of building a solution that solves an actual problem or improves an existing process.
My wife is a day trader. One of her biggest daily challenges is the complexity and time-consuming nature of the due diligence she must complete within a very tight window before entering a trade. She has to check important market and company information on several different websites. To help her, I decided to build an app that pulls and consolidates all the relevant market information (technical and fundamental) and automatically checks it against the rules of her trading system. This would give her a single, streamlined dashboard instead of multiple websites and, ideally, a clear, quick assessment of how well a potential trade aligns with her strategy.
The App: DDGenie

DD Genie (as in Due Diligence Genie) is a web-based application developed for the trading firm CBL Trading. When given a stock ticker, it aggregates intraday market data from multiple sources and evaluates the ticker’s compliance with the firm’s proprietary systematic trading rules, providing clear recommendations. The platform’s AI pièce de résistance is a module called “The Oracle,” which predicts the stock’s closing price for the day using a model pre-trained on data from over 1,000 historical trade alerts generated by the same trading system in the past.
I won’t go into the details of the trading system in this post; my goal is simply to capture and organize all of my ideas in one place.
This project has been an incredible learning experience!
Things I’ve learned while building this app
1.- You can’t use an LLM for complex tasks it wasn’t trained for
My initial idea was simple—though admittedly not very sophisticated: to build a small dataset containing company information and intraday market data for the target ticker, then feed it into a pre-built LLM. I would bring the model up to speed on the company’s trading rules and conditions through prompt-based inference. After that, I’d consult the LLM to evaluate how well a specific trade aligned with the trading system, hoping to receive a clear report highlighting both the alignments and any violations.
The problem is that, while most LLMs are well trained to understand semantics and can provide semi-coherent answers to a wide range of prompts, the only “training” I was giving this particular model about the trading system I expected it to enforce was a list of rules containing statements like “The float of the stock needs to be between so and so to be tradeable by the system,” or “More than ‘X’ employees is a red flag for the system,” etc.
That amount of information might be sufficient for a human, who can apply the rules to a small dataset and identify which values comply and which don’t. However, it’s nowhere near enough for an AI model. To generate meaningful results, the model would need to be trained on hundreds, if not thousands, of examples for each scenario—and even then, its answers would often be only partially correct, and inconsistent at times.
Although I learned a lot through the process—deploying the LLM to a virtual machine in Azure, creating an endpoint, and calling it from the app by sending a JSON object with the trade’s market data—the results were disappointing. The model did return a grammatically coherent response in English, but the facts were distorted and often completely wrong. It clearly didn’t understand the rules, yet that didn’t stop it from confidently generating an answer.
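For reference, the call from the app looked roughly like the sketch below. The endpoint URL, key, and payload field names are placeholders, not DDGenie's actual values; any hosted model endpoint that accepts JSON over HTTP follows the same pattern.

```python
import json
import urllib.request

# Placeholder values -- the real endpoint URL, key, and field names differ.
ENDPOINT = "https://example-llm-endpoint.azurewebsites.net/score"
API_KEY = "<your-endpoint-key>"

def build_payload(ticker: str, market_data: dict) -> bytes:
    """Serialize the trade's market data into the JSON body the endpoint expects."""
    return json.dumps({"ticker": ticker, "market_data": market_data}).encode("utf-8")

def query_endpoint(ticker: str, market_data: dict) -> str:
    """POST the JSON payload to the model endpoint and return its text response."""
    req = urllib.request.Request(
        ENDPOINT,
        data=build_payload(ticker, market_data),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```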
2.- Sometimes, most of what you need is doable with code
In the real world, it turns out, the amount of AI you actually need in a project can be very small. When I went to the whiteboard to design DDGenie, I realized that almost everything I needed was doable with rule-based programming. I don't need an ML model to tell me that a company's float is too low, or that its employee count is too high.
Those are numerical values, and I can easily write code that flags (with 100% accuracy) what needs to be flagged. No AI “reasoning” is needed to have a program flag 1M as too low a number for a company's float, for example.
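A minimal sketch of that rule-based check. The thresholds are made up for the example; the firm's actual limits are proprietary.

```python
# Hypothetical thresholds -- the firm's real trading rules are proprietary.
MIN_FLOAT = 2_000_000   # shares; below this the stock is not tradeable
MAX_EMPLOYEES = 500     # above this is a red flag for the system

def check_trade_rules(float_shares: int, employees: int) -> list[str]:
    """Return a list of rule violations; an empty list means the trade complies."""
    flags = []
    if float_shares < MIN_FLOAT:
        flags.append(f"Float {float_shares:,} is below the {MIN_FLOAT:,} minimum")
    if employees > MAX_EMPLOYEES:
        flags.append(f"Employee count {employees} exceeds the {MAX_EMPLOYEES} limit")
    return flags
```

A call like `check_trade_rules(1_000_000, 120)` flags the low float every single time, which no amount of prompt engineering can guarantee.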
So, as much as I wanted to pull the AI trigger everywhere, I kept my attention on the real needs of my “customer” (a.k.a. my wife), and the best way to do most of what she needed was plain Python, so I did it.

3.- Keep it narrow, stupid
Another thing I learned is that the narrower the scope of the AI used in your application, the better the results tend to be. AI is far more compartmentalized than many people expect. To build a truly intelligent system, it's often more effective to divide the problem into smaller, well-defined tasks: train one model for each specific task, and combine their outputs. The collective intelligence of these specialized “cells” creates a smarter and more reliable system overall.
4.- As for the quality of the data…
The old developer’s adage “Garbage in, garbage out” is more relevant today than ever. Data must be validated, cleaned, and normalized before being fed into a model. AI has little to no built-in filtering—it will always return an answer. If the input data is poor, the output will be just as poor.
To improve my model’s normalized squared error score, I had to refine the dataset: removing unnecessary (or confusing) columns, normalizing skewed data, and converting time columns into integer values by calculating the minutes elapsed from a reference time (4:00 AM EST) instead of using the standard HH:MM format, which is much harder to process (9:00 AM becomes 300).
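The time conversion itself is tiny; a sketch of it:

```python
from datetime import datetime

REFERENCE = "04:00"  # 4:00 AM EST, the start of pre-market

def minutes_since_reference(hhmm: str) -> int:
    """Convert an HH:MM timestamp into minutes elapsed since 4:00 AM."""
    t = datetime.strptime(hhmm, "%H:%M")
    ref = datetime.strptime(REFERENCE, "%H:%M")
    return int((t - ref).total_seconds() // 60)
```

So `minutes_since_reference("09:00")` yields 300, a plain integer the model can use directly.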
I also removed columns with too many N/A or missing rows. Sometimes a zero is informational (when you say “the gap between the previous day’s close and the open is zero,” it means the stock opened at the same price as the previous close), but sometimes it just means there was no data in the provider’s database. That’s a very different zero! I replaced those with N/A, and in the case of some columns, like IPO date, there were so many N/As that the data was almost worthless, so I dropped them.
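That cleanup step can be sketched in plain Python. The column names and the 40% drop threshold are illustrative, not the values I actually used.

```python
# Illustrative sketch: turn sentinel zeros into real missing values, then
# drop columns where too large a fraction of rows is missing.
SENTINEL_ZERO_COLS = {"ipo_date"}  # columns where 0 means "no data", not a real value
DROP_THRESHOLD = 0.4               # drop a column if >40% of its rows are missing

def clean(rows: list[dict]) -> list[dict]:
    # Pass 1: replace sentinel zeros with None.
    for row in rows:
        for col in SENTINEL_ZERO_COLS:
            if row.get(col) == 0:
                row[col] = None
    # Pass 2: drop columns whose missing fraction exceeds the threshold.
    cols = {c for row in rows for c in row}
    to_drop = {
        c for c in cols
        if sum(row.get(c) is None for row in rows) / len(rows) > DROP_THRESHOLD
    }
    return [{c: v for c, v in row.items() if c not in to_drop} for row in rows]
```

Note that the informational zeros (like a zero gap) are untouched; only the columns listed as sentinel-zero columns are converted.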
Using a model vs creating one
At the time of this exercise, there were more than 11K pre-trained models in Azure’s AI Studio. I can’t say I checked the documentation of every single one of them, but after filtering by “Finance” and “Trading,” the list was reduced to 3 or 4, and none of them was created and trained to do what I wanted: predict the most probable outcome of a trade entered at a specific price, time, and market conditions. So I decided to increase the complexity of my first project. I wouldn’t simply be integrating an existing AI model into my application as planned; I would have to build one from scratch.
I tried to do it from scratch with Designer, but Azure’s Automated ML does a much better job.
Azure Designer
Azure has a wonderful visual tool called Designer. It allows developers to build a pipeline using what they call “components,” which are basically pre-made code blocks you can drag and drop to perform much-needed processes like cleaning, normalizing, and splitting datasets, then feeding them to untrained models, which are also available as components ready to be dragged into the pipeline. There are a few untrained models available, based on different algorithms and statistical methods for different uses: regression, anomaly detection, classification, etc.
I built a pipeline that would clean, normalize and split the data, then train, test, and evaluate a model using a dataset with the following initial columns:

After the first (many) tries running the pipeline and getting error after error, it became clear (thanks, ChatGPT) that some columns needed to be cleaned (e.g., Date needed proper formatting), some needed to be normalized due to the huge differences in scale between them (Volume, Market Cap), and some needed to be dropped since they added complexity to the dataset without improving the model’s accuracy. At some point I also realized that I was training the model on data that, realistically, I was not going to have available at prediction time (for example, I wasn’t going to have the open price at 6 AM, since the market opens at 9:30 AM). That didn’t make sense: to run a regression, the model needs all the columns it was trained on as inputs. The only one you don’t input is, of course, the one being predicted.
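A sketch of that leakage check: before training, keep only the features that will actually be available at prediction time, plus the target. The column names here are illustrative, not the actual DDGenie schema.

```python
# Illustrative column names -- not the actual DDGenie dataset schema.
ALL_COLUMNS = ["date", "premarket_volume", "float", "market_cap",
               "open_price", "close_price"]

# Features known before the market opens. "open_price" is excluded because
# it only exists after 9:30 AM; "close_price" is the prediction target.
AVAILABLE_AT_PREDICTION = {"date", "premarket_volume", "float", "market_cap"}
TARGET = "close_price"

def training_columns(columns: list[str]) -> list[str]:
    """Keep the target plus only the features we will actually have as inputs."""
    return [c for c in columns if c == TARGET or c in AVAILABLE_AT_PREDICTION]
```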
I removed every column that fell into that situation, and this was the resulting final dataset:

As for the model, this was the one I chose; see below, along with one of the pipelines I tried:

Below are the metrics I obtained after making changes to the dataset I fed to the model I built in Designer. Note how there were some improvements, but nothing major.
Using raw data

After I log-transformed the volume-related columns to normalize the peaks, some metrics got marginally better and others got worse.
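The log transform itself is simple; a sketch using `log1p` (log(1 + x), which handles zero-volume rows safely):

```python
import math

def log_transform(values: list[float]) -> list[float]:
    """Compress heavily right-skewed values (e.g. Volume) with log(1 + x)."""
    return [math.log1p(v) for v in values]
```

A volume spike of 10,000,000 maps to about 16.1 while 100,000 maps to about 11.5, shrinking the range the model has to fit.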

When, in addition to the log transformation, I also converted the time column to minutes elapsed from 4 AM, with the intention of normalizing the data, the changes were again not significant.

Azure Automated ML
I gave the automated tool Azure offers for ML creation a try, and it was a pleasant surprise. The application takes your data and tries several algorithms until it finds the one with the best metrics. It has a much bigger library of algorithms, many of which were not available in Designer, and the metrics were so much better!
Normalized squared error of the model Azure Automated ML built for me:

Streamlit is A-W-E-S-O-M-E
I’m not sure how I bumped into Streamlit, but I’m so happy I did! I’m not even sure I know what type of sorcery they use 😀. What I can tell you is that it was an amazing solution for building a web-based UI for my Python script. I just had to import their library and call a few output functions, and the integration happens seamlessly. Replace Python’s print() with their st.write() when you need to display something, and it will be shown on the website Streamlit builds for you in real time. Want to make it accessible to others on your network? No problem: it spins up a local server for you.
A closer view of the app




Conclusion
I’m very glad I chose this as my first project. In hindsight, there are several ways I could have used the model more effectively. Even though I trained it with a reasonably large dataset, trying to predict a stock’s closing price based on the last 1,000 trades made by the system is a bit like trying to forecast the weather using only the previous two years of data—there are many other variables involved. Too many moving parts influence the outcome.
Rather than predicting an exact closing price, a better approach would have been to predict the edge: whether I should have a bearish or bullish sentiment. That, I think, is a more realistic expectation for a model of this nature.
I didn’t implement any pre-trained model in this project, as I didn’t find a real, strong use for one. I wasn’t expecting to end up training my own prediction model, but it was a great experience, and one that opened the door to more ideas I’ll possibly try in the future, like using an image-processing model to read charts and spot patterns.

