An interview with David Wang, Managing Director, Artificial Intelligence and Financial Engineering at State Street Corporation

Development, Implementation And Management Of ML Models

Effectively develop and govern machine learning models for appropriate implementation and to help overcome issues surrounding data and explainability.

How important is it to combat issues of data bias to secure investment for machine learning in model development?

The importance depends on the use case and business scenario. In most cases, these data biases can result in lower model performance. This usually means the model does not meet the business's expectations for addressing the problem, which makes it hard to justify the return on investment (ROI). ROI can be demonstrated through qualitative evaluation and quantitative measurement of client experience improvement, new revenue generation or cost savings, operational efficiency, risk mitigation, etc. However, in some cases, the impact of data biases on model performance is immaterial or can be mitigated using approaches such as those described below.

In terms of investment, what areas are you focusing on? Where, for example, are you planning to invest within the next 3 to 6 months?

We are currently focusing on enterprise data quality control as a product for core data domains of the asset management industry, such as market value, price, position, Security Master, security analytics, and market data. In addition, we have use cases related to back-office operations optimization (e.g., reconciliation), digital marketing, and credit risk management.

What are the biggest challenges you are facing within machine learning model development right now?

First, insufficient and poor-quality labeled data. This forces data scientists to give up on supervised learning models and turn to unsupervised learning models or other types of algorithms instead. The unsupervised approach typically requires a larger amount of training data, and it is harder to find reliable model performance metrics.

Second, unbalanced sample data (sample bias) for training and validation, such as sample imbalance across categories and inadequate data samples at certain frequencies. To deliver a minimum acceptable business result, data scientists tend to overfit the model through many rounds of parameter tuning. Although the model gets delivered, it is hard to make it generalize, especially once it is in production and the market moves. This is also not a productive use of a data scientist's time.

Lastly, business interpretation and explanation of model results in terms of financial fundamentals. It is difficult to explain model results in layman's terms through underlying factors, especially with deep learning models because of their non-linearity. We have to do a lot of traditional financial and quantitative engineering before implementation and design the solution carefully under sound financial principles. We usually need to implement additional modeling and analysis to produce explanations. Conceptually, this work is similar to attribution modeling in some traditional financial fields.
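Because the answer above compares the extra explanation work to attribution modeling, here is a minimal sketch of one common model-agnostic explanation technique, permutation feature importance: shuffle one input factor at a time and measure how much the model's error rises. This is not presented as State Street's method. The two-factor model, the synthetic data, and the function names are hypothetical, and only the Python standard library is used.

```python
import random
from typing import Callable, List, Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[float]], float],
    X: List[List[float]],
    y: List[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Average rise in MSE when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        total_increase = 0.0
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link to the target.
            column = [row[j] for row in X]
            rng.shuffle(column)
            permuted = []
            for row, value in zip(X, column):
                new_row = list(row)
                new_row[j] = value
                permuted.append(new_row)
            total_increase += mean_squared_error(y, [predict(r) for r in permuted]) - baseline
        importances.append(total_increase / n_repeats)
    return importances


# Hypothetical example: a fitted model that only uses factor 0.
data_rng = random.Random(42)
X = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(200)]
y = [2.0 * row[0] for row in X]


def model(row: List[float]) -> float:
    return 2.0 * row[0] + 0.0 * row[1]


importances = permutation_importance(model, X, y)
print(importances)  # factor 0 gets a positive score, factor 1 gets exactly 0.0
```

A factor whose shuffling barely changes the error contributes little to the model's output, which gives a rough, factor-level story to set alongside financial fundamentals. Shuffling can mislead when factors are strongly correlated, so results like these would still need the domain review described above.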

How can machine learning model development be managed to combat issues of data bias?

Some common data biases in machine learning are sample bias, measurement bias, exclusion bias, linking bias, and aggregation bias. Here are some practices that can help mitigate some of these biases.
1) Have domain experts perform feature analysis alongside data scientists and select the features that best represent the problem space. This process typically involves in-depth fundamental and quantitative analysis.
2) Random sampling or data augmentation can help minimize sampling bias when the dataset isn't large or representative enough.
3) Ensemble algorithms or cross-validation provide a good way to reduce bias during model training.
4) Model performance metrics should be chosen based on the use case. For example, in some circumstances the model's sensitivity matters more than its accuracy.
5) Perform periodic online training (re-calibration) of the model in the live production environment. If the model architecture is right, this can automatically tune model parameters to adapt to recent market data movements.
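To illustrate item 4, consider a hypothetical exception-detection task (for example, flagging reconciliation breaks) where only 1% of records are true exceptions. The sketch below uses made-up labels to compare a model that never flags anything with one that flags every exception plus some false alarms; the labels, both models, and the helper names are assumptions for illustration.

```python
from typing import List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (true positives, true negatives, false positives, false negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(y_true: List[int], y_pred: List[int]) -> float:
    """Share of actual positives (e.g. true exceptions) that the model flags."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


# Hypothetical labels: 10 true exceptions among 1,000 records.
y_true = [1] * 10 + [0] * 990

# Hypothetical models: one never flags anything; the other flags
# all 10 exceptions plus 40 false alarms.
never_flag = [0] * 1000
flag_more = [1] * 50 + [0] * 950

for name, preds in [("never_flag", never_flag), ("flag_more", flag_more)]:
    print(name, accuracy(y_true, preds), sensitivity(y_true, preds))
# never_flag 0.99 0.0
# flag_more 0.96 1.0
```

On accuracy alone, the model that never flags anything looks better, even though it misses every exception. When missing an exception is costly, sensitivity is the more informative metric for this kind of imbalanced sample.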

Ahead of the GFMI Development, Implementation and Management of ML Models conference, we spoke with David Wang, Managing Director, Artificial Intelligence and Financial Engineering at State Street Corporation. David is responsible for artificial intelligence and financial engineering at State Street, where he built artificial intelligence, financial engineering and quantitative modeling capabilities to explore and deliver products and services globally, from front office to back office. David has more than 20 years of experience in the investment management industry, with responsibilities spanning portfolio management, trading, modeling, analytics, risk management, financial technology, and mergers and acquisitions.

April 17-19, 2023

New York, NY

David will be presenting during day one (4/17/2023) of the Development, Implementation and Management of ML Models conference!

For registration pricing and multiple attendee discounts, please contact:

Ria Kiayia

riak@global-fmi.com

In terms of our platform (our conferences are informal and intimate peer-led meetings where all speakers and delegates are senior executives from top financial institutions), how do you see it helping you overcome the challenges you currently face?

These conferences help me see what our peers are doing, what problems they have, and how they address them. I can also see the trends in our industry and where we are heading in the near future. The platform is definitely helpful for generating practical ideas to overcome some of the challenges I have, for example, model interpretation approaches and NLP usage for quantitative instruments.

Session: Analyse the best practices to manage and cleanse data

Consider the main internal methods of utilizing and managing alternative data
Analyse why the ‘cleansing’ of data is an area of substantial focus
Investigate the use of machine learning programs to tease out bias through dimension reduction
Balance internal and external reviews of data to overcome the challenges of data bias

Panel Discussion: Debate how a reduction in the data set can optimize machine learning models

What are the challenges of overfitting in machine learning models?
How do you ensure the appropriate quantity and quality of data is inputted?
Why would this solution be optimal when artificial intelligence programs typically thrive on as much data as possible?
How does this approach reduce the chance of data bias within models?