Scaling Applications to Meet Increasing Customer Demand w/ Ahmed Farag

ABOUT THIS EPISODE

In this episode, we profile how FaceGraph, led by visionary founder & Chief Executive Officer, Ahmed Farag, is helping schools, offices and retailers use AI and ML to engage customers in new and powerful ways.

Join Ross Beard and Ahmed Farag as they discuss:

  • Overcoming challenges in scaling FaceGraph’s application
  • Adapting quickly to meet new pandemic-related demands
  • Leveraging autoscaling to improve application performance


Application Modernization is a show where Software Leaders discuss the challenges and successes in modernizing applications and infrastructure at high-growth software companies. Learn more on Apple Podcasts, Spotify, and the Shadow-Soft website.

Listening on a desktop & can’t see the links? Just search for Application Modernization in your favorite podcast player.

You are listening to Application Modernization, a show that spotlights the forward-thinking leaders of high-growth software companies. From scaling applications and accelerating time to market to avoiding expensive licensing costs, we discuss how you can innovate with new technology and forward-thinking processes, and save some cash in the process. Let's get into it.

Today we're talking with Ahmed Farag about scaling applications to meet increasing customer demand. Ahmed is founder and CEO of FaceGraph, a company that helps schools, offices and retailers use artificial intelligence and machine learning to engage customers in new and powerful ways. In this episode, we'll talk about three big scaling challenges that Ahmed and his team ran into as they grew from three customers to five hundred customers. This is a really interesting story with some great takeaways. I'm so excited to share it with you today. Here we go with our guest, Ahmed Farag.

Ahmed, welcome to the show. How are you today? I'm doing good. How are you? Good, good. Today we're going to be speaking to you about scaling applications and infrastructure. Can you tell us a little bit about yourself and what you do?

Yeah, so my name is Ahmed Farag. I'm a software engineer, and I started my career in 2004 building applications. My real love for software started when I started writing, you know, little games. At the time we had the cell phones that had this game Snake, so I used to try to build those games, and it was kind of interesting how you can build a moving object. Before that I used to write programs that basically do calculations and stuff, but getting to do graphics, that's what really got me into the world of programming, and I did this with mostly Microsoft tools. So I was a Microsoft .NET guy from the beginning, and then I worked for a few companies internationally, some originally from Egypt. I worked in Egypt and in the Middle East a little bit. Then I worked in the UK and the Netherlands, and I got married to my wife, who's originally Egyptian but was born in Chicago, and we moved to the US to be with her family over there, and most recently I worked at Microsoft. In 2010 I founded a company called Software Ranger that does software consulting, and in 2007, 2017, I'm sorry, we started FaceGraph. FaceGraph is, you know, my latest venture, which is focused on attendance tracking.

Yes, FaceGraph is a very interesting story. Can you talk us through what FaceGraph does and what problem you are solving for customers?

Yeah, so FaceGraph is focused on providing, you know, a platform that helps solutions interact with humans better. And this is very abstract; people would think, what does that mean? Well, the best materialization of this idea was in one of the products that FaceGraph produced, called Smile Me In, which does attendance tracking. What we've done is we've taken an open source platform called IdentityServer4. It's basically an identity system that was created by a group of engineers. It's open source on GitHub and it does modern authentication, so OAuth and OpenID Connect, which in the last, I would say, six or seven years has become the way to do authentication in anything, like Facebook, even Microsoft. So we customized that platform, and we built FaceGraph on top of it, which allows you to do things like, you know, issue an OAuth token with facial recognition. So we sort of merged AI, the advances in AI that make machines understand humans better, you know, facial recognition, voice recognition and so on. We made it a part of the OAuth ecosystem, and that helped us build a robust system, for example, for what we call Smile Me In, which helps customers do attendance tracking. So if you go to an organization today, you step in front of a machine and it recognizes you automatically. You don't even have to press a button that says, oh, I'm here, take my picture. It will just basically recognize you're here and then do facial recognition. When it does that, it will recognize who you are and what time you came, and if you are in a school dropping off a kid, it will tell you, oh, you have two kids, which one are you dropping off, and so on. So we made that process seamless.

Interesting. Can you distinguish between two people that might be twins?

Wow, that's a very good question. Actually, that's a problem in facial recognition that hasn't been solved. So if we're talking about identical twins, it's very, very difficult. When Apple first brought Face ID to the market, there were a couple of companies, I think one of them in Russia, trying to show how they could break it, and similarly when Microsoft did, you know, facial recognition in the feature called Windows Hello in Windows, companies showed how a twin could basically unlock the machine. See, the problem is that facial recognition that's successful today, which we call practical, meaning it gives you a good match 99 percent of the time, has two components. It has the detection component, which is essentially: you give me a picture, I need to detect where the face is in the image.
So that uses machine learning, and then the other part of it is just a normal search problem, where once you do detection, one of the things you do is try to extract the features of the face: the location of the eyes, the size of the eyes, the eyelids, the nose, the mouth, all of those. The more features you gather, the better your algorithm for matching will be. For identical twins, those features are identical, so it's not easy. So in what we call high-security scenarios, let's say unlocking doors or opening bank accounts, we normally require a second factor of authentication. Much like having you sign in with a username and a password, we would actually send you a text and then you would have to enter a verification code. This way, if your twin brother is trying to steal your money, they have to have your phone.

That's really interesting. As a matter of fact, I am an identical twin, so maybe you could use us as tests in the future. Anyway, it sounds like you're looking at lots of data as you're growing. How are you managing all of this data and how are you scaling your platform?

Yeah, this is a good question. So just to bring the audience up to speed on the size of data we're dealing with: currently we have about 22,000 active users. Most of those users will be using our system between 6 AM Pacific time all the way to around 10 AM Pacific time, because most of our customers are here in the US, Canada and Mexico. So we're working across three time zones, and all of them come in the morning and use our system, and then most of them leave in the afternoon, I would say from 3 PM Pacific all the way to 6 or 7. So we normally receive about a million requests during that time. That's our peak, and our requests are very large because we receive images. We're not really getting just a username and a password; we have a very big volume of data that comes in in the morning and, obviously, again in the afternoon. And the challenge we have is that we have to respond in under a second. Our promise is, when you stand in front of the machine, you try to get something done, whether today it's attendance tracking or it could also be temperature scanning, because we've added this feature for corona, and you don't want to be waiting for a minute to get a response. So we have this obligation to make this work really fast, and then add to this the obligation of security and safety, where we have to do a lot of scrubbing of data and so on, on the spot.

For scalability, when we started we had three customers. Now we have about five hundred or so customers. We work with businesses directly, not consumers. So we ran into problems of scale, and more importantly, how can we have responses go quickly to customers? So we basically scale on responses per second. And then the other scale problem is actually storage, because some of our customers need to have their data stored for up to ten years, and a lot of those keep records as well as images.

Interesting. So specifically with storage, how are you solving this problem?

Yeah, so since we brought up storage: we're using three clouds today. Mainly everything is hosted on the Microsoft Azure cloud, including storage, but we still use Google Cloud for, you know, push notifications to our apps and devices, that would be Firebase, and we're using AWS for AI and for the DevOps agents. For storage, we have two types of storage: binary files, which are stored essentially on Azure Storage as files, and then everything else is stored in our database. In that case we're using Microsoft Cosmos DB, a NoSQL database. And, you know, this brings me to what other services we're hosting our system on. Essentially, when we started, we were really keen to launch, and we built our system very simply: a number of web applications and web services, and those web services were hosted on Azure App Service. We handle most of our requests synchronously, so you receive the request and then you return the response, and we leverage autoscaling in App Service. So when we have, for example, a certain number of requests coming in that goes up, or CPU or memory contention, we have it automatically roll out additional instances of our servers, and we were using App Service Premium.

Interesting. So we're hearing a lot of customers wanting to embrace Kubernetes and move to containers, and it sounds like you've taken a different direction. Our audience would be really interested to hear why you went that route, the reasons why you're not using Kubernetes.

Yeah, that's a great question, and we always thought about that. We always thought about: when are we going to, more specifically, run microservices instead of running those bigger monolithic services? How can we decouple our system so it's more scalable? And, like I said, in the beginning our goal was to launch, and it was really easy for us to launch directly, you know, having those web services available. We only had three customers, and we've gone from three customers to five hundred. And ironically, specifically during corona, you know, in March, when things went down and everybody stopped going to school or work, we were obviously also impacted. But what helped us was partnering with a company in Hong Kong, and we were able to customize a device that would help customers go back to work and school. So we were able to do things like temperature measurement with attendance tracking and so on. That actually created explosive growth that we hadn't planned for. So we started getting problems, and more specifically, because you're building systems on the cloud, transient problems are really common. You may have a database not available for the one second that you need it. So that's when we realized it was time to think about maybe containerizing our system.
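Transient cloud failures like the one Ahmed describes are usually handled with retry-and-backoff logic rather than failing the request. A minimal Python sketch of that pattern; `flaky_query` is a made-up stand-in for a call to a briefly unavailable database:

```python
import time

def with_retries(operation, max_attempts=4, base_delay=0.5):
    """Retry a flaky operation with exponential backoff.

    Transient cloud failures (e.g. a database unavailable for a second)
    usually succeed on a later attempt, so back off and retry instead of
    failing the whole request.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...

# Simulate a database call that fails twice, then recovers.
calls = {"n": 0}
def flaky_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("database temporarily unavailable")
    return "row-42"
```

Managed platforms such as Azure Functions bindings apply a similar policy for you; the sketch just shows what "retry until it works" means at the code level.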
So the way we built our services, those core services that I talked about as well as the non-essential services, most of them are built in .NET Core, so they can run on Linux and Windows. So if we decide to host them in a container, that shouldn't be a problem. However, our goal was not really to use containers. Our goal was to break down the services and make them as decoupled as possible so we could provide more scalability. And what we decided to do is use Microsoft Azure Functions, and Azure Functions gives you that ability, the ability to have microservices. And again, this is for the audience: this is similar to AWS Lambda, so you can do this on other cloud platforms as well, but since we've been invested in the Microsoft cloud, we were focused on using Azure Functions to implement microservices. But, you know, the interesting thing that I think Microsoft has done very well, and that really benefited us, is they have a solution to one of the problems you may face when you use containers or build microservices, which is essentially that you have to make your services as stateless as possible. If they are stateful, then you have to worry about storage and making sure that, as your services are being spun up and down, you're not losing your state. So one of the things we looked at is using Azure Functions with durable entities, and this is something that Microsoft released, I think, about three or four years ago. Durable entities give you a few benefits. The most important one is the ability to create an entity. An entity is essentially a type you can define, so any class you define in your code can become an entity, and the objects created from that class can be serialized automatically for you.
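As a rough illustration of the durable-entity idea (not the actual Azure Durable Entities API, which is more involved), here is a Python sketch where an ordinary class is serialized to a store on write and rehydrated on read; the `STORE` dict stands in for the Azure Storage that the platform manages for you:

```python
import json

STORE = {}  # stands in for durable storage managed by the platform

class Counter:
    """An 'entity': plain class whose state survives the host going away."""
    def __init__(self, value=0):
        self.value = value

    def add(self, n):
        self.value += n

def save_entity(key, entity):
    # Serialize the entity's fields, as the platform would do automatically.
    STORE[key] = json.dumps(entity.__dict__)

def load_entity(key, cls):
    # Rehydrate the entity from storage when the function spins up again.
    return cls(**json.loads(STORE[key])) if key in STORE else cls()

# First "function invocation": mutate state and persist it.
c = load_entity("device-7", Counter)
c.add(5)
save_entity("device-7", c)

# A later invocation, possibly on a fresh instance: state is rehydrated.
c2 = load_entity("device-7", Counter)
```

In the real platform the save/load steps are invisible to the developer, which is exactly the convenience Ahmed goes on to describe.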
So that would be the state, and the platform, Azure Functions, takes care of storing that state for you, and it rehydrates it for you when your service is up, when your function is running. So it takes care of two things. It breaks your services down into smaller pieces, but it also makes it really easy for a developer who doesn't know a lot about microservices: by just adding a few attributes in your C# code, you can essentially enable some of your types to be those entities that are stored for you, and they are permanently stored in Azure Storage. So they're not really ephemeral; they are actually stored in durable storage that doesn't go away.

Interesting. So it sounds like you can achieve the scalability you're needing without necessarily needing to embrace Kubernetes.

Yes, exactly. And a few examples that maybe make this clear. We had a lot of things that run on the side that don't require an immediate response to customers. Those are the things we were able to break down and bring up as services. We've also used, obviously, Azure Service Bus, so we used an enterprise queue. We switched from receiving requests directly and returning responses directly to having an enterprise bus that receives the requests from customers, in that case an attendance request or an authentication request, and then we have a number of workers, those are Azure Functions, that jump in, pick up those requests and process them. So we do a lot of things in processing today other than just authentication. Authentication might just come in the middle of the process, after we've identified the person and verified that he's really who he claims he is. There are things we do like, oh, let's scan the face and see if they're wearing a mask or not. And those are things that have been added since. I mean, some of our customers called us and said, hey, look, you have a custom device, we like it, but we actually want to know if the people who came into the office are wearing a mask. So adding that extra processing on top of the services we provide would definitely mean delays for customers. So with the ability to have everything broken down into queues, and then having different types of workers jump in and do different types of processing, we were able to add those functions that come in and do, for example, use vision APIs to analyze whether there's a mask or not. So the flow would be: if there is no mask, we send an alert, which goes to another queue that we emit to, and that sends text messages, and we use Twilio for text messages today, plus email and push notifications. So breaking down those external, not immediately critical services in attendance into smaller functions has helped us with scale, so we can increase them when we have load, and they are completely decoupled. If there is a failure reaching a database, then we retry until the message is dequeued and everything is good. This way we have a durable, sustainable system, even in high-load scenarios.

One of the biggest problems we faced was when we got bigger customers. We work with a number of school districts. Those tend to have a large number of users, and that means a large number of devices. Some of them may have up to eighty or ninety devices spread out physically across different locations. So it's very hard for an admin to walk to each device and see what's going on if there is a problem. So we created device management, and this really falls under kind of IoT. All of those devices are connected to the internet, and all of them are reporting back, you know, the attendance. And a device can be out of date, because we push firmware updates constantly through our APIs as we add additional functionality or do patching and so on.
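The decoupling Ahmed describes, an enterprise queue in front of independent workers, can be sketched with Python's standard-library queues; the queues and the `has_mask` field are stand-ins for Azure Service Bus and a real vision-API result:

```python
import queue

attendance_q = queue.Queue()   # stands in for the Service Bus queue
alert_q = queue.Queue()        # downstream queue that triggers SMS/email/push

def mask_check_worker(request):
    """Side-channel worker: runs the mask analysis without delaying check-in.

    `has_mask` is a stand-in for what a real vision API would return.
    """
    if not request["has_mask"]:
        alert_q.put({"user": request["user"], "alert": "no-mask"})

# Devices publish attendance requests to the bus...
attendance_q.put({"user": "sam", "has_mask": False})
attendance_q.put({"user": "dana", "has_mask": True})

# ...and workers pick them up independently of the check-in response.
while not attendance_q.empty():
    mask_check_worker(attendance_q.get())

alerts = list(alert_q.queue)
```

Because the worker reads from a queue rather than sitting in the request path, the check-in response stays fast and a worker failure just leaves the message on the queue to be retried.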
So it's not possible to go physically to each device and do that. So what we've done is implement two things. One is "I'm online": each device pings our server every minute just to announce that it's online. This way, if a device is offline for some reason, because of a power outage, because somebody has misplaced the device, or because it's broken, you can see that on the dashboard. The problem with this is we have tons of requests coming in just to say "I'm online," and that used to bombard our database and our services. So that was a good example of where we used durable functions. We created an entity where each entity instance represents a device, and it stores the pings. So not only does it store the last ping, like we used to do in our database; now we store the pings for the last thirty days, and all of that is stored outside the database. Because that information is not critical: if we lose it, we don't care and the customer doesn't care. It's not part of their data. And we also don't really do backup and restore for this kind of storage. It's more of a temporary store for us, but it's still stored durably for the last thirty days. The good thing about this is, one, we alleviated the load problem. And, two, because customers usually connect our devices over Wi-Fi, it's very common that we receive a support call that says, hey, this device is not sending data, I don't know what's going on with it. Now we can easily look at the last thirty days of pings and tell them, hey, look, we see that device was disconnected from this time to this time, so you may have a Wi-Fi problem, and then they can resolve it easily. Before, it was really hard for us to tell whether it was Wi-Fi or our service being down. It's really hard to troubleshoot when you don't have data. So having those entities has made it easier for us to help customers find connectivity problems, for example.
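The rolling thirty-day ping store might look something like this sketch, where a `DevicePings` object (standing in for one durable entity instance per device) prunes old heartbeats and exposes the connectivity gaps that support staff would look at:

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)

class DevicePings:
    """Per-device store of 'I'm online' heartbeats, capped at 30 days."""
    def __init__(self):
        self.pings = deque()

    def record(self, ts):
        self.pings.append(ts)
        cutoff = ts - WINDOW
        while self.pings and self.pings[0] < cutoff:
            self.pings.popleft()  # prune pings older than the window

    def gaps(self, min_gap=timedelta(minutes=5)):
        """Stretches with no pings, likely Wi-Fi or power problems."""
        out = []
        for a, b in zip(self.pings, list(self.pings)[1:]):
            if b - a >= min_gap:
                out.append((a, b))
        return out

# A device pings every minute, then drops off for about an hour.
dev = DevicePings()
t0 = datetime(2021, 9, 1, 8, 0)
for m in range(10):
    dev.record(t0 + timedelta(minutes=m))
dev.record(t0 + timedelta(minutes=70))  # back online after an outage
```

Keeping this data in cheap, non-backed-up entity storage (rather than the customer database) is exactly the trade-off described above: the data is useful for support but disposable.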
Right, that's really interesting. So when I'm thinking about that, I'm thinking about some of the problems that might come up. Can you talk us through your instrumentation, and what other tools are you using to monitor and manage your platform?

Yeah, so instrumentation is key. I think it's one of the things we've done well in the system compared to other parts. Instrumentation is important for our system because we have a number of REST APIs, we have an identity system, we have an attendance tracking system, and then we have three different apps. Those are the personal apps you can download from the app store and use as a user to see your attendance, or maybe do attendance from your phone. And we also have devices. So it's not uncommon that you start a request in, let's say, one device, where you get your picture scanned or you use an RFID card, and then that goes to the system and gets recorded, and then you get a notification on your phone that says you've been signed in. Because an operation can cross all these systems, starting from your own device, sometimes to our device, all the way through the services, and sometimes going to a third party like Twilio to send a message, when we have a problem it's very challenging to understand where the problem is.

So, first off, what are we using mainly for instrumentation today? As a service, we're using Microsoft Azure Application Insights. App Insights is the monitoring and instrumentation tool for applications that run in Microsoft Azure, and it integrates very well with App Service, so with apps hosted on App Service. It has an SDK for different types of platforms: there's an SDK for Angular, there's an SDK for .NET, there's one for PHP, Java and so on. So it was very easy for us to plug in, and that gives you the basics. It gives you essentially logging for all web requests you're receiving, so you can see when the request happened, what IP address it's coming from, what time it happened and so on.

So what did we have to do on top of it? We used the App Insights API to add a correlation ID to every operation we create. This way we can trace any operation from the device all the way to the APIs. So today, when we have a severity-one exception, basically an unhandled exception raised by the system, our engineering team gets notified by email, and we get that correlation ID that we can go and use. App Insights has a query language that allows you to query that correlation ID, and then it shows you all the different calls that were involved. You know, this is a podcast; if we could show things, I would show you. It shows you a graph. They call it end-to-end troubleshooting; that's a feature in App Insights where you can see that a request started from this service that runs on the phone, then went to your API at this time, and then called all these dependencies. It can call, for example, a database, which is Cosmos DB in our case, and then it calls storage. So if a failure happened in storage, that bubbled up to the call that went to the database, bubbled up all the way to the attendance service and all the way to the device. It appears to be a device problem, but when you dig down using that correlation ID, you can see it's a storage problem. So that helps us tremendously to respond to problems and do troubleshooting faster.

Interesting. So what are your thoughts on some of these other application performance monitoring tools, like Dynatrace? You know, observability is a lot of the language I'm hearing now, observability platforms. How does App Insights compare to those tools and platforms?

Yeah, Dynatrace is great. I mean, I think there are a number of platforms out there that do very similar things.
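The core of that end-to-end tracing is a correlation ID minted once at the edge and attached to every telemetry record along the call chain. A toy sketch, with an in-memory list standing in for the App Insights trace store and made-up service names:

```python
import uuid

TELEMETRY = []  # stands in for the App Insights trace store

def log(service, message, correlation_id):
    TELEMETRY.append({"service": service, "cid": correlation_id,
                      "message": message})

def handle_device_request():
    # Mint one correlation ID at the edge and pass it down every hop.
    cid = str(uuid.uuid4())
    log("device", "scan started", cid)
    call_api(cid)
    return cid

def call_api(cid):
    log("api", "attendance request", cid)
    try:
        write_storage(cid)
    except IOError:
        log("api", "storage failure bubbled up", cid)

def write_storage(cid):
    log("storage", "write failed", cid)
    raise IOError("blob write failed")

cid = handle_device_request()
# The "end-to-end" view: filter telemetry by the one correlation ID,
# like querying operation_Id in Application Insights.
trace = [t for t in TELEMETRY if t["cid"] == cid]
```

What looked like a device problem is revealed, by following one ID through the records, to originate in storage.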
They do monitoring, they do instrumentation. I think all of them give you the bare minimum for tracing, but you always find that, depending on what you're building, you have to embed instrumentation in your day-to-day development. You always have to add extra things that allow you to provide better traceability, to augment what you get from the platform and make troubleshooting easier. You know, the reason we chose App Insights is cost. It is very cost effective compared to Dynatrace, and I think for our size that made sense. We definitely looked at other tools, and we had requests from developers who wanted to use other, easier tools. But like I said, it was easy for us to integrate with our stack, we had a good implementation with it, and it was working just fine.

The other thing you may think about is how to maintain availability, because availability and instrumentation kind of work together. If you have an outage, an unplanned outage obviously, then that might be caused by a problem, and then you want to troubleshoot it using your instrumentation and monitoring tools and so on. So what we liked about App Insights is that it does have availability monitoring: you can set it to ping your endpoints with certain rules it will check, and it does that every minute. If you have an outage, it will call a phone number, and it will also send an email and a text. So combining these together in one platform was very important for us. It's very user friendly, you can set it up from the Azure portal, and there's not really a huge cost for us to receive those alerts.

Yeah, it makes a lot of sense as you're just starting out to use some of those native services within Azure. So, you know, great insights here. I think leaders at high-growth software companies are getting a lot out of this.
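A stripped-down version of such an availability check might look like this sketch; the `probe` callable is injected so the example stays self-contained, where a real monitor like App Insights would issue scheduled HTTP requests and fire alerts on failure:

```python
from urllib.parse import urlparse

def check_endpoints(endpoints, probe):
    """Availability check: probe each endpoint and collect failures.

    `probe(url)` returns an HTTP-style status code (or raises OSError);
    in production it would be a real HTTP GET with a timeout.
    """
    failures = []
    for url in endpoints:
        host = urlparse(url).hostname
        try:
            status = probe(url)
        except OSError:
            failures.append((host, "unreachable"))
            continue
        if status != 200:
            failures.append((host, f"status {status}"))
    return failures

# Fake probe: one healthy endpoint, one returning a server error.
# The endpoint URLs are hypothetical.
responses = {"https://api.example.com/health": 200,
             "https://id.example.com/health": 503}
failures = check_endpoints(responses, lambda u: responses[u])
```

Everything in `failures` is what would drive the call/email/text alerting the episode describes.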

What other advice do you have for leaders that we haven't touched on yet?

One of the things we learned during the last couple of years is this: if your solution consists of hardware and software, what helped us a lot is that, coming in as a software company originally, we always think from a software perspective, and we try to convert even the hardware problems into software problems as much as possible. A good example: we have this thermal unit, a terminal with a 10.2-inch screen that you can place in an organization to do attendance tracking. In order to test the firmware that runs on that particular device, and then do releases for it, it would be really hard for us to give units to testers, give units to the support engineers, and then use manual methods to do testing as well as support. So what we've done is we created a software simulator that simulates using the device, which obviously helped us with testing, because you don't have to have a device with you everywhere, especially when offices were closed down and everybody was working remote. It was really challenging to make those things available. So my advice is: try to solve things with software as much as possible.

I think spending, or investing, time on test automation is also very key. We had a great test engineer who built the test automation packages for us, and we were literally testing everything with code. I mean, this is aside from unit tests. Obviously you write your unit tests as a developer; that helps you with DevOps, makes the process of validating problems early quicker, and then you can do releases. But that doesn't replace test automation. Just to give you a quick example: phones are really, really advanced these days, and they take very big, nice images. One of the things we provide to early-education schools is taking pictures and sharing them with parents. So sometimes they would take very large pictures, and on some of these platforms, you know, the picture gets stored not in the same orientation it was taken in, so it would be flipped. So one of our test automation scripts automates creating pictures on devices connected to a machine, and those are the agents we build. One of the great things I like about Amazon Web Services is they made it possible to create virtual Mac images. This way I don't really have to buy all these Mac machines to build an iOS app or test iOS apps. I just basically spin up a dedicated host in Amazon, and that has my test images running macOS. So we run automation that scans the images and decides whether the images are flipped or not. We were able to automate the testing for all of those scenarios just using scripts. That really provides us with the agility to do quick fixes and releases.

Yeah, yeah, I like that. Solve things with software wherever possible. So, Ahmed, what's the vision for FaceGraph? We've heard about your story. We've heard about the last eighteen months and a little bit about the pivot to serving new needs for your customers around temperature checking with COVID. What's the plan for the next two or three years? Where do you see the company going?

So the long-term vision is to try to augment the devices that already exist in organizations with, you know, software to help them with those front-door activities. Those would be signing in a visitor, doing attendance, or maybe recognizing customers, like in the commercial space. If we look at the breakdown of our customers today, we have about 31 percent that are offices. So basically that could be a software office, it could be a company that does accounting or whatever, any office kind of job. Then 25 percent are schools, then we have a good portion that are factories, and then 13 percent are support facilities. So the market we're trying to break into and help is basically the commercial market: essentially stores, retail stores, cafes and restaurants. And what we think can be very helpful is a product we're working on which we call FaceGraph in a Box. FaceGraph in a Box is essentially going to be a small computing unit that you can plug into power in a company, and it has Wi-Fi that connects to your on-premise network, and it can be attached to the cameras on the premises. What does it do? It's basically a small processing unit that takes whatever information you allow it to take. Let's say you have cameras on premise, which a lot of companies, like retail stores, do. It takes that camera feed and analyzes it for any important information, for example frames that have human faces, which in our case is obviously what we care about, and then it only sends those frames to your backend instance on FaceGraph. And what does that do for you? It's going to tell you, hey, look, customer X, Sam, has walked into the office or walked into the retail store, and he likes this drink. So it may pop up in their Starbucks app: hey Sam, we see you here, do you want to get your latte? The way we think about this is, it's not going to work if we tell customers, hey, look, install all these cameras and put in our devices; that's too much, too invasive. But if we just put it in a box, and you just plug it in, connect it to your existing hardware and then connect it to your software stack, that makes more sense.

That's really interesting. We work with a number of retailers and fast food companies that are, I think, on that track. They're investing in the infrastructure and the technology right now, with that type of personalization as the vision. I remember, you know, five, ten years ago, in some of my marketing classes, we would speak to that as the vision: you walk into a store and someone already knows your name, and they already know what you want or need. So that's a really cool vision, and I will certainly be following your journey. Awesome. Ahmed, it's been an absolute pleasure hearing about your journey and some of these insights. Thanks for joining us today.

Hey, Ross, I appreciate the time, and thank you for having me.

Application Modernization is sponsored by Red Hat, the world's leading provider of enterprise open source solutions, including high-performing Linux, cloud, container and Kubernetes technologies. Thanks for listening to Application Modernization, a podcast for high-growth software companies. Don't forget to subscribe to the show on your favorite podcast player so you never miss an episode. And if you use Apple Podcasts, do us a favor and leave a quick rating by tapping the stars. Join us on the next episode to learn more about modernizing your infrastructure and applications for growth. Until next time.
