
Subwork_Sichang

TODOs

Up till now

  • Clean up org.eu.fedcampus.train.Train's state with a sealed class TrainState.

  • Feature in dyn_flower_android_drf to start training without previously saved parameters.

Up till 2023/7/3

Up till 2023/6/21

  • Studied Flower simulation in preparation for the benchmark platform.

  • Collaborated with Johnny on the dockerization of dyn_flower_android_drf.

  • Collected data fuzzed with LaplaceMechanism on FedCampus_APP in preparation for Federated Analytics.

  • Studied Android background process.
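The Laplace fuzzing mentioned above can be illustrated with a minimal sketch. This is not the LaplaceMechanism class used in FedCampus_APP; the function names, the sensitivity and epsilon parameters, and the sample values are assumptions for illustration only.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential samples with mean
    `scale` follows a Laplace distribution centered at 0 with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(value, sensitivity, epsilon, rng=None):
    """Return `value` plus Laplace noise with scale sensitivity / epsilon.

    `sensitivity` and `epsilon` are hypothetical parameters: the largest
    change one user can cause in the value, and the privacy budget.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    if rng is None:
        rng = random.Random()
    return value + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Example: fuzz a step count before it leaves the device.
    print(privatize(8000.0, sensitivity=100.0, epsilon=1.0))
```

A smaller epsilon gives a larger noise scale, so individual readings are hidden better while averages over many users stay close to the true value.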

Up till 2023/6/14

  • Integrated org.eu.fedcampus.train into FedCampus_APP.

  • Silly simple_health_kit model that predicts 3 classes of distance from 2 inputs: steps and calories.

  • dyn_flower_android_drf tracks TrainingDataType and advertises models accordingly.

Up till 2023/6/8

  • Meet with Aicha to overview dyn_flower_android and the customizations it provides for her federated learning.

    • Walk Aicha through the demo and the logic of the process.

  • Meet with Johnny & Tianjun to specify deployment tasks.

  • Integrating training package into FedCampus_APP with Beilong.

  • Start to add telemetry to org.eu.fedcampus.train.

    • Togglable using an argument.

    • Sends client, session, and timing data to backend on training events.

  • Persuaded Beilong to write nice commit messages and format code.
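The togglable telemetry described in this list could look roughly like the sketch below. It is written in Python for brevity, whereas the real package is Android code; the Telemetry class, its field names, and the injected send callable are all hypothetical.

```python
class Telemetry:
    """Reports training events to a backend when enabled.

    `send` is an injected callable (hypothetical) that would perform the
    actual request to the backend; here it can be any function that
    accepts a dict.
    """

    def __init__(self, enabled, client_id, session_id, send):
        self.enabled = enabled
        self.client_id = client_id
        self.session_id = session_id
        self.send = send

    def report(self, event, started_at, ended_at):
        """Send client, session, and timing data for one training event."""
        if not self.enabled:
            # Telemetry is toggled off: do nothing.
            return None
        payload = {
            "event": event,
            "client_id": self.client_id,
            "session_id": self.session_id,
            "started_at": started_at,
            "ended_at": ended_at,
            "duration_s": ended_at - started_at,
        }
        self.send(payload)
        return payload


if __name__ == "__main__":
    telemetry = Telemetry(True, "client-1", "session-1", print)
    telemetry.report("fit", started_at=10.0, ended_at=12.5)
```

Passing the toggle and the sender as constructor arguments keeps the training code free of conditionals around every report call.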

Up till 2023/5/26

Up till 2023/5/18

  • Started dyn_flower_android_drf

    • Django REST server serves TFLite model files and their information.

    • Android client asks for TFLite model files, downloads them onto disk, and starts FlowerClient using the downloaded model.

      • Implemented to load model anywhere on disk instead of in assets/: org/tensorflow/lite/examples/transfer/api/ExternalModelLoader.kt.

    • Django server launches the Flower server in a background process.

      • Studied Celery, Huey, and APScheduler, and landed on using multiprocessing directly.

    • Tried saving model parameters in the background Flower process using a custom Strategy, but it did not work because of django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet.

    • Spawn a new process for each server requested.

      • Spawn a thread to monitor it so that it could save the parameters to DB.

      • Restore old parameters from DB at Flower server start.
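The process-plus-monitor-thread arrangement above can be sketched as follows. This is an illustration under assumptions, not the project's code: fake_flower_server stands in for the Flower server, the PARAMS_DB dict stands in for the database, and the "fork" start method is chosen only to keep the sketch self-contained on POSIX systems.

```python
import multiprocessing as mp
import threading

# Hypothetical stand-in for the database table that stores parameters.
PARAMS_DB = {}


def fake_flower_server(initial_params, result_queue):
    """Stand-in for a Flower server run; pretends training adds 1.0."""
    updated = [p + 1.0 for p in initial_params]
    result_queue.put(updated)


def monitor(process, result_queue, model_id):
    """Runs in a thread of the parent process, which may use the DB."""
    params = result_queue.get(timeout=30)  # wait for the child's result
    process.join()
    PARAMS_DB[model_id] = params


def launch_server(model_id):
    """Spawn one server process and one thread that watches it."""
    ctx = mp.get_context("fork")  # assumption: POSIX-only start method
    # Restore old parameters if any were saved, else use made-up defaults.
    initial = PARAMS_DB.get(model_id, [0.0, 0.0])
    queue = ctx.Queue()
    process = ctx.Process(target=fake_flower_server, args=(initial, queue))
    process.start()
    watcher = threading.Thread(target=monitor, args=(process, queue, model_id))
    watcher.start()
    return watcher


if __name__ == "__main__":
    launch_server(1).join()
    print(PARAMS_DB)
```

Only the monitor thread writes to the store, and it lives in the parent process, so the child never needs database access.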

Up till 2023/4/21

  • flower uses protobuf to generate language-specific gRPC stub code.

  • In Android example, the TFLite model is hard-coded:

    • Data parsing functions have data semantics hard-coded and run at data load.

    • No encoded way to transfer model (.tflite) files.

    • The TFLite API used (TransferLearningModel) is model-agnostic. Potential to easily swap models.

  • Consider an extra connection layer in front of flower.

    • flower has sophisticated gRPC configuration—hard to fiddle with.

    • Keeping flower contained and in one piece is beneficial for future compatibility.

    • Can use Django to spawn flower servers and reverse proxy to them.

    • Consider using Django to spawn flower servers on new ports, and tell clients to connect to them at the ports.
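One way to pick a new port for each spawned flower server, as considered above, is to let the operating system choose one. The sketch below shows only the port selection and the information a client would be told; advertise_server and its return shape are hypothetical, and no Flower or Django API is involved.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def advertise_server(model_id, host="127.0.0.1"):
    """Choose a port and return what a client would need to connect.

    A real implementation would start the server on this port before
    replying. Another process could take the port between this check and
    the server start, so the caller should be ready to retry.
    """
    port = find_free_port(host)
    return {"model_id": model_id, "host": host, "port": port}


if __name__ == "__main__":
    print(advertise_server(model_id=1))
```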

Up till 2023/04/10

  • Study how flower works.

    • The strategy handles most custom logic on server side.

    • Order of connection and training managed/hardcoded.

      • Start training after connection established.

      • No resume.

      • No sleep option.

Stack of abstractions for Flower Android demo.

Server.

  • GrpcBridge in server/grpc_server/grpc_bridge.py, common/serde.

  • GrpcClientProxy(ClientProxy) in server/grpc_server/grpc_client_proxy.py.

  • fit_client in server/server.py.

  • Server in server/server.py.

  • run_server in server/app.py.

gRPC server.

  • FlowerServiceServicer(transport_pb2_grpc.FlowerServiceServicer) in server/grpc_server/flower_service_servicer.py.

  • start_grpc_server in server/grpc_server/grpc_server.py.

Android client gRPC client.

  • FlowerServiceGrpc defined in proto/transport.proto.

  • FlowerServiceRunnable.

  • runGrpc.

Up till 2023/04/04

Looked up mobile ML framework comparisons.
  • Tried out Flower

    • Fairly maintained and well documented.

    • Worked out of the box.

    • Uses TFLite for Android.

    • Uses gRPC.

Tried out other mobile machine learning frameworks.
  • MNN

    Their Android demo is extremely old, buggy, and does not work. The setup is long but simple.

  • TensorFlow Lite

  • PyTorch Mobile

    • Android demo also very old, but does work.

    • Updating their package works.

    • No on-device training support found.

  • FedML

    • Workflow has to run on FedML servers.

    • Web GUI based.

    • Server-side implementation proprietary.

    • Client requires full disk access.

    • Client connects only to FedML servers, via both HTTPS and MQTT.

  • Set up two Nova 9 for development.

    • It is just easier to set the language to English.

    • Tap About phone > Build number 7 times to enter developer mode.

    • Developer options is in System & updates.

      • Need to turn USB debugging on.

        • Need to log in to HUAWEI ID.

Up till 2023/03/27

  • Studied FedML Android demo.

    • Requires manually cloning MNN and configuring Cpp toolchain to build.

    • Uses JNI to call Cpp function from Java.

    • FedML wants you to use their platform as a service, register an account, connect your app to them and manage online.

    • Android SDK API documents are basically nonexistent.

  • Read FedML Android SDK codebase because there is no documentation.

The stack of abstraction layers for training from bottom to top. Found the bottom by searching for System.loadLibrary.
  • ai/fedml/edge/nativemobilenn/NativeFedMLClientManager.java is the binding for MNN, the deep learning library in Cpp.

  • ai/fedml/edge/service/TrainingExecutor.java is the higher level API for training.

  • ai/fedml/edge/service/ClientManager.java handles both MQTT communication and training. Still has TODO comments in it.

  • ai/fedml/edge/service/ClientAgentManager.java provides one "documented" method.

  • ai/fedml/edge/service/FedEdgeTrainImpl.java

  • ai/fedml/edge/service/EdgeService.java is made into a Service.

  • ai/fedml/edge/FedEdgeImpl.java runs the service using an Intent.

  • ai/fedml/edge/FedEdgeManager.java

    This is the top APIs in FedML Android SDK, it supports core training engine and related control commands on your Android devices.

  • Investigated FedML platform lock-in and cloud lock-in.

    Unfortunately, FedML Android SDK forces the users to use open.fedml.ai as a proxy for all MQTT traffic.

    • Forced MNN on Android

Investigated using the same Machine Learning model on different platforms.

Open Neural Network Exchange (ONNX) supports major Machine Learning libraries.

Up till 2023/03/12

  • Tried Retrofit and made blocking GET request not in strict mode.

Dependency settings.
Using RxJava for non-blocking IO.
JSON structures according to Jiaqi.
  • POST

  • GET

Temporary solutions to enable local testing on Android emulator.

Up till 2023/03/09

Up till 2023/03/05

  • Read FedML background materials.

  • Ran FedML demo Python simulation.

  • Tried Android Studio, Kotlin, Jetpack Compose.

  • Sketched tech stack plan for the Android platform.

Java + FedML Java API, Django + FedML Python API + PostgreSQL, HTTPS poll

Jiaqi asked me for a formal tech stack plan for the Android platform. Here is my current sketch:

Android client app: Single Java app shipped to the user.

  • Data gathering: UI, user data collection and handling, and HTTPS client in Java.

  • ML: Call FedML's Java API for local training.

Server: Single modular Python server with single database.

  • Python as language of choice to best support ML exploration.

  • ML module:

    • Call FedML's Python API from the server for aggregation.

  • Web module: gather and store data.

    • Django for HTTPS server and database interface (ORM).

    • PostgreSQL for database.

HTTPS does not support broadcasting from the server, and we cannot assume that the clients will always be on. So, I assume that the clients will poll the server for new information every so often. We only need to implement a REST API or something equivalent for the communication.
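A minimal sketch of such a polling client follows, assuming a hypothetical fetch callable that asks the server whether anything newer than the known version exists and returns None when there is nothing new. The function name, parameters, and the version scheme are illustrative only; the real client would make an HTTPS request inside fetch.

```python
import time


def poll_for_update(fetch, known_version, interval_s=60.0,
                    max_attempts=None, sleep=time.sleep):
    """Call `fetch` repeatedly until it reports something new.

    `fetch(known_version)` is a hypothetical callable standing in for an
    HTTPS request. Returns the first non-None response, or None when
    `max_attempts` polls have all come back empty.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        response = fetch(known_version)
        attempts += 1
        if response is not None:
            return response
        # Wait before the next poll, unless this was the last attempt.
        if max_attempts is None or attempts < max_attempts:
            sleep(interval_s)
    return None


if __name__ == "__main__":
    # Pretend the server always has version 2 available.
    print(poll_for_update(lambda v: {"version": 2}, known_version=1))
```

Because the client initiates every exchange, the server needs no knowledge of which clients are currently online.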

  • Looked for full-duplex communication protocol as demanded by Jiaqi.

Need an external push service.

For Push API, I only found instructions to make push messages, which are for web apps. Unofficial instructions to make push messages to Android exist on Intercom Developers and Iterable, both of which use Firebase for the push service.

My conclusion is that we should consider these after we have a working poll model because they involve external services.
