Subwork_Sichang

TODOs

Up till now

  • Clean up org.eu.fedcampus.train.Train's state w/ sealed class TrainState.

  • Feature in dyn_flower_android_drf to start training without previously saved parameters.

Up till 2023/7/3

Up till 2023/6/21

  • Studied Flower simulation in preparation for the benchmark platform.

  • Collaborated with Johnny on the dockerization of dyn_flower_android_drf.

  • Collected data fuzzed with LaplaceMechanism on FedCampus_APP in preparation for Federated Analytics.

  • Studied Android background process.
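
The Laplace fuzzing mentioned above could look roughly like the sketch below. This is not the LaplaceMechanism implementation used in FedCampus_APP; the function names and the sensitivity and epsilon parameters are assumptions made for illustration. Noise is drawn from Laplace(0, sensitivity / epsilon) and added to the raw value.

```python
import random


def laplace_noise(scale, rng=random):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential samples with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def fuzz(value, sensitivity, epsilon, rng=random):
    """Hypothetical helper: add Laplace noise with scale sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return value + laplace_noise(sensitivity / epsilon, rng)
```

A smaller epsilon gives a larger noise scale, so individual values are hidden better while averages over many users stay close to the true mean.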

Up till 2023/6/14

  • Integrated org.eu.fedcampus.train into FedCampus_APP.

  • Silly simple_health_kit model that predicts 3 classes of distance from 2 inputs: steps and calories.

  • dyn_flower_android_drf tracks TrainingDataType and advertises models accordingly.

Up till 2023/6/8

  • Met with Aicha to give an overview of dyn_flower_android and the customizations it provides for her federated learning.

    • Walked Aicha through the demo and the logic of the process.

  • Met with Johnny & Tianjun to specify deployment tasks.

  • Integrating training package into FedCampus_APP with Beilong.

  • Start to add telemetry to org.eu.fedcampus.train.

    • Togglable using an argument.

    • Sends client, session, and timing data to backend on training events.

  • Persuaded Beilong to write nice commit messages and format code.
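
The telemetry described above (togglable with an argument, reporting client, session, and timing data on training events) could be sketched as follows. The real code lives in the Kotlin package org.eu.fedcampus.train; this Python version only illustrates the shape, and the class name, field names, and injected send callable are all assumptions.

```python
import time


class Telemetry:
    """Hypothetical reporter that forwards training events to a backend."""

    def __init__(self, send, client_id, session_id, enabled=True, clock=time.time):
        self.send = send              # callable that delivers one payload to the backend
        self.client_id = client_id
        self.session_id = session_id
        self.enabled = enabled        # the toggle: when False, nothing is sent
        self.clock = clock            # injectable so timing is testable

    def report(self, event):
        """Send one event with client, session, and timing data; return whether it was sent."""
        if not self.enabled:
            return False
        self.send({
            "client_id": self.client_id,
            "session_id": self.session_id,
            "event": event,
            "timestamp": self.clock(),
        })
        return True
```

Passing the sender in as a callable keeps the toggle and payload logic independent of whichever HTTP client is used.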

Up till 2023/5/26

  • dyn_flower_android client side:

    • Convert flwr.android_client from Java to Kotlin and use suspend functions.

    • Extract training functionality from MainActivity into org.eu.fedcampus.train to work towards a standalone library.

    • Upgraded org.tensorflow:tensorflow-lite dependency.

    • DB using Room.

      • Check if the models are already downloaded to prevent redownloading.

      • Restore user input on launch.

  • FedCampus_APP:

Up till 2023/5/18

  • Started dyn_flower_android_drf

    • Django REST server serves TFLite model files and their information.

    • Android client asks for TFLite model files, downloads them onto disk, and starts FlowerClient using the downloaded model.

      • Implemented to load model anywhere on disk instead of in assets/: org/tensorflow/lite/examples/transfer/api/ExternalModelLoader.kt.

    • Django server launches Flower server in a background process.

      • Studied Celery, Huey, and APScheduler, and landed on using multiprocessing directly.

    • Tried saving model parameters in the background Flower process using a custom Strategy, but it does not work because it fails with django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet.

    • Spawn a new process for each server requested.

      • Spawn a thread to monitor it so that it could save the parameters to DB.

      • Restore old parameters from DB at Flower server start.
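
The process-plus-monitor-thread arrangement above can be sketched as below. This is a minimal illustration, not the dyn_flower_android_drf code: fake_flower_server stands in for the real Flower server, the saved dict stands in for the Django database, and all names are hypothetical. It assumes the Unix-only "fork" start method. The point is that the child process never touches the database; the monitor thread in the parent (where Django's apps are loaded) does the saving.

```python
import multiprocessing as mp
import threading


def fake_flower_server(port, initial_params, results):
    """Stand-in for a Flower server run: 'trains' and reports the final parameters."""
    final_params = [p + 1.0 for p in initial_params]
    results.put(final_params)


class ServerRegistry:
    """Hypothetical parent-side manager; `saved` stands in for the database."""

    def __init__(self):
        self.saved = {}

    def spawn(self, port):
        ctx = mp.get_context("fork")  # assumption: Unix-like host
        results = ctx.Queue()
        # Restore old parameters at server start, or use defaults.
        initial = self.saved.get(port, [0.0, 0.0])
        process = ctx.Process(target=fake_flower_server, args=(port, initial, results))
        process.start()
        monitor = threading.Thread(target=self._monitor, args=(port, process, results))
        monitor.start()
        return monitor

    def _monitor(self, port, process, results):
        # Read the result before joining so the child never blocks on a full pipe.
        # A real monitor would also handle a child that exits without reporting.
        params = results.get()
        process.join()
        self.saved[port] = params
```

Calling spawn(port) a second time picks up the parameters saved by the first run, which mirrors the restore step in the last bullet.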

Up till 2023/4/21

  • flower uses protobuf to generate language-specific gRPC stub code.

  • In Android example, the TFLite model is hard-coded:

    • Data parsing functions have data semantics hard-coded and run at data load.

    • No encoded way to transfer model (.tflite) files.

    • The TFLite API used (TransferLearningModel) is model-agnostic. Potential to easily swap models.

  • Consider an extra connection layer in front of flower.

    • flower has sophisticated gRPC configuration—hard to fiddle with.

    • Keeping flower contained and in one piece is beneficial for future compatibility.

    • Can use Django to spawn flower servers and reverse proxy to them.

    • Consider using Django to spawn flower servers on new ports, and tell clients to connect to them at the ports.
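
For the last option, one way to pick a new port is to let the operating system choose one. The sketch below is only an illustration under that assumption; the function names and the shape of the returned dict are hypothetical, and the actual server start is left out.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def advertise_server(host="127.0.0.1"):
    """Hypothetical response body telling a client where to connect."""
    port = find_free_port(host)
    # The flower server would be started on `port` here, before responding.
    return {"host": host, "port": port}
```

There is a small race between releasing the probe socket and the server binding the port, so a real implementation would need to retry on a bind failure.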

Up till 2023/04/10

  • Study how flower works.

    • The strategy handles most custom logic on server side.

    • Order of connection and training managed/hardcoded.

      • Start training after connection established.

      • No resume.

      • No sleep option.

Stack of abstractions for Flower Android demo.

Server.

  • GrpcBridge in server/grpc_server/grpc_bridge.py, common/serde.

  • GrpcClientProxy(ClientProxy) in server/grpc_server/grpc_client_proxy.py.

  • fit_client in server/server.py.

  • Server in server/server.py.

  • run_server in server/app.py.

gRPC server.

  • FlowerServiceServicer(transport_pb2_grpc.FlowerServiceServicer) in server/grpc_server/flower_service_servicer.py.

  • start_grpc_server in server/grpc_server/grpc_server.py.

Android client gRPC client.

  • FlowerServiceGrpc defined in proto/transport.proto.

  • FlowerServiceRunnable.

  • runGrpc.

Up till 2023/04/04

Looked up mobile ML framework comparisons.
  • Tried out Flower

    • Fairly maintained and well documented.

    • Worked out of the box.

    • Uses TFLite for Android.

    • Uses gRPC.

Tried out other mobile machine learning frameworks.
  • MNN

    Their Android demo is extremely old, buggy, and does not work. The setup is long but simple.

  • TensorFlow Lite

    • Outdated tutorial.

      • The basics have not changed, though.

    • Up-to-date example.

      • Updated last week.

      • Builds and runs out of the box.

      • Uses Kotlin.

      • Lots of ceremony for simple things (1000 lines of Kotlin).

      • Training code is simple (at TransferLearningHelper.kt:training)

  • Pytorch Mobile

    • Android demo also very old, but does work.

    • Updating their package works.

    • No on-device training support found.

  • FedML

    • Workflow has to run on FedML servers.

    • Web GUI based.

    • Server-side implementation proprietary.

    • Client requires full disk access.

    • Client only connects to FedML servers both via HTTPS and MQTT.

  • Set up two Nova 9 for development.

    • It is just easier to set the language to English.

    • Tap About phone > Build number 7 times to enter developer mode.

    • Developer options is in System & updates.

      • Need to turn USB debugging on.

        • Need to log in to HUAWEI ID.

Up till 2023/03/27

  • Studied FedML Android demo.

    • Requires manually cloning MNN and configuring Cpp toolchain to build.

    • Uses JNI to call Cpp function from Java.

    • FedML wants you to use their platform as a service, register an account, connect your app to them and manage online.

    • Android SDK API documents are basically nonexistent.

  • Read FedML Android SDK codebase because no documentation.

The stack of abstraction layers for training from bottom to top. Found the bottom by searching for System.loadLibrary.
  • ai/fedml/edge/nativemobilenn/NativeFedMLClientManager.java is the binding for MNN, the deep learning library in Cpp.

  • ai/fedml/edge/service/TrainingExecutor.java is the higher level API for training.

  • ai/fedml/edge/service/ClientManager.java handles both MQTT communication and training. Still has TODO comments in it.

  • ai/fedml/edge/service/ClientAgentManager.java provides one "documented" method.

  • ai/fedml/edge/service/FedEdgeTrainImpl.java

  • ai/fedml/edge/service/EdgeService.java is made into a Service.

  • ai/fedml/edge/FedEdgeImpl.java runs the service using an Intent.

  • ai/fedml/edge/FedEdgeManager.java

    This is the top APIs in FedML Android SDK, it supports core training engine and related control commands on your Android devices.

  • Investigated FedML platform lock-in and cloud lock-in.

    Unfortunately, FedML Android SDK forces the users to use open.fedml.ai as a proxy for all MQTT traffic.

    • Forced MNN on Android

Investigated using the same Machine Learning model on different platforms.

Open Neural Network Exchange (ONNX) supports major Machine Learning libraries.

Up till 2023/03/12

  • Tried Retrofit and made blocking GET request not in strict mode.

Dependency settings.
<!-- AndroidManifest.xml -->
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
// build.gradle
implementation 'com.squareup.retrofit2:retrofit:2.9.0'
implementation 'com.squareup.retrofit2:converter-gson:2.9.0'
implementation 'com.google.code.gson:gson:2.10.1'
Using RxJava for non-blocking IO.
// build.gradle
implementation 'io.reactivex.rxjava3:rxandroid:3.0.2'
implementation 'io.reactivex.rxjava3:rxjava:3.1.5'
Flowable.fromCallable(someIoTaskFunction)
    .subscribeOn(Schedulers.io())
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe(
        // What to do on the main thread after `someIoTaskFunction` returns.
        functionOnSuccess, functionOnFailure);
  • Tried supporting Kotlin in existing Java app and calling Kotlin from Java. Commit. Answer on StackOverflow. Very minimal changes and easy.

  • Set up test repository AndroidClient_django_server_POC.

    • Android client was able to make GET request to the server.

  • Set up fake JSON API. Made requests successfully both in Java test and on Android emulator.

JSON structures according to Jiaqi.
  • POST

    {
        "device_id": 0,
        "send_time": 104224314.342,
        "local_loss": 0.452,
        "local_weights": [0, 24, 5],
        "training_duration": 34.542
    }
  • GET

    {
        "configuration": {
            "learning_rate": 0.1
        },
        "send_time": 104224314.342,
        "global_weights": [34, 65, 7]
    }
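
On the server side, the two structures above could be modeled as in the sketch below. The field names are taken from the JSON above; the class and function names are hypothetical, and the field types are inferred from the sample values.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class ClientUpdate:
    """Body of the POST sent by a device (types inferred from the sample)."""
    device_id: int
    send_time: float
    local_loss: float
    local_weights: List[float]
    training_duration: float


@dataclass
class ServerResponse:
    """Body of the GET response from the server."""
    configuration: Dict[str, float]
    send_time: float
    global_weights: List[float]


def encode_update(update):
    """Serialize a ClientUpdate to a JSON string."""
    return json.dumps(asdict(update))


def decode_response(text):
    """Parse a GET response; raises TypeError on missing or unexpected keys."""
    return ServerResponse(**json.loads(text))
```
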
Temporary solutions to enable local testing on Android emulator
  • Allow HTTP requests.

    <!-- AndroidManifest.xml -->
    <application android:usesCleartextTraffic="true" …>
    
    </application>
  • Reach the host machine's localhost from the emulator via 10.0.2.2, according to Network address space.

  • Allow localhost on Django server.

    # django_server/settings.py
    ALLOWED_HOSTS = ["10.0.2.2"]

Up till 2023/03/09

  • Looked for Android HTTPS client resources.

  • Checked out the previous Android app FedC.

    • The app architecture is similar in Java, although the UI uses XML.

    • org.eclipse.paho.client.mqttv3 handles MQTT.

    • The connection blocks the main thread, causing UI lag.

  • Nuked Kotlin and related work out of the plan.

  • Looked for Android developers tutorial for Java instead.

    • Hard to find because most are in Kotlin.

Up till 2023/03/05

  • Read FedML background materials.

  • Ran FedML demo Python simulation.

  • Tried Android Studio, Kotlin, JetPack Compose.

  • Sketched tech stack plan for the Android platform.

Java + FedML Java API, Django + FedML Python API + PostgreSQL, HTTPS poll

Jiaqi asked me for a formal tech stack plan for the Android platform. Here is my current sketch:

Android client app: Single Java app shipped to the user.

  • Data gathering: UI, user data collection and handling, and HTTPS client in Java.

  • ML: Call FedML's Java API for local training.

Server: Single modular Python server with single database.

  • Python as language of choice to best support ML exploration.

  • ML module:

    • Call FedML's Python API from the server for aggregation.

  • Web module: gather and store data.

    • Django for HTTPS server and database interface (ORM).

    • PostgreSQL for database.

HTTPS does not support broadcasting, and we cannot assume that the clients would always be on. So, I assume that the clients will poll the server for new information every so often. We only need to implement a REST API or something equivalent for the communication.
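
The polling idea can be sketched as below. The planned client is a Java app, so this Python version only illustrates the loop; the function name and parameters are hypothetical. It assumes each response carries a send_time field, as in the JSON structures earlier in these notes, and uses it to tell new information from a repeat. The fetch callable is injected so the loop does not depend on a particular HTTPS client.

```python
import time


def poll_for_updates(fetch, handle, interval_seconds, max_polls, sleep=time.sleep):
    """Call fetch() up to max_polls times; pass each new response to handle().

    fetch returns a dict with a "send_time" key, or None if the server had
    nothing to offer. Returns the number of responses handled.
    """
    last_seen = None
    handled = 0
    for attempt in range(max_polls):
        response = fetch()
        if response is not None and response["send_time"] != last_seen:
            last_seen = response["send_time"]
            handle(response)
            handled += 1
        if attempt < max_polls - 1:
            sleep(interval_seconds)  # wait before the next poll
    return handled
```

A client that was offline simply resumes polling when it comes back, which is why this model does not need the server to reach the device.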

graph TD;
    K(Java Android app)-->|directly call|J(FedML Java API);
    K-->|poll|S(Django Server)
    S-->|respond|K
    S-->|communicate|D(PostgreSQL)
    S-->|use API|M(Python ML module)
    M-->|call|P(FedML Python API)
  • Looked for full-duplex communication protocol as demanded by Jiaqi.

Need an external push service.

For Push API, I only found instructions for making push messages, which are for web apps. Unofficial instructions for making push messages to Android exist on Intercom Developers and Iterable, both of which use Firebase for the push service.

My conclusion is that we should consider these after we have a working poll model because they involve external services.
