Subwork_Sichang

TODOs

Up till now

Cleaned up org.eu.fedcampus.train.Train's state with sealed class TrainState.
Added a feature in dyn_flower_android_drf to start training without previously saved parameters.

Up till 2023/7/3

Converted Beilong's TFLite on-device regression model training demo and integrated it into dyn_flower_android_drf.
Grand update to dyn_flower_android_drf to use the latest TFLite on-device training following their example. BREAKING! Replace transfer_api with latest TFLite usage.
Updated FedCampus_APP to match the latest dyn_flower_android_drf.

Up till 2023/6/21

Studied Flower simulation in preparation for the benchmark platform.
Collaborated with Johnny on the dockerization of dyn_flower_android_drf.
Collected data fuzzed with LaplaceMechanism on FedCampus_APP in preparation for Federated Analytics.
Studied Android background processes.
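
To illustrate the Laplace fuzzing mentioned above, here is a minimal sketch of a Laplace mechanism using only the Python standard library. It is not the LaplaceMechanism class used in FedCampus_APP; the function names and the `sensitivity` and `epsilon` parameters are assumptions for illustration.

```python
import random


def laplace_noise(scale, rng=random):
    """Draw Laplace(0, scale) noise.

    The difference of two independent exponential variables with
    rate 1/scale follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(value, sensitivity, epsilon, rng=random):
    """Return `value` plus Laplace noise of scale sensitivity / epsilon.

    `sensitivity` and `epsilon` are hypothetical parameter names:
    the largest change one user can cause, and the privacy budget.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return value + laplace_noise(sensitivity / epsilon, rng)


# Example: fuzz a daily step count before it leaves the device.
noisy_steps = laplace_mechanism(8000, sensitivity=1.0, epsilon=0.5)
```

A smaller `epsilon` gives a larger noise scale, so each reported value is less precise, while the average over many users stays close to the true average.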

Up till 2023/6/14

Integrated org.eu.fedcampus.train into FedCampus_APP.
Silly simple_health_kit model that predicts 3 classes of distance from 2 inputs: steps and calories.
dyn_flower_android_drf tracks TrainingDataType and advertises models accordingly.

Up till 2023/6/8

Met with Aicha to give an overview of dyn_flower_android and the customizations it provides for her federated learning.
- Walked Aicha through the demo and the logic of the process.
Met with Johnny & Tianjun to specify deployment tasks.
Integrating training package into FedCampus_APP with Beilong.
- Created MonoRepo to group FedCampus_APP and dyn_flower_android_drf together.
- Imported org.eu.fedcampus.train from dyn_flower_android_drf into FedCampus_APP.
- Tested tflite_convertor/convert_to_tflite.py to verify that it is still able to generate the 5 .tflite files.
  - Failed to make the script run on my ARM Mac after trying multiple Python and TensorFlow versions.
  - Opened an issue on the Flower repo to address the breakage.
  - Got it running on Beilong's x86_64 Windows machine.
Started adding telemetry to org.eu.fedcampus.train.
- Togglable using an argument.
- Sends client, session, and timing data to backend on training events.
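
The telemetry behaviour described above can be sketched as follows. This is only an illustration in Python; the actual package is written in Kotlin, and the class name, method name, and payload field names here are all hypothetical.

```python
import time


class Telemetry:
    """Hypothetical telemetry helper; every name here is illustrative."""

    def __init__(self, enabled, send):
        self.enabled = enabled  # the toggle passed in as an argument
        self.send = send        # callable that delivers a payload to the backend

    def report(self, event, client_id, session_id, started_at, ended_at=None):
        """Send one training event, or do nothing when telemetry is off."""
        if not self.enabled:
            return None
        ended_at = time.time() if ended_at is None else ended_at
        payload = {
            "event": event,
            "client_id": client_id,
            "session_id": session_id,
            "started_at": started_at,
            "duration": ended_at - started_at,
        }
        self.send(payload)
        return payload
```

Passing the sender in as a callable keeps the sketch independent of any HTTP library and makes the toggle easy to test.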
Persuaded Beilong to write nice commit messages and format code.

Up till 2023/5/26

dyn_flower_android client side:
- Converted flwr.android_client from Java to Kotlin and used suspend functions.
- Extracted training functionality from MainActivity into org.eu.fedcampus.train to work towards a standalone library.
- Upgraded org.tensorflow:tensorflow-lite dependency.
- DB using Room.
  - Check if the models are already downloaded to prevent redownloading.
  - Restore user input on launch.
    Needed Room migration.
    Also needed to store schema.
FedCampus_APP:
- Discussed the need for better commit messages.
- Discussed Kotlin support.
  Currently, Beilong is writing this app in Java because he is more productive in Java. I suggested that we want Kotlin's null safety, and that we will probably need to migrate to Kotlin, at least partially, once we want to use suspend functions.

Up till 2023/5/18

Started dyn_flower_android_drf
- Django REST server serves TFLite model files and their information.
- Android client asks for TFLite model files, downloads them onto disk, and starts FlowerClient using the downloaded model.
  - Implemented loading a model from anywhere on disk instead of only from assets/: org/tensorflow/lite/examples/transfer/api/ExternalModelLoader.kt.
- Django server launches the Flower server in a background process.
  - Studied Celery, Huey, and APScheduler, and landed on using multiprocessing directly.
- Tried saving model parameters in the background Flower process using a custom Strategy, but it did not work: the process raised django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet.
- Spawn a new process for each server requested.
  - Spawn a thread to monitor it so that it could save the parameters to DB.
  - Restore old parameters from DB at Flower server start.
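
The process-plus-monitor-thread arrangement above can be sketched roughly as below. This is not the dyn_flower_android_drf code: `run_fake_flower_server` stands in for the real Flower server entry point, `PARAM_STORE` stands in for the database, and the "fork" start method assumes a Unix host.

```python
import multiprocessing as mp
import threading

# Hypothetical stand-in for the database table that stores parameters.
PARAM_STORE = {}


def run_fake_flower_server(port, initial_params, result_queue):
    """Hypothetical stand-in for the Flower server running in a child process."""
    # Pretend that training nudged every parameter by one.
    final_params = [p + 1.0 for p in initial_params]
    # The child cannot use the Django ORM, so it hands the result back.
    result_queue.put(final_params)


def spawn_server(model_name, port):
    """Start one server process and a thread that saves its parameters."""
    ctx = mp.get_context("fork")  # assumption: Unix host
    result_queue = ctx.Queue()
    # Restore old parameters at server start, or fall back to zeros.
    initial_params = PARAM_STORE.get(model_name, [0.0, 0.0])
    process = ctx.Process(
        target=run_fake_flower_server,
        args=(port, initial_params, result_queue),
    )
    process.start()

    def monitor():
        # Read before joining; raises queue.Empty if the child never reports.
        params = result_queue.get(timeout=60)
        process.join()
        PARAM_STORE[model_name] = params  # saved from the parent process

    watcher = threading.Thread(target=monitor, daemon=True)
    watcher.start()
    return process, watcher
```

Saving from the parent side sidesteps the AppRegistryNotReady problem, because only the process that already loaded the Django apps touches the database.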

Up till 2023/4/21

Flower uses protobuf to generate language-specific gRPC stub code.
In the Android example, the TFLite model is hard-coded:
- Data parsing functions have data semantics hard-coded and run at data load.
- No built-in way to transfer model (.tflite) files.
- The TFLite API used (TransferLearningModel) is model-agnostic. Potential to easily swap models.
Consider an extra connection layer in front of Flower.
- Flower has a sophisticated gRPC configuration that is hard to fiddle with.
- Keeping Flower contained and in one piece is beneficial for future compatibility.
- Can use Django to spawn Flower servers and reverse proxy to them.
- Consider using Django to spawn Flower servers on new ports, and tell clients to connect to them at those ports.
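
As a rough sketch of the last option above, the server could pick a free port for each new Flower server and tell the client where to connect. The function names and the shape of the returned dictionary are assumptions for illustration, not an existing API.

```python
import socket


def pick_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port.

    Binding to port 0 lets the OS choose. The port is released on
    return, so another process could grab it before it is reused.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def build_server_advertisement(model_name, host="127.0.0.1"):
    """Hypothetical response body telling a client where to connect."""
    port = pick_free_port(host)
    return {"model": model_name, "host": host, "port": port}
```

A client would read `host` and `port` from the response and open its gRPC channel there, leaving Flower's own gRPC configuration untouched.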

Up till 2023/04/10

Studied how Flower works.
- The strategy handles most custom logic on server side.
- Order of connection and training managed/hardcoded.
  - Start training after connection established.
  - No resume.
  - No sleep option.

Stack of abstractions for Flower Android demo.

Server.

GrpcBridge in server/grpc_server/grpc_bridge.py, common/serde.
GrpcClientProxy(ClientProxy) in server/grpc_server/grpc_client_proxy.py.
fit_client in server/server.py.
Server in server/server.py.
run_server in server/app.py.

gRPC server.

FlowerServiceServicer(transport_pb2_grpc.FlowerServiceServicer) in server/grpc_server/flower_service_servicer.py.
start_grpc_server in server/grpc_server/grpc_server.py.

Android client gRPC client.

FlowerServiceGrpc defined in proto/transport.proto.
FlowerServiceRunnable.
runGrpc.

Up till 2023/04/04

Repo: Android MQTT Django POC
Sketch POC of the whole structure without on-device training.
- MQTT client on Django connecting to Johnny's deployed broker following How to Use MQTT in The Django Project.
- MQTT client on Android following example in eclipse/paho.mqtt.java.
  - Subscribe to topic.
  - Publish content in EditText input box.

Looked up mobile ML framework comparisons.

TensorFlow Lite was benchmarked to be a few times faster than PyTorch Mobile. Comparison and Benchmarking of AI Models and Frameworks on Mobile Devices.
A Comprehensive Benchmark of Deep Learning Libraries on Mobile Devices:
- No clear winner on either CPU or GPU. ncnn is generally fastest.
- PyTorch Mobile did not have GPU support.
On-Device Deep Learning: PyTorch Mobile and TensorFlow Lite:
- PyTorch Mobile and PyTorch share codebase, no conversion problem.
- TFLite and TensorFlow are different codebases, so take care when choosing operators for models.

Tried out Flower
- Fairly maintained and well documented.
- Worked out of the box.
- Uses TFLite for Android.
- Uses gRPC.

Tried out other mobile machine learning frameworks.

MNN
Their Android demo is extremely old, buggy, and does not work. The setup is long but simple.
TensorFlow Lite
- Outdated tutorial.
  - The basics have not changed, though.
- Up-to-date example.
  - Updated last week.
  - Builds and runs out of the box.
  - Uses Kotlin.
  - Lots of ceremony for simple things (1000 lines of Kotlin).
  - Training code is simple (at TransferLearningHelper.kt:training)
PyTorch Mobile
- Android demo also very old, but does work.
- Updating their package works.
- No on-device training support found.
FedML
- Workflow has to run on FedML servers.
- Web GUI based.
- Server-side implementation proprietary.
- Client requires full disk access.
- Client only connects to FedML servers both via HTTPS and MQTT.

Set up two Nova 9 for development.
- It is just easier to set the language to English.
- Tap About phone > Build number 7 times to enter developer mode.
- Developer options is in System & updates.
  - Need to turn USB debugging on.
    Need to log in to HUAWEI ID.

Up till 2023/03/27

Studied FedML Android demo.
- Requires manually cloning MNN and configuring a C++ toolchain to build.
- Uses JNI to call C++ functions from Java.
- FedML wants you to use their platform as a service, register an account, connect your app to them and manage online.
- Android SDK API documents are basically nonexistent.
Read the FedML Android SDK codebase because there is no documentation.

The stack of abstraction layers for training from bottom to top. Found the bottom by searching for System.loadLibrary.

ai/fedml/edge/nativemobilenn/NativeFedMLClientManager.java is the binding for MNN, the deep learning library in C++.
ai/fedml/edge/service/TrainingExecutor.java is the higher level API for training.
ai/fedml/edge/service/ClientManager.java handles both MQTT communication and training. Still has TODO comments in it.
ai/fedml/edge/service/ClientAgentManager.java provides one "documented" method.
ai/fedml/edge/service/FedEdgeTrainImpl.java
ai/fedml/edge/service/EdgeService.java is made into a Service.
ai/fedml/edge/FedEdgeImpl.java runs the service using an Intent.
ai/fedml/edge/FedEdgeManager.java
This is the top APIs in FedML Android SDK, it supports core training engine and related control commands on your Android devices.

Investigated FedML platform lock-in and cloud lock-in.
Unfortunately, FedML Android SDK forces the users to use open.fedml.ai as a proxy for all MQTT traffic.
- Forced MNN on Android

Investigated using the same Machine Learning model on different platforms.

Open Neural Network Exchange (ONNX) supports major Machine Learning libraries.

Deploy ONNX
Their runtimes
Deploying Scikit-Learn Models In Android Apps With ONNX
- The resulting models are small, and the process easy.
- The parameters are hidden inside the model.
- The training is done in Python, only inference is done on Android.
- Gather the common statistics from the ONNX models
- How do you find the quantization parameter inside of the ONNX model resulted in converting already quantized tflite model to ONNX?
onnxruntime cannot train on mobile currently 😢: ONNX Runtime Mobile Training (Android/iOS). We would have to use a mobile framework anyway.

Up till 2023/03/12

Tried Retrofit and made a blocking GET request, not in strict mode.

Dependency settings.

<!-- AndroidManifest.xml -->
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />

// build.gradle
implementation 'com.squareup.retrofit2:retrofit:2.9.0'
implementation 'com.squareup.retrofit2:converter-gson:2.9.0'
implementation 'com.google.code.gson:gson:2.10.1'

Tried Retrofit with RxAndroid and RxJava for non-blocking requests.

Using RxJava for non-blocking IO.

// build.gradle
implementation 'io.reactivex.rxjava3:rxandroid:3.0.2'
implementation 'io.reactivex.rxjava3:rxjava:3.1.5'

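import io.reactivex.rxjava3.android.schedulers.AndroidSchedulers;
import io.reactivex.rxjava3.core.Flowable;
import io.reactivex.rxjava3.schedulers.Schedulers;

// Run `someIoTaskFunction` on an IO thread.
Flowable.fromCallable(someIoTaskFunction)
    .subscribeOn(Schedulers.io())
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe(
        // What to do on the main thread after `someIoTaskFunction` returns.
        functionOnSuccess, functionOnFailure);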

Tried supporting Kotlin in existing Java app and calling Kotlin from Java. Commit. Answer on StackOverflow. Very minimal changes and easy.
Set up test repository AndroidClient_django_server_POC.
- Android client was able to make GET request to the server.
Set up fake JSON API. Made requests successfully both in Java test and on Android emulator.

JSON structures according to Jiaqi.

POST

{
    "device_id": 0,
    "send_time": 104224314.342,
    "local_loss": 0.452,
    "local_weights": [0, 24, 5],
    "training_duration": 34.542
}

GET

{
    "configuration": {
        "learning_rate": 0.1
    },
    "send_time": 104224314.342,
    "global_weights": [34, 65, 7]
}
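
The two JSON structures above can be modelled in Python roughly as follows. The field names come from the examples above; the class and function names are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ClientUpdate:
    """Body of the POST request sent by a device."""
    device_id: int
    send_time: float
    local_loss: float
    local_weights: list
    training_duration: float


@dataclass
class ServerMessage:
    """Body of the GET response sent by the server."""
    configuration: dict
    send_time: float
    global_weights: list


def encode_update(update):
    """Serialize a ClientUpdate to a JSON string."""
    return json.dumps(asdict(update))


def decode_server_message(text):
    """Parse a JSON string into a ServerMessage.

    Raises TypeError if a field is missing or unexpected.
    """
    return ServerMessage(**json.loads(text))
```

For example, `decode_server_message(response_text).configuration["learning_rate"]` reads the learning rate from a GET response shaped like the one above.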

Temporary solutions to enable local testing on Android emulator

Allow HTTP requests.

<!-- AndroidManifest.xml -->
<application android:usesCleartextTraffic="true" …>
        …
</application>

Use 10.0.2.2 to reach the host machine's localhost from the emulator, according to Network address space.

Allow localhost on Django server.

# django_server/settings.py
ALLOWED_HOSTS = ["10.0.2.2"]

Up till 2023/03/09

Looked for Android HTTPS client resources.
- Perform network operations overview from Android developers.
- ~~Ktor, the Kotlin HTTPS client/server library.~~
- Android HTTPS client Retrofit.
- android/connectivity-samples, a Git repository of code samples.
Checked out the previous Android app FedC.
- The app architecture is similar in Java, although the UI uses XML.
- org.eclipse.paho.client.mqttv3 handles MQTT.
- The connection blocks the main thread, causing UI lag.
Nuked Kotlin and related work out of the plan.
Looked for Android developers tutorial for Java instead.
- Hard to find because most are in Kotlin.

Up till 2023/03/05

Read FedML background materials.
Ran FedML demo Python simulation.
Tried Android Studio, Kotlin, Jetpack Compose.
Sketched tech stack plan for the Android platform.

Java + FedML Java API, Django + FedML Python API + PostgreSQL, HTTPS poll

Jiaqi asked me for a formal tech stack plan for the Android platform. Here is my current sketch:

Android client app: Single Java app shipped to the user.

Data gathering: UI, user data collection and handling, and HTTPS client in Java.
ML: Call FedML's Java API for local training.

Server: Single modular Python server with single database.

Python as language of choice to best support ML exploration.
ML module:
- Call FedML's Python API from the server for aggregation.
Web module: gather and store data.
- Django for HTTPS server and database interface (ORM).
- PostgreSQL for database.

HTTPS does not support broadcasting, and we cannot assume that the clients would always be on. So, I assume that the clients will poll the server for new information every so often. We only need to implement a REST API or something equivalent for the communication.

graph TD;
    K(Java Android app)-->|directly call|J(FedML Java API);
    K-->|poll|S(Django Server)
    S-->|respond|K
    S-->|communicate|D(PostgreSQL)
    S-->|use API|M(Python ML module)
    M-->|call|P(FedML Python API)
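
The polling assumption in this plan can be sketched as a small client-side loop. This is only an illustration: `fetch` is a hypothetical callable that performs the GET request and returns the decoded JSON (or None), and the interval and attempt limit are arbitrary.

```python
import time


def poll_for_update(fetch, last_seen_send_time, interval_seconds=1.0,
                    max_attempts=5, sleep=time.sleep):
    """Poll until the server returns a message newer than the last one seen.

    Returns the new message, or None if nothing newer arrives within
    `max_attempts` polls.
    """
    for _ in range(max_attempts):
        message = fetch()
        if message is not None and message["send_time"] > last_seen_send_time:
            return message
        sleep(interval_seconds)  # wait before asking again
    return None
```

Comparing `send_time` against the last value seen lets the client ignore a response it has already processed, so a client that was offline simply picks up the latest message on its next poll.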

Looked for full-duplex communication protocol as demanded by Jiaqi.

Need an external push service.

For the Push API, I only found instructions for making push messages, and those are for web apps. Unofficial instructions for making push messages to Android exist on Intercom Developers and Iterable, both of which use Firebase for the push service.

My conclusion is that we should consider these after we have a working poll model because they involve external services.

