Skip to main content

2 posts tagged with "python"

View All Tags

Magic Wormhole Source Code Analysis

· 8 min read
Huakun Shen
Website Owner

Clients

  1. Python: https://github.com/magic-wormhole/magic-wormhole (official, original)
  2. Rust: https://github.com/magic-wormhole/magic-wormhole.rs (official)
  3. Golang: https://github.com/psanford/wormhole-william.git (non-official)
  4. Golang + Fyne GUI Client: https://github.com/Jacalz/rymdport
  5. Rust + Tauri GUI Client: https://github.com/HuakunShen/wormhole-gui

Documentation

Performance

magic-wormhole can almost always eat the full bandwidth of the network. It's very fast. However, I have observed performance issue on Mac (M1 pro) during sending (not receiving).

See update on issue https://github.com/magic-wormhole/magic-wormhole.rs/issues/224

Sender ComputerSender ClientReceiver ComputerReceiver ClientSpeed
M1 pro MacpythonUbuntu i7 13700Kpython112MB/s
M1 pro MacrustUbuntu i7 13700Kpython73MB/s
M1 pro MacgolangUbuntu i7 13700Kpython117MB/s
Ubuntu i7 13700KpythonM1 pro Macpython115MB/s
Ubuntu i7 13700KrustM1 pro Macpython116MB/s
Ubuntu i7 13700KgolangM1 pro Macpython117MB/s
Ubuntu i7 13700KpythonKali VM (on Mac)python119MB/s
Kali VM (on Mac)pythonUbuntu i7 13700Kpython30MB/s
Ubuntu i7 11800HrustUbuntu i7 13700Kpython116MB/s
Ubuntu i7 13700KrustUbuntu i7 11800Hpython116MB/s

It seems like there is some performance issue with the rust implementation on the sender side.

Workflow

I read the client source code written in Python, Golang and Rust. The Python code is unreadable to me. Some packages like automat and twisted are used. I am not familiar with them and they make the code hard to read or follow. It even took me ~20-30 minutes to find the main function and get the debugger running. The code is not well-organized. It's hard to follow the workflow.

Rust is known for its complexity. It's async and await makes debugger jump everywhere. Variables allocated in heap are hard to track with debugger. Usually only a pointer address is shown.

The Golang version (although non-official) is the easiest to follow. Project structure is clear and simple. Goland's debugger works well. So let's follow the Golang version.

  • After command arguments parsing, everything starts here sendFile(args[0])

  • A wormhole.Client is created c := newClient()

  • The code is retrieved from code, status, err := c.SendFile(ctx, filepath.Base(filename), f, args...)

    • status is a channel (var status chan wormhole.SendResult) that waits for the result of sending file.
    •     s := <-status

      if s.OK {
      fmt.Println("file sent")
      } else {
      bail("Send error: %s", s.Error)
      }

  • Here is Wormhole Client's SendFile() method

    •     func (c *Client) SendFile(ctx context.Context, fileName string, r io.ReadSeeker, opts ...SendOption) (string, chan SendResult, error) {
      if err := c.validateRelayAddr(); err != nil {
      return "", nil, fmt.Errorf("invalid TransitRelayAddress: %s", err)
      }

      size, err := readSeekerSize(r)
      if err != nil {
      return "", nil, err
      }

      offer := &offerMsg{
      File: &offerFile{
      FileName: fileName,
      FileSize: size,
      },
      }

      return c.sendFileDirectory(ctx, offer, r, opts...)
      }
    • offer contains the file name and size.

  • Let's go into sendFileDirectory() method here. Everything happens here.

    • sideId: RandSideID returns a string appropate for use as the Side ID for a client.

      NewClient returns a Rendezvous client. URL is the websocket url of Rendezvous server. SideID is the id for the client to use to distinguish messages in a mailbox from the other client. AppID is the application identity string of the client.

      Two clients can only communicate if they have the same AppID.

      sideID := crypto.RandSideID()
      appID := c.appID()
      rc := rendezvous.NewClient(c.url(), sideID, appID)

      _, err := rc.Connect(ctx)
    • Then a nameplate is generated

      If users provides the code, the mailbox is attached to the code. Otherwise, a new mailbox is created. A mailbox is a channel for communication between two clients. The sender creates a mailbox and sends the code (address of mailbox + key) to the receiver. The receiver uses the code to open the mailbox.

      if options.code == "" {
      // CreateMailbox allocates a nameplate, claims it, and then opens the associated mailbox. It returns the nameplate id string.
      // nameplate is a number string. e.g. 10
      nameplate, err := rc.CreateMailbox(ctx)
      if err != nil {
      return "", nil, err
      }

      // ChooseWords returns 2 words from the wordlist. (e.g. "correct-horse")
      pwStr = nameplate + "-" + wordlist.ChooseWords(c.wordCount())
      } else {
      pwStr = options.code
      nameplate, err := nameplateFromCode(pwStr)
      if err != nil {
      return "", nil, err
      }

      // AttachMailbox opens an existing mailbox and releases the associated nameplate.
      err = rc.AttachMailbox(ctx, nameplate)
      if err != nil {
      return "", nil, err
      }
      }
    • Then a clientProto is created

        clientProto := newClientProtocol(ctx, rc, sideID, appID)

      appID is a constant string. sideID is a random string.

      sideID := crypto.RandSideID() RandSideID returns a string appropate for use as the Side ID for a client.

      Let's see how newClientProtocol works.

        type clientProtocol struct {
      sharedKey []byte
      phaseCounter int
      ch <-chan rendezvous.MailboxEvent
      rc *rendezvous.Client
      spake *gospake2.SPAKE2
      sideID string
      appID string
      }

      func newClientProtocol(ctx context.Context, rc *rendezvous.Client, sideID, appID string) *clientProtocol {
      recvChan := rc.MsgChan(ctx)

      return &clientProtocol{
      ch: recvChan,
      rc: rc,
      sideID: sideID,
      appID: appID,
      }
      }
    • Then enter a go routing (transfer happens here)

      • clinetProto.ReadPake(ctx): block and waiting for receiver to connect (wait for receiver to enter the code)

        ReadPake calls readPlainText to read the event from the mailbox.

        func (cc *clientProtocol) readPlaintext(ctx context.Context, phase string, v interface{}) error {
        var gotMsg rendezvous.MailboxEvent
        select {
        case gotMsg = <-cc.ch:
        case <-ctx.Done():
        return ctx.Err()
        }
        if gotMsg.Error != nil {
        return gotMsg.Error
        }

        if gotMsg.Phase != phase {
        return fmt.Errorf("got unexpected phase while waiting for %s: %s", phase, gotMsg.Phase)
        }

        err := jsonHexUnmarshal(gotMsg.Body, &v)
        if err != nil {
        return err
        }

        return nil
        }

        func (cc *clientProtocol) ReadPake(ctx context.Context) error {
        var pake pakeMsg
        err := cc.readPlaintext(ctx, "pake", &pake)
        if err != nil {
        return err
        }

        otherSidesMsg, err := hex.DecodeString(pake.Body)
        if err != nil {
        return err
        }

        sharedKey, err := cc.spake.Finish(otherSidesMsg)
        if err != nil {
        return err
        }

        cc.sharedKey = sharedKey

        return nil
        }

        pake's body is a string of length 66. otherSidesMsg is []uint8 bytes of length 33.

        Then sharedKey is generated by calling cc.spake.Finish(otherSidesMsg). spake is a SPAKE2 object.

        sharedKey is a 32-byte long byte array.

        So what is pake message read from the mailbox?

        TODO

      • err = collector.waitFor(&answer): Wait for receiver to enter Y to confirm. The answer contains a OK message

      • A cryptor (type=transportCryptor) is created.

          cryptor := newTransportCryptor(conn, transitKey, "transit_record_receiver_key", "transit_record_sender_key")

        recordSize := (1 << 14) // record size: 16384 byte (16kb)
        // chunk
        recordSlice := make([]byte, recordSize-secretbox.Overhead)
        hasher := sha256.New()

        conn is a net.TCPConn TCP connection.

        A readKey and writeKey are generated with hkdf (HMAC-based Extract-and-Expand Key Derivation Function) from transitKey and two strings in newTransportCryptor.

        transitKey is derived from clientProto.sharedKey and appID.

        transitKey := deriveTransitKey(clientProto.sharedKey, appID)

        sharedKey is a 32-byte long key generated by clientProto (a pake.Client).

        func newTransportCryptor(c net.Conn, transitKey []byte, readPurpose, writePurpose string) *transportCryptor {
        r := hkdf.New(sha256.New, transitKey, nil, []byte(readPurpose))
        var readKey [32]byte
        _, err := io.ReadFull(r, readKey[:])
        if err != nil {
        panic(err)
        }

        r = hkdf.New(sha256.New, transitKey, nil, []byte(writePurpose))
        var writeKey [32]byte
        _, err = io.ReadFull(r, writeKey[:])
        if err != nil {
        panic(err)
        }

        return &transportCryptor{
        conn: c,
        prefixBuf: make([]byte, 4+crypto.NonceSize),
        nextReadNonce: big.NewInt(0),
        readKey: readKey,
        writeKey: writeKey,
        }
        }

        recordSize is 16384 byte (16kb), used to read file in chunks.

        hasher is compute file hash while reading file.

      • In the following loop, file is read and sent in chunks.

        r has type io.Reader. Every time 16KB is read.

        cryptor.writeRecord encrypts the bytes and send the bytes.

        for {
        n, err := r.Read(recordSlice)
        if n > 0 {
        hasher.Write(recordSlice[:n])
        err = cryptor.writeRecord(recordSlice[:n]) // send 16KB in each iteration
        if err != nil {
        sendErr(err)
        return
        }
        progress += int64(n)
        if options.progressFunc != nil {
        options.progressFunc(progress, totalSize)
        }
        }
        if err == io.EOF {
        break
        } else if err != nil {
        sendErr(err)
        return
        }
        }

        Let's see how writeRecord works.

        package secretbox ("golang.org/x/crypto/nacl/secretbox") is used to encrypt data.

        d.conn.Write sends the encrypted data out.

        func (d *transportCryptor) writeRecord(msg []byte) error {
        var nonce [crypto.NonceSize]byte

        if d.nextWriteNonce == math.MaxUint64 {
        panic("Nonce exhaustion")
        }

        binary.BigEndian.PutUint64(nonce[crypto.NonceSize-8:], d.nextWriteNonce)
        d.nextWriteNonce++

        sealedMsg := secretbox.Seal(nil, msg, &nonce, &d.writeKey)

        nonceAndSealedMsg := append(nonce[:], sealedMsg...)

        // we do an explit cast to int64 to avoid compilation failures
        // for 32bit systems.
        nonceAndSealedMsgSize := int64(len(nonceAndSealedMsg))

        if nonceAndSealedMsgSize >= math.MaxUint32 {
        panic(fmt.Sprintf("writeRecord too large: %d", len(nonceAndSealedMsg)))
        }

        l := make([]byte, 4)
        binary.BigEndian.PutUint32(l, uint32(len(nonceAndSealedMsg)))

        lenNonceAndSealedMsg := append(l, nonceAndSealedMsg...)

        _, err := d.conn.Write(lenNonceAndSealedMsg)
        return err
        }

S3 SDK with Rust (Manual Credential Configuration)

· 5 min read
Huakun Shen
Website Owner

First of all, I have to say, I am very disappointed with AWS's documentation. They do have many documentation and sample code, but I am still unable to find what I was looking for (easily).

Intro

I was working on a project that requires using Rust to upload files to AWS S3. I wanted to use Rest API to do this, but could not find enough information from the documentation. There is no sample code or something like a postman API doc that allows you to generate client code from a Rest API.

For example, in this API doc on PutObject, https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html.

Authorization:authorization string doesn't mean anything to me. I have access key and secret, but there must be a way to get this authorization string. I am pretty sure it exists, and must be somewhere in the docs. I just couldn't find it. Put a link in the documentation isn't hard. It saves people time from looking through your entire documentation. The purpose of World Wide Web is to link things together, instead of look for things separately and try to assemble in clients' head.

Then I switched to Rust SDK. They have plenty of documentation and sample code; but I got stuck on one problem for a long time. Again, authorization. The documentation and sample code always assume you have the same scenario as they do. They assume you have a ~/.aws/credentials file with your access key id and secret. Sample code always loads credentials automatically from default locations or environment variables, which is fine for a server application. For client-side software, this doesn't hold. I need to explicitly pass credentials to a function to generate a client. This is possible and documented for both Python and Nodejs version of the doc, but not for Rust.

I had to go over so many documentation and sample code to figure out how to do this naive thing. Function from another Rust crate (package) has to be used. aws_types.

Basically, there are many different ways to produce credentials and client; but for someone without prior knowledge about your nasty design, there is no way to know which package I should find what the method needed. If you decide to put things in different packages, then at least provide an obvious link somewhere to indicate "You have the option to do blah blah, read the docs here".

Reading AWS docs (Rust) is like browse information everywhere and try to assemble in my head. Without enough prior knowledge, it's not easy to get things done quickly.

Details

Comparison

When I google "AWS s3 python client credential loading", the first link gives me what I need: Passing credentials as parameters . Took me 10 seconds to find the answer.

For Nodejs, it took me ~10 minutes. To find docs and examples everywhere. This is how I found the solution eventually.

  1. Google "aws s3 create nodejs client with credentials"
  2. Found S3 Client - AWS SDK for JavaScript v3, the JS package API docs
  3. S3Client class API
  4. Constructor API
  5. S3ClientConfig Interface API
  6. Properties -> Credentials Type
  7. AwsCredentialIdentity (Type of credentials property)
  8. Finally found that this is where to pass in accessKeyId, expiration, secretAccessKey, sessionToken.

This is no different from browsing source code. It's important developers has the ability to read source code and API docs. That doesn't mean the docs provider don't need to provide easy access to the most basic functionalities.

At least I could figure out Nodejs solution within 20 minutes. Took me a few hours to figure out the Rust solution.

Final Words

  • Also, why is documentation and examples everywhere? aws.amazon.com, github.com, and external websites like S3 Client - AWS SDK for JavaScript v3, (different for every language).
  • It's OK to have external API docs as each language have their own platforms. Like rust docs for rust crates.
  • But you should have a central place for links to everywhere and a easy-to-use search utility.
  • Could you put everything in one place and provide a search utility to search everything?
  • Like what you have in JS API docs
  • If your example is on GitHub, it's not that to search through the source code