How to decode JSON string in PHP or Java

I am using the cloud string variable to transmit 622 bytes of binary data. When encoding, I make sure to use only byte values 1-255 so as not to compromise the string. My question is how to decode the JSON string back into the binary data. These are some of the JSON-encoded return messages:

{
  "cmd": "VarReturn",
  "name": "data",
  "result": "����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������dddddedddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddeddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddedddddddddddddddddddddddddddddddddddeddddddddddddddddd",
  "coreInfo": {
    "last_app": "",
    "last_heard": "2015-01-26T18:49:16.850Z",
    "connected": true,
    "deviceID": "54ff6c066672524835431267"
  }

The length of the string is 979 characters. How do I convert it back to the original 622 bytes?

Have a look at this thread with a similar problem

From that post:

After doing some searching, it seems that the byte string is being encoded into (most likely) UTF-8, which can only represent values below 128 in a single byte.

From Python 2.7.9 docs:

UTF-8 uses the following rules:
1) If the code point is <128, it’s represented by the corresponding byte value.
2) If the code point is between 128 and 0x7ff, it’s turned into two byte values between 128 and 255.
3) Code points >0x7ff are turned into three- or four-byte sequences, where each byte of the sequence is between 128 and 255.
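
The rules above can be checked with a few lines of Java. This is only a sketch; the class and method names (`Utf8Lengths`, `utf8Length`) are made up for illustration and are not part of any cloud API.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    // Returns how many bytes UTF-8 needs to represent one code point.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Sample code points on either side of the boundaries listed above.
        int[] samples = {0x41, 0x7f, 0x80, 0xff, 0x7ff, 0x800, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", cp, utf8Length(cp));
        }
    }
}
```

Values 0x80 through 0xff each need two bytes, so a raw byte above 127 placed into a string is not a valid one-byte UTF-8 character by itself.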

I am not sure how to apply this, and it does not match what I get below. Is there a library function that does this? Here is another sample:

    {
      "cmd": "VarReturn",
      "name": "data",
  "result": "\u0001\u0001\u0002\u0003\u0003\u0004\u0005\u0005\u0006\u0007\b\b\t\n\n\u000b\f\r\r\u000e\u000f\u000f\u0010\u0011\u0011\u0012\u0013\u0014\u0014\u0015\u0016\u0016\u0017\u0018\u0019\u0019\u001a\u001b\u001b\u001c\u001d\u001e\u001e\u001f  !\"\"#$%%&''()**+,,-../01123345667889:;;<==>??@ABBCDDEFGGHIIJKKLMNNOPPQRSSTUUVWXXYZZ[\\\\]^__`aabcddeffghhijkklmmnoppqrrstuuvwwxyyz{||}~~�����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������~~}||{zyyxwwvuutsrrqpponmmlkkjihhgffeddcbaa`__^]\\\\[ZZYXXWVUUTSSRQPPONNMLKKJIIHGGFEDDCBBA@??>==<;;:98876654332110/..-,,+**)(''&%%$#\"\"!  \u001f\u001e\u001e\u001d\u001c\u001b\u001b\u001a\u0019\u0019\u0018\u0017\u0016\u0016\u0015\u0014\u0014\u0013\u0012\u0011\u0011\u0010\u000f\u000f\u000e\r\r\f\u000b\n\n\t\b\b\u0007\u0006\u0005\u0005\u0004\u0003\u0003\u0002\u0001",
      "coreInfo": {
        "last_app": "",
        "last_heard": "2015-01-26T21:41:16.796Z",
        "connected": true,
        "deviceID": "54ff6c066672524835431267"
      }

It seems it starts with \u0001 through \u001f and then uses the actual ASCII characters?

I now generated a string with 255 characters (0x01-0xff) and requested it back from the cloud. The results are very strange; here are the actual values reaching the socket:

\u0001\u0002\u0003\u0004\u0005\u0006\u0007\b\t\n\u000b\f\r\u000e\u000f\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001a\u001b\u001c\u001d\u001e\u001f !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~������������� ... truncated

and now displayed as byte values in brackets:

 \(92) u(117) 0(48) 0(48) 0(48) 1(49) \(92) u(117) 0(48) 0(48) 0(48) 2(50) \(92) u(117) 0(48) 0(48) 0(48) 3(51) \(92) u(117) 0(48) 0(48) 0(48) 4(52) \(92) u(117) 0(48) 0(48) 0(48) 5(53) \(92) u(117) 0(48) 0(48) 0(48) 6(54) \(92) u(117) 0(48) 0(48) 0(48) 7(55) \(92) b(98) \(92) t(116) \(92) n(110) \(92) u(117) 0(48) 0(48) 0(48) b(98) \(92) f(102) \(92) r(114) \(92) u(117) 0(48) 0(48) 0(48) e(101) \(92) u(117) 0(48) 0(48) 0(48) f(102) \(92) u(117) 0(48) 0(48) 1(49) 0(48) \(92) u(117) 0(48) 0(48) 1(49) 1(49) \(92) u(117) 0(48) 0(48) 1(49) 2(50) \(92) u(117) 0(48) 0(48) 1(49) 3(51) \(92) u(117) 0(48) 0(48) 1(49) 4(52) \(92) u(117) 0(48) 0(48) 1(49) 5(53) \(92) u(117) 0(48) 0(48) 1(49) 6(54) \(92) u(117) 0(48) 0(48) 1(49) 7(55) \(92) u(117) 0(48) 0(48) 1(49) 8(56) \(92) u(117) 0(48) 0(48) 1(49) 9(57) \(92) u(117) 0(48) 0(48) 1(49) a(97) \(92) u(117) 0(48) 0(48) 1(49) b(98) \(92) u(117) 0(48) 0(48) 1(49) c(99) \(92) u(117) 0(48) 0(48) 1(49) d(100) \(92) u(117) 0(48) 0(48) 1(49) e(101) \(92) u(117) 0(48) 0(48) 1(49) f(102)  (32) !(33) \(92) "(34) #(35) $(36) %(37) &(38) '(39) ((40) )(41) *(42) +(43) ,(44) -(45) .(46) /(47) 0(48) 1(49) 2(50) 3(51) 4(52) 5(53) 6(54) 7(55) 8(56) 9(57) :(58) ;(59) <(60) =(61) >(62) ?(63) @(64) A(65) B(66) C(67) D(68) E(69) F(70) G(71) H(72) I(73) J(74) K(75) L(76) M(77) N(78) O(79) P(80) Q(81) R(82) S(83) T(84) U(85) V(86) W(87) X(88) Y(89) Z(90) [(91) \(92) \(92) ](93) ^(94) _(95) `(96) a(97) b(98) c(99) d(100) e(101) f(102) g(103) h(104) i(105) j(106) k(107) l(108) m(109) n(110) o(111) p(112) q(113) r(114) s(115) t(116) u(117) v(118) w(119) x(120) y(121) z(122) {(123) |(124) }(125) ~(126) (127) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) 
ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ﾿(191) ᄑ(189) ￯(239) ... truncated

All values above 127 are lost and turned into the sequence (239)(191)(189). This is really bad because I wanted to use this to transmit binary data.
Can this be fixed somehow?
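
For reference, (239)(191)(189) is 0xEF 0xBF 0xBD, the UTF-8 encoding of U+FFFD, the Unicode replacement character. The sketch below reproduces the same bytes in Java by decoding a lone high byte as UTF-8 and encoding it again. That the cloud does exactly this is an assumption; it is merely consistent with the dump above. `ReplacementDemo` and `roundTrip` are illustrative names.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementDemo {
    // Decodes one raw byte as UTF-8, re-encodes it, and returns the
    // resulting bytes as unsigned values.
    static int[] roundTrip(byte raw) {
        // Invalid input is replaced with U+FFFD by this String constructor.
        String decoded = new String(new byte[] {raw}, StandardCharsets.UTF_8);
        byte[] reencoded = decoded.getBytes(StandardCharsets.UTF_8);
        int[] out = new int[reencoded.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = reencoded[i] & 0xff;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(roundTrip((byte) 0x41)));
        System.out.println(java.util.Arrays.toString(roundTrip((byte) 0x80)));
        System.out.println(java.util.Arrays.toString(roundTrip((byte) 0xff)));
    }
}
```

Once a byte has been replaced this way, its original value cannot be recovered on the receiving side.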

Your binary data is getting encoded as JSON escape sequences:

\u0001 = 0x01
\u0002 = 0x02

\b = 0x08
\t = 0x09
\n = 0x0A
etc.
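
Any JSON parser undoes these escapes for you (`json_decode` in PHP, or a JSON library in Java). To show what happens under the hood, here is a minimal hand-written sketch in Java. `JsonUnescape`, `unescape` and `toBytes` are hypothetical names, and the code assumes it is given only the contents of the "result" string, without the surrounding quotes.

```java
public class JsonUnescape {
    // Decodes the escape sequences inside a JSON string value.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i + 1 >= s.length()) {
                throw new IllegalArgumentException("dangling backslash");
            }
            char e = s.charAt(++i);
            switch (e) {
                case '"':  out.append('"');  break;
                case '\\': out.append('\\'); break;
                case '/':  out.append('/');  break;
                case 'b':  out.append('\b'); break;
                case 'f':  out.append('\f'); break;
                case 'n':  out.append('\n'); break;
                case 'r':  out.append('\r'); break;
                case 't':  out.append('\t'); break;
                case 'u':
                    // Four hex digits follow the 'u'.
                    if (i + 4 >= s.length()) {
                        throw new IllegalArgumentException("truncated \\u escape");
                    }
                    out.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    throw new IllegalArgumentException("unknown escape: \\" + e);
            }
        }
        return out.toString();
    }

    // Maps each decoded character back to one byte. Characters above 0x7f
    // are rejected because they did not survive the transfer intact.
    static byte[] toBytes(String decoded) {
        byte[] out = new byte[decoded.length()];
        for (int i = 0; i < out.length; i++) {
            char c = decoded.charAt(i);
            if (c > 0x7f) {
                throw new IllegalArgumentException("non-ASCII character at index " + i);
            }
            out[i] = (byte) c;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = toBytes(unescape("\\u0001\\u0002\\b\\t\\n !\\\"#"));
        System.out.println(java.util.Arrays.toString(data));
    }
}
```

This only recovers bytes in the range 0x01-0x7f; anything that arrived as the replacement character is already lost.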

You should encode your binary data using only printable ASCII characters before sending it. There are a number of schemes for doing this, but base64 is probably the most common, so the other side of your connection should have an easy time converting it back.
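
On the receiving side in Java, base64 is available in the standard library as `java.util.Base64`. The sketch below does a round trip with 622 bytes of made-up test data. The encoding half is shown only to make the example self-contained, since the real sender would be the device firmware; `Base64Demo` and its methods are illustrative names.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64Demo {
    // Encodes arbitrary bytes using only printable ASCII characters.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Recovers the original bytes from a base64 string.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Made-up payload: 622 bytes cycling through the values 1..255.
        byte[] payload = new byte[622];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) (1 + (i % 255));
        }
        String text = encode(payload);
        System.out.println("encoded length: " + text.length());
        System.out.println("round trip ok: " + Arrays.equals(payload, decode(text)));
    }
}
```

Base64 turns every 3 bytes into 4 characters, so 622 bytes become 832 characters.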

@bko: thanks for confirming that the cloud string function only transmits the lower 7-bit ASCII table. Base64 is not the ideal option in my case because it uses only 6 bits per character. I will encode my data into 0x01-0x7f instead.

Hello,
When you send a String through the Spark cloud, it gets encoded as UTF-8. Only values between 0x00 and 0x7f (127 in decimal) are encoded as a single byte. A value higher than 127 is turned into multiple bytes to conform to the standard.

If possible, I would highly suggest you use a TCP server or TCP client (depending on what you need). These are really easy to set up, are much quicker than using the cloud, and do not encode anything, which means you can use any byte values you see fit.

Hope this helps!

Unfortunately, all values higher than 127 are mapped to the same sequence (239)(191)(189), as described further up in my post. TCP is only an option on the local network and will not work over the WAN. I need a solution that works outside the local network as well. It is kludgy, but I am now using 7-bit encoding and can live with that for now.