Markov movie critic - part 2 - tokenization
We’ll continue the plan to rate movies with Markov chains.
This time we’ll tokenize the input.
Testing is an important part of writing an application. There are many decisions to make about what, how and when to test. I find it helps to think about the costs and values of (potential) tests when reasoning or talking about them.
This text explains what these costs and values are, along with some of the guidelines I derived from them.
When playing around in Polymer, we encounter iron signals. Besides normal JavaScript events, this gives us a lot of options for dealing with events. Let's try it all out! We can catch events using the on-event attribute:
This is the first in a series of posts in which we will use machine learning to rate movies. For this task we're not going to watch all the movies; I assume it's good enough to just read the plot. We'll use Markov chains to rate the movies, and as an added bonus we can also generate new movie plots for awesome (or terrible) movies. In this first part we'll get the data and change it into a more usable format. We can use the data from IMDB, which is published at ftp://ftp.fu-berlin.de/pub/misc/movies/database/. Of interest are the plots and the ratings.
Plots look like this:
Today I'll show how you can create a simple stub server with Drakov. If you do some frontend programming, you've probably already installed npm (Node Package Manager); otherwise here is how you install that. Then with npm you can install Drakov:
$ sudo npm install -g drakov
In object-oriented languages like Java, this refers to the instance of the class on which you run the method. In JavaScript this is often also the case, but not always. In this post we'll explore some situations, and I'll give some tips on how to deal with them. First the normal case: the function eat is defined on carrot, and we simply run it. Here this refers to the enclosing object, which is carrot, with name "carrot".
var carrot = {
  name: "carrot",
  eat: function () {
    console.log(this);
    console.log("eating " + this.name);
  }
};
carrot.eat(); //result: logs the carrot object, then "eating carrot"
When you want to limit the number of messages an actor gets, you can use the throttler in akka-contrib. It lets you limit the maximum transactions per second (tps) and queues up the surplus. Here I'll describe another way: rejecting all the surplus messages. This has the advantage that the requester knows it's sending too much and can act on that. Both methods have their advantages, and both have limits, since they still require resources to queue or reject the messages. In Akka we can create an Actor that passes messages through to the target actor, or rejects them when the rate exceeds the specified tps.
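import akka.actor.{Actor, ActorLogging, ActorRef}
import scala.concurrent.ExecutionContext
import scala.concurrent.duration._
// DateTime comes from a date-time library whose import this excerpt does not show.
object ThrottleActor {
  object OneSecondLater
  object Accepted
  object ExceededMaxTps
}
import ThrottleActor._
class ThrottleActor (target: ActorRef, maxTps: Int) extends Actor with ActorLogging {
  implicit val executionContext: ExecutionContext = context.dispatcher
  // Send ourselves a tick every second so the counter can be reset
  context.system.scheduler.schedule(1.second, 1.second, self, OneSecondLater)
  var messagesThisSecond: Int = 0
  def receive = {
    case OneSecondLater =>
      log.info(s"OneSecondLater ${DateTime.now} $messagesThisSecond requests.")
      messagesThisSecond = 0
    // Over the limit: reject, but keep counting so the log shows the real load
    case message if messagesThisSecond >= maxTps =>
      sender ! ExceededMaxTps
      messagesThisSecond += 1
      log.info(s"ExceededMaxTps ${DateTime.now} $messagesThisSecond requests.")
    // Under the limit: acknowledge and pass the message on to the target
    case message =>
      sender ! Accepted
      target ! message
      messagesThisSecond += 1
  }
}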
Iterating over a Map is slightly more complex than iterating over other collections, because a Map is the combination of two collections: the keys and the values.
val m: Map[Int, String] = Map(1 -> "a", 2 -> "b")
m.keys // = Iterable[Int] = Set(1, 2)
m.values // = Iterable[String] = MapLike("a", "b")
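Because every element of a Map is a (key, value) pair, a pattern match can name both halves while iterating. The following is a minimal sketch using the same m as above; the object name MapIteration and the val names are only for illustration and are not taken from the original post.

```scala
object MapIteration {
  val m: Map[Int, String] = Map(1 -> "a", 2 -> "b")

  // Each element is a (key, value) tuple, so a case pattern can name both parts.
  val described: List[String] =
    m.toList.map { case (key, value) => s"$key -> $value" }

  // The same iteration written as a for comprehension.
  val viaFor: List[String] =
    (for ((key, value) <- m) yield s"$key=$value").toList

  // Transforming only the values while keeping the keys.
  val upper: Map[Int, String] =
    m.map { case (key, value) => key -> value.toUpperCase }

  def main(args: Array[String]): Unit = {
    described.foreach(println)
    viaFor.foreach(println)
    println(upper)
  }
}
```

The pattern match avoids the less readable ._1 and ._2 accessors on the tuple.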
Sometimes we have multiple Options and only want to do something when they're all set. In this example we have a property file with multiple configuration values for one thing: a host and a port. We only want to use them if they're both set.
//The individual properties
val stubHost: Option[String] = Some("host")
val stubPort: Option[Int] = Some(8090)
//The case class I'll turn them into
case class StubConfig(host: String, port: Int)
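One common way to combine the two Options is a for comprehension, which only yields a value when every Option is defined. This is a sketch of that approach and not necessarily the one the full post uses; the object name StubConfigExample and the helper combine are hypothetical names chosen for illustration.

```scala
object StubConfigExample {
  case class StubConfig(host: String, port: Int)

  // Yields Some(StubConfig) only when both options are defined, otherwise None.
  // `combine` is a hypothetical helper name, not part of the original post.
  def combine(host: Option[String], port: Option[Int]): Option[StubConfig] =
    for {
      h <- host
      p <- port
    } yield StubConfig(h, p)

  def main(args: Array[String]): Unit = {
    println(combine(Some("host"), Some(8090))) // prints Some(StubConfig(host,8090))
    println(combine(Some("host"), None))       // prints None
  }
}
```

The for comprehension is shorthand for host.flatMap(h => port.map(p => StubConfig(h, p))), so it scales to more than two Options without nesting.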
We can use the Spray JSON parser for uses other than a REST API. We add spray-json to our dependencies. Our build.gradle:
apply plugin: 'scala'
version = '1.0'
repositories {
    mavenCentral()
}
dependencies {
    compile group: 'io.spray', name: 'spray-json_2.11', version: '1.3.1'
}
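With that dependency on the classpath, parsing a JSON string outside of any REST context could look roughly like the sketch below. The Robot case class, its fields and the object name RobotJson are assumptions made for illustration; only parseJson, convertTo, toJson, compactPrint, DefaultJsonProtocol and jsonFormat2 are spray-json's own API.

```scala
import spray.json._
import DefaultJsonProtocol._

// Hypothetical model class used only for this example.
case class Robot(name: String, amountOfArms: Int)

object RobotJson {
  // jsonFormat2 derives a format for a case class with two fields.
  implicit val robotFormat: RootJsonFormat[Robot] = jsonFormat2(Robot)

  // Parse a JSON string into a Robot.
  def parse(json: String): Robot = json.parseJson.convertTo[Robot]

  def main(args: Array[String]): Unit = {
    val robot = parse("""{"name": "C3PO", "amountOfArms": 2}""")
    println(robot)
    // Serialize it back to a compact JSON string.
    println(robot.toJson.compactPrint)
  }
}
```

If the input is missing a field, convertTo throws a DeserializationException, so callers that read untrusted input may want to wrap the call in scala.util.Try.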
Both the JVM and keytool have problems dealing with keystores without a password. If you try to get a listing of the keystore it will think you didn't provide a password and output falsehoods:
$ keytool -list -storetype pkcs12 -keystore keystoreWithoutPassword.p12
Enter keystore password:
***************** WARNING WARNING WARNING *****************
* The integrity of the information stored in your keystore *
* has NOT been verified! In order to verify its integrity, *
* you must provide your keystore password. *
***************** WARNING WARNING WARNING *****************
Keystore type: PKCS12
Keystore provider: SunJSSE
Your keystore contains 1 entry
tammo, Oct 14, 2015, SecretKeyEntry,
In a previous blog post we made an API with spray. Now we're going to load test it. For this, we will use Gatling (http://gatling.io/#/). In a Scala class we can write exactly what we want to test and how we want to run it. In this test, we will do a POST to our API and create a new robot called C3PO. We will do this 1000 times per second and keep doing so for 10 seconds, for a total of 10000 C3POs! RobotsLoadTest.scala:
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._
class RobotsLoadTest extends Simulation {
  val baseUrl = "http://localhost:8080" //We need to have our API running here
  val httpProtocol = http
    .baseURL(baseUrl)
    .inferHtmlResources()
    .acceptEncodingHeader("gzip,deflate")
    .contentTypeHeader("application/json")
    .userAgentHeader("Apache-HttpClient/4.1.1 (java 1.5)")
  val s = scenario("Simulation")
    .exec(http("request_0")
      .post("/robots")
      .body(StringBody("""{
                         | "name": "C3PO",
                         | "amountOfArms": 2
                         |}""".stripMargin))
    )
  setUp(s.inject(constantUsersPerSec(1000) during(10 seconds))).protocols(httpProtocol)
}