Monday, February 12, 2018

General flow of TensorFlow algorithms

1. Import or generate datasets
All of the machine-learning algorithms will depend on datasets. Datasets can either be generated or obtained from an outside source. Sometimes it is better to rely on generated data if your main aim is just to know the expected outcome. Most of the time, several public datasets can be found on the internet.
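
For example, a small regression dataset can be generated with NumPy when the expected outcome should be known in advance (the slope and intercept below are arbitrary values chosen only for illustration):

import numpy as np

# Hypothetical toy data: 100 points around y = 3x + 1 with a little noise,
# so the expected outcome (slope ~3, intercept ~1) is known up front.
x_data = np.random.rand(100, 1).astype(np.float32)
y_data = (3.0 * x_data + 1.0 + np.random.normal(0.0, 0.1, (100, 1))).astype(np.float32)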

2. Transform and normalize data
Normally, the input datasets do not come in the shape TensorFlow expects, so we need to transform them into an accepted shape. The data is usually not in the dimension or type that the algorithms expect, so we have to transform it before we can use it. Most algorithms also expect normalized data.
TensorFlow has built-in functions that can normalize the data as follows:
data = tf.nn.batch_norm_with_global_normalization(…)
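
If only a simple feature scaling is needed rather than batch normalization, a min-max rescaling can also be done by hand; this is just a sketch with a made-up NumPy array:

import numpy as np

# Made-up raw feature matrix; rescale each column into [0, 1].
raw = np.array([[1.0, 200.0],
                [2.0, 400.0],
                [3.0, 600.0]])
col_min = raw.min(axis=0)
col_max = raw.max(axis=0)
normalized = (raw - col_min) / (col_max - col_min)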

3. Partition datasets into train, test, and validation sets
We generally want to test our algorithms on different sets than we have trained on. Also, many algorithms require hyperparameter tuning, so we set aside a validation set for determining the best set of hyperparameters.
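
One simple way to do the split is by shuffling indices; the 80/10/10 ratio below is only an example (x_data is the generated data from step 1):

import numpy as np

num_samples = x_data.shape[0]        # x_data from the generation example above
indices = np.random.permutation(num_samples)
train_end = int(0.8 * num_samples)   # 80% train
valid_end = int(0.9 * num_samples)   # 10% validation, 10% test
train_idx = indices[:train_end]
valid_idx = indices[train_end:valid_end]
test_idx = indices[valid_end:]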

4. Set algorithm parameters (hyperparameters)
Our algorithms usually have a set of parameters that we hold constant throughout the procedure. For example, it can be the number of iterations, the learning rate, or other fixed parameters of our choosing. It is considered good form to initialize these together to keep the consistency through examples, for example:
learning_rate = 0.01
batch_size = 100
iterations = 1000

5. Initialize variables and placeholders
TensorFlow depends strongly on knowing what it can and cannot modify. It will modify/adjust the variables (weights and biases) during optimization to minimize a loss function. To accomplish this, we feed in data through placeholders. We need to initialize both variables and placeholders with a size and a data type (e.g. float32, float64, or float16), so that TensorFlow knows what to expect. Note that higher precision results in slower algorithms, while lower precision costs accuracy. See the following code as a small example:
a_var = tf.constant(42)
x_input = tf.placeholder(tf.float32, [None, input_size])
y_input = tf.placeholder(tf.float32, [None, num_classes])
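
The trainable weights and biases mentioned above would be declared as variables rather than placeholders; a minimal sketch, reusing input_size and num_classes from the placeholders above:

# Variables are the values TensorFlow is allowed to adjust during optimization;
# the shapes follow the placeholders declared above.
weight_matrix = tf.Variable(tf.random_normal([input_size, num_classes]))
b_matrix = tf.Variable(tf.zeros([num_classes]))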

6. Define the model structure
After we have the data and have initialized our variables and placeholders, we have to define the model. This is done by building a computational graph: we tell TensorFlow what operations must be done on the variables and placeholders to arrive at our model outcomes. For example, a linear model looks like:
y_pred = tf.add(tf.matmul(x_input, weight_matrix), b_matrix)

7. Declare the loss functions
After defining the model, we must be able to evaluate the output. This is where we declare the loss function. The loss function is very important, as it tells us how far off our predictions are from the actual values. The different types of loss functions are explored in greater detail later, for example in the back-propagation recipe; a mean squared error loss in TensorFlow looks like:
loss = tf.reduce_mean(tf.square(y_actual - y_pred))
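
For a classification model using the placeholders above, a cross-entropy loss is a common alternative; this is only a sketch, not the book's exact recipe:

# 'logits' is a hypothetical name for the unnormalized model output
# of shape [None, num_classes].
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_input, logits=logits))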

8. Initialize and train the model
Now that we have everything in place, we need to create an instance of our graph, feed in the data through the placeholders, and let TensorFlow change the variables to better predict our training data. One way to initialize the computational graph is:

with tf.Session(graph=graph) as session:
    ...
    session.run(...)
    ...

which is the same as:

session = tf.Session(graph=graph)
session.run(...)
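
Putting the pieces together, a minimal training loop might look like the sketch below; the optimizer choice and the feed values are assumptions for illustration, not prescribed by the book:

# Assumed: 'loss', 'x_input', 'y_input' are defined as in the earlier steps,
# and x_train / y_train are NumPy arrays holding the training data.
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_step = optimizer.minimize(loss)
init = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(init)
    for i in range(iterations):
        session.run(train_step, feed_dict={x_input: x_train, y_input: y_train})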

9. Evaluate the model
Once we have built and trained the model, we should evaluate it by checking how well it does on new data against some specified criteria. We evaluate on both the training and test sets; these evaluations allow us to see whether the model is underfitting or overfitting.
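
A sketch of such an evaluation for the running example, assuming x_train/y_train and x_test/y_test hold the partitioned data:

# Compare the loss on the training and test sets; a much lower training loss
# than test loss usually hints at overfitting, high loss on both at underfitting.
train_loss = session.run(loss, feed_dict={x_input: x_train, y_input: y_train})
test_loss = session.run(loss, feed_dict={x_input: x_test, y_input: y_test})
print(train_loss, test_loss)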

10. Tune hyperparameters
Most of the time, we will want to go back and change some of the hyperparameters based on the model's performance. We then repeat the previous steps with different hyperparameters and evaluate the model on the validation set.

11. Deploy/predict new outcomes

It is also important to know how to make predictions on new, unseen data. We can do this with all of our models once we have them trained.
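
Continuing the sketch above, prediction is just another session.run call on the model output (new_x is a hypothetical array of unseen inputs):

# 'new_x' is a hypothetical NumPy array of new, unseen inputs.
predictions = session.run(y_pred, feed_dict={x_input: new_x})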


How TensorFlow works:
In TensorFlow, we have to set up the data, variables, placeholders, and model before we tell the program to train and change the variables to improve the predictions. TensorFlow accomplishes this through computational graphs. These computational graphs are directed graphs with no recursion, which allows for computational parallelism. We create a loss function for TensorFlow to minimize, and TensorFlow does so by modifying the variables in the computational graph. TensorFlow knows how to modify the variables because it keeps track of the computations in the model and automatically computes the gradients for every variable. Because of this, we can see how easy it can be to make changes and try different data sources.
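
As a small illustration of the automatic gradient computation (the function below is an arbitrary example, not from the book):

import tensorflow as tf

x = tf.Variable(3.0)
y = x * x                        # y = x^2
grad = tf.gradients(y, [x])[0]   # automatically derived: dy/dx = 2x
with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    print(session.run(grad))     # prints 6.0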

From "TensorFlow Machine Learning Cookbook"

Tuesday, January 2, 2018

Veeam Backup for Microsoft Office365 - version 1.0.0.912

Installation Process of Veeam Backup for Microsoft Office365 (version 1.0.0.912)

- install the software (version 1.0.0.912)
- install software license
- install KB2331 fix to resolve the error "exchange version is not supported"
- add all organisations
- add a mailbox backup job for each organisation and schedule a time to run; at this stage an In-Place Archive mailbox also takes one license seat
- record the Veeam Repository settings and retention settings
- configure email alerting for jobs
- create a summary of all jobs running

Q: will re-installation cause the previous repository (same folder) to be disposed?

Sunday, December 31, 2017

DeepXplore Env Setup on macOS Sierra 10.12

A brief guide to run the code of DeepXplore:

1. Install Python

2. install virtualenv

3. In virtual env:
  • Install TensorFlow
  • install keras
  • import keras once inside the virtualenv to generate the keras.json file
  • set the Keras backend to tensorflow (see the check after this list)
  • install Pillow
  • install h5py
  • install opencv-python
  • install Mimicus
4. download or clone the DeepXplore project from GitHub, and the Drebin data

5. Run the commands below under each dataset directory:
  • MNIST: python gen_diff.py blackout 1 0.1 10 20 50 0
  • ImageNet: python gen_diff.py occl 1 0.1 10 20 50 0
  • Driving: python gen_diff.py light 1 0.1 10 20 50 0
  • PDF: python gen_diff.py 2 0.1 10 20 50 0
  • Drebin: python gen_diff.py 1 0.5 20 50 0
6. remember to check GitHub's issues section for project updates.
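
To confirm the backend switch from step 3, a quick check inside the activated virtualenv (assuming keras.json has already been edited so that "backend" is "tensorflow"):

# Run inside the activated virtualenv; keras.json lives at ~/.keras/keras.json
# and its "backend" field should be set to "tensorflow".
import keras
print(keras.backend.backend())   # expected: tensorflow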



some Fedora commands:
# yum install python-virtualenv
# virtualenv --python /usr/bin/python2.7 env (this will create a dir called env under your virtualenv environment, fedora default is ~)
# . env/bin/activate (activate the created env)
## to activate keras and get the keras.json file
$ python (start the interpreter, then import keras to generate keras.json)
>>> import keras
>>> quit()
## some packages might be missing when installing Mimicus
# yum install freetype-devel
# python -m pip install pkgconfig






Saturday, December 30, 2017

Run J2EE with AWS EC2 & RDS

Process of setting up a web app running with Wildfly and PostgreSQL using AWS EC2 & RDS.

1. create an AWS account and log in (all AWS services used below are applicable to free-tier users)
  • select your country & region to make sure the billing is correct

2. setup an EC2 instance
  • Instance Type: t2.micro
  • OS: Centos 7
  • Subnet: choose any one in the dropdown list, but remember to pick the same one when setting up the subsequent RDS instance
  • Storage/Disk: select Magnetic or SSD, and check the "Delete on Termination" option to avoid extra charges
  • Security Group: allow minimum rules for access, port numbers needed: 22, 8080, 5432
  • Key Pair: download and change permission to 400
  • Connection: using ssh with key auth

3. setup the RDS instance
  • Engine: PostgreSQL (enable free-tier use only)
  • Storage: 20GB by default
  • DB instance identifier
  • Master user name (username of a postgres db user)
  • Availability: public
  • Zone/Subnet: same as EC2 instance's
  • Security Group: allow access from IPs that need to connect to this DB, e.g. your current IP and the EC2 instance's IP/subnet (e.g. 172.31.0.0/16), on port 5432
  • Database Name
  • finish the setup
  • edit the instance properties -> Parameter Group -> New -> enter a group name -> change the "max_prepared_transactions" property to the value of 300
  • connect to the RDS instance from an allowed host (e.g. the EC2 instance over ssh) and test the DB with psql, command: psql --host=[your RDS instance's DNS] --dbname=[your database's name] --user=[your database's user]
  • you may need to reboot the RDS instance to apply any immediate changes

4. Configure the EC2 instance, install software
  • connect to the EC2 instance over ssh with the key pair, as the centos user
  • switch to root user
  • change the instance timezone to reflect your region
  • install Java 8, setup $JAVA_HOME
  • install Apache Maven, setup mvn variables
  • install Wildfly 11
  • install git and clone the target repository

5. Configure the git repository, and run it with Wildfly as a J2EE Project
  • edit start_webapp.sh so that the Wildfly variables inside reflect your instance's configuration, mainly $JBOSS_HOME
  • change Wildfly's standalone.xml to add a datasource matching your RDS instance; this includes the RDS instance's DNS, DB name, and DB connection user name and password
  • edit start_webapp.sh to add -b 0.0.0.0 to allow public access to the Wildfly server
  • start the Wildfly server using start_webapp.sh, and visit your project's home page through a browser on port 8080

6. Stop the EC2 instance & RDS instance
  • you may also delete them to prevent further charges; remember to check your billing dashboard for fees & credits incurred.