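The Unsafe class is used by many classes in the JDK source. It exposes low-level functionality that bypasses the JVM's usual safeguards, and code built on it can be more efficient. It is a double-edged sword, though: as the name suggests, it is unsafe, and memory it allocates must be freed manually (the GC does not reclaim it). Unsafe also offers a simple alternative to some JNI functionality: it stays efficient while making things simpler.
This article is mainly a compilation and translation of the following article.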
http://mishadoff.com/blog/java-magic-part-4-sun-dot-misc-dot-unsafe/
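1. Most methods of the Unsafe API are native. It consists of 105 methods, which fall into the following groups:
(1) Info. Returns low-level memory information: addressSize(), pageSize()
(2) Objects. Methods for manipulating objects and their fields: allocateInstance(), objectFieldOffset()
(3) Class. Methods for manipulating classes and their static fields: staticFieldOffset(), defineClass(), defineAnonymousClass(), ensureClassInitialized()
(4) Arrays. Array manipulation methods: arrayBaseOffset(), arrayIndexScale()
(5) Synchronization. Low-level synchronization primitives (such as the CPU-based CAS, Compare-And-Swap, primitive): monitorEnter(), tryMonitorEnter(), monitorExit(), compareAndSwapInt(), putOrderedInt()
(6) Memory. Direct memory access methods (bypassing the JVM heap and manipulating native memory directly): allocateMemory(), copyMemory(), freeMemory(), getAddress(), getInt(), putInt()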
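2. Obtaining an instance of the Unsafe class
Unsafe is designed to be used only by classes loaded by the bootstrap class loader, which the JVM trusts, and it is a typical singleton class. Its instance accessor looks like this:
public static Unsafe getUnsafe() {
    Class cc = sun.reflect.Reflection.getCallerClass(2);
    if (cc.getClassLoader() != null)
        throw new SecurityException("Unsafe");
    return theUnsafe;
}
Calling Unsafe.getUnsafe() directly from a class that was not loaded by the bootstrap class loader throws a SecurityException: getClassLoader() returns null only for bootstrap classes, and under the JVM's parent-delegation model application classes are loaded by a different loader.
There are two workarounds. One is to use the JVM option -Xbootclasspath so that the calling class is loaded as a bootstrap class; the other is Java reflection.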
Field f = Unsafe.class.getDeclaredField("theUnsafe");
f.setAccessible(true);
Unsafe unsafe = (Unsafe) f.get(null);
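The private singleton field is forced open with setAccessible(true), and Field.get() then returns an Object that is cast to Unsafe. In an IDE such as Eclipse these references are flagged as errors; that can be changed with the following setting: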
Preferences -> Java -> Compiler -> Errors/Warnings ->
Deprecated and restricted API -> Forbidden reference -> Warning
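The snippets in the rest of this article call a getUnsafe() helper that is never shown. A minimal sketch of such a helper follows. The class name UnsafeAccess is made up for illustration, and the code assumes a HotSpot/OpenJDK runtime where the singleton is stored in a private static field named "theUnsafe", as in the reflection snippet above.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical helper class; only the reflection trick comes from the article.
final class UnsafeAccess {
    // Look the instance up once and reuse it.
    private static final Unsafe INSTANCE = load();

    private UnsafeAccess() {}

    static Unsafe getUnsafe() {
        return INSTANCE;
    }

    private static Unsafe load() {
        try {
            // Assumption: the singleton lives in a private static field
            // named "theUnsafe" (true on HotSpot/OpenJDK).
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    public static void main(String[] args) {
        // Two calls from the "Info" group as a smoke test.
        System.out.println("address size: " + getUnsafe().addressSize());
        System.out.println("page size:    " + getUnsafe().pageSize());
    }
}
```

Wrapping the reflective exceptions in an unchecked IllegalStateException keeps call sites free of try/catch blocks, which is convenient for short experiments.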
3. "Interesting" use cases for the Unsafe class
(1) Bypassing initialization. allocateInstance() is useful when you want to skip an object's constructor or its security checks, or when the class has no public constructor.
class A {
    private long a; // not initialized value

    public A() {
        this.a = 1; // initialization
    }

    public long a() { return this.a; }
}
Below is a comparison of the constructor, reflection, and allocateInstance():
A o1 = new A(); // constructor
o1.a(); // prints 1

A o2 = A.class.newInstance(); // reflection
o2.a(); // prints 1

A o3 = (A) unsafe.allocateInstance(A.class); // unsafe
o3.a(); // prints 0
allocateInstance() never enters the constructor at all, which looks like trouble for the singleton pattern.
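To make the singleton concern concrete, here is a small sketch. The Registry class is a hypothetical singleton invented for this example, and Unsafe is obtained through the reflection approach from section 2 (assuming the "theUnsafe" field exists on the running JVM).

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class SingletonBypassDemo {

    // Hypothetical singleton, used only for this illustration.
    static final class Registry {
        static final Registry INSTANCE = new Registry();
        final long createdBy;

        private Registry() {
            this.createdBy = 1; // only runs when the constructor runs
        }
    }

    static Unsafe getUnsafe() {
        try {
            // Assumption: the field is named "theUnsafe" (HotSpot/OpenJDK).
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Creates a second Registry without ever calling its private constructor.
    static Registry allocateWithoutConstructor() {
        try {
            return (Registry) getUnsafe().allocateInstance(Registry.class);
        } catch (InstantiationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Registry second = allocateWithoutConstructor();
        System.out.println(second != Registry.INSTANCE); // a distinct object
        System.out.println(second.createdBy);            // field keeps its default value
    }
}
```

The private constructor no longer guarantees a single instance, and the extra instance skips whatever invariants the constructor was supposed to establish.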
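(2) Memory modification
Modifying memory directly is common in C; in Java, it can be used to bypass security checks.
Consider the following simple access check:
class Guard {
    private int ACCESS_ALLOWED = 1;

    public boolean giveAccess() {
        return 42 == ACCESS_ALLOWED;
    }
}
Under normal circumstances giveAccess() always returns false, but that is not always the case:
Guard guard = new Guard();
guard.giveAccess(); // false, no access

// bypass
Unsafe unsafe = getUnsafe();
Field f = guard.getClass().getDeclaredField("ACCESS_ALLOWED");
unsafe.putInt(guard, unsafe.objectFieldOffset(f), 42); // memory corruption

guard.giveAccess(); // true, access granted
By looking up the field's memory offset and calling putInt(), the object's ACCESS_ALLOWED field is overwritten. When the class layout is known, a field's offset can always be computed (just like computing the offset of a data member in a C++ class).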
(3) Implementing a C-like sizeOf() function
Combining Java reflection with objectFieldOffset() gives a C-like sizeOf() function.
public static long sizeOf(Object o) {
    Unsafe u = getUnsafe();
    HashSet<Field> fields = new HashSet<Field>();
    Class c = o.getClass();
    while (c != Object.class) {
        for (Field f : c.getDeclaredFields()) {
            if ((f.getModifiers() & Modifier.STATIC) == 0) {
                fields.add(f);
            }
        }
        c = c.getSuperclass();
    }

    // get offset
    long maxSize = 0;
    for (Field f : fields) {
        long offset = u.objectFieldOffset(f);
        if (offset > maxSize) {
            maxSize = offset;
        }
    }

    return ((maxSize/8) + 1) * 8; // padding
}
The idea is straightforward: starting from the most-derived class, collect the non-static fields of the class and of all its superclasses into a HashSet (duplicates are counted once; Java has single inheritance), use objectFieldOffset() to find the largest offset, and finally round up for alignment.
On a 32-bit JVM, the size can also be obtained by reading the long at offset 12 of the object's internal class structure.
public static long sizeOf(Object object){
    return getUnsafe().getAddress(
        normalize(getUnsafe().getInt(object, 4L)) + 12L);
}
Here normalize() converts a signed int to an unsigned long:
private static long normalize(int value) {
    if(value >= 0) return value;
    return (~0L >>> 32) & value;
}
Both sizeOf() implementations report the same size. The most standard sizeOf() implementation uses java.lang.instrument, but it requires the -javaagent command-line option.
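For comparison, the java.lang.instrument route can be sketched as follows. The class name SizeAgent is an assumption made for this example. In a real agent the class would be declared public in its own file, packaged in a jar whose manifest names it in the Premain-Class attribute, and the JVM would be started with -javaagent pointing at that jar.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical agent class; declare it public in a real agent jar.
final class SizeAgent {
    private static volatile Instrumentation instrumentation;

    // Called by the JVM before main() when the agent is loaded via -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    // Returns the JVM's own estimate of the shallow size of the object.
    static long sizeOf(Object o) {
        Instrumentation inst = instrumentation;
        if (inst == null) {
            throw new IllegalStateException(
                "agent not loaded; start the JVM with -javaagent");
        }
        return inst.getObjectSize(o);
    }
}
```

Instrumentation.getObjectSize() reports an implementation-specific approximation of the object's own storage, so like the Unsafe versions it does not follow references to other objects.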
(4) Implementing a shallow copy in Java
The standard ways to make a shallow copy are implementing the Cloneable interface or writing your own copy function; neither is general-purpose. Combined with sizeOf(), a generic shallow copy becomes possible.
static Object shallowCopy(Object obj) {
    long size = sizeOf(obj);
    long start = toAddress(obj);
    long address = getUnsafe().allocateMemory(size);
    getUnsafe().copyMemory(start, address, size);
    return fromAddress(address);
}
toAddress() and fromAddress() below convert an object to its address and back.
static long toAddress(Object obj) {
    Object[] array = new Object[] {obj};
    long baseOffset = getUnsafe().arrayBaseOffset(Object[].class);
    return normalize(getUnsafe().getInt(array, baseOffset));
}

static Object fromAddress(long address) {
    Object[] array = new Object[] {null};
    long baseOffset = getUnsafe().arrayBaseOffset(Object[].class);
    getUnsafe().putLong(array, baseOffset, address);
    return array[0];
}
The shallow-copy function above can be applied to any Java object; the size is computed dynamically.
(5) Erasing passwords from memory
Passwords are often stored in a String, but when a String is reclaimed is up to the JVM. The safest approach is to overwrite the value once the password is no longer needed.
Field stringValue = String.class.getDeclaredField("value");
stringValue.setAccessible(true);
char[] mem = (char[]) stringValue.get(password);
for (int i=0; i < mem.length; i++) {
    mem[i] = '?';
}
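The reflective snippet above depends on String internals: since Java 9 the value field is a byte[] rather than a char[], and since Java 16 reflective access to it is blocked by default. A commonly used alternative, sketched below, is to keep the password in a char[] from the start and wipe it after use. The authenticate() method is a hypothetical stand-in for whatever consumes the password.

```java
import java.util.Arrays;

final class PasswordWipeDemo {

    // Hypothetical consumer of the password, for illustration only.
    static boolean authenticate(char[] password) {
        return password.length > 0;
    }

    // Uses the password, then overwrites it even if authenticate() throws.
    static boolean loginAndWipe(char[] password) {
        try {
            return authenticate(password);
        } finally {
            Arrays.fill(password, '\0');
        }
    }

    public static void main(String[] args) {
        char[] password = {'s', 'e', 'c', 'r', 'e', 't'};
        boolean ok = loginAndWipe(password);
        System.out.println("authenticated: " + ok);
        System.out.println("wiped: " + (password[0] == '\0'));
    }
}
```

Because the caller owns the array, no reflection is needed and the wipe happens in a finally block regardless of how the password was used.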
(6) Loading classes dynamically
The standard way to load a class dynamically is Class.forName() (memorable from writing JDBC programs); Unsafe can also load a Java class file dynamically.
byte[] classContents = getClassContent();
Class c = getUnsafe().defineClass(null, classContents, 0, classContents.length);
c.getMethod("a").invoke(c.newInstance(), null); // 1
The getClassContent() method reads a class file into a byte array.
private static byte[] getClassContent() throws Exception {
    File f = new File("/home/mishadoff/tmp/A.class");
    FileInputStream input = new FileInputStream(f);
    byte[] content = new byte[(int)f.length()];
    input.read(content);
    input.close();
    return content;
}
This can be used for dynamic loading, proxies, aspects, and similar features.
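(7) Throwing a checked exception as if it were a runtime exception
getUnsafe().throwException(new IOException());
The exception is thrown without the compiler forcing you to catch or declare it. You can do this when you do not want to handle a checked exception (not recommended).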
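The same effect can be reproduced without Unsafe. The sketch below uses a different technique, the generics-based "sneaky throw", which relies on type erasure rather than on throwException(); the class and method names are made up for this example.

```java
import java.io.IOException;

final class SneakyThrowDemo {

    // The cast to E is unchecked and erased at runtime, so any Throwable
    // passes through; the compiler infers E from the call site.
    @SuppressWarnings("unchecked")
    static <E extends Throwable> void sneakyThrow(Throwable t) throws E {
        throw (E) t;
    }

    // Declares no checked exceptions, yet an IOException escapes from it.
    static void failWithoutThrowsClause() {
        SneakyThrowDemo.<RuntimeException>sneakyThrow(new IOException("disk gone"));
    }

    public static void main(String[] args) {
        try {
            failWithoutThrowsClause();
        } catch (Exception e) {
            // catch (IOException e) would not compile here, because the
            // compiler believes no IOException can be thrown.
            System.out.println(e.getClass().getName());
        }
    }
}
```

Either way, callers lose the compiler's help in knowing which exceptions to expect, which is why the technique is not recommended.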
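(8) Fast serialization
Standard Java Serializable is slow, and it needs an accessible no-arg constructor in the nearest non-serializable superclass. Externalizable is faster, but it requires a public no-arg constructor and you must define how the class is written and read. Popular high-performance serialization libraries such as kryo depend on third-party libraries, which increases memory consumption. With Unsafe, the actual field values of an object can be read with getInt(), getLong(), getObject() and similar methods, and persisted to a file together with information such as the class name. kryo has experimented with Unsafe, but there are no concrete figures on the performance gain. (http://code.google.com/p/kryo/issues/detail?id=75)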
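(9) Allocating memory outside the Java heap
Java's new allocates memory for an object on the heap, and the object is managed by the JVM's GC for its whole lifetime.
class SuperArray {
    private final static int BYTE = 1;

    private long size;
    private long address;

    public SuperArray(long size) {
        this.size = size;
        address = getUnsafe().allocateMemory(size * BYTE);
    }

    public void set(long i, byte value) {
        getUnsafe().putByte(address + i * BYTE, value);
    }

    public int get(long idx) {
        return getUnsafe().getByte(address + idx * BYTE);
    }

    public long size() {
        return size;
    }
}
Memory allocated through Unsafe is not limited by Integer.MAX_VALUE and lives outside the heap. Use it with great care: forgetting to free it manually causes a memory leak, and accessing an invalid address crashes the JVM. It is useful when you need large contiguous regions or in real-time programming (where JVM latency cannot be tolerated). java.nio uses this technique.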
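The SuperArray class above never releases its memory. One way to make the manual free harder to forget is to tie it to try-with-resources, as in the sketch below. The OffHeapBytes class and its bounds checks are additions made for this example, and Unsafe is again obtained reflectively (assuming the "theUnsafe" field exists on the running JVM).

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical off-heap byte buffer that frees its memory in close().
final class OffHeapBytes implements AutoCloseable {
    private static final Unsafe UNSAFE = loadUnsafe();

    private final long size;
    private long address;

    OffHeapBytes(long size) {
        this.size = size;
        this.address = UNSAFE.allocateMemory(size);
        // allocateMemory() returns uninitialized memory, so zero it explicitly.
        UNSAFE.setMemory(address, size, (byte) 0);
    }

    void set(long i, byte value) {
        checkIndex(i);
        UNSAFE.putByte(address + i, value);
    }

    byte get(long i) {
        checkIndex(i);
        return UNSAFE.getByte(address + i);
    }

    // Guards against the two failure modes: use after free and bad addresses.
    private void checkIndex(long i) {
        if (address == 0) throw new IllegalStateException("already freed");
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index " + i);
    }

    @Override
    public void close() {
        if (address != 0) {
            UNSAFE.freeMemory(address);
            address = 0;
        }
    }

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        try (OffHeapBytes bytes = new OffHeapBytes(1024)) {
            bytes.set(10, (byte) 7);
            System.out.println(bytes.get(10));
        } // freeMemory() runs here, even if the body throws
    }
}
```

The bounds checks cost a little speed, but they turn a potential JVM crash into an ordinary exception.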
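(10) Use in Java concurrency
The Unsafe compare-and-swap methods (compareAndSwapInt(), compareAndSwapLong(), compareAndSwapObject()) can be used to build efficient lock-free data structures.
class CASCounter implements Counter {
    private volatile long counter = 0;
    private Unsafe unsafe;
    private long offset;

    public CASCounter() throws Exception {
        unsafe = getUnsafe();
        offset = unsafe.objectFieldOffset(CASCounter.class.getDeclaredField("counter"));
    }

    @Override
    public void increment() {
        long before = counter;
        while (!unsafe.compareAndSwapLong(this, offset, before, before + 1)) {
            before = counter;
        }
    }

    @Override
    public long getCounter() {
        return counter;
    }
}
In tests, this data structure performs about the same as Java's atomic variables. The atomic classes also use Unsafe's compare-and-swap methods, which ultimately map to the corresponding CPU primitive, so they are very efficient. There is a design for a lock-free HashMap at http://www.azulsystems.com/about_us/presentations/lock-free-hash (the idea is to analyze each state, create a copy, modify the copy, and publish it with a CAS primitive, spinning on failure). On ordinary server machines (fewer than 32 cores), ConcurrentHashMap is clearly sufficient. Before JDK 8 it was implemented with 16-way lock striping by default; in JDK 8 it was rewritten around CAS with locking only on individual bins.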
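For comparison, the same retry loop can be written against the public AtomicLong API instead of Unsafe. This is a sketch for illustration: the class name and the runThreads() harness are made up here, and AtomicLong.incrementAndGet() would normally replace the explicit loop.

```java
import java.util.concurrent.atomic.AtomicLong;

final class AtomicCounterDemo {
    private final AtomicLong counter = new AtomicLong();

    // Same shape as CASCounter.increment(): read, try to swap, retry on failure.
    void increment() {
        long before;
        do {
            before = counter.get();
        } while (!counter.compareAndSet(before, before + 1));
    }

    long get() {
        return counter.get();
    }

    // Hypothetical harness: several threads hammer one counter.
    static long runThreads(int threads, int incrementsPerThread) {
        AtomicCounterDemo c = new AtomicCounterDemo();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    c.increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return c.get();
    }

    public static void main(String[] args) {
        System.out.println(runThreads(4, 10_000)); // no increments are lost
    }
}
```

A failed compareAndSet() means another thread changed the value in between, so the loop simply rereads and tries again; no thread ever blocks while holding a lock.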
// Below is the source code of the sun.misc.Unsafe.java class
package sun.misc;
import java.lang.reflect.Field;
/**
 * This class should provide access to low-level operations and its
 * use should be limited to trusted code.  Fields can be accessed using
 * memory addresses, with undefined behaviour occurring if invalid memory
 * addresses are given.
 *
 * @author Tom Tromey (tromey@redhat.com)
 * @author Andrew John Hughes (gnu_andrew@member.fsf.org)
 */
public class Unsafe
{
    // Singleton class.
    private static Unsafe unsafe = new Unsafe();

    /**
     * Private default constructor to prevent creation of an arbitrary
     * number of instances.
     */
    private Unsafe()
    {
    }

    /**
     * Retrieve the singleton instance of <code>Unsafe</code>.  The calling
     * method should guard this instance from untrusted code, as it provides
     * access to low-level operations such as direct memory access.
     *
     * @throws SecurityException if a security manager exists and prevents
     *                           access to the system properties.
     */
    public static Unsafe getUnsafe()
    {
        SecurityManager sm = System.getSecurityManager();
        if (sm != null)
            sm.checkPropertiesAccess();
        return unsafe;
    }

    /**
     * Returns the memory address offset of the given static field.
     * The offset is merely used as a means to access a particular field
     * in the other methods of this class.  The value is unique to the given
     * field and the same value should be returned on each subsequent call.
     *
     * @param field the field whose offset should be returned.
     * @return the offset of the given field.
     */
    public native long objectFieldOffset(Field field);

    /**
     * Compares the value of the integer field at the specified offset
     * in the supplied object with the given expected value, and updates
     * it if they match.  The operation of this method should be atomic,
     * thus providing an uninterruptible way of updating an integer field.
     *
     * @param obj the object containing the field to modify.
     * @param offset the offset of the integer field within <code>obj</code>.
     * @param expect the expected value of the field.
     * @param update the new value of the field if it equals <code>expect</code>.
     * @return true if the field was changed.
     */
    public native boolean compareAndSwapInt(Object obj, long offset,
                                            int expect, int update);

    /**
     * Compares the value of the long field at the specified offset
     * in the supplied object with the given expected value, and updates
     * it if they match.  The operation of this method should be atomic,
     * thus providing an uninterruptible way of updating a long field.
     *
     * @param obj the object containing the field to modify.
     * @param offset the offset of the long field within <code>obj</code>.
     * @param expect the expected value of the field.
     * @param update the new value of the field if it equals <code>expect</code>.
     * @return true if the field was changed.
     */
    public native boolean compareAndSwapLong(Object obj, long offset,
                                             long expect, long update);

    /**
     * Compares the value of the object field at the specified offset
     * in the supplied object with the given expected value, and updates
     * it if they match.  The operation of this method should be atomic,
     * thus providing an uninterruptible way of updating an object field.
     *
     * @param obj the object containing the field to modify.
     * @param offset the offset of the object field within <code>obj</code>.
     * @param expect the expected value of the field.
     * @param update the new value of the field if it equals <code>expect</code>.
     * @return true if the field was changed.
     */
    public native boolean compareAndSwapObject(Object obj, long offset,
                                               Object expect, Object update);

    /**
     * Sets the value of the integer field at the specified offset in the
     * supplied object to the given value.  This is an ordered or lazy
     * version of <code>putIntVolatile(Object,long,int)</code>, which
     * doesn't guarantee the immediate visibility of the change to other
     * threads.  It is only really useful where the integer field is
     * <code>volatile</code>, and is thus expected to change unexpectedly.
     *
     * @param obj the object containing the field to modify.
     * @param offset the offset of the integer field within <code>obj</code>.
     * @param value the new value of the field.
     * @see #putIntVolatile(Object,long,int)
     */
    public native void putOrderedInt(Object obj, long offset, int value);

    /**
     * Sets the value of the long field at the specified offset in the
     * supplied object to the given value.  This is an ordered or lazy
     * version of <code>putLongVolatile(Object,long,long)</code>, which
     * doesn't guarantee the immediate visibility of the change to other
     * threads.  It is only really useful where the long field is
     * <code>volatile</code>, and is thus expected to change unexpectedly.
     *
     * @param obj the object containing the field to modify.
     * @param offset the offset of the long field within <code>obj</code>.
     * @param value the new value of the field.
     * @see #putLongVolatile(Object,long,long)
     */
    public native void putOrderedLong(Object obj, long offset, long value);

    /**
     * Sets the value of the object field at the specified offset in the
     * supplied object to the given value.  This is an ordered or lazy
     * version of <code>putObjectVolatile(Object,long,Object)</code>, which
     * doesn't guarantee the immediate visibility of the change to other
     * threads.  It is only really useful where the object field is
     * <code>volatile</code>, and is thus expected to change unexpectedly.
     *
     * @param obj the object containing the field to modify.
     * @param offset the offset of the object field within <code>obj</code>.
     * @param value the new value of the field.
     */
    public native void putOrderedObject(Object obj, long offset, Object value);

    /**
     * Sets the value of the integer field at the specified offset in the
     * supplied object to the given value, with volatile store semantics.
     *
     * @param obj the object containing the field to modify.
     * @param offset the offset of the integer field within <code>obj</code>.
     * @param value the new value of the field.
     */
    public native void putIntVolatile(Object obj, long offset, int value);

    /**
     * Retrieves the value of the integer field at the specified offset in the
     * supplied object with volatile load semantics.
     *
     * @param obj the object containing the field to read.
     * @param offset the offset of the integer field within <code>obj</code>.
     */
    public native int getIntVolatile(Object obj, long offset);

    /**
     * Sets the value of the long field at the specified offset in the
     * supplied object to the given value, with volatile store semantics.
     *
     * @param obj the object containing the field to modify.
     * @param offset the offset of the long field within <code>obj</code>.
     * @param value the new value of the field.
     * @see #putLong(Object,long,long)
     */
    public native void putLongVolatile(Object obj, long offset, long value);

    /**
     * Sets the value of the long field at the specified offset in the
     * supplied object to the given value.
     *
     * @param obj the object containing the field to modify.
     * @param offset the offset of the long field within <code>obj</code>.
     * @param value the new value of the field.
     * @see #putLongVolatile(Object,long,long)
     */
    public native void putLong(Object obj, long offset, long value);

    /**
     * Retrieves the value of the long field at the specified offset in the
     * supplied object with volatile load semantics.
     *
     * @param obj the object containing the field to read.
     * @param offset the offset of the long field within <code>obj</code>.
     * @see #getLong(Object,long)
     */
    public native long getLongVolatile(Object obj, long offset);

    /**
     * Retrieves the value of the long field at the specified offset in the
     * supplied object.
     *
     * @param obj the object containing the field to read.
     * @param offset the offset of the long field within <code>obj</code>.
     * @see #getLongVolatile(Object,long)
     */
    public native long getLong(Object obj, long offset);

    /**
     * Sets the value of the object field at the specified offset in the
     * supplied object to the given value, with volatile store semantics.
     *
     * @param obj the object containing the field to modify.
     * @param offset the offset of the object field within <code>obj</code>.
     * @param value the new value of the field.
     * @see #putObject(Object,long,Object)
     */
    public native void putObjectVolatile(Object obj, long offset, Object value);

    /**
     * Sets the value of the object field at the specified offset in the
     * supplied object to the given value.
     *
     * @param obj the object containing the field to modify.
     * @param offset the offset of the object field within <code>obj</code>.
     * @param value the new value of the field.
     * @see #putObjectVolatile(Object,long,Object)
     */
    public native void putObject(Object obj, long offset, Object value);

    /**
     * Retrieves the value of the object field at the specified offset in the
     * supplied object with volatile load semantics.
     *
     * @param obj the object containing the field to read.
     * @param offset the offset of the object field within <code>obj</code>.
     */
    public native Object getObjectVolatile(Object obj, long offset);

    /**
     * Returns the offset of the first element for a given array class.
     * To access elements of the array class, this value may be used along
     * with that returned by
     * <a href="#arrayIndexScale"><code>arrayIndexScale</code></a>,
     * if non-zero.
     *
     * @param arrayClass the class for which the first element's address should
     *                   be obtained.
     * @return the offset of the first element of the array class.
     * @see #arrayIndexScale(Class)
     */
    public native int arrayBaseOffset(Class arrayClass);

    /**
     * Returns the scale factor used for addressing elements of the supplied
     * array class.  Where a suitable scale factor can not be returned (e.g.
     * for primitive types), zero should be returned.  The returned value
     * can be used with
     * <a href="#arrayBaseOffset"><code>arrayBaseOffset</code></a>
     * to access elements of the class.
     *
     * @param arrayClass the class whose scale factor should be returned.
     * @return the scale factor, or zero if not supported for this array class.
     */
    public native int arrayIndexScale(Class arrayClass);

    /**
     * Releases the block on a thread created by
     * <a href="#park"><code>park</code></a>.  This method can also be used
     * to terminate a blockage caused by a prior call to <code>park</code>.
     * This operation is unsafe, as the thread must be guaranteed to be
     * live.  This is true of Java, but not native code.
     *
     * @param thread the thread to unblock.
     */
    public native void unpark(Thread thread);

    /**
     * Blocks the thread until a matching
     * <a href="#unpark"><code>unpark</code></a> occurs, the thread is
     * interrupted or the optional timeout expires.  If an <code>unpark</code>
     * call has already occurred, this also counts.  A timeout value of zero
     * is defined as no timeout.  When <code>isAbsolute</code> is
     * <code>true</code>, the timeout is in milliseconds relative to the
     * epoch.  Otherwise, the value is the number of nanoseconds which must
     * occur before timeout.  This call may also return spuriously (i.e.
     * for no apparent reason).
     *
     * @param isAbsolute true if the timeout is specified in milliseconds from
     *                   the epoch.
     * @param time either the number of nanoseconds to wait, or a time in
     *             milliseconds from the epoch to wait for.
     */
    public native void park(boolean isAbsolute, long time);
}
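Application code normally reaches park() and unpark() through java.util.concurrent.locks.LockSupport rather than through Unsafe. The sketch below shows the usual pattern implied by the Javadoc above: because park() may return spuriously, the waiting thread rechecks its condition in a loop. The class name and the flags are made up for this example.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

final class ParkDemo {

    // Returns true if the waiting thread was released and ran to completion.
    static boolean runOnce() {
        AtomicBoolean ready = new AtomicBoolean(false);
        AtomicBoolean woke = new AtomicBoolean(false);

        Thread waiter = new Thread(() -> {
            // park() can return for no apparent reason, so always loop
            // on the condition the thread is actually waiting for.
            while (!ready.get()) {
                LockSupport.park();
            }
            woke.set(true);
        });
        waiter.start();

        ready.set(true);            // publish the condition first
        LockSupport.unpark(waiter); // then release the waiter; an unpark that
                                    // arrives before park() still counts
        try {
            waiter.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return woke.get();
    }

    public static void main(String[] args) {
        System.out.println("waiter released: " + runOnce());
    }
}
```

Setting the flag before calling unpark() matters: the waiter only leaves its loop when the condition is true, whichever order the two threads happen to run in.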